
Is That New Procedure Proven? MedTech Billing Codes and Evidence-Based Medicine


Introduced by the AMA in 2001, Category III CPT codes aimed to streamline financial reporting. Instead they became entangled in a politically driven, zero-sum reimbursement game.

In 2001 the American Medical Association (AMA) developed Category III Current Procedural Terminology (CPT) billing codes as a pragmatic solution to an evidence-access bottleneck: allowing emerging procedures to be tracked and reimbursed in a limited way while developers generated the stronger clinical data required for full Category I status. They were designed to be temporary tracking codes for emerging medical technologies, enabling physicians to offer a medical procedure without waiting for a permanent, reimbursable Category I code. But nearly two decades in, this well-intentioned system has devolved into a labyrinth of politics, economic maneuvering, and confusion that often sidelines evidence-based medicine. This is not some quirk of bureaucracy. It is the predictable political-economy outcome of a system in which lobbying, budget constraints, and revenue protection compete with clinical evidence as the currency of decision-making.

When our team at Consilium Scientific, a nonprofit dedicated to advancing integrity in healthcare and clinical research, received funding from Arnold Ventures to study breakthrough non-pharmaceutical medical technologies, the goal was clear: to evaluate the clinical evidence behind emerging Category III procedures against a matched set of widely adopted, more clinically established Category I procedures. The findings were intended to inform the development of an automated tool to identify all publicly available clinical trials and evaluate the quality and quantity of evidence supporting novel medical devices and diagnostics.

For industry, Category III codes are easier to secure than Category I codes because they require less clinical evidence and move through review more quickly. In principle, that flexibility could help promising procedures gather real-world use data while stronger studies are completed. In practice, however, the pathway often weakens the incentive to run rigorous trials early. Companies can enter billing workflows first and postpone difficult evidence-generation decisions until later, if they make them at all.

As our team moved from data collection to coding decisions, a troubling pattern emerged. Evidence mattered, but it was only one force in a larger system shaped by reimbursement politics, specialty-society incentives, insurer discretion, and legislative pressure. Based on interviews with a former CPT director at the AMA and a former Medicare medical director, we found that, in practice, coding outcomes often reflect institutional bargaining as much as clinical merit. The two interviews juxtapose a developer’s and a user’s experience of the coding system. That gap between formal evidence standards and real-world decision-making is the central problem this article examines.

Origins: A Bridge for Innovation

Introduced by the AMA in 2001, Category III CPT codes aimed to streamline financial reporting under the Health Insurance Portability and Accountability Act (HIPAA), widely known for its confidentiality and privacy provisions, by replacing the multitude of local payer codes (from commercial payers like Blue Cross Blue Shield and UnitedHealthcare, as well as state Medicaid and Medicare programs) with a national standard. As a result, all payers could use the same standardized codes to track new procedures and technologies that had not yet been substantiated through research, which facilitated nationwide electronic claims processing and reporting. The vision was clear: give experimental technologies a provisional Category III code, tracking use while evidence catches up. But how well did the original concept work?

The Political Quagmire

In practice, Category III became entangled in Medicare’s zero-sum reimbursement game. Michael Beebe, former CPT director at the AMA and one of the founders of Category III codes, explains that upgrading a code from Category III to Category I is politically driven: “any new technologies coming into the Medicare program can potentially reduce payments for existing procedures, making medical specialty societies reluctant to move Category III codes into Category I, as this could impact their overall compensation due to budget neutrality rules”. Beebe recalls an orthopedic specialty society that, despite strong clinical evidence, delayed reclassifying a computer navigation technology for total hip or knee replacements in order to protect reimbursements for the high-volume procedures its surgeons perform.

In fact, conversion rates are bleak. Although the AMA does not publish detailed information on conversion rates by year, Consilium estimates that fewer than 20% of Category III codes ever advance to Category I. Laurence Clark, MD, a former Medicare medical director, is blunt: “Category III codes are very political”. He describes a scenario where proprietary interests lobby to keep codes in temporary status indefinitely, avoiding the evidence requirements of Category I. Some companies even exploit this limbo, using the 5-year window to “rake in what they could and then fold” without investing in rigorous trials.

“So right now, hard decisions are not being made as they are being made in Europe”, says Dr. Clark. The AMA issues a code, but because there is no centralized authority in the US healthcare system, each insurer can make its own decision, resulting in a fragmented and often complicated process. Final coverage decisions are “punted” to the insurers (private or Medicare), who have the ultimate authority but also face intense, conflicting pressure from legislators to cover an unproven technology for the sake of economic growth (job creation and state revenue). Dr. Clark recounts being contacted by a Senator’s staff, who urged him to approve coverage for a lab he had previously declined, emphasizing the creation of 200 jobs as a priority over evidence-based evaluation. The introduction of Category III coding was meant to alleviate this type of political pressure: if a company was denied Category I, it could potentially receive Category III, which requires less evidence and serves as a shortcut to coverage without “broad acceptance” across the country.

The Evidence Disconnect

It would be logical to assume that Category III coding serves as an incentive for evidence generation and, subsequently, broader acceptance. But this is not so. According to Beebe, while the codes help collect data that could be used for evidence development, their main intent is to support claims processing, compliance, and monitoring, rather than to directly incentivize or require the generation of new clinical evidence.

Moreover, the 21st Century Cures Act, created to strengthen Medicare’s role in accelerating the development and delivery of medical treatments, eliminated blanket non-coverage policies (including those applied to Category III coded services) and required a written review of evidence in denial decisions. As a result, many new technologies are now approved for coverage in the absence of specific denial concerns, reducing the incentive for companies to invest in generating additional supportive evidence, Dr. Clark explains.

Category III codes might be easier to obtain than Category I codes because of less stringent clinical data requirements and a shorter review timeframe, but the “widely performed” criterion for Category I, with its nebulous utilization requirements (e.g. how often a code is billed), creates a “chicken and egg” problem. To obtain a Category I code, there must be evidence of widespread use; however, it is difficult to achieve widespread use without first having a code that allows physicians to report and perform the procedure. The result is a cycle: a code is needed to drive utilization, but utilization is needed to justify creating the code. In effect, utilization data becomes a proxy for efficacy, even though “it doesn’t reflect whether the technology works,” Dr. Leeza Osipenko observes. Physicians who risk non-payment in order to help meet the widespread-use criterion take on an economic burden for trying to advance innovation in patient care. For pharmaceuticals, the situation is quite different: evidence thresholds for drugs are front-loaded before widespread reimbursement, whereas for procedures and devices evidence is often back-loaded after billing access is granted.

The frustration is palpable. Rather than demanding stronger trials up front, a Category III code opens the door for manufacturers to rely on ‘real-world evidence’ – a post-hoc patch for the rigorous testing that should happen before patients are exposed. Even services that are not FDA-approved can be approved by Medicare under specific circumstances if the service is considered reasonable and necessary for a patient’s condition. “How much is known about safety, clinical performance, clinical validity and clinical utility of medical devices and diagnostic tests before they are cleared for widespread patient use? How much is enough? How would a user know this without investigating detailed reports from the FDA, if these are even available to the public?”, asks Dr. Osipenko. Having worked at NICE and long lived in England as a user of its healthcare system, she is accustomed to the European expectation that evidence is demonstrated before a medical product is reimbursed.

Dr. Cleary, who brings a US healthcare perspective, also feels underserved by the current system, in which FDA approval gets a device on the market but a Category III code decides whether it finds its way into practice, turning patient use into the evidence base. “When a friend of mine shared that she was considering getting an experimental procedure done, the alarm bells went off in my head even before I learned that it was once a Category III code. Had it been sufficiently tested or did someone lobby for it to get through? Is being given the FDA-stamp of approval enough? I want to be able to celebrate innovation, but at a time when confidence in the agency is wavering, I want to have a second opinion I can trust.”

A Path Forward?

Reforming Category III requires disentangling it from reimbursement politics. Beebe explains that when new medical technologies earn their own billing codes, Medicare’s payment committee (the Relative Value Scale Update Committee, or RUC) often rushes to revalue the older, related procedures. Instead, he suggests that the Centers for Medicare & Medicaid Services (CMS) and the RUC wait until the regular 3- to 5-year review cycle to reconsider values. Waiting would reduce the conflict between new medical technologies and payments for existing services and procedures, and would give specialty societies, the RUC, and CMS the opportunity to see how the new codes are used.

In our view, the accumulation of sufficient clinical evidence (both the quality of studies and enough patients exposed) should determine whether a technology is worthy of a Category I upgrade. Until then, Category III reflects another aspect of a complicated US healthcare system where good intentions collide with vested interests. Until evidence, not politics, drives the process, this coding system – which can allow poorly evidenced technologies to gain traction and well-evidenced ones to sit on the sidelines – will not be able to reach its full potential to serve the patient.
