
Lookup Helps AI Accurately Code Diagnoses

A recent investigation by researchers at the Mount Sinai Health System demonstrates that adding a simple lookup step can dramatically increase the accuracy of AI systems tasked with assigning diagnostic codes, allowing them in many cases to outperform clinicians. The results, published in NEJM AI, suggest the method could streamline clinicians’ documentation, reduce billing mistakes, and elevate the overall standard of patient records.

“Earlier work showed the most advanced AI still produced incorrect, often nonsensical codes when it relied entirely on inference,” comments co‑corresponding senior author Eyal Klang, MD, Chief of Generative AI at the Icahn School of Medicine. “This time we let the model review similar past cases before finalizing a code, and that small adjustment led to a strong performance boost.”

Physicians in the United States devote countless hours each week to assigning ICD codes, which capture everything from minor sprains to major cardiac events. Large language models such as ChatGPT usually struggle with the precision this requires. The research team addressed this by implementing a “lookup‑before‑coding” protocol: first, the AI expresses the diagnosis in plain language, then it selects the most appropriate code from a curated list of real‑world examples, as illustrated in the sketch below.
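The paper’s implementation is not reproduced here, but the two‑step idea can be sketched in a few lines of Python. Everything in the snippet is illustrative: the describe_diagnosis and pick_code helpers, the tiny ICD_EXAMPLES list, and the string‑similarity matching all stand in for whatever the study actually used. The first step represents a language‑model call that phrases the diagnosis in plain language; the second matches that phrase against known ICD entries instead of letting the model invent a code.

```python
# Minimal sketch of a two-step "lookup-before-coding" flow.
# All names and data are hypothetical, not the study's implementation.

from difflib import SequenceMatcher

# Tiny stand-in for a curated list of real-world ICD entries.
ICD_EXAMPLES = {
    "I21.9":    "Acute myocardial infarction, unspecified",
    "J45.901":  "Unspecified asthma with (acute) exacerbation",
    "S93.401A": "Sprain of unspecified ligament of right ankle, initial encounter",
}

def describe_diagnosis(clinical_note: str) -> str:
    """Step 1: ask a language model for the diagnosis in plain language.

    Placeholder for a real LLM call; hard-coded here for illustration."""
    return "acute myocardial infarction"

def pick_code(description: str) -> tuple[str, str]:
    """Step 2: match the free-text description against known ICD entries
    and return the closest code, rather than letting the model guess one."""
    def similarity(label: str) -> float:
        return SequenceMatcher(None, description.lower(), label.lower()).ratio()
    best_code = max(ICD_EXAMPLES, key=lambda code: similarity(ICD_EXAMPLES[code]))
    return best_code, ICD_EXAMPLES[best_code]

if __name__ == "__main__":
    note = "58M with crushing substernal chest pain, troponin elevated"
    description = describe_diagnosis(note)
    code, label = pick_code(description)
    print(f"{description!r} -> {code}: {label}")
```

Grounding the final choice in an existing code list is the key design point: the model’s free text is only an intermediate step, and the code that reaches the chart always comes from the lookup.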

The study involved 500 emergency‑department visits at Mount Sinai hospitals. For each encounter, the investigators supplied the physician’s note to nine AI models, ranging from small open‑source systems to larger commercial ones. Each model first produced a candidate ICD description. A retrieval mechanism then matched that description to ten similar ICD entries drawn from a database of over one million records, along with how often each code occurred. In a second phase, the model used this retrieved material to pick the most accurate ICD label and code; a rough sketch of that retrieval step follows.
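The sketch below again uses invented names and toy data: a small in‑memory ICD_DATABASE with occurrence counts stands in for the million‑record corpus, a simple bag‑of‑words cosine similarity stands in for whatever retrieval method the team actually used, and the final prompt only illustrates how the ten candidates and their frequencies could be handed back to the model for the second pass.

```python
# Illustrative retrieval step: nearest-neighbour lookup over a toy ICD
# "database" with occurrence counts, feeding the top candidates to a
# hypothetical second model pass. Not the study's actual pipeline.

from collections import Counter
from math import sqrt

# Toy stand-in for the >1 million historical records: (code, label, count).
ICD_DATABASE = [
    ("I21.9",    "Acute myocardial infarction, unspecified",    4210),
    ("I20.0",    "Unstable angina",                             2980),
    ("R07.9",    "Chest pain, unspecified",                     8850),
    ("J45.901",  "Unspecified asthma with (acute) exacerbation", 3120),
    ("S93.401A", "Sprain of right ankle ligament, initial",       640),
]

def bag_of_words(text: str) -> Counter:
    return Counter(text.lower().replace(",", " ").split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(description: str, k: int = 10) -> list[tuple[str, str, int]]:
    """Return the k database entries most similar to the model's free-text
    diagnosis, along with how often each code has occurred historically."""
    query = bag_of_words(description)
    ranked = sorted(ICD_DATABASE,
                    key=lambda row: cosine(query, bag_of_words(row[1])),
                    reverse=True)
    return ranked[:k]

def build_second_pass_prompt(note: str, candidates) -> str:
    """Prompt for the second pass: choose one code from the retrieved list."""
    options = "\n".join(f"- {code}: {label} (seen {count}x)"
                        for code, label, count in candidates)
    return f"Clinical note:\n{note}\n\nChoose the single best ICD code from:\n{options}"

if __name__ == "__main__":
    candidates = retrieve("acute myocardial infarction")
    print(build_second_pass_prompt("58M, crushing chest pain, troponin elevated", candidates))
```

Including how often each code appears in past records gives the model a sense of which candidates are plausible in practice, not just which labels sound similar.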

Independent reviewers—including emergency physicians and two separate AI testing frameworks—assessed the resulting codes without knowing whether each entry came from a human or a machine.

Across the board, models that incorporated the retrieval phase outperformed those that did not, and in many instances surpassed physician‑assigned codes. Remarkably, even modest open‑source models performed well once given access to comparable past examples.

“We’re aiming for smarter assistance, not blind automation,” notes co‑corresponding senior author Girish N. Nadkarni, MD, MPH, Chair of the Windreich Department of Artificial Intelligence and Human Health. “If we can cut the time clinicians spend on coding, limit billing errors, and improve data quality—all through an affordable, transparent system—that’s a win for both patients and providers.”

The authors stress that the retrieval‑enhanced approach is meant to support rather than replace human judgment. While not yet approved for billing and tested only on primary diagnoses from discharged emergency visits, the method shows strong clinical promise. Short‑term applications could include real‑time code suggestions in electronic health records or pre‑billing error alerts.

The team is now integrating the technique into Mount Sinai’s electronic health record platform for pilot testing and plans to extend it to other care settings and to include secondary and procedural codes in future iterations.

“AI has the potential to reshape patient care. When technology lifts administrative burdens, clinicians can devote more time to patients—benefiting doctors, patients, and health systems of all sizes,” remarks David L. Reich, MD, Chief Clinical Officer of the Mount Sinai Health System. “Using AI in this way enhances compassionate care, strengthening the foundation of hospitals everywhere.”
