Artificial intelligence (AI) feedback improved the quality of physician notes written during patient visits, with the better documentation strengthening the ability of care teams to make diagnoses and plan for patients’ future needs, a new study finds.
Since 2021, NYU Langone Health has been using pattern-recognizing, machine-learning AI systems to grade the quality of doctors’ clinical notes. At the same time, NYU Langone created data informatics dashboards that monitor hundreds of measures of safety and the effectiveness of care. Over time, the informatics team trained the AI models to track in the dashboards how well doctors’ notes achieved the “5 Cs”: completeness, conciseness, contingency planning, correctness, and clinical assessment.
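The case study describes this grading system only at a high level. Purely as a rough sketch of how a pattern-recognizing note grader can feed such a dashboard, one quality dimension can be scored with a simple text classifier; the toy notes, labels, and model choice (TF-IDF plus logistic regression) below are illustrative assumptions, not details of the NYU Langone system.

```python
# Minimal sketch (not the study's actual system): a per-dimension note-quality
# grader of the pattern-recognizing, machine-learning kind described above.
# Training notes, labels, and model choice are illustrative assumptions only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny, made-up training set: each note is labeled 1 if it meets the
# "contingency planning" criterion (documents a plan for future needs), else 0.
train_notes = [
    "Assessment: pneumonia. If fever persists 48 hours, escalate to IV antibiotics.",
    "Patient stable. Follow-up plan: repeat labs Friday; call if chest pain recurs.",
    "Seen and examined. Continue current meds.",
    "Note copied forward from yesterday. No changes.",
]
train_labels = [1, 1, 0, 0]

# One binary classifier per quality dimension; here, contingency planning only.
grader = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
grader.fit(train_notes, train_labels)

# Score a new note; the probability could then feed a dashboard metric.
new_note = "Discharge tomorrow if oxygen saturation stays above 94%; otherwise reassess."
score = grader.predict_proba([new_note])[0, 1]
print(f"Contingency-planning score: {score:.2f}")
```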
Now, a new case study, published online April 17 in NEJM Catalyst Innovations in Care Delivery, shows how the AI-improved notes, in combination with dashboard innovations and other safety initiatives, led to better care quality across four major medical specialties: internal medicine, pediatrics, general surgery, and the intensive care unit.
These gains included improvements of up to 45 percent across the specialties in note-based clinical assessment (that is, determining diagnoses) and reasoning (making predictions when a diagnosis is unknown). In addition, contingency planning to address patients’ future needs improved by up to 34 percent.
Last year, NYU Langone added to this long-standing effort a newer form of AI that predicts likely options for the next word in any sentence based on how billions of people have used language on the internet over time. A result of this next-word prediction is that generative AI chatbots like GPT-4 can read physician notes and make suggestions. In a pilot within the case study, the research team supercharged their machine-learning AI model, which can only give physicians a grade on their notes, by integrating a chatbot that added an accurate written narrative of the issues with any given note.
The NYU Langone case study also showed that GPT-4 or other large language models could provide a method for assessing the 5 Cs across medical specialties without specialized training in each. Researchers say that the “generalizability” of GPT-4 for evaluating note quality supports its potential for application at many health systems.
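The paper does not publish the pilot’s prompts or code. Purely as an illustration of the general pattern, a large language model can be asked to critique a note against the 5 Cs and return a written narrative, as in the hedged sketch below; the prompt wording, the critique_note helper, and the use of the OpenAI Python client are assumptions for this example, not the authors’ implementation.

```python
# Hedged illustration only: the study does not publish its prompts or code.
# This sketch shows the general pattern of asking a large language model to
# critique a note against the 5 Cs and return narrative feedback.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

FIVE_CS_PROMPT = (
    "You are reviewing an inpatient clinical note. Assess it against the 5 Cs: "
    "completeness, conciseness, contingency planning, correctness, and clinical "
    "assessment. For each C, state whether the note meets it and briefly why, "
    "then give one concrete suggestion for improvement."
)

def critique_note(note_text: str) -> str:
    """Return a written narrative of issues with a note, one C at a time."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": FIVE_CS_PROMPT},
            {"role": "user", "content": note_text},
        ],
    )
    return response.choices[0].message.content

print(critique_note("Assessment: chest pain, likely musculoskeletal. Plan: discharge."))
```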
“Our study provides evidence that AI can improve the quality of medical notes, a critical part of caring for patients,” said lead study author Jonah Feldman, MD, medical director of clinical transformation and informatics within NYU Langone’s Medical Center Information Technology (MCIT) Department of Health Informatics. “This is the first large-scale study to show how a healthcare organization can use a combination of AI models to give note feedback that significantly improves care quality.”
National Need
Poor note quality in healthcare has been a growing concern since the enactment of the Health Information Technology for Economic and Clinical Health (HITECH) Act in 2009. The act gave incentives to healthcare systems to switch from paper to electronic health records (EHR), enabling improved patient safety and coordination between healthcare providers.
A side effect of EHR adoption, however, has been that physician clinical notes are now four times longer on average in the United States than in other countries. Such “note bloat” has been shown to make it harder for collaborating clinicians to understand diagnoses described by their colleagues, say the study authors. Issues with note quality have been shown in the field to lead to missed diagnoses and delayed treatments, yet there is no universally accepted methodology for measuring note quality. Further, evaluation of note quality by human peers is time-consuming and hard to scale up to the organizational level, the researchers say.
The effort captured in the new NYU Langone case study outlines a structured approach for organizational development of AI-based note quality measurement, a related system for process improvement, and a demonstration of AI-fostered clinician behavioral change in combination with other safety programs. The study also details how AI-generated note quality measurement helped to foster adoption of standard workflows, a significant driver for quality improvement.
Each of the four medical specialties that participated in the study achieved the institutional goal: more than 75 percent of inpatient history and physical exam notes and consult notes were completed using standardized workflows that drove compliance with quality metrics. This represented an improvement from the previous share of less than 5 percent.
“Our study represents the founding stage of what will undoubtedly be a national trend to leverage cutting-edge tools to ensure clinical documentation of the highest quality—measurably and reproducibly,” said study author Paul A. Testa, MD, JD, MPH, chief medical information officer for NYU Langone. “The clinical note can be a foundational tool—if accurate, accessible, and effective—to truly influence clinical outcomes by meaningfully engaging patients while ensuring documentation integrity.”
Along with Dr. Feldman and Dr. Testa, the current study’s authors from NYU Langone were Katherine Hochman, MD, MBA, Benedict Vincent Guzman, Adam J. Goodman, MD, and Joseph M. Weisstuch, MD.
Media Inquiries
Greg Williams
Phone: 212-404-3500
Gregory.Williams@NYULangone.org