A.I. Systems Diagnosing Sepsis: Is It Ready for Prime Time?


EPIC is arguably THE electronic health record system in the US, with the most significant market share (56% of all patient records). Countless millions of federal dollars have passed into its corporate coffers during our transition to digital record-keeping. Artificial intelligence, often more sizzle than steak, has been held out as a grail, at least in medical care: the data held in electronic health records could be fashioned into tools that improve treatment. A study in JAMA Internal Medicine updates us on that particular marriage.

Sepsis remains one of the most costly and deadly of medical conditions. Sepsis is not a disease per se, but a syndrome, a collection of signs and symptoms that indicates the presence of an overwhelming infection. Many, if not all, severely ill patients with COVID-19 had viral sepsis. Bacterial causes are more common, but sepsis in all its microbial forms carries a high mortality. Academics have long tortured clinical hospital data to find some statistical means of identifying sepsis or its incipient signs, because early intervention is associated with better outcomes.

With a vast installed base of electronic hospital records and information on patient diagnostics and outcomes, EPIC was ideally suited to develop its own statistical predictor. Using machine learning techniques, data from 400,000 patients, and measurements collected every half hour, they created an “add-on” to their software to predict sepsis. The accuracy of its predictions is summarized by a curve that compares true-positive and false-positive rates, using the AUC, the area under that curve. A value of 1 indicates perfect separation; in this case, all the cases of sepsis are identified, none are missed, and there are no false alarms. One is a mystical value, and predictive algorithms often have lower values; in the EPIC model, the reported AUC was between 0.76 and 0.83 – good enough to be helpful. Of course, because EPIC is a corporation, the algorithm is proprietary; it is black box medicine.

Researchers in Michigan are the first to report on how well that algorithm works in the real world. Spoiler alert – its performance is probably better than a medical student's, but not as good as a resident physician's.

Epic Sepsis Model (ESM) scores were calculated for all hospitalized patients in the University of Michigan health system for six months in 2019. EPIC does allow each hospital to “tune” the ESM, increasing its ability to identify all positive cases but at the cost of more false alarms. The researchers used a setting in the mid-range, consistent with the value used by the health system. They looked at how accurately ESM scores, calculated every 15 minutes, predicted the development of sepsis in a hospitalized patient, reporting the AUC for the aggregate of hospitalized patients. Sepsis occurred in 7% of the 28,000 patients, who accounted for 38,000 admissions during the study period.

The AUC was 0.63, far lower than advertised. Translated into terms we can consider: when the model reported that a patient was not septic, it was correct 95% of the time; when it flagged a patient as septic, it was correct only 12% of the time. Moreover, that identification came only about 2.5 hours ahead of the determination of sepsis by physicians (as measured by the start of antibiotic therapy or laboratory testing to confirm the presence of sepsis).

Despite only a 7% incidence of sepsis, ESM flagged 18% of all patients as potentially septic. These false positives all require investigation. At a minimum, eight patients must be re-evaluated for each genuine case of sepsis identified. This is a lot of hay to go through to find the needle, even though finding the needle can be lifesaving.
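The eight-to-one workup burden follows directly from the positive predictive value. A back-of-the-envelope check, using only the figures reported above (the rounding is mine):

```python
# Back-of-the-envelope arithmetic from the figures cited in the study
# summary above; rounding is mine, not the authors'.

ppv = 0.12                      # model correct 12% of the time when flagging sepsis
evals_per_true_case = 1 / ppv   # bedside re-evaluations triggered per genuine case

print(round(evals_per_true_case, 1))  # -> 8.3, i.e. roughly eight workups per needle
```

In other words, a 12% positive predictive value means that for every alert that turns out to be real sepsis, clinicians must chase down roughly seven alerts that are not.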

Sepsis is, as I mentioned, a time-sensitive concern – early intervention produces better outcomes. But as the researchers point out, “ESM-driven alerts reflect the presence of sepsis already apparent to clinicians.”

“The increase and growth in deployment of proprietary models has led to an underbelly of confidential, non–peer-reviewed model performance documents that may not accurately reflect real-world model performance. Owing to the ease of integration within the EHR and loose federal regulations, hundreds of US hospitals have begun using these algorithms.” 

The bottom line of the research is that EPIC's sepsis algorithm was found wanting when taken out for a test drive. It is a questionably effective screening tool deployed with no effective, transparent feedback – a medical device set free without any formal post-marketing surveillance. Who will be held liable when a sepsis patient, not identified by the system or an overworked physician or nurse, brings malpractice litigation? Certainly not EPIC; it only offers “clinical support,” and responsibility rests with the user. Of course, if the user were aware of how the tool performs, they might not use it at all.

For the safety of our patients, we should insist that these programs update metrics of their positive and negative predictive values in real time and in real settings. Certainly, the machine learning and other AI techniques applied to creating these algorithms can also be focused on measuring their value, and that measurement should be part and parcel of the package. Our cars have “check engine” lights; shouldn’t our medical algorithms have something similar?
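What might such a “check engine” light look like in practice? A minimal sketch, with entirely hypothetical names and thresholds (none of this is part of any vendor product): track the positive predictive value over a rolling window of recent alerts and warn when it drifts below an agreed floor.

```python
# Hypothetical sketch of a "check engine" light for a clinical alert
# algorithm: monitor the positive predictive value (PPV) over recent
# alerts and flag when it falls below a floor. All names, window sizes,
# and thresholds are invented for illustration.

from collections import deque


class AlertMonitor:
    def __init__(self, window=500, ppv_floor=0.10):
        self.outcomes = deque(maxlen=window)  # True if an alert was confirmed
        self.ppv_floor = ppv_floor

    def record(self, confirmed: bool):
        """Log whether the latest alert turned out to be a real case."""
        self.outcomes.append(confirmed)

    @property
    def ppv(self):
        """PPV over the rolling window; None until any alert is logged."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def check_engine(self):
        """Light turns on when recent PPV drops below the agreed floor."""
        ppv = self.ppv
        return ppv is not None and ppv < self.ppv_floor


monitor = AlertMonitor(window=100, ppv_floor=0.10)
for confirmed in [True] + [False] * 19:  # 1 of the last 20 alerts confirmed
    monitor.record(confirmed)

print(monitor.ppv, monitor.check_engine())  # -> 0.05 True
```

The point is not the specific code but the design choice: the same record-keeping infrastructure that feeds these models could just as easily audit them continuously, instead of leaving performance claims to confidential vendor documents.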


Source: External Validation of a Widely Implemented Proprietary Sepsis Prediction Model in Hospitalized Patients, JAMA Internal Medicine, DOI: 10.1001/jamainternmed.2021.2626