AI can probably solve diagnostic errors in Healthcare. If it inherits the right things from us.
On November 1st, 1951, the Army Research Laboratory in Fort Knox, Kentucky published a report. According to Report A800086, the extent and degree of cold injuries suffered by Allied Forces in the then-nascent Korean War were unmatched, even when compared to troops stationed in subarctic regions during World War II. The majority of the report was a detailed analysis of the contextual factors at the time of injury. What was the time of day? What was the rank? Race? In what climate had the serviceman previously resided? Many of these were found to be contributory. Context defined each case.
My most treasured possession is a 1951 US Military-issued MQ-1 Field Cap with a thick black angora wool lining that I inherited from my grandfather. He would have been classified as high-risk for cold injury: based in tropical Singapore for many years, an ethnic minority, enlisted. Like him, his MQ-1 was unusual. Its outer shell was made of thick woven fabric rather than the standard smooth cotton. Perhaps a quartermaster’s consideration of context resulted in him being issued a slightly warmer kit.
“I don’t know anything about pemphigoid,” Dr. Mark Graber admitted. The admission was deliberate, meant to create a safe space for his provider audience. Statistic after appalling statistic followed. Internists will misdiagnose 13% of common conditions. Radiologists and pathologists will miss 2 to 5% of important findings. Chart reviews will find misdiagnoses in 1 in 20 patients. The result? Up to 80,000 deaths a year in the United States alone, at an estimated financial cost of $17 to $100 billion. Diagnostic errors are so prevalent that it seems statistically very unlikely that we have neither caused one as a healthcare professional nor experienced one as a patient or family member.
Graber attributed the vast majority of errors to providers overusing a mental “shortcut” called the availability heuristic. Simply put, providers strongly favor and choose the diagnoses they are most familiar with. The resulting bias is so strong that important contextual evidence, which might otherwise have led to the correct diagnosis, can be ignored even when it is available. The result is “premature closure”: any consideration of other differential diagnoses is shut down before it even begins.
Is this where AI comes to the rescue?
The idea that AI can help providers make more accurate diagnoses is not a new one. The field of Computer-Aided Diagnosis (CAD) began in the 1950s and has continued to develop since. EKG machines have had automated diagnostic capabilities since the 1970s. Yet some providers remain suspicious of CAD, owing to the conventional wisdom that humans are better at qualitative analysis. That wisdom is quickly eroding. Clinical studies conclusively demonstrating that AI can match, and in some cases surpass, human experts at making accurate diagnoses are becoming commonplace in fields as disparate as radiology and dermatology.
Such recent advancements have been made possible by modern machine learning theory, which seeks to mimic the human brain down to the individual neuron. Yet by making our AI models in our image, we also allow them to develop our cognitive biases and flaws. Neural networks don’t inherently get rid of things like dependence on the availability heuristic and premature closure. In fact, just the opposite can happen: they can make those biases more efficient and more pronounced. They can make them worse.
The key is this. We need to teach our AI models to appropriately and extensively consider context.
Thus far, efforts have largely focused on using Natural Language Processing (NLP) technology to tokenize medical records. The resulting word fragments are filtered for medical relevancy and then used as “features” to train supervised machine-learning models. So if a medical document contains the features “chest pain” and “shortness of breath”, Myocardial Infarction would be suggested and assigned a likelihood score, based on how frequently the model saw those words in historical patients known to have had an MI.
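The pipeline described above can be sketched as a toy, Naive-Bayes-style scorer. Everything here is invented for illustration: the records, the relevancy allow-list, and the smoothing are stand-ins, not any real system’s design.

```python
from collections import Counter

# Toy historical corpus: (tokenized note, known diagnosis). All data invented.
records = [
    (["chest", "pain", "shortness", "of", "breath"], "MI"),
    (["chest", "pain", "diaphoresis"], "MI"),
    (["cough", "fever", "shortness", "of", "breath"], "Pneumonia"),
    (["cough", "sputum", "fever"], "Pneumonia"),
]

# Hypothetical "medical relevancy" filter over the token stream.
RELEVANT = {"chest", "pain", "shortness", "breath",
            "cough", "fever", "sputum", "diaphoresis"}

def features(tokens):
    return [t for t in tokens if t in RELEVANT]

# Count how many historical patients per diagnosis exhibited each feature.
counts = {}
totals = Counter()
for tokens, dx in records:
    totals[dx] += 1
    for t in set(features(tokens)):
        counts.setdefault(dx, Counter())[t] += 1

def likelihood(tokens, dx, alpha=1.0):
    """Naive-Bayes-style score: prior times smoothed per-feature frequencies."""
    score = totals[dx] / sum(totals.values())
    for t in set(features(tokens)):
        score *= (counts[dx][t] + alpha) / (totals[dx] + 2 * alpha)
    return score

note = ["chest", "pain", "and", "shortness", "of", "breath"]
scores = {dx: likelihood(note, dx) for dx in totals}
best = max(scores, key=scores.get)  # "MI" wins on this toy corpus
```

Because the score is driven entirely by historical word frequencies, the most familiar diagnosis dominates by construction, which is exactly the availability-heuristic dynamic the article worries about.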
I am not sure that I agree entirely with this one-size-fits-all approach. Doesn’t it lead us back to premature closure?
Contextual data has to be taken in context. Take the example of credit card fraud alerts. Deviation from the norm is what triggers an alert. If one usually makes $50 purchases and suddenly makes a $3,000 purchase, that is anomalous and should result in an alert. However, if one typically makes $3,000 purchases and then makes a $3,500 purchase, that is probably within normal limits and should not trigger an alert. Baseline is important.
Perhaps we should think about replacing the concept of Diagnosis Synthesis with the concept of Detecting Anomalous Deviation(s) from Baseline. That changes things quite a bit. “Chest pain” and “shortness of breath” may not mean Myocardial Infarction at all; if a patient has baseline COPD, these “anomalies” might actually be signs of Pneumonia. Making this shift would profoundly change the underlying machine learning techniques we use. Instead of supervised models, we would favor unsupervised techniques such as Cluster Analysis, which are typically built with fewer assumptions. In other words, they may be less prone to inheriting our cognitive biases and flaws.
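One simplified way to sketch “detecting anomalous deviation from baseline” is to treat each patient’s own history as a single cluster and measure a new observation’s standardized distance from its centroid. The vitals, patients, and thresholds below are all hypothetical, and a real system would use richer clustering than this one-cluster stand-in:

```python
import math

def baseline_model(history):
    """Per-feature mean and stdev from one patient's own past observations."""
    n, dims = len(history), len(history[0])
    means = [sum(obs[i] for obs in history) / n for i in range(dims)]
    stdevs = [math.sqrt(sum((obs[i] - means[i]) ** 2 for obs in history) / (n - 1)) or 1.0
              for i in range(dims)]  # "or 1.0" guards against a zero stdev
    return means, stdevs

def deviation(model, obs):
    """Standardized Euclidean distance from the baseline centroid."""
    means, stdevs = model
    return math.sqrt(sum(((x - m) / s) ** 2 for x, m, s in zip(obs, means, stdevs)))

# Hypothetical vitals: (respiratory rate, O2 saturation %).
copd_model = baseline_model([(22, 90), (24, 89), (23, 91), (22, 90)])
healthy_model = baseline_model([(14, 98), (15, 99), (14, 98), (15, 99)])

new_obs = (24, 90)  # a "short of breath" reading
# Unremarkable against a COPD patient's own baseline; a large anomaly
# against a healthy baseline: same data, different context.
print(round(deviation(copd_model, new_obs), 1))
print(round(deviation(healthy_model, new_obs), 1))
```

The same reading scores as near-normal for the COPD patient and as a strong anomaly for the healthy one, which is the behavior the supervised word-frequency approach cannot capture.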
Like avoiding cold injuries on the Korean Peninsula in 1951, the careful consideration of context is paramount in Healthcare. In order for AI to help, we have to make sure it inherits the right things from us.
“Solving problems in Healthcare with AI”:
1. “3 Problems”: https://lnkd.in/eHBZdJQ
2. “The Inheritance”(this article): https://lnkd.in/eB6qQPM
3. “Empathy”: https://lnkd.in/ek7ynM3
4. “One Trillion Dollars”: https://lnkd.in/eGX3Cgb