Machine learning models are increasingly being used in healthcare settings to predict patient outcomes and identify critical health conditions. However, a recent study from Virginia Tech has revealed that these models fall short at detecting key health deteriorations.
Published in Communications Medicine, the study found that machine learning models for in-hospital mortality prediction failed to recognize 66% of injuries. In practice, this means the models frequently miss patients at elevated risk of dying in the hospital, precisely the information healthcare providers rely on them to deliver.
Lead researcher Danfeng “Daphne” Yao, a professor in the Department of Computer Science at Virginia Tech, emphasized the importance of accurate predictions in healthcare. “Predictions are only valuable if they can accurately recognize critical patient conditions and alert doctors promptly,” Yao said. “Our study found serious deficiencies in the responsiveness of current machine learning models.”
To conduct the research, Yao and her team collaborated with computer science Ph.D. student Tanmoy Sarkar Pias and other researchers. They developed new testing approaches to evaluate the performance of machine learning models in responding to critical or deteriorating health conditions.
The team found that patient data alone is not sufficient to teach models how to determine future health risks. By calibrating these models with “test patients,” they were able to reveal the true abilities and limitations of the models. This included using techniques like gradient ascent methods and neural activation maps to assess how well the models react to worsening patient conditions.
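The "test patient" idea described above can be illustrated with a minimal sketch. This is not the study's actual methodology or code; the model, feature names, and vital-sign values below are all hypothetical, invented only to show the general pattern of probing a trained risk model with a synthetic patient whose condition worsens and checking whether the predicted risk rises in response.

```python
import numpy as np

# Hypothetical stand-in for a trained in-hospital mortality model: it maps
# a vital-sign vector to a risk score in [0, 1]. In a real evaluation, the
# probe would be applied to the actual trained model instead.
def risk_model(vitals: np.ndarray) -> float:
    # Toy logistic scorer over (heart_rate, lactate, systolic_bp).
    weights = np.array([0.02, 0.8, -0.03])
    bias = -1.0
    return float(1.0 / (1.0 + np.exp(-(vitals @ weights + bias))))

def probe_with_test_patient(model, baseline, deteriorated):
    """Score a synthetic 'test patient' in a baseline state and in a
    deteriorated state, and report whether the model's risk responds."""
    r0 = model(np.asarray(baseline, dtype=float))
    r1 = model(np.asarray(deteriorated, dtype=float))
    return {"baseline_risk": r0,
            "deteriorated_risk": r1,
            "responsive": r1 > r0}

# Synthetic test patient: normal vitals versus tachycardia, elevated
# lactate, and hypotension (illustrative values only).
result = probe_with_test_patient(
    risk_model,
    baseline=[75, 1.0, 120],      # heart rate, lactate, systolic BP
    deteriorated=[140, 6.0, 80],
)
print(result)
```

A responsive model should assign the deteriorated state a clearly higher risk score than the baseline; a model that does not is exactly the kind of deficiency the study's probes are designed to expose.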
In addition to in-hospital mortality prediction, the study evaluated the responsiveness of machine learning models for breast and lung cancer prognosis. The results showed that these models, too, failed to generate adequate risk scores for some test cases.
Yao emphasized the need for incorporating medical knowledge into clinical machine learning models to improve their accuracy and effectiveness. She also highlighted the importance of diversifying training data and leveraging synthetic samples to enhance prediction fairness for minority patients.
Moving forward, Yao’s group is actively testing other medical models, including large language models, for their safety and efficacy in time-sensitive clinical tasks like sepsis detection. The goal is to ensure that these AI models are transparent, objective, and capable of protecting people’s lives.
The findings of this study have significant implications for the future of healthcare research using machine learning and artificial intelligence. By addressing the deficiencies in current models, researchers can improve the accuracy and reliability of predictive tools for healthcare providers and ultimately improve patient outcomes.