A novel, standardised approach to balancing effectiveness, efficiency and utility of surveillance AI prediction models for hospitalised patients using sepsis prediction as an exemplar
Objective: To introduce a novel, standardised approach to evaluating how well AI prediction models balance effectiveness, efficiency and utility, using a sepsis prediction model as a case study.
Materials and Methods: Retrospective patient data from the electronic medical records of 7 public hospitals were used to retrain and evaluate a machine learning sepsis prediction model. Four conventional metrics (area under the receiver operating characteristic curve [AUROC], sensitivity, positive predictive value, and specificity) were compared with a novel graphical display that integrates, across different alert thresholds, metrics of predictive accuracy (effectiveness), alert burden (efficiency) and lead time of alerts relative to clinical events (utility).
Results: The dataset comprised 977,506 inpatient admissions. The novel methodology produced a plot of four vertically aligned graphs that enables decision-makers to identify, at the level of an entire admission, the alert threshold that optimally balances effectiveness, efficiency and utility (EEU). This threshold differs from that derived using conventional metrics.
Discussion: Conventional evaluation metrics do not consider alert timing relative to clinical events and are often applied to different evaluation datasets (sample and admission level), introducing bias and confusion. In contrast, the EEU methodology (i) generates admission level evaluations at different alert thresholds; (ii) measures alert timing relative to clinical events; and (iii) provides a visual display that enables identification of the alert threshold that optimally balances EEU factors.
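As an illustration only, the three components listed above can be sketched in a few lines of Python. This is not the study's implementation: the `Admission` structure, the `evaluate` function, the use of the first pre-event alert to define lead time, and the use of the fraction of admissions alerted as a stand-in for alert burden are all assumptions made for this sketch, and the example data are invented.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class Admission:
    # Hypothetical structure: (hours since admission, model risk score) samples.
    scores: list
    # Hours from admission to the clinical event, or None if no event occurred.
    event_time: Optional[float] = None


def first_alert_time(adm, threshold):
    """Return the time of the first score at or above threshold.

    Assumption: only samples strictly before the event can raise an alert.
    """
    for t, s in sorted(adm.scores):
        if adm.event_time is not None and t >= adm.event_time:
            break
        if s >= threshold:
            return t
    return None


def evaluate(admissions, threshold):
    """Admission-level effectiveness, efficiency and utility at one threshold."""
    tp = fp = fn = 0
    lead_times = []
    for adm in admissions:
        alert = first_alert_time(adm, threshold)
        has_event = adm.event_time is not None
        if has_event and alert is not None:
            tp += 1
            lead_times.append(adm.event_time - alert)
        elif has_event:
            fn += 1
        elif alert is not None:
            fp += 1
    alerts = tp + fp
    return {
        "threshold": threshold,
        # Effectiveness: admission-level sensitivity and PPV.
        "sensitivity": tp / (tp + fn) if tp + fn else None,
        "ppv": tp / alerts if alerts else None,
        # Efficiency (assumed proxy): share of admissions that trigger an alert.
        "alerted_fraction": alerts / len(admissions) if admissions else None,
        # Utility: median hours between first alert and the event.
        "median_lead_time_h": median(lead_times) if lead_times else None,
    }


# Invented toy data: two admissions with an event, two without.
admissions = [
    Admission([(1, 0.2), (5, 0.6), (9, 0.9)], event_time=12.0),
    Admission([(1, 0.1), (6, 0.4)], event_time=8.0),
    Admission([(2, 0.7), (4, 0.3)]),
    Admission([(3, 0.1), (7, 0.2)]),
]

# Sweep thresholds to obtain the values that would be plotted against threshold.
results = [evaluate(admissions, th) for th in (0.3, 0.5, 0.8)]
```

Plotting each metric in `results` against the threshold on vertically aligned axes would give a display similar in spirit to the one described, with the trade-off between sensitivity, alert burden and lead time visible at each candidate threshold.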
Conclusion: Evaluations of prediction models for adverse events in hospitalised patients should incorporate the EEU approach in assessing model suitability and selecting alert thresholds.