Learning to Care: using machine learning to improve prediction of COPD admissions

Talk Code: 
Brian McKinstry
Hilary Pinnock, Felix Agakov, Peter Orchard, Anna Agakova, Mary Paterson, Lucy McCloughan, Chris Burton, Stuart Anderson
Author institutions: 
University of Edinburgh, Pharmatics LTD


Telehealth aims to predict exacerbations in order to facilitate prompt action to prevent admissions. However, recent randomised trials have failed to demonstrate reductions in admissions when telehealth is applied. The systems under trial also generated large numbers of alerts that did not require clinical intervention (false positives). Pilot work suggests that current algorithms, based on simple additive methods, are poor predictors of exacerbations. More advanced algorithms have the potential to estimate the risk of COPD-related hospital admissions with better sensitivity and specificity.


Our starting point is a telemonitoring dataset of 133 COPD patients monitored for an average of 430 days. We linked this to the patients' baseline data from the randomised controlled trial (including demographics and assessments of illness severity) and to data extracted from their electronic health records. Using this enhanced dataset, we developed a probabilistic machine-learning algorithm to predict next-day admissions due to COPD. The algorithm uses 44 variables extracted from time-series telemonitoring measurements. The majority of patients in our dataset have 0-2 admissions due to COPD per annum, so the algorithm is designed to address classification of imbalanced data. We considered both the complete-case and the imputed scenarios. We evaluated the quality of predictions using 10-fold nested cross-validation. Test folds included only previously unseen patients, and were used neither for feature selection nor for tuning the parameters of the predictive algorithm.
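The patient-level splitting described above can be illustrated with a minimal sketch. It is not the study's code: the function names (`patient_folds`, `nested_patient_folds`), the inner fold count of 5, and the toy data are all assumptions made for illustration. The point it shows is that folds are assigned per patient rather than per day, so no patient contributes rows to both the training and the test side, and that feature selection and tuning would use only the inner folds.

```python
import random
from collections import defaultdict


def patient_folds(patient_ids, n_folds=10, seed=0):
    """Split row indices into folds so each patient is in exactly one test fold.

    patient_ids[i] is the patient who produced row i (one row per monitored day).
    Returns a list of (train_rows, test_rows) pairs.
    """
    rows_by_patient = defaultdict(list)
    for row, pid in enumerate(patient_ids):
        rows_by_patient[pid].append(row)

    patients = sorted(rows_by_patient)
    random.Random(seed).shuffle(patients)

    folds = []
    for k in range(n_folds):
        test_patients = set(patients[k::n_folds])  # every n_folds-th patient
        train, test = [], []
        for pid in patients:
            target = test if pid in test_patients else train
            target.extend(rows_by_patient[pid])
        folds.append((sorted(train), sorted(test)))
    return folds


def nested_patient_folds(patient_ids, n_outer=10, n_inner=5, seed=0):
    """Yield (outer_train, outer_test, inner_folds).

    inner_folds are built from outer_train rows only, so any feature selection
    or parameter tuning done on them never sees the outer test patients.
    """
    for outer_train, outer_test in patient_folds(patient_ids, n_outer, seed):
        train_ids = [patient_ids[r] for r in outer_train]
        inner_folds = [
            ([outer_train[i] for i in fit], [outer_train[i] for i in val])
            for fit, val in patient_folds(train_ids, n_inner, seed + 1)
        ]
        yield outer_train, outer_test, inner_folds


# Toy data (hypothetical): 20 patients with 10 monitored days each.
patient_ids = [f"p{i % 20}" for i in range(200)]
splits = list(nested_patient_folds(patient_ids))
```

Each element of `splits` holds one outer fold: a model would be tuned on its inner folds, refitted on `outer_train`, and scored once on `outer_test`.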


We compared our machine-learning algorithm with two standard symptom-counting algorithms. One predicts next-day admission when the number of symptoms observed exceeds a pre-defined threshold. The other predicts admission when two days of elevated symptom-based scores are preceded by two days of normal scores. For both the complete-case and the imputed scenarios, our algorithm demonstrates significant improvements in the prediction of future admissions over these symptom-counting methods. In the imputed scenario, the two standard symptom-counting algorithms each yield a test AUC of approximately 0.50, with slightly different 95% confidence intervals of [0.49, 0.50] and [0.46, 0.53]. Our algorithm achieved a test AUC of 0.73 (95% CI [0.67, 0.79]).
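As a rough illustration of the two symptom-counting baselines and of the AUC metric used to compare them, the sketch below gives one possible reading of each rule. The function names, the cut-off value of 3, the exact day alignment in `onset_rule`, and the toy series are all assumptions; the rules actually used in the study may be defined differently.

```python
def threshold_rule(symptom_counts, threshold=3):
    """Flag a day when the number of symptoms exceeds a fixed threshold."""
    return [int(count > threshold) for count in symptom_counts]


def onset_rule(scores, elevated_above=3):
    """Flag day t when days t-1 and t are elevated and days t-3 and t-2 are normal.

    This is one interpretation of "two days of elevated scores preceded by
    two days of normal scores"; the first three days can never be flagged.
    """
    flags = [0] * len(scores)
    for t in range(3, len(scores)):
        normal_before = all(s <= elevated_above for s in scores[t - 3:t - 1])
        elevated_now = all(s > elevated_above for s in scores[t - 1:t + 1])
        flags[t] = int(normal_before and elevated_now)
    return flags


def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. Quadratic in the number of rows, which is fine
    for a small illustration.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Toy series (hypothetical): daily symptom scores for one patient, and whether
# the patient was admitted on the following day.
daily_scores = [1, 2, 1, 2, 5, 6, 2, 1, 1, 4]
admitted_next_day = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]

threshold_flags = threshold_rule(daily_scores)
onset_flags = onset_rule(daily_scores)
threshold_auc = auc(admitted_next_day, threshold_flags)
onset_auc = auc(admitted_next_day, onset_flags)
```

Because both rules output only 0 or 1, their AUC depends on a single operating point. A probabilistic model produces graded risk scores, which the same `auc` function can rank directly.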


Our machine-learning algorithm significantly improves the ability of telemonitoring to predict COPD admissions. This offers the potential to improve the effectiveness of both telehealth and COPD self-management. Our new approach could also incorporate external factors that may trigger exacerbations, such as pollution, local weather and circulating viral load data. Ongoing research includes testing and refining this algorithm on other similar datasets and exploring how adding contemporaneous meteorological data affects the accuracy of prediction. Machine-learning techniques are likely to be helpful in developing management algorithms for many other conditions where multiple data points are available.

Submitted by: 
Brian McKinstry
Funding acknowledgement: 
MRC CIC fund