Can dementia risk be predicted using routine electronic health records?

Talk Code: 
Catharine Morgan
Darren M Ashcroft, Evan Kontopantelis, Daniel Stamate, David Reeves
Author institutions: 
The University of Manchester, University of London


Primary care is the main route through which individuals with dementia are identified and subsequently diagnosed, either by a GP or by specialist referral services. Evidence suggests that numerous risk factors are associated with the development of dementia, and many multi-factorial prognostic risk models have been proposed. However, few are based on risk factors routinely captured in electronic health records (EHR), none incorporate longitudinal trends in health, and none have utilised the potential of machine learning (ML) approaches. We aim to develop an improved healthcare record-based tool for estimating a patient's risk of developing dementia, offering the opportunity for earlier identification of those at risk.


The Clinical Practice Research Datalink (CPRD) is an anonymised primary care electronic health record database capturing events from healthcare interactions. We will identify patients aged 60-95 years contributing to CPRD between 01/01/2005 and 31/12/2017, along with the subset of these who received a diagnosis of dementia over the period. Potential predictors will be identified from published systematic reviews, relevant individual research studies, and newly emerging items proposed by dementia experts. A clinical Read code list will be developed for each candidate risk factor. Models will be built on a randomly selected subset of the cohort using both traditional logistic regression analysis and ML techniques. The remainder of the cohort will be used for model validation.
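The random division of the cohort into a model-building subset and a validation subset can be illustrated with a minimal sketch. This is an illustration under stated assumptions, not the study's actual procedure: the function name `split_cohort`, the 70/30 split fraction, the fixed seed, and the integer patient identifiers are all hypothetical, since the abstract does not specify them.

```python
import random


def split_cohort(patient_ids, derivation_fraction=0.7, seed=42):
    """Randomly partition patient IDs into derivation and validation sets.

    The 0.7 fraction and the seed are illustrative assumptions; the
    abstract does not state the proportions used.
    """
    if not 0.0 < derivation_fraction < 1.0:
        raise ValueError("derivation_fraction must be strictly between 0 and 1")

    # De-duplicate and sort so the split is reproducible for a given seed,
    # regardless of the order in which IDs were supplied.
    ids = sorted(set(patient_ids))

    rng = random.Random(seed)  # local generator; leaves global state untouched
    rng.shuffle(ids)

    cut = int(round(len(ids) * derivation_fraction))
    return ids[:cut], ids[cut:]


# Hypothetical cohort of 1,000 patient identifiers.
derivation, validation = split_cohort(range(1, 1001))
print(len(derivation), len(validation))
```

Splitting at the patient level, rather than at the level of individual records, keeps all of one person's data in a single subset, so that validation performance is not inflated by the same patient appearing on both sides.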


Between 01/01/2005 and 31/12/2017, 2,005,756 adults aged ≥60 years contributed to CPRD and fulfilled the inclusion criteria. Of this cohort, 70,621 (3.4%) were identified as having a dementia diagnosis. From the research literature we have identified more than 100 individual reported risk factors for dementia, broadly classified into demographic and social factors, physical and mental health status, consulting patterns, and treatments received. Specification of each risk factor as a Read code list is ongoing, with the modelling exercise to begin in the coming months. ML will be carried out by co-investigators at the University of London, in parallel with the traditional modelling approach based at the University of Manchester.


A tool for calculating an individual’s 3-, 5- and 10-year risk of developing dementia from electronic health records could be used in primary care to identify high-risk patients for early intervention or more detailed assessment. Such a tool is also greatly needed to identify high-risk individuals for invitation into clinical trials of promising treatments. Success in developing a markedly improved tool for the prediction of dementia may also allow the same techniques to be used to develop improved risk tools for many other health conditions.

Submitted by: 
Catharine Morgan
Funding acknowledgement: 
Alzheimer's Research UK