Area-Level Linkages to Enrich Primary Care Electronic Health Data for Research

Talk Code: 
Tarita Murray-Thomas
Susan Hodgson, Elizabeth Crellin, Kirsty Syder, Sonam Sadarangani, Shivani Padmanabhan
Author institutions: 
Clinical Practice Research Datalink (CPRD), Medicines and Healthcare products Regulatory Agency, London UK


The Clinical Practice Research Datalink (CPRD) collects de-identified patient data from a network of general practitioner (GP) practices across the UK. These longitudinal data, encompassing >35 million patient lives, are available for research into drug safety, use of medicines, health policy, health care delivery and disease risk factors. Linkage of these primary care data to a range of health and health-relevant contextual data further enriches their research value. Here we describe area-level linkages recently made available in CPRD and review their application in research studies of drug safety, care utilisation and public health.


Patient and GP practice postcodes were mapped to lower layer super output area (LSOA, England/Wales, average 1,600 population), super output area (SOA, Northern Ireland, average 2,100 population) or datazone (DZ, Scotland, population 500-1,000). Linkage to practice-level Rural-Urban classification was made to support research in which access to services, employment and educational opportunities might be important confounders. The individual domains of the Index of Multiple Deprivation (IMD), namely housing, employment, income, access to services, education, crime, and living environment, were included to facilitate research requiring a more nuanced adjustment for aspects of material deprivation; within England, correlations between practice-level quintiles of IMD and the IMD domains range from 0.36 to 0.89. Linkage to the Carstairs 2011 Index provided an alternative index of material deprivation which, unlike IMD, is comparable between England, Wales and Scotland, and so is applicable to studies drawing on populations from across these countries.
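The two-step mapping described above (postcode to small area, then small area to area-level measures) can be illustrated with a minimal Python sketch. This is not CPRD's linkage procedure: the lookup tables, postcodes, area codes, field names and the `link_practice` helper are all hypothetical, chosen only to show the shape of the join. Returning only the derived measures, and not the postcode or area code, is a design choice of this sketch.

```python
# Illustrative sketch only: every value and name below is hypothetical.

def normalise_postcode(postcode: str) -> str:
    """Upper-case and remove whitespace so 'ab1 2cd' matches 'AB12CD'."""
    return "".join(postcode.split()).upper()

# Hypothetical postcode -> small-area lookup (LSOA, SOA or datazone code).
POSTCODE_TO_AREA = {
    "AB12CD": "AREA0001",
    "EF34GH": "AREA0002",
}

# Hypothetical small-area -> area-level measures lookup.
AREA_MEASURES = {
    "AREA0001": {"imd_quintile": 1, "rural_urban": "urban"},
    "AREA0002": {"imd_quintile": 4, "rural_urban": "rural"},
}

def link_practice(practice_id: str, postcode: str) -> dict:
    """Attach area-level measures to a practice via its postcode.

    The postcode and area code are used only for the join and are not
    returned; unmatched postcodes yield a record flagged as not linked.
    """
    area = POSTCODE_TO_AREA.get(normalise_postcode(postcode))
    measures = AREA_MEASURES.get(area, {}) if area is not None else {}
    return {"practice_id": practice_id, "linked": bool(measures), **measures}

linked = link_practice("P1", "ab1 2cd")
unlinked = link_practice("P3", "ZZ99 9ZZ")
```

Flagging unmatched records explicitly, instead of dropping them, keeps the denominator visible when reporting what proportion of practices could be linked.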


A total of 1,615 GP practices contributing to CPRD’s primary care database (January 2019) were linked to these area-level measures, representing 19.5, 18.2, 10.8, and 7.9% of GP practices in England, Wales, Scotland and Northern Ireland, respectively. Since these linkages were made available in June 2018, protocols requesting them have included a study of patient- and practice-level factors associated with vaccine uptake, and a study assessing the social and demographic characteristics of high-cost patients in primary and secondary care. The next release of linkage data will include linkage of these area-level measures to patient postcode for eligible patients, permitting further exploration, at a more granular level, of the influence of these important contextual variables on population health, care delivery and policy outcomes.


The research value of electronic health datasets, like those held by CPRD, can be enhanced via linkage to other health and health-relevant datasets. Area-level data can provide a context within which health care is delivered, act as a proxy for socioeconomic status, and support the planning and targeting of health and social care services.

Submitted by: 
Susan Hodgson
Funding acknowledgement: 
All authors are employed full time by CPRD, which receives cost-recovery funding from external organisations for access to research data and services outside the remit of the submitted work.