How can the Clinical Practice Research Datalink (CPRD) link primary care data from multiple GP software systems without duplication?

Talk Code: 
P2.65
Presenter: 
Rebecca Ghosh
Co-authors: 
Shivani Padmanabhan, Rachael Williams, Puja Myles
Author institutions: 
Clinical Practice Research Datalink (CPRD)

Problem

CPRD provides anonymised UK primary care electronic healthcare records linked to a range of datasets including hospitalisations, death certificates, disease registries and deprivation indices. In December 2017 a new primary care database, CPRD Aurum, based on EMIS software, was launched in addition to the long-running CPRD GOLD database, which is based on Vision software. Over time, general practices undergo structural changes such as closure, merging or splitting, and may change software vendor. Some practices will therefore have contributed data to both databases at different times. The challenge for CPRD is not only to provide linked datasets to researchers for each database separately, but also to provide de-duplicated linked data to researchers wishing to combine the CPRD GOLD and CPRD Aurum databases.

Approach

Linkage is performed under appropriate governance conditions on patients from consenting practices via a trusted third party (TTP) organisation (NHS Digital) using NHS number, postcode, full date of birth and gender. NHS Digital uses these identifiers to link the datasets with a sequential eight-stage deterministic algorithm and provides CPRD with anonymised linked cohort files. The practice identifiers are provided so that relevant data from both primary care databases and the corresponding linked datasets can be extracted. CPRD creates two separate linkage datasets, one for CPRD GOLD and one for CPRD Aurum, with metadata detailing linkage validity and match quality. Up-to-date primary care and linked data are provided for currently contributing Vision and EMIS practices. For practices that participated in linkage but have stopped contributing data to CPRD, linked data are available only for patients in the practice at the time of the last data collection.
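The idea of a sequential deterministic algorithm can be illustrated with a minimal sketch. This is not NHS Digital's algorithm: the abstract does not list the eight stages, so the four stages below, the Person record and the match_stage function are all hypothetical and chosen only for illustration. Each stage requires exact agreement on a fixed set of identifiers, stages are tried from strictest to loosest, and the number of the first stage that succeeds can serve as a simple match-quality indicator.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    """Hypothetical record holding the four identifiers named in the abstract."""
    nhs_number: Optional[str]
    postcode: Optional[str]
    dob: Optional[str]      # full date of birth, e.g. "1980-01-31"
    gender: Optional[str]


# ASSUMPTION: illustrative stages only, ordered strictest first.
# The real algorithm has eight stages that are not described in the abstract.
STAGES = [
    ("nhs_number", "dob", "gender", "postcode"),
    ("nhs_number", "dob", "gender"),
    ("nhs_number", "postcode"),
    ("dob", "gender", "postcode"),
]


def match_stage(a: Person, b: Person) -> Optional[int]:
    """Return the 1-based number of the first stage at which a and b agree
    exactly on every required identifier, or None if no stage matches."""
    for stage_no, fields in enumerate(STAGES, start=1):
        if all(
            getattr(a, f) is not None and getattr(a, f) == getattr(b, f)
            for f in fields
        ):
            return stage_no
    return None
```

Because every stage is an exact comparison, the outcome for a given pair of records is fully reproducible; a lower stage number indicates agreement on more identifiers.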

Findings

There were 232 English practices with 6.6 million patients in CPRD Aurum and 411 English practices with 10.6 million patients in CPRD GOLD participating in the linkage scheme. CPRD produced a common linkage practice file containing the practices that had changed system provider and were present in both CPRD primary care databases. There were 47 practices duplicated across the two databases; removing the duplicates left 596 unique practices and 15.9 million unique patients.
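The practice counts above follow from 232 + 411 - 47 = 596. A minimal sketch of how a common linkage practice file might be used to de-duplicate a combined practice list is given below. The function name, the identifier format and the choice of which copy of a duplicated practice to keep are assumptions made for illustration; the abstract does not state which copy CPRD retains.

```python
def combine_practices(gold_ids, aurum_ids, common_pairs, keep="aurum"):
    """Return (kept_gold, kept_aurum) sets with duplicated practices removed.

    common_pairs is an iterable of (gold_id, aurum_id) tuples, one per
    practice present in both databases (a hypothetical layout for the
    common linkage practice file).
    ASSUMPTION: `keep` selects which database's copy is retained.
    """
    pairs = list(common_pairs)
    gold_dupes = {g for g, _ in pairs}
    aurum_dupes = {a for _, a in pairs}
    if keep == "aurum":
        return set(gold_ids) - gold_dupes, set(aurum_ids)
    if keep == "gold":
        return set(gold_ids), set(aurum_ids) - aurum_dupes
    raise ValueError("keep must be 'aurum' or 'gold'")


# Synthetic identifiers sized to match the counts reported in the abstract.
gold = [f"G{i}" for i in range(411)]
aurum = [f"A{i}" for i in range(232)]
common = [(f"G{i}", f"A{i}") for i in range(47)]

kept_gold, kept_aurum = combine_practices(gold, aurum, common)
total_unique = len(kept_gold) + len(kept_aurum)  # 364 + 232 = 596
```

Whichever copy is kept, the number of unique practices is the same; the choice only affects which database supplies the records for the 47 duplicated practices.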

Consequences

To ensure the new CPRD Aurum database is as useful to researchers as the CPRD GOLD database, linked data must be provided for both. Using duplicated patient information may bias results by artificially inflating the contribution of duplicated practices. This can be avoided by identifying and removing duplicated practices, allowing researchers to use the maximum available primary care and linked data regardless of the primary care software system used.

Submitted by: 
Rebecca Ghosh
Funding acknowledgement: 
The Clinical Practice Research Datalink (CPRD) is a governmental, not-for-profit research service, jointly funded by the NHS National Institute for Health Research (NIHR) and the Medicines and Healthcare products Regulatory Agency (MHRA), a part of the Department of Health.