Identifying and validating pregnancy episodes in primary care electronic health records.
Problem
Large datasets of primary care electronic health records (EHR) such as the Clinical Practice Research Datalink (CPRD) represent a unique opportunity to assess the safety and effectiveness of vaccines and medicines given in pregnancy. However, there are appreciable methodological challenges in identifying the start and end of pregnancies in EHR and in determining the timing of exposures relative to them. A major advance in this area has been achieved by the London School of Hygiene and Tropical Medicine and the CPRD, using the CPRD's large database of anonymised primary care EHRs. The collaboration has produced a Pregnancy Register comprising approximately 5.7 million pregnancies recorded in the CPRD database, including the start and end dates of each pregnancy and its outcome. Validation against other linked data sources has indicated that pregnancies are well captured within the Register. However, there are approximately 1 million pregnancies for which no outcome has been determined. The objective of this study is to investigate why the algorithm used to generate the Register identifies pregnancy episodes with no associated outcome, and to use this information to reduce the occurrence of such episodes.
Approach
We identified potential scenarios that could explain why pregnancies without a determined outcome appear in the Register. These scenarios were derived from the algorithm's logic and the underlying data structure. We used an algorithmic approach to query the data (including other data sources linked to CPRD primary care data) for supporting evidence for each scenario. The evidence will be tabulated to give a clearer understanding of how the algorithm's rules affect the identification of pregnancies and their outcomes.
Findings
We identified thirteen scenarios that could explain the missing outcomes. These fall into four categories: (i) real pregnancies whose outcome was not recorded in the primary care record, (ii) pregnancies still ongoing at the end of available follow-up, (iii) episodes in which the patient may not have been pregnant, and (iv) episodes comprising records that in fact correspond to another pregnancy. Initial results provide evidence supporting each category. We will present the frequency of pregnancies meeting the criteria for each scenario, the number of pregnancies with evidence of an outcome in the linked data, and the profile of pregnancies without outcomes compared with the other pregnancies recorded.
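To illustrate how evidence for the four categories might be tabulated, the following is a minimal sketch, assuming each outcome-less episode has already been summarised into a few evidence flags. All field names, the 300-day "possibly ongoing" threshold and the order in which categories are checked are hypothetical illustrations; they are not the Register algorithm's actual rules or CPRD variable names.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical threshold: an episode counts as possibly ongoing if follow-up
# ends before start + 300 days (about 43 weeks). Not the Register's real rule.
MAX_EXPECTED_DURATION = timedelta(days=300)


@dataclass
class Episode:
    # Hypothetical per-episode evidence summary (not CPRD/Register variables).
    episode_id: int
    start_date: date
    end_of_follow_up: date
    outcome_in_linked_data: bool  # e.g. a delivery found in linked hospital data
    overlaps_other_episode: bool  # records shared with another pregnancy episode
    has_supporting_records: bool  # e.g. further antenatal records beyond one code


def classify(ep: Episode) -> str:
    """Assign an outcome-less episode to the first category whose check matches."""
    if ep.end_of_follow_up < ep.start_date + MAX_EXPECTED_DURATION:
        return "(ii) ongoing at end of follow-up"
    if ep.outcome_in_linked_data:
        return "(i) outcome not recorded in primary care"
    if ep.overlaps_other_episode:
        return "(iv) records belong to another pregnancy"
    if not ep.has_supporting_records:
        return "(iii) possibly not pregnant"
    return "unexplained"


def tabulate(episodes: list[Episode]) -> Counter:
    """Count episodes per category."""
    return Counter(classify(ep) for ep in episodes)


# Illustrative, made-up episodes (not real data).
episodes = [
    Episode(1, date(2019, 1, 1), date(2019, 6, 1), False, False, True),
    Episode(2, date(2015, 3, 1), date(2020, 1, 1), True, False, True),
    Episode(3, date(2015, 3, 1), date(2020, 1, 1), False, True, True),
    Episode(4, date(2015, 3, 1), date(2020, 1, 1), False, False, False),
]

for category, n in tabulate(episodes).items():
    print(f"{category}: {n}")
```

In practice an episode may show evidence for more than one category. Checking them in a fixed priority order assigns each episode to exactly one category, so the counts sum to the number of outcome-less episodes. An alternative is to count each flag independently and report the overlap.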
Consequences
We have identified evidence-based scenarios explaining why pregnancies without outcomes may appear in the Pregnancy Register. Comparing the frequency and underlying causes of these scenarios will enable the Register to be improved. This important new data resource should enhance continued monitoring of the benefits and risks of vaccines and drugs given in pregnancy, supporting clinical recommendations.