Towards earlier diagnosis: can we identify new predictors of easily missed diagnoses in GP records?

Talk Code: 
Chris Burton
Lisa Iversen, Sohinee Bhattacharya, Dolapo Ayansina, Lucky Saraswat, Derek Sleeman
Author institutions: 
University of Sheffield (CDB), University of Aberdeen (LI,SB,DA,DS), NHS Grampian (LS)


Several conditions are common in specialist care but relatively uncommon in general practice. Some of these, including endometriosis, Crohn's disease and ankylosing spondylitis, are associated with long delays between first presentation of symptoms and diagnosis. We hypothesised that GP records contain subtle information which could be used to heighten GPs' suspicion of these diagnoses. This abstract describes work to identify new predictors from clinical records, using endometriosis as the example.


We conducted a retrospective case-control study using a database of anonymised electronic records from 60 GP practices in Scotland between 1994 and 2009. The data contained reasons for consultation, prescriptions and diagnostic codes. We used expert knowledge to identify features, including novel combinations or repetitions of items. We examined the association between endometriosis and both conventional and novel features recorded in the three years before diagnosis, using conditional logistic regression. We also examined the appearance of features over time by repeating the regression in different 3-year periods at increasing distances before the diagnosis. We reported odds ratios at diagnosis and at specified time points before diagnosis.
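As an illustration of how a composite feature of this kind might be derived, the following is a minimal sketch, not the study's actual code. It flags whether two categories of coded event occur close together inside a 3-year lookback window that can be shifted further back from the diagnosis date. The function names, the category labels, the 365.25-day year and the example record are all hypothetical.

```python
from datetime import date, timedelta

DAYS_PER_YEAR = 365.25  # approximation chosen for this sketch (assumption)


def lookback_window(index_date, offset_years, length_years=3):
    """Return (start, end) of a window that ends offset_years before index_date."""
    end = index_date - timedelta(days=round(offset_years * DAYS_PER_YEAR))
    start = end - timedelta(days=round(length_years * DAYS_PER_YEAR))
    return start, end


def has_cooccurrence(events, cat_a, cat_b, max_gap_days, start, end):
    """True if a cat_a event and a cat_b event both fall inside [start, end)
    and lie within max_gap_days of each other."""
    in_window = [(d, c) for d, c in events if start <= d < end]
    dates_a = [d for d, c in in_window if c == cat_a]
    dates_b = [d for d, c in in_window if c == cat_b]
    return any(abs((a - b).days) <= max_gap_days for a in dates_a for b in dates_b)


# Hypothetical patient record: (date of consultation, feature category)
events = [
    (date(2001, 6, 1), "lower_gi"),
    (date(2005, 3, 1), "gynae_pain"),
    (date(2005, 4, 15), "lower_gi"),
]
index_date = date(2006, 1, 10)  # hypothetical date of diagnosis (censoring date)

# Window immediately before diagnosis
start, end = lookback_window(index_date, offset_years=0)
flag_recent = has_cooccurrence(events, "gynae_pain", "lower_gi", 90, start, end)

# Same feature, evaluated in the 3-year window ending 3 years before diagnosis
start, end = lookback_window(index_date, offset_years=3)
flag_earlier = has_cooccurrence(events, "gynae_pain", "lower_gi", 90, start, end)

print(flag_recent, flag_earlier)
```

Because the window's end never passes the index date, events after diagnosis are excluded automatically. Evaluating the same flag at several offsets gives one binary covariate per time point, which could then be entered into a regression for each period.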


We included data from 376 cases of endometriosis and two sets of controls matched for age and GP practice: (a) 1489 randomly selected women and (b) 884 women whose records contained codes indicating consultation for gynaecological symptoms. Data from cases and controls were censored at the date of diagnosis. We identified several novel composite features in the data which were predictive of endometriosis, including pain and menstrual symptoms within the same year (OR 4.3, 95% CI 2.2 to 8.4) and lower gastrointestinal symptoms occurring within 90 days of gynaecological pain (OR 3.1, 95% CI 1.6 to 6.0). The analysis of trends in odds ratios before diagnosis showed contrasting patterns: infertility was only “predictive” within 12-18 months of diagnosis, whereas the odds ratios for several pain-related features were significant several years before diagnosis. Adding the new composite predictors to conventional ones increased the predictive value (area under the curve).
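For readers less familiar with how an odds ratio and its confidence interval relate, the sketch below works through the first reported estimate (OR 4.3, 95% CI 2.2 to 8.4). It assumes the interval is a Wald interval, symmetric on the log-odds scale, which is the usual output of logistic regression software but is not stated in the abstract. The helper names are hypothetical.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Standard error of log(OR) implied by a Wald interval (assumed form)."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def wald_ci(odds_ratio, se_log_or, z=1.96):
    """Wald interval: exp(log(OR) +/- z * SE)."""
    log_or = math.log(odds_ratio)
    return math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or)


# Reported values for pain and menstrual symptoms within the same year
se = se_from_ci(2.2, 8.4)
lower, upper = wald_ci(4.3, se)

print(round(se, 2), round(lower, 1), round(upper, 1))
```

Rebuilding the interval from the point estimate and the implied standard error recovers the reported limits to one decimal place, so the published figures are consistent with an interval of this form.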


This method of deriving and testing novel composite predictors “enriches” existing data in GP records. It generates new predictors based on patterns in the data rather than simply the presence or absence of features. Because it builds on expert knowledge, it is likely to remain plausible to clinicians in a way that purely data-driven models may not. The next steps are to test this approach in a larger dataset and to design clinically useful ways of delivering “nudges”, based on data in the record, that change diagnostic behaviour.

Submitted by: 
Chris Burton
Funding acknowledgement: 
This work was funded by the Chief Scientist Office Health Informatics Challenge (reference HICG/1/25).