MED GPT – Use of AI to Transcribe and Summarise GP Consultations
Problem
This study aims to develop a program based on large language model (LLM) and natural language processing technologies that transcribes general practice consultations and generates clinically accurate and complete summaries of them in real time. Recent advances in artificial intelligence have made it more accessible and easier to implement within the healthcare profession. New LLMs such as ChatGPT and BioBART have presented the possibility of developing tools to handle common administrative and documentation tasks in general practice, and some products are already appearing on the market without any peer-reviewed evaluation. A general practitioner in the UK spends, on average, 12.8% of their day on clinical administrative tasks, and a greater administrative burden is known to be associated with increased physician burnout.
Approach
This study quantitatively compares four large language models (LLMs) trained on a bank of 1,700 patient-doctor dialogues, with the aim of developing a model capable of listening to patient consultations in real time and summarising them automatically. The single best-performing model of the four was selected and assessed on its output for 50 pre-recorded mock consultations. Each summary was assessed by two reviewers using a questionnaire based on a modified form of the QNote score (a measure of the quality of documentation in electronic health records), plus a question on whether the LLM had captured the broader social context where this was discussed during the consultation.
Findings
At the time of writing, the results are not yet complete. In preliminary findings, evaluation of the three locally run models was stopped early because each produced truncated, incomplete summaries. The final model, dubbed 'Med GPT' and built on GPT-4's API, will be assessed on whether it can consistently generate accurate and complete summaries of patient consultations without human input or correction. The final evaluation considers whether the notes were 'unacceptable' due to false information, or 'incomplete', 'partially complete', or 'fully complete', and examines the reviewers' notes and the topic of each consultation for trends or common issues.
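The abstract does not state how the two reviewers' ratings will be combined or compared. As an illustration only, the sketch below tallies ratings across the four categories named above and computes unweighted Cohen's kappa as one possible measure of agreement between reviewers. The choice of Cohen's kappa, the function names, and the example ratings are all assumptions made for this sketch and are not taken from the study.

```python
from collections import Counter

# Rating categories as named in the evaluation described above.
CATEGORIES = ["unacceptable", "incomplete", "partially complete", "fully complete"]


def tally(ratings):
    """Count how many summaries fall into each category."""
    counts = Counter(ratings)
    return {c: counts.get(c, 0) for c in CATEGORIES}


def cohen_kappa(r1, r2):
    """Unweighted Cohen's kappa for two reviewers rating the same summaries."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("both reviewers must rate the same non-empty set of summaries")
    n = len(r1)
    # Proportion of summaries on which the reviewers gave the same rating.
    observed = sum(a == b for a, b in zip(r1, r2)) / n
    # Agreement expected by chance, from each reviewer's category frequencies.
    c1, c2 = Counter(r1), Counter(r2)
    expected = sum(c1[c] * c2[c] for c in set(c1) | set(c2)) / (n * n)
    if expected == 1.0:
        return 1.0  # both reviewers used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings for five summaries (illustrative only, not study data).
reviewer_a = ["fully complete", "partially complete", "fully complete",
              "incomplete", "unacceptable"]
reviewer_b = ["fully complete", "fully complete", "fully complete",
              "incomplete", "unacceptable"]

print(tally(reviewer_a))
print(round(cohen_kappa(reviewer_a, reviewer_b), 3))  # 0.706
```

Because the categories 'incomplete', 'partially complete', and 'fully complete' are ordered, a weighted kappa might suit them better; the unweighted form is used here only to keep the sketch short.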
Consequences
This study will discuss whether the findings demonstrate that an LLM-driven model can reliably generate complete and accurate summaries of general practice consultations in real time. Barriers to implementation are also discussed, such as the high energy and infrastructure requirements associated with LLMs and concerns around data security for such applications. The possible future of the technology is considered, including trials with real patient groups. A limitation of the study is that its scenarios do not cover the full range of general practice, notably patients at risk of health inequalities due to their social, economic, and wider contextual circumstances; such gaps could be trained into future AI tools and reinforce bias.