From Language Models to Medical Diagnoses: Assessing the Potential of GPT-4 and GPT-3.5-Turbo in Digital Health

dc.contributor.author | Roos, Jonas | |
dc.contributor.author | Wilhelm, Theresa Isabelle | |
dc.contributor.author | Martin, Ron | |
dc.contributor.author | Kaczmarczyk, Robert | |
dc.date.accessioned | 2025-08-08T10:54:18Z | |
dc.date.available | 2025-08-08T10:54:18Z | |
dc.date.issued | 2024-12-02 | |
dc.identifier.uri | https://hdl.handle.net/20.500.11811/13331 | |
dc.description.abstract | Background: Large language models (LLMs) like GPT-3.5-Turbo and GPT-4 show potential to transform medical diagnostics through their linguistic and analytical capabilities. This study evaluates their diagnostic proficiency using English and German medical examination datasets. Methods: We analyzed 452 English and 637 German medical examination questions using GPT models. Performance metrics included broad and exact accuracy rates for the primary guess and for three model-generated guesses, with an analysis of performance against varying question difficulty based on student accuracy rates. Results: GPT-4 demonstrated superior performance, achieving up to 95.4% accuracy on the English dataset when approximate matches were counted as correct. While GPT-3.5-Turbo showed better results in English than in German, GPT-4 maintained consistent performance across both languages. Question difficulty was correlated with diagnostic accuracy, particularly in the German dataset. Conclusions: The study demonstrates GPT-4's significant diagnostic capabilities and cross-linguistic flexibility, suggesting potential for clinical applications. However, further validation and ethical consideration are necessary before widespread implementation. | en |
dc.format.extent | 13 | |
dc.language.iso | eng | |
dc.rights | Attribution 4.0 International | |
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | |
dc.subject | AI | |
dc.subject | LLM | |
dc.subject | medical examination | |
dc.subject | ChatGPT | |
dc.subject.ddc | 004 Computer science | |
dc.subject.ddc | 610 Medicine, health | |
dc.title | From Language Models to Medical Diagnoses: Assessing the Potential of GPT-4 and GPT-3.5-Turbo in Digital Health | |
dc.type | Scientific article | |
dc.publisher.name | MDPI | |
dc.publisher.location | Basel | |
dc.rights.accessRights | openAccess | |
dcterms.bibliographicCitation.volume | 2024, vol. 5 | |
dcterms.bibliographicCitation.issue | iss. 4 | |
dcterms.bibliographicCitation.pagestart | 2680 | |
dcterms.bibliographicCitation.pageend | 2692 | |
dc.relation.doi | https://doi.org/10.3390/ai5040128 | |
dcterms.bibliographicCitation.journaltitle | AI | |
ulbbn.pubtype | Secondary publication | |
dc.version | publishedVersion | |
ulbbn.sponsorship.oaUnifund | Open access funding, Universität Bonn |