Yanjun Gao, PhD

Assistant Professor, Biomedical Informatics

Graduate School
  • PhD, Pennsylvania State University - University Park Campus (2021)
Languages
English
Department
Biomedical Informatics

Professional Titles

  • Assistant Professor

Research Interests

My research centers on developing and evaluating foundational natural language processing (NLP) methods, particularly large language models (LLMs), to convert complex data, such as electronic health records (EHRs), into actionable insights for improving decision-making in healthcare and beyond. I explore broader questions of how both humans and machines understand and utilize language, aiming to develop systems that not only enhance decision-making across various domains but also contribute to a future where artificial intelligence (AI) is safe, trustworthy, and reliably aligned with human needs.

Publications

  • Zhao X, Blotske K, Cargile M, Tilley A, Murray B, Gao Y, Henry K, Smith SE, Barreto EF, Bauer S, Sohn S, Liu T, Bennett T, Cohen M, Sikora A. Rx-LLM: a benchmarking suite to evaluate safe large language model performance for medication-related tasks. medRxiv. 2025 Dec 30. PubMed PMID: 41404284
  • Croxford E, Gao Y, First E, Pellegrino N, Schnier M, Caskey J, Oguss M, Wills G, Chen G, Dligach D, Churpek MM, Mayampurath A, Liao F, Goswami C, Wong KK, Patterson BW, Afshar M. Evaluating clinical AI summaries with large language models as judges. NPJ Digit Med. 2025 Nov 5;8(1):640. PubMed PMID: 41193667
  • Kruse M, Afshar M, Khatwani S, Mayampurath A, Chen G, Gao Y. Simple Yet Effective: An Information-Theoretic Approach to Multi-LLM Uncertainty Quantification. Proc Conf Empir Methods Nat Lang Process. 2025 Nov;2025:30481-30492. PubMed PMID: 41399801
  • Kruse M, Hu S, Derby N, Wu Y, Stonbraker S, Yao B, Wang D, Goldberg E, Gao Y. Large Language Models with Temporal Reasoning for Longitudinal Clinical Summarization and Prediction. Find ACL EMNLP. 2025 Nov;2025:20715-20735. PubMed PMID: 41399802
  • Nycklemoe S, Devarapu S, Gao Y, Carey K, Kuehnel N, Munjal N, Jani P, Churpek M, Dligach D, Afshar M, Mayampurath A. Explaining alerts from a pediatric risk prediction model using clinical text. J Am Med Inform Assoc. 2025 Sep 1;32(9):1445-1453. PubMed PMID: 40700686
  • Sethi R, Caskey J, Gao Y, Churpek MM, Miller TA, Mayampurath A, Afshar ES, Afshar M, Dligach D. Detecting Stigmatizing Language in Clinical Notes with Large Language Models for Addiction Care. medRxiv. 2025 Aug 12. PubMed PMID: 40832420
  • Eslami B, Afshar M, Tootooni S, Miller TA, Churpek MM, Gao Y, Dligach D. Toward Digital Twins in the Intensive Care Unit: A Medication Management Case Study. medRxiv. 2025 Aug 1. PubMed PMID: 40766145
  • Croxford E, Gao Y, Pellegrino N, Wong K, Wills G, First E, Schnier M, Burton K, Ebby C, Gorski J, Kalscheur M, Khalil S, Pisani M, Rubeor T, Stetson P, Liao F, Goswami C, Patterson B, Afshar M. Development and validation of the provider documentation summarization quality instrument for large language models. J Am Med Inform Assoc. 2025 Jun 1;32(6):1050-1060. PubMed PMID: 40323321
  • Croxford E, Gao Y, First E, Pellegrino N, Schnier M, Caskey J, Oguss M, Wills G, Chen G, Dligach D, Churpek MM, Mayampurath A, Liao F, Goswami C, Wong KK, Patterson BW, Afshar M. Automating Evaluation of AI Text Generation in Healthcare with a Large Language Model (LLM)-as-a-Judge. medRxiv. 2025 May 6. PubMed PMID: 40313300
  • Gao Y, Li R, Croxford E, Caskey J, Patterson BW, Churpek M, Miller T, Dligach D, Afshar M. Leveraging Medical Knowledge Graphs Into Large Language Models for Diagnosis Prediction: Design and Application Study. JMIR AI. 2025 Feb 24;4:e58670. PubMed PMID: 39993309
  • Croxford E, Gao Y, Pellegrino N, Wong K, Wills G, First E, Liao F, Goswami C, Patterson B, Afshar M. Current and future state of evaluation of large language models for medical summarization tasks. Npj Health Syst. 2025;2. PubMed PMID: 40124388
  • Myers S, Miller TA, Gao Y, Churpek MM, Mayampurath A, Dligach D, Afshar M. Lessons learned on information retrieval in electronic health records: a comparison of embedding models and pooling strategies. J Am Med Inform Assoc. 2025 Feb 1;32(2):357-364. PubMed PMID: 39703187
  • Gao Y, Myers S, Chen S, Dligach D, Miller T, Bitterman DS, Chen G, Mayampurath A, Churpek MM, Afshar M. Uncertainty estimation in diagnosis generation from large language models: next-word probability is not pre-test probability. JAMIA Open. 2025 Feb;8(1):ooae154. PubMed PMID: 39802674
  • Li R, Gao Y. Anchored answers: Unravelling positional bias in GPT-2’s multiple-choice questions. In Findings of the Association for Computational Linguistics: ACL 2025. 2025 Jul (pp. 2439-2465).
  • Afshar M, Caskey J, Gao Y, Churpek MM, Mayampurath A, Dligach D, Sethi R. Large Language Models to Detect Stigmatizing Language in Critically Ill Patients With Substance Use Disorders. American Journal of Respiratory and Critical Care Medicine. 2025 May 16;211(Abstracts):A5385-.
  • Afshar, M., Tootooni, M.S., Mayampurath, A., Miller, T., Churpek, M.M., Gao, Y., Dligach, D. and Eslami, B., 2025. Large Language Model-Derived Digital Twins for Predicting Medication Treatments in the Intensive Care Unit. American Journal of Respiratory and Critical Care Medicine, 211(Abstracts), pp.A7181-A7181.
  • Gao, Yanjun, Skatje Myers, Shan Chen, Dmitriy Dligach, Timothy A. Miller, Danielle Bitterman, Matthew Churpek, and Majid Afshar. "When Raw Data Prevails: Are Large Language Model Embeddings Effective in Numerical Data Representation for Medical Machine Learning Applications?." Findings of Empirical Methods in Natural Language Processing (EMNLP 2024).
  • Afshar M, Gao Y, Wills G, Wang J, Churpek MM, Westenberger CJ, Kunstman DT, Gordon JE, Goswami C, Liao FJ, Patterson B. Prompt engineering with a large language model to assist providers in responding to patient inquiries: a real-time implementation in the electronic health record. JAMIA Open. 2024 Oct;7(3):ooae080. PubMed PMID: 39166170
  • Afshar M, Gao Y, Gupta D, Croxford E, Demner-Fushman D. On the role of the UMLS in supporting diagnosis generation proposed by Large Language Models. J Biomed Inform. 2024 Sep;157:104707. PubMed PMID: 39142598
  • Croxford E, Gao Y, Patterson B, To D, Tesch S, Dligach D, Mayampurath A, Churpek MM, Afshar M. Development of a Human Evaluation Framework and Correlation with Automated Metrics for Natural Language Generation of Medical Diagnoses. medRxiv. 2024 Apr 9. PubMed PMID: 38562730
  • Gao Y, Mahajan D, Uzuner Ö, Yetisgen M. Clinical natural language processing for secondary uses. J Biomed Inform. 2024 Feb;150:104596. PubMed PMID: 38278312
  • Croxford, E., Gao, Y., Pellegrino, N. et al. Current and future state of evaluation of large language models for medical summarization tasks. npj Health Syst. 2, 6 (2025). https://doi.org/10.1038/s44401-024-00011-2
  • Gao Y, Myers S, Chen S, Dligach D, Miller T, Bitterman D, Churpek M, Afshar M. When Raw Data Prevails: Are Large Language Model Embeddings Effective in Numerical Data Representation for Medical Machine Learning Applications? In Findings of the Association for Computational Linguistics: EMNLP 2024. 2024 Nov (pp. 5414-5428).
  • Li R, Gao Y. Anchored Answers: Unravelling Positional Bias in GPT-2's Multiple-Choice Questions. arXiv preprint arXiv:2405.03205. 2024 May 6.
  • Chen X, Huang H, Gao Y, Wang Y, Zhao J, Ding K. Learning to Maximize Mutual Information for Chain-of-Thought Distillation. In Findings of the Association for Computational Linguistics: ACL 2024. 2024 Aug (pp. 6857-6868).
  • Zhou W, Yetisgen M, Afshar M, Gao Y, Savova G, Miller TA. Improving model transferability for clinical note section classification models using continued pretraining. Journal of the American Medical Informatics Association. 2024 Jan 1;31(1):89-97.
  • Eslami B, Afshar M, Tootooni MS, Miller T, Churpek M, Gao Y, Dligach D. Toward Digital Twins in the Intensive Care Unit: A Medication Management Case Study. medRxiv. 2024 Dec 28:2024-12.
  • Chen S, Gallifant J, Guevara M, Gao Y, Afshar M, Miller T, Dligach D, Bitterman DS. Improving Clinical NLP Performance through Language Model-Generated Synthetic Clinical Data. arXiv preprint arXiv:2403.19511. 2024 Mar 28
  • Majid Afshar, Yanjun Gao, Graham Wills, Jason Wang, Matthew M Churpek, Christa J Westenberger, David T Kunstman, Joel E Gordon, Cherodeep Goswami, Frank J Liao, Brian Patterson, Prompt engineering with a large language model to assist providers in responding to patient inquiries: a real-time implementation in the electronic health record, JAMIA Open, Volume 7, Issue 3, October 2024, ooae080, https://doi.org/10.1093/jamiaopen/ooae080
  • Xin Chen, Hanxian Huang, Yanjun Gao, Yi Wang, Jishen Zhao, and Ke Ding. 2024. Learning to Maximize Mutual Information for Chain-of-Thought Distillation. In Findings of the Association for Computational Linguistics ACL 2024, pages 6857–6868, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
  • Afshar M, Gao Y, Gupta D, Croxford E, Demner-Fushman D. On the role of the UMLS in supporting diagnosis generation proposed by Large Language Models. Journal of Biomedical Informatics. 2024 Aug 13:104707.
  • Gao Y, Dligach D, Miller T, Churpek MM, Uzuner O, Afshar M. Progress Note Understanding - Assessment and Plan Reasoning: Overview of the 2022 N2C2 Track 3 shared task. J Biomed Inform. 2023 Jun;142:104346. PubMed PMID: 37061012
  • Zhou W, Dligach D, Afshar M, Gao Y, Miller TA. Improving the Transferability of Clinical Note Section Classification Models with BERT and Large Language Model Ensembles. Proc Conf Assoc Comput Linguist Meet. 2023 Jul;2023:125-130. PubMed PMID: 37786810
  • Gao Y, Dligach D, Miller T, Churpek MM, Afshar M. Overview of the Problem List Summarization (ProbSum) 2023 Shared Task on Summarizing Patients' Active Diagnoses and Problems from Electronic Health Record Progress Notes. Proc Conf Assoc Comput Linguist Meet. 2023 Jul;2023:461-467. PubMed PMID: 37583489
  • Sharma B, Gao Y, Miller T, Churpek MM, Afshar M, Dligach D. Multi-Task Training with In-Domain Language Models for Diagnostic Reasoning. Proc Conf Assoc Comput Linguist Meet. 2023 Jul;2023(ClinicalNLP):78-85. PubMed PMID: 37492270
  • Zhou W, Yetisgen M, Afshar M, Gao Y, Savova G, Miller TA. Improving model transferability for clinical note section classification models using continued pretraining. J Am Med Inform Assoc. 2023 Dec 22;31(1):89-97. PubMed PMID: 37725927
  • Gao Y, Dligach D, Christensen L, Tesch S, Laffin R, Xu D, Miller T, Uzuner O, Churpek MM, Afshar M. A scoping review of publicly available language tasks in clinical natural language processing. J Am Med Inform Assoc. 2022 Sep 12;29(10):1797-1806. PubMed PMID: 35923088
  • Yetisgen M, Uzuner O, Gao Y, Mahajan D. Call for papers: Special issue on clinical natural language processing for secondary use applications. J Biomed Inform. 2022 Sep;133:104152. PubMed PMID: 35985622
  • Gao Y, Miller T, Xu D, Dligach D, Churpek MM, Afshar M. Summarizing Patients' Problems from Hospital Progress Notes Using Pre-trained Sequence-to-Sequence Models. Proc Int Conf Comput Ling. 2022 Oct;2022:2979-2991. PubMed PMID: 36268128
  • Gao Y, Dligach D, Miller T, Tesch S, Laffin R, Churpek MM, Afshar M. Hierarchical Annotation for Building A Suite of Clinical Natural Language Processing Tasks: Progress Note Understanding. LREC Int Conf Lang Resour Eval. 2022 Jun;2022:5484-5493. PubMed PMID: 35939277
  • Khatwani, Saksham, et al. "Brittleness and Promise: Knowledge Graph–Based Reward Modeling for Diagnostic Reasoning." The Second Workshop on GenAI for Health: Potential, Trust, and Policy Compliance. 2025.
  • Li R, Chen C, Hu Y, Gao Y, Wang X, Yilmaz E. Attributing Response to Context: A Jensen-Shannon Divergence Driven Mechanistic Study of Context Attribution in Retrieval-Augmented Generation. arXiv preprint arXiv:2505.16415. 2025 May 22.
  • Zhao X, Blotske K, Cargile M, Tilley A, Murray B, Gao Y, Henry K, Smith SE, Barreto EF, Bauer S, Sohn S. Rx-LLM: a benchmarking suite to evaluate safe large language model performance for medication-related tasks. medRxiv. 2025 Dec 2:2025-12.
  • Blotske K, Zhao X, Henry K, Gao Y, Tilley A, Cargile M, Murray B, Smith SE, Barreto EF, Bauer S, Sohn S. Drug-drug interaction identification using large language models. medRxiv. 2025 Dec 4:2025-12.
  • Myers Q, Gao Y. Uncovering Hidden Violent Tendencies in LLMs: A Demographic Analysis via Behavioral Vignettes. arXiv preprint arXiv:2506.20822. 2025 Jun 25.

School of Medicine

CU Anschutz

Fitzsimons Building

13001 East 17th Place

Campus Box C290

Aurora, CO 80045


303.724.5375