Related papers: Large Language Models Encode Clinical Knowledge

200 papers

Recent artificial intelligence (AI) systems have reached milestones in "grand challenges" ranging from Go to protein-folding. The capability to retrieve medical knowledge, reason over it, and answer medical questions comparably to…

This paper introduces MedExQA, a novel benchmark in medical question-answering, to evaluate large language models' (LLMs) understanding of medical knowledge through explanations. By constructing datasets across five distinct medical…

Computation and Language · Computer Science · 2024-07-04 · Yunsoo Kim, Jinge Wu, Yusuf Abdulle, Honghan Wu

Large Language Models (LLMs) have achieved remarkable performance on a wide range of Natural Language Processing (NLP) benchmarks, often surpassing human-level accuracy. However, their reliability in high-stakes domains such as medicine,…

Clinical problem-solving requires processing of semantic medical knowledge such as illness scripts and numerical medical knowledge of diagnostic tests for evidence-based decision-making. As large language models (LLMs) show promising…

Evaluating large language models (LLMs) in medicine is crucial because medical applications require high accuracy with little room for error. Current medical benchmarks have three main types: medical exam-based, comprehensive medical, and…

Recently, Large Language Models (LLMs) have showcased remarkable capabilities in natural language understanding. While demonstrating proficiency in everyday conversations and question-answering situations, these models frequently struggle…

Computation and Language · Computer Science · 2023-08-28 · Chaoyi Wu, Weixiong Lin, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie

The paper introduces a framework for the evaluation of the encoding of factual scientific knowledge, designed to streamline the manual evaluation process typically conducted by domain experts. Inferring over and extracting information from…

Computation and Language · Computer Science · 2024-10-21 · Magdalena Wysocka, Oskar Wysocki, Maxime Delmas, Vincent Mutel, Andre Freitas

Accurate and efficient question-answering systems are essential for delivering high-quality patient care in the medical field. While Large Language Models (LLMs) have made remarkable strides across various domains, they continue to face…

Computation and Language · Computer Science · 2025-01-22 · Hang Yang, Hao Chen, Hui Guo, Yineng Chen, Ching-Sheng Lin, Shu Hu, Jinrong Hu, Xi Wu, Xin Wang

There is a lack of benchmarks for evaluating large language models (LLMs) in long-form medical question answering (QA). Most existing medical QA evaluation benchmarks focus on automatic metrics and multiple-choice questions. While valuable,…

Computation and Language · Computer Science · 2024-11-21 · Pedram Hosseini, Jessica M. Sin, Bing Ren, Bryceton G. Thomas, Elnaz Nouri, Ali Farahanchi, Saeed Hassanpour

The adoption of large language models (LLMs) to assist clinicians has attracted remarkable attention. Existing works mainly adopt the close-ended question-answering (QA) task with answer options for evaluation. However, many clinical…

There is vivid research on adapting Large Language Models (LLMs) to perform a variety of tasks in high-stakes domains such as healthcare. Despite their popularity, there is a lack of understanding of the extent and contributing factors that…

Computation and Language · Computer Science · 2024-06-07 · Anand Subramanian, Viktor Schlegel, Abhinav Ramesh Kashyap, Thanh-Tung Nguyen, Vijay Prakash Dwivedi, Stefan Winkler

In response to the pressing need for advanced clinical problem-solving tools in healthcare, we introduce BooksMed, a novel framework based on a Large Language Model (LLM). BooksMed uniquely emulates human cognitive processes to deliver…

We introduce KoLasSimpleQA, the first benchmark evaluating the multilingual factual ability of Large Language Models (LLMs). Inspired by existing research, we created the question set with features such as single knowledge point coverage,…

Computation and Language · Computer Science · 2025-05-23 · Bowen Jiang, Runchuan Zhu, Jiang Wu, Zinco Jiang, Yifan He, Junyuan Gao, Jia Yu, Rui Min, Yinfan Wang, Haote Yang, Songyang Zhang, Dahua Lin, Lijun Wu, Conghui He

Large Language Models (LLMs) have the potential of facilitating the development of Artificial Intelligence technology to assist medical experts for interactive decision support, which has been demonstrated by their competitive performances…

Computation and Language · Computer Science · 2024-11-12 · Iñigo Alonso, Maite Oronoz, Rodrigo Agerri

Large language models (LLMs) have excelled across domains, also delivering notable performance on the medical evaluation benchmarks, such as MedQA. However, there still exists a significant gap between the reported performance and the…

Computation and Language · Computer Science · 2024-06-06 · Yuxuan Zhou, Xien Liu, Chen Ning, Ji Wu

Although large language models (LLMs) have been assessed for general medical knowledge using licensing exams, their ability to support clinical decision-making, such as selecting medical calculators, remains uncertain. We assessed nine…

Computation and Language · Computer Science · 2025-03-25 · Nicholas Wan, Qiao Jin, Joey Chan, Guangzhi Xiong, Serina Applebaum, Aidan Gilson, Reid McMurry, R. Andrew Taylor, Aidong Zhang, Qingyu Chen, Zhiyong Lu

As opposed to evaluating computation and logic-based reasoning, current benchmarks for evaluating large language models (LLMs) in medicine are primarily focused on question-answering involving domain knowledge and descriptive reasoning.…

In recent years, Large Language Models (LLMs) have demonstrated an impressive ability to encode knowledge during pre-training on large text corpora. They can leverage this knowledge for downstream tasks like question answering (QA), even in…

Computation and Language · Computer Science · 2024-06-11 · Juraj Vladika, Phillip Schneider, Florian Matthes

Large Language Models (LLMs) have demonstrated remarkable success in diverse natural language processing (NLP) tasks in general domains. However, LLMs sometimes generate responses with the hallucination about medical facts due to limited…

Computation and Language · Computer Science · 2025-01-14 · Haochun Wang, Sendong Zhao, Zewen Qiang, Zijian Li, Nuwa Xi, Yanrui Du, MuZhen Cai, Haoqiang Guo, Yuhan Chen, Haoming Xu, Bing Qin, Ting Liu

Large language models (LLMs) have demonstrated remarkable performance on various medical benchmarks, but their capabilities across different cognitive levels remain underexplored. Inspired by Bloom's Taxonomy, we propose a…

Computation and Language · Computer Science · 2025-06-11 · Yuxuan Zhou, Xien Liu, Chenwei Yan, Chen Ning, Xiao Zhang, Boxun Li, Xiangling Fu, Shijin Wang, Guoping Hu, Yu Wang, Ji Wu