Related papers: LLM-based HSE Compliance Assessment: Benchmark, Pe…

200 papers

Recent advances in large language models (LLMs) have demonstrated impressive reasoning capacities that mirror human-like thinking. However, whether LLMs possess genuine fluid intelligence (i.e., the ability to reason abstractly and…

Artificial Intelligence · Computer Science · 2025-09-30 · Yue Yang, MingKang Chen, Qihua Liu, Mengkang Hu, Qiguang Chen, Gengrui Zhang, Shuyue Hu, Guangtao Zhai, Yu Qiao, Yu Wang, Wenqi Shao, Ping Luo

Recent advancements in reasoning-enhanced large language models (LLMs), such as DeepSeek-R1 and OpenAI-o3, have demonstrated significant progress. However, their application in professional medical contexts remains underexplored,…

Computation and Language · Computer Science · 2025-03-11 · Pengcheng Qiu, Chaoyi Wu, Shuyu Liu, Weike Zhao, Zhuoxia Chen, Hongfei Gu, Chuanjin Peng, Ya Zhang, Yanfeng Wang, Weidi Xie

Logical reasoning is a fundamental aspect of human intelligence and an essential capability for multimodal large language models (MLLMs). Despite the significant advancement in multimodal reasoning, existing benchmarks fail to…

Artificial Intelligence · Computer Science · 2025-05-28 · Jiakang Yuan, Tianshuo Peng, Yilei Jiang, Yiting Lu, Renrui Zhang, Kaituo Feng, Chaoyou Fu, Tao Chen, Lei Bai, Bo Zhang, Xiangyu Yue

Structure reasoning is a fundamental capability of large language models (LLMs), enabling them to reason about structured commonsense and answer multi-hop questions. However, existing benchmarks for structure reasoning mainly focus on…

Computation and Language · Computer Science · 2025-03-04 · Zhuohang Jiang, Pangjing Wu, Ziran Liang, Peter Q. Chen, Xu Yuan, Ye Jia, Jiancheng Tu, Chen Li, Peter H. F. Ng, Qing Li

Large language models (LLMs) show strong performance across natural language processing (NLP), mathematical reasoning, and programming, and recent large reasoning models (LRMs) further emphasize explicit reasoning. Yet their computational…

Artificial Intelligence · Computer Science · 2025-10-13 · Hyundong Jin, Joonghyuk Hahn, Yo-Sub Han

The advent of large language models (LLMs) and their adoption by the legal community has given rise to the question: what types of legal reasoning can LLMs perform? To enable greater study of this question, we present LegalBench: a…

Large Language Models (LLMs) are increasingly excelling and outpacing human performance on many tasks. However, to improve LLM reasoning, researchers either rely on ad-hoc generated datasets or formal mathematical proof systems such as the…

Artificial Intelligence · Computer Science · 2025-11-03 · Nikolaus Holzer, William Fishell, Baishakhi Ray, Mark Santolucito

We introduce seqBench, a parametrized benchmark for probing sequential reasoning limits in Large Language Models (LLMs) through precise, multi-dimensional control over several key complexity dimensions. seqBench allows systematic variation…

Artificial Intelligence · Computer Science · 2025-09-23 · Mohammad Ramezanali, Mo Vazifeh, Paolo Santi

While existing benchmarks probe the reasoning abilities of large language models (LLMs) across diverse domains, they predominantly assess passive reasoning, providing models with all the information needed to reach a solution. By contrast,…

Machine Learning · Computer Science · 2025-06-11 · Zhanke Zhou, Xiao Feng, Zhaocheng Zhu, Jiangchao Yao, Sanmi Koyejo, Bo Han

Large language models (LLMs) have achieved strong performance on medical exam-style tasks, motivating growing interest in their deployment in real-world clinical settings. However, clinical decision-making is inherently safety-critical,…

Computation and Language · Computer Science · 2026-04-13 · Xiaohan Ren, Chenxiao Fan, Wenyin Ma, Hongliang He, Chongming Gao, Xiaoyan Zhao, Fuli Feng

Existing benchmarks for evaluating the clinical reasoning capabilities of large language models (LLMs) often lack a clear definition of "clinical reasoning" as a construct, fail to capture the full breadth of interdependent tasks within a…

Recent advances in reasoning with large language models (LLMs) have demonstrated strong performance on complex mathematical tasks, including combinatorial optimization. Techniques such as Chain-of-Thought and In-Context Learning have…

Artificial Intelligence · Computer Science · 2025-09-17 · Marylou Fauchard, Florian Carichon, Margarida Carvalho, Golnoosh Farnadi

Large language models (LLMs) are increasingly deployed in settings where reasoning, such as multi-step problem solving and chain-of-thought, is essential. Yet, current evaluation practices overwhelmingly report single-run accuracy while…

Artificial Intelligence · Computer Science · 2025-12-09 · Nearchos Potamitis, Lars Klein, Akhil Arora

Large language models (LLMs) have revolutionized many areas (e.g. natural language processing, software engineering, etc.) by achieving state-of-the-art performance on extensive downstream tasks. Aiming to achieve robust and general…

Artificial Intelligence · Computer Science · 2024-01-18 · Zhiming Li, Yushi Cao, Xiufeng Xu, Junzhe Jiang, Xu Liu, Yon Shin Teo, Shang-wei Lin, Yang Liu

Large language models (LLMs) have demonstrated remarkable progress in understanding long-context inputs. However, benchmarks for evaluating the long-context reasoning abilities of LLMs fall behind the pace. Existing benchmarks often focus…

Computation and Language · Computer Science · 2025-11-19 · Zhan Ling, Kang Liu, Kai Yan, Yifan Yang, Weijian Lin, Ting-Han Fan, Lingfeng Shen, Zhengyin Du, Jiecao Chen

Large language models (LLMs) have demonstrated their remarkable performance across various language understanding tasks. While emerging benchmarks have been proposed to evaluate LLMs in various domains such as mathematics and computer…

Artificial Intelligence · Computer Science · 2024-10-28 · Junnan Dong, Zijin Hong, Yuanchen Bei, Feiran Huang, Xinrun Wang, Xiao Huang

Despite the remarkable advancements and widespread applications of deep neural networks, their ability to perform reasoning tasks remains limited, particularly in domains requiring structured, abstract thought. In this paper, we investigate…

Computation and Language · Computer Science · 2025-09-16 · Satyam Goyal, Soham Dan

Large language models (LLMs) such as ChatGPT o1, ChatGPT o3, and DeepSeek R1 have shown great potential in solving difficult problems. However, current LLM evaluation benchmarks are limited to one-step interactions. Some of the existing…

Machine Learning · Computer Science · 2025-12-01 · Huanyu Li, Zongyuan Li, Wei Huang, Xian Guo

Recently, there has been a growing interest among large language model (LLM) developers in LLM-based document reading systems, which enable users to upload their own documents and pose questions related to the document contents, going…

Computation and Language · Computer Science · 2024-07-16 · Anni Zou, Wenhao Yu, Hongming Zhang, Kaixin Ma, Deng Cai, Zhuosheng Zhang, Hai Zhao, Dong Yu

Test-time scaling has significantly improved large language model performance, enabling deeper reasoning to solve complex problems. However, this increased reasoning capability also leads to excessive token generation and unnecessary…