Related papers: ARCHE: A Novel Task to Evaluate LLMs on Latent Rea…

From Passive to Active Reasoning: Can Large Language Models Ask the Right Questions under Incomplete Information?

While existing benchmarks probe the reasoning abilities of large language models (LLMs) across diverse domains, they predominantly assess passive reasoning, providing models with all the information needed to reach a solution. By contrast,…

Machine Learning · Computer Science 2025-06-11 Zhanke Zhou, Xiao Feng, Zhaocheng Zhu, Jiangchao Yao, Sanmi Koyejo, Bo Han

Reasoning Abilities of Large Language Models: In-Depth Analysis on the Abstraction and Reasoning Corpus

The existing methods for evaluating the inference abilities of Large Language Models (LLMs) have been predominantly results-centric, making it challenging to assess the inference process comprehensively. We introduce a novel approach using…

Computation and Language · Computer Science 2024-11-26 Seungpil Lee, Woochang Sim, Donghyeon Shin, Wongyu Seo, Jiwon Park, Seokki Lee, Sanha Hwang, Sejin Kim, Sundong Kim

RATT: A Thought Structure for Coherent and Correct LLM Reasoning

Large Language Models (LLMs) gain substantial reasoning and decision-making capabilities from thought structures. However, existing methods such as Tree of Thought and Retrieval Augmented Thoughts often fall short in complex tasks due to…

Computation and Language · Computer Science 2024-12-24 Jinghan Zhang, Xiting Wang, Weijieying Ren, Lu Jiang, Dongjie Wang, Kunpeng Liu

CLR-Bench: Evaluating Large Language Models in College-level Reasoning

Large language models (LLMs) have demonstrated their remarkable performance across various language understanding tasks. While emerging benchmarks have been proposed to evaluate LLMs in various domains such as mathematics and computer…

Artificial Intelligence · Computer Science 2024-10-28 Junnan Dong, Zijin Hong, Yuanchen Bei, Feiran Huang, Xinrun Wang, Xiao Huang

Truly Assessing Fluid Intelligence of Large Language Models through Dynamic Reasoning Evaluation

Recent advances in large language models (LLMs) have demonstrated impressive reasoning capacities that mirror human-like thinking. However, whether LLMs possess genuine fluid intelligence (i.e., the ability to reason abstractly and…

Artificial Intelligence · Computer Science 2025-09-30 Yue Yang, MingKang Chen, Qihua Liu, Mengkang Hu, Qiguang Chen, Gengrui Zhang, Shuyue Hu, Guangtao Zhai, Yu Qiao, Yu Wang, Wenqi Shao, Ping Luo

ARCE: Augmented RoBERTa with Contextualized Elucidations for NER in Automated Rule Checking

Accurate information extraction from specialized texts is a critical challenge for automated rule checking (ARC) in the architecture, engineering, and construction (AEC) domain. While large language models (LLMs) possess strong reasoning…

Computation and Language · Computer Science 2026-01-29 Jian Chen, Jiabao Dou

LAR-ECHR: A New Legal Argument Reasoning Task and Dataset for Cases of the European Court of Human Rights

We present Legal Argument Reasoning (LAR), a novel task designed to evaluate the legal reasoning capabilities of Large Language Models (LLMs). The task requires selecting the correct next statement (from multiple choice options) in a chain…

Computation and Language · Computer Science 2024-10-18 Odysseas S. Chlapanis, Dimitrios Galanis, Ion Androutsopoulos

Intelligence Analysis of Language Models

In this project, we test the effectiveness of Large Language Models (LLMs) on the Abstraction and Reasoning Corpus (ARC) dataset. This dataset serves as a representative benchmark for testing abstract reasoning abilities, requiring a…

Artificial Intelligence · Computer Science 2024-07-30 Liane Galanti, Ethan Baron

MArgE: Meshing Argumentative Evidence from Multiple Large Language Models for Justifiable Claim Verification

Leveraging outputs from multiple large language models (LLMs) is emerging as a method for harnessing their power across a wide range of tasks while mitigating their capacity for making errors, e.g., hallucinations. However, current…

Computation and Language · Computer Science 2025-08-05 Ming Pok Ng, Junqi Jiang, Gabriel Freedman, Antonio Rago, Francesca Toni

ARise: Towards Knowledge-Augmented Reasoning via Risk-Adaptive Search

Large language models (LLMs) have demonstrated impressive capabilities and are receiving increasing attention to enhance their reasoning through scaling test-time compute. However, their application in open-ended, knowledge-intensive,…

Artificial Intelligence · Computer Science 2025-05-27 Yize Zhang, Tianshu Wang, Sirui Chen, Kun Wang, Xingyu Zeng, Hongyu Lin, Xianpei Han, Le Sun, Chaochao Lu

LLM-based HSE Compliance Assessment: Benchmark, Performance, and Advancements

Health, Safety, and Environment (HSE) compliance assessment demands dynamic real-time decision-making under complicated regulations and complex human-machine-environment interactions. While large language models (LLMs) hold significant…

Computation and Language · Computer Science 2025-05-30 Jianwei Wang, Mengqi Wang, Yinsi Zhou, Zhenchang Xing, Qing Liu, Xiwei Xu, Wenjie Zhang, Liming Zhu

Enabling Large Language Models to Generate Text with Citations

Large language models (LLMs) have emerged as a widely-used tool for information seeking, but their generated outputs are prone to hallucination. In this work, our aim is to allow LLMs to generate text with citations, improving their factual…

Computation and Language · Computer Science 2023-11-01 Tianyu Gao, Howard Yen, Jiatong Yu, Danqi Chen

Attentive Reasoning Queries: A Systematic Method for Optimizing Instruction-Following in Large Language Models

We present Attentive Reasoning Queries (ARQs), a novel structured reasoning approach that significantly improves instruction-following in Large Language Models through domain-specialized reasoning blueprints. While LLMs demonstrate…

Computation and Language · Computer Science 2025-03-06 Bar Karov, Dor Zohar, Yam Marcovitz

Reasoning Beyond Language: A Comprehensive Survey on Latent Chain-of-Thought Reasoning

Large Language Models (LLMs) have shown impressive performance on complex tasks through Chain-of-Thought (CoT) reasoning. However, conventional CoT relies on explicitly verbalized intermediate steps, which constrains its broader…

Computation and Language · Computer Science 2025-11-04 Xinghao Chen, Anhao Zhao, Heming Xia, Xuan Lu, Hanlin Wang, Yanjun Chen, Wei Zhang, Jian Wang, Wenjie Li, Xiaoyu Shen

ALR$^2$: A Retrieve-then-Reason Framework for Long-context Question Answering

The context window of large language models (LLMs) has been extended significantly in recent years. However, while the context length that the LLM can process has grown, the capability of the model to accurately reason over that context…

Computation and Language · Computer Science 2024-10-07 Huayang Li, Pat Verga, Priyanka Sen, Bowen Yang, Vijay Viswanathan, Patrick Lewis, Taro Watanabe, Yixuan Su

Modelling and Classifying the Components of a Literature Review

Previous work has demonstrated that AI methods for analysing scientific literature benefit significantly from annotating sentences in papers according to their rhetorical roles, such as research gaps, results, limitations, extensions of…

Computation and Language · Computer Science 2026-02-11 Francisco Bolaños, Angelo Salatino, Francesco Osborne, Enrico Motta

Beyond Chains of Thought: Benchmarking Latent-Space Reasoning Abilities in Large Language Models

Large language models (LLMs) can perform reasoning computations both internally within their latent space and externally by generating explicit token sequences like chains of thought. Significant progress in enhancing reasoning abilities…

Computation and Language · Computer Science 2025-04-16 Thilo Hagendorff, Sarah Fabi

Large Reasoning Models are not thinking straight: on the unreliability of thinking trajectories

Large Language Models (LLMs) trained via Reinforcement Learning (RL) have recently achieved impressive results on reasoning benchmarks. Yet, growing evidence shows that these models often generate longer but ineffective chains of thought…

Machine Learning · Computer Science 2025-07-02 Jhouben Cuesta-Ramirez, Samuel Beaussant, Mehdi Mounsif

RARE: Retrieval-Augmented Reasoning Enhancement for Large Language Models

This work introduces RARE (Retrieval-Augmented Reasoning Enhancement), a versatile extension to the mutual reasoning framework (rStar), aimed at enhancing reasoning accuracy and factual integrity across large language models (LLMs) for…

Computation and Language · Computer Science 2025-06-03 Hieu Tran, Zonghai Yao, Junda Wang, Yifan Zhang, Zhichao Yang, Hong Yu

Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation

Large Language Models (LLMs) have demonstrated significant performance improvements across various cognitive tasks. An emerging application is using LLMs to enhance retrieval-augmented generation (RAG) capabilities. These systems require…

Computation and Language · Computer Science 2025-01-28 Satyapriya Krishna, Kalpesh Krishna, Anhad Mohananey, Steven Schwarcz, Adam Stambler, Shyam Upadhyay, Manaal Faruqui