Related papers: PLSemanticsBench: Large Language Models As Program…

Can LLMs Reason About Program Semantics? A Comprehensive Evaluation of LLMs on Formal Specification Inference

Large Language Models (LLMs) are increasingly being used to automate programming tasks. Yet, LLMs' capabilities in reasoning about program semantics are still inadequately studied, leaving significant potential for further exploration. This…

Programming Languages · Computer Science 2025-05-30 Thanh Le-Cong, Bach Le, Toby Murray

Evaluating Program Semantics Reasoning with Type Inference in System F

Large Language Models (LLMs) are increasingly integrated into the software engineering ecosystem. Their test-time compute (TTC) reasoning capabilities show significant potential for understanding program logic and semantics beyond mere…

Computation and Language · Computer Science 2025-10-22 Yifeng He, Luning Yang, Christopher Castro Gaw Gonzalo, Hao Chen

Do Large Language Models Excel in Complex Logical Reasoning with Formal Language?

Large Language Models (LLMs) have been shown to achieve breakthrough performance on complex logical reasoning tasks. Nevertheless, most existing research focuses on employing formal language to guide LLMs to derive reliable reasoning paths,…

Computation and Language · Computer Science 2025-05-23 Jin Jiang, Jianing Wang, Yuchen Yan, Yang Liu, Jianhua Zhu, Mengdi Zhang, Xunliang Cai, Liangcai Gao

On the Potential of Large Language Models to Solve Semantics-Aware Process Mining Tasks

Large language models (LLMs) have shown to be valuable tools for tackling process mining tasks. Existing studies report on their capability to support various data-driven process analyses and even, to some extent, that they are able to…

Databases · Computer Science 2025-05-01 Adrian Rebmann, Fabian David Schmidt, Goran Glavaš, Han van der Aa

Large Language Models are Interpretable Learners

The trade-off between expressiveness and interpretability remains a core challenge when building human-centric predictive models for classification and decision-making. While symbolic rules offer interpretability, they often lack…

Artificial Intelligence · Computer Science 2024-06-26 Ruochen Wang, Si Si, Felix Yu, Dorothea Wiesmann, Cho-Jui Hsieh, Inderjit Dhillon

Potential and Limitations of LLMs in Capturing Structured Semantics: A Case Study on SRL

Large Language Models (LLMs) play a crucial role in capturing structured semantics to enhance language understanding, improve interpretability, and reduce bias. Nevertheless, an ongoing controversy exists over the extent to which LLMs can…

Computation and Language · Computer Science 2024-05-13 Ning Cheng, Zhaohui Yan, Ziming Wang, Zhijie Li, Jiaming Yu, Zilong Zheng, Kewei Tu, Jinan Xu, Wenjuan Han

LLMs for Relational Reasoning: How Far are We?

Large language models (LLMs) have revolutionized many areas (e.g. natural language processing, software engineering, etc.) by achieving state-of-the-art performance on extensive downstream tasks. Aiming to achieve robust and general…

Artificial Intelligence · Computer Science 2024-01-18 Zhiming Li, Yushi Cao, Xiufeng Xu, Junzhe Jiang, Xu Liu, Yon Shin Teo, Shang-wei Lin, Yang Liu

Explaining Large Language Model-Based Neural Semantic Parsers (Student Abstract)

While large language models (LLMs) have demonstrated strong capability in structured prediction tasks such as semantic parsing, few amounts of research have explored the underlying mechanisms of their success. Our work studies different…

Computation and Language · Computer Science 2023-02-01 Daking Rai, Yilun Zhou, Bailin Wang, Ziyu Yao

Evaluating statistical language models as pragmatic reasoners

The relationship between communicated language and intended meaning is often probabilistic and sensitive to context. Numerous strategies attempt to estimate such a mapping, often leveraging recursive Bayesian models of communication. In…

Computation and Language · Computer Science 2023-05-03 Benjamin Lipkin, Lionel Wong, Gabriel Grand, Joshua B Tenenbaum

Can Large Language Models Understand Symbolic Graphics Programs?

Against the backdrop of enthusiasm for large language models (LLMs), there is a growing need to scientifically assess their capabilities and shortcomings. This is nontrivial in part because it is difficult to find tasks which the models…

Machine Learning · Computer Science 2025-05-28 Zeju Qiu, Weiyang Liu, Haiwen Feng, Zhen Liu, Tim Z. Xiao, Katherine M. Collins, Joshua B. Tenenbaum, Adrian Weller, Michael J. Black, Bernhard Schölkopf

Are We Testing or Being Tested? Exploring the Practical Applications of Large Language Models in Software Testing

A Large Language Model (LLM) represents a cutting-edge artificial intelligence model that generates coherent content, including grammatically precise sentences, human-like paragraphs, and syntactically accurate code snippets. LLMs can play…

Software Engineering · Computer Science 2023-12-11 Robson Santos, Italo Santos, Cleyton Magalhaes, Ronnie de Souza Santos

EquiBench: Benchmarking Large Language Models' Reasoning about Program Semantics via Equivalence Checking

As large language models (LLMs) become integral to code-related tasks, a central question emerges: Do LLMs truly understand program semantics? We introduce EquiBench, a new benchmark for evaluating LLMs through equivalence checking, i.e.,…

Machine Learning · Computer Science 2025-09-23 Anjiang Wei, Jiannan Cao, Ran Li, Hongyu Chen, Yuhui Zhang, Ziheng Wang, Yuan Liu, Thiago S. F. X. Teixeira, Diyi Yang, Ke Wang, Alex Aiken

CodeMind: Evaluating Large Language Models for Code Reasoning

Large Language Models (LLMs) have been widely used to automate programming tasks. Their capabilities have been evaluated by assessing the quality of generated code through tests or proofs. The extent to which they can reason about code is a…

Software Engineering · Computer Science 2026-04-08 Changshu Liu, Yang Chen, Reyhaneh Jabbarvand

Coupling Large Language Models with Logic Programming for Robust and General Reasoning from Text

While large language models (LLMs), such as GPT-3, appear to be robust and general, their reasoning ability is not at a level to compete with the best models trained for specific natural language reasoning problems. In this study, we…

Computation and Language · Computer Science 2023-07-18 Zhun Yang, Adam Ishay, Joohyung Lee

Are Large Language Models Robust in Understanding Code Against Semantics-Preserving Mutations?

With the widespread adoption of vibe coding, understanding the reasoning and robustness of Large Language Models (LLMs) is critical for their reliable use in programming tasks. While recent studies assess LLMs' ability to predict program…

Software Engineering · Computer Science 2026-05-08 Pedro Orvalho, Marta Kwiatkowska

LLMs' Understanding of Natural Language Revealed

Large language models (LLMs) are the result of a massive experiment in bottom-up, data-driven reverse engineering of language at scale. Despite their utility in a number of downstream NLP tasks, ample research has shown that LLMs are…

Artificial Intelligence · Computer Science 2024-08-05 Walid S. Saba

LLMs Can Also Do Well! Breaking Barriers in Semantic Role Labeling via Large Language Models

Semantic role labeling (SRL) is a crucial task of natural language processing (NLP). Although generative decoder-based large language models (LLMs) have achieved remarkable success across various NLP tasks, they still lag behind…

Computation and Language · Computer Science 2025-06-09 Xinxin Li, Huiyao Chen, Chengjun Liu, Jing Li, Meishan Zhang, Jun Yu, Min Zhang

Evaluating the Ability of LLMs to Solve Semantics-Aware Process Mining Tasks

The process mining community has recently recognized the potential of large language models (LLMs) for tackling various process mining tasks. Initial studies report the capability of LLMs to support process analysis and even, to some…

Computation and Language · Computer Science 2024-07-03 Adrian Rebmann, Fabian David Schmidt, Goran Glavaš, Han van der Aa

Representing LLMs in Prompt Semantic Task Space

Large language models (LLMs) achieve impressive results over various tasks, and ever-expanding public repositories contain an abundance of pre-trained models. Therefore, identifying the best-performing LLM for a given task is a significant…

Computation and Language · Computer Science 2025-11-13 Idan Kashani, Avi Mendelson, Yaniv Nemcovsky

Evaluating Large Language Models in Process Mining: Capabilities, Benchmarks, and Evaluation Strategies

Using Large Language Models (LLMs) for Process Mining (PM) tasks is becoming increasingly essential, and initial approaches yield promising results. However, little attention has been given to developing strategies for evaluating and…

Databases · Computer Science 2024-07-01 Alessandro Berti, Humam Kourani, Hannes Hafke, Chiao-Yun Li, Daniel Schuster