Related papers: Discovering High Level Patterns from Simulation Tr…

SimLM: Can Language Models Infer Parameters of Physical Systems?

Several machine learning methods aim to learn or reason about complex physical systems. A common first step towards reasoning is to infer system parameters from observations of its behavior. In this paper, we investigate the performance of…

Computation and Language · Computer Science 2024-02-07 Sean Memery, Mirella Lapata, Kartic Subr

Playing Psychic: Using Thought Trees to Predict Reasoning Models Accuracy on Coding Tasks

Recent advances in large language models (LLMs) have shown that test-time scaling can substantially improve model performance on complex tasks, particularly in the coding domain. Under this paradigm, models use a larger token budget during…

Artificial Intelligence · Computer Science 2026-04-21 Jiaxin Fang, Runyuan He, Sahil Bhatia, Neel Gajare, Alvin Cheung

Code Simulation Challenges for Large Language Models

Many reasoning, planning, and problem-solving tasks share an intrinsic algorithmic nature: correctly simulating each step is a sufficient condition to solve them correctly. This work studies to what extent Large Language Models (LLMs) can…

Machine Learning · Computer Science 2024-06-13 Emanuele La Malfa, Christoph Weinhuber, Orazio Torre, Fangru Lin, Samuele Marro, Anthony Cohn, Nigel Shadbolt, Michael Wooldridge

Representing LLMs in Prompt Semantic Task Space

Large language models (LLMs) achieve impressive results over various tasks, and ever-expanding public repositories contain an abundance of pre-trained models. Therefore, identifying the best-performing LLM for a given task is a significant…

Computation and Language · Computer Science 2025-11-13 Idan Kashani, Avi Mendelson, Yaniv Nemcovsky

Through the Judge's Eyes: Inferred Thinking Traces Improve Reliability of LLM Raters

Large language models (LLMs) are increasingly used as raters for evaluation tasks. However, their reliability is often limited for subjective tasks, when human judgments involve subtle reasoning beyond annotation labels. Thinking traces,…

Artificial Intelligence · Computer Science 2026-02-23 Xingjian Zhang, Tianhong Gao, Suliang Jin, Tianhao Wang, Teng Ye, Eytan Adar, Qiaozhu Mei

Grounding Language Plans in Demonstrations Through Counterfactual Perturbations

Grounding the common-sense reasoning of Large Language Models (LLMs) in physical domains remains a pivotal yet unsolved problem for embodied AI. Whereas prior works have focused on leveraging LLMs directly for planning in symbolic spaces,…

Robotics · Computer Science 2024-12-10 Yanwei Wang, Tsun-Hsuan Wang, Jiayuan Mao, Michael Hagenow, Julie Shah

Large Language Models as Realistic Microservice Trace Generators

Workload traces are essential to understand complex computer systems' behavior and manage processing and memory resources. Since real-world traces are hard to obtain, synthetic trace generation is a promising alternative. This paper…

Software Engineering · Computer Science 2025-09-23 Donghyun Kim, Sriram Ravula, Taemin Ha, Alexandros G. Dimakis, Daehyeok Kim, Aditya Akella

Evaluating Spatial Understanding of Large Language Models

Large language models (LLMs) show remarkable capabilities across a variety of tasks. Despite the models only seeing text in training, several recent studies suggest that LLM representations implicitly capture aspects of the underlying…

Computation and Language · Computer Science 2024-04-16 Yutaro Yamada, Yihan Bao, Andrew K. Lampinen, Jungo Kasai, Ilker Yildirim

Explaining Large Language Model-Based Neural Semantic Parsers (Student Abstract)

While large language models (LLMs) have demonstrated strong capability in structured prediction tasks such as semantic parsing, little research has explored the underlying mechanisms of their success. Our work studies different…

Computation and Language · Computer Science 2023-02-01 Daking Rai, Yilun Zhou, Bailin Wang, Ziyu Yao

Large Language Models as Annotators: Enhancing Generalization of NLP Models at Minimal Cost

State-of-the-art supervised NLP models achieve high accuracy but are also susceptible to failures on inputs from low-data regimes, such as domains that are not represented in training data. As an approximation to collecting ground-truth…

Computation and Language · Computer Science 2023-06-29 Parikshit Bansal, Amit Sharma

Potential and Limitations of LLMs in Capturing Structured Semantics: A Case Study on SRL

Large Language Models (LLMs) play a crucial role in capturing structured semantics to enhance language understanding, improve interpretability, and reduce bias. Nevertheless, an ongoing controversy exists over the extent to which LLMs can…

Computation and Language · Computer Science 2024-05-13 Ning Cheng, Zhaohui Yan, Ziming Wang, Zhijie Li, Jiaming Yu, Zilong Zheng, Kewei Tu, Jinan Xu, Wenjuan Han

Unlocking Reasoning Capability on Machine Translation in Large Language Models

Reasoning-oriented large language models (RLMs) achieve strong gains on tasks such as mathematics and coding by generating explicit intermediate reasoning. However, their impact on machine translation (MT) remains underexplored. We…

Computation and Language · Computer Science 2026-02-17 Sara Rajaee, Sebastian Vincent, Alexandre Berard, Marzieh Fadaee, Kelly Marchisio, Tom Kocmi

Learning LLM Preference over Intra-Dialogue Pairs: A Framework for Utterance-level Understandings

Large language models (LLMs) have demonstrated remarkable capabilities in handling complex dialogue tasks without requiring use-case-specific fine-tuning. However, analyzing live dialogues in real time necessitates low-latency processing…

Computation and Language · Computer Science 2025-03-10 Xuanqing Liu, Luyang Kong, Wei Niu, Afshin Khashei, Belinda Zeng, Steve Johnson, Jon Jay, Davor Golac, Matt Pope

Can LLM-Generated Textual Explanations Enhance Model Classification Performance? An Empirical Study

In the rapidly evolving field of Explainable Natural Language Processing (NLP), textual explanations, i.e., human-like rationales, are pivotal for explaining model predictions and enriching datasets with interpretable labels. Traditional…

Computation and Language · Computer Science 2025-11-12 Mahdi Dhaini, Juraj Vladika, Ege Erdogan, Zineb Attaoui, Gjergji Kasneci

Language Models Struggle to Use Representations Learned In-Context

Though large language models (LLMs) have enabled great success across a wide variety of tasks, they still appear to fall short of one of the loftier goals of artificial intelligence research: creating an artificial system that can adapt its…

Computation and Language · Computer Science 2026-05-04 Michael A. Lepori, Tal Linzen, Ann Yuan, Katja Filippova

Probing the Difficulty Perception Mechanism of Large Language Models

Large language models (LLMs) are increasingly deployed on complex reasoning tasks, yet little is known about their ability to internally evaluate problem difficulty, which is an essential capability for adaptive reasoning and efficient…

Computation and Language · Computer Science 2025-10-14 Sunbowen Lee, Qingyu Yin, Chak Tou Leong, Jialiang Zhang, Yicheng Gong, Shiwen Ni, Min Yang, Xiaoyu Shen

LLMSense: Harnessing LLMs for High-level Reasoning Over Spatiotemporal Sensor Traces

Most studies on machine learning in sensing systems focus on low-level perception tasks that process raw sensory data within a short time window. However, many practical applications, such as human routine modeling and occupancy tracking,…

Artificial Intelligence · Computer Science 2024-04-01 Xiaomin Ouyang, Mani Srivastava

Demystifying Errors in LLM Reasoning Traces: An Empirical Study of Code Execution Simulation

Understanding a program's runtime reasoning behavior, meaning how intermediate states and control flows lead to final execution results, is essential for reliable code generation, debugging, and automated reasoning. Although large language…

Software Engineering · Computer Science 2025-12-02 Mohammad Abdollahi, Khandaker Rifah Tasnia, Soumit Kanti Saha, Jinqiu Yang, Song Wang, Hadi Hemmati

Evaluating the Use of LLMs for Documentation to Code Traceability

Large Language Models (LLMs) offer new potential for automating documentation-to-code traceability, yet their capabilities remain underexplored. We present a comprehensive evaluation of LLMs (Claude 3.5 Sonnet, GPT-4o, and o3-mini) in…

Software Engineering · Computer Science 2025-08-08 Ebube Alor, SayedHassan Khatoonabadi, Emad Shihab

Can Large Language Models Understand Symbolic Graphics Programs?

Against the backdrop of enthusiasm for large language models (LLMs), there is a growing need to scientifically assess their capabilities and shortcomings. This is nontrivial in part because it is difficult to find tasks which the models…

Machine Learning · Computer Science 2025-05-28 Zeju Qiu, Weiyang Liu, Haiwen Feng, Zhen Liu, Tim Z. Xiao, Katherine M. Collins, Joshua B. Tenenbaum, Adrian Weller, Michael J. Black, Bernhard Schölkopf