Related papers: Outdated Issue Aware Decoding for Reasoning Questi…

RECKONING: Reasoning through Dynamic Knowledge Encoding

Recent studies on transformer-based language models show that they can answer questions by reasoning over knowledge provided as part of the context (i.e., in-context reasoning). However, since the available knowledge is often not filtered…

Computation and Language · Computer Science 2023-11-07 Zeming Chen, Gail Weiss, Eric Mitchell, Asli Celikyilmaz, Antoine Bosselut

DISCO: DISCovering Overfittings as Causal Rules for Text Classification Models

With the rapid advancement of neural language models, the deployment of over-parameterized models has surged, increasing the need for interpretable explanations comprehensible to human inspectors. Existing post-hoc interpretability methods,…

Artificial Intelligence · Computer Science 2024-11-08 Zijian Zhang, Vinay Setty, Yumeng Wang, Avishek Anand

DISPO: Enhancing Training Efficiency and Stability in Reinforcement Learning for Large Language Model Mathematical Reasoning

Reinforcement learning with verifiable rewards has emerged as a promising paradigm for enhancing the reasoning capabilities of large language models particularly in mathematics. Current approaches in this domain present a clear trade-off:…

Computation and Language · Computer Science 2026-02-03 Batuhan K. Karaman, Aditya Rawal, Suhaila Shakiah, Mohammad Ghavamzadeh, Mingyi Hong, Arijit Biswas, Ruida Zhou

Revealing the Deceptiveness of Knowledge Editing: A Mechanistic Analysis of Superficial Editing

Knowledge editing, which aims to update the knowledge encoded in language models, can be deceptive. Despite the fact that many existing knowledge editing algorithms achieve near-perfect performance on conventional metrics, the models edited…

Computation and Language · Computer Science 2025-05-20 Jiakuan Xie, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao

Decoding by Contrasting Knowledge: Enhancing LLMs' Confidence on Edited Facts

The knowledge within large language models (LLMs) may become outdated quickly. While in-context editing (ICE) is currently the most effective method for knowledge editing (KE), it is constrained by the black-box modeling of LLMs and thus…

Computation and Language · Computer Science 2024-05-22 Baolong Bi, Shenghua Liu, Lingrui Mei, Yiwei Wang, Pengliang Ji, Xueqi Cheng

DySCO: Dynamic Attention-Scaling Decoding for Long-Context Language Models

Understanding and reasoning over long contexts is a crucial capability for language models (LMs). Although recent models support increasingly long context windows, their accuracy often deteriorates as input length grows. In practice, models…

Computation and Language · Computer Science 2026-04-17 Xi Ye, Wuwei Zhang, Fangcong Yin, Howard Yen, Danqi Chen

Propagation and Pitfalls: Reasoning-based Assessment of Knowledge Editing through Counterfactual Tasks

Current approaches of knowledge editing struggle to effectively propagate updates to interconnected facts. In this work, we delve into the barriers that hinder the appropriate propagation of updated knowledge within these models for…

Computation and Language · Computer Science 2024-02-01 Wenyue Hua, Jiang Guo, Mingwen Dong, Henghui Zhu, Patrick Ng, Zhiguo Wang

DISCO: Distilling Counterfactuals with Large Language Models

Models trained with counterfactually augmented data learn representations of the causal structure of tasks, enabling robust generalization. However, high-quality counterfactual data is scarce for most tasks and not easily generated at…

Computation and Language · Computer Science 2023-06-07 Zeming Chen, Qiyue Gao, Antoine Bosselut, Ashish Sabharwal, Kyle Richardson

Decoupling Reasoning and Knowledge Injection for In-Context Knowledge Editing

Knowledge editing aims to efficiently update Large Language Models (LLMs) by modifying specific knowledge without retraining the entire model. Among knowledge editing approaches, in-context editing (ICE) offers a lightweight solution by…

Computation and Language · Computer Science 2025-06-03 Changyue Wang, Weihang Su, Qingyao Ai, Yujia Zhou, Yiqun Liu

DeCoR: Defy Knowledge Forgetting by Predicting Earlier Audio Codes

Lifelong audio feature extraction involves learning new sound classes incrementally, which is essential for adapting to new data distributions over time. However, optimizing the model only on new data can lead to catastrophic forgetting of…

Audio and Speech Processing · Electrical Eng. & Systems 2024-02-08 Xilin Jiang, Yinghao Aaron Li, Nima Mesgarani

Knowledge Editing through Chain-of-Thought

Knowledge Editing is a technique that updates large language models (LLMs) with new information to maintain their world knowledge. This approach avoids the need to rebuild the model from scratch, thereby addressing the high costs associated…

Computation and Language · Computer Science 2025-09-09 Changyue Wang, Weihang Su, Qingyao Ai, Yichen Tang, Yiqun Liu

Distilling Reasoning Capabilities into Smaller Language Models

Step-by-step reasoning approaches like chain of thought (CoT) have proved to be very effective in inducing reasoning capabilities in large language models. However, the success of the CoT approach is fundamentally tied to the model size,…

Machine Learning · Computer Science 2023-05-19 Kumar Shridhar, Alessandro Stolfo, Mrinmaya Sachan

Efficient Mathematical Reasoning Models via Dynamic Pruning and Knowledge Distillation

With the rapid development of deep learning, large language models have shown strong capabilities in complex reasoning tasks such as mathematical equation solving. However, their substantial computational and storage costs hinder practical…

Machine Learning · Computer Science 2025-11-25 Fengming Yu, Qingyu Meng, Haiwei Pan, Kejia Zhang

APOLLO: A Simple Approach for Adaptive Pretraining of Language Models for Logical Reasoning

Logical reasoning of text is an important ability that requires understanding the information present in the text, their interconnections, and then reasoning through them to infer new conclusions. Prior works on improving the logical…

Computation and Language · Computer Science 2023-06-06 Soumya Sanyal, Yichong Xu, Shuohang Wang, Ziyi Yang, Reid Pryzant, Wenhao Yu, Chenguang Zhu, Xiang Ren

Improve Student's Reasoning Generalizability through Cascading Decomposed CoTs Distillation

Large language models (LLMs) exhibit enhanced reasoning at larger scales, driving efforts to distill these capabilities into smaller models via teacher-student learning. Previous works simply fine-tune student models on teachers' generated…

Computation and Language · Computer Science 2024-05-31 Chengwei Dai, Kun Li, Wei Zhou, Songlin Hu

Coordinated Reasoning for Cross-Lingual Knowledge Graph Alignment

Existing entity alignment methods mainly vary on the choices of encoding the knowledge graph, but they typically use the same decoding method, which independently chooses the local optimal match for each source entity. This decoding method…

Computation and Language · Computer Science 2020-01-24 Kun Xu, Linfeng Song, Yansong Feng, Yan Song, Dong Yu

UNDO: Understanding Distillation as Optimization

Knowledge distillation has emerged as an effective strategy for compressing large language models' (LLMs) knowledge into smaller, more efficient student models. However, standard one-shot distillation methods often produce suboptimal…

Computation and Language · Computer Science 2025-04-04 Kushal Jain, Piyushi Goyal, Kumar Shridhar

Discerning and Resolving Knowledge Conflicts through Adaptive Decoding with Contextual Information-Entropy Constraint

Large language models internalize enormous parametric knowledge during pre-training. Concurrently, realistic applications necessitate external contextual knowledge to aid models on the underlying tasks. This raises a crucial dilemma known…

Artificial Intelligence · Computer Science 2024-07-29 Xiaowei Yuan, Zhao Yang, Yequan Wang, Shengping Liu, Jun Zhao, Kang Liu

DISCOS: Bridging the Gap between Discourse Knowledge and Commonsense Knowledge

Commonsense knowledge is crucial for artificial intelligence systems to understand natural language. Previous commonsense knowledge acquisition approaches typically rely on human annotations (for example, ATOMIC) or text generation models…

Computation and Language · Computer Science 2021-02-19 Tianqing Fang, Hongming Zhang, Weiqi Wang, Yangqiu Song, Bin He

DISCO: Efficient Diffusion Solver for Large-Scale Combinatorial Optimization Problems

Combinatorial Optimization (CO) problems are fundamentally important in numerous real-world applications across diverse industries, characterized by entailing enormous solution space and demanding time-sensitive response. Despite recent…

Artificial Intelligence · Computer Science 2025-06-10 Hang Zhao, Kexiong Yu, Yuhang Huang, Renjiao Yi, Chenyang Zhu, Kai Xu