LLMSurgeon: Diagnosing Data Mixture of Large Language Models
Computation & Language · 2026-05 · arXiv:2605.30348
Yaxin Luo, Jiacheng Cui, Xiaohan Zhao, Xinyi Shang +4
MedCase-Structured: A Text-to-FHIR Dataset for Benchmarking Diagnostic Reasoning in Clinically Realistic EHR Settings
Computation & Language · 2026-05 · arXiv:2605.30295
Valentina Bui Muti, Eugénie Dulout, Ziquan Fu
Loong: A Human-Like Long Document Translation Agent with Observe-and-Act Adaptive Context Selection
Computation & Language · 2026-05 · arXiv:2605.30274
Yutong Wang, Xuebo Liu, Derek F. Wong, Zhilin Li +5
How LoRA Remembers? A Parametric Memory Law for LLM Finetuning
Computation & Language · 2026-05 · arXiv:2605.30260
Ziwen Xu, Haiwen Hong, Linsong Yu, Benglei Cui +3
Same Evidence, Different Answers: Canonical-Context On-Policy Distillation for Multi-Turn Language Models
Computation & Language · 2026-05 · arXiv:2605.30251
Zizhuo Lin, Quanling Liu, Jinsheng Quan, Chao Zhang +5
CommunityFact: A Dynamic, Multilingual, Multi-domain Benchmark for Misinformation Detection in the Wild
Computation & Language · 2026-05 · arXiv:2605.30241
Sahajpreet Singh, Insyirah Mujtahid, Min-Yen Kan, Kokil Jaidka
Do Language Models Track Entities Across State Changes?
Computation & Language · 2026-05 · arXiv:2605.30233
Zilu Tang, Qiao Zhao, Gabriel Franco, Derry Wijaya +3
GRUFF: LLM Pronoun Fidelity, Reasoning, and Biases in German
Computation & Language · 2026-05 · arXiv:2605.30214
Fabian Mewes, Anne Lauscher, Vagrant Gautam
A Dual-Path Architecture for Scaling Compute and Capacity in LLMs
Computation & Language · 2026-05 · arXiv:2605.30202
Markus Frey, Behzad Shomali, Joachim Koehler, Mehdi Ali
Do Proactive Agents Really Need an LLM to Decide When to Wake and What to Anchor?
Computation & Language · 2026-05 · arXiv:2605.30152
Xiaoze Liu, Ruowang Zhang, Amir H. Abdi, Michel Galley +4
CCS: Clinical Consensus Selection for Radiology Report Generation
Computation & Language · 2026-05 · arXiv:2605.30131
Xi Zhang, Yingshu Li, Zaiqiao Meng, Jake Lever +1
Dial HEALTHDIAL for Advice: A Multilingual and Multi-Parallel Spoken Dialogue Dataset for Knowledge-Grounded Information Seeking
Computation & Language · 2026-05 · arXiv:2605.30107
Songbo Hu, Yinhong Liu, Ej Zhou, Evgeniia Razumovskaia +4
SEAL: Can Saturated Benchmarks Be Revived by LLM-as-a-Meta-Judge?
Computation & Language · 2026-05 · arXiv:2605.30104
Jiamin Chen, Yidi Wu, Qiexiang Wang, Qianben Chen +5
DirectorBench: Diagnosing Long-Form Video Generation with Personalized Multi-Agent Evaluation
Computation & Language · 2026-05 · arXiv:2605.30090
Jiamin Chen, Qianben Chen, Jiawen Zhang, Yidi Wu +4
Adaptive Targeted Dynamic Chunking for Tokenization-Free Hierarchical Model
Computation & Language · 2026-05 · arXiv:2605.30080
Thang Dang, Akira Nakagawa, Kenichi Kobayashi, Koichi Shirahata
UniSteer: Text-Guided Flow Matching in Activation Space for Versatile LLM Steering
Computation & Language · 2026-05 · arXiv:2605.30076
Yingdong Shi, Ruiming Zhang, Changming Li, Zhiyu Yang +3
HEART-Bench: Do LLM Agents Exhibit Human-like Psychology?
Computation & Language · 2026-05 · arXiv:2605.30058
Weihan Peng, Chenxu Zhang, Qianao Wang, Yuling Shi +6
Who Am I? History-Aware Profiles for Student Simulation in Tutoring Dialogues
Computation & Language · 2026-05 · arXiv:2605.30051
Zhangqi Duan, Shuyan Huang, Alexander Scarlatos, Jaewook Lee +2
Give it Space! Explicit Disentangling of Positional and Semantic Representations in Encoders
Computation & Language · 2026-05 · arXiv:2605.30022
Pierre-Antoine Lequeu, Camille Barboule, Benjamin Piwowarski
Latent Performance Profiling of Large Language Models
Computation & Language · 2026-05 · arXiv:2605.30018
Tanmoy Chakraborty, Ayan Sengupta, Suparna Bhattacharya, Partha Pratim Chakrabarti +6
Adapting Multilingual Embedding Models to Turkish via Cross-Lingual Tokenizer Surgery and Offline Distillation
Computation & Language · 2026-05 · arXiv:2605.29992
M. Ali Bayram, Banu Diri, Savaş Yıldırım
Causal Interventions on Continuous Variables: A Case Study on Verb Bias in Steering Vectors for In-Context Learning
Computation & Language · 2026-05 · arXiv:2605.29971
Zhenghao Herbert Zhou, R. Thomas McCoy, Robert Frank
Does The Way You Plan Matter? An Empirical Study of Planning Representations for LLM Web Agents
Computation & Language · 2026-05 · arXiv:2605.29927
Alejandra Zambrano, Sara Vera Marjanovic, Imene Kerboua, Xing Han Lù +1
ExCAM: Explainable Cultural Awareness Metrics
Computation & Language · 2026-05 · arXiv:2605.29897
Christoph Leiter, Haiyue Song, Hour Kaing, Jin Tei +3
Internal Representation, Not Clinical Knowledge: Where Apparent LLM Triage Failures Originate
Computation & Language · 2026-05 · arXiv:2605.29889
David Fraile Navarro, Berardino Como, Jialei Sheng, Soundariya Ananthan +1
CRITIC-R1: Learning Structured Critics for Retrieval-Augmented Generation
Computation & Language · 2026-05 · arXiv:2605.29886
Wenhan Xiao, Ziwei Zhang, Chuanyue Yu, Xingcheng Fu +3
Towards Verifiable Multimodal Deep Research: A Multi-Agent Harness for Interleaved Report Generation
Computation & Language · 2026-05 · arXiv:2605.29861
Chenghao Zhang, Guanting Dong, Yufan Liu, Tong Zhao +1
EvoRubric: Self-Evolving Rubric-Driven RL for Open-Ended Generation
Computation & Language · 2026-05 · arXiv:2605.29847
Xin Guan, Xiaomeng Hu, Shen Huang, Zhenyi Wang +5
Towards Localized and Disentangled Knowledge Editing for Multimodal Large Language Models
Computation & Language · 2026-05 · arXiv:2605.29826
Leijiang Gu, Zhen Zeng, Feng Li, Xinjian Gao +1
ActTraitBench: Quantifying the Knowledge-Decision Gap in Large Language Models via Human-Grounded Behavioral Validation
Computation & Language · 2026-05 · arXiv:2605.29791
Yutong Yang, Chenxi Miao, Weikang Li, Yunfang Wu
DySem: Uncovering Dynamic Semantic Components via Multilingual Consensus for Calculating Semantic Textual Similarity
Computation & Language · 2026-05 · arXiv:2605.29751
Kaijie Zheng, Weiqin Wang, Yile Wang, Hui Huang