Computer Science

DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation

Robot manipulation critically depends on perception that preserves the action-relevant aspects of a scene. Yet most robot learning pipelines are built upon visual encoders pre-trained for static recognition or vision-language alignment,…

Robotics · Computer Science 2026-05-29 Jusuk Lee , Seungjae Lee , Jonghun Shin , Hoseong Jung , Sungha Kim , Daesol Cho , H. Jin Kim , Jia-Bin Huang , Furong Huang

LLMSurgeon: Diagnosing Data Mixture of Large Language Models

The pretraining data mixture of Large Language Models (LLMs) constitutes their "digital DNA", shaping model behaviors, capabilities, and failure modes. Yet this composition is rarely disclosed, making post-hoc auditing of data combination…

Computation and Language · Computer Science 2026-05-29 Yaxin Luo , Jiacheng Cui , Xiaohan Zhao , Xinyi Shang , Jiacheng Liu , Xinyue Bi , Zhaoyi Li , Zhiqiang Shen

Unlocking the Working Memory of Large Language Models for Latent Reasoning

To improve the reasoning capabilities of large language models, test-time compute is typically scaled by generating intermediate tokens before the final answer. However, this couples reasoning to autoregressive generation and thereby…

Computation and Language · Computer Science 2026-05-29 Lukas Aichberger , Sepp Hochreiter

Efficient Test-Time Finetuning of LLMs via Convex Reconstruction and Gradient Caching

Test-time finetuning (TTFT) is a rapidly evolving paradigm that adapts a language model to each prompt by retrieving related sequences, updating the model on them, and then evaluating the prompt. However, TTFT is only practical if it is…

Machine Learning · Computer Science 2026-05-29 Alaa Khamis , Alaa Maalouf

Fairness-Aware Federated Learning with Trajectory Shapley Value

Federated learning is an emerging distributed paradigm that addresses the challenges posed by heterogeneous, privacy-sensitive data. It enables multiple clients to train a model collaboratively by aggregating their local updates at a…

Machine Learning · Computer Science 2026-05-29 Daniel Kuznetsov , Ziqi Wang

COMPOSE: Composing Future Theorems from Citations and Formal Structure

A plausible future mathematical claim must satisfy two constraints: it should follow the direction of prior work and respect the formal dependencies that constrain what can validly follow. Existing approaches typically model only one of…

Computation and Language · Computer Science 2026-05-29 David Busbib , Michael Werman

When, why, and how do diffusion posterior samplers fail? A finite-sample lens

Diffusion models have excellent capacity to model complex distributions of natural data, which has made them a popular and effective choice for posterior sampling in imaging inverse problems. Existing methods can incorporate any measurement…

Machine Learning · Computer Science 2026-05-29 Benjamin A. Burns , Sara Fridovich-Keil

SoundnessBench: Can Your AI Scientist Really Tell Good Research Ideas from Bad Ones?

Autonomous AI research agents aim to accelerate scientific discovery by automating the research pipeline, from hypothesis generation to peer review. However, existing benchmarks rarely test a fundamental bottleneck: whether Large Language…

Machine Learning · Computer Science 2026-05-29 Sy-Tuyen Ho , Minghui Liu , Huy Nghiem , Furong Huang

Reasoning with Sampling: Cutting at Decision Points

Frontier reasoning models are produced by posttraining base language models with reinforcement learning. Recent work has challenged this by showing that sampling from a sharpened version of the base model's distribution, a so-called power…

Machine Learning · Computer Science 2026-05-29 Felix Zhou , Anay Mehrotra , Quanquan C. Liu

RoboWits: Unexpected Challenges for Robotic Creative Problem Solving

The ability to reason, adapt, and creatively solve problems under unexpected challenges is essential for robots operating in real-world environments. However, current robotic benchmarks primarily emphasize skill-level execution and provide…

Robotics · Computer Science 2026-05-29 Chunru Lin , Hongxin Zhang , Fenghao Yu , Zhehuan Chen , Thomas L. Griffiths , Yejin Choi , David Held , Chuang Gan

In-Context Reward Adaptation for Robust Preference Modeling

Reinforcement Learning from Human Feedback (RLHF) typically relies on static reward models to align Large Language Models with human preferences. However, human values are inherently diverse and heterogeneous, and a single reward model…

Machine Learning · Computer Science 2026-05-29 Zhenyu Sun , Zheng Xu , Ermin Wei

Gram: Assessing sabotage propensities via automated alignment auditing

We introduce Gram, an automated alignment auditing framework to assess the propensity of AI agents to engage in sabotage. We evaluate Gemini models across 17 simulated agentic deployment scenarios that incentivize sabotage. We find Gemini…

Machine Learning · Computer Science 2026-05-29 David Lindner , Victoria Krakovna , Sebastian Farquhar

Resolution Diagnostics for Paired LLM Evaluation

Across two public LLM leaderboards, many displayed pairwise rankings do not meet a conventional paired-test resolution target under the actual paired evaluation design: 11 of 40 Open LLM Leaderboard v1 pairwise comparisons and 4 of 9…

Computation and Language · Computer Science 2026-05-29 Anany Kotawala

A Heterogeneous Architecture for Robot RL Beyond GPU-Dominant Paradigms

Simulation-based RL for contemporary robot control is increasingly organized around GPU-resident simulation: physics, rollout collection, and learning are placed on a single GPU-centric execution path. This paradigm has greatly improved…

Robotics · Computer Science 2026-05-29 Yufei Jia , Zhanxiang Cao , Mingrui Yu , Heng Zhang , Shenyu Chen , Dixuan Jiang , Meng Li , Xiaofan Li , Yiyang Liu , Junzhe Wu , Zheng Li , XiLin Fang , Tingyu Cui , Shengcheng Fu , Haoyang Li , Anqi Wang , Zifan Wang , Dongjie Zhu , Chenyu Cao , Zhenbiao Huang , Ziang Zheng , Jie Lu , Xin Ma , Zhengyang Wei , Xiang Zhao , Tianyue Zhan , Ye He , Yuxiang Chen , Yizhou Jiang , Yue Li , Haizhou Ge , Yuhang Dong , Fan Jia , Ziheng Zhang , Meng Zhang , Xiwa Deng , Zhixing Chen , Hanyang Shao , Chenxin Dong , Yixuan Li , Yizhi Chen , Bokui Chen , Kaifeng Zhang , Hanqing Cui , Yusen Qin , Ruqi Huang , Lei Han , Tiancai Wang , Xiang Li , Yue Gao , Guyue Zhou

MedCase-Structured: A Text-to-FHIR Dataset for Benchmarking Diagnostic Reasoning in Clinically Realistic EHR Settings

Large language models (LLMs) show promise for clinical reasoning and decision support, but evaluation in realistic, electronic health record-congruent settings remains limited. Existing benchmarks often rely on static datasets or…

Computation and Language · Computer Science 2026-05-29 Valentina Bui Muti , Eugénie Dulout , Ziquan Fu

Self-Trained Verification for Training- and Test-Time Self-Improvement

Self-improvement at scale has been a longstanding goal for reasoning models, and there are two natural places to do it: at test time, through verification-refinement (V-R) loops; and at training time, through self-training methods. Both are…

Machine Learning · Computer Science 2026-05-29 Chen Henry Wu , Aditi Raghunathan

Statistical Embeddings for Similarity, Retrieval, and Interpretable Alignment of Numeric Tabular Datasets

Numeric tabular datasets are the dominant data format in scientific practice, yet large language models lack native mechanisms for representing numeric datasets in a meaningful way across heterogeneous feature spaces. Existing approaches…

Machine Learning · Computer Science 2026-05-29 M. Ross Kunz , John Merickel , Keith Wilson

Gaze2Act: Gaze-Conditioned Vision-Language-Action Policies for Interactive Robot Manipulation

Vision-Language-Action (VLA) models have recently shown strong potential for robot learning by following language instructions. However, in practice, language alone is often insufficient to precisely convey human intent. It is difficult to…

Robotics · Computer Science 2026-05-29 Kuangji Zuo , Gen Li , Bofan Lyu , Yanshuo Lu , Boyu Ma , Shijia Han , Xinyu Zhou , Xichen Yuan , Chuhao Zhou , Jiaqi Bai , Geng Li , Jianfei Yang

Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

Embodied intelligence is often studied through specialized models for individual tasks such as manipulation or navigation, resulting in fragmented capabilities and limited generalization across tasks, environments, and robot embodiments. In…

Robotics · Computer Science 2026-05-29 Qiuyue Wang , Mingsheng Li , Jian Guan , Jinhui Ye , Sicheng Xie , Yitao Liu , Junhao Chen , Zhixuan Liang , Jie Zhang , Xintong Hu , Xuhong Huang , Pei Lin , Junyang Lin , Dayiheng Liu , Shuai Bai , Jingren Zhou , Jiazhao Zhang , Haoqi Yuan , Gengze Zhou , Hang Yin , Ye Wang , Yiyang Huang , Zixing Lei , Wujian Peng , Delin Chen , Yingming Zheng , Jingyang Fan , Xianwei Zhuang , Xin Zhou , Haoyang Li , Anzhe Chen , Tong Zhang , Xuejing Liu , Yuchong Sun , Ruizhe Chen , Zhaohai Li , Chenxu Lü , Zhibo Yang , Tao Yu , Xionghui Chen

Neural Operator-Based Surrogate Model for CFD:Helical Coil Steam Generator in Small Modular Reactor

Real-time thermal-hydraulic simulation is essential for digital twin (DT) technology that supports the safe and efficient operation of small modular reactors (SMRs). Computational fluid dynamics (CFD) provides high-fidelity flow analysis,…

Machine Learning · Computer Science 2026-05-29 Minseo Lee , Seongmin Oh , Chaehyeon Song , Bumjin Cho , Shilaj Baral , Sangam Khanal , Minseop Song , Joongoo Jeon