Ji-Rong Wen — Scifaro

PhoneWorld: Scaling Phone-Use Agent Environments

A central bottleneck for phone-use agents is that controllable, reproducible environments covering real mobile behavior are hard to build at scale. Existing mobile-agent benchmarks have made important progress on evaluation, but they do not…

Computation and Language · Computer Science · 2026-05-29 · Zhengyang Tang, Yuxuan Liu, Xin Lai, Junyi Li, Pengyuan Lyu, Jason, Yiduo Guo, Zhengyao Fang, Yang Ding, Yi Zhang, Weinong Wang, Huawen Shen, Xingran Zhou, Liang Wu, Fei Tang, Sunqi Fan, Shangpin Peng, Zheng Ruan, Anran Zhang, Benyou Wang, Rui Yan, Ji-Rong Wen, Chengquan Zhang, Han Hu

Toward Autonomous Long-Horizon Engineering for ML Research

Agentic systems increasingly automate pieces of AI research. Yet turning underspecified research objectives into runnable, experimentally validated ML systems remains a central bottleneck. We study this operational setting as…

Computation and Language · Computer Science · 2026-05-27 · Guoxin Chen, Jie Chen, Lei Chen, Jiale Zhao, Fanzhe Meng, Wayne Xin Zhao, Ruihua Song, Cheng Chen, Ji-Rong Wen, Kai Jia

BeyondSWE: Can Current Code Agent Survive Beyond Single-Repo Bug Fixing?

Current code-agent benchmarks primarily evaluate localized issue resolution within a single target repository, leaving under-tested many software engineering tasks that require external knowledge or broader repository-level changes. We…

Computation and Language · Computer Science · 2026-05-27 · Guoxin Chen, Fanzhe Meng, Jiale Zhao, Minghao Li, Daixuan Cheng, Huatong Song, Jie Chen, Yuzhi Lin, Hui Chen, Xin Zhao, Ruihua Song, Chang Liu, Cheng Chen, Kai Jia, Ji-Rong Wen

Benchmarking LLMs for Community Governance Simulation with Life-history Narratives

Effective community governance hinges on understanding what specific residents think and need. Recent work has used large language models (LLMs) to simulate human respondents, offering a scalable, reproducible way to study human attitudes…

Computers and Society · Computer Science · 2026-05-25 · Xu Chen, Yuanzi Li, Lei Wang, Nan Lu, Yang Wang, Anding Wang, Lei Shi, Xiaoxing Fu, Ji-Rong Wen

ClawGym: A Scalable Framework for Building Effective Claw Agents

Claw-style environments support multi-step workflows over local files, tools, and persistent workspace states. However, scalable development around these environments remains constrained by the absence of a systematic framework, especially…

Computation and Language · Computer Science · 2026-05-19 · Fei Bai, Huatong Song, Shuang Sun, Daixuan Cheng, Yike Yang, Chuan Hao, Renyuan Li, Feng Chang, Yuan Wei, Ran Tao, Bryan Dai, Jian Yang, Wayne Xin Zhao, Ji-Rong Wen

Pareto-Guided Optimal Transport for Multi-Reward Alignment

Text-to-image generation models have achieved remarkable progress in preference optimization, yet achieving robust alignment across diverse reward models remains a significant challenge. Existing multi-reward fusion approaches rely on…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-14 · Ying Ba, Tianyu Zhang, Mohan Zhou, Yalong Bai, Wenyi Mo, Guiwei Zhang, Bing Su, Ji-Rong Wen

Reasoning emerges from constrained inference manifolds in large language models

Reasoning in large language models is predominantly evaluated through labeled benchmarks, conflating task performance with the quality of internal inference. Here we study reasoning as an intrinsic dynamical process by examining the…

Machine Learning · Computer Science · 2026-05-12 · Yanbiao Ma, Fei Luo, Linfeng Zhang, Chuangxin Zhao, Mingxuan Wang, Yinan Wu, Zhe Qian, Yang Lu, Long Chen, Zhao Cao, Xiaoshuai Hao, Ji-Rong Wen, Jungong Han

Agentic Fusion of Large Atomic and Language Models to Accelerate Superconductor Discovery

Artificial intelligence has accelerated materials discovery through high-throughput prediction and generation, yet the decision problem remains a formidable bottleneck. While current AI systems readily propose millions of candidates,…

Machine Learning · Computer Science · 2026-05-05 · Mingze Li, Yu Rong, Songyou Li, Lihong Wang, Jiacheng Cen, Liming Wu, Anyi Li, Zongzhao Li, Qiuliang Liu, Rui Jiao, Tian Bian, Pengju Wang, Hao Sun, Jianfeng Zhang, Ji-Rong Wen, Deli Zhao, Shifeng Jin, Tingyang Xu, Wenbing Huang

Improving Vision-language Models with Perception-centric Process Reward Models

Recent advancements in reinforcement learning with verifiable rewards (RLVR) have significantly improved the complex reasoning ability of vision-language models (VLMs). However, its outcome-level supervision is too coarse to diagnose and…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-28 · Yingqian Min, Kun Zhou, Yifan Li, Yuhuan Wu, Han Peng, Yifan Du, Wayne Xin Zhao, Min Yang, Ji-Rong Wen

Towards Long-horizon Agentic Multimodal Search

Multimodal deep search agents have shown great potential in solving complex tasks by iteratively collecting textual and visual evidence. However, managing the heterogeneous information and high token costs associated with multimodal inputs…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-28 · Yifan Du, Zikang Liu, Jinbiao Peng, Jie Wu, Junyi Li, Jinyang Li, Wayne Xin Zhao, Ji-Rong Wen

Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence

Large language models are increasingly expected to serve as general-purpose agents that interact with external, stateful tool environments. The Model Context Protocol (MCP) and broader agent skills offer a unified interface for connecting…

Artificial Intelligence · Computer Science · 2026-04-21 · Guanting Dong, Junting Lu, Junjie Huang, Wanjun Zhong, Longxiang Liu, Shijue Huang, Zhenyu Li, Yang Zhao, Xiaoshuai Song, Xiaoxi Li, Jiajie Jin, Yutao Zhu, Hanbin Wang, Fangyu Lei, Qinyu Luo, Mingyang Chen, Zehui Chen, Jiazhan Feng, Ji-Rong Wen, Zhicheng Dou

EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis

Large language models (LLMs) are expected to be trained to act as agents in various real-world environments, but this process relies on rich and varied tool-interaction sandboxes. However, access to real systems is often restricted;…

Computation and Language · Computer Science · 2026-04-20 · Xiaoshuai Song, Haofei Chang, Guanting Dong, Yutao Zhu, Ji-Rong Wen, Zhicheng Dou

Low-rank Optimization Trajectories Modeling for LLM RLVR Acceleration

Recently, scaling reinforcement learning with verifiable rewards (RLVR) for large language models (LLMs) has emerged as an effective training paradigm for significantly improving model capabilities, which requires guiding the model to…

Machine Learning · Computer Science · 2026-04-14 · Zhipeng Chen, Tao Qian, Wayne Xin Zhao, Ji-Rong Wen

Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models

The rapid advancement of large reasoning models has saturated existing math benchmarks, underscoring the urgent need for more challenging evaluation frameworks. To address this, we introduce OlymMATH, a rigorously curated, Olympiad-level…

Computation and Language · Computer Science · 2026-04-14 · Haoxiang Sun, Yingqian Min, Zhipeng Chen, Wayne Xin Zhao, Ji-Rong Wen

Computer Environments Elicit General Agentic Intelligence in LLMs

Agentic intelligence in large language models (LLMs) requires not only model intrinsic capabilities but also interactions with external environments. Equipping LLMs with computers now represents a prevailing trend. However, the computer…

Computation and Language · Computer Science · 2026-04-09 · Daixuan Cheng, Shaohan Huang, Yuxian Gu, Huatong Song, Guoxin Chen, Li Dong, Wayne Xin Zhao, Ji-Rong Wen, Furu Wei

Learning to Retrieve from Agent Trajectories

Information retrieval (IR) systems have traditionally been designed and trained for human users, with learning-to-rank methods relying heavily on large-scale human interaction logs such as clicks and dwell time. With the rapid emergence of…

Information Retrieval · Computer Science · 2026-04-08 · Yuqi Zhou, Sunhao Dai, Changle Qu, Liang Pang, Jun Xu, Ji-Rong Wen

LLM Agents as Social Scientists: A Human-AI Collaborative Platform for Social Science Automation

Traditional social science research often requires designing complex experiments across vast methodological spaces and depends on real human participants, making it labor-intensive, costly, and difficult to scale. Here we present…

Artificial Intelligence · Computer Science · 2026-04-03 · Lei Wang, Yuanzi Li, Jinchao Wu, Heyang Gao, Xiaohe Bo, Xu Chen, Ji-Rong Wen

Masked Diffusion Models as Energy Minimization

We present a systematic theoretical framework that interprets masked diffusion models (MDMs) as solutions to energy minimization problems in discrete optimal transport. Specifically, we prove that three distinct energy…

Machine Learning · Computer Science · 2026-03-24 · Sitong Chen, Shen Nie, Jiacheng Sun, Zijin Feng, Zhenguo Li, Ji-Rong Wen, Chongxuan Li

L2V-CoT: Cross-Modal Transfer of Chain-of-Thought Reasoning via Latent Intervention

Recently, Chain-of-Thought (CoT) reasoning has significantly enhanced the capabilities of large language models (LLMs), but Vision-Language Models (VLMs) still struggle with multi-step reasoning tasks due to limited multimodal reasoning…

Computation and Language · Computer Science · 2026-03-23 · Yuliang Zhan, Xinyu Tang, Han Wan, Jian Li, Ji-Rong Wen, Hao Sun

A Survey of Large Language Models

Language is essentially a complex, intricate system of human expressions governed by grammatical rules. It poses a significant challenge to develop capable AI algorithms for comprehending and grasping a language. As a major approach,…

Computation and Language · Computer Science · 2026-03-19 · Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, Ji-Rong Wen