Tianshi Zheng — Scifaro

SciResearcher: Scaling Deep Research Agents for Frontier Scientific Reasoning

Frontier scientific reasoning is rapidly emerging as a key foundation for advancing AI agents in automated scientific discovery. Deep research agents offer a promising approach to this challenge. These models develop robust problem-solving…

Artificial Intelligence · Computer Science 2026-05-27 Tianshi Zheng , Rui Wang , Xiyun Li , Kelvin Kiu Wai Tam , Newt Nguyen Kim Hue Nam , Wei Fan , Yangqiu Song , Tianqing Fang

Can LLMs Time Travel? Enhancing Temporal Consistency in Legal Agentic Search through Reinforcement Learning

While large language models (LLMs) augmented with agentic search capabilities show promise for legal reasoning, they overlook a fundamental constraint that applicable law must match the temporal context of each case, as retroactive…

Computation and Language · Computer Science 2026-05-26 Wei Fan , Yining Zhou , Mufan Zhang , Yanbing Weng , Yiran HU , Tianshi Zheng , Baixuan Xu , Chunyang Li , Jianhui Yang , Haoran Li , Yangqiu Song

MemLens: Benchmarking Multimodal Long-Term Memory in Large Vision-Language Models

Memory is essential for large vision-language models (LVLMs) to handle long, multimodal interactions, with two method directions providing this capability: long-context LVLMs and memory-augmented agents. However, no existing benchmark…

Computer Vision and Pattern Recognition · Computer Science 2026-05-15 Xiyu Ren , Zhaowei Wang , Yiming Du , Zhongwei Xie , Chi Liu , Xinlin Yang , Haoyue Feng , Wenjun Pan , Tianshi Zheng , Baixuan Xu , Zhengnan Li , Yangqiu Song , Ginny Wong , Simon See

Controllable Logical Hypothesis Generation for Abductive Reasoning in Knowledge Graphs

Abductive reasoning in knowledge graphs aims to generate plausible logical hypotheses from observed entities, with broad applications in areas such as clinical diagnosis and scientific discovery. However, due to a lack of controllability, a…

Artificial Intelligence · Computer Science 2026-05-04 Yisen Gao , Jiaxin Bai , Tianshi Zheng , Qingyun Sun , Ziwei Zhang , Xingcheng Fu , Jianxin Li , Yangqiu Song

AutoGraph-R1: End-to-End Reinforcement Learning for Knowledge Graph Construction

Building effective knowledge graphs (KGs) for Retrieval-Augmented Generation (RAG) is pivotal for advancing question answering (QA) systems. However, its effectiveness is hindered by a fundamental disconnect: the knowledge graph (KG)…

Computation and Language · Computer Science 2026-04-23 Hong Ting Tsang , Jiaxin Bai , Haoyu Huang , Qiao Xiao , Tianshi Zheng , Baixuan Xu , Shujie Liu , Yangqiu Song

Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training

General AI Agents are increasingly recognized as foundational frameworks for the next generation of artificial intelligence, enabling complex reasoning, web interaction, coding, and autonomous research capabilities. However, current agent…

Artificial Intelligence · Computer Science 2026-04-23 Tianqing Fang , Zhisong Zhang , Xiaoyang Wang , Rui Wang , Can Qin , Yuxuan Wan , Jun-Yu Ma , Ce Zhang , Jiaqi Chen , Xiyun Li , Yonglin Wang , Jingchen Ni , Tianshi Zheng , Chun Chen , Wenhao Yu , Zhenwen Liang , Hongming Zhang , Haitao Mi , Dong Yu

Rethinking Prospect Theory for LLMs: Revealing the Instability of Decision-Making under Epistemic Uncertainty

Prospect Theory (PT) models human decision-making behaviour under uncertainty, among which linguistic uncertainty is commonly adopted in real-world scenarios. Although recent studies have developed some frameworks to test PT parameters for…

Artificial Intelligence · Computer Science 2026-04-13 Rui Wang , Qihan Lin , Jiayu Liu , Qing Zong , Tianshi Zheng , Dadi Guo , Haochen Shi , Weiqi Wang , Yangqiu Song

NAACL: Noise-AwAre Verbal Confidence Calibration for Robust LLMs in RAG Systems

Accurately assessing model confidence is essential for deploying large language models (LLMs) in mission-critical factual domains. While retrieval-augmented generation (RAG) is widely adopted to improve grounding, confidence calibration in…

Computation and Language · Computer Science 2026-03-23 Jiayu Liu , Rui Wang , Qing Zong , Yumeng Wang , Cheng Qian , Qingcheng Zeng , Tianshi Zheng , Haochen Shi , Dadi Guo , Baixuan Xu , Chunyang Li , Yangqiu Song

NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents

Large language models are emerging as powerful tools for scientific law discovery, a foundational challenge in AI-driven science. However, existing benchmarks for this task suffer from a fundamental methodological trilemma, forcing a…

Artificial Intelligence · Computer Science 2026-02-25 Tianshi Zheng , Kelvin Kiu-Wai Tam , Newt Hue-Nam K. Nguyen , Baixuan Xu , Zhaowei Wang , Jiayang Cheng , Hong Ting Tsang , Weiqi Wang , Jiaxin Bai , Tianqing Fang , Yangqiu Song , Ginny Y. Wong , Simon See

SELF-REDRAFT: Eliciting Intrinsic Exploration-Exploitation Balance in Test-Time Scaling for Code Generation

Test-time scaling without interpreter feedback is essential for real-world code generation scenarios where test cases are not readily available. While existing paradigms often rely on either greedy exploitation (i.e., iterative refinement)…

Software Engineering · Computer Science 2025-11-06 Yixiang Chen , Tianshi Zheng , Shijue Huang , Zhitao He , Yi R. Fung

The Curse of CoT: On the Limitations of Chain-of-Thought in In-Context Learning

Chain-of-Thought (CoT) prompting has been widely recognized for its ability to enhance reasoning capabilities in large language models (LLMs). However, our study reveals a surprising contradiction to this prevailing perspective within the…

Computation and Language · Computer Science 2025-11-04 Tianshi Zheng , Yixiang Chen , Chengxi Li , Chunyang Li , Qing Zong , Haochen Shi , Baixuan Xu , Yangqiu Song , Ginny Y. Wong , Simon See

CritiCal: Can Critique Help LLM Uncertainty or Confidence Calibration?

Accurate confidence calibration in Large Language Models (LLMs) is critical for safe use in high-stakes domains, where clear verbalized confidence enhances user trust. Traditional methods that mimic reference confidence expressions often…

Computation and Language · Computer Science 2025-10-29 Qing Zong , Jiayu Liu , Tianshi Zheng , Chunyang Li , Baixuan Xu , Haochen Shi , Weiqi Wang , Zhaowei Wang , Chunkit Chan , Yangqiu Song

DixitWorld: Evaluating Multimodal Abductive Reasoning in Vision-Language Models with Multi-Agent Dixit Gameplay

Multimodal abductive reasoning--the generation and selection of explanatory hypotheses from partial observations--is a cornerstone of intelligence. Current evaluations of this ability in vision-language models (VLMs) are largely confined to…

Artificial Intelligence · Computer Science 2025-10-14 Yunxiang Mo , Tianshi Zheng , Qing Zong , Jiayu Liu , Baixuan Xu , Yauwai Yim , Chunkit Chan , Jiaxin Bai , Yangqiu Song

The Cognitive Bandwidth Bottleneck: Shifting Long-Horizon Agent from Planning with Actions to Planning with Schemas

Enabling LLMs to effectively operate long-horizon tasks that require long-term planning and multiple interactions is essential for open-world autonomy. Conventional methods adopt planning with actions where an executable action list would…

Artificial Intelligence · Computer Science 2025-10-09 Baixuan Xu , Tianshi Zheng , Zhaowei Wang , Hong Ting Tsang , Weiqi Wang , Tianqing Fang , Yangqiu Song

LLM-Hanabi: Evaluating Multi-Agent Gameplays with Theory-of-Mind and Rationale Inference in Imperfect Information Collaboration Game

Effective multi-agent collaboration requires agents to infer the rationale behind others' actions, a capability rooted in Theory-of-Mind (ToM). While recent Large Language Models (LLMs) excel at logical inference, their ability to infer…

Artificial Intelligence · Computer Science 2025-10-07 Fangzhou Liang , Tianshi Zheng , Chunkit Chan , Yauwai Yim , Yangqiu Song

From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery

Large Language Models (LLMs) are catalyzing a paradigm shift in scientific discovery, evolving from task-specific automation tools into increasingly autonomous agents and fundamentally redefining research processes and human-AI…

Computation and Language · Computer Science 2025-09-18 Tianshi Zheng , Zheye Deng , Hong Ting Tsang , Weiqi Wang , Jiaxin Bai , Zihao Wang , Yangqiu Song

LogiDynamics: Unraveling the Dynamics of Inductive, Abductive and Deductive Logical Inferences in LLM Reasoning

Modern large language models (LLMs) employ diverse logical inference mechanisms for reasoning, making the strategic optimization of these approaches critical for advancing their capabilities. This paper systematically investigates the…

Computation and Language · Computer Science 2025-09-18 Tianshi Zheng , Jiayang Cheng , Chunyang Li , Haochen Shi , Zihao Wang , Jiaxin Bai , Yangqiu Song , Ginny Y. Wong , Simon See

Structuring the Unstructured: A Systematic Review of Text-to-Structure Generation for Agentic AI with a Universal Evaluation Framework

The evolution of AI systems toward agentic operation and context-aware retrieval necessitates transforming unstructured text into structured formats like tables, knowledge graphs, and charts. While such conversions enable critical…

Computation and Language · Computer Science 2025-08-19 Zheye Deng , Chunkit Chan , Tianshi Zheng , Wei Fan , Weiqi Wang , Yangqiu Song

AutoSchemaKG: Autonomous Knowledge Graph Construction through Dynamic Schema Induction from Web-Scale Corpora

We present AutoSchemaKG, a framework for fully autonomous knowledge graph construction that eliminates the need for predefined schemas. Our system leverages large language models to simultaneously extract knowledge triples and induce…

Computation and Language · Computer Science 2025-08-04 Jiaxin Bai , Wei Fan , Qi Hu , Qing Zong , Chunyang Li , Hong Ting Tsang , Hongyu Luo , Yauwai Yim , Haoyu Huang , Xiao Zhou , Feng Qin , Tianshi Zheng , Xi Peng , Xin Yao , Huiwen Yang , Leijie Wu , Yi Ji , Gong Zhang , Renhai Chen , Yangqiu Song

KnowShiftQA: How Robust are RAG Systems when Textbook Knowledge Shifts in K-12 Education?

Retrieval-Augmented Generation (RAG) systems show remarkable potential as question answering tools in the K-12 Education domain, where knowledge is typically queried within the restricted scope of authoritative textbooks. However,…

Computation and Language · Computer Science 2025-07-22 Tianshi Zheng , Weihan Li , Jiaxin Bai , Weiqi Wang , Yangqiu Song