Lin Sun
LLM agents are increasingly deployed as executable systems that use tools, modify workspaces, and produce concrete artifacts. In such workflows, performance depends not only on the base model, but also on the harness: the system layer that…
We argue that multi-document reasoning is constrained not only by how much text a model can read, but also by how a limited query-time evidence budget is allocated across documents and semantic granularities. Full-context inference exposes…
Large language model agents increasingly rely on persistent memory to store past interactions, retrieve relevant demonstrations, and improve long-horizon task execution. However, this memory mechanism also creates a practical security…
Industrial Retrieval-Augmented Generation (RAG) systems depend on optical character recognition (OCR) to transform visual documents into text. Existing OCR benchmarks rely on character-level metrics, which inadequately measure downstream…
Using an asymptotic perturbation method, we study the initial value problem for the KP equation with initial data consisting of parts of exact line-soliton solutions. We consider a slow modulation of the soliton parameters, described by a…
The challenge of reducing the size of Large Language Models (LLMs) while maintaining their performance has gained significant attention. However, existing methods, such as model distillation and transfer learning, often fail to achieve high…
Reasoning LLMs often spend substantial tokens on long intermediate reasoning traces (e.g., chain-of-thought) when solving new problems. We propose to summarize and store reusable reasoning skills distilled from extensive deliberation and…
Physical random number generators based on chaotic microcombs, with their complex nonlinear dynamics and multi-channel parallel capability, have attracted considerable research attention. However, key technical challenges for chaotic…
The pseudogap phenomenon is a hallmark of strongly interacting Fermi systems, from high-temperature superconductors to ultracold atomic gases, yet its precise origin remains debated. Here we calculate the spectral function and rf spectra of…
The existence of a pseudogap in unitary Fermi gases has recently been established and measured experimentally [Li et al., Nature 626, 288 (2024)]. This lends strong support for the pairing origin as the mechanism of the pseudogap in Fermi…
Proper treatment of the many-body interactions is of paramount importance in our understanding of strongly correlated systems. Here we investigate the effects of particle-hole fluctuations on the Berezinskii-Kosterlitz-Thouless (BKT)…
Model merging has emerged as a promising paradigm for composing the capabilities of large language models by directly operating in weight space, enabling the integration of specialized models without costly retraining. However, existing…
Large Language Models (LLMs) face a fundamental safety-helpfulness trade-off due to static, one-size-fits-all safety policies that lack runtime controllability, making it difficult to tailor responses to diverse application needs. …
In recent years, safety risks associated with large language models have become increasingly prominent, highlighting the urgent need to mitigate the generation of toxic and harmful content. The mainstream paradigm for LLM safety alignment…
Temporal context is essential for robotic manipulation because such tasks are inherently non-Markovian, yet mainstream VLA models typically overlook it and struggle with long-horizon, temporally dependent tasks. Cognitive science suggests…
Current methods for content safety in Large Language Models (LLMs), such as Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF), often rely on multi-stage training pipelines and lack fine-grained,…
This paper investigates the maximum spectral radius of planar graphs with a fixed number of vertices, providing tight bounds on the maximum spectral radius of a general planar graph in terms of its order, and confirming that…
In this paper, we present Dexbotic, an open-source Vision-Language-Action (VLA) model toolbox based on PyTorch. It aims to provide a one-stop VLA research service for professionals in the field of embodied intelligence. It offers a codebase…
In this paper, we propose a "Generalization Stress Test" to assess the generalization ability of Large Language Models (LLMs) under slight and controlled perturbations, including option length, problem types, and irrelevant noun replacements. We…
Document images encapsulate a wealth of knowledge, while the portability of spoken queries enables broader and more flexible application scenarios. Yet, no prior work has explored knowledge base question answering over visual document images…