Cheng Fu
Solid materials are commonly classified as crystalline or amorphous based on the presence or absence of long-range order.Metal-organic frameworks (MOFs), like other solids,also display markedly different properties and functions in these…
As code large language models (LLMs) evolve into tool-interactive agents via the Model Context Protocol (MCP), their generalization is increasingly limited by low-quality synthetic data and the diminishing returns of quantity scaling.…
Despite advances in pretraining with extended context lengths, large language models (LLMs) still face challenges in effectively utilizing real-world long-context information, primarily due to insufficient long-context alignment caused by…
In the realm of large language models (LLMs), the ability of models to accurately follow instructions is paramount as more agents and applications leverage LLMs for construction, where the complexity of instructions are rapidly increasing.…
The rise of large language models (LLMs) has significantly transformed both the construction and application of information retrieval (IR) systems. However, current interactions between IR systems and LLMs remain limited, with LLMs merely…
Long-context modeling capabilities have garnered widespread attention, leading to the emergence of Large Language Models (LLMs) with ultra-context windows. Meanwhile, benchmarks for evaluating long-context LLMs are gradually catching up.…
Traffic analysis is crucial for urban operations and planning, while the availability of dense urban traffic data beyond loop detectors is still scarce. We present a large-scale floating vehicle dataset of per-street segment traffic…
When trying to answer complex questions, people often rely on multiple sources of information, such as visual, textual, and tabular data. Previous approaches to this problem have focused on designing input features or model structure in the…
Multi-document grounded dialogue systems (DGDS) belong to a class of conversational agents that answer users' requests by finding supporting knowledge from a collection of documents. Most previous studies aim to improve the knowledge…
Building document-grounded dialogue systems have received growing interest as documents convey a wealth of human knowledge and commonly exist in enterprises. Wherein, how to comprehend and retrieve information from documents is a…
Entity matching (EM) is the most critical step for entity resolution (ER). While current deep learningbased methods achieve very impressive performance on standard EM benchmarks, their realworld application performance is much frustrating.…
Modeling place functions from a computational perspective is a prevalent research topic. Trajectory embedding, as a neural-network-backed dimension reduction technology, allows the possibility to put places with similar social functions at…
Reverse engineering of binary executables is a critical problem in the computer security domain. On the one hand, malicious parties may recover interpretable source codes from the software products to gain commercial advantages. On the…
Binarized Neural Network (BNN) removes bitwidth redundancy in classical CNN by using a single bit (-1/+1) for network parameters and intermediate representations, which has greatly reduced the off-chip data transfer and storage overhead.…