Yongxin Tong — Scifaro

Replacing Parameters with Preferences: Federated Alignment of Heterogeneous Vision-Language Models

Vision-Language Models (VLMs) have broad potential in privacy-sensitive domains such as healthcare and finance, yet strict data-sharing constraints render centralized training infeasible. Federated Learning mitigates this issue by enabling…

Artificial Intelligence · Computer Science 2026-05-06 Shule Lu , Yujing Wang , Hainan Zhang , Xiaoshan Yang , Hongwei Zheng , Yongxin Tong , Changsheng Xu , Zhiming Zheng

Taming Noise-Induced Prototype Degradation for Privacy-Preserving Personalized Federated Fine-Tuning

Prototype-based Personalized Federated Learning (ProtoPFL) enables efficient multi-domain adaptation by communicating compact class prototypes, but directly sharing them poses privacy risks. A common defense involves per-example $\ell_2$…

Computer Vision and Pattern Recognition · Computer Science 2026-05-01 Yuhua Wang , Qinnan Zhang , Xiaodong Li , Huan Zhang , Yifan Sun , Wangjie Qiu , Hainan Zhang , Yongxin Tong , Zhiming Zheng

Unified and Efficient Approach for Multi-Vector Similarity Search

Multi-Vector Similarity Search is essential for fine-grained semantic retrieval in many real-world applications, offering richer representations than traditional single-vector paradigms. Due to the lack of a native multi-vector index,…

Databases · Computer Science 2026-04-06 Binhan Yang , Yuxiang Zeng , Hengxin Zhang , Zhuanglin Zheng , Yunzhen Chi , Yongxin Tong , Ke Xu

Distance Comparison Operations Are Not Silver Bullets in Vector Similarity Search: A Benchmark Study on Their Merits and Limits

Distance Comparison Operations (DCOs), which decide whether the distance between a data vector and a query is within a threshold, are a critical performance bottleneck in vector similarity search. Recent DCO methods that avoid…

Databases · Computer Science 2026-04-06 Zhuanglin Zheng , Yuxiang Zeng , Chenchen Liu , Yunzhen Chi , Binhan Yang , Yongxin Tong

FGIM: a Fast Graph-based Indexes Merging Framework for Approximate Nearest Neighbor Search

As the state-of-the-art methods for high-dimensional data retrieval, Approximate Nearest Neighbor Search (ANNS) approaches with graph-based indexes have attracted increasing attention and play a crucial role in many real-world applications,…

Databases · Computer Science 2026-03-24 Zekai Wu , Jiabao Jin , Peng Cheng , Xiaoyao Zhong , Lei Chen , Yongxin Tong , Zhitao Shen , Jingkuan Song , Heng Tao Shen , Xuemin Lin

ExpressMind: A Multimodal Pretrained Large Language Model for Expressway Operation

The current expressway operation relies on rule-based and isolated models, which limits the ability to jointly analyze knowledge across different systems. Meanwhile, Large Language Models (LLMs) are increasingly applied in intelligent…

Artificial Intelligence · Computer Science 2026-03-18 Zihe Wang , Yihuan Wang , Haiyang Yu , Zhiyong Cui , Xiaojian Liao , Chengcheng Wang , Yonglin Tian , Yongxin Tong

Towards Practical Benchmarking of Data Cleaning Techniques: On Generating Authentic Errors via Large Language Models

Data quality remains an important challenge in data-driven systems, as errors in tabular data can severely compromise downstream analytics and machine learning performance. Although numerous error detection algorithms have been proposed,…

Databases · Computer Science 2026-03-10 Xinyuan Liu , Jiahui Chen , Bocheng Hu , Yu Sun , Xinyang Chen , Shaoxu Song , Yongxin Tong

Replacing Parameters with Preferences: Federated Alignment of Heterogeneous Vision-Language Models

VLMs have broad potential in privacy-sensitive domains such as healthcare and finance, yet strict data-sharing constraints render centralized training infeasible. FL mitigates this issue by enabling decentralized training, but practical…

Artificial Intelligence · Computer Science 2026-03-06 Shule Lu , Yujing Wang , Hainan Zhang , Xiaoshan Yang , Hongwei Zheng , Yongxin Tong , Changsheng Xu , Zhiming Zheng

FedMosaic: Federated Retrieval-Augmented Generation via Parametric Adapters

Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by grounding generation in external knowledge to improve factuality and reduce hallucinations. Yet most deployments assume a centralized corpus, which is infeasible…

Computation and Language · Computer Science 2026-02-06 Zhilin Liang , Yuxiang Wang , Zimu Zhou , Hainan Zhang , Boyi Liu , Yongxin Tong

GraphDLG: Exploring Deep Leakage from Gradients in Federated Graph Learning

Federated graph learning (FGL) has recently emerged as a promising privacy-preserving paradigm that enables distributed graph learning across multiple data owners. A critical privacy concern in federated learning is whether an adversary can…

Machine Learning · Computer Science 2026-01-28 Shuyue Wei , Wantong Chen , Tongyu Wei , Chen Gong , Yongxin Tong , Lizhen Cui

Less is More: Compact Clue Selection for Efficient Retrieval-Augmented Generation Reasoning

Current RAG retrievers are designed primarily for human readers, emphasizing complete, readable, and coherent paragraphs. However, Large Language Models (LLMs) benefit more from precise, compact, and well-structured input, which enhances…

Computation and Language · Computer Science 2026-01-28 Qianchi Zhang , Hainan Zhang , Liang Pang , Yongxin Tong , Hongwei Zheng , Zhiming Zheng

CAFEDistill: Learning Personalized and Dynamic Models through Federated Early-Exit Network Distillation

Personalized Federated Learning (PFL) enables collaborative model training on decentralized, heterogeneous data while tailoring the models to each client's unique distribution. However, existing PFL methods produce static models with a fixed…

Machine Learning · Computer Science 2026-01-16 Boyi Liu , Zimu Zhou , Yongxin Tong

FedSEA-LLaMA: A Secure, Efficient and Adaptive Federated Splitting Framework for Large Language Models

Private data holds promise for improving LLMs due to its high quality, but its scattered distribution across data silos and the high computational demands of LLMs limit their deployment in federated environments. To address this, the…

Computation and Language · Computer Science 2026-01-05 Zishuai Zhang , Hainan Zhang , Weihua Li , Qinnan Zhang , Jin Dong , Yongxin Tong , Zhiming Zheng

Privacy-Preserving Reasoning with Knowledge-Distilled Parametric Retrieval Augmented Generation

The current RAG system requires uploading plaintext documents to the cloud, risking private data leakage. Parametric RAG (PRAG) encodes documents as LoRA parameters within LLMs, offering a possible way to reduce exposure of raw content.…

Computation and Language · Computer Science 2025-12-01 Jinwen Chen , Hainan Zhang , Liang Pang , Yongxin Tong , Haibo Zhou , Yuan Zhan , Wei Lin , Zhiming Zheng

PhaseFormer: From Patches to Phases for Efficient and Effective Time Series Forecasting

Periodicity is a fundamental characteristic of time series data and has long played a central role in forecasting. Recent deep learning methods strengthen the exploitation of periodicity by treating patches as basic tokens, thereby…

Machine Learning · Computer Science 2025-10-07 Yiming Niu , Jinliang Deng , Yongxin Tong

HyFedRAG: A Federated Retrieval-Augmented Generation Framework for Heterogeneous and Privacy-Sensitive Data

Centralized RAG pipelines struggle with heterogeneous and privacy-sensitive data, especially in distributed healthcare settings where patient data spans SQL, knowledge graphs, and clinical notes. Clinicians face difficulties retrieving rare…

Artificial Intelligence · Computer Science 2025-09-09 Cheng Qian , Hainan Zhang , Yongxin Tong , Hong-Wei Zheng , Zhiming Zheng

Accurate and Efficient Multivariate Time Series Forecasting via Offline Clustering

Accurate and efficient multivariate time series (MTS) forecasting is essential for applications such as traffic management and weather prediction, which depend on capturing long-range temporal dependencies and interactions between entities.…

Machine Learning · Computer Science 2025-05-27 Yiming Niu , Jinliang Deng , Lulu Zhang , Zimu Zhou , Yongxin Tong

Efficient Data Valuation Approximation in Federated Learning: A Sampling-based Approach

Federated learning (FL) is a paradigm for utilizing datasets across multiple data providers. In FL, cross-silo data providers often hesitate to share their high-quality datasets unless their data value can be fairly assessed. Shapley value (SV) has been…

Machine Learning · Computer Science 2025-04-24 Shuyue Wei , Yongxin Tong , Zimu Zhou , Tianran He , Yi Xu

Learning to Erase Private Knowledge from Multi-Documents for Retrieval-Augmented Large Language Models

Retrieval-Augmented Generation (RAG) is a promising technique for applying LLMs to proprietary domains. However, retrieved documents may contain sensitive knowledge, posing risks of privacy leakage in generative results. Thus, effectively…

Computation and Language · Computer Science 2025-04-15 Yujing Wang , Hainan Zhang , Liang Pang , Yongxin Tong , Binghui Guo , Hongwei Zheng , Zhiming Zheng

Ten Challenging Problems in Federated Foundation Models

Federated Foundation Models (FedFMs) represent a distributed learning paradigm that fuses the general competences of foundation models with the privacy-preserving capabilities of federated learning. This combination allows the large…

Machine Learning · Computer Science 2025-02-19 Tao Fan , Hanlin Gu , Xuemei Cao , Chee Seng Chan , Qian Chen , Yiqiang Chen , Yihui Feng , Yang Gu , Jiaxiang Geng , Bing Luo , Shuoling Liu , Win Kent Ong , Chao Ren , Jiaqi Shao , Chuan Sun , Xiaoli Tang , Hong Xi Tae , Yongxin Tong , Shuyue Wei , Fan Wu , Wei Xi , Mingcong Xu , He Yang , Xin Yang , Jiangpeng Yan , Hao Yu , Han Yu , Teng Zhang , Yifei Zhang , Xiaojin Zhang , Zhenzhe Zheng , Lixin Fan , Qiang Yang