Computer Science

On abelian periodicity of purely morphic words

Deciding periodicity of infinite words generated by morphisms is a classical result in combinatorics on words, established in the 1980s by Harju, Linna and Pansiot. In this paper, we are interested in this question in the abelian setting. Two words are…

Discrete Mathematics · Computer Science 2026-05-29 Arina Filimonova , Svetlana Puzynina

GRASP: Plan-Guided Graph Retrieval with Adaptive Fusion and Reranking on Semi-Structured Knowledge Bases

Semi-structured knowledge bases (SKBs) embed textual documents in a typed graph of entities and relations, and underpin applications such as product search, academic paper search, and precision-medicine inquiries. Existing hybrid retrieval…

Information Retrieval · Computer Science 2026-05-29 Yicheng Tao , Yiqun Wang , Xiangchen Song , Xin Luo , Kai Liu , Jie Liu

LexPath: A domain-oriented multi-path framework for legal article retrieval

Legal article retrieval is critical for building traceable and reliable legal AI systems, where conclusions must be grounded in specific legal articles. However, existing open-domain retrieval methods rely heavily on surface-level lexical…

Information Retrieval · Computer Science 2026-05-29 Weixuan Liu , Qingfeng Zhuge , Xuyang Chen

No More K-means: Single-Stage Sparse Coding for Efficient Multi-Vector Retrieval

Multi-vector retrieval (MVR) models, exemplified by ColBERT, have established new benchmarks in retrieval accuracy by preserving fine-grained token-level interactions. However, this granularity imposes prohibitive storage and retrieval…

Information Retrieval · Computer Science 2026-05-29 Lixuan Guo , Yifei Wang , Tiansheng Wen , Aosong Feng , Stefanie Jegelka , Chenyu You

Uncertainty Quantification for Multimodal Retrieval Augmented Generation

Retrieval Augmented Generation (RAG) improves the question answering capabilities of Large Language Models (LLMs) by incorporating external knowledge and has recently been extended to multimodal settings through Vision-Language Models…

Information Retrieval · Computer Science 2026-05-29 Simon Binz , Heydar Soudani , Faegheh Hasibi

Rec-Distill: An Industrial Distillation Pipeline for Large-Scale Recommendation Models

Large recommendation models have demonstrated substantial potential gains under scaling laws, yet these gains are difficult to realize in industrial recommendation systems because real-world deployment requires lightweight models with…

Information Retrieval · Computer Science 2026-05-29 Haoran Ding , Wenlin Zhao , Yuchen Jiang , Juren Li , Jie Zhu , Xinchun Li , Yishujie Zhao , Yi Zhang , Ao Qiao , Jianhui Dong , Cheng Chen , Ziyan Gong , Deping Xie , Peng Xu , Zikai Wang , Yuwei Wang , Huizhi Yang , Zhe Chen , Yuchao Zheng

Dichotomy study of the Steiner tree problem in split-like graphs

Given a connected graph $G$ and a terminal set $R \subseteq V(G)$, the minimum Steiner tree problem (ST) asks for a tree that spans all of $R$ with at most $r$ vertices from $V(G) \setminus R$, for some integer $r \geq 0$. A \emph{split…

Discrete Mathematics · Computer Science 2026-05-29 Jyothish S , Sadagopan Narasimhan

FLASH-MAXSIM: IO-Aware Fused Kernels for Late-Interaction Scoring

Late-interaction retrieval (ColBERT, ColPali) scores a query against a document with the MaxSim operator: for every query token, the maximum similarity over the document tokens, summed over query tokens. The standard implementation…

Information Retrieval · Computer Science 2026-05-29 Roi Pony , Adi Raz Goldfarb , Idan Friedman , Daniel Ezer , Udi Barzelay
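
The MaxSim operator described in the FLASH-MAXSIM abstract above can be illustrated with a minimal reference sketch. It assumes dot-product similarity (the abstract does not say which similarity is used) and represents token embeddings as plain Python lists. The names `dot` and `maxsim` are hypothetical, and this is the naive definition of the operator, not the fused kernel the paper proposes.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product of two equal-length vectors (assumed similarity measure)."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    """For every query token, take the maximum similarity over the document
    tokens, then sum those maxima over the query tokens."""
    if not doc_tokens:
        raise ValueError("document must contain at least one token embedding")
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy example: 2 query tokens, 3 document tokens, embedding dimension 2.
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]
score = maxsim(query, doc)
```

In the toy example the first query token matches the second document token best (similarity 1.0) and the second query token matches the third (similarity 2.0), so the score is 3.0.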

Latent Terms: Dense Retrievers Contain Trivially Extractable BM25-ready Zipfian Vocabularies

We propose Latent Terms, a method revealing that models trained for dense retrieval, whether single- or multi-vector, learn representations that can trivially be decomposed into retrieval-ready sparse features. When trained on frozen…

Information Retrieval · Computer Science 2026-05-29 Benjamin Clavié , Sean Lee , Aamir Shakir , Makoto P. Kato

ACE: Anisotropy-Controllable Embedding for LLM-enhanced Sequential Recommendation

Recent advances in the LLM-as-Extractor paradigm leverage large language models (LLMs) to transfer semantically rich item embeddings into sequential recommendation (SR) backbones. However, LLM-generated embeddings often suffer from strong…

Information Retrieval · Computer Science 2026-05-29 Dongcheol Lee , Hye-young Kim , Jongwuk Lee

UniNote: A Unified Embedding Model for Multimodal Representation and Ranking

Item-to-Item (I2I) retrieval is a fundamental part of modern content platforms, supporting critical industrial workflows from recommendation engines to content auditing. While multimodal embedding methods have advanced general retrieval,…

Information Retrieval · Computer Science 2026-05-29 Jinghan Zhao , Wenwei Jin , Anqi Li , Jintao Tong , Luya Mo , Jiawei Li , Bin Li , Yao Hu

CrossAlpha: An Annual-Report Benchmark for Cross-Market Factor Research

Cross-market factor research studies whether firm-level signals from one or more markets can predict returns in a target market, but existing public benchmarks do not support cross-market disclosure-to-return evaluation. Building such a…

Information Retrieval · Computer Science 2026-05-29 Qian Wang , Zhongyi Tong , Nuo Chen , Zhaomin Wu , Bingsheng He

On the Practice of Scaling Search Conversion Rate Prediction

Scaling a Search Conversion Rate (CVR) prediction model, especially in high-traffic environments, presents a challenge: superior model quality needs to be balanced with strict constraints on training cost and serving latency. This paper…

Information Retrieval · Computer Science 2026-05-29 James Pak , Jyun-Yu Jiang , Fan Zhang , Sen Wang , Taekmin Kim , Henry Tsai , Vijay Rajaram , Juexin Lin , Mohitdeep Singh , Alessandro Magnani , Johnny Chen , Qian Zhao , Rao Fu , Zhirong Liang , Jordan Gilliland , Winter Jiao

Toward User Preference Alignment in LLM Recommendation via Explicit Context Feedback

Traditional recommender systems (RecSys) primarily infer user preferences from implicit signals (such as clicks, watches, and purchases), often neglecting the rich explicit contextual feedback users provide through verbal text, like…

Information Retrieval · Computer Science 2026-05-29 Weizhi Zhang , Wooseong Yang , Yuxin Cui , Zhaohui Guo , Hins Hu , Liangwei Yang , Henry Peng Zou , Qifei Wang , Hanqing Zeng , Jiayi Liu , Yinglong Xia , Philip S. Yu

Generative Spatiotemporal Intent Sequence Recommendation via Implicit Reasoning in Amap

Real-world user behavior rarely consists of isolated actions; instead, it often forms intent flows governed by spatiotemporal dependencies. To provide integrated service recommendations, we focus on the task of Generative Spatiotemporal…

Information Retrieval · Computer Science 2026-05-29 Sicong Wang , Ruiting Dong , Yue Liu , Bowen Zheng , Jun Meng , Jie Li , Shuaijun Guo , Yu Gu , Fanyi Di , Xin Li

The Biosecurity Blind Spot: Systematic Dual-use Detection in Open Science Infrastructure

AI is transforming life sciences research at unprecedented speed, accelerating discovery across protein structure prediction, genome modeling, and drug development (Jumper et al., 2021; Mak et al., 2024). Yet this rapid advancement, coupled…

Digital Libraries · Computer Science 2026-05-29 Vasudha Sharma , Chakresh Kumar Singh , Jayesh Choudhari , Dharmit Nakrani

Co-creation of AI technology, empowering curators of cultural heritage information and guarding research commons

The substance of this paper is the description of the use of Retrieval-Augmented Generation (RAG) for specific digital collections of cultural assets. The collections are provided by institutions operating in the cultural sector. The…

Digital Libraries · Computer Science 2026-05-29 Andrea Scharnhorst , Han Yang , Jetze Touber , Kim Ferguson , Philipp Mayr , Vyacheslav Tykhonov

Echoes in Filter Bubble: Diagnosing and Curing Popularity Bias in Generative Recommenders

Recently, Generative Recommenders (GRs), characterized by a unified end-to-end framework, have exhibited astonishing potential in transforming the recommendation paradigm. Despite their effectiveness, we recognize that GRs are still…

Information Retrieval · Computer Science 2026-05-29 Jun Yin , Bangguo Zhu , Peng Huo , Ruochen Liu , Hao Chen , Senzhang Wang , Shirui Pan , Chengqi Zhang

DiffRetriever: Parallel Representative Tokens for Retrieval with Diffusion Language Models

This paper shows how diffusion language models (DLMs) can be used as effective and efficient retrievers. Existing DLM-based retrievers (e.g., DiffEmbed) follow BERT-style encoding, representing each query or passage as a single mean-pooled…

Information Retrieval · Computer Science 2026-05-29 Shuai Wang , Yu Yin , Shengyao Zhuang , Bevan Koopman , Guido Zuccon

Dynamic Ranked List Truncation for Reranking Pipelines via LLM-generated Reference-Documents

Large Language Models (LLMs) have been widely used in reranking. Computational overhead and large context lengths remain challenging issues for LLM rerankers. Efficient reranking usually involves selecting a subset of the ranked list from…

Information Retrieval · Computer Science 2026-05-29 Nilanjan Sinhababu , Soumedhik Bharati , Debasis Ganguly , Pabitra Mitra