Related papers: CDEMapper: Enhancing NIH Common Data Element Norma…

CDE-Mapper: Using Retrieval-Augmented Language Models for Linking Clinical Data Elements to Controlled Vocabularies

The standardization of clinical data elements (CDEs) aims to ensure consistent and comprehensive patient information across various healthcare systems. Existing methods often falter when standardizing CDEs of varying representation and…

Information Retrieval · Computer Science 2025-05-08 Komal Gilani, Marlo Verket, Christof Peters, Michel Dumontier, Hans-Peter Brunner-La Rocca, Visara Urovi

A Dynamic Framework for Semantic Grouping of Common Data Elements (CDE) Using Embeddings and Clustering

This research aims to develop a dynamic and scalable framework to facilitate harmonization of Common Data Elements (CDEs) across heterogeneous biomedical datasets by addressing challenges such as semantic heterogeneity, structural…

Information Retrieval · Computer Science 2025-06-04 Madan Krishnamurthy, Daniel Korn, Melissa A Haendel, Christopher J Mungall, Anne E Thessen

Embedding Enhancement via Fine-Tuned Language Models for Learner-Item Cognitive Modeling

Learner-item cognitive modeling plays a central role in the web-based online intelligent education system by enabling cognitive diagnosis (CD) across diverse online educational scenarios. Although ID embedding remains the mainstream…

Computation and Language · Computer Science 2026-04-07 Yuanhao Liu, Zihan Zhou, Kaiying Wu, Shuo Liu, Yiyang Huang, Jiajun Guo, Aimin Zhou, Hong Qian

Scalable Text-Embedding-informed Cognitive Diagnosis of Large Language Models

Large language models (LLMs) have achieved remarkable performance on diverse benchmarks, yet existing evaluation practices largely rely on coarse summary metrics that obscure underlying reasoning abilities. In this work, we propose novel…

Methodology · Statistics 2026-03-17 Jia Liu, Zhiyu Xu, Yuqi Gu

Break the ID-Language Barrier: An Adaption Framework for LLM-based Sequential Recommendation

The recent breakthrough of large language models (LLMs) in natural language processing has sparked exploration in recommendation systems; however, their limited domain-specific knowledge remains a critical bottleneck. Specifically, LLMs…

Information Retrieval · Computer Science 2025-10-03 Xiaohan Yu, Li Zhang, Xin Zhao, Yue Wang

Embedding in Recommender Systems: A Survey

Recommender systems have become an essential component of many online platforms, providing personalized recommendations to users. A crucial aspect is embedding techniques that convert the high-dimensional discrete features, such as user and…

Information Retrieval · Computer Science 2025-10-23 Maolin Wang, Xinjian Zhao, Wanyu Wang, Sheng Zhang, Jiansheng Li, Bowen Yu, Binhao Wang, Shucheng Zhou, Dawei Yin, Qing Li, Ruocheng Guo, Xiangyu Zhao

MapperGPT: Large Language Models for Linking and Mapping Entities

Aligning terminological resources, including ontologies, controlled vocabularies, taxonomies, and value sets is a critical part of data integration in many domains such as healthcare, chemistry, and biomedical research. Entity mapping is…

Computation and Language · Computer Science 2023-10-06 Nicolas Matentzoglu, J. Harry Caufield, Harshad B. Hegde, Justin T. Reese, Sierra Moxon, Hyeongsik Kim, Nomi L. Harris, Melissa A Haendel, Christopher J. Mungall

Match, Compare, or Select? An Investigation of Large Language Models for Entity Matching

Entity matching (EM) is a critical step in entity resolution (ER). Recently, entity matching based on large language models (LLMs) has shown great promise. However, current LLM-based entity matching approaches typically follow a binary…

Computation and Language · Computer Science 2024-12-13 Tianshu Wang, Xiaoyang Chen, Hongyu Lin, Xuanang Chen, Xianpei Han, Hao Wang, Zhenyu Zeng, Le Sun

Unlocking the Power of Large Language Models for Multi-table Entity Matching

Multi-table entity matching (MEM) addresses the limitations of dual-table approaches by enabling simultaneous identification of equivalent entities across multiple data sources without unique identifiers. However, existing methods relying…

Computation and Language · Computer Science 2026-04-24 Yingkai Tang, Taoyu Su, Wenyuan Zhang, Xiaoyang Guo, Tingwen Liu

Leveraging Large Language Models for Entity Matching

Entity matching (EM) is a critical task in data integration, aiming to identify records across different datasets that refer to the same real-world entities. Traditional methods often rely on manually engineered features and rule-based…

Computation and Language · Computer Science 2024-06-03 Qianyu Huang, Tongfang Zhao

LLM2Rec: Large Language Models Are Powerful Embedding Models for Sequential Recommendation

Sequential recommendation aims to predict users' future interactions by modeling collaborative filtering (CF) signals from historical behaviors of similar users or items. Traditional sequential recommenders predominantly rely on ID-based…

Information Retrieval · Computer Science 2025-06-30 Yingzhi He, Xiaohao Liu, An Zhang, Yunshan Ma, Tat-Seng Chua

End-to-End Personalization: Unifying Recommender Systems with Large Language Models

Recommender systems are essential for guiding users through the vast and diverse landscape of digital content by delivering personalized and relevant suggestions. However, improving both personalization and interpretability remains a…

Information Retrieval · Computer Science 2025-08-05 Danial Ebrat, Tina Aminian, Sepideh Ahmadian, Luis Rueda

DEP: A Decentralized Large Language Model Evaluation Protocol

With the rapid development of Large Language Models (LLMs), a large number of benchmarks have been proposed. However, most benchmarks lack a unified evaluation standard and require the manual implementation of custom scripts, making results…

Computation and Language · Computer Science 2026-03-03 Jianxiang Peng, Junhao Li, Hongxiang Wang, Haocheng Lyu, Hui Guo, Siyi Hao, Zhen Wang, Chuang Liu, Shaowei Zhang, Bojian Xiong, Yue Chen, Zhuowen Han, Ling Shi, Tianyu Dong, Juesi Xiao, Lei Yang, Yuqi Ren, Deyi Xiong

CMDBench: A Benchmark for Coarse-to-fine Multimodal Data Discovery in Compound AI Systems

Compound AI systems (CASs) that employ LLMs as agents to accomplish knowledge-intensive tasks via interactions with tools and data retrievers have garnered significant interest within database and AI communities. While these systems have…

Databases · Computer Science 2024-06-04 Yanlin Feng, Sajjadur Rahman, Aaron Feng, Vincent Chen, Eser Kandogan

CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models

As Large Language Models (LLMs) are increasingly deployed to handle various natural language processing (NLP) tasks, concerns regarding the potential negative societal impacts of LLM-generated content have also arisen. To evaluate the…

Computation and Language · Computer Science 2025-02-25 Song Wang, Peng Wang, Tong Zhou, Yushun Dong, Zhen Tan, Jundong Li

Liberal Entity Matching as a Compound AI Toolchain

Entity matching (EM), the task of identifying whether two descriptions refer to the same entity, is essential in data management. Traditional methods have evolved from rule-based to AI-driven approaches, yet current techniques using large…

Databases · Computer Science 2024-06-18 Silvery D. Fu, David Wang, Wen Zhang, Kathleen Ge

Mapping Space Exploration for Multi-Chiplet Accelerators Targeting LLM Inference Serving Workloads

Large Language Models (LLMs) impose massive computational demands, driving the need for scalable multi-chiplet accelerators. However, existing mapping space exploration efforts for such accelerators primarily focus on traditional…

Hardware Architecture · Computer Science 2026-04-02 Boyu Li, Zongwei Zhu, Yi Xiong, Qianyue Cao, Jiawei Geng, Xiaonan Zhang, Xi Li

ctELM: Decoding and Manipulating Embeddings of Clinical Trials with Embedding Language Models

Text embeddings have become an essential part of a variety of language applications. However, methods for interpreting, exploring and reversing embedding spaces are limited, reducing transparency and precluding potentially valuable…

Computation and Language · Computer Science 2026-01-27 Brian Ondov, Chia-Hsuan Chang, Yujia Zhou, Mauro Giuffrè, Hua Xu

An NLP Crosswalk Between the Common Core State Standards and NAEP Item Specifications

Natural language processing (NLP) is rapidly developing for applications in educational assessment. In this paper, I describe an NLP-based procedure that can be used to support subject matter experts in establishing a crosswalk between item…

Computation and Language · Computer Science 2024-06-04 Gregory Camilli

Language Model Mapping in Multimodal Music Learning: A Grand Challenge Proposal

We have seen remarkable success in representation learning and language models (LMs) using deep neural networks. Many studies aim to build the underlying connections among different modalities via the alignment and mappings at the token or…

Sound · Computer Science 2025-03-04 Daniel Chin, Gus Xia