Related papers: LinkML: An Open Data Modeling Framework

LinkTransformer: A Unified Package for Record Linkage with Transformer Language Models

Linking information across sources is fundamental to a variety of analyses in social science, business, and government. While large language models (LLMs) offer enormous promise for improving record linkage in noisy datasets, in many…

Computation and Language · Computer Science 2024-06-26 Abhishek Arora, Melissa Dell

Leveraging Large Language Models for Automated Scalable Development of Open Scientific Databases

With the exponential increase in online scientific literature, identifying reliable domain-specific data has become increasingly important but also very challenging. Manual data collection and filtering for domain-specific scientific…

Information Retrieval · Computer Science 2026-03-10 Nikita Gautam, Doina Caragea, Ignacio Ciampitti, Federico Gomez

The AnIML Ontology: Enabling Semantic Interoperability for Large-Scale Experimental Data in Interconnected Scientific Labs

Achieving semantic interoperability across heterogeneous experimental data systems remains a major barrier to data-driven scientific discovery. The Analytical Information Markup Language (AnIML), a flexible XML-based standard for analytical…

Artificial Intelligence · Computer Science 2026-04-03 Wilf Morlidge, Elliott Watkiss-Leek, George Hannah, Harry Rostron, Andrew Ng, Ewan Johnson, Andrew Mitchell, Terry R. Payne, Valentina Tamma, Jacopo de Berardinis

DictLLM: Harnessing Key-Value Data Structures with Large Language Models for Enhanced Medical Diagnostics

Structured data offers a sophisticated mechanism for the organization of information. Existing methodologies for the text-serialization of structured data in the context of large language models fail to adequately address the heterogeneity…

Computation and Language · Computer Science 2024-02-20 YiQiu Guo, Yuchen Yang, Ya Zhang, Yu Wang, Yanfeng Wang

A Simple Standard for Sharing Ontological Mappings (SSSOM)

Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases…

Databases · Computer Science 2023-09-26 Nicolas Matentzoglu, James P. Balhoff, Susan M. Bello, Chris Bizon, Matthew Brush, Tiffany J. Callahan, Christopher G Chute, William D. Duncan, Chris T. Evelo, Davera Gabriel, John Graybeal, Alasdair Gray, Benjamin M. Gyori, Melissa Haendel, Henriette Harmse, Nomi L. Harris, Ian Harrow, Harshad Hegde, Amelia L. Hoyt, Charles T. Hoyt, Dazhi Jiao, Ernesto Jiménez-Ruiz, Simon Jupp, Hyeongsik Kim, Sebastian Koehler, Thomas Liener, Qinqin Long, James Malone, James A. McLaughlin, Julie A. McMurry, Sierra Moxon, Monica C. Munoz-Torres, David Osumi-Sutherland, James A. Overton, Bjoern Peters, Tim Putman, Núria Queralt-Rosinach, Kent Shefchek, Harold Solbrig, Anne Thessen, Tania Tudorache, Nicole Vasilevsky, Alex H. Wagner, Christopher J. Mungall

UniMEL: A Unified Framework for Multimodal Entity Linking with Large Language Models

Multimodal Entity Linking (MEL) is a crucial task that aims at linking ambiguous mentions within multimodal contexts to the referent entities in a multimodal knowledge base, such as Wikipedia. Existing methods focus heavily on using complex…

Artificial Intelligence · Computer Science 2024-08-22 Liu Qi, He Yongyi, Lian Defu, Zheng Zhi, Xu Tong, Liu Che, Chen Enhong

A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers

Scientific Large Language Models (Sci-LLMs) are transforming how knowledge is represented, integrated, and applied in scientific research, yet their progress is shaped by the complex nature of scientific data. This survey presents a…

Computation and Language · Computer Science 2025-10-21 Ming Hu, Chenglong Ma, Wei Li, Wanghan Xu, Jiamin Wu, Jucheng Hu, Tianbin Li, Guohang Zhuang, Jiaqi Liu, Yingzhou Lu, Ying Chen, Chaoyang Zhang, Cheng Tan, Jie Ying, Guocheng Wu, Shujian Gao, Pengcheng Chen, Jiashi Lin, Haitao Wu, Lulu Chen, Fengxiang Wang, Yuanyuan Zhang, Xiangyu Zhao, Feilong Tang, Encheng Su, Junzhi Ning, Xinyao Liu, Ye Du, Changkai Ji, Pengfei Jiang, Cheng Tang, Ziyan Huang, Jiyao Liu, Jiaqi Wei, Yuejin Yang, Xiang Zhang, Guangshuai Wang, Yue Yang, Huihui Xu, Ziyang Chen, Yizhou Wang, Chen Tang, Jianyu Wu, Yuchen Ren, Siyuan Yan, Zhonghua Wang, Zhongxing Xu, Shiyan Su, Shangquan Sun, Runkai Zhao, Zhisheng Zhang, Dingkang Yang, Jinjie Wei, Jiaqi Wang, Jiahao Xu, Jiangtao Yan, Wenhao Tang, Hongze Zhu, Yu Liu, Fudi Wang, Yiqing Shen, Yuanfeng Ji, Yanzhou Su, Tong Xie, Hongming Shan, Chun-Mei Feng, Zhi Hou, Diping Song, Lihao Liu, Yanyan Huang, Lequan Yu, Bin Fu, Shujun Wang, Xiaomeng Li, Xiaowei Hu, Yun Gu, Ben Fei, Benyou Wang, Yuewen Cao, Minjie Shen, Jie Xu, Haodong Duan, Fang Yan, Hongxia Hao, Jielan Li, Jiajun Du, Yanbo Wang, Imran Razzak, Zhongying Deng, Chi Zhang, Lijun Wu, Conghui He, Zhaohui Lu, Jinhai Huang, Wenqi Shao, Yihao Liu, Siqi Luo, Yi Xin, Xiaohong Liu, Fenghua Ling, Yuqiang Li, Aoran Wang, Siqi Sun, Qihao Zheng, Nanqing Dong, Tianfan Fu, Dongzhan Zhou, Yan Lu, Wenlong Zhang, Jin Ye, Jianfei Cai, Yirong Chen, Wanli Ouyang, Yu Qiao, Zongyuan Ge, Shixiang Tang, Junjun He, Chunfeng Song, Lei Bai, Bowen Zhou

OpenML: networked science in machine learning

Many sciences have made significant breakthroughs by adopting online tools that help organize, structure and mine information that is too detailed to be printed in journals. In this paper, we introduce OpenML, a place for machine learning…

Machine Learning · Computer Science 2014-08-04 Joaquin Vanschoren, Jan N. van Rijn, Bernd Bischl, Luis Torgo

StreamLink: Large-Language-Model Driven Distributed Data Engineering System

Large Language Models (LLMs) have shown remarkable proficiency in natural language understanding (NLU), opening doors for innovative applications. We introduce StreamLink - an LLM-driven distributed data system designed to improve the…

Databases · Computer Science 2025-05-29 Dawei Feng, Di Mei, Huiri Tan, Lei Ren, Xianying Lou, Zhangxi Tan

SBML Qualitative Models: a model representation format and infrastructure to foster interactions between qualitative modelling formalisms and tools

Background: Qualitative frameworks, especially those based on the logical discrete formalism, are increasingly used to model regulatory and signalling networks. A major advantage of these frameworks is that they do not require precise…

Molecular Networks · Quantitative Biology 2013-09-10 Claudine Chaouiya, Duncan Berenguier, Sarah M Keating, Aurelien Naldi, Martijn P. van Iersel, Nicolas Rodriguez, Andreas Dräger, Finja Büchel, Thomas Cokelaer, Bryan Kowal, Benjamin Wicks, Emanuel Gonçalves, Julien Dorier, Michel Page, Pedro T. Monteiro, Axel von Kamp, Ioannis Xenarios, Hidde de Jong, Michael Hucka, Steffen Klamt, Denis Thieffry, Nicolas Le Novère, Julio Saez-Rodriguez, Tomáš Helikar

A Survey on Open Dataset Search in the LLM Era: Retrospectives and Perspectives

High-quality datasets are typically required for accomplishing data-driven tasks, such as training medical diagnosis models, predicting real-time traffic conditions, or conducting experiments to validate research hypotheses. Consequently,…

Information Retrieval · Computer Science 2025-09-03 Pengyue Li, Sheng Wang, Hua Dai, Zhiyu Chen, Zhifeng Bao, Brian D. Davison

Revealing Interconnections between Diseases: from Statistical Methods to Large Language Models

Identifying disease interconnections through manual analysis of large-scale clinical data is labor-intensive, subjective, and prone to expert disagreement. While machine learning (ML) shows promise, three critical challenges remain: (1)…

Machine Learning · Computer Science 2025-10-13 Alina Ermilova, Dmitrii Kornilov, Sofia Samoilova, Ekaterina Laptenkova, Anastasia Kolesnikova, Ekaterina Podplutova, Senotrusova Sofya, Maksim G. Sharaev

An Empirical Meta-analysis of the Life Sciences (Linked?) Open Data on the Web

While the biomedical community has published several "open data" sources in the last decade, most researchers still endure severe logistical and technical challenges to discover, query, and integrate heterogeneous data and knowledge from…

Artificial Intelligence · Computer Science 2020-06-09 Maulik R. Kamdar, Mark A. Musen

Evidence-based lean logic profiles for conceptual data modelling languages

Multiple logic-based reconstructions of conceptual data modelling languages such as EER, UML Class Diagrams, and ORM exist. They mainly cover various fragments of the languages and none are formalised such that the logic applies…

Artificial Intelligence · Computer Science 2019-09-20 Pablo Rubén Fillottrani, C. Maria Keet

Literature Review Of Attribute Level And Structure Level Data Linkage Techniques

Data Linkage is an important step that can provide valuable insights for evidence-based decision making, especially for crucial events. Performing sensible queries across heterogeneous databases containing millions of records is a complex…

Databases · Computer Science 2015-10-09 Mohammed Gollapalli

Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models

Although Large Language Models (LLMs) demonstrate remarkable ability in processing and generating human-like text, they do have limitations when it comes to comprehending and expressing world knowledge that extends beyond the boundaries of…

Computation and Language · Computer Science 2024-02-20 Fangzhi Xu, Zhiyong Wu, Qiushi Sun, Siyu Ren, Fei Yuan, Shuai Yuan, Qika Lin, Yu Qiao, Jun Liu

Utilizing Large Language Models for Natural Interface to Pharmacology Databases

The drug development process necessitates that pharmacologists undertake various tasks, such as reviewing literature, formulating hypotheses, designing experiments, and interpreting results. Each stage requires accessing and querying vast…

Computation and Language · Computer Science 2023-08-01 Hong Lu, Chuan Li, Yinheng Li, Jie Zhao

dLLM: Simple Diffusion Language Modeling

Although diffusion language models (DLMs) are evolving quickly, many recent models converge on a set of shared components. These components, however, are distributed across ad-hoc research codebases or lack transparent implementations,…

Computation and Language · Computer Science 2026-02-27 Zhanhui Zhou, Lingjie Chen, Hanghang Tong, Dawn Song

DataJoint: A Simpler Relational Data Model

The relational data model offers unrivaled rigor and precision in defining data structure and querying complex data. Yet the use of relational databases in scientific data pipelines is limited due to their perceived unwieldiness. We propose…

Databases · Computer Science 2018-07-31 Dimitri Yatsenko, Edgar Y. Walker, Andreas S. Tolias

Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study

Large language models (LLMs) are becoming attractive as few-shot reasoners to solve Natural Language (NL)-related tasks. However, the understanding of their capability to process structured data like tables remains an under-explored area.…

Computation and Language · Computer Science 2024-07-18 Yuan Sui, Mengyu Zhou, Mingjie Zhou, Shi Han, Dongmei Zhang