Related papers: Omics Data Discovery Agents

Low Resource Recognition and Linking of Biomedical Concepts from a Large Ontology

Tools to explore scientific literature are essential for scientists, especially in biomedicine, where about a million new papers are published every year. Many such tools provide users the ability to search for specific entities (e.g.…

Computation and Language · Computer Science 2021-07-05 Sunil Mohan, Rico Angell, Nick Monath, Andrew McCallum

Prediction approaches for partly missing multi-omics covariate data: A literature review and an empirical comparison study

As the availability of omics data has increased in the last few years, more multi-omics data have been generated, that is, high-dimensional molecular data consisting of several types such as genomic, transcriptomic, or proteomic data, all…

Genomics · Quantitative Biology 2023-02-09 Roman Hornung, Frederik Ludwigs, Jonas Hagenberg, Anne-Laure Boulesteix

A Framework for Implementing Machine Learning on Omics Data

The potential benefits of applying machine learning methods to -omics data are becoming increasingly apparent, especially in clinical settings. However, the unique characteristics of these data are not always well suited to machine learning…

Machine Learning · Computer Science 2018-11-28 Geoffroy Dubourg-Felonneau, Timothy Cannings, Fergal Cotter, Hannah Thompson, Nirmesh Patel, John W Cassidy, Harry W Clifford

BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature

The development of vision-language models (VLMs) is driven by large-scale and diverse multimodal datasets. However, progress toward generalist biomedical VLMs is limited by the lack of annotated, publicly accessible datasets across biology…

Computer Vision and Pattern Recognition · Computer Science 2025-04-03 Alejandro Lozano, Min Woo Sun, James Burgess, Liangyu Chen, Jeffrey J Nirschl, Jeffrey Gu, Ivan Lopez, Josiah Aklilu, Austin Wolfgang Katzer, Collin Chiu, Anita Rau, Xiaohan Wang, Yuhui Zhang, Alfred Seunghoon Song, Robert Tibshirani, Serena Yeung-Levy

Agentic publications: redesigning scientific publishing in the age of thinking large language models

Purpose: This paper introduces the concept of "Agentic Publication," a novel LLM-driven framework designed to complement traditional scientific publishing by transforming papers into interactive knowledge systems that address challenges…

Artificial Intelligence · Computer Science 2026-05-06 Roberto Pugliese, George Kourousias, Francesco Venier, Grazia Garlatti Costa

Omics-scale polymer computational database transferable to real-world artificial intelligence applications

Developing large-scale foundational datasets is a critical milestone in advancing artificial intelligence (AI)-driven scientific innovation. However, unlike AI-mature fields such as natural language processing, materials science,…

Chemical Physics · Physics 2025-11-18 Ryo Yoshida, Yoshihiro Hayashi, Hidemine Furuya, Ryohei Hosoya, Kazuyoshi Kaneko, Hiroki Sugisawa, Yu Kaneko, Aiko Takahashi, Yoh Noguchi, Shun Nanjo, Keiko Shinoda, Tomu Hamakawa, Mitsuru Ohno, Takuya Kitamura, Misaki Yonekawa, Stephen Wu, Masato Ohnishi, Chang Liu, Teruki Tsurimoto, Arifin, Araki Wakiuchi, Kohei Noda, Junko Morikawa, Teruaki Hayakawa, Junichiro Shiomi, Masanobu Naito, Kazuya Shiratori, Tomoki Nagai, Norio Tomotsu, Hiroto Inoue, Ryuichi Sakashita, Masashi Ishii, Isao Kuwajima, Kenji Furuichi, Norihiko Hiroi, Yuki Takemoto, Takahiro Ohkuma, Keita Yamamoto, Naoya Kowatari, Masato Suzuki, Naoya Matsumoto, Seiryu Umetani, Hisaki Ikebata, Yasuyuki Shudo, Mayu Nagao, Shinya Kamada, Kazunori Kamio, Taichi Shomura, Kensaku Nakamura, Yudai Iwamizu, Atsutoshi Abe, Koki Yoshitomi, Yuki Horie, Katsuhiko Koike, Koichi Iwakabe, Shinya Gima, Kota Usui, Gikyo Usuki, Takuro Tsutsumi, Keitaro Matsuoka, Kazuki Sada, Masahiro Kitabata, Takuma Kikutsuji, Akitaka Kamauchi, Yusuke Iijima, Tsubasa Suzuki, Takenori Goda, Yuki Takabayashi, Kazuko Imai, Yuji Mochizuki, Hideo Doi, Koji Okuwaki, Hiroya Nitta, Taku Ozawa, Hitoshi Kamijima, Toshiaki Shintani, Takuma Mitamura, Massimiliano Zamengo, Yuitsu Sugami, Seiji Akiyama, Yoshinari Murakami, Atsushi Betto, Naoya Matsuo, Satoru Kagao, Tetsuya Kobayashi, Norie Matsubara, Shosei Kubo, Yuki Ishiyama, Yuri Ichioka, Mamoru Usami, Satoru Yoshizaki, Seigo Mizutani, Yosuke Hanawa, Shogo Kunieda, Mitsuru Yambe, Takeru Nakamura, Hiromori Murashima, Kenji Takahashi, Naoki Wada, Masahiro Kawano, Yosuke Harada, Takehiro Fujita, Erina Fujita, Ryoji Himeno, Hiori Kino, Kenji Fukumizu

Omics-driven hybrid dynamic modeling of bioprocesses with uncertainty estimation

This work presents an omics-driven modeling pipeline that integrates machine-learning tools to facilitate the dynamic modeling of multiscale biological systems. Random forests and permutation feature importance are proposed to mine omics…

Quantitative Methods · Quantitative Biology 2025-01-17 Sebastián Espinel-Ríos, José Montaño López, José L. Avalos

Patience is all you need! An agentic system for performing scientific literature review

Large language models (LLMs) have grown in their usage to provide support for question answering across numerous disciplines. The models on their own have already shown promise for answering basic questions, however fail quickly where…

Information Retrieval · Computer Science 2025-04-15 David Brett, Anniek Myatt

OpenLens AI: Fully Autonomous Research Agent for Health Infomatics

Health informatics research is characterized by diverse data modalities, rapid knowledge expansion, and the need to integrate insights across biomedical science, data analytics, and clinical practice. These characteristics make it…

Artificial Intelligence · Computer Science 2025-09-24 Yuxiao Cheng, Jinli Suo

Knowledge-Driven Agentic Scientific Corpus Distillation Framework for Biomedical Large Language Models Training

Corpus distillation for biomedical large language models (LLMs) seeks to address the pressing challenge of insufficient quantity and quality in open-source annotated scientific corpora, which remains a bottleneck for effective LLM training…

Computation and Language · Computer Science 2025-12-19 Meng Xiao, Xunxin Cai, Qingqing Long, Chengrui Wang, Yuanchun Zhou, Hengshu Zhu

Leveraging Large Language Models for Automated Scalable Development of Open Scientific Databases

With the exponential increase in online scientific literature, identifying reliable domain-specific data has become increasingly important but also very challenging. Manual data collection and filtering for domain-specific scientific…

Information Retrieval · Computer Science 2026-03-10 Nikita Gautam, Doina Caragea, Ignacio Ciampitti, Federico Gomez

Ontology-aligned structuring and reuse of multimodal materials data and workflows towards automatic reproduction

Reproducibility of computational results remains a challenge in materials science, as simulation workflows and parameters are often reported only in unstructured text and tables. While literature data are valuable for validation and reuse,…

Materials Science · Physics 2026-01-21 Sepideh Baghaee Ravari, Abril Azocar Guzman, Sarath Menon, Stefan Sandfeld, Tilmann Hickel, Markus Stricker

Empowering Language Model with Guided Knowledge Fusion for Biomedical Document Re-ranking

Pre-trained language models (PLMs) have proven to be effective for the document re-ranking task. However, they lack the ability to fully interpret the semantics of biomedical and health-care queries and often rely on simplistic patterns for…

Computation and Language · Computer Science 2023-05-09 Deepak Gupta, Dina Demner-Fushman

Methods to Expand Cell Signaling Models using Automated Reading and Model Checking

Biomedical research results are being published at a high rate, and with existing search engines, the vast amount of published work is usually easily accessible. However, reproducing published results, either experimental data or…

Molecular Networks · Quantitative Biology 2017-06-19 Kai-Wen Liang, Qinsi Wang, Cheryl Telmer, Divyaa Ravichandran, Peter Spirtes, Natasa Miskov-Zivanov

OmicsLM: A Multimodal Large Language Model for Multi-Sample Omics Reasoning

Interpreting transcriptomic data is one of the most common analytical tasks in modern biology. Yet most current models either consume expression profiles without producing natural-language biological explanations, or reason in language…

Genomics · Quantitative Biology 2026-05-11 Maciej Sypetkowski, Joanna Krawczyk, Łukasz Smoliński, Remigiusz Kinas, Przemysław Pietrzak, Tomasz Jetka, Rafał Powalski

Material Database Agent: A Multimodal Agentic Framework for Scientific Literature Mining

Materials science workflows rely on structured and unstructured data from the vast body of available scientific literature. However, most of the experimental details remain buried in text, tables, graphs and figures. Thus, constructing…

Computation and Language · Computer Science 2026-05-07 Achuth Chandrasekhar, Omid Barati Farimani, Radheesh Sharma Meda, Amir Barati Farimani

COmic: Convolutional Kernel Networks for Interpretable End-to-End Learning on (Multi-)Omics Data

Motivation: The size of available omics datasets is steadily increasing with technological advancement in recent years. While this increase in sample size can be used to improve the performance of relevant prediction tasks in healthcare,…

Quantitative Methods · Quantitative Biology 2023-05-04 Jonas C. Ditz, Bernhard Reuter, Nico Pfeifer

Harnessing Large Language Models for Precision Querying and Retrieval-Augmented Knowledge Extraction in Clinical Data Science

This study applies Large Language Models (LLMs) to two foundational Electronic Health Record (EHR) data science tasks: structured data querying (using programmatic languages, Python/Pandas) and information extraction from unstructured…

Computation and Language · Computer Science 2026-01-29 Juan Jose Rubio Jan, Jack Wu, Julia Ive

Agents of Discovery

The substantial data volumes encountered in modern particle physics and other domains of fundamental physics research allow (and require) the use of increasingly complex data analysis tools and workflows. While the use of machine learning…

High Energy Physics - Phenomenology · Physics 2026-02-18 Sascha Diefenbacher, Anna Hallin, Gregor Kasieczka, Michael Krämer, Anne Lauscher, Tim Lukas

An Agentic Framework for Autonomous Materials Computation

Large Language Models (LLMs) have emerged as powerful tools for accelerating scientific discovery, yet their static knowledge and hallucination issues hinder autonomous research applications. Recent advances integrate LLMs into agentic…

Artificial Intelligence · Computer Science 2025-12-23 Zeyu Xia, Jinzhe Ma, Congjie Zheng, Shufei Zhang, Yuqiang Li, Hang Su, P. Hu, Changshui Zhang, Xingao Gong, Wanli Ouyang, Lei Bai, Dongzhan Zhou, Mao Su