Related papers: Data-driven Discovery with Large Generative Models

200 papers

Can the rapid advances in code generation, function calling, and data analysis using large language models (LLMs) help automate the search and verification of hypotheses purely from a set of provided datasets? To evaluate this question, we…

Given the remarkable performance of Large Language Models (LLMs), an important question arises: Can LLMs conduct human-like scientific research and discover new knowledge, and act as an AI scientist? Scientific discovery is an iterative…

Machine Learning · Computer Science 2025-02-24 Tingting Chen, Srinivas Anumasa, Beibei Lin, Vedant Shah, Anirudh Goyal, Dianbo Liu

There is an increasing interest in leveraging Large Language Models (LLMs) for managing structured data and enhancing data science processes. Despite the potential benefits, this integration poses significant questions regarding their…

Artificial Intelligence · Computer Science 2023-11-21 Nathalia Nascimento, Cristina Tavares, Paulo Alencar, Donald Cowan

Large Language Models (LLMs) such as GPT-4 and Llama3 have significantly impacted various fields by enabling high-quality synthetic data generation and reducing dependence on expensive human-generated datasets. Despite this, challenges…

Computation and Language · Computer Science 2025-11-18 Yue Huang, Siyuan Wu, Chujie Gao, Dongping Chen, Qihui Zhang, Yao Wan, Tianyi Zhou, Jianfeng Gao, Chaowei Xiao, Lichao Sun, Xiangliang Zhang

High-quality datasets are typically required for accomplishing data-driven tasks, such as training medical diagnosis models, predicting real-time traffic conditions, or conducting experiments to validate research hypotheses. Consequently,…

Information Retrieval · Computer Science 2025-09-03 Pengyue Li, Sheng Wang, Hua Dai, Zhiyu Chen, Zhifeng Bao, Brian D. Davison

Discovering the governing equations of dynamical systems is a central problem across many scientific disciplines. As experimental data become increasingly available, automated equation discovery methods offer a promising data-driven…

Machine Learning · Computer Science 2026-04-07 Amirmohammad Ziaei Bideh, Jonathan Gryak

Algorithmic discovery has traditionally relied on human ingenuity and extensive experimentation. Here we investigate whether a prominent scientific computing algorithm, the Kalman Filter, can be discovered through an automated, data-driven,…

Neural and Evolutionary Computing · Computer Science 2025-08-26 Vasileios Saketos, Sebastian Kaltenbach, Sergey Litvinov, Petros Koumoutsakos

Hypothesis generation is a fundamental step in scientific discovery, yet it is increasingly challenged by information overload and disciplinary fragmentation. Recent advances in Large Language Models (LLMs) have sparked growing interest in…

New technologies have led to vast troves of large and complex datasets across many scientific domains and industries. People routinely use machine learning techniques to not only process, visualize, and make predictions from this big data,…

Machine Learning · Statistics 2023-08-04 Genevera I. Allen, Luqin Gan, Lili Zheng

Recommender systems serve as foundational infrastructure in modern information ecosystems, helping users navigate digital content and discover items aligned with their preferences. At their core, recommender systems address a fundamental…

Information Retrieval · Computer Science 2026-05-12 Min Hou, Le Wu, Yuxin Liao, Yonghui Yang, Zhen Zhang, Yu Wang, Changlong Zheng, Han Wu, Richang Hong

Large Language Models (LLMs) are catalyzing a paradigm shift in scientific discovery, evolving from task-specific automation tools into increasingly autonomous agents and fundamentally redefining research processes and human-AI…

Computation and Language · Computer Science 2025-09-18 Tianshi Zheng, Zheye Deng, Hong Ting Tsang, Weiqi Wang, Jiaxin Bai, Zihao Wang, Yangqiu Song

Materials discovery and design are essential for advancing technology across various industries by enabling the development of application-specific materials. Recent research has leveraged Large Language Models (LLMs) to accelerate this…

Computation and Language · Computer Science 2025-02-11 Shrinidhi Kumbhar, Venkatesh Mishra, Kevin Coutinho, Divij Handa, Ashif Iquebal, Chitta Baral

Current approaches to data discovery match keywords between metadata and queries. This matching requires researchers to know the exact wording that other researchers previously used, creating a challenging process that could lead to missing…

Human-Computer Interaction · Computer Science 2025-10-03 Maura E Halstead, Mark A. Green, Caroline Jay, Richard Kingston, David Topping, Alexander Singleton

Large Language Models (LLMs) are transforming scientific hypothesis generation and validation by enabling information synthesis, latent relationship discovery, and reasoning augmentation. This survey provides a structured overview of…

Developing the capacity to effectively search for requisite datasets is an urgent requirement to assist data users in identifying relevant datasets considering the very limited available metadata. For this challenge, the utilization of…

Information Retrieval · Computer Science 2024-10-08 Teruaki Hayashi, Hiroki Sakaji, Jiayi Dai, Randy Goebel

A main challenge of data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative. We introduce the idea of a modeling-based dataset retrieval engine…

Quantitative Methods · Quantitative Biology 2015-06-19 Ali Faisal, Jaakko Peltonen, Elisabeth Georgii, Johan Rung, Samuel Kaski

In recent years, groundbreaking advancements in natural language processing have culminated in the emergence of powerful large language models (LLMs), which have showcased remarkable capabilities across a vast array of domains, including…

Computation and Language · Computer Science 2023-12-11 Microsoft Research AI4Science, Microsoft Azure Quantum

Discovery of new knowledge is increasingly data-driven, predicated on a team's ability to collaboratively create, find, analyze, retrieve, and share pertinent datasets over the duration of an investigation. This is especially true in the…

Human-Computer Interaction · Computer Science 2021-10-06 Hongsuda Tangmunarunkit, Aref Shafaeibejestan, Joshua Chudy, Karl Czajkowski, Robert Schuler, Carl Kesselman

Machine learning (ML) holds great promise for clinical applications but is often hindered by limited access to high-quality data due to privacy concerns, high costs, and long timelines associated with clinical trials. While large language…

Computation and Language · Computer Science 2026-03-27 Zerui Xu, Fang Wu, Yingzhou Lu, Yuanyuan Zhang, Yue Zhao

Causal discovery for both cross-sectional and temporal data has traditionally followed a dataset-specific paradigm, where a new model is fitted for each individual dataset. Such an approach limits the potential of multi-dataset pretraining.…
