Related papers: A Case for Dataset Specific Profiling

200 papers

Artificial intelligence (AI) provides many opportunities to improve private and public life. Discovering patterns and structures in large troves of data in an automated manner is a core component of data science, and currently drives…

Machine Learning · Computer Science 2020-09-25 Vaishak Belle, Ioannis Papantonis

The breakthrough in Deep Learning neural networks has transformed the use of AI and machine learning technologies for the analysis of very large experimental datasets. These datasets are typically generated by large-scale experimental…

Machine Learning · Computer Science 2021-10-26 Jeyan Thiyagalingam, Mallikarjun Shankar, Geoffrey Fox, Tony Hey

A main challenge of data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative. We introduce the idea of a modeling-based dataset retrieval engine…

Quantitative Methods · Quantitative Biology 2015-06-19 Ali Faisal, Jaakko Peltonen, Elisabeth Georgii, Johan Rung, Samuel Kaski

In this work, we reflect on the data-driven modeling paradigm that is gaining ground in AI-driven automation of patient care. We argue that the repurposing of existing real-world patient datasets for machine learning may not always…

Dataset Search -- the process of finding appropriate datasets for a given task -- remains a critical yet under-explored challenge in data science workflows. Assessing dataset suitability for a task (e.g., training a classification model) is…

Human-Computer Interaction · Computer Science 2025-07-28 Rachel Lin, Bhavya Chopra, Wenjing Lin, Shreya Shankar, Madelon Hulsebos, Aditya G. Parameswaran

Machine Science, or Data-driven Research, is a new and interesting scientific methodology that uses advanced computational techniques to identify, retrieve, classify and analyse data in order to generate hypotheses and develop models. In…

Information Retrieval · Computer Science 2010-08-24 T W Kelsey, W H B Wallace

While data science has emerged as a contentious new scientific field, there has been extensive debate over why we need data science and what makes it a science. In reviewing hundreds of pieces of literature which include…

Computers and Society · Computer Science 2020-07-01 Longbing Cao

In this paper we argue that data science is a coherent and novel approach to empirical problems that, in its most general form, does not build understanding about phenomena. Within the new type of mathematization at work in data science,…

Other Statistics · Statistics 2021-03-31 Domenico Napoletani, Marco Panza, Daniele Struppa

Empirical and LLM-based research in model-driven engineering increasingly relies on datasets of software models, for instance, to train or evaluate machine learning techniques for modeling support. These datasets have a significant impact…

Software Engineering · Computer Science 2026-03-06 Philipp-Lorenz Glaser, Lola Burgueño, Dominik Bork

The rapid advancement of large language models has fundamentally shifted the bottleneck in AI development from computational power to data availability, with countless valuable datasets remaining hidden across specialized repositories,…

Artificial Intelligence · Computer Science 2025-08-12 Keyu Li, Mohan Jiang, Dayuan Fu, Yunze Wu, Xiangkun Hu, Dequan Wang, Pengfei Liu

Many real-world scientific processes are governed by complex nonlinear dynamic systems that can be represented by differential equations. Recently, there has been increased interest in learning, or discovering, the forms of the equations…

Methodology · Statistics 2022-10-20 Joshua S. North, Christopher K. Wikle, Erin M. Schliep

Data intensive research requires the support of appropriate datasets. However, it is often time-consuming to discover usable datasets matching a specific research topic. We formulate the dataset discovery problem on an attributed…

Information Retrieval · Computer Science 2021-06-08 Basmah Altaf, Shichao Pei, Xiangliang Zhang

Data science is an integrated workflow of technical, analytical, communication, and ethical skills, but current AI benchmarks focus mostly on constituent parts. We test whether AI models can generate end-to-end data science projects. To do…

Other Statistics · Statistics 2026-02-17 Evelyn Hughes, Rohan Alexander

Continuous dynamical systems, characterized by differential equations, are ubiquitously used to model several important problems: plasma dynamics, flow through porous media, weather forecasting, and epidemic dynamics. Recently, a wide range…

Machine Learning · Computer Science 2023-10-04 Priyanshu Burark, Karn Tiwari, Meer Mehran Rashid, Prathosh A P, N M Anoop Krishnan

As we are fast approaching the beginning of a paradigm shift in the field of science, data-driven science (the so-called fourth science paradigm) is going to be the driving force in research and innovation. From medicine to biodiversity and…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-09-07 Hrishav Bakul Barua

With the explosion of applications of Data Science, the field has come loose from its foundations. This article argues for a new program of applied research in areas familiar to researchers in Bayesian methods in AI that are needed to…

Machine Learning · Computer Science 2023-07-04 John Mark Agosta, Robert Horton

Data-driven approaches, most prominently deep learning, have become powerful tools for prediction in many domains. A natural question to ask is whether data-driven methods could also be used to predict global weather patterns days in…

Atmospheric and Oceanic Physics · Physics 2020-12-30 Stephan Rasp, Peter D. Dueben, Sebastian Scher, Jonathan A. Weyn, Soukayna Mouatadid, Nils Thuerey

The advent of data-driven science in the 21st century brought about the need for well-organized structured data and associated infrastructure able to facilitate the applications of Artificial Intelligence and Machine Learning. We present an…

Databases · Computer Science 2022-03-03 Alexander Zech, Timur Bazhirov

Successful data-driven science requires complex data engineering pipelines to clean, transform, and alter data in preparation for machine learning, and robust results can only be achieved when each step in the pipeline can be justified, and…

Databases · Computer Science 2024-04-08 Adriane Chapman, Luca Lauro, Paolo Missier, Riccardo Torlone

Recent research has helped to cultivate growing awareness that machine learning systems fueled by big data can create or exacerbate troubling disparities in society. Much of this research comes from outside of the practicing data science…

Machine Learning · Statistics 2019-07-23 Brian d'Alessandro, Cathy O'Neil, Tom LaGatta