Related papers: DataPrep.EDA: Task-Centric Exploratory Data Analys…

SmartEDA: An R Package for Automated Exploratory Data Analysis

This paper introduces SmartEDA, an R package for performing exploratory data analysis (EDA). EDA is generally the first step that one needs to perform before developing any machine learning or statistical models. The goal of EDA is…

Computation · Statistics 2020-08-10 Sayan Putatunda, Kiran Rama, Dayananda Ubrangala, Ravi Kondapalli

EDAssistant: Supporting Exploratory Data Analysis in Computational Notebooks with In-Situ Code Search and Recommendation

Using computational notebooks (e.g., Jupyter Notebook), data scientists rationalize their exploratory data analysis (EDA) based on their prior experience and external knowledge such as online examples. For novices or data scientists who…

Human-Computer Interaction · Computer Science 2021-12-16 Xingjun Li, Yizhi Zhang, Justin Leung, Chengnian Sun, Jian Zhao

Towards High-Performance Exploratory Data Analysis (EDA) Via Stable Equilibrium Point

Exploratory data analysis (EDA) is a vital procedure for data science projects. In this work, we introduce a stable equilibrium point (SEP)-based framework for improving the efficiency and solution quality of EDA. By exploiting the SEPs…

Machine Learning · Computer Science 2023-06-08 Yuxuan Song, Yongyu Wang

Human-Guided Data Exploration

The outcome of the explorative data analysis (EDA) phase is vital for successful data analysis. EDA is more effective when the user interacts with the system used to carry out the exploration. In the recently proposed paradigm of iterative…

Machine Learning · Statistics 2018-04-11 Andreas Henelius, Emilia Oikarinen, Kai Puolamäki

Goals, Process, and Challenges of Exploratory Data Analysis: An Interview Study

How do analysis goals and context affect exploratory data analysis (EDA)? To investigate this question, we conducted semi-structured interviews with 18 data analysts. We characterize common exploration goals: profiling (assessing data…

Human-Computer Interaction · Computer Science 2019-11-05 Kanit Wongsuphasawat, Yang Liu, Jeffrey Heer

Towards Automated Cross-domain Exploratory Data Analysis through Large Language Models

Exploratory data analysis (EDA), coupled with SQL, is essential for data analysts involved in data exploration and analysis. However, data analysts often encounter two primary challenges: (1) the need to craft SQL queries skillfully, and…

Databases · Computer Science 2025-07-29 Jun-Peng Zhu, Boyan Niu, Peng Cai, Zheming Ni, Jianwei Wan, Kai Xu, Jiajun Huang, Shengbo Ma, Bing Wang, Xuan Zhou, Guanglei Bao, Donghui Zhang, Liu Tang, Qi Liu

Interactive Visual Data Exploration with Subjective Feedback: An Information-Theoretic Approach

Visual exploration of high-dimensional real-valued datasets is a fundamental task in exploratory data analysis (EDA). Existing methods use predefined criteria to choose the representation of data. There is a lack of methods that (i) elicit…

Machine Learning · Statistics 2021-11-08 Kai Puolamäki, Emilia Oikarinen, Bo Kang, Jefrey Lijffijt, Tijl De Bie

Automated Question Generation on Tabular Data for Conversational Data Exploration

Exploratory data analysis (EDA) is an essential step for analyzing a dataset to derive insights. Several EDA techniques have been explored in the literature. Many of them leverage visualizations through various plots. But it is not easy to…

Computation and Language · Computer Science 2024-07-19 Ritwik Chaudhuri, Rajmohan C, Kirushikesh DB, Arvind Agarwal

PyTond: Efficient Python Data Science on the Shoulders of Databases

Python data science libraries such as Pandas and NumPy have recently gained immense popularity. Although these libraries are feature-rich and easy to use, their scalability limitations require more robust computational resources. In this…

Databases · Computer Science 2024-07-17 Hesam Shahrokhi, Amirali Kaboli, Mahdi Ghorbani, Amir Shaikhha

DataRec: A Python Library for Standardized and Reproducible Data Management in Recommender Systems

Recommender systems have demonstrated significant impact across diverse domains, yet ensuring the reproducibility of experimental findings remains a persistent challenge. A primary obstacle lies in the fragmented and often opaque data…

Information Retrieval · Computer Science 2025-04-08 Alberto Carlo Maria Mancino, Salvatore Bufi, Angela Di Fazio, Antonio Ferrara, Daniele Malitesta, Claudio Pomo, Tommaso Di Noia

DataPerf: Benchmarks for Data-Centric AI Development

Machine learning research has long focused on models rather than datasets, and prominent datasets are used for common ML tasks without regard to the breadth, difficulty, and faithfulness of the underlying problems. Neglecting the…

Machine Learning · Computer Science 2023-10-16 Mark Mazumder, Colby Banbury, Xiaozhe Yao, Bojan Karlaš, William Gaviria Rojas, Sudnya Diamos, Greg Diamos, Lynn He, Alicia Parrish, Hannah Rose Kirk, Jessica Quaye, Charvi Rastogi, Douwe Kiela, David Jurado, David Kanter, Rafael Mosquera, Juan Ciro, Lora Aroyo, Bilge Acun, Lingjiao Chen, Mehul Smriti Raje, Max Bartolo, Sabri Eyuboglu, Amirata Ghorbani, Emmett Goodman, Oana Inel, Tariq Kane, Christine R. Kirkpatrick, Tzu-Sheng Kuo, Jonas Mueller, Tristan Thrush, Joaquin Vanschoren, Margaret Warren, Adina Williams, Serena Yeung, Newsha Ardalani, Praveen Paritosh, Lilith Bat-Leah, Ce Zhang, James Zou, Carole-Jean Wu, Cody Coleman, Andrew Ng, Peter Mattson, Vijay Janapa Reddi

DataSist: A Python-based library for easy data analysis, visualization and modeling

A large amount of data is produced every second from modern information systems such as mobile devices, the world wide web, Internet of Things, social media, etc. Analysis and mining of this massive data requires a lot of advanced tools and…

Machine Learning · Computer Science 2020-01-13 Rising Odegua, Festus Ikpotokin

Event Detection Explorer: An Interactive Tool for Event Detection Exploration

Event Detection (ED) is an important task in natural language processing. In the past few years, many datasets have been introduced for advancing ED machine learning models. However, most of these datasets are under-explored because not…

Computation and Language · Computer Science 2022-04-27 Wenlong Zhang, Bhagyashree Ingale, Hamza Shabir, Tianyi Li, Tian Shi, Ping Wang

DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle

Real-world enterprise data intelligence workflows encompass data engineering that turns raw sources into analytical-ready tables and data analysis that converts those tables into decision-oriented insights. We introduce DAComp, a benchmark…

Computation and Language · Computer Science 2025-12-05 Fangyu Lei, Jinxiang Meng, Yiming Huang, Junjie Zhao, Yitong Zhang, Jianwen Luo, Xin Zou, Ruiyi Yang, Wenbo Shi, Yan Gao, Shizhu He, Zuo Wang, Qian Liu, Yang Wang, Ke Wang, Jun Zhao, Kang Liu

The Landscape of R Packages for Automated Exploratory Data Analysis

The increasing availability of large but noisy data sets with a large number of heterogeneous variables leads to increasing interest in the automation of common tasks for data analysis. The most time-consuming part of this process is…

Computation · Statistics 2019-09-19 Mateusz Staniak, Przemyslaw Biecek

PyOED: An Extensible Suite for Data Assimilation and Model-Constrained Optimal Design of Experiments

This paper describes PyOED, a highly extensible scientific package that enables developing and testing model-constrained optimal experimental design (OED) for inverse problems. Specifically, PyOED aims to be a comprehensive Python toolkit…

Mathematical Software · Computer Science 2023-12-20 Abhijit Chowdhary, Shady E. Ahmed, Ahmed Attia

Urania: Visualizing Data Analysis Pipelines for Natural Language-Based Data Exploration

Exploratory Data Analysis (EDA) is an essential yet tedious process for examining a new dataset. To facilitate it, natural language interfaces (NLIs) can help people intuitively explore the dataset via data-oriented questions. However,…

Human-Computer Interaction · Computer Science 2023-06-14 Yi Guo, Nan Cao, Xiaoyu Qi, Haoyang Li, Danqing Shi, Jing Zhang, Qing Chen, Daniel Weiskopf

A Data-Centric Perspective on Evaluating Machine Learning Models for Tabular Data

Tabular data is prevalent in real-world machine learning applications, and new models for supervised learning of tabular data are frequently proposed. Comparative studies assessing the performance of models typically consist of…

Machine Learning · Computer Science 2024-12-19 Andrej Tschalzev, Sascha Marton, Stefan Lüdtke, Christian Bartelt, Heiner Stuckenschmidt

Empirical Normalization for Quadratic Discriminant Analysis and Classifying Cancer Subtypes

We introduce a new discriminant analysis method (Empirical Discriminant Analysis or EDA) for binary classification in machine learning. Given a dataset of feature vectors, this method defines an empirical feature map transforming the…

Machine Learning · Statistics 2012-10-30 Mark A. Kon, Nikolay Nikolaev

Scaling Up Estimation of Distribution Algorithms For Continuous Optimization

Since Estimation of Distribution Algorithms (EDA) were proposed, many attempts have been made to improve EDAs' performance in the context of global optimization. So far, the studies or applications of multivariate probabilistic model based…

Neural and Evolutionary Computing · Computer Science 2011-11-10 Weishan Dong, Tianshi Chen, Peter Tino, Xin Yao