Related papers: DataPrep.EDA: Task-Centric Exploratory Data Analys…

200 papers

This paper introduces SmartEDA, which is an R package for performing Exploratory data analysis (EDA). EDA is generally the first step that one needs to perform before developing any machine learning or statistical models. The goal of EDA is…

Computation · Statistics 2020-08-10 Sayan Putatunda, Kiran Rama, Dayananda Ubrangala, Ravi Kondapalli

Using computational notebooks (e.g., Jupyter Notebook), data scientists rationalize their exploratory data analysis (EDA) based on their prior experience and external knowledge such as online examples. For novices or data scientists who…

Human-Computer Interaction · Computer Science 2021-12-16 Xingjun Li, Yizhi Zhang, Justin Leung, Chengnian Sun, Jian Zhao

Exploratory data analysis (EDA) is a vital procedure for data science projects. In this work, we introduce a stable equilibrium point (SEP) - based framework for improving the efficiency and solution quality of EDA. By exploiting the SEPs…

Machine Learning · Computer Science 2023-06-08 Yuxuan Song, Yongyu Wang

The outcome of the explorative data analysis (EDA) phase is vital for successful data analysis. EDA is more effective when the user interacts with the system used to carry out the exploration. In the recently proposed paradigm of iterative…

Machine Learning · Statistics 2018-04-11 Andreas Henelius, Emilia Oikarinen, Kai Puolamäki

How do analysis goals and context affect exploratory data analysis (EDA)? To investigate this question, we conducted semi-structured interviews with 18 data analysts. We characterize common exploration goals: profiling (assessing data…

Human-Computer Interaction · Computer Science 2019-11-05 Kanit Wongsuphasawat, Yang Liu, Jeffrey Heer

Exploratory data analysis (EDA), coupled with SQL, is essential for data analysts involved in data exploration and analysis. However, data analysts often encounter two primary challenges: (1) the need to craft SQL queries skillfully, and…

Visual exploration of high-dimensional real-valued datasets is a fundamental task in exploratory data analysis (EDA). Existing methods use predefined criteria to choose the representation of data. There is a lack of methods that (i) elicit…

Machine Learning · Statistics 2021-11-08 Kai Puolamäki, Emilia Oikarinen, Bo Kang, Jefrey Lijffijt, Tijl De Bie

Exploratory data analysis (EDA) is an essential step for analyzing a dataset to derive insights. Several EDA techniques have been explored in the literature. Many of them leverage visualizations through various plots. But it is not easy to…

Computation and Language · Computer Science 2024-07-19 Ritwik Chaudhuri, Rajmohan C, Kirushikesh DB, Arvind Agarwal

Python data science libraries such as Pandas and NumPy have recently gained immense popularity. Although these libraries are feature-rich and easy to use, their scalability limitations require more robust computational resources. In this…

Databases · Computer Science 2024-07-17 Hesam Shahrokhi, Amirali Kaboli, Mahdi Ghorbani, Amir Shaikhha

Recommender systems have demonstrated significant impact across diverse domains, yet ensuring the reproducibility of experimental findings remains a persistent challenge. A primary obstacle lies in the fragmented and often opaque data…

A large amount of data is produced every second from modern information systems such as mobile devices, the world wide web, Internet of Things, social media, etc. Analysis and mining of this massive data requires a lot of advanced tools and…

Machine Learning · Computer Science 2020-01-13 Rising Odegua, Festus Ikpotokin

Event Detection (ED) is an important task in natural language processing. In the past few years, many datasets have been introduced for advancing ED machine learning models. However, most of these datasets are under-explored because not…

Computation and Language · Computer Science 2022-04-27 Wenlong Zhang, Bhagyashree Ingale, Hamza Shabir, Tianyi Li, Tian Shi, Ping Wang

Real-world enterprise data intelligence workflows encompass data engineering that turns raw sources into analytical-ready tables and data analysis that converts those tables into decision-oriented insights. We introduce DAComp, a benchmark…

Computation and Language · Computer Science 2025-12-05 Fangyu Lei, Jinxiang Meng, Yiming Huang, Junjie Zhao, Yitong Zhang, Jianwen Luo, Xin Zou, Ruiyi Yang, Wenbo Shi, Yan Gao, Shizhu He, Zuo Wang, Qian Liu, Yang Wang, Ke Wang, Jun Zhao, Kang Liu

The increasing availability of large but noisy data sets with a large number of heterogeneous variables leads to the increasing interest in the automation of common tasks for data analysis. The most time-consuming part of this process is…

Computation · Statistics 2019-09-19 Mateusz Staniak, Przemyslaw Biecek

This paper describes PyOED, a highly extensible scientific package that enables developing and testing model-constrained optimal experimental design (OED) for inverse problems. Specifically, PyOED aims to be a comprehensive Python toolkit…

Mathematical Software · Computer Science 2023-12-20 Abhijit Chowdhary, Shady E. Ahmed, Ahmed Attia

Exploratory Data Analysis (EDA) is an essential yet tedious process for examining a new dataset. To facilitate it, natural language interfaces (NLIs) can help people intuitively explore the dataset via data-oriented questions. However,…

Human-Computer Interaction · Computer Science 2023-06-14 Yi Guo, Nan Cao, Xiaoyu Qi, Haoyang Li, Danqing Shi, Jing Zhang, Qing Chen, Daniel Weiskopf

Tabular data is prevalent in real-world machine learning applications, and new models for supervised learning of tabular data are frequently proposed. Comparative studies assessing the performance of models typically consist of…

Machine Learning · Computer Science 2024-12-19 Andrej Tschalzev, Sascha Marton, Stefan Lüdtke, Christian Bartelt, Heiner Stuckenschmidt

We introduce a new discriminant analysis method (Empirical Discriminant Analysis or EDA) for binary classification in machine learning. Given a dataset of feature vectors, this method defines an empirical feature map transforming the…

Machine Learning · Statistics 2012-10-30 Mark A. Kon, Nikolay Nikolaev

Since Estimation of Distribution Algorithms (EDA) were proposed, many attempts have been made to improve EDAs' performance in the context of global optimization. So far, the studies or applications of multivariate probabilistic model based…

Neural and Evolutionary Computing · Computer Science 2011-11-10 Weishan Dong, Tianshi Chen, Peter Tino, Xin Yao