Related papers: Principles for data analysis workflows

200 papers

It is recommended that teacher-scholars of data science adopt reproducible workflows in their research as scholars and teach reproducible workflows to their students. In this paper, we propose a third dimension to reproducibility practices…

Other Statistics · Statistics 2024-07-23 Mine Dogucu, Mine Cetinkaya-Rundel

With the advent of open source software, a veritable treasure trove of previously proprietary software development data was made available. This opened the field of empirical software engineering research to anyone in academia. Data that is…

Software Engineering · Computer Science 2022-04-19 Adam Tutko, Austin Z. Henley, Audris Mockus

Systematic reviews, which entail the extraction of data from large numbers of scientific documents, are an ideal avenue for the application of machine learning. They are vital to many fields of science and philanthropy, but are very…

Computation and Language · Computer Science 2020-10-12 Seraphina Goldfarb-Tarrant, Alexander Robertson, Jasmina Lazic, Theodora Tsouloufi, Louise Donnison, Karen Smyth

Data science requires time-consuming iterative manual activities. In particular, activities such as data selection, preprocessing, transformation, and mining depend heavily on iterative trial-and-error processes that could be sped up…

Data mining is about obtaining new knowledge from existing datasets. However, the data in the existing datasets can be scattered, noisy, and even incomplete. Although lots of effort is spent on developing or fine-tuning data mining models…

Machine Learning · Computer Science 2019-06-21 Canchen Li

The data science revolution has led to an increased interest in the practice of data analysis. While much has been written about statistical thinking, a complementary form of thinking that appears in the practice of data analysis is design…

Methodology · Statistics 2023-05-24 Lucy D'Agostino McGowan, Roger D. Peng, Stephanie C. Hicks

The availability of both structured and unstructured databases, such as electronic health data, social media data, patent data, and surveys that are often updated in real time, among others, has grown rapidly over the past decade. With this…

Databases · Computer Science 2023-07-26 Rebecca C. Steorts

Data cleaning is the initial stage of any machine learning project and is one of the most critical processes in data analysis. It is a critical step in ensuring that the dataset is devoid of incorrect or erroneous data. It can be done…

Databases · Computer Science 2021-09-16 Ga Young Lee, Lubna Alzamil, Bakhtiyar Doskenov, Arash Termehchy

It is important for researchers to understand precisely how data scientists turn raw data into insights, including typical programming patterns, workflow, and methodology. This paper contributes a novel system, called DataInquirer, that…

Human-Computer Interaction · Computer Science 2024-05-29 Jinjin Zhao, Avidgor Gal, Sanjay Krishnan

Reproducibility is a confused terminology. In this paper, I take a fundamental view on reproducibility rooted in the scientific method. The scientific method is analysed and characterised in order to develop the terminology required to…

Machine Learning · Computer Science 2022-01-19 Odd Erik Gundersen

The high incidence of irreproducible research has led to urgent appeals for transparency and equitable practices in open science. For the scientific disciplines that rely on computationally intensive analyses of large data sets, a granular…

In the data science courses at the University of British Columbia, we define data science as the study, development and practice of reproducible and auditable processes to obtain insight from data. While reproducibility is core to our…

Computers and Society · Computer Science 2022-07-26 Joel Ostblom, Tiffany Timbers

An increasing number of software systems today include data science components for descriptive, predictive, and prescriptive analytics. The collection of data science stages from acquisition, to cleaning/curation, to modeling,…

Software Engineering · Computer Science 2022-02-15 Sumon Biswas, Mohammad Wardat, Hridesh Rajan

Process mining offers techniques to exploit event data by providing insights and recommendations to improve business processes. The growing number of algorithms for process discovery has raised the question of which algorithms perform best…

Software Engineering · Computer Science 2018-06-20 Toon Jouck, Alfredo Bolt, Benoît Depaire, Massimiliano de Leoni, Wil M. P. van der Aalst

One of the foundations of science is that researchers must publish the methodology used to achieve their results so that others can attempt to reproduce them. This has the added benefit of allowing methods to be adopted and adapted for…

Databases · Computer Science 2014-06-05 Paolo Missier, Simon Woodman, Hugo Hiden, Paul Watson

Deep learning has grown tremendously over recent years, yielding state-of-the-art results in various fields. However, training such models requires huge amounts of data, increasing the computational time and cost. To address this, dataset…

Machine Learning · Computer Science 2023-07-18 Murad Tukan, Alaa Maalouf, Margarita Osadchy

How do analysis goals and context affect exploratory data analysis (EDA)? To investigate this question, we conducted semi-structured interviews with 18 data analysts. We characterize common exploration goals: profiling (assessing data…

Human-Computer Interaction · Computer Science 2019-11-05 Kanit Wongsuphasawat, Yang Liu, Jeffrey Heer

Exploring data requires a fast feedback loop from the analyst to the system, with a latency below about 10 seconds because of human cognitive limitations. When data becomes large or analysis becomes complex, sequential computations can no…

Human-Computer Interaction · Computer Science 2016-07-19 Jean-Daniel Fekete, Romain Primet

As data have become more prevalent in academia, industry, and daily life, it is imperative that undergraduate students are equipped with the skills needed to analyze data in the modern environment. In recent years there has been a lot of…

Other Statistics · Statistics 2023-07-10 Maria Tackett

Improvements in computational and experimental capabilities are rapidly increasing the amount of scientific data that is routinely generated. In applications that are constrained by memory and computational intensity, excessively large…

Machine Learning · Computer Science 2023-02-28 Malik Hassanaly, Bruce A. Perry, Michael E. Mueller, Shashank Yellapantula