Data Makes Better Data Scientists

Human-Computer Interaction 2024-05-29 v1

Abstract

With the goal of identifying common practices in data science projects, this paper proposes a framework for logging and understanding incremental code executions in Jupyter notebooks. The framework is intended to support reasoning about how insights are generated in data science and to distill key observations into best practices for data science as it is done in the wild. We present an early prototype of this framework and report an experiment in which we used it to log a machine learning project carried out by 25 undergraduate students.
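To make "logging incremental code executions" concrete, the sketch below records every execution of a notebook cell in an append-only log and recovers how a given cell changed between consecutive runs. It is not the authors' prototype. The record schema (`ExecutionRecord`, `ExecutionLog`, `record`, `history`, `edits`) is hypothetical, cell sources are handled as plain strings rather than executed, and the diffs come from the standard-library `difflib` module. In a live notebook, a recorder like this could plausibly be attached to IPython's `post_run_cell` event, but that wiring is omitted here.

```python
import difflib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ExecutionRecord:
    """One execution of one notebook cell (hypothetical schema)."""
    cell_id: str
    sequence: int       # global execution order across the whole notebook
    source: str         # cell source code at the moment it was run
    timestamp: datetime


@dataclass
class ExecutionLog:
    """Append-only log of incremental cell executions (illustrative only)."""
    records: list = field(default_factory=list)

    def record(self, cell_id: str, source: str) -> ExecutionRecord:
        # Store a snapshot of the cell source together with its execution order.
        rec = ExecutionRecord(
            cell_id=cell_id,
            sequence=len(self.records) + 1,
            source=source,
            timestamp=datetime.now(timezone.utc),
        )
        self.records.append(rec)
        return rec

    def history(self, cell_id: str) -> list:
        """All recorded versions of one cell, in execution order."""
        return [r for r in self.records if r.cell_id == cell_id]

    def edits(self, cell_id: str) -> list:
        """Unified diffs between consecutive executions of the same cell."""
        versions = self.history(cell_id)
        diffs = []
        for prev, curr in zip(versions, versions[1:]):
            diff = difflib.unified_diff(
                prev.source.splitlines(),
                curr.source.splitlines(),
                fromfile=f"{cell_id}@{prev.sequence}",
                tofile=f"{cell_id}@{curr.sequence}",
                lineterm="",
            )
            diffs.append("\n".join(diff))
        return diffs


if __name__ == "__main__":
    log = ExecutionLog()
    log.record("c1", "import pandas as pd\ndf = pd.read_csv('train.csv')")
    log.record("c2", "model = LogisticRegression()")
    log.record("c2", "model = LogisticRegression(C=0.1)")
    # Show how the modelling cell was revised between its two runs.
    for d in log.edits("c2"):
        print(d)
```

Keeping the full source of every execution, rather than only the final notebook, is what makes the trial-and-error path (for example, a hyperparameter being tuned across reruns) visible for later analysis.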

Cite

@article{arxiv.2405.17690,
  title  = {Data Makes Better Data Scientists},
  author = {Jinjin Zhao and Avigdor Gal and Sanjay Krishnan},
  journal= {arXiv preprint arXiv:2405.17690},
  year   = {2024}
}