Related papers: Data Makes Better Data Scientists
It is important for researchers to understand precisely how data scientists turn raw data into insights, including typical programming patterns, workflow, and methodology. This paper contributes a novel system, called DataInquirer, that…
By bringing together code, text, and examples, Jupyter notebooks have become one of the most popular means to produce scientific results in a productive and reproducible way. As many of the notebook authors are experts in their scientific…
Interactive notebooks, such as Jupyter, have revolutionized the field of data science by providing an integrated environment for data, code, and documentation. However, their adoption by robotics researchers and model developers has been…
In software engineering, numerous studies have focused on the analysis of fine-grained logs, leading to significant innovations in areas such as refactoring, security, and code completion. However, no similar studies have been conducted for…
Despite the widespread adoption of computational notebooks, little is known about best practices for their usage in collaborative contexts. In this paper, we fill this gap by eliciting a catalog of best practices for collaborative data…
Computational notebooks are intended to prioritize the needs of scientists, but little is known about how scientists interact with notebooks, what requirements drive scientists' software development processes, or what tactics scientists use…
The development of data science expertise requires tacit, process-oriented skills that are difficult to teach directly. This study addresses the resulting challenge of empirically understanding how the problem-solving processes of experts…
Background. Jupyter notebooks are one of the main tools used by data scientists. Notebooks include features (configuration scripts, markdown, images, etc.) that make them challenging to analyze compared to traditional software. As a result,…
Many data science students and practitioners don't see the value in making time to learn and adopt good coding practices as long as the code "works". However, code standards are an important part of modern data science practice, and they…
The massive trend of integrating data-driven AI capabilities into traditional software systems is raising intriguing new challenges. One such challenge is achieving a smooth transition from the explorative phase of Machine Learning…
Reproducibility of computational studies is a hallmark of scientific methodology. It enables researchers to build with confidence on the methods and findings of others, reuse and extend computational pipelines, and thereby drive scientific…
As scientific work becomes more computational and data intensive, research processes and results become more difficult to interpret and reproduce. In this poster, we show how the Jupyter notebook, a tool originally designed as a free…
We report a user-friendly software environment for battery data science. It is designed to streamline data management, data cleaning, and data analysis to help bridge the gap between the domain expertise of most battery scientists and the…
Notebooks provide an interactive environment for programmers to develop code, analyse data and inject interleaved visualizations in a single environment. Despite their flexibility, a major pitfall that data scientists encounter is…
Machine learning developers frequently use interactive computational notebooks, such as Jupyter notebooks, to host code for data processing and model training. Jupyter notebooks provide a convenient tool for writing machine learning…
In this paper, we detail the integration of Python data analysis into a first-year physics laboratory course, a task accomplished without significant alterations to the existing course structure. We introduced tailored laboratory…
Nowadays, numerous industries have an exceptional demand for skills in data science, such as data analysis, data mining, and machine learning. The computational notebook (e.g., Jupyter Notebook) is a well-known data science tool adopted in…
In this article we describe how we successfully incorporated data analysis in Python in a first-year laboratory course without significantly altering the course structure and without overburdening students. We show how we created and used…
Software developers use metrics to evaluate code quality and productivity, but these practices are still rare in programming education. This project bridges the gap by collecting real-time learning analytics from individual student and…
The transition from AI/ML models to production-ready AI-based systems is a challenge for both data scientists and software engineers. In this paper, we report the results of a workshop conducted in a consulting company to understand how…