Related papers: A Static Analysis Framework for Data Science Noteb…

200 papers

Data science pipelines to train and evaluate models with machine learning may contain bugs just like any other code. Leakage between training and test data can lead to overestimating the model's accuracy during offline evaluations, possibly…

Software Engineering · Computer Science 2022-09-08 Chenyang Yang, Rachel A Brower-Sinning, Grace A. Lewis, Christian Kästner

Saving, or checkpointing, intermediate results during interactive data exploration can potentially boost user productivity. However, existing studies on this topic are limited, as they primarily rely on small-scale experiments with human…

Human-Computer Interaction · Computer Science 2025-04-03 Hanxi Fang, Supawit Chockchowwat, Hari Sundaram, Yongjoo Park

Background. Jupyter notebooks are one of the main tools used by data scientists. Notebooks include features (configuration scripts, markdown, images, etc.) that make them challenging to analyze compared to traditional software. As a result,…

Software Engineering · Computer Science 2025-07-28 Wenyuan Jiang, Diany Pressato, Harsh Darji, Thibaud Lutellier

It is important for researchers to understand precisely how data scientists turn raw data into insights, including typical programming patterns, workflow, and methodology. This paper contributes a novel system, called DataInquirer, that…

Human-Computer Interaction · Computer Science 2024-05-29 Jinjin Zhao, Avigdor Gal, Sanjay Krishnan

How can we develop visual analytics (VA) tools that can be easily adopted? Visualization researchers have developed a large number of web-based VA tools to help data scientists in a wide range of tasks. However, adopting these standalone…

Human-Computer Interaction · Computer Science 2023-05-16 Zijie J. Wang, David Munechika, Seongmin Lee, Duen Horng Chau

The massive trend of integrating data-driven AI capabilities into traditional software systems is raising new and intriguing challenges. One such challenge is achieving a smooth transition from the explorative phase of Machine Learning…

Software Engineering · Computer Science 2022-05-25 Luigi Quaranta

How can we better organize code in computational notebooks? Notebooks have become a popular tool among data scientists, as they seamlessly weave text and code together, supporting users to rapidly iterate and document code experiments.…

Human-Computer Interaction · Computer Science 2022-02-23 Zijie J. Wang, Katie Dai, W. Keith Edwards

Just like other software, spreadsheets can contain significant faults. Static analysis is an accepted and well-established technique in software engineering known for its capability to discover faults. In recent years, a growing number of…

Software Engineering · Computer Science 2014-01-30 Daniel Kulesz, Jan-Peter Ostberg

Data scientists often use notebooks to develop Data Science (DS) pipelines, particularly since they allow users to selectively execute parts of the pipeline. However, notebooks for DS have many well-known flaws. We focus on the following ones in…

Software Engineering · Computer Science 2023-04-10 Lars Reimann, Günter Kniesel-Wünsche

Data exploration is an important aspect of the workflow of mixed-methods researchers, who conduct both qualitative and quantitative analysis. However, there currently exist few tools that adequately support both types of analysis…

Human-Computer Interaction · Computer Science 2024-05-31 Jiawen Stefanie Zhu, Zibo Zhang, Jian Zhao

This paper proposes the use of notebooks for the design documentation and tool interaction in the rigorous design of embedded systems. Conventionally, a notebook is a sequence of cells alternating between (textual) code and prose to form a…

Software Engineering · Computer Science 2018-11-28 Spencer Park, Emil Sekerinski

Computational notebooks are notoriously prone to reproducibility failures. By permitting out-of-order cell execution, notebooks accumulate hidden state and implicit dependencies that cause interactive executions to silently diverge from…

Programming Languages · Computer Science 2026-05-05 Stephen N. Freund, Emery D. Berger, Cormac Flanagan, Eunice Jun

Computational notebooks, which integrate code, documentation, tags, and visualizations into a single document, have become increasingly popular for data analysis tasks. With the advent of immersive technologies, these notebooks have evolved…

Human-Computer Interaction · Computer Science 2025-08-21 Sungwon In, Ayush Roy, Eric Krokos, Kirsten Whitley, Chris North, Yalong Yang

The development of data science expertise requires tacit, process-oriented skills that are difficult to teach directly. This study addresses the resulting challenge of empirically understanding how the problem-solving processes of experts…

Computers and Society · Computer Science 2026-02-18 Manuel Valle Torre, Marcus Specht, Catharine Oertel

Data science workflows are human-centered processes involving on-demand programming and analysis. While programmable and interactive interfaces such as widgets embedded within computational notebooks are suitable for these workflows, they…

Human-Computer Interaction · Computer Science 2023-03-27 Frederick Choi, Sajjadur Rahman, Hannah Kim, Dan Zhang

In software engineering, numerous studies have focused on the analysis of fine-grained logs, leading to significant innovations in areas such as refactoring, security, and code completion. However, no similar studies have been conducted for…

Computational notebooks (e.g., Jupyter, Google Colab) are widely used for interactive data science and machine learning. In those frameworks, users can start a session, then execute cells (i.e., a set of statements) to create variables,…

Databases · Computer Science 2025-03-12 Zhaoheng Li, Pranav Gor, Rahul Prabhu, Hui Yu, Yuzhou Mao, Yongjoo Park

Designing a static analysis is generally a substantial undertaking, requiring significant expertise in both program analysis and the domain of the program analysis, and significant development resources. As a result, most program analyses…

Programming Languages · Computer Science 2018-10-17 Colin S. Gordon

Interactive visualization can support fluid exploration but is often limited to predetermined tasks. Scripting can support a vast range of queries but may be more cumbersome for free-form exploration. Embedding interactive visualization in…

Computational notebooks, such as Jupyter notebooks, are interactive computing environments that are ubiquitous among data scientists to perform data wrangling and analytic tasks. To measure the performance of AI pair programmers that…