Related papers: Enabling Collaborative Data Science Development wi…

Collaboration Challenges in Building ML-Enabled Systems: Communication, Documentation, Engineering, and Process

The introduction of machine learning (ML) components in software projects has created the need for software engineers to collaborate with data scientists and other specialists. While collaboration can always be challenging, ML introduces…

Software Engineering · Computer Science 2022-02-14 Nadia Nahar, Shurui Zhou, Grace Lewis, Christian Kästner

BEAT: An Open-Source Web-Based Open-Science Platform

With the increased interest in computational sciences, machine learning (ML), pattern recognition (PR) and big data, governmental agencies, academia and manufacturers are overwhelmed by the constant influx of new algorithms and techniques…

Software Engineering · Computer Science 2017-07-28 André Anjos, Laurent El-Shafey, Sébastien Marcel

Meeting in the notebook: a notebook-based environment for micro-submissions in data science collaborations

Developers in data science and other domains frequently use computational notebooks to create exploratory analyses and prototype models. However, they often struggle to incorporate existing software engineering tooling into these…

Human-Computer Interaction · Computer Science 2021-03-30 Micah J. Smith, Jürgen Cito, Kalyan Veeramachaneni

Data Science through the looking glass and what we found there

The recent success of machine learning (ML) has led to an explosive growth both in terms of new systems and algorithms built in industry and academia, and new applications built by an ever-growing community of data science (DS)…

Machine Learning · Computer Science 2019-12-23 Fotis Psallidas, Yiwen Zhu, Bojan Karlas, Matteo Interlandi, Avrilia Floratou, Konstantinos Karanasos, Wentao Wu, Ce Zhang, Subru Krishnan, Carlo Curino, Markus Weimer

Data Engineering for Everyone

Data engineering is one of the fastest-growing fields within machine learning (ML). As ML becomes more common, the appetite for data grows more ravenous. But ML requires more data than individual teams of data engineers can readily produce,…

Machine Learning · Computer Science 2021-02-24 Vijay Janapa Reddi, Greg Diamos, Pete Warden, Peter Mattson, David Kanter

DataHub: Collaborative Data Science & Dataset Version Management at Scale

Relational databases have limited support for data collaboration, where teams collaboratively curate and analyze large datasets. Inspired by software version control systems like git, we propose (a) a dataset version control system, giving…

Databases · Computer Science 2014-09-03 Anant Bhardwaj, Souvik Bhattacherjee, Amit Chavan, Amol Deshpande, Aaron J. Elmore, Samuel Madden, Aditya G. Parameswaran

A Datalake for Data-driven Social Science Research

Social science research increasingly demands data-driven insights, yet researchers often face barriers such as lack of technical expertise, inconsistent data formats, and limited access to reliable datasets. Social science research…

Databases · Computer Science 2025-12-03 Puneet Arya, Ojas Sahasrabudhe, Adwaiya Srivastav, Partha Pratim Das, Maya Ramanath

Ensemble Toolkit: Scalable and Flexible Execution of Ensembles of Tasks

There are many science applications that require scalable task-level parallelism and support for flexible execution and coupling of ensembles of simulations. Most high-performance system software and middleware, however, are designed to…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-06-29 Vivekanandan Balasubramanian, Antons Treikalis, Ole Weidner, Shantenu Jha

Towards Effective Collaboration between Software Engineers and Data Scientists developing Machine Learning-Enabled Systems

Incorporating Machine Learning (ML) into existing systems is a demand that has grown among several organizations. However, the development of ML-enabled systems encompasses several social and technical challenges, which must be addressed by…

Software Engineering · Computer Science 2024-07-23 Gabriel Busquim, Allysson Allex Araújo, Maria Julia Lima, Marcos Kalinowski

Human-Machine Collaboration for Democratizing Data Science

Everybody wants to analyse their data, but only a few possess the data science expertise to do this. Motivated by this observation, we introduce a novel framework and system, VisualSynth, for human-machine collaboration in data science.…

Artificial Intelligence · Computer Science 2020-04-24 Clément Gautrais, Yann Dauxais, Stefano Teso, Samuel Kolb, Gust Verbruggen, Luc De Raedt

Editorial: Special Issue on Collaborative Aspects of Open Data in Software Engineering

High-quality data has become increasingly important to software engineers in designing and implementing today's software, for example, as an input to machine-learning algorithms and visualisation- and analytics-based features. Open data -…

Software Engineering · Computer Science 2022-08-02 Johan Linåker, Per Runeson, Anneke Zuiderwijk, Amanda Brock

A Case for Dataset Specific Profiling

Data-driven science is an emerging paradigm where scientific discoveries depend on the execution of computational AI models against rich, discipline-specific datasets. With modern machine learning frameworks, anyone can develop and execute…

Machine Learning · Computer Science 2022-08-09 Seth Ockerman, John Wu, Christopher Stewart

The Open MatSci ML Toolkit: A Flexible Framework for Machine Learning in Materials Science

We present the Open MatSci ML Toolkit: a flexible, self-contained, and scalable Python-based framework to apply deep learning models and methods on scientific data with a specific focus on materials science and the OpenCatalyst Dataset. Our…

Machine Learning · Computer Science 2023-09-01 Santiago Miret, Kin Long Kelvin Lee, Carmelo Gonzales, Marcel Nassar, Matthew Spellings

Data Science Methodologies: Current Challenges and Future Approaches

Data science has employed great research efforts in developing advanced analytics, improving data models and cultivating new algorithms. However, not many authors have come across the organizational and socio-technical challenges that arise…

Machine Learning · Computer Science 2022-01-17 Iñigo Martinez, Elisabeth Viles, Igor G. Olaizola

A Vision on Open Science for the Evolution of Software Engineering Research and Practice

Open Science aims to foster openness and collaboration in research, leading to more significant scientific and social impact. However, practicing Open Science comes with several challenges and is currently not properly rewarded. In this…

Software Engineering · Computer Science 2024-05-21 Edson OliveiraJr, Fernanda Madeiral, Alcemir Rodrigues Santos, Christina von Flach, Sergio Soares

CateCom: a practical data-centric approach to categorization of computational models

The advent of data-driven science in the 21st century brought about the need for well-organized structured data and associated infrastructure able to facilitate the applications of Artificial Intelligence and Machine Learning. We present an…

Databases · Computer Science 2022-03-03 Alexander Zech, Timur Bazhirov

The Network of Scientific Collaborations within the European Framework Programme

We use the emergent field of Complex Networks to analyze the network of scientific collaborations between entities (universities, research organizations, industry related companies,...) which collaborate in the context of the so-called…

Data Analysis, Statistics and Probability · Physics 2009-01-23 Juan A. Almendral, Joao G. Oliveira, L. López, J. F. F. Mendes, Miguel A. F. Sanjuán

On the Interaction between Software Engineers and Data Scientists when building Machine Learning-Enabled Systems

In recent years, Machine Learning (ML) components have been increasingly integrated into the core systems of organizations. Engineering such systems presents various challenges from both a theoretical and practical perspective. One of the…

Software Engineering · Computer Science 2024-02-09 Gabriel Busquim, Hugo Villamizar, Maria Julia Lima, Marcos Kalinowski

A Survey on Collaborating Small and Large Language Models for Performance, Cost-effectiveness, Cloud-edge Privacy, and Trustworthiness

Large language models (LLMs) have achieved remarkable progress across domains and applications but face challenges such as high fine-tuning costs, inference latency, limited edge deployability, and reliability concerns. Small language…

Computation and Language · Computer Science 2025-11-06 Fali Wang, Jihai Chen, Shuhua Yang, Ali Al-Lawati, Linli Tang, Hui Liu, Suhang Wang

AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions

Data science tasks involving tabular data present complex challenges that require sophisticated problem-solving approaches. We propose AutoKaggle, a powerful and user-centric framework that assists data scientists in completing daily data…

Artificial Intelligence · Computer Science 2024-11-07 Ziming Li, Qianbo Zang, David Ma, Jiawei Guo, Tuney Zheng, Minghao Liu, Xinyao Niu, Yue Wang, Jian Yang, Jiaheng Liu, Wanjun Zhong, Wangchunshu Zhou, Wenhao Huang, Ge Zhang