Related papers: nuts-flow/ml: data pre-processing for deep learnin…

Deep Fast Machine Learning Utils: A Python Library for Streamlined Machine Learning Prototyping

Machine learning (ML) research and application often involve time-consuming steps such as model architecture prototyping, feature selection, and dataset preparation. To support these tasks, we introduce the Deep Fast Machine Learning Utils…

Machine Learning · Computer Science 2024-09-17 Fabi Prezja

Preprocessor Selection for Machine Learning Pipelines

Much of the work in metalearning has focused on classifier selection, combined more recently with hyperparameter optimization, with little concern for data preprocessing. Yet, it is generally well accepted that machine learning applications…

Machine Learning · Computer Science 2018-10-24 Brandon Schoenfeld, Christophe Giraud-Carrier, Mason Poggemann, Jarom Christensen, Kevin Seppi

Understand Data Preprocessing for Effective End-to-End Training of Deep Neural Networks

In this paper, we primarily focus on understanding the data preprocessing pipeline for DNN Training in the public cloud. First, we run experiments to test the performance implications of the two major data preprocessing methods using either…

Machine Learning · Computer Science 2023-04-19 Ping Gong, Yuxin Ma, Cheng Li, Xiaosong Ma, Sam H. Noh

Preprocessing Methods and Pipelines of Data Mining: An Overview

Data mining is about obtaining new knowledge from existing datasets. However, the data in the existing datasets can be scattered, noisy, and even incomplete. Although lots of effort is spent on developing or fine-tuning data mining models…

Machine Learning · Computer Science 2019-06-21 Canchen Li

A Study of Deep Learning for Network Traffic Data Forecasting

We present a study of deep learning applied to the domain of network traffic data forecasting. This is a very important ingredient for network traffic engineering, e.g., intelligent routing, which can optimize network performance,…

Machine Learning · Computer Science 2019-09-13 Benedikt Pfülb, Christoph Hardegen, Alexander Gepperth, Sebastian Rieger

Continuous Deep Learning: A Workflow to Bring Models into Production

Researchers have been highly active to investigate the classical machine learning workflow and integrate best practices from the software engineering lifecycle. However, deep learning exhibits deviations that are not yet covered in this…

Software Engineering · Computer Science 2022-08-30 Janosch Baltensperger, Pasquale Salza, Harald C. Gall

Automated data processing and feature engineering for deep learning and big data applications: a survey

The modern approach to artificial intelligence (AI) aims to design algorithms that learn directly from data. This approach has achieved impressive results and has contributed significantly to the progress of AI, particularly in the sphere of…

Machine Learning · Computer Science 2024-03-20 Alhassan Mumuni, Fuseini Mumuni

Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines

Preprocessing pipelines in deep learning aim to provide sufficient data throughput to keep the training processes busy. Maximizing resource utilization is becoming more challenging as the throughput of training processes increases with…

Machine Learning · Computer Science 2022-03-28 Alexander Isenko, Ruben Mayer, Jeffrey Jedele, Hans-Arno Jacobsen

DiffML: End-to-end Differentiable ML Pipelines

In this paper, we present our vision of differentiable ML pipelines called DiffML to automate the construction of ML pipelines in an end-to-end fashion. The idea is that DiffML allows to jointly train not just the ML model itself but also…

Databases · Computer Science 2022-07-06 Benjamin Hilprecht, Christian Hammacher, Eduardo Reis, Mohamed Abdelaal, Carsten Binnig

Two-stage Optimization for Machine Learning Workflow

Machine learning techniques play a preponderant role in dealing with massive amounts of data and are employed in almost every possible domain. Building a high-quality machine learning model to be deployed in production is a challenging…

Machine Learning · Computer Science 2019-07-02 Alexandre Quemy

Machine Learning Operations: A Survey on MLOps Tool Support

Machine Learning (ML) has become a fast-growing, trending approach in solution development in practice. Deep Learning (DL) which is a subset of ML, learns using deep neural networks to simulate the human brain. It trains machines to learn…

Software Engineering · Computer Science 2022-02-23 Nipuni Hewage, Dulani Meedeniya

Data Pipeline Training: Integrating AutoML to Optimize the Data Flow of Machine Learning Models

Data Pipeline plays an indispensable role in tasks such as modeling machine learning and developing data products. With the increasing diversification and complexity of Data sources, as well as the rapid growth of data volumes, building an…

Machine Learning · Computer Science 2024-02-21 Jiang Wu, Hongbo Wang, Chunhe Ni, Chenwei Zhang, Wenran Lu

RankML: a Meta Learning-Based Approach for Pre-Ranking Machine Learning Pipelines

The explosion of digital data has created multiple opportunities for organizations and individuals to leverage machine learning (ML) to transform the way they operate. However, the shortage of experts in the field of machine learning --…

Machine Learning · Computer Science 2019-11-21 Doron Laadan, Roman Vainshtein, Yarden Curiel, Gilad Katz, Lior Rokach

tf.data: A Machine Learning Data Processing Framework

Training machine learning models requires feeding input data for models to ingest. Input pipelines for machine learning jobs are often challenging to implement efficiently as they require reading large volumes of data, applying complex…

Machine Learning · Computer Science 2021-02-25 Derek G. Murray, Jiri Simsa, Ana Klimovic, Ihor Indyk

STREAMLINE: A Simple, Transparent, End-To-End Automated Machine Learning Pipeline Facilitating Data Analysis and Algorithm Comparison

Machine learning (ML) offers powerful methods for detecting and modeling associations often in data with large feature spaces and complex associations. Many useful tools/packages (e.g. scikit-learn) have been developed to make the various…

Machine Learning · Computer Science 2022-06-27 Ryan J. Urbanowicz, Robert Zhang, Yuhan Cui, Pranshu Suri

MathFlow: Enhancing the Perceptual Flow of MLLMs for Visual Mathematical Problems

Despite strong results on many tasks, multimodal large language models (MLLMs) still underperform on visual mathematical problem solving, especially in reliably perceiving and interpreting diagrams. Inspired by human problem-solving, we…

Computer Vision and Pattern Recognition · Computer Science 2026-04-21 Shuhang Chen, Hangjie Yuan, Yunqiu Xu, Pengwei Liu, Tao Feng, Jun Cen, Zeying Huang, Yi Yang

Tailoring Machine Learning for Process Mining

Machine learning models are routinely integrated into process mining pipelines to carry out tasks like data transformation, noise reduction, anomaly detection, classification, and prediction. Often, the design of such models is based on…

Machine Learning · Computer Science 2024-02-21 Paolo Ceravolo, Sylvio Barbon Junior, Ernesto Damiani, Wil van der Aalst

Towards A Domain-Customized Automated Machine Learning Framework For Networks and Systems

Clouds gather a vast volume of telemetry from their networked systems which contain valuable information that can help solve many of the problems that continue to plague them. However, it is hard to extract useful information from such raw…

Networking and Internet Architecture · Computer Science 2020-04-28 Behnaz Arzani, Bita Rouhani

AutoFlow: Learning a Better Training Set for Optical Flow

Synthetic datasets play a critical role in pre-training CNN models for optical flow, but they are painstaking to generate and hard to adapt to new applications. To automate the process, we present AutoFlow, a simple and effective method to…

Computer Vision and Pattern Recognition · Computer Science 2021-04-30 Deqing Sun, Daniel Vlasic, Charles Herrmann, Varun Jampani, Michael Krainin, Huiwen Chang, Ramin Zabih, William T. Freeman, Ce Liu

Data Mixing for Large Language Models Pretraining: A Survey and Outlook

Large language models (LLMs) rely on pretraining on massive and heterogeneous corpora, where training data composition has a decisive impact on training efficiency and downstream generalization under realistic compute and data budget…

Computation and Language · Computer Science 2026-04-21 Zhuo Chen, Yuxuan Miao, Supryadi, Deyi Xiong