Related papers: nuts-flow/ml: data pre-processing for deep learnin…

200 papers

Machine learning (ML) research and application often involve time-consuming steps such as model architecture prototyping, feature selection, and dataset preparation. To support these tasks, we introduce the Deep Fast Machine Learning Utils…

Machine Learning · Computer Science 2024-09-17 Fabi Prezja

Much of the work in metalearning has focused on classifier selection, combined more recently with hyperparameter optimization, with little concern for data preprocessing. Yet, it is generally well accepted that machine learning applications…

Machine Learning · Computer Science 2018-10-24 Brandon Schoenfeld, Christophe Giraud-Carrier, Mason Poggemann, Jarom Christensen, Kevin Seppi

In this paper, we primarily focus on understanding the data preprocessing pipeline for DNN Training in the public cloud. First, we run experiments to test the performance implications of the two major data preprocessing methods using either…

Machine Learning · Computer Science 2023-04-19 Ping Gong, Yuxin Ma, Cheng Li, Xiaosong Ma, Sam H. Noh

Data mining is about obtaining new knowledge from existing datasets. However, the data in the existing datasets can be scattered, noisy, and even incomplete. Although lots of effort is spent on developing or fine-tuning data mining models…

Machine Learning · Computer Science 2019-06-21 Canchen Li

We present a study of deep learning applied to the domain of network traffic data forecasting. This is a very important ingredient for network traffic engineering, e.g., intelligent routing, which can optimize network performance,…

Machine Learning · Computer Science 2019-09-13 Benedikt Pfülb, Christoph Hardegen, Alexander Gepperth, Sebastian Rieger

Researchers have been highly active to investigate the classical machine learning workflow and integrate best practices from the software engineering lifecycle. However, deep learning exhibits deviations that are not yet covered in this…

Software Engineering · Computer Science 2022-08-30 Janosch Baltensperger, Pasquale Salza, Harald C. Gall

Modern approach to artificial intelligence (AI) aims to design algorithms that learn directly from data. This approach has achieved impressive results and has contributed significantly to the progress of AI, particularly in the sphere of…

Machine Learning · Computer Science 2024-03-20 Alhassan Mumuni, Fuseini Mumuni

Preprocessing pipelines in deep learning aim to provide sufficient data throughput to keep the training processes busy. Maximizing resource utilization is becoming more challenging as the throughput of training processes increases with…

Machine Learning · Computer Science 2022-03-28 Alexander Isenko, Ruben Mayer, Jeffrey Jedele, Hans-Arno Jacobsen

In this paper, we present our vision of differentiable ML pipelines called DiffML to automate the construction of ML pipelines in an end-to-end fashion. The idea is that DiffML allows to jointly train not just the ML model itself but also…

Databases · Computer Science 2022-07-06 Benjamin Hilprecht, Christian Hammacher, Eduardo Reis, Mohamed Abdelaal, Carsten Binnig

Machine learning techniques play a preponderant role in dealing with massive amounts of data and are employed in almost every possible domain. Building a high quality machine learning model to be deployed in production is a challenging…

Machine Learning · Computer Science 2019-07-02 Alexandre Quemy

Machine Learning (ML) has become a fast-growing, trending approach in solution development in practice. Deep Learning (DL) which is a subset of ML, learns using deep neural networks to simulate the human brain. It trains machines to learn…

Software Engineering · Computer Science 2022-02-23 Nipuni Hewage, Dulani Meedeniya

Data Pipeline plays an indispensable role in tasks such as modeling machine learning and developing data products. With the increasing diversification and complexity of Data sources, as well as the rapid growth of data volumes, building an…

Machine Learning · Computer Science 2024-02-21 Jiang Wu, Hongbo Wang, Chunhe Ni, Chenwei Zhang, Wenran Lu

The explosion of digital data has created multiple opportunities for organizations and individuals to leverage machine learning (ML) to transform the way they operate. However, the shortage of experts in the field of machine learning --…

Machine Learning · Computer Science 2019-11-21 Doron Laadan, Roman Vainshtein, Yarden Curiel, Gilad Katz, Lior Rokach

Training machine learning models requires feeding input data for models to ingest. Input pipelines for machine learning jobs are often challenging to implement efficiently as they require reading large volumes of data, applying complex…

Machine Learning · Computer Science 2021-02-25 Derek G. Murray, Jiri Simsa, Ana Klimovic, Ihor Indyk

Machine learning (ML) offers powerful methods for detecting and modeling associations often in data with large feature spaces and complex associations. Many useful tools/packages (e.g. scikit-learn) have been developed to make the various…

Machine Learning · Computer Science 2022-06-27 Ryan J. Urbanowicz, Robert Zhang, Yuhan Cui, Pranshu Suri

Despite strong results on many tasks, multimodal large language models (MLLMs) still underperform on visual mathematical problem solving, especially in reliably perceiving and interpreting diagrams. Inspired by human problem-solving, we…

Computer Vision and Pattern Recognition · Computer Science 2026-04-21 Shuhang Chen, Hangjie Yuan, Yunqiu Xu, Pengwei Liu, Tao Feng, Jun Cen, Zeying Huang, Yi Yang

Machine learning models are routinely integrated into process mining pipelines to carry out tasks like data transformation, noise reduction, anomaly detection, classification, and prediction. Often, the design of such models is based on…

Machine Learning · Computer Science 2024-02-21 Paolo Ceravolo, Sylvio Barbon Junior, Ernesto Damiani, Wil van der Aalst

Clouds gather a vast volume of telemetry from their networked systems which contain valuable information that can help solve many of the problems that continue to plague them. However, it is hard to extract useful information from such raw…

Networking and Internet Architecture · Computer Science 2020-04-28 Behnaz Arzani, Bita Rouhani

Synthetic datasets play a critical role in pre-training CNN models for optical flow, but they are painstaking to generate and hard to adapt to new applications. To automate the process, we present AutoFlow, a simple and effective method to…

Computer Vision and Pattern Recognition · Computer Science 2021-04-30 Deqing Sun, Daniel Vlasic, Charles Herrmann, Varun Jampani, Michael Krainin, Huiwen Chang, Ramin Zabih, William T. Freeman, Ce Liu

Large language models (LLMs) rely on pretraining on massive and heterogeneous corpora, where training data composition has a decisive impact on training efficiency and downstream generalization under realistic compute and data budget…

Computation and Language · Computer Science 2026-04-21 Zhuo Chen, Yuxuan Miao, Supryadi, Deyi Xiong