Related papers: Augmenting data-driven models for energy systems t…

On Designing Data Models for Energy Feature Stores

The digital transformation of the energy infrastructure enables new data-driven applications, often supported by machine learning models. However, domain-specific data transformations, pre-processing and management in modern data driven…

Artificial Intelligence · Computer Science 2022-09-12 Gregor Cerar, Blaž Bertalanič, Anže Pirnat, Andrej Čampa, Carolina Fortuna

Feature Engineering for Predictive Modeling using Reinforcement Learning

Feature engineering is a crucial step in the process of predictive modeling. It involves the transformation of a given feature space, typically using mathematical functions, with the objective of reducing the modeling error for a given…

Artificial Intelligence · Computer Science 2017-09-22 Udayan Khurana, Horst Samulowitz, Deepak Turaga

Code Generation for Machine Learning using Model-Driven Engineering and SysML

Data-driven engineering refers to systematic data collection and processing using machine learning to improve engineering systems. Currently, the implementation of data-driven engineering relies on fundamental data science and software…

Software Engineering · Computer Science 2024-04-10 Simon Raedler, Matthias Rupp, Eugen Rigger, Stefanie Rinderle-Ma

An Empirical Analysis of Feature Engineering for Predictive Modeling

Machine learning models, such as neural networks, decision trees, random forests, and gradient boosting machines, accept a feature vector, and provide a prediction. These models learn in a supervised fashion where we provide feature vectors…

Machine Learning · Computer Science 2020-11-03 Jeff Heaton

Feature Selection Tutorial with Python Examples

In machine learning, feature selection entails selecting a subset of the available features in a dataset to use for model development. There are many motivations for feature selection: it may result in better models, it may provide insight…

Machine Learning · Computer Science 2021-06-14 Padraig Cunningham, Bahavathy Kathirgamanathan, Sarah Jane Delany

Data Complexity-aware Deep Model Performance Forecasting

Deep learning models are widely used across computer vision and other domains. When working on the model induction, selecting the right architecture for a given dataset often relies on repetitive trial-and-error procedures. This procedure…

Machine Learning · Computer Science 2026-01-06 Yen-Chia Chen, Hsing-Kuo Pao, Hanjuan Huang

Data Engineering for the Analysis of Semiconductor Manufacturing Data

We have analyzed manufacturing data from several different semiconductor manufacturing plants, using decision tree induction software called Q-YIELD. The software generates rules for predicting when a given product should be rejected. The…

Machine Learning · Computer Science 2007-05-23 Peter D. Turney

SMARTFEAT: Efficient Feature Construction through Feature-Level Foundation Model Interactions

Before applying data analytics or machine learning to a data set, a vital step is usually the construction of an informative set of features from the data. In this paper, we present SMARTFEAT, an efficient automated feature engineering tool…

Databases · Computer Science 2024-12-17 Yin Lin, Bolin Ding, H. V. Jagadish, Jingren Zhou

Enhancing Regression Models for Complex Systems Using Evolutionary Techniques for Feature Engineering

This work proposes an automatic methodology for modeling complex systems. Our methodology is based on the combination of Grammatical Evolution and classical regression to obtain an optimal set of features that take part of a linear and…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-07-02 Patricia Arroba, José L. Risco-Martín, Marina Zapater, José M. Moya, José L. Ayala

Automated data processing and feature engineering for deep learning and big data applications: a survey

The modern approach to artificial intelligence (AI) aims to design algorithms that learn directly from data. This approach has achieved impressive results and has contributed significantly to the progress of AI, particularly in the sphere of…

Machine Learning · Computer Science 2024-03-20 Alhassan Mumuni, Fuseini Mumuni

A Data-Centric Perspective on Evaluating Machine Learning Models for Tabular Data

Tabular data is prevalent in real-world machine learning applications, and new models for supervised learning of tabular data are frequently proposed. Comparative studies assessing the performance of models typically consist of…

Machine Learning · Computer Science 2024-12-19 Andrej Tschalzev, Sascha Marton, Stefan Lüdtke, Christian Bartelt, Heiner Stuckenschmidt

An Integrated Data Processing Framework for Pretraining Foundation Models

The ability of foundation models relies heavily on large-scale, diverse, and high-quality pretraining data. In order to improve data quality, researchers and practitioners often have to manually curate datasets from different sources…

Machine Learning · Computer Science 2024-04-24 Yiding Sun, Feng Wang, Yutao Zhu, Wayne Xin Zhao, Jiaxin Mao

Embedded Constrained Feature Construction for High-Energy Physics Data Classification

Before any publication, data analysis of high-energy physics experiments must be validated. This validation is granted only if a perfect understanding of the data and the analysis process is demonstrated. Therefore, physicists prefer using…

Machine Learning · Computer Science 2019-12-18 Noëlie Cherrier, Maxime Defurne, Jean-Philippe Poli, Franck Sabatié

The autofeat Python Library for Automated Feature Engineering and Selection

This paper describes the autofeat Python library, which provides scikit-learn style linear regression and classification models with automated feature engineering and selection capabilities. Complex non-linear machine learning models, such…

Machine Learning · Computer Science 2020-02-27 Franziska Horn, Robert Pack, Michael Rieger

Prediction-Powered Inference

Prediction-powered inference is a framework for performing valid statistical inference when an experimental dataset is supplemented with predictions from a machine-learning system. The framework yields simple algorithms for computing…

Machine Learning · Statistics 2023-11-10 Anastasios N. Angelopoulos, Stephen Bates, Clara Fannjiang, Michael I. Jordan, Tijana Zrnic

A robust modeling framework for energy analysis of data centers

Global digitalization has given birth to an explosion of digital services in nearly every sector of contemporary life. Applications of artificial intelligence, blockchain technologies, and the Internet of Things are promising to…

Computers and Society · Computer Science 2020-06-15 Nuoa Lei

Feature Selection: A Data Perspective

Feature selection, as a data preprocessing strategy, has been proven to be effective and efficient in preparing data (especially high-dimensional data) for various data mining and machine learning problems. The objectives of feature…

Machine Learning · Computer Science 2018-08-28 Jundong Li, Kewei Cheng, Suhang Wang, Fred Morstatter, Robert P. Trevino, Jiliang Tang, Huan Liu

Hybrid Adaptive Modeling using Neural Networks Trained with Nonlinear Dynamics Based Features

Accurate models are essential for design, performance prediction, control, and diagnostics in complex engineering systems. Physics-based models excel during the design phase but often become outdated during system deployment due to changing…

Machine Learning · Computer Science 2025-01-22 Zihan Liu, Prashant N. Kambali, C. Nataraj

An evaluation framework for synthetic data generation models

The use of synthetic data has gained popularity as a cost-efficient strategy for data augmentation to improve machine learning model performance, as well as for addressing concerns related to sensitive data privacy.…

Machine Learning · Computer Science 2025-10-27 Ioannis E. Livieris, Nikos Alimpertis, George Domalis, Dimitris Tsakalidis

A Case for Dataset Specific Profiling

Data-driven science is an emerging paradigm where scientific discoveries depend on the execution of computational AI models against rich, discipline-specific datasets. With modern machine learning frameworks, anyone can develop and execute…

Machine Learning · Computer Science 2022-08-09 Seth Ockerman, John Wu, Christopher Stewart