Related papers: Model Selection Techniques -- An Overview

A Survey on Data Selection for Language Models

A major factor in the recent success of large language models is the use of enormous and ever-growing text datasets for unsupervised pre-training. However, naively training a model on all available data may not be optimal (or feasible), as…

Computation and Language · Computer Science 2024-08-05 Alon Albalak, Yanai Elazar, Sang Michael Xie, Shayne Longpre, Nathan Lambert, Xinyi Wang, Niklas Muennighoff, Bairu Hou, Liangming Pan, Haewon Jeong, Colin Raffel, Shiyu Chang, Tatsunori Hashimoto, William Yang Wang

Optimal subdata selection for linear model selection

If the assumed model does not accurately capture the underlying structure of the data, a statistical method is likely to yield sub-optimal results, and so model selection is crucial in order to conduct any statistical analysis. However, in…

Methodology · Statistics 2023-06-21 Vasilis Chasiotis, Dimitris Karlis

Feature Selection and Feature Extraction in Pattern Analysis: A Literature Review

Pattern analysis often requires a pre-processing stage for extracting or selecting features in order to help the classification, prediction, or clustering stage discriminate or represent the data in a better way. The reason for this…

Machine Learning · Computer Science 2019-05-09 Benyamin Ghojogh, Maria N. Samad, Sayema Asif Mashhadi, Tania Kapoor, Wahab Ali, Fakhri Karray, Mark Crowley

Model-specific Data Subsampling with Influence Functions

Model selection requires repeatedly evaluating models on a given dataset and measuring their relative performances. In modern applications of machine learning, the models being considered are increasingly more expensive to evaluate and the…

Machine Learning · Computer Science 2020-10-21 Anant Raj, Cameron Musco, Lester Mackey, Nicolo Fusi

Feature Selection: A Data Perspective

Feature selection, as a data preprocessing strategy, has been proven to be effective and efficient in preparing data (especially high-dimensional data) for various data mining and machine learning problems. The objectives of feature…

Machine Learning · Computer Science 2018-08-28 Jundong Li, Kewei Cheng, Suhang Wang, Fred Morstatter, Robert P. Trevino, Jiliang Tang, Huan Liu

Feature Selection Tutorial with Python Examples

In Machine Learning, feature selection entails selecting a subset of the available features in a dataset to use for model development. There are many motivations for feature selection: it may result in better models, it may provide insight…

Machine Learning · Computer Science 2021-06-14 Padraig Cunningham, Bahavathy Kathirgamanathan, Sarah Jane Delany

Predicting Choice with Set-Dependent Aggregation

Providing users with alternatives to choose from is an essential component in many online platforms, making the accurate prediction of choice vital to their success. A renewed interest in learning choice models has led to significant…

Machine Learning · Computer Science 2020-01-22 Nir Rosenfeld, Kojin Oshiba, Yaron Singer

Issues in Strategic Decision Modelling

[Spreadsheet] Models are invaluable tools for strategic planning. Models help key decision makers develop a shared conceptual understanding of complex decisions, identify sensitivity factors and test management scenarios. Different…

Human-Computer Interaction · Computer Science 2024-12-31 Paula Jennings

Model selection and hypothesis testing for large-scale network models with overlapping groups

The effort to understand network systems in increasing detail has resulted in a diversity of methods designed to extract their large-scale structure from data. Unfortunately, many of these methods yield diverging descriptions of the same…

Data Analysis, Statistics and Probability · Physics 2015-03-27 Tiago P. Peixoto

Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning

The correct use of model evaluation, model selection, and algorithm selection techniques is vital in academic machine learning research as well as in many industrial settings. This article reviews different techniques that can be used for…

Machine Learning · Computer Science 2020-11-12 Sebastian Raschka

Economic variable selection

Regression plays a key role in many research areas and its variable selection is a classic and major problem. This study emphasizes cost of predictors to be purchased for future use, when we select a subset of them. Its economic aspect is…

Methodology · Statistics 2021-03-19 Steven N. MacEachern, Koji Miyawaki

Model Selection for Production System via Automated Online Experiments

A challenge that machine learning practitioners in the industry face is the task of selecting the best model to deploy in production. As a model is often an intermediate component of a production system, online controlled experiments such…

Machine Learning · Statistics 2021-05-31 Zhenwen Dai, Praveen Chandar, Ghazal Fazelnia, Ben Carterette, Mounia Lalmas-Roelleke

Regression Model Selection Under General Conditions

Model selection criteria are one of the most important tools in statistics. Proofs showing a model selection criterion is asymptotically optimal are tailored to the type of model (linear regression, quantile regression, penalized…

Statistics Theory · Mathematics 2025-10-17 Amaze Lusompa

Variable Selection Methods for Model-based Clustering

Model-based clustering is a popular approach for clustering multivariate data which has seen applications in numerous fields. Nowadays, high-dimensional data are more and more common and the model-based clustering approach has adapted to…

Methodology · Statistics 2018-09-25 Michael Fop, Thomas Brendan Murphy

Do We Really Sample Right In Model-Based Diagnosis?

Statistical samples, in order to be representative, have to be drawn from a population in a random and unbiased way. Nevertheless, it is common practice in the field of model-based diagnosis to make estimations from (biased) best-first…

Artificial Intelligence · Computer Science 2022-08-05 Patrick Rodler, Fatima Elichanova

An Overview of Mixture Models

This paper has been withdrawn. With the advancement of statistical theory and computing power, data sets are providing a greater amount of insight into the problems of today. Statisticians have an ever increasing number of tools to attack…

Statistics Theory · Mathematics 2012-12-20 Derek S. Young

Survey on Feature Selection

Feature selection plays an important role in the data mining process. It is needed to deal with the excessive number of features, which can become a computational burden on the learning algorithms. It is also necessary, even when…

Machine Learning · Computer Science 2015-10-13 Tarek Amr Abdallah, Beatriz de La Iglesia

Methods of Selective Inference for Linear Mixed Models: a Review and Empirical Comparison

Selective inference aims at providing valid inference after a data-driven selection of models or hypotheses. It is essential to avoid overconfident results and replicability issues. While significant advances have been made in this area for…

Methodology · Statistics 2025-03-14 Matteo D'Alessandro, Magne Thoresen

Machine learning-based clinical prediction modeling -- A practical guide for clinicians

In the emerging era of big data, larger available clinical datasets and computational advances have sparked a massive interest in machine learning-based approaches. The number of manuscripts related to machine learning or artificial…

Machine Learning · Statistics 2020-06-29 Julius M. Kernbach, Victor E. Staartjes

Good practices for evaluation of machine learning systems

Many development decisions affect the results obtained from ML experiments: training data, features, model architecture, hyperparameters, test data, etc. Among these aspects, arguably the most important design decisions are those that…

Machine Learning · Computer Science 2024-12-06 Luciana Ferrer, Odette Scharenborg, Tom Bäckström