Related papers: vtreat: a data.frame Processor for Predictive Mode…

200 papers

Deep learning models are widely used across computer vision and other domains. During model induction, selecting the right architecture for a given dataset often relies on repetitive trial-and-error procedures. This procedure…

Machine Learning · Computer Science 2026-01-06 Yen-Chia Chen , Hsing-Kuo Pao , Hanjuan Huang

Many real world data mining applications involve obtaining predictive models using data sets with strongly imbalanced distributions of the target variable. Frequently, the least common values of this target variable are associated with…

Machine Learning · Computer Science 2015-05-14 Paula Branco , Luis Torgo , Rita Ribeiro

Pretrained models of code, such as CodeBERT and CodeT5, have become popular choices for code understanding and generation tasks. Such models tend to be large and require commensurate volumes of training data, which are rarely available for…

Machine Learning · Computer Science 2024-01-23 Kamel Alrashedy , Vincent J. Hellendoorn , Alessandro Orso

Amid rising concerns of reproducibility and generalizability in predictive modeling, we explore the possibility and potential benefits of introducing pre-registration to the field. Despite notable advancements in predictive modeling,…

Machine Learning · Computer Science 2023-12-01 Jake M. Hofman , Angelos Chatzimparmpas , Amit Sharma , Duncan J. Watts , Jessica Hullman

The ability of foundation models relies heavily on large-scale, diverse, and high-quality pretraining data. In order to improve data quality, researchers and practitioners often have to manually curate datasets from different sources…

Machine Learning · Computer Science 2024-04-24 Yiding Sun , Feng Wang , Yutao Zhu , Wayne Xin Zhao , Jiaxin Mao

Educational process data, i.e., logs of detailed student activities in computerized or online learning platforms, has the potential to offer deep insights into how students learn. One can use process data for many downstream tasks such as…

Machine Learning · Computer Science 2022-04-29 Alexander Scarlatos , Christopher Brinton , Andrew Lan

Predictive models are among the most important techniques widely applied in many areas of software engineering. There have been a large number of primary studies that apply predictive models and that present well-performed studies…

Software Engineering · Computer Science 2020-08-11 Yanming Yang , Xin Xia , David Lo , Tingting Bi , John Grundy , Xiaohu Yang

Most machine learning techniques are based upon statistical learning theory, often simplified for the sake of computing speed. This paper is focused on the uncertainty aspect of mathematical modeling in machine learning. Regression analysis…

Machine Learning · Computer Science 2022-06-07 Valentin Arkov

In this work, we propose ModelPred, a framework that helps to understand the impact of changes in training data on a trained model. This is critical for building trust in various stages of a machine learning pipeline: from cleaning…

Machine Learning · Computer Science 2022-12-27 Yingyan Zeng , Jiachen T. Wang , Si Chen , Hoang Anh Just , Ran Jin , Ruoxi Jia

Prediction rule ensembles (PREs) are a relatively new statistical learning method, which aim to strike a balance between predictive accuracy and interpretability. Starting from a decision tree ensemble, like a boosted tree ensemble or a…

Applications · Statistics 2023-10-02 Marjolein Fokkema , Carolin Strobl

In this thesis, we develop various techniques for working with sets in machine learning. Each input or output is not an image or a sequence, but a set: an unordered collection of multiple objects, each object described by a feature vector.…

Machine Learning · Computer Science 2021-03-09 Yan Zhang

Variable trees are a new method for the exploration of discrete multivariate data. They display nested subsets and corresponding frequencies and percentages. Manual calculation of these quantities can be laborious, especially when there are…

Computation · Statistics 2021-02-08 Nick Barrowman , Richard J. Webster

3D software is now capable of producing highly realistic images that look nearly indistinguishable from real images. This raises the question: can real datasets be enhanced with 3D rendered data? We investigate this question. In this…

Computer Vision and Pattern Recognition · Computer Science 2022-04-06 Shesh Narayan Gupta , Nicholas Bear Brown

Matrix regression plays an important role in modern data analysis due to its ability to handle complex relationships involving both matrix and vector variables. We propose a class of regularized regression models capable of predicting both…

Optimization and Control · Mathematics 2025-01-14 Meixia Lin , Ziyang Zeng , Yangjing Zhang

Process data refer to data recorded in the log files of computer-based items. These data, represented as timestamped action sequences, keep track of respondents' response processes of solving the items. Process data analysis aims at…

Computation · Statistics 2020-06-11 Xueying Tang , Susu Zhang , Zhi Wang , Jingchen Liu , Zhiliang Ying

When selecting data for training large-scale models, standard practice is to filter for examples that match human notions of data quality. Such filtering yields qualitatively clean datapoints that intuitively should improve model behavior.…

Machine Learning · Computer Science 2024-01-24 Logan Engstrom , Axel Feldmann , Aleksander Madry

Artificial intelligence (AI) - and specifically machine learning (ML) - applications for climate prediction across timescales are proliferating quickly. The emergence of these methods prompts a revisit to the impact of data preprocessing, a…

We have analyzed manufacturing data from several different semiconductor manufacturing plants, using decision tree induction software called Q-YIELD. The software generates rules for predicting when a given product should be rejected. The…

Machine Learning · Computer Science 2007-05-23 Peter D. Turney

Against the backdrop of the increasing data requirements of Deep Neural Networks for object recognition, which are growing more untenable by the day, we present Developmental PreTraining (DPT) as a possible solution. DPT is designed as a…

Machine Learning · Computer Science 2023-12-04 Niranjan Rajesh , Debayan Gupta

In many application areas, predictive models are used to support or make important decisions. There is increasing awareness that these models may contain spurious or otherwise undesirable correlations. Such correlations may arise from a…

Applications · Statistics 2021-09-21 Emanuele Aliverti , Kristian Lum , James E. Johndrow , David B. Dunson