Related papers: A simple data discretizer

Discretization of Time Series Data

Data discretization, also known as binning, is a frequently used technique in computer science, statistics, and their applications to biological data analysis. We present a new method for the discretization of real-valued data into a finite…

Other Quantitative Biology · Quantitative Biology 2007-05-23 Elena S. Dimitrova, John J. McGee, Reinhard C. Laubenbacher

On the Effectiveness of Discretizing Quantitative Attributes in Linear Classifiers

Learning algorithms that learn linear models often have high representation bias on real-world problems. In this paper, we show that this representation bias can be greatly reduced by discretization. Discretization is a common procedure in…

Machine Learning · Computer Science 2017-01-26 Nayyar A. Zaidi, Yang Du, Geoffrey I. Webb

Using Discretization for Extending the Set of Predictive Features

To date, attribute discretization is typically performed by replacing the original set of continuous features with a transposed set of discrete ones. This paper provides support for a new idea that discretized features should often be used…

Machine Learning · Computer Science 2018-02-12 Avi Rosenfeld, Ron Illuz, Dovid Gottesman, Mark Last

A Semi-Supervised Adaptive Discriminative Discretization Method Improving Discrimination Power of Regularized Naive Bayes

Recently, many improved naive Bayes methods have been developed with enhanced discrimination capabilities. Among them, regularized naive Bayes (RNB) produces excellent performance by balancing the discrimination power and generalization…

Machine Learning · Computer Science 2023-04-19 Shihe Wang, Jianfeng Ren, Ruibin Bai

A Max-relevance-min-divergence Criterion for Data Discretization with Applications on Naive Bayes

In many classification models, data is discretized to better estimate its distribution. Existing discretization methods often aim to maximize the discriminant power of discretized data, while overlooking the fact that the primary…

Machine Learning · Computer Science 2023-04-06 Shihe Wang, Jianfeng Ren, Ruibin Bai, Yuan Yao, Xudong Jiang

The Data Minimization Principle in Machine Learning

The principle of data minimization aims to reduce the amount of data collected, processed or retained to minimize the potential for misuse, unauthorized access, or data breaches. Rooted in privacy-by-design principles, data minimization has…

Machine Learning · Computer Science 2024-05-31 Prakhar Ganesh, Cuong Tran, Reza Shokri, Ferdinando Fioretto

PDL: Regularizing Multiple Instance Learning with Progressive Dropout Layers

Multiple instance learning (MIL) was a weakly supervised learning approach that sought to assign binary class labels to collections of instances known as bags. However, due to their weak supervision nature, the MIL methods were susceptible…

Computer Vision and Pattern Recognition · Computer Science 2024-05-27 Wenhui Zhu, Peijie Qiu, Xiwen Chen, Oana M. Dumitrascu, Yalin Wang

Learning Sparse Classifiers: Continuous and Mixed Integer Optimization Perspectives

We consider a discrete optimization formulation for learning sparse classifiers, where the outcome depends upon a linear combination of a small subset of features. Recent work has shown that mixed integer programming (MIP) can be used to…

Machine Learning · Statistics 2021-06-08 Antoine Dedieu, Hussein Hazimeh, Rahul Mazumder

Going Beyond Feature Similarity: Effective Dataset Distillation based on Class-Aware Conditional Mutual Information

Dataset distillation (DD) aims to minimize the time and memory consumption needed for training deep neural networks on large datasets, by creating a smaller synthetic dataset that has similar performance to that of the full real dataset.…

Computer Vision and Pattern Recognition · Computer Science 2025-05-20 Xinhao Zhong, Bin Chen, Hao Fang, Xulin Gu, Shu-Tao Xia, En-Hui Yang

An Empirical Study on Feature Discretization

When dealing with continuous numeric features, we usually adopt feature discretization. In this work, to find the best way to conduct feature discretization, we present some theoretical analysis, in which we focus on analyzing correctness…

Machine Learning · Computer Science 2020-04-28 Qiang Liu, Zhaocheng Liu, Haoli Zhang

Impact of Discretization Noise of the Dependent variable on Machine Learning Classifiers in Software Engineering

Researchers usually discretize a continuous dependent variable into two target classes by introducing an artificial discretization threshold (e.g., median). However, such discretization may introduce noise (i.e., discretization noise) due…

Software Engineering · Computer Science 2022-02-15 Gopi Krishnan Rajbahadur, Shaowei Wang, Yasutaka Kamei, Ahmed E. Hassan

The Minimum Information Principle for Discriminative Learning

Exponential models of distributions are widely used in machine learning for classification and modelling. It is well known that they can be interpreted as maximum entropy models under empirical expectation constraints. In this work, we…

Machine Learning · Computer Science 2012-07-19 Amir Globerson, Naftali Tishby

Convergence for Discrete Parameter Update Schemes

Modern deep learning models require immense computational resources, motivating research into low-precision training. Quantised training addresses this by representing training components in low-bit integers, but typically relies on…

Machine Learning · Computer Science 2025-12-08 Paul Wilson, Fabio Zanasi, George Constantinides

Unsupervised Discretization by Two-dimensional MDL-based Histogram

Unsupervised discretization is a crucial step in many knowledge discovery tasks. The state-of-the-art method for one-dimensional data infers locally adaptive histograms using the minimum description length (MDL) principle, but the…

Machine Learning · Computer Science 2022-12-12 Lincen Yang, Mitra Baratchi, Matthijs van Leeuwen

Simplified and Unified Analysis of Various Learning Problems by Reduction to Multiple-Instance Learning

In statistical learning, many problem formulations have been proposed so far, such as multi-class learning, complementarily labeled learning, multi-label learning, multi-task learning, which provide theoretical models for various real-world…

Machine Learning · Computer Science 2022-11-14 Daiki Suehiro, Eiji Takimoto

Discretization of Temporal Data: A Survey

In the real world, huge amounts of temporal data must be processed in many application areas, such as scientific, financial, network monitoring, and sensor data analysis. Data mining techniques are primarily oriented to handle discrete features.…

Databases · Computer Science 2014-02-19 P. Chaudhari, D. P. Rana, R. G. Mehta, N. J. Mistry, M. M. Raghuwanshi

Exploring the potential of prototype-based soft-labels data distillation for imbalanced data classification

Dataset distillation aims at synthesizing a dataset by a small number of artificially generated data items, which, when used as training data, reproduce or approximate a machine learning (ML) model as if it were trained on the entire…

Machine Learning · Computer Science 2024-03-27 Radu-Andrei Rosu, Mihaela-Elena Breaban, Henri Luchian

Automatically detecting data drift in machine learning classifiers

Classifiers and other statistics-based machine learning (ML) techniques generalize, or learn, based on various statistical properties of the training data. The assumption underlying statistical ML resulting in theoretical or empirical…

Machine Learning · Computer Science 2021-11-11 Samuel Ackerman, Orna Raz, Marcel Zalmanovici, Aviad Zlotnick

Denoising Mutual Knowledge Distillation in Bi-Directional Multiple Instance Learning

Multiple Instance Learning is the predominant method for Whole Slide Image classification in digital pathology, enabling the use of slide-level labels to supervise model training. Although MIL eliminates the tedious fine-grained annotation…

Computer Vision and Pattern Recognition · Computer Science 2025-05-28 Chen Shu, Boyu Fu, Yiman Li, Ting Yin, Wenchuan Zhang, Jie Chen, Yuhao Yi, Hong Bu

Dataset Distillation

Model distillation aims to distill the knowledge of a complex model into a simpler one. In this paper, we consider an alternative formulation called dataset distillation: we keep the model fixed and instead attempt to distill the knowledge…

Machine Learning · Computer Science 2020-02-26 Tongzhou Wang, Jun-Yan Zhu, Antonio Torralba, Alexei A. Efros