Related papers: A simple data discretizer

200 papers

Data discretization, also known as binning, is a frequently used technique in computer science, statistics, and their applications to biological data analysis. We present a new method for the discretization of real-valued data into a finite…

Other Quantitative Biology · Quantitative Biology 2007-05-23 Elena S. Dimitrova, John J. McGee, Reinhard C. Laubenbacher

Learning algorithms that learn linear models often have high representation bias on real-world problems. In this paper, we show that this representation bias can be greatly reduced by discretization. Discretization is a common procedure in…

Machine Learning · Computer Science 2017-01-26 Nayyar A. Zaidi, Yang Du, Geoffrey I. Webb

To date, attribute discretization is typically performed by replacing the original set of continuous features with a transposed set of discrete ones. This paper provides support for a new idea that discretized features should often be used…

Machine Learning · Computer Science 2018-02-12 Avi Rosenfeld, Ron Illuz, Dovid Gottesman, Mark Last

Recently, many improved naive Bayes methods have been developed with enhanced discrimination capabilities. Among them, regularized naive Bayes (RNB) produces excellent performance by balancing the discrimination power and generalization…

Machine Learning · Computer Science 2023-04-19 Shihe Wang, Jianfeng Ren, Ruibin Bai

In many classification models, data is discretized to better estimate its distribution. Existing discretization methods often aim to maximize the discriminant power of discretized data, while overlooking the fact that the primary…

Machine Learning · Computer Science 2023-04-06 Shihe Wang, Jianfeng Ren, Ruibin Bai, Yuan Yao, Xudong Jiang

The principle of data minimization aims to reduce the amount of data collected, processed or retained to minimize the potential for misuse, unauthorized access, or data breaches. Rooted in privacy-by-design principles, data minimization has…

Machine Learning · Computer Science 2024-05-31 Prakhar Ganesh, Cuong Tran, Reza Shokri, Ferdinando Fioretto

Multiple instance learning (MIL) is a weakly supervised learning approach that seeks to assign binary class labels to collections of instances known as bags. However, due to their weakly supervised nature, MIL methods are susceptible…

Computer Vision and Pattern Recognition · Computer Science 2024-05-27 Wenhui Zhu, Peijie Qiu, Xiwen Chen, Oana M. Dumitrascu, Yalin Wang

We consider a discrete optimization formulation for learning sparse classifiers, where the outcome depends upon a linear combination of a small subset of features. Recent work has shown that mixed integer programming (MIP) can be used to…

Machine Learning · Statistics 2021-06-08 Antoine Dedieu, Hussein Hazimeh, Rahul Mazumder

Dataset distillation (DD) aims to minimize the time and memory consumption needed for training deep neural networks on large datasets, by creating a smaller synthetic dataset that has similar performance to that of the full real dataset.…

Computer Vision and Pattern Recognition · Computer Science 2025-05-20 Xinhao Zhong, Bin Chen, Hao Fang, Xulin Gu, Shu-Tao Xia, En-Hui Yang

When dealing with continuous numeric features, we usually adopt feature discretization. In this work, to find the best way to conduct feature discretization, we present some theoretical analysis, in which we focus on analyzing correctness…

Machine Learning · Computer Science 2020-04-28 Qiang Liu, Zhaocheng Liu, Haoli Zhang

Researchers usually discretize a continuous dependent variable into two target classes by introducing an artificial discretization threshold (e.g., median). However, such discretization may introduce noise (i.e., discretization noise) due…

Software Engineering · Computer Science 2022-02-15 Gopi Krishnan Rajbahadur, Shaowei Wang, Yasutaka Kamei, Ahmed E. Hassan

Exponential models of distributions are widely used in machine learning for classification and modelling. It is well known that they can be interpreted as maximum entropy models under empirical expectation constraints. In this work, we…

Machine Learning · Computer Science 2012-07-19 Amir Globerson, Naftali Tishby

Modern deep learning models require immense computational resources, motivating research into low-precision training. Quantised training addresses this by representing training components in low-bit integers, but typically relies on…

Machine Learning · Computer Science 2025-12-08 Paul Wilson, Fabio Zanasi, George Constantinides

Unsupervised discretization is a crucial step in many knowledge discovery tasks. The state-of-the-art method for one-dimensional data infers locally adaptive histograms using the minimum description length (MDL) principle, but the…

Machine Learning · Computer Science 2022-12-12 Lincen Yang, Mitra Baratchi, Matthijs van Leeuwen

In statistical learning, many problem formulations have been proposed so far, such as multi-class learning, complementarily labeled learning, multi-label learning, multi-task learning, which provide theoretical models for various real-world…

Machine Learning · Computer Science 2022-11-14 Daiki Suehiro, Eiji Takimoto

In the real world, huge amounts of temporal data must be processed in many application areas, such as scientific, financial, network monitoring, and sensor data analysis. Data mining techniques are primarily oriented to handle discrete features.…

Databases · Computer Science 2014-02-19 P. Chaudhari, D. P. Rana, R. G. Mehta, N. J. Mistry, M. M. Raghuwanshi

Dataset distillation aims at synthesizing a dataset by a small number of artificially generated data items, which, when used as training data, reproduce or approximate a machine learning (ML) model as if it were trained on the entire…

Machine Learning · Computer Science 2024-03-27 Radu-Andrei Rosu, Mihaela-Elena Breaban, Henri Luchian

Classifiers and other statistics-based machine learning (ML) techniques generalize, or learn, based on various statistical properties of the training data. The assumption underlying statistical ML resulting in theoretical or empirical…

Machine Learning · Computer Science 2021-11-11 Samuel Ackerman, Orna Raz, Marcel Zalmanovici, Aviad Zlotnick

Multiple Instance Learning is the predominant method for Whole Slide Image classification in digital pathology, enabling the use of slide-level labels to supervise model training. Although MIL eliminates the tedious fine-grained annotation…

Computer Vision and Pattern Recognition · Computer Science 2025-05-28 Chen Shu, Boyu Fu, Yiman Li, Ting Yin, Wenchuan Zhang, Jie Chen, Yuhao Yi, Hong Bu

Model distillation aims to distill the knowledge of a complex model into a simpler one. In this paper, we consider an alternative formulation called dataset distillation: we keep the model fixed and instead attempt to distill the knowledge…

Machine Learning · Computer Science 2020-02-26 Tongzhou Wang, Jun-Yan Zhu, Antonio Torralba, Alexei A. Efros