Related papers: Mixed-Integer Linear Optimization for Cardinality-…
Decision trees are one of the most popular methods for solving classification problems, mainly because of their good interpretability properties. Moreover, due to advances in recent years in mixed-integer optimization, several models have…
The data made available for analysis are becoming more and more complex along several directions: high dimensionality, number of examples and the amount of labels per example. This poses a variety of challenges for the existing machine…
Semi-supervised learning (SSL) is a common approach to learning predictive models using not only labeled examples, but also unlabeled examples. While SSL for the simple tasks of classification and regression has received a lot of attention…
Manifold alignment is a type of data fusion technique that creates a shared low-dimensional representation of data collected from multiple domains, enabling cross-domain learning and improved performance in downstream tasks. This paper…
Semi-supervised learning has proven to be a powerful paradigm for leveraging unlabeled data to mitigate the reliance on large labeled datasets. In this work, we unify the current dominant approaches for semi-supervised learning to produce a…
In structured output learning, obtaining labelled data for real-world applications is usually costly, while unlabelled examples are available in abundance. Semi-supervised structured classification has been developed to handle large amounts…
We consider semi-supervised binary classification for applications in which data points are naturally grouped (e.g., survey responses grouped by state) and the labeled data is biased (e.g., survey respondents are not representative of the…
Semi-supervised learning has received increasingly attention in statistics and machine learning. In semi-supervised learning settings, a labeled data set with both outcomes and covariates and an unlabeled data set with covariates only are…
We investigate model based classification with partially labelled training data. In many biostatistical applications, labels are manually assigned by experts, who may leave some observations unlabelled due to class uncertainty. We analyse…
Semi-supervised learning deals with the problem of how, if possible, to take advantage of a huge amount of not classified data, to perform classification, in situations when, typically, the labelled data are few. Even though this is not…
Label ranking aims to learn a mapping from instances to rankings over a finite number of predefined labels. Random forest is a powerful and one of the most successful general-purpose machine learning algorithms of modern times. In this…
A growing specter in the rise of machine learning is whether the decisions made by machine learning models are fair. While research is already underway to formalize a machine-learning concept of fairness and to design frameworks for…
The centrality and diversity of the labeled data are very influential to the performance of semi-supervised learning (SSL), but most SSL models select the labeled data randomly. This study first construct a leading forest that forms a…
Semi-supervised learning methods are motivated by the availability of large datasets with unlabeled features in addition to labeled data. Unlabeled data is, however, not guaranteed to improve classification performance and has in fact been…
We propose an algorithm named best-scored random forest for binary classification problems. The terminology "best-scored" means to select the one with the best empirical performance out of a certain number of purely random tree candidates…
In many machine learning applications, labeling datasets can be an arduous and time-consuming task. Although research has shown that semi-supervised learning techniques can achieve high accuracy with very few labels within the field of…
Global optimization of decision trees is a long-standing challenge in combinatorial optimization, yet such models play an important role in interpretable machine learning. Although the problem has been investigated for several decades, only…
Multi-label classification is a challenging task, particularly in domains where the number of labels to be predicted is large. Deep neural networks are often effective at multi-label classification of images and textual data. When dealing…
Semi-supervised learning is a model training method that uses both labeled and unlabeled data. This paper proposes a fully Bayes semi-supervised learning algorithm that can be applied to any multi-category classification problem. We assume…
It has been found that stochastic algorithms often find good solutions much more rapidly than inherently-batch approaches. Indeed, a very useful rule of thumb is that often, when solving a machine learning problem, an iterative technique…