Related papers: Basics of Feature Selection and Statistical Learni…
Feature selection is an important task in many problems occurring in pattern recognition, bioinformatics, machine learning and data mining applications. The feature selection approach enables us to reduce the computation burden and the…
The amount of information in the form of features and variables avail- able to machine learning algorithms is ever increasing. This can lead to classifiers that are prone to overfitting in high dimensions, high di- mensional models do not…
These three lectures provide an introduction to the main concepts of statistical data analysis useful for precision measurements and searches for new signals in High Energy Physics. The frequentist and Bayesian approaches to probability…
Feature selection, as a data preprocessing strategy, has been proven to be effective and efficient in preparing data (especially high-dimensional data) for various data mining and machine learning problems. The objectives of feature…
In Machine Learning, feature selection entails selecting a subset of the available features in a dataset to use for model development. There are many motivations for feature selection, it may result in better models, it may provide insight…
Machine-learning techniques have become fundamental in high-energy physics and, for new physics searches, it is crucial to know their performance in terms of experimental sensitivity, understood as the statistical significance of the…
Many analyses in high-energy physics rely on selection thresholds (cuts) applied to detector, particle, or event properties. Initial cut values can often be guessed from physical intuition, but cut optimization, especially for multiple…
The recent progresses in Machine Learning opened the door to actual applications of learning algorithms but also to new research directions both in the field of Machine Learning directly and, at the edges with other disciplines. The case…
Feature selection is frequently used as a pre-processing step to machine learning. It is a process of choosing a subset of original features so that the feature space is optimally reduced according to a certain evaluation criterion. The…
In this work we discuss the impact of nuisance parameters on the effectiveness of machine learning in high-energy physics problems, and provide a review of techniques that allow to include their effect and reduce their impact in the search…
A good feature representation is a determinant factor to achieve high performance for many machine learning algorithms in terms of classification. This is especially true for techniques that do not build complex internal representations of…
We provide a prescription to train optimal machine-learning-based event selectors and categorizers that maximize the statistical significance of a potential signal excess in high energy physics (HEP) experiments, as quantified by any of six…
We investigate a new structure for machine learning classifiers applied to problems in high-energy physics by expanding the inputs to include not only measured features but also physics parameters. The physics parameters represent a…
Choosing which properties of the data to use as input to multivariate decision algorithms -- a.k.a. feature selection -- is an important step in solving any problem with machine learning. While there is a clear trend towards training…
Feature selection is crucial for pinpointing relevant features in high-dimensional datasets, mitigating the 'curse of dimensionality,' and enhancing machine learning performance. Traditional feature selection methods for classification use…
These lectures describe several topics in statistical data analysis as used in High Energy Physics. They focus on areas most relevant to analyses at the LHC that search for new physical phenomena, including statistical tests for discovery…
Different ways of extracting parameters of interest from combined data sets of separate experiments are investigated accounting for the systematic errors. It is shown, that the frequentist approach may yield larger $\chi^2$ values when…
Data analysis is a powerful tool in all experimental sciences. Statistical methods, such as sampling theory, computer technologies necessary for handling large amounts of data, skill in analysing information contained in different types of…
Before any publication, data analysis of high-energy physics experiments must be validated. This validation is granted only if a perfect understanding of the data and the analysis process is demonstrated. Therefore, physicists prefer using…
These lectures concern two topics that are becoming increasingly important in the analysis of High Energy Physics (HEP) data: Bayesian statistics and multivariate methods. In the Bayesian approach we extend the interpretation of probability…