Related papers: Machine Learning using Stata/Python
pystacked implements stacked generalization (Wolpert, 1992) for regression and binary classification via Python's scikit-learn. Stacking combines multiple supervised machine learners -- the "base" or "level-0" learners -- into a single…
Machine Learning (ML) and linear System Identification (SI) have been historically developed independently. In this paper, we leverage well-established ML tools - especially the automatic differentiation framework - to introduce SIMBa, a…
We introduce the package ddml for Double/Debiased Machine Learning (DDML) in Stata. Estimators of causal parameters for five different econometric models are supported, allowing for flexible estimation of causal effects of endogenous…
Intrusion detection systems are crucial for network security. Verification of these systems is complicated by various factors, including the heterogeneity of network platforms and the continuously changing landscape of cyber threats. In…
In this paper, we present MELT-ML, a machine learning extension to the Matching and EvaLuation Toolkit (MELT) which facilitates the application of supervised learning for ontology and instance matching. Our contributions are twofold: We…
Machine learning (ML) offers powerful methods for detecting and modeling associations often in data with large feature spaces and complex associations. Many useful tools/packages (e.g. scikit-learn) have been developed to make the various…
This paper presents the first application of spiking neural networks (SNNs) for the classification of chronic lower back pain (CLBP) using the EmoPain dataset. Our work has two main contributions. We introduce Spike Threshold Adaptive…
Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a…
Language models often show little to no improvement (i.e., "saturation") when trained via vanilla supervised fine-tuning (SFT) on data similar to what they saw in their training set (e.g., MATH). We introduce a new fine-tuning strategy,…
In this paper, we aim at tackling a general but interesting cross-modality feature learning question in remote sensing community --- can a limited amount of highly-discrimin-ative (e.g., hyperspectral) training data improve the performance…
This research conducted a systematic review of the literature on machine learning (ML)-based methods in the context of Continuous Integration (CI) over the past 22 years. The study aimed to identify and describe the techniques used in…
In this paper we present MLaut (Machine Learning AUtomation Toolbox) for the python data science ecosystem. MLaut automates large-scale evaluation and benchmarking of machine learning algorithms on a large number of datasets. MLaut provides…
Self-training via pseudo labeling is a conventional, simple, and popular pipeline to leverage unlabeled data. In this work, we first construct a strong baseline of self-training (namely ST) for semi-supervised semantic segmentation via…
This paper presents the Stata community-distributed command "opl_ma_fb" (and the companion command "opl_ma_vf"), for implementing the first-best Optimal Policy Learning (OPL) algorithm to estimate the best treatment assignment given the…
As data science and machine learning methods are taking on an increasingly important role in the materials research community, there is a need for the development of machine learning software tools that are easy to use (even for nonexperts…
Choosing an appropriate strategy for partitioning data into training and evaluation sets is a critical step in machine learning, yet validation methods are often selected using default or conventional settings without considering their…
Regression methods based in Machine Learning Algorithms (MLA) have become an important tool for data analysis in many different disciplines. In this work, we use MLA in an astrophysical context; our goal is to measure the mean longitudinal…
This paper proposes a novel sparse principal component analysis algorithm with self-learning ability for successive modes, where synaptic intelligence is employed to measure the importance of variables and a regularization term is added to…
We introduce Logic Guided Machine Learning (LGML), a novel approach that symbiotically combines machine learning (ML) and logic solvers with the goal of learning mathematical functions from data. LGML consists of two phases, namely a…
Machine learning (ML) is increasingly adopted in scientific research, yet the quality and reliability of results often depend on how experiments are designed and documented. Poor baselines, inconsistent preprocessing, or insufficient…