Machine Learning - Scifaro

Robust Matrix Completion for Discrete Rating-Scale Data: Coping with Fake Profiles in Recommender Systems

Recommender systems are essential tools in the digital landscape for connecting users with content that more closely aligns with their preferences. Matrix completion is a widely used statistical framework for such systems, aiming to predict…

Machine Learning · Statistics 2025-07-30 Aurore Archimbaud, Andreas Alfons, Ines Wilms

Active learning for level set estimation under input uncertainty and its extensions

Testing under what conditions the product satisfies the desired properties is a fundamental problem in the manufacturing industry. If the condition and the property are respectively regarded as the input and the output of a black-box function,…

Machine Learning · Statistics 2025-07-30 Yu Inatsu, Masayuki Karasuyama, Keiichi Inoue, Ichiro Takeuchi

Sparse-mode Dynamic Mode Decomposition for Disambiguating Local and Global Structures

The dynamic mode decomposition (DMD) is a data-driven approach that extracts the dominant features from spatiotemporal data. In this work, we introduce sparse-mode DMD, a new variant of the optimized DMD framework that specifically…

Machine Learning · Statistics 2025-07-29 Sara M. Ichinaga, Steven L. Brunton, Aleksandr Y. Aravkin, J. Nathan Kutz

Adaptive Bayesian Data-Driven Design of Reliable Solder Joints for Micro-electronic Devices

Solder joint reliability related to failures due to thermomechanical loading is a critically important yet physically complex engineering problem. As a result, simulated behavior is oftentimes computationally expensive. In an increasingly…

Machine Learning · Statistics 2025-07-29 Leo Guo, Adwait Inamdar, Willem D. van Driel, GuoQi Zhang

Bayesian symbolic regression: Automated equation discovery from a physicists' perspective

Symbolic regression automates the process of learning closed-form mathematical models from data. Standard approaches to symbolic regression, as well as newer deep learning approaches, rely on heuristic model selection criteria, heuristic…

Machine Learning · Statistics 2025-07-29 Roger Guimera, Marta Sales-Pardo

Non-convex matrix sensing: Breaking the quadratic rank barrier in the sample complexity

For the problem of reconstructing a low-rank matrix from a few linear measurements, two classes of algorithms have been widely studied in the literature: convex approaches based on nuclear norm minimization, and non-convex approaches that…

Machine Learning · Statistics 2025-07-29 Dominik Stöger, Yizhe Zhu

Penalty Learning for Optimal Partitioning using Multilayer Perceptron

Changepoint detection is a technique used to identify significant shifts in sequences and is widely used in fields such as finance, genomics, and medicine. To identify the changepoints, dynamic programming (DP) algorithms, particularly…

Machine Learning · Statistics 2025-07-29 Tung L Nguyen, Toby Dylan Hocking

MOCK: an Algorithm for Learning Nonparametric Differential Equations via Multivariate Occupation Kernel Functions

Learning a nonparametric system of ordinary differential equations from trajectories in a $d$-dimensional state space requires learning $d$ functions of $d$ variables. Explicit formulations often scale quadratically in $d$ unless additional…

Machine Learning · Statistics 2025-07-29 Victor Rielly, Kamel Lahouel, Ethan Lew, Nicholas Fisher, Vicky Haney, Michael Wells, Bruno Jedynak

Tensor Completion with Nearly Linear Samples Given Weak Side Information

Tensor completion exhibits an interesting computational-statistical gap in terms of the number of samples needed to perform tensor estimation. While there are only $\Theta(tn)$ degrees of freedom in a $t$-order tensor with $n^t$ entries,…

Machine Learning · Statistics 2025-07-29 Christina Lee Yu, Xumei Xi

Perfect Clustering in Very Sparse Diverse Multiplex Networks

The paper studies the DIverse MultiPLEx Signed Generalized Random Dot Product Graph (DIMPLE-SGRDPG) network model (Pensky (2024)), where all layers of the network have the same collection of nodes. In addition, all layers can be partitioned…

Machine Learning · Statistics 2025-07-28 Marianna Pensky

Probably Approximately Correct Causal Discovery

The discovery of causal relationships is a foundational problem in artificial intelligence, statistics, epidemiology, economics, and beyond. While elegant theories exist for accurate causal discovery given infinite data, real-world…

Machine Learning · Statistics 2025-07-28 Mian Wei, Somesh Jha, David Page

Central limit theorems for the eigenvalues of graph Laplacians on data clouds

Given i.i.d. samples $X_n = \{x_1, \dots, x_n\}$ from a distribution supported on a low-dimensional manifold $M$ embedded in Euclidean space, we consider the graph Laplacian operator $\Delta_n$ associated to an $\varepsilon$-proximity…

Machine Learning · Statistics 2025-07-28 Chenghui Li, Nicolás García Trillos, Housen Li, Leo Suchan

Lower Bounds on the Size of Markov Equivalence Classes

Causal discovery algorithms typically recover causal graphs only up to their Markov equivalence classes unless additional parametric assumptions are made. The sizes of these equivalence classes reflect the limits of what can be learned…

Machine Learning · Statistics 2025-07-28 Erik Jahn, Frederick Eberhardt, Leonard J. Schulman

From Conditional to Unconditional Independence: Testing Conditional Independence via Transport Maps

Testing conditional independence between two random vectors given a third is a fundamental and challenging problem in statistics, particularly in multivariate nonparametric settings due to the complexity of conditional structures. We…

Machine Learning · Statistics 2025-07-28 Chenxuan He, Yuan Gao, Liping Zhu, Jian Huang

DriftMoE: A Mixture of Experts Approach to Handle Concept Drifts

Learning from non-stationary data streams subject to concept drift requires models that can adapt on-the-fly while remaining resource-efficient. Existing adaptive ensemble methods often rely on coarse-grained adaptation mechanisms or simple…

Machine Learning · Statistics 2025-07-25 Miguel Aspis, Sebastián A. Cajas Ordónez, Andrés L. Suárez-Cetrulo, Ricardo Simón Carbajo

On Reconstructing Training Data From Bayesian Posteriors and Trained Models

Publicly releasing the specification of a model with its trained parameters means an adversary can attempt to reconstruct information about the training data via training data reconstruction attacks, a major vulnerability of modern machine…

Machine Learning · Statistics 2025-07-25 George Wynne

A Two-armed Bandit Framework for A/B Testing

A/B testing is widely used in modern technology companies for policy evaluation and product deployment, with the goal of comparing the outcomes under a newly-developed policy against a standard control. Various causal inference and…

Machine Learning · Statistics 2025-07-25 Jinjuan Wang, Qianglin Wen, Yu Zhang, Xiaodong Yan, Chengchun Shi

Fixing the Pitfalls of Probabilistic Time-Series Forecasting Evaluation by Kernel Quadrature

Despite the significance of probabilistic time-series forecasting models, their evaluation metrics often involve intractable integrations. The most widely used metric, the continuous ranked probability score (CRPS), is a strictly proper…

Machine Learning · Statistics 2025-07-25 Masaki Adachi, Masahiro Fujisawa, Michael A Osborne

Robust Non-adaptive Group Testing under Errors in Group Membership Specifications

Given $p$ samples, each of which may or may not be defective, group testing (GT) aims to determine their defect status by performing tests on $n < p$ 'groups', where a group is formed by mixing a subset of the $p$ samples. Assuming that the…

Machine Learning · Statistics 2025-07-25 Shuvayan Banerjee, Radhendushka Srivastava, James Saunderson, Ajit Rajwade

Choosing Public Datasets for Private Machine Learning via Gradient Subspace Distance

Differentially private stochastic gradient descent privatizes model training by injecting noise into each iteration, where the noise magnitude increases with the number of model parameters. Recent works suggest that we can reduce the noise…

Machine Learning · Statistics 2025-07-25 Xin Gu, Gautam Kamath, Zhiwei Steven Wu