机器学习
Recent studies have shown that deep learning models are vulnerable to membership inference attacks (MIAs), which aim to infer whether a data record was used to train a target model or not. To analyze and study these vulnerabilities, various…
Anomaly detection (AD) plays a vital role across a wide range of domains, but its performance might deteriorate when applied to target domains with limited data. Domain Adaptation (DA) offers a solution by transferring knowledge from a…
Federated learning has emerged as an essential paradigm for distributed multi-source data analysis under privacy concerns. Most existing federated learning methods focus on the ``static" datasets. However, in many real-world applications,…
We investigate the use of path signatures in a machine learning context for hedging exotic derivatives under non-Markovian stochastic volatility models. In a deep learning setting, we use signatures as features in feedforward neural…
We study the linear bandit problem under limited adaptivity, known as the batched linear bandit. While existing approaches can achieve near-optimal regret in theory, they are often computationally prohibitive or underperform in practice. We…
Subseasonal-to-seasonal (S2S) forecasting, which predicts climate conditions from several weeks to months in advance, represents a critical frontier for agricultural planning, energy management, and disaster preparedness. However, it…
For a given base class of sequence-to-next-token generators, we consider learning prompt-to-answer mappings obtained by iterating a fixed, time-invariant generator for multiple steps, thus generating a chain-of-thought, and then taking the…
The Pairwise Markov Chain (PMC) is a probabilistic graphical model extending the well-known Hidden Markov Model. This model, although highly effective for many tasks, has been scarcely utilized for continuous value prediction. This is…
This paper presents a canonical polyadic (CP) tensor decomposition that addresses unaligned observations. The mode with unaligned observations is represented using functions in a reproducing kernel Hilbert space (RKHS). We introduce a…
We propose a scalable variational Bayes method for statistical inference for a single or low-dimensional subset of the coordinates of a high-dimensional parameter in sparse linear regression. Our approach relies on assigning a mean-field…
Score-based generative modeling with probability flow ordinary differential equations (ODEs) has achieved remarkable success in a variety of applications. While various fast ODE-based samplers have been proposed in the literature and…
Real-world datasets often exhibit imbalanced data distribution, where certain class levels are severely underrepresented. In such cases, traditional pattern classifiers have shown a bias towards the majority class, impeding accurate…
This paper studies the fixed-confidence best arm identification (BAI) problem in the bandit framework in the canonical single-parameter exponential models. For this problem, many policies have been proposed, but most of them require solving…
Feature importance (FI) statistics provide a prominent and valuable method of insight into the decision process of machine learning (ML) models, but their effectiveness has well-known limitations when correlation is present among the…
Internet live streaming is widely used in online entertainment and e-commerce, where live advertising is an important marketing tool for anchors. An advertising campaign hopes to maximize the effect (such as conversions) under constraints…
We consider matrices $\boldsymbol{A}(\boldsymbol\theta)\in\mathbb{R}^{m\times m}$ that depend, possibly nonlinearly, on a parameter $\boldsymbol\theta$ from a compact parameter space $\Theta$. We present a Monte Carlo estimator for…
In this work, we discuss what we refer to as reduction techniques for survival analysis, that is, techniques that "reduce" a survival task to a more common regression or classification task, without ignoring the specifics of survival data.…
In functional data analysis, binary classification with one functional covariate has been extensively studied. We aim to fill in the gap of considering multivariate functional covariates in classification. In particular, we propose an…
Least-squares approximation is one of the most important methods for recovering an unknown function from data. While in many applications the data is fixed, in many others there is substantial freedom to choose where to sample. In this…
Diffusion probabilistic models have been successfully used to generate data from noise. However, most diffusion models are computationally expensive and difficult to interpret with a lack of theoretical justification. Random feature models…