Related papers: A model selection approach for multiple sequence s…
Variable selection is fundamental to high-dimensional statistical modeling. Many variable selection techniques may be implemented by maximum penalized likelihood using various penalty functions. Optimizing the penalized likelihood function…
We consider nonlinear mixed effects models including high-dimensional covariates to model individual parameters variability. The objective is to identify relevant covariates among a large set under sparsity assumption and to estimate model…
Subset selection in multiple linear regression aims to choose a subset of candidate explanatory variables that tradeoff fitting error (explanatory power) and model complexity (number of variables selected). We build mathematical programming…
We consider the segmentation problem of univariate distributions from the exponential family with multiple parameters. In segmentation, the choice of the number of segments remains a difficult issue due to the discrete nature of the…
In our problem, we are given access to a number of sequences of nonnegative i.i.d. random variables, whose realizations are observed sequentially. All sequences are of the same finite length. The goal is to pick one element from each…
A common approach to statistical learning with big-data is to randomly split it among $m$ machines and learn the parameter of interest by averaging the $m$ individual estimates. In this paper, focusing on empirical risk minimization, or…
Sequence segmentation is a well-studied problem, where given a sequence of elements, an integer K, and some measure of homogeneity, the task is to split the sequence into K contiguous segments that are maximally homogeneous. A classic…
Let $(Y,X_1,...,X_m)$ be a random vector. It is desired to predict $Y$ based on $(X_1,...,X_m)$. Examples of prediction methods are regression, classification using logistic regression or separating hyperplanes, and so on. We consider the…
Segmental structure is a common pattern in many types of sequences such as phrases in human languages. In this paper, we present a probabilistic model for sequences via their segmentations. The probability of a segmented sequence is…
Many signal processing problems can be solved by maximizing the fitness of a segmented model over all possible partitions of the data interval. This letter describes a simple but powerful algorithm that searches the exponentially large…
We introduce a very general method for sparse and large-scale variable selection. The large-scale regression settings is such that both the number of parameters and the number of samples are extremely large. The proposed method is based on…
We present a convex approach to probabilistic segmentation and modeling of time series data. Our approach builds upon recent advances in multivariate total variation regularization, and seeks to learn a separate set of parameters for the…
This paper presents a new approach to statistical similarity assessment based on sequence alignment. The algorithm performs mutual matching of two random sequences by successively searching for common elements and by applying sequence…
Optimizing machine learning algorithms that are used to solve the objective function has been of great interest. Several approaches to optimize common algorithms, such as gradient descent and stochastic gradient descent, were explored. One…
Binary segmentation, which is sequential in nature is thus far the most widely used method for identifying multiple change points in statistical models. Here we propose a top down methodology called arbitrary segmentation that proceeds in a…
This paper proposes a novel method for segmentation of images by hierarchical multilevel thresholding. The method is global, agglomerative in nature and disregards pixel locations. It involves the optimization of the ratio of the unbiased…
We study the problem of partitioning integer sequences in the one-pass data streaming model. Given is an input stream of integers $X \in \{0, 1, \dots, m \}^n$ of length $n$ with maximum element $m$, and a parameter $p$. The goal is to…
We consider the problem of estimating the number of distinct elements in a large data set (or, equivalently, the support size of the distribution induced by the data set) from a random sample of its elements. The problem occurs in many…
The Adaptive Multilevel Splitting algorithm is a very powerful and versatile method to estimate rare events probabilities. It is an iterative procedure on an interacting particle system, where at each step, the $k$ less well-adapted…
Partitioning a sequence of length $n$ into $k$ coherent segments (Seg) is one of the classic optimization problems. As long as the optimization criterion is additive, Seg can be solved exactly in $O(n^2k)$ time using a classic dynamic…