Related papers: M-decomposability, elliptical unimodal densities, …
Robustly determining the optimal number of clusters in a data set is an essential factor in a wide range of applications. Cluster enumeration becomes challenging when the true underlying structure in the observed data is corrupted by…
A novel nonparametric clustering algorithm is proposed that uses the interpoint distances between the data points to reveal the inherent clustering structure of the given data set, where we apply the classical nonparametric…
Kernel mean embeddings are a popular tool that consists in representing probability measures by their infinite-dimensional mean embeddings in a reproducing kernel Hilbert space. When the kernel is characteristic, mean embeddings can be used…
Clustering methods with dimension reduction have recently received considerable interest in statistics, and many methods that perform clustering and dimension reduction simultaneously have been proposed. This work presents a novel…
Kernel density estimation is a popular method for estimating unseen probability distributions. However, the convergence of these classical estimators to the true density slows down in high dimensions. Moreover, they do not define meaningful…
In this paper we introduce the notion of m-irreducibility that extends the standard concept of irreducibility of a numerical semigroup when the multiplicity is fixed. We analyze the structure of the set of m-irreducible numerical…
Elliptically contoured distributions can be considered to be the distributions for which the contours of the density functions are proportional ellipsoids. Kamiya, Takemura and Kuriki (2006) generalized the elliptically contoured…
A theoretical framework is developed to describe the transformation that distributes probability density functions uniformly over space. In one dimension, the cumulative distribution can be used, but does not generalize to higher…
We introduce a general semiparametric clusterwise elliptical distribution to assess how latent cluster structure shapes continuous outcomes. Using a subjectwise representation, we first estimate cluster-specific mean vectors and a…
Density-based clustering methodology has been widely considered in the statistical literature for classifying Euclidean observations. However, this approach has not been contemplated for directional data yet. In this work, directional…
The nonparametric formulation of density-based clustering, known as modal clustering, draws a correspondence between groups and the attraction domains of the modes of the density function underlying the data. Its probabilistic foundation…
We show how clustering standard errors in one or more dimensions can be justified in M-estimation when there is sampling or assignment uncertainty. Since existing procedures for variance estimation are either conservative or invalid, we…
The denoising diffusion probabilistic model (DDPM) has emerged as a mainstream generative model in generative AI. While sharp convergence guarantees have been established for the DDPM, the iteration complexity is, in general, proportional…
Density estimation plays a fundamental role in many areas of statistics and machine learning. Parametric, nonparametric and semiparametric density estimation methods have been proposed in the literature. Semiparametric density models are…
A new clustering accuracy measure is proposed to determine the unknown number of clusters and to assess the quality of clustering of a data set in a space of any dimension. Our validity index applies the classical nonparametric…
Lattice models parameterized using first-principles calculations constitute an effective framework to simulate the thermodynamic behavior of physical systems. The cluster expansion method is a flexible lattice-based method used extensively…
In this paper, we propose a new ellipsoidal mixture model. This model is based on a new probability density function belonging to the family of elliptical distributions and designed to model points spread around an ellipsoidal surface. Then,…
The paper aims at finding widely and smoothly defined nonparametric location and scatter functionals. As a convenient vehicle, maximum likelihood estimation of the location vector m and scatter matrix S of an elliptically symmetric t…
We derive concentration inequalities for the supremum norm of the difference between a kernel density estimator (KDE) and its point-wise expectation that hold uniformly over the selection of the bandwidth and under weaker conditions on the…
In general, the clustering problem is NP-hard, and global optimality cannot be established for non-trivial instances. For high-dimensional data, distance-based methods for clustering or classification face an additional difficulty, the…