Related papers: Two-cluster test

200 papers

Clustering belongs to the family of unsupervised analysis methods, which consist of grouping samples into homogeneous and separate subgroups of observations, also called clusters. To interpret the clusters, statistical hypothesis testing is often used to…

Methodology · Statistics 2022-10-25 Benjamin Hivert, Denis Agniel, Rodolphe Thiébaut, Boris P. Hejblum

Classical tests for a difference in means control the type I error rate when the groups are defined a priori. However, when the groups are instead defined via clustering, then applying a classical test yields an extremely inflated type I…

Methodology · Statistics 2022-11-01 Lucy L. Gao, Jacob Bien, Daniela Witten

For many applications, it is critical to interpret and validate groups of observations obtained via clustering. A common validation approach involves testing differences in feature means between observations in two estimated clusters. In…

Methodology · Statistics 2023-11-29 Yiqun T. Chen, Lucy L. Gao

We consider the problem of testing for a difference in means between clusters of observations identified via k-means clustering. In this setting, classical hypothesis tests lead to an inflated Type I error rate. To overcome this problem, we…

Methodology · Statistics 2022-03-30 Yiqun T. Chen, Daniela M. Witten
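The inflated type I error described in the entries above can be reproduced with a short simulation. This is a minimal sketch under stated assumptions, not the selective-inference test proposed by Chen and Witten: it draws one-dimensional data from a single standard normal population (so the null hypothesis of equal means is true), splits the data with 2-means (Lloyd's algorithm), and then applies a naive Welch t-test to the two estimated clusters, using a normal approximation for the p-value. The function names, sample size, replication count, and seed are hypothetical choices made for this illustration.

```python
import math
import random


def two_means_1d(xs, iters=100):
    """k-means with k = 2 on 1-D data via Lloyd's algorithm.

    Assumes xs contains at least two distinct values.
    """
    c0, c1 = min(xs), max(xs)  # initialize the centers at the extremes
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        g0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        g1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        # Update step: move each center to the mean of its cluster.
        new_c0, new_c1 = sum(g0) / len(g0), sum(g1) / len(g1)
        if (new_c0, new_c1) == (c0, c1):
            break  # converged: the centers no longer move
        c0, c1 = new_c0, new_c1
    return g0, g1


def welch_t(a, b):
    """Welch two-sample t statistic; each group needs at least two points."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))


def naive_rejection_rate(reps=200, n=100, alpha=0.05, seed=0):
    """Fraction of null data sets in which the naive test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # One population only, so the null hypothesis of equal means holds.
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        g0, g1 = two_means_1d(xs)  # clusters estimated from the data...
        t = welch_t(g0, g1)        # ...and the same data reused for the test
        p = math.erfc(abs(t) / math.sqrt(2.0))  # two-sided normal-approx p-value
        if p < alpha:
            rejections += 1
    return rejections / reps


print(naive_rejection_rate())  # far above the nominal 0.05
```

Because k-means chooses the split that separates the two groups as much as possible, the naive test rejects almost every time even though there is only one population; assigning the same points to two groups at random instead would bring the rejection rate back near 0.05.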

Cluster analysis is an unsupervised learning strategy that can be employed to identify subgroups of observations in data sets of unknown structure. This strategy is particularly useful for analyzing high-dimensional data such as microarray…

Methodology · Statistics 2016-10-07 Erika S. Helgeson, Eric Bair

Hypothesis testing is a statistical inference approach used to determine whether data supports a specific hypothesis. An important type is the two-sample test, which evaluates whether two sets of data points are from identical…

Machine Learning · Computer Science 2025-01-08 Weizhi Li, Visar Berisha, Gautam Dasarathy

Two-sample tests evaluate whether two samples are realizations of the same distribution (the null hypothesis) or two different distributions (the alternative hypothesis). We consider a new setting for this problem where sample features are…

Machine Learning · Computer Science 2022-07-20 Weizhi Li, Gautam Dasarathy, Karthikeyan Natesan Ramamurthy, Visar Berisha

Classification and clustering are both important topics in statistical learning. A natural question herein is whether predefined classes are really different from one another, or whether clusters are really there. Specifically, we may be…

Machine Learning · Statistics 2015-09-22 Qiyi Lu, Xingye Qiao

Cluster analysis has proved to be an invaluable tool for the exploratory and unsupervised analysis of high dimensional datasets. Among methods for clustering, hierarchical approaches have enjoyed substantial popularity in genomics and other…

Methodology · Statistics 2014-11-20 Patrick K. Kimes, Yufeng Liu, D. Neil Hayes, J. S. Marron

In recent years, many studies have examined the consistency of students' answers in a variety of contexts. Some of these papers tried to develop more detailed models of the consistency of students' reasoning, or to subdivide a sample of…

Physics Education · Physics 2017-08-17 Onofrio Rosario Battaglia, Benedetto Di Paola, Claudio Fazio

A basic requirement of many studies is the classification of data. Clustering is a proposed method for summarizing networks. Clustering methods can be divided into two categories: model-based approaches and algorithmic…

Machine Learning · Computer Science 2013-02-19 Raheleh Namayandeh, Farzad Didehvar, Zahra Shojaei

Classification is a fundamental problem in machine learning and data mining. During the past decades, numerous classification methods have been presented based on different principles. However, most existing classifiers cast the…

Machine Learning · Computer Science 2019-04-23 Zengyou He, Chaohua Sheng, Yan Liu, Quan Zou

When scholars suspect units are dependent on each other within clusters but independent of each other across clusters, they employ cluster-robust standard errors (CRSEs). Nevertheless, what to cluster over is sometimes unknown. For…

Methodology · Statistics 2025-11-12 Kentaro Fukumoto

Although numerous algorithms have been proposed to solve the categorical data clustering problem, how to assess the statistical significance of a set of categorical clusters remains unaddressed. To fill this void, we employ the…

Machine Learning · Computer Science 2022-11-09 Lianyu Hu, Mudi Jiang, Yan Liu, Zengyou He

There are many cluster analysis methods that can produce quite different clusterings on the same dataset. Cluster validation is about the evaluation of the quality of a clustering; "relative cluster validation" is about using such criteria…

Methodology · Statistics 2020-09-10 Christian Hennig

Clustering methods have led to a number of important discoveries in bioinformatics and beyond. A major challenge in their use is determining which clusters represent important underlying structure, as opposed to spurious sampling artifacts.…

Methodology · Statistics 2021-10-20 Hanwen Huang, Yufeng Liu, Ming Yuan, J. S. Marron

We propose novel methodology for testing equality of model parameters between two high-dimensional populations. The technique is very general and applicable to a wide range of models. The method is based on sample splitting: the data is…

Methodology · Statistics 2013-01-17 Nicolas Städler, Sach Mukherjee

To test the statistical significance of a treatment effect, we usually compare two parts of a population: one exposed to the treatment and the other not exposed to it. Standard parametric and nonparametric two-sample…

Computation · Statistics 2012-11-02 Bikram Karmakar, Kumaresh Dhara, Kushal Kumar Dey, Analabha Basu, Anil Ghosh

Clustering is an essential data mining tool that aims to discover inherent cluster structure in data. As such, the study of clusterability, which evaluates whether data possesses such structure, is an integral part of cluster analysis. Yet,…

Machine Learning · Computer Science 2016-02-24 Margareta Ackerman, Andreas Adolfsson, Naomi Brownstein

Many clustering methods, including k-means, require the user to specify the number of clusters as an input parameter. A variety of methods have been devised to choose the number of clusters automatically, but they often rely on strong…

Methodology · Statistics 2017-02-10 Wei Fu, Patrick O. Perry