Sambit Panda — Scifaro

Extremely Simple Streaming Forest

Decision forests, including random forests and gradient boosting trees, remain the leading machine learning methods for many real-world data problems, especially on tabular data. However, most of the current implementations only operate in…

Machine Learning · Computer Science 2025-06-27 Haoyin Xu, Jayanta Dey, Sambit Panda, Joshua T. Vogelstein

Learning Interpretable Characteristic Kernels via Decision Forests

Decision forests are widely used for classification and regression tasks. A lesser known property of tree-based methods is that one can construct a proximity matrix from the tree(s), and these proximity matrices are induced kernels. While…

Machine Learning · Statistics 2024-10-14 Sambit Panda, Cencheng Shen, Joshua T. Vogelstein

Universally Consistent K-Sample Tests via Dependence Measures

The K-sample testing problem involves determining whether K groups of data points are each drawn from the same distribution. Analysis of variance is arguably the most classical method to test mean differences, along with several recent…

Machine Learning · Statistics 2024-10-04 Sambit Panda, Cencheng Shen, Ronan Perry, Jelle Zorn, Antoine Lutz, Carey E. Priebe, Joshua T. Vogelstein

hyppo: A Multivariate Hypothesis Testing Python Package

We introduce hyppo, a unified library for performing multivariate hypothesis testing, including independence, two-sample, and k-sample testing. While many multivariate independence tests have R packages available, the interfaces are…

Computation · Statistics 2024-09-16 Sambit Panda, Satish Palaniappan, Junhao Xiong, Eric W. Bridgeford, Ronak Mehta, Cencheng Shen, Joshua T. Vogelstein

The Chi-Square Test of Distance Correlation

Distance correlation has gained much recent attention in the data science community: the sample statistic is straightforward to compute and asymptotically equals zero if and only if independence, making it an ideal choice to discover any…

Machine Learning · Statistics 2024-06-27 Cencheng Shen, Sambit Panda, Joshua T. Vogelstein

Learning sources of variability from high-dimensional observational studies

Causal inference studies whether the presence of a variable influences an observed outcome. As measured by quantities such as the "average treatment effect," this paradigm is employed across numerous biological fields, from vaccine and drug…

Methodology · Statistics 2023-11-30 Eric W. Bridgeford, Jaewon Chung, Brian Gilbert, Sambit Panda, Adam Li, Cencheng Shen, Alexandra Badea, Brian Caffo, Joshua T. Vogelstein

When are Deep Networks really better than Decision Forests at small sample sizes, and how?

Deep networks and decision forests (such as random forests and gradient boosted trees) are the leading machine learning methods for structured and tabular data, respectively. Many papers have empirically compared large numbers of…

Machine Learning · Computer Science 2021-11-04 Haoyin Xu, Kaleab A. Kinfu, Will LeVine, Sambit Panda, Jayanta Dey, Michael Ainsworth, Yu-Chung Peng, Madi Kusmanov, Florian Engert, Christopher M. White, Joshua T. Vogelstein, Carey E. Priebe

Stochastic gravitational wave background mapmaking using regularised deconvolution

Obtaining a faithful source intensity distribution map of the sky from noisy data demands incorporating known information of the expected signal, especially when the signal is weak compared to the noise. We introduce a widely used procedure…

General Relativity and Quantum Cosmology · Physics 2019-09-04 Sambit Panda, Swetha Bhagwat, Jishnu Suresh, Sanjit Mitra