Rahul Mazumder — Scifaro

ADMM-Q: An Improved Hessian-based Weight Quantizer for Post-Training Quantization of Large Language Models

Quantization is an effective strategy to reduce the storage and computation footprint of large language models (LLMs). Post-training quantization (PTQ) is a leading approach for compressing LLMs. Popular weight quantization procedures,…

Machine Learning · Computer Science 2026-05-13 Ryan Lucas, Mehdi Makni, Xiang Meng, Adam Deng, Rahul Mazumder

Reasoning Models Can be Accurately Pruned Via Chain-of-Thought Reconstruction

Reasoning language models such as DeepSeek-R1 produce long chain-of-thought traces during inference time which make them costly to deploy at scale. We show that using compression techniques such as neural network pruning produces greater…

Artificial Intelligence · Computer Science 2026-05-05 Ryan Lucas, Kayhan Behdin, Zhipeng Wang, Qingquan Song, Shao Tang, Rahul Mazumder

MOONSHOT: A Framework for Multi-Objective Pruning of Vision and Large Language Models

Weight pruning is a common technique for compressing large neural networks. We focus on the challenging post-training one-shot setting, where a pre-trained model is compressed without any retraining. Existing one-shot pruning methods…

Machine Learning · Computer Science 2026-04-16 Gabriel Afriat, Xiang Meng, Shibal Ibrahim, Hussein Hazimeh, Rahul Mazumder

Computation of Least Trimmed Squares: A Branch-and-Bound framework with Hyperplane Arrangement Enhancements

We study computational aspects of a key problem in robust statistics -- the penalized least trimmed squares (LTS) regression problem, a robust estimator that mitigates the influence of outliers in data by capping residuals with large…

Optimization and Control · Mathematics 2026-04-15 Xiang Meng, Andrés Gómez, Rahul Mazumder

Sparse Gaussian Graphical Models with Discrete Optimization: Computational and Statistical Perspectives

We consider the problem of learning a sparse graph underlying an undirected Gaussian graphical model, a key problem in statistical machine learning. Given $n$ samples from a multivariate Gaussian distribution with $p$ variables, the goal is…

Machine Learning · Computer Science 2026-04-07 Kayhan Behdin, Wenyu Chen, Rahul Mazumder

Sparse PCA: A New Scalable Estimator Based On Integer Programming

We consider the Sparse Principal Component Analysis (SPCA) problem under the well-known spiked covariance model. Recent work has shown that the SPCA problem can be reformulated as a Mixed Integer Program (MIP) and can be solved to global…

Methodology · Statistics 2026-04-06 Kayhan Behdin, Rahul Mazumder

Extracting Interpretable Models from Tree Ensembles: Computational and Statistical Perspectives

Tree ensembles are non-parametric methods widely recognized for their accuracy and ability to capture complex interactions. While these models excel at prediction, they are difficult to interpret and may fail to uncover useful relationships…

Machine Learning · Statistics 2026-04-01 Brian Liu, Rahul Mazumder, Peter Radchenko

Robust Batch-Level Query Routing for Large Language Models under Cost and Capacity Constraints

We study the problem of routing queries to large language models (LLMs) under cost, GPU resources, and concurrency constraints. Prior per-query routing methods often fail to control batch-level cost, especially under non-uniform or…

Machine Learning · Computer Science 2026-03-31 Jelena Markovic-Voronov, Kayhan Behdin, Yuanda Xu, Zhengze Zhou, Zhipeng Wang, Rahul Mazumder

Modeling with Categorical Features via Exact Fusion and Sparsity Regularisation

We study the high-dimensional linear regression problem with categorical predictors that have many levels. We propose a new estimation approach, which performs model compression via two mechanisms by simultaneously encouraging (a)…

Methodology · Statistics 2026-03-30 Kayhan Behdin, Riade Benbaki, Peter Radchenko, Rahul Mazumder

DuaLip-GPU Technical Report

Large-scale linear programs (LPs) arise in many decision systems, including ranking, allocation, and matching problems that must be solved repeatedly at massive scale. Prior work such as ECLIPSE and LinkedIn's open-source DuaLip showed that…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-06 Gregory Dexter, Aida Rahmattalabi, Sanjana Garg, Qinquan Song, Ruby Tu, Yuan Gao, Yi Zhang, Zhipeng Wang, Rahul Mazumder

3BASiL: An Algorithmic Framework for Sparse plus Low-Rank Compression of LLMs

Sparse plus Low-Rank $(\mathbf{S} + \mathbf{LR})$ decomposition of Large Language Models (LLMs) has emerged as a promising direction in model compression, aiming to decompose pre-trained model weights into a sum of sparse and low-rank…

Machine Learning · Computer Science 2026-03-03 Mehdi Makni, Xiang Meng, Rahul Mazumder

A GPU-accelerated Nonlinear Branch-and-Bound Framework for Sparse Linear Models

We study exact sparse linear regression with an $\ell_0-\ell_2$ penalty and develop a branch-and-bound (BnB) algorithm explicitly designed for GPU execution. Starting from a perspective reformulation, we derive an interval relaxation that…

Optimization and Control · Mathematics 2026-02-05 Xiang Meng, Ryan Lucas, Rahul Mazumder

Theoretical Compression Bounds for Wide Multilayer Perceptrons

Pruning and quantization techniques have been broadly successful in reducing the number of parameters needed for large neural networks, yet theoretical justification for their empirical success falls short. We consider a randomized greedy…

Machine Learning · Computer Science 2025-12-09 Houssam El Cheairi, David Gamarnik, Rahul Mazumder

Multi-Task Learning for Sparsity Pattern Heterogeneity: Statistical and Computational Perspectives

We consider a problem in Multi-Task Learning (MTL) where multiple linear models are jointly trained on a collection of datasets ("tasks"). A key novelty of our framework is that it allows the sparsity pattern of regression coefficients and…

Methodology · Statistics 2025-12-08 Kayhan Behdin, Gabriel Loewinger, Kenneth T. Kishida, Giovanni Parmigiani, Rahul Mazumder

Differentially Private High-dimensional Variable Selection via Integer Programming

Sparse variable selection improves interpretability and generalization in high-dimensional learning by selecting a small subset of informative features. Recent advances in Mixed Integer Programming (MIP) have enabled solving large-scale…

Machine Learning · Statistics 2025-10-28 Petros Prastakos, Kayhan Behdin, Rahul Mazumder

Scaling Down, Serving Fast: Compressing and Deploying Efficient LLMs for Recommendation Systems

Large language models (LLMs) have demonstrated remarkable performance across a wide range of industrial applications, from search and recommendation systems to generative tasks. Although scaling laws indicate that larger models generally…

Information Retrieval · Computer Science 2025-10-28 Kayhan Behdin, Ata Fatahibaarzi, Qingquan Song, Yun Dai, Aman Gupta, Zhipeng Wang, Shao Tang, Hejian Sang, Gregory Dexter, Sirou Zhu, Siyu Zhu, Tejas Dharamsi, Vignesh Kothapalli, Zhoutong Fu, Yihan Cao, Pin-Lun Hsu, Fedor Borisyuk, Natesh Pillai, Luke Simon, Rahul Mazumder

ALPS: Improved Optimization for Highly Sparse One-Shot Pruning for Large Language Models

The impressive performance of Large Language Models (LLMs) across various natural language processing tasks comes at the cost of vast computational resources and storage requirements. One-shot pruning techniques offer a way to alleviate…

Machine Learning · Computer Science 2025-09-09 Xiang Meng, Kayhan Behdin, Haoyue Wang, Rahul Mazumder

MOSS: Multi-Objective Optimization for Stable Rule Sets

We present MOSS, a multi-objective optimization framework for constructing stable sets of decision rules. MOSS incorporates three important criteria for interpretability: sparsity, accuracy, and stability, into a single multi-objective…

Optimization and Control · Mathematics 2025-07-31 Brian Liu, Rahul Mazumder

FAST: An Optimization Framework for Fast Additive Segmentation in Transparent ML

We present FAST, an optimization framework for fast additive segmentation. FAST segments piecewise constant shape functions for each feature in a dataset to produce transparent additive models. The framework leverages a novel optimization…

Machine Learning · Statistics 2025-07-31 Brian Liu, Rahul Mazumder

Randomization Can Reduce Both Bias and Variance: A Case Study in Random Forests

We study the often overlooked phenomenon, first noted in \cite{breiman2001random}, that random forests appear to reduce bias compared to bagging. Motivated by an interesting paper by \cite{mentch2020randomization}, where the authors explain…

Machine Learning · Statistics 2025-07-23 Brian Liu, Rahul Mazumder