Defeng Sun
We present an efficient algorithm for least-squares constrained nuclear norm minimization, a computationally challenging problem with broad applications. Our approach combines a level set method with secant iterations and a proximal…
Large Language Models (LLMs) have demonstrated remarkable proficiency across diverse tasks, exhibiting emergent properties such as semantic prompt comprehension, In-Context Learning (ICL), and Chain-of-Thought (CoT) reasoning. Despite their…
We introduce LArge Model Based Data Agent (LAMBDA), a novel open-source, code-free multi-agent data analysis system that leverages the power of large language models. LAMBDA is designed to address data analysis challenges in data-driven…
Large Language Model (LLM) agents can automate data-science workflows, but many rigorous statistical methods implemented in R remain underused because LLMs struggle with statistical knowledge and tool retrieval. Existing retrieval-augmented…
The tremendous success of Transformer models in fields such as large language models and computer vision necessitates a rigorous theoretical investigation. To the best of our knowledge, this paper is the first work proving that standard…
In this paper, we propose practical normalized stochastic first-order methods with Polyak momentum, multi-extrapolated momentum, and recursive momentum for solving unconstrained optimization problems. These methods employ dynamically…
Recent LLM-based data agents aim to automate data science tasks ranging from data analysis to deep learning. However, the open-ended nature of real-world data science problems, which often span multiple taxonomies and lack standard answers,…
In this manuscript, we establish the global well-posedness for master equations of mean field games of controls, where the interaction is through the joint law of the state and control. Our results are proved under two different conditions:…
In recent years, data science agents powered by Large Language Models (LLMs), known as "data agents," have shown significant potential to transform the traditional data analysis paradigm. This survey provides an overview of the evolution,…
This paper introduces the distributed Halpern Peaceman-Rachford (dHPR) method, an efficient algorithm for solving distributed convex composite optimization problems with non-smooth objectives, which achieves a non-ergodic $O(1/k)$…
This paper establishes the equivalence of the Aubin property and the strong regularity for generalized equations over $C^2$-cone reducible sets. This result resolves a long-standing question in variational analysis and extends the…
We introduce a cutting-plane framework for nonconvex quadratic programs (QPs) that progressively tightens convex relaxations. Our approach leverages the doubly nonnegative (DNN) relaxation to compute strong lower bounds and generate…
Rank-constrained matrix problems appear frequently across science and engineering. The convergence analysis of iterative algorithms developed for these problems often hinges on local error bounds, which correlate the distance to the…
This paper aims to understand the relationships among recently developed GPU-accelerated first-order methods (FOMs) for linear programming (LP), with particular emphasis on HPR-LP, a Halpern Peaceman-Rachford (HPR) method for LP. Our…
Low-rank tensor models are widely used in statistics. However, most existing methods rely heavily on the assumption that data follows a sub-Gaussian distribution. To address the challenges associated with heavy-tailed distributions…
The metric projection onto the positive semidefinite (PSD) cone is strongly semismooth, a property that guarantees local quadratic convergence for many powerful algorithms in semidefinite programming. In this paper, we investigate whether…
In this paper, we introduce HPR-QP, a dual Halpern Peaceman-Rachford (HPR) method designed for solving large-scale convex composite quadratic programming. One distinctive feature of HPR-QP is that, instead of working with the primal…
In this paper, we propose a novel self-supervised transfer learning method called Distribution Matching (DM), which drives the representation distribution toward a predefined reference distribution…
In this paper, we propose an adaptive sieving (AS) strategy for solving general sparse machine learning models by effectively exploring the intrinsic sparsity of the solutions, wherein only a sequence of reduced problems with much smaller…
We explore the approximation capabilities of Transformer networks for Hölder and Sobolev functions, and apply these results to address nonparametric regression estimation with dependent observations. First, we establish novel upper bounds…