Youssef Mroueh — Scifaro

Transformation-Augmented GRPO for Enhancing Exploration in Reasoning of Large Language Models

Group Relative Policy Optimization (GRPO) has become the dominant method for reinforcement learning with verifiable rewards in large language models, but it suffers from two critical limitations: gradient vanishing and diversity collapse.…

Machine Learning · Computer Science 2026-05-20 Khiem Le, Phuc Nguyen, Youssef Mroueh, Chi-Heng Lin, Shangqian Gao, Ting Hua, Nitesh V. Chawla

Guided Speculative Inference for Efficient Test-Time Alignment of LLMs

We propose Guided Speculative Inference (GSI), a novel algorithm for efficient reward-guided decoding in large language models. GSI combines soft best-of-$n$ test-time scaling with a reward model $r(x,y)$ and speculative samples from a…

Machine Learning · Computer Science 2026-04-28 Jonathan Geuter, Youssef Mroueh, David Alvarez-Melis

CliffSearch: Structured Agentic Co-Evolution over Theory and Code for Scientific Algorithm Discovery

Scientific algorithm discovery is iterative: hypotheses are proposed, implemented, stress-tested, and revised. Current LLM-guided search systems accelerate proposal generation, but often under-represent scientific structure by optimizing…

Machine Learning · Computer Science 2026-04-02 Youssef Mroueh, Carlos Fonseca, Brian Belgodere, David Cox

GP-MoLFormer-Sim: Test Time Molecular Optimization through Contextual Similarity Guidance

The ability to design molecules while preserving similarity to a target molecule and/or property is crucial for various applications in drug discovery, chemical design, and biology. We introduce in this paper an efficient training-free…

Machine Learning · Computer Science 2025-11-18 Jiri Navratil, Jarret Ross, Payel Das, Youssef Mroueh, Samuel C Hoffman, Vijil Chenthamarakshan, Brian Belgodere

Reinforcement Learning with Verifiable Rewards: GRPO's Effective Loss, Dynamics, and Success Amplification

Group Relative Policy Optimization (GRPO) was introduced and used recently for promoting reasoning in LLMs under verifiable (binary) rewards. We show that the mean + variance calibration of these rewards induces a weighted contrastive loss…

Machine Learning · Computer Science 2025-10-22 Youssef Mroueh
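
As a rough illustration of the mean and variance calibration mentioned in the abstract above, the sketch below standardizes binary rewards within one group of sampled responses. The function name `group_relative_advantages`, the use of the population standard deviation, and the `eps` stabilizer are assumptions made for this example, not details taken from the paper.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Standardize rewards within one group: (r - mean) / (std + eps).

    Assumption: population standard deviation plus a small eps to avoid
    division by zero; real implementations may differ on both choices.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, four sampled responses, binary verifiable rewards.
mixed = group_relative_advantages([1.0, 0.0, 0.0, 0.0])

# When every response gets the same reward, all advantages are zero,
# so the group contributes no learning signal.
uniform = group_relative_advantages([1.0, 1.0, 1.0, 1.0])
```

In this sketch the single correct response receives a positive advantage and the three incorrect ones receive equal negative advantages, while a group with identical rewards yields all zeros.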

KL-Regularized RLHF with Multiple Reference Models: Exact Solutions and Sample Complexity

Recent methods for aligning large language models (LLMs) with human feedback predominantly rely on a single reference model, which limits diversity, invites model overfitting, and underutilizes the wide range of available pre-trained models.…

Machine Learning · Computer Science 2025-10-21 Gholamali Aminian, Amir R. Asadi, Idan Shenfeld, Youssef Mroueh

Quantum Verifiable Rewards for Post-Training Qiskit Code Assistant

Qiskit is an open-source quantum computing framework that allows users to design, simulate, and run quantum circuits on real quantum hardware. We explore post-training techniques for LLMs to assist in writing Qiskit code. We introduce…

Quantum Physics · Physics 2025-08-29 Nicolas Dupuis, Adarsh Tiwari, Youssef Mroueh, David Kremer, Ismael Faro, Juan Cruz-Benito

Best-of-N through the Smoothing Lens: KL Divergence and Regret Analysis

A simple yet effective method for inference-time alignment of generative models is Best-of-$N$ (BoN), where $N$ outcomes are sampled from a reference policy, evaluated using a proxy reward model, and the highest-scoring one is selected.…

Machine Learning · Statistics 2025-07-09 Gholamali Aminian, Idan Shenfeld, Amir R. Asadi, Ahmad Beirami, Youssef Mroueh
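
The Best-of-$N$ procedure described in the abstract above (sample $N$ outcomes, score them with a proxy reward, keep the highest-scoring one) can be sketched in a few lines. This is a minimal illustration under stated assumptions: `best_of_n`, `toy_policy`, `toy_reward`, and the fixed `RESPONSES` list are hypothetical stand-ins, not part of the paper.

```python
import random
from typing import Callable


def best_of_n(
    sample: Callable[[], str],
    proxy_reward: Callable[[str], float],
    n: int,
) -> str:
    """Draw n candidates from a reference policy and return the top-scoring one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [sample() for _ in range(n)]
    # Ties are resolved in favor of the earliest candidate (behavior of max).
    return max(candidates, key=proxy_reward)


# Hypothetical toy setup: the "reference policy" picks uniformly from a
# fixed list, and the "proxy reward" is simply the response length.
rng = random.Random(0)
RESPONSES = ["ok", "fine", "a longer answer", "short"]


def toy_policy() -> str:
    return rng.choice(RESPONSES)


def toy_reward(y: str) -> float:
    return float(len(y))


selected = best_of_n(toy_policy, toy_reward, n=8)
```

Larger `n` makes it more likely that the selected response has a high proxy reward, at the cost of `n` samples and `n` reward evaluations per query.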

Revisiting Group Relative Policy Optimization: Insights into On-Policy and Off-Policy Training

We revisit Group Relative Policy Optimization (GRPO) in both on-policy and off-policy optimization regimes. Our motivation comes from recent work on off-policy Proximal Policy Optimization (PPO), which improves training stability, sampling…

Machine Learning · Computer Science 2025-06-02 Youssef Mroueh, Nicolas Dupuis, Brian Belgodere, Apoorva Nitsure, Mattia Rigotti, Kristjan Greenewald, Jiri Navratil, Jerret Ross, Jesus Rios

Gradient Flows and Riemannian Structure in the Gromov-Wasserstein Geometry

The Wasserstein space of probability measures is known for its intricate Riemannian structure, which underpins the Wasserstein geometry and enables gradient flow algorithms. However, the Wasserstein geometry may not be suitable for certain…

Analysis of PDEs · Mathematics 2025-05-23 Zhengxin Zhang, Ziv Goldfeld, Kristjan Greenewald, Youssef Mroueh, Bharath K. Sriperumbudur

GP-MoLFormer: A Foundation Model For Molecular Generation

Transformer-based models trained on large and general purpose datasets consisting of molecular strings have recently emerged as a powerful tool for successfully modeling various structure-property relations. Inspired by this success, we…

Biomolecules · Quantitative Biology 2025-04-02 Jerret Ross, Brian Belgodere, Samuel C. Hoffman, Vijil Chenthamarakshan, Jiri Navratil, Youssef Mroueh, Payel Das

Verify when Uncertain: Beyond Self-Consistency in Black Box Hallucination Detection

Large Language Models (LLMs) suffer from hallucination problems, which hinder their reliability in sensitive applications. In the black-box setting, several self-consistency-based techniques have been proposed for hallucination detection.…

Computation and Language · Computer Science 2025-02-25 Yihao Xue, Kristjan Greenewald, Youssef Mroueh, Baharan Mirzasoleiman

Image Captioning as an Assistive Technology: Lessons Learned from VizWiz 2020 Challenge

Image captioning has recently demonstrated impressive progress, largely owing to the introduction of neural network algorithms trained on curated datasets like MS-COCO. Often work in this field is motivated by the promise of deployment of…

Computer Vision and Pattern Recognition · Computer Science 2024-10-30 Pierre Dognin, Igor Melnyk, Youssef Mroueh, Inkit Padhi, Mattia Rigotti, Jarret Ross, Yair Schiff, Richard A. Young, Brian Belgodere

Large Language Models can be Strong Self-Detoxifiers

Reducing the likelihood of generating harmful and toxic output is an essential task when aligning large language models (LLMs). Existing methods mainly rely on training an external reward model (i.e., another language model) or fine-tuning…

Machine Learning · Computer Science 2024-10-08 Ching-Yun Ko, Pin-Yu Chen, Payel Das, Youssef Mroueh, Soham Dan, Georgios Kollias, Subhajit Chaudhury, Tejaswini Pedapati, Luca Daniel

Multivariate Stochastic Dominance via Optimal Transport and Applications to Models Benchmarking

Stochastic dominance is an important concept in probability theory, econometrics and social choice theory for robustly modeling agents' preferences between random outcomes. While many works have been dedicated to the univariate case, little…

Machine Learning · Statistics 2024-06-11 Gabriel Rioux, Apoorva Nitsure, Mattia Rigotti, Kristjan Greenewald, Youssef Mroueh

Information Theoretic Guarantees For Policy Alignment In Large Language Models

Policy alignment of large language models refers to constrained policy optimization, where the policy is optimized to maximize a reward while staying close to a reference policy with respect to an $f$-divergence such as the $\mathsf{KL}$…

Machine Learning · Computer Science 2024-06-11 Youssef Mroueh

Distributional Preference Alignment of LLMs via Optimal Transport

Current LLM alignment techniques use pairwise human preferences at a sample level, and as such, they do not imply an alignment on the distributional level. We propose in this paper Alignment via Optimal Transport (AOT), a novel method for…

Machine Learning · Computer Science 2024-06-11 Igor Melnyk, Youssef Mroueh, Brian Belgodere, Mattia Rigotti, Apoorva Nitsure, Mikhail Yurochkin, Kristjan Greenewald, Jiri Navratil, Jerret Ross

Risk Aware Benchmarking of Large Language Models

We propose a distributional framework for benchmarking socio-technical risks of foundation models with quantified statistical significance. Our approach hinges on a new statistical relative testing based on first and second order stochastic…

Machine Learning · Computer Science 2024-06-11 Apoorva Nitsure, Youssef Mroueh, Mattia Rigotti, Kristjan Greenewald, Brian Belgodere, Mikhail Yurochkin, Jiri Navratil, Igor Melnyk, Jerret Ross

Auditing and Generating Synthetic Data with Controllable Trust Trade-offs

Real-world data often exhibits bias, imbalance, and privacy risks. Synthetic datasets have emerged to address these issues. This paradigm relies on generative AI models to generate unbiased, privacy-preserving data while maintaining…

Machine Learning · Computer Science 2024-06-11 Brian Belgodere, Pierre Dognin, Adam Ivankay, Igor Melnyk, Youssef Mroueh, Aleksandra Mojsilovic, Jiri Navratil, Apoorva Nitsure, Inkit Padhi, Mattia Rigotti, Jerret Ross, Yair Schiff, Radhika Vedpathak, Richard A. Young

Physics-enhanced deep surrogates for partial differential equations

Many physics and engineering applications demand Partial Differential Equations (PDE) property evaluations that are traditionally computed with resource-intensive high-fidelity numerical solvers. Data-driven surrogate models provide an…

Machine Learning · Computer Science 2023-12-18 Raphaël Pestourie, Youssef Mroueh, Chris Rackauckas, Payel Das, Steven G. Johnson