Martin Renqiang Min

DiscussLLM: Teaching Large Language Models When to Speak

Large Language Models (LLMs) have demonstrated remarkable capabilities in understanding and generating human-like text, yet they largely operate as reactive agents, responding only when directly prompted. This passivity creates an…

Computation and Language · Computer Science 2026-05-18 Deep Anil Patel, Iain Melvin, Christopher Malon, Martin Renqiang Min

CalibFree: Self-Supervised View Feature Separation for Calibration-Free Multi-Camera Multi-Object Tracking

Multi-camera multi-object tracking (MCMOT) faces significant challenges in maintaining consistent object identities across varying camera perspectives, particularly when precise calibration and extensive annotations are required. In this…

Computer Vision and Pattern Recognition · Computer Science 2026-05-12 Ruiqi Xian, Deep Patel, Iain Melvin, Sanjoy Kundu, Martin Renqiang Min, Dinesh Manocha

Uncertainty-Guided Latent Diagnostic Trajectory Learning for Sequential Clinical Diagnosis

Clinical diagnosis requires sequential evidence acquisition under uncertainty. However, most Large Language Model (LLM) based diagnostic systems assume fully observed patient information and therefore do not explicitly model how clinical…

Artificial Intelligence · Computer Science 2026-04-08 Xuyang Shen, Haoran Liu, Dongjin Song, Martin Renqiang Min

DICE: Discrete Inversion Enabling Controllable Editing for Multinomial Diffusion and Masked Generative Models

Discrete diffusion models have achieved success in tasks like image generation and masked language modeling but face limitations in controlled content editing. We introduce DICE (Discrete Inversion for Controllable Editing), the first…

Computer Vision and Pattern Recognition · Computer Science 2025-11-14 Xiaoxiao He, Quan Dao, Ligong Han, Song Wen, Minhao Bai, Di Liu, Han Zhang, Martin Renqiang Min, Felix Juefei-Xu, Chaowei Tan, Bo Liu, Kang Li, Hongdong Li, Junzhou Huang, Faez Ahmed, Akash Srivastava, Dimitris Metaxas

EditGRPO: Reinforcement Learning with Post-Rollout Edits for Clinically Accurate Chest X-Ray Report Generation

Radiology report generation requires advanced medical image analysis, effective temporal reasoning, and accurate text generation. Although recent innovations, particularly multimodal large language models, have shown improved performance,…

Computation and Language · Computer Science 2025-11-11 Kai Zhang, Christopher Malon, Lichao Sun, Martin Renqiang Min

Object-Aware 4D Human Motion Generation

Recent advances in video diffusion models have enabled the generation of high-quality videos. However, these videos still suffer from unrealistic deformations, semantic violations, and physical inconsistencies that are largely rooted in the…

Computer Vision and Pattern Recognition · Computer Science 2025-11-04 Shurui Gui, Deep Anil Patel, Xiner Li, Martin Renqiang Min

PPDiff: Diffusing in Hybrid Sequence-Structure Space for Protein-Protein Complex Design

Designing protein-binding proteins with high affinity is critical in biomedical research and biotechnology. Despite recent advancements targeting specific proteins, the ability to create high-affinity binders for arbitrary protein targets…

Machine Learning · Computer Science 2025-11-03 Zhenqiao Song, Tianxiao Li, Lei Li, Martin Renqiang Min

Group Relative Augmentation for Data Efficient Action Detection

Adapting large Video-Language Models (VLMs) for action detection using only a few examples poses challenges like overfitting and the granularity mismatch between scene-level pre-training and required person-centric understanding. We propose…

Computer Vision and Pattern Recognition · Computer Science 2025-07-30 Deep Anil Patel, Iain Melvin, Zachary Izzo, Martin Renqiang Min

Solving Inverse Problems via Diffusion-Based Priors: An Approximation-Free Ensemble Sampling Approach

Diffusion models (DMs) have proven to be effective in modeling high-dimensional distributions, leading to their widespread adoption for representing complex priors in Bayesian inverse problems (BIPs). However, current DM-based posterior…

Machine Learning · Computer Science 2025-06-06 Haoxuan Chen, Yinuo Ren, Martin Renqiang Min, Lexing Ying, Zachary Izzo

Reducing Hallucinations of Medical Multimodal Large Language Models with Visual Retrieval-Augmented Generation

Multimodal Large Language Models (MLLMs) have shown impressive performance in vision and text tasks. However, hallucination remains a major challenge, especially in fields like healthcare where details are critical. In this work, we show…

Computation and Language · Computer Science 2025-02-24 Yun-Wei Chu, Kai Zhang, Christopher Malon, Martin Renqiang Min

Exploring the Role of Reasoning Structures for Constructing Proofs in Multi-Step Natural Language Reasoning with Large Language Models

When performing complex multi-step reasoning tasks, the ability of Large Language Models (LLMs) to derive structured intermediate proof steps is important for ensuring that the models truly perform the desired reasoning and for improving…

Computation and Language · Computer Science 2025-01-31 Zi'ou Zheng, Christopher Malon, Martin Renqiang Min, Xiaodan Zhu

Learning Disentangled Equivariant Representation for Explicitly Controllable 3D Molecule Generation

We consider the conditional generation of 3D drug-like molecules with explicit control over molecular properties such as drug-like properties (e.g., Quantitative Estimate of Druglikeness or Synthetic Accessibility score) and…

Machine Learning · Computer Science 2024-12-20 Haoran Liu, Youzhi Luo, Tianxiao Li, James Caverlee, Martin Renqiang Min

Exploiting VLM Localizability and Semantics for Open Vocabulary Action Detection

Action detection aims to detect (recognize and localize) human actions spatially and temporally in videos. Existing approaches focus on the closed-set setting where an action detector is trained and tested on videos from a fixed set of…

Computer Vision and Pattern Recognition · Computer Science 2024-11-19 Wentao Bao, Kai Li, Yuxiao Chen, Deep Patel, Martin Renqiang Min, Yu Kong

Variational methods for Learning Multilevel Genetic Algorithms using the Kantorovich Monad

Levels of selection and multilevel evolutionary processes are essential concepts in evolutionary theory, and yet there is a lack of common mathematical models for these core ideas. Here, we propose a unified mathematical framework for…

Populations and Evolution · Quantitative Biology 2024-11-18 Jonathan Warrell, Francesco Alesiani, Cameron Smith, Anja Mösch, Martin Renqiang Min

Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment

Learning to localize temporal boundaries of procedure steps in instructional videos is challenging due to the limited availability of annotated large-scale training videos. Recent works focus on learning the cross-modal alignment between…

Computer Vision and Pattern Recognition · Computer Science 2024-09-25 Yuxiao Chen, Kai Li, Wentao Bao, Deep Patel, Yu Kong, Martin Renqiang Min, Dimitris N. Metaxas

Planner3D: LLM-enhanced graph prior meets 3D indoor scene explicit regularization

Compositional 3D scene synthesis has diverse applications across a spectrum of industries such as robotics, films, and video games, as it closely mirrors the complexity of real-world multi-object environments. Conventional works typically…

Computer Vision and Pattern Recognition · Computer Science 2024-08-27 Yao Wei, Martin Renqiang Min, George Vosselman, Li Erran Li, Michael Ying Yang

Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos

In this paper, we explore the capability of an agent to construct a logical sequence of action steps, thereby assembling a strategic procedural plan. This plan is crucial for navigating from an initial visual observation to a target visual…

Computer Vision and Pattern Recognition · Computer Science 2024-06-18 Kumaranage Ravindu Yasas Nagasinghe, Honglu Zhou, Malitha Gunawardhana, Martin Renqiang Min, Daniel Harari, Muhammad Haris Khan

Disentangled Wasserstein Autoencoder for T-Cell Receptor Engineering

In protein biophysics, the separation between the functionally important residues (forming the active site or binding surface) and those that create the overall structure (the fold) is a well-established and fundamental concept. Identifying…

Biomolecules · Quantitative Biology 2023-10-17 Tianxiao Li, Hongyu Guo, Filippo Grazioli, Mark Gerstein, Martin Renqiang Min

Exploring Compositional Visual Generation with Latent Classifier Guidance

Diffusion probabilistic models have achieved enormous success in the field of image generation and manipulation. In this paper, we explore a novel paradigm of using the diffusion model and classifier guidance in the latent semantic space…

Computer Vision and Pattern Recognition · Computer Science 2023-05-25 Changhao Shi, Haomiao Ni, Kai Li, Shaobo Han, Mingfu Liang, Martin Renqiang Min

Conditional Image-to-Video Generation with Latent Flow Diffusion Models

Conditional image-to-video (cI2V) generation aims to synthesize a new plausible video starting from an image (e.g., a person's face) and a condition (e.g., an action class label like smile). The key challenge of the cI2V task lies in the…

Computer Vision and Pattern Recognition · Computer Science 2023-03-27 Haomiao Ni, Changhao Shi, Kai Li, Sharon X. Huang, Martin Renqiang Min