Related papers: DiffScore: Text Evaluation Beyond Autoregressive L…

DiffVC: A Non-autoregressive Framework Based on Diffusion Model for Video Captioning

Current video captioning methods usually use an encoder-decoder structure to generate text autoregressively. However, autoregressive methods have inherent limitations such as slow generation speed and large cumulative error. Furthermore,…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-10 · Junbo Wang, Liangyu Fu, Yuke Li, Yining Zhu, Ya Jing, Xuecheng Wu, Jiangbin Zheng

Bidirectional Regression for Arbitrary-Shaped Text Detection

Arbitrary-shaped text detection has recently attracted increasing interest and witnessed rapid development with the popularity of deep learning algorithms. Nevertheless, existing approaches often obtain inaccurate detection results, mainly…

Computer Vision and Pattern Recognition · Computer Science · 2021-07-14 · Tao Sheng, Zhouhui Lian

Relaxing Positional Alignment in Masked Diffusion Language Models

Masked diffusion language models (MDLMs) have emerged as a promising alternative to dominant autoregressive approaches. Although they achieve competitive performance on several tasks, a substantial gap remains in open-ended text generation.…

Computation and Language · Computer Science · 2026-02-02 · Mengyu Ye, Ryosuke Takahashi, Keito Kudo, Jun Suzuki

Exploring Discrete Diffusion Models for Image Captioning

The image captioning task is typically realized by an auto-regressive method that decodes the text tokens one by one. We present a diffusion-based captioning model, dubbed DDCap, to allow more decoding flexibility. Unlike image…

Computer Vision and Pattern Recognition · Computer Science · 2022-12-12 · Zixin Zhu, Yixuan Wei, Jianfeng Wang, Zhe Gan, Zheng Zhang, Le Wang, Gang Hua, Lijuan Wang, Zicheng Liu, Han Hu

ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding

Autoregressive models (ARMs) are hindered by slow sequential inference. While masked diffusion models (MDMs) offer a parallel alternative, they suffer from critical drawbacks: high computational overhead from precluding Key-Value (KV)…

Computation and Language · Computer Science · 2026-03-06 · Jia-Nan Li, Jian Guan, Wei Wu, Chongxuan Li

DMark: Order-Agnostic Watermarking for Diffusion Large Language Models

Diffusion large language models (dLLMs) offer faster generation than autoregressive models while maintaining comparable quality, but existing watermarking methods fail on them due to their non-sequential decoding. Unlike autoregressive…

Machine Learning · Computer Science · 2025-10-06 · Linyu Wu, Linhao Zhong, Wenjie Qu, Yuexin Li, Yue Liu, Shengfang Zhai, Chunhua Shen, Jiaheng Zhang

Diffusion In Diffusion: Reclaiming Global Coherence in Semi-Autoregressive Diffusion

One of the most compelling features of global discrete diffusion language models is their global bidirectional contextual capability. However, existing block-based diffusion studies tend to introduce autoregressive priors, which, while…

Machine Learning · Computer Science · 2026-01-22 · Linrui Ma, Yufei Cui, Kai Han, Yunhe Wang

Score-balanced Loss for Multi-aspect Pronunciation Assessment

With rapid technological growth, automatic pronunciation assessment has transitioned toward systems that evaluate pronunciation in various aspects, such as fluency and stress. However, despite the highly imbalanced score labels within each…

Computation and Language · Computer Science · 2023-08-30 · Heejin Do, Yunsu Kim, Gary Geunbae Lee

Conditional [MASK] Discrete Diffusion Language Model

Although auto-regressive models excel in natural language processing, they often struggle to generate diverse text and provide limited controllability. Non-auto-regressive methods could be an alternative but often produce degenerate outputs…

Computation and Language · Computer Science · 2025-02-25 · Hyukhun Koh, Minha Jhang, Dohyung Kim, Sangmook Lee, Kyomin Jung

DiffuRank: Effective Document Reranking with Diffusion Language Models

Recent advances in large language models (LLMs) have inspired new paradigms for document reranking. While this paradigm better exploits the reasoning and contextual understanding capabilities of LLMs, most existing LLM-based rerankers rely…

Information Retrieval · Computer Science · 2026-02-16 · Qi Liu, Kun Ai, Jiaxin Mao, Yanzhao Zhang, Mingxin Li, Dingkun Long, Pengjun Xie, Fengbin Zhu, Ji-Rong Wen

Projected Autoregression: Autoregressive Language Generation in Continuous State Space

Standard autoregressive language models generate text by repeatedly selecting a discrete next token, coupling prediction with irreversible commitment at every step. We show that token selection is not the only viable autoregressive…

Computation and Language · Computer Science · 2026-04-07 · Oshri Naparstek

DISCERN: Decoding Systematic Errors in Natural Language for Text Classifiers

Despite their high predictive accuracies, current machine learning systems often exhibit systematic biases stemming from annotation artifacts or insufficient support for certain classes in the dataset. Recent work proposes automatic methods…

Computation and Language · Computer Science · 2024-10-30 · Rakesh R. Menon, Shashank Srivastava

Autoregressive Speech Enhancement via Acoustic Tokens

In speech processing pipelines, improving the quality and intelligibility of real-world recordings is crucial. While supervised regression is the primary method for speech enhancement, audio tokenization is emerging as a promising…

Sound · Computer Science · 2025-07-18 · Luca Della Libera, Cem Subakan, Mirco Ravanelli

Self-conditioned Embedding Diffusion for Text Generation

Can continuous diffusion models bring the same performance breakthrough on natural language they did for image generation? To circumvent the discrete nature of text data, we can simply project tokens into a continuous space of embeddings, as…

Computation and Language · Computer Science · 2022-11-09 · Robin Strudel, Corentin Tallec, Florent Altché, Yilun Du, Yaroslav Ganin, Arthur Mensch, Will Grathwohl, Nikolay Savinov, Sander Dieleman, Laurent Sifre, Rémi Leblond

Counterfactuals As a Means for Evaluating Faithfulness of Attribution Methods in Autoregressive Language Models

Despite the widespread adoption of autoregressive language models, explainability evaluation research has predominantly focused on span infilling and masked language models. Evaluating the faithfulness of an explanation method -- how…

Computation and Language · Computer Science · 2025-03-11 · Sepehr Kamahi, Yadollah Yaghoobzadeh

Training-Free Self-Correction for Multimodal Masked Diffusion Models

Masked diffusion models have emerged as a powerful framework for text and multimodal generation. However, their sampling procedure updates multiple tokens simultaneously and treats generated tokens as immutable, which may lead to error…

Machine Learning · Statistics · 2026-02-04 · Yidong Ouyang, Panwen Hu, Zhengyan Wan, Zhe Wang, Liyan Xie, Dmitriy Bespalov, Ying Nian Wu, Guang Cheng, Hongyuan Zha, Qiang Sun

Text-Anchored Score Composition: Tackling Condition Misalignment in Text-to-Image Diffusion Models

Text-to-image diffusion models have advanced towards more controllable generation via supporting various additional conditions (e.g., depth map, bounding box) beyond text. However, these models are learned based on the premise of perfect…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-16 · Luozhou Wang, Guibao Shen, Wenhang Ge, Guangyong Chen, Yijun Li, Ying-cong Chen

Language Semantics Interpretation with an Interaction-based Recurrent Neural Networks

Text classification is a fundamental language task in Natural Language Processing. A variety of sequential models are capable of making good predictions, yet there is a lack of connection between language semantics and prediction results. This…

Computation and Language · Computer Science · 2021-12-07 · Shaw-Hwa Lo, Yiqiao Yin

DiffSampling: Enhancing Diversity and Accuracy in Neural Text Generation

Despite their growing capabilities, language models still frequently reproduce content from their training data, generate repetitive text, and favor common grammatical patterns and vocabulary. A possible cause is the decoding strategy: the…

Computation and Language · Computer Science · 2026-01-15 · Giorgio Franceschelli, Mirco Musolesi

DebFilter: Eradicating Biases Stashed in Value

Text-to-image diffusion models, which are theoretically equivalent to score-based generative models, generate images through a multi-step denoising process guided by text embeddings extracted from pretrained vision-language models such as…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-28 · Seung Hyuk Lee, Songkuk Kim