English
Related papers

Related papers: DiffScore: Text Evaluation Beyond Autoregressive L…

200 papers

Current video captioning methods usually use an encoder-decoder structure to generate text autoregressively. However, autoregressive methods have inherent limitations such as slow generation speed and large cumulative error. Furthermore,…

Computer Vision and Pattern Recognition · Computer Science 2026-04-10 Junbo Wang , Liangyu Fu , Yuke Li , Yining Zhu , Ya Jing , Xuecheng Wu , Jiangbin Zheng

Arbitrary-shaped text detection has recently attracted increasing interests and witnessed rapid development with the popularity of deep learning algorithms. Nevertheless, existing approaches often obtain inaccurate detection results, mainly…

Computer Vision and Pattern Recognition · Computer Science 2021-07-14 Tao Sheng , Zhouhui Lian

Masked diffusion language models (MDLMs) have emerged as a promising alternative to dominant autoregressive approaches. Although they achieve competitive performance on several tasks, a substantial gap remains in open-ended text generation.…

Computation and Language · Computer Science 2026-02-02 Mengyu Ye , Ryosuke Takahashi , Keito Kudo , Jun Suzuki

The image captioning task is typically realized by an auto-regressive method that decodes the text tokens one by one. We present a diffusion-based captioning model, dubbed the name DDCap, to allow more decoding flexibility. Unlike image…

Computer Vision and Pattern Recognition · Computer Science 2022-12-12 Zixin Zhu , Yixuan Wei , Jianfeng Wang , Zhe Gan , Zheng Zhang , Le Wang , Gang Hua , Lijuan Wang , Zicheng Liu , Han Hu

Autoregressive models (ARMs) are hindered by slow sequential inference. While masked diffusion models (MDMs) offer a parallel alternative, they suffer from critical drawbacks: high computational overhead from precluding Key-Value (KV)…

Computation and Language · Computer Science 2026-03-06 Jia-Nan Li , Jian Guan , Wei Wu , Chongxuan Li

Diffusion large language models (dLLMs) offer faster generation than autoregressive models while maintaining comparable quality, but existing watermarking methods fail on them due to their non-sequential decoding. Unlike autoregressive…

Machine Learning · Computer Science 2025-10-06 Linyu Wu , Linhao Zhong , Wenjie Qu , Yuexin Li , Yue Liu , Shengfang Zhai , Chunhua Shen , Jiaheng Zhang

One of the most compelling features of global discrete diffusion language models is their global bidirectional contextual capability. However, existing block-based diffusion studies tend to introduce autoregressive priors, which, while…

Machine Learning · Computer Science 2026-01-22 Linrui Ma , Yufei Cui , Kai Han , Yunhe Wang

With rapid technological growth, automatic pronunciation assessment has transitioned toward systems that evaluate pronunciation in various aspects, such as fluency and stress. However, despite the highly imbalanced score labels within each…

Computation and Language · Computer Science 2023-08-30 Heejin Do , Yunsu Kim , Gary Geunbae Lee

Although auto-regressive models excel in natural language processing, they often struggle to generate diverse text and provide limited controllability. Non-auto-regressive methods could be an alternative but often produce degenerate outputs…

Computation and Language · Computer Science 2025-02-25 Hyukhun Koh , Minha Jhang , Dohyung Kim , Sangmook Lee , Kyomin Jung

Recent advances in large language models (LLMs) have inspired new paradigms for document reranking. While this paradigm better exploits the reasoning and contextual understanding capabilities of LLMs, most existing LLM-based rerankers rely…

Information Retrieval · Computer Science 2026-02-16 Qi Liu , Kun Ai , Jiaxin Mao , Yanzhao Zhang , Mingxin Li , Dingkun Long , Pengjun Xie , Fengbin Zhu , Ji-Rong Wen

Standard autoregressive language models generate text by repeatedly selecting a discrete next token, coupling prediction with irreversible commitment at every step. We show that token selection is not the only viable autoregressive…

Computation and Language · Computer Science 2026-04-07 Oshri Naparstek

Despite their high predictive accuracies, current machine learning systems often exhibit systematic biases stemming from annotation artifacts or insufficient support for certain classes in the dataset. Recent work proposes automatic methods…

Computation and Language · Computer Science 2024-10-30 Rakesh R. Menon , Shashank Srivastava

In speech processing pipelines, improving the quality and intelligibility of real-world recordings is crucial. While supervised regression is the primary method for speech enhancement, audio tokenization is emerging as a promising…

Sound · Computer Science 2025-07-18 Luca Della Libera , Cem Subakan , Mirco Ravanelli

Can continuous diffusion models bring the same performance breakthrough on natural language they did for image generation? To circumvent the discrete nature of text data, we can simply project tokens in a continuous space of embeddings, as…

Despite the widespread adoption of autoregressive language models, explainability evaluation research has predominantly focused on span infilling and masked language models. Evaluating the faithfulness of an explanation method -- how…

Computation and Language · Computer Science 2025-03-11 Sepehr Kamahi , Yadollah Yaghoobzadeh

Masked diffusion models have emerged as a powerful framework for text and multimodal generation. However, their sampling procedure updates multiple tokens simultaneously and treats generated tokens as immutable, which may lead to error…

Text-to-image diffusion models have advanced towards more controllable generation via supporting various additional conditions (e.g.,depth map, bounding box) beyond text. However, these models are learned based on the premise of perfect…

Computer Vision and Pattern Recognition · Computer Science 2024-07-16 Luozhou Wang , Guibao Shen , Wenhang Ge , Guangyong Chen , Yijun Li , Ying-cong Chen

Text classification is a fundamental language task in Natural Language Processing. A variety of sequential models is capable making good predictions yet there is lack of connection between language semantics and prediction results. This…

Computation and Language · Computer Science 2021-12-07 Shaw-Hwa Lo , Yiqiao Yin

Despite their growing capabilities, language models still frequently reproduce content from their training data, generate repetitive text, and favor common grammatical patterns and vocabulary. A possible cause is the decoding strategy: the…

Computation and Language · Computer Science 2026-01-15 Giorgio Franceschelli , Mirco Musolesi

Text-to-image diffusion models, which are theoretically equivalent to score-based generative models, generate images through a multi-step denoising process guided by text embeddings extracted from pretrained vision-language models such as…

Computer Vision and Pattern Recognition · Computer Science 2026-05-28 Seung Hyuk Lee , Songkuk Kim
‹ Prev 1 2 3 10 Next ›