Related papers: DPAR: Dynamic Patchification for Efficient Autoreg…

200 papers

Autoregressive models, built based on the Next Token Prediction (NTP) paradigm, show great potential in developing a unified framework that integrates both language and vision tasks. Pioneering works introduce NTP to autoregressive visual…

Computer Vision and Pattern Recognition · Computer Science 2025-03-20 Yatian Pang , Peng Jin , Shuo Yang , Bin Lin , Bin Zhu , Zhenyu Tang , Liuhan Chen , Francis E. H. Tay , Ser-Nam Lim , Harry Yang , Li Yuan

We present Locality-aware Parallel Decoding (LPD) to accelerate autoregressive image generation. Traditional autoregressive image generation relies on next-patch prediction, a memory-bound process that leads to high latency. Existing works…

Computer Vision and Pattern Recognition · Computer Science 2026-03-12 Zhuoyang Zhang , Luke J. Huang , Chengyue Wu , Shang Yang , Kelly Peng , Yao Lu , Song Han

Autoregressive models have emerged as a powerful approach for visual generation but suffer from slow inference speed due to their sequential token-by-token prediction process. In this paper, we propose a simple yet effective approach for…

Computer Vision and Pattern Recognition · Computer Science 2025-04-04 Yuqing Wang , Shuhuai Ren , Zhijie Lin , Yujin Han , Haoyuan Guo , Zhenheng Yang , Difan Zou , Jiashi Feng , Xihui Liu

In this paper, we explore a new generative approach for learning visual representations. Our method, DARL, employs a decoder-only Transformer to predict image patches autoregressively. We find that training with Mean Squared Error (MSE)…

Machine Learning · Computer Science 2024-06-05 Yazhe Li , Jorg Bornschein , Ting Chen

This paper presents Diffusion via Autoregressive models (D-AR), a new paradigm recasting the image diffusion process as a vanilla autoregressive procedure in the standard next-token-prediction fashion. We start by designing the tokenizer…

Computer Vision and Pattern Recognition · Computer Science 2025-05-30 Ziteng Gao , Mike Zheng Shou

Autoregressive (AR) approaches, which represent images as sequences of discrete tokens from a finite codebook, have achieved remarkable success in image generation. However, the quantization process and the limited codebook size inevitably…

Computer Vision and Pattern Recognition · Computer Science 2025-11-18 Jinyuan Hu , Jiayou Zhang , Shaobo Cui , Kun Zhang , Guangyi Chen

This paper presents DetailFlow, a coarse-to-fine 1D autoregressive (AR) image generation method that models images through a novel next-detail prediction strategy. By learning a resolution-aware token sequence supervised with progressively…

Computer Vision and Pattern Recognition · Computer Science 2025-11-12 Yiheng Liu , Liao Qu , Huichao Zhang , Xu Wang , Yi Jiang , Yiming Gao , Hu Ye , Xian Li , Shuai Wang , Daniel K. Du , Fangmin Chen , Zehuan Yuan , Xinglong Wu

We introduce RandAR, a decoder-only visual autoregressive (AR) model capable of generating images in arbitrary token orders. Unlike previous decoder-only AR models that rely on a predefined generation order, RandAR removes this inductive…

Computer Vision and Pattern Recognition · Computer Science 2025-07-09 Ziqi Pang , Tianyuan Zhang , Fujun Luan , Yunze Man , Hao Tan , Kai Zhang , William T. Freeman , Yu-Xiong Wang

In this work, we first revisit the sampling issues in current autoregressive (AR) image generation models and identify that image tokens, unlike text tokens, exhibit lower information density and non-uniform spatial distribution.…

Computer Vision and Pattern Recognition · Computer Science 2025-10-21 Xiaoxiao Ma , Feng Zhao , Pengyang Ling , Haibo Qiu , Zhixiang Wei , Hu Yu , Jie Huang , Zhixiong Zeng , Lin Ma

Autoregressive models have recently shown great promise in visual generation by leveraging discrete token sequences akin to language modeling. However, existing approaches often suffer from inefficiency, either due to token-by-token…

Computer Vision and Pattern Recognition · Computer Science 2025-11-20 Ruiqing Yang , Kaixin Zhang , Zheng Zhang , Shan You , Tao Huang

The raster-ordered image token sequence exhibits a significant Euclidean distance between index-adjacent tokens at line breaks, making it unsuitable for autoregressive generation. To address this issue, this paper proposes Direction-Aware…

Computer Vision and Pattern Recognition · Computer Science 2025-04-17 Yijia Xu , Jianzhong Ju , Jian Luan , Jinshi Cui

Recent advances in autoregressive (AR) models with continuous tokens for image generation show promising results by eliminating the need for discrete tokenization. However, these models face efficiency challenges due to their sequential…

Computer Vision and Pattern Recognition · Computer Science 2024-12-20 Zhihang Yuan , Yuzhang Shang , Hanling Zhang , Tongcheng Fang , Rui Xie , Bingxin Xu , Yan Yan , Shengen Yan , Guohao Dai , Yu Wang

We introduce Autoregressive Retrieval Augmentation (AR-RAG), a novel paradigm that enhances image generation by autoregressively incorporating k-nearest neighbor retrievals at the patch level. Unlike prior methods that perform a single,…

Computer Vision and Pattern Recognition · Computer Science 2025-06-17 Jingyuan Qi , Zhiyang Xu , Qifan Wang , Lifu Huang

Autoregressive (AR) models for image generation typically adopt a two-stage paradigm of vector quantization and raster-scan "next-token prediction", inspired by its great success in language modeling. However, due to the huge modality gap,…

Computer Vision and Pattern Recognition · Computer Science 2026-03-19 Hu Yu , Hao Luo , Hangjie Yuan , Yu Rong , Jie Huang , Feng Zhao

Autoregressive visual generation has garnered increasing attention due to its scalability and compatibility with other modalities compared with diffusion models. Most existing methods construct visual sequences as spatial patches for…

Computer Vision and Pattern Recognition · Computer Science 2025-06-13 Yuanhui Huang , Weiliang Chen , Wenzhao Zheng , Yueqi Duan , Jie Zhou , Jiwen Lu

Autoregressive Transformer models have demonstrated impressive performance in video generation, but their sequential token-by-token decoding process poses a major bottleneck, particularly for long videos represented by tens of thousands of…

Computer Vision and Pattern Recognition · Computer Science 2025-03-19 Yang Ye , Junliang Guo , Haoyu Wu , Tianyu He , Tim Pearce , Tabish Rashid , Katja Hofmann , Jiang Bian

We introduce ARPG, a novel visual Autoregressive model that enables Randomized Parallel Generation, addressing the inherent limitations of conventional raster-order approaches, which hinder inference efficiency and zero-shot generalization…

Computer Vision and Pattern Recognition · Computer Science 2026-03-02 Haopeng Li , Jinyue Yang , Guoqi Li , Huan Wang

In this paper, we propose ZipAR, a training-free, plug-and-play parallel decoding framework for accelerating auto-regressive (AR) visual generation. The motivation stems from the observation that images exhibit local structures, and…

Computer Vision and Pattern Recognition · Computer Science 2025-07-01 Yefei He , Feng Chen , Yuanyu He , Shaoxuan He , Hong Zhou , Kaipeng Zhang , Bohan Zhuang

Recently, autoregressive (AR) language models have emerged as a dominant approach in speech synthesis, offering expressive generation and scalable training. However, conventional AR speech synthesis models relying on the next-token…

Sound · Computer Science 2025-06-30 Bohan Li , Zhihan Li , Haoran Wang , Hanglei Zhang , Yiwei Guo , Hankun Wang , Xie Chen , Kai Yu

Autoregressive image modeling relies on visual tokenizers to compress images into compact latent representations. We design an end-to-end training pipeline that jointly optimizes reconstruction and generation, enabling direct supervision…

Computer Vision and Pattern Recognition · Computer Science 2026-05-05 Wenda Chu , Bingliang Zhang , Jiaqi Han , Yizhuo Li , Linjie Yang , Yisong Yue , Qiushan Guo