Related papers: Collaborative Decoding Makes Visual Auto-Regressiv…

Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression

Visual Autoregressive (VAR) modeling has garnered significant attention for its innovative next-scale prediction approach, which yields substantial improvements in efficiency, scalability, and zero-shot generalization. Nevertheless, the…

Machine Learning · Computer Science 2025-05-27 Kunjun Li, Zigeng Chen, Cheng-Yen Yang, Jenq-Neng Hwang

M-VAR: Decoupled Scale-wise Autoregressive Modeling for High-Quality Image Generation

There exists recent work in computer vision, named VAR, that proposes a new autoregressive paradigm for image generation. Diverging from the vanilla next-token prediction, VAR structurally reformulates the image generation into a coarse to…

Computer Vision and Pattern Recognition · Computer Science 2024-11-18 Sucheng Ren, Yaodong Yu, Nataniel Ruiz, Feng Wang, Alan Yuille, Cihang Xie

VDE: Training-Free Accelerating Rectified Flow Model via Velocity Decomposition and Estimation

Though rectified flow models have achieved remarkable performance in image, video, and 3D generation, their practical deployments are challenged by slow inference speeds. Prior acceleration methods reuse cached features from previous steps,…

Computer Vision and Pattern Recognition · Computer Science 2026-05-25 Junwen Tan, Jinglin Liang, Hongyuan Chen, Shuangping Huang

Continuous Speculative Decoding for Autoregressive Image Generation

Continuous visual autoregressive (AR) models have demonstrated promising performance in image generation. However, the heavy autoregressive inference burden imposes significant overhead. In Large Language Models (LLMs), speculative decoding…

Computer Vision and Pattern Recognition · Computer Science 2025-09-30 Zili Wang, Robert Zhang, Kun Ding, Qi Yang, Fei Li, Shiming Xiang

DeViT: Decomposing Vision Transformers for Collaborative Inference in Edge Devices

Recent years have witnessed the great success of vision transformer (ViT), which has achieved state-of-the-art performance on multiple computer vision benchmarks. However, ViT models suffer from vast amounts of parameters and high…

Computer Vision and Pattern Recognition · Computer Science 2023-09-12 Guanyu Xu, Zhiwei Hao, Yong Luo, Han Hu, Jianping An, Shiwen Mao

Depth Adaptive Efficient Visual Autoregressive Modeling

Visual Autoregressive (VAR) modeling inefficiently applies a fixed computational depth to each position when generating high-resolution images. While existing methods accelerate inference by pruning tokens using frequency maps, their binary…

Computer Vision and Pattern Recognition · Computer Science 2026-04-21 Chunliang Li, Tianze Cao, Sanyuan Zhao

Fast Large Language Model Collaborative Decoding via Speculation

Large Language Model (LLM) collaborative decoding techniques improve output quality by combining the outputs of multiple models at each generation step, but they incur high computational costs. In this paper, we introduce Collaborative…

Computation and Language · Computer Science 2025-05-30 Jiale Fu, Yuchu Jiang, Junkai Chen, Jiaming Fan, Xin Geng, Xu Yang

CASCADE: Context-Aware Relaxation for Speculative Image Decoding

Autoregressive generation is a powerful approach for high-fidelity image synthesis, but it remains computationally demanding and slow even on the most advanced accelerators. While speculative decoding has been explored to mitigate this…

Computer Vision and Pattern Recognition · Computer Science 2026-05-11 Selin Yildirim, Subhajit Dutta Chowdhury, Mohammad Mahdi Kamani, Vikram Appia, Deming Chen

CODA: Repurposing Continuous VAEs for Discrete Tokenization

Discrete visual tokenizers transform images into a sequence of tokens, enabling token-based visual generation akin to language models. However, this process is inherently challenging, as it requires both compressing visual signals into a…

Computer Vision and Pattern Recognition · Computer Science 2025-10-01 Zeyu Liu, Zanlin Ni, Yeguo Hua, Xin Deng, Xiao Ma, Cheng Zhong, Gao Huang

Video Coding for Machine: Compact Visual Representation Compression for Intelligent Collaborative Analytics

Video Coding for Machines (VCM) is committed to bridging to an extent separate research tracks of video/image compression and feature compression, and attempts to optimize compactness and efficiency jointly from a unified perspective of…

Computer Vision and Pattern Recognition · Computer Science 2021-10-19 Wenhan Yang, Haofeng Huang, Yueyu Hu, Ling-Yu Duan, Jiaying Liu

Visual Implicit Autoregressive Modeling

Visual Autoregressive Modeling (VAR) based on next-scale prediction achieves strong generation quality, but their explicit deep stacks fix the amount of computation per scale and inflate memory at high resolutions. We introduce Visual…

Computer Vision and Pattern Recognition · Computer Science 2026-05-05 Pengfei Jiang, Jixiang Luo, Luxi Lin, Zhaohong Huang, Xuelong Li

Inference-Time Scaling for Visual AutoRegressive modeling by Searching Representative Samples

While inference-time scaling has significantly enhanced generative quality in large language and diffusion models, its application to vector-quantized (VQ) visual autoregressive modeling (VAR) remains unexplored. We introduce VAR-Scaling,…

Computer Vision and Pattern Recognition · Computer Science 2026-01-13 Weidong Tang, Xinyan Wan, Siyu Li, Xiumei Wang

Ultra-Efficient Decoding for End-to-End Neural Compression and Reconstruction

Image compression and reconstruction are crucial for various digital applications. While contemporary neural compression methods achieve impressive compression rates, the adoption of such technology has been largely hindered by the…

Machine Learning · Computer Science 2025-10-06 Ethan G. Rogers, Cheng Wang

Learning to Expand Images for Efficient Visual Autoregressive Modeling

Autoregressive models have recently shown great promise in visual generation by leveraging discrete token sequences akin to language modeling. However, existing approaches often suffer from inefficiency, either due to token-by-token…

Computer Vision and Pattern Recognition · Computer Science 2025-11-20 Ruiqing Yang, Kaixin Zhang, Zheng Zhang, Shan You, Tao Huang

Fast Autoregressive Video Generation with Diagonal Decoding

Autoregressive Transformer models have demonstrated impressive performance in video generation, but their sequential token-by-token decoding process poses a major bottleneck, particularly for long videos represented by tens of thousands of…

Computer Vision and Pattern Recognition · Computer Science 2025-03-19 Yang Ye, Junliang Guo, Haoyu Wu, Tianyu He, Tim Pearce, Tabish Rashid, Katja Hofmann, Jiang Bian

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

We present Visual AutoRegressive modeling (VAR), a new generation paradigm that redefines the autoregressive learning on images as coarse-to-fine "next-scale prediction" or "next-resolution prediction", diverging from the standard…

Computer Vision and Pattern Recognition · Computer Science 2024-06-11 Keyu Tian, Yi Jiang, Zehuan Yuan, Bingyue Peng, Liwei Wang

CoVSpec: Efficient Device-Edge Co-Inference for Vision-Language Models via Speculative Decoding

Vision-language models (VLMs) have demonstrated strong capabilities in multimodal perception and reasoning. However, deploying large VLMs on mobile devices remains challenging due to their substantial computational and memory demands. A…

Artificial Intelligence · Computer Science 2026-05-05 Yuanyuan Jia, Shunpu Tang, Qianqian Yang

Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization

Existing vector quantization (VQ) based autoregressive models follow a two-stage generation paradigm that first learns a codebook to encode images as discrete codes, and then completes generation based on the learned codebook. However, they…

Computer Vision and Pattern Recognition · Computer Science 2023-05-22 Mengqi Huang, Zhendong Mao, Zhuowei Chen, Yongdong Zhang

Non-Autoregressive Coarse-to-Fine Video Captioning

It is encouraged to see that progress has been made to bridge videos and natural language. However, mainstream video captioning methods suffer from slow inference speed due to the sequential manner of autoregressive decoding, and prefer…

Computer Vision and Pattern Recognition · Computer Science 2021-03-25 Bang Yang, Yuexian Zou, Fenglin Liu, Can Zhang

LiteVAR: Compressing Visual Autoregressive Modelling with Efficient Attention and Quantization

Visual Autoregressive (VAR) has emerged as a promising approach in image generation, offering competitive potential and performance comparable to diffusion-based models. However, current AR-based visual generation models require substantial…

Computer Vision and Pattern Recognition · Computer Science 2024-11-27 Rui Xie, Tianchen Zhao, Zhihang Yuan, Rui Wan, Wenxi Gao, Zhenhua Zhu, Xuefei Ning, Yu Wang