Related papers: High Efficiency Image Compression for Large Visual…

200 papers

The rapid success of Vision Large Language Models (VLLMs) often depends on the high-resolution images with abundant visual tokens, which hinders training and deployment efficiency. Current training-free visual token compression methods…

Computer Vision and Pattern Recognition · Computer Science 2025-02-27 Jianjian Li, Junquan Fan, Feng Tang, Gang Huang, Shitao Zhu, Songlin Liu, Nian Xie, Wulong Liu, Yong Liao

This paper presents the first-ever study of adapting compressed image latents to suit the needs of downstream vision tasks that adopt Multimodal Large Language Models (MLLMs). MLLMs have extended the success of large language models to…

Computer Vision and Pattern Recognition · Computer Science 2025-02-18 Chia-Hao Kao, Cheng Chien, Yu-Jen Tseng, Yi-Hsin Chen, Alessandro Gnutti, Shao-Yuan Lo, Wen-Hsiao Peng, Riccardo Leonardi

The application of Large Vision-Language Models (LVLMs) for analyzing images and videos is an exciting and rapidly evolving field. In recent years, we've seen significant growth in high-quality image-text datasets for fine-tuning image…

Computer Vision and Pattern Recognition · Computer Science 2024-12-13 Han Wang, Yuxiang Nie, Yongjie Ye, Deng GuanYu, Yanjie Wang, Shuai Li, Haiyang Yu, Jinghui Lu, Can Huang

With the rapid development of Vision-Language Models (VLMs) and the growing demand for their applications, efficient compression of the image inputs has become increasingly important. Existing VLMs predominantly digest and understand…

Computer Vision and Pattern Recognition · Computer Science 2026-05-25 Zifu Zhang, Tongda Xu, Siqi Li, Shengxi Li, Yue Zhang, Mai Xu, Yan Wang

The rapid progress of large Vision-Language Models (VLMs) has enabled a wide range of applications, such as image understanding and Visual Question Answering (VQA). Query images are often uploaded to the cloud, where VLMs are typically…

Image and Video Processing · Electrical Eng. & Systems 2026-04-02 Bardia Azizian, Ivan V. Bajic

Video Coding for Machines (VCM) is committed to bridging to an extent separate research tracks of video/image compression and feature compression, and attempts to optimize compactness and efficiency jointly from a unified perspective of…

Computer Vision and Pattern Recognition · Computer Science 2021-10-19 Wenhan Yang, Haofeng Huang, Yueyu Hu, Ling-Yu Duan, Jiaying Liu

Multimodal Large Language Models (MLLMs) encounter significant computational and memory bottlenecks from the massive number of visual tokens generated by high-resolution images or multi-image inputs. Previous token compression techniques…

Computer Vision and Pattern Recognition · Computer Science 2025-10-21 Jiaying Zhu, Yurui Zhu, Xin Lu, Wenrui Yan, Dong Li, Kunlin Liu, Xueyang Fu, Zheng-Jun Zha

This research introduces a transformative framework for integrating Vision-Enhanced Large Language Models (LLMs) with advanced transformer-based architectures to tackle challenges in high-resolution image synthesis and multimodal data…

Computer Vision and Pattern Recognition · Computer Science 2026-01-06 Karthikeya KV

In recent years, large-scale vision-language models (VLMs) have demonstrated remarkable performance on multimodal understanding and reasoning tasks. However, handling high-dimensional visual features often incurs substantial computational…

Computer Vision and Pattern Recognition · Computer Science 2025-12-23 Xiaoyang Guo, Keze Wang

Although Large Vision Language Models (LVLMs) have demonstrated impressive multimodal reasoning capabilities, their scalability and deployment are constrained by massive computational requirements. In particular, the massive amount of…

Machine Learning · Computer Science 2026-04-14 Surendra Pathak, Bo Han

Existing visual token compression methods for Multimodal Large Language Models (MLLMs) predominantly operate as post-encoder modules, limiting their potential for efficiency gains. To address this limitation, we propose LaCo (Layer-wise…

Computer Vision and Pattern Recognition · Computer Science 2025-07-04 Juntao Liu, Liqiang Niu, Wenchao Chen, Jie Zhou, Fandong Meng

There has been a growing trend in compressing and transmitting videos from terminals for machine vision tasks. Nevertheless, most video coding optimization methods focus on minimizing distortion according to human perceptual metrics,…

Multimedia · Computer Science 2025-12-18 Fei Zhao, Mengxi Guo, Shijie Zhao, Junlin Li, Li Zhang, Xiaodong Xie

Large Language Models (LLMs) have achieved remarkable success in source code understanding, yet as software systems grow in scale, computational efficiency has become a critical bottleneck. Currently, these models rely on a text-based…

Computation and Language · Computer Science 2026-04-29 Yuling Shi, Chaoxiang Xie, Zhensu Sun, Yeheng Chen, Chenxu Zhang, Longfei Yun, Chengcheng Wan, Hongyu Zhang, David Lo, Xiaodong Gu

Large vision-language models (LVLMs) have demonstrated remarkable capabilities in multimodal understanding tasks. However, the increasing demand for high-resolution image and long-video understanding results in substantial token counts,…

Computer Vision and Pattern Recognition · Computer Science 2026-02-26 Junjie Chen, Xuyang Liu, Zichen Wen, Yiyu Wang, Siteng Huang, Honggang Chen

In recent years, neural network-based image compression techniques have been able to outperform traditional codecs and have opened the gates for the development of learning-based video codecs. However, to take advantage of the high temporal…

Image and Video Processing · Electrical Eng. & Systems 2020-08-25 Aishwarya Jadhav

Vision Language Models (VLMs) offer the exciting possibility of processing text as rendered images, bypassing the need for tokenizing the text into long token sequences. Since VLM image encoders map fixed-size images to a fixed number of…

Computer Vision and Pattern Recognition · Computer Science 2026-05-11 Roy Xie, Dan Friedman, Donghan Yu, Bowen Pan, Christopher Fifty, Jang-Hyun Kim, Xianzhi Du, Zhe Gan, Vivek Rathod, Bhuwan Dhingra

Long video understanding is inherently challenging for vision-language models (VLMs) because of the extensive number of frames. With each video frame typically expanding into tens or hundreds of tokens, the limited context length of large…

Computer Vision and Pattern Recognition · Computer Science 2026-04-17 Zheyu Zhang, Ziqi Pang, Shixing Chen, Xiang Hao, Vimal Bhat, Yu-Xiong Wang

A large fraction of Internet traffic is now driven by requests from mobile devices with relatively small screens and often stringent bandwidth requirements. Due to these factors, it has become the norm for modern graphics-heavy websites to…

Computer Vision and Pattern Recognition · Computer Science 2016-03-03 George Toderici, Sean M. O'Malley, Sung Jin Hwang, Damien Vincent, David Minnen, Shumeet Baluja, Michele Covell, Rahul Sukthankar

Visual language models (VLMs) have made significant advances in accuracy in recent years. However, their efficiency has received much less attention. This paper introduces NVILA, a family of open VLMs designed to jointly optimize efficiency…

Recent advancements in deep learning have driven significant progress in lossless image compression. With the emergence of Large Language Models (LLMs), preliminary attempts have been made to leverage the extensive prior knowledge embedded…

Image and Video Processing · Electrical Eng. & Systems 2025-02-25 Junhao Du, Chuqin Zhou, Ning Cao, Gang Chen, Yunuo Chen, Zhengxue Cheng, Li Song, Guo Lu, Wenjun Zhang