Related papers: Single-pass Adaptive Image Tokenization for Minimu…

CAT: Content-Adaptive Image Tokenization

Most existing image tokenizers encode images into a fixed number of tokens or patches, overlooking the inherent variability in image complexity. To address this, we introduce Content-Adaptive Tokenizer (CAT), which dynamically adjusts…

Computer Vision and Pattern Recognition · Computer Science 2025-01-07 Junhong Shen, Kushal Tirumala, Michihiro Yasunaga, Ishan Misra, Luke Zettlemoyer, Lili Yu, Chunting Zhou

Image Complexity-Aware Adaptive Retrieval for Efficient Vision-Language Models

Vision transformers in vision-language models typically use the same amount of compute for every image, regardless of whether it is simple or complex. We propose ICAR (Image Complexity-Aware Retrieval), an adaptive computation approach that…

Information Retrieval · Computer Science 2026-01-16 Mikel Williams-Lekuona, Georgina Cosma

Self-supervised and Weakly Supervised Contrastive Learning for Frame-wise Action Representations

Previous work on action representation learning focused on global representations for short video clips. In contrast, many practical applications, such as video alignment, strongly demand learning the intensive representation of long…

Computer Vision and Pattern Recognition · Computer Science 2023-03-03 Minghao Chen, Renbo Tu, Chenxi Huang, Yuqi Lin, Boxi Wu, Deng Cai

CALLIC: Content Adaptive Learning for Lossless Image Compression

Learned lossless image compression has achieved significant advancements in recent years. However, existing methods often rely on training amortized generative models on massive datasets, resulting in sub-optimal probability distribution…

Computer Vision and Pattern Recognition · Computer Science 2024-12-24 Daxin Li, Yuanchao Bai, Kai Wang, Junjun Jiang, Xianming Liu, Wen Gao

Adaptive Length Image Tokenization via Recurrent Allocation

Current vision systems typically assign fixed-length representations to images, regardless of the information content. This contrasts with human intelligence - and even large language models - which allocate varying representational…

Computer Vision and Pattern Recognition · Computer Science 2024-11-05 Shivam Duggal, Phillip Isola, Antonio Torralba, William T. Freeman

Representation Learning via Consistent Assignment of Views to Clusters

We introduce Consistent Assignment for Representation Learning (CARL), an unsupervised learning method to learn visual representations by combining ideas from self-supervised contrastive learning and deep clustering. By viewing contrastive…

Machine Learning · Computer Science 2023-10-23 Thalles Silva, Adín Ramírez Rivera

CARL: Criticality-Aware Agentic Reinforcement Learning

Agents capable of accomplishing complex tasks through multiple interactions with the environment have emerged as a popular research direction. However, in such multi-step settings, the conventional group-level policy optimization algorithm…

Machine Learning · Computer Science 2026-05-12 Leyang Shen, Yang Zhang, Chun Kai Ling, Xiaoyan Zhao, Tat-Seng Chua

Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning

Prior works on action representation learning mainly focus on designing various architectures to extract the global representations for short video clips. In contrast, many practical applications such as video alignment have strong demand…

Computer Vision and Pattern Recognition · Computer Science 2022-03-29 Minghao Chen, Fangyun Wei, Chong Li, Deng Cai

InfoTok: Adaptive Discrete Video Tokenizer via Information-Theoretic Compression

Accurate and efficient discrete video tokenization is essential for long video sequences processing. Yet, the inherent complexity and variable information density of videos present a significant bottleneck for current tokenizers, which…

Computer Vision and Pattern Recognition · Computer Science 2026-03-24 Haotian Ye, Qiyuan He, Jiaqi Han, Puheng Li, Jiaojiao Fan, Zekun Hao, Fitsum Reda, Yogesh Balaji, Huayu Chen, Sheng Liu, Angela Yao, James Zou, Stefano Ermon, Haoxiang Wang, Ming-Yu Liu

Make A Long Image Short: Adaptive Token Length for Vision Transformers

The vision transformer splits each image into a sequence of tokens with fixed length and processes the tokens in the same way as words in natural language processing. More tokens normally lead to better performance but considerably…

Computer Vision and Pattern Recognition · Computer Science 2021-12-07 Yichen Zhu, Yuqin Zhu, Jie Du, Yi Wang, Zhicai Ou, Feifei Feng, Jian Tang

Soft Tail-dropping for Adaptive Visual Tokenization

We present Soft Tail-dropping Adaptive Tokenizer (STAT), a 1D discrete visual tokenizer that adaptively chooses the number of output tokens per image according to its structural complexity and level of detail. STAT encodes an image into a…

Computer Vision and Pattern Recognition · Computer Science 2026-01-21 Zeyuan Chen, Kai Zhang, Zhuowen Tu, Yuanjun Xiong

Make A Long Image Short: Adaptive Token Length for Vision Transformers

The vision transformer is a model that breaks down each image into a sequence of tokens with a fixed length and processes them similarly to words in natural language processing. Although increasing the number of tokens typically results in…

Machine Learning · Computer Science 2023-07-06 Qiqi Zhou, Yichen Zhu

CARL: Content-Aware Representation Learning for Heterogeneous Networks

Heterogeneous networks not only present a challenge of heterogeneity in the types of nodes and relations, but also the attributes and content associated with the nodes. While recent works have looked at representation learning on…

Social and Information Networks · Computer Science 2018-05-15 Chuxu Zhang, Ananthram Swami, Nitesh V. Chawla

KARL: Knowledge-Aware Reasoning and Reinforcement Learning for Knowledge-Intensive Visual Grounding

Knowledge-Intensive Visual Grounding (KVG) requires models to localize objects using fine-grained, domain-specific entity names rather than generic referring expressions. Although Multimodal Large Language Models (MLLMs) possess rich entity…

Computer Vision and Pattern Recognition · Computer Science 2026-04-03 Xinyu Ma, Ziyang Ding, Zhicong Luo, Chi Chen, Zonghao Guo, Derek F. Wong, Zhen Zhao, Xiaoyi Feng, Maosong Sun

A theory of incremental compression

The ability to find short representations, i.e. to compress data, is crucial for many intelligent systems. We present a theory of incremental compression showing that arbitrary data strings, that can be described by a set of features, can…

Information Theory · Computer Science 2020-09-15 Arthur Franz, Oleksandr Antonenko, Roman Soletskyi

KARMA: Efficient Structural Defect Segmentation via Kolmogorov-Arnold Representation Learning

Semantic segmentation of structural defects in civil infrastructure remains challenging due to variable defect appearances, harsh imaging conditions, and significant class imbalance. Current deep learning methods, despite their…

Computer Vision and Pattern Recognition · Computer Science 2025-11-10 Md Meftahul Ferdaus, Mahdi Abdelguerfi, Elias Ioup, Steven Sloan, Kendall N. Niles, Ken Pathak

TokenCarve: Information-Preserving Visual Token Compression in Multimodal Large Language Models

Multimodal Large Language Models (MLLMs) are becoming increasingly popular, while the high computational cost associated with multimodal data input, particularly from visual tokens, poses a significant challenge. Existing training-based…

Computer Vision and Pattern Recognition · Computer Science 2025-03-14 Xudong Tan, Peng Ye, Chongjun Tu, Jianjian Cao, Yaoxin Yang, Lin Zhang, Dongzhan Zhou, Tao Chen

Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment

Existing image-text modality alignment in Vision Language Models (VLMs) treats each text token equally in an autoregressive manner. Despite being simple and effective, this method results in sub-optimal cross-modal alignment by…

Computer Vision and Pattern Recognition · Computer Science 2024-11-06 Xin Xiao, Bohong Wu, Jiacong Wang, Chunyuan Li, Xun Zhou, Haoyuan Guo

CARL: Congestion-Aware Reinforcement Learning for Imitation-based Perturbations in Mixed Traffic Control

Human-driven vehicles (HVs) exhibit complex and diverse behaviors. Accurately modeling such behavior is crucial for validating Robot Vehicles (RVs) in simulation and realizing the potential of mixed traffic control. However, existing…

Robotics · Computer Science 2024-07-10 Bibek Poudel, Weizi Li, Shuai Li

One-D-Piece: Image Tokenizer Meets Quality-Controllable Compression

Current image tokenization methods require a large number of tokens to capture the information contained within images. Although the amount of information varies across images, most image tokenizers only support fixed-length tokenization,…

Computer Vision and Pattern Recognition · Computer Science 2025-01-20 Keita Miwa, Kento Sasaki, Hidehisa Arai, Tsubasa Takahashi, Yu Yamaguchi