Related papers: Tokenizing Semantic Segmentation with Run Length E…

200 papers

Multimodal Large Language Models (MLLMs) have shown exceptional capabilities in vision-language tasks; however, effectively integrating image segmentation into these models remains a significant challenge. In this paper, we introduce…

Computer Vision and Pattern Recognition · Computer Science 2025-02-18 Mengcheng Lan, Chaofeng Chen, Yue Zhou, Jiaxing Xu, Yiping Ke, Xinjiang Wang, Litong Feng, Wayne Zhang

Multimodal Large Language Models (MLLMs) have shown exceptional capabilities in vision-language tasks. However, effectively integrating image segmentation into these models remains a significant challenge. In this work, we propose a novel…

Computer Vision and Pattern Recognition · Computer Science 2025-09-09 Mengcheng Lan, Chaofeng Chen, Jiaxing Xu, Zongrui Li, Yiping Ke, Xudong Jiang, Yingchen Yu, Yunqing Zhao, Song Bai

Tokenization plays a critical role in language modeling, yet existing approaches such as Byte-Pair Encoding (BPE) or WordPiece operate purely on frequency statistics, ignoring the underlying semantic structure of text. This leads to…

Computation and Language · Computer Science 2025-08-22 Dong Liu, Yanxuan Yu

Recent segmentation methods leveraging Multi-modal Large Language Models (MLLMs) have shown reliable object-level segmentation and enhanced spatial perception. However, almost all previous methods predominantly rely on specialist mask…

Computer Vision and Pattern Recognition · Computer Science 2026-03-20 Anqi Zhang, Xiaokang Ji, Guangyu Gao, Jianbo Jiao, Chi Harold Liu, Yunchao Wei

Temporally localizing user-queried events through natural language is a crucial capability for video models. Recent methods predominantly adapt video LLMs to generate event boundary timestamps for temporal localization tasks, which struggle…

Computer Vision and Pattern Recognition · Computer Science 2026-02-17 Zongshang Pang, Mayu Otani, Yuta Nakashima

We introduce VideoLISA, a video-based multimodal large language model designed to tackle the problem of language-instructed reasoning segmentation in videos. Leveraging the reasoning capabilities and world knowledge of large language…

Computer Vision and Pattern Recognition · Computer Science 2024-10-01 Zechen Bai, Tong He, Haiyang Mei, Pichao Wang, Ziteng Gao, Joya Chen, Lei Liu, Zheng Zhang, Mike Zheng Shou

Segmenting an object in a video presents significant challenges. Each pixel must be accurately labelled, and these labels must remain consistent across frames. The difficulty increases when the segmentation is with arbitrary granularity,…

Computer Vision and Pattern Recognition · Computer Science 2025-02-20 Amirhossein Alimohammadi, Sauradip Nag, Saeid Asgari Taghanaki, Andrea Tagliasacchi, Ghassan Hamarneh, Ali Mahdavi Amiri

Gloss-free Sign Language Translation (SLT) has advanced rapidly, achieving strong performances without relying on gloss annotations. However, these gains have often come with increased model complexity and high computational demands,…

Computer Vision and Pattern Recognition · Computer Science 2026-05-29 JianHe Low, Ozge Mercanoglu Sincan, Richard Bowden

Tokenizing raw texts into word units is an essential pre-processing step for critical tasks in the NLP pipeline such as tagging, parsing, named entity recognition, and more. For most languages, this tokenization step is straightforward.…

Computation and Language · Computer Science 2022-03-22 Idan Brusilovsky, Reut Tsarfaty

With the recent rapid growth of large language models (LLMs), discrete speech tokenization has played an important role in injecting speech into LLMs. However, this discretization gives rise to a loss of information, consequently impairing…

Audio and Speech Processing · Electrical Eng. & Systems 2024-07-23 Zhichao Huang, Chutong Meng, Tom Ko

How to learn discriminative video representation from unlabeled videos is challenging but crucial for video analysis. The latest attempts seek to learn a representation model by predicting the appearance contents in the masked regions.…

Computer Vision and Pattern Recognition · Computer Science 2023-03-24 Xinyu Sun, Peihao Chen, Liangwei Chen, Changhao Li, Thomas H. Li, Mingkui Tan, Chuang Gan

Source code segmentation, dividing code into functionally coherent segments, is crucial for knowledge retrieval and maintenance in software development. While enabling efficient navigation and comprehension of large codebases, manual and…

Software Engineering · Computer Science 2025-07-15 Abdelhalim Dahou, Ansgar Scherp, Sebastian Kurten, Brigitte Mathiak, Madhu Chauhan

Learning visual feature representations for video analysis is a daunting task that requires a large amount of training samples and a proper generalization framework. Many of the current state-of-the-art methods for video captioning and…

Machine Learning · Computer Science 2018-09-20 Oliver Nina, Washington Garcia, Scott Clouse, Alper Yilmaz

Semantic segmentation is a key computer vision task that has been actively researched for decades. In recent years, supervised methods have reached unprecedented accuracy; however, they require many pixel-level annotations for every new…

Computer Vision and Pattern Recognition · Computer Science 2021-12-07 Nir Zabari, Yedid Hoshen

Semantic segmentation benefits robotics-related applications, especially autonomous driving. Most research on semantic segmentation focuses only on increasing the accuracy of segmentation models, with little attention to computationally…

Computer Vision and Pattern Recognition · Computer Science 2020-05-19 Mennatullah Siam, Mostafa Gamal, Moemen Abdel-Razek, Senthil Yogamani, Martin Jagersand

Referring Image Segmentation (RIS) aims to segment target objects expressed in natural language within a scene at the pixel level. Various recent RIS models have achieved state-of-the-art performance by generating contextual tokens to model…

Computer Vision and Pattern Recognition · Computer Science 2023-12-01 Minhyeok Lee, Dogyoon Lee, Jungho Lee, Suhwan Cho, Heeseung Choi, Ig-Jae Kim, Sangyoun Lee

Semantic Segmentation combines two sub-tasks: the identification of pixel-level image masks and the application of semantic labels to those masks. Recently, so-called Foundation Models have been introduced: general models trained on very…

Computer Vision and Pattern Recognition · Computer Science 2023-10-03 David Balaban, Justin Medich, Pranay Gosar, Justin Hart

Run-Length Encoding (RLE) is one of the most fundamental tools in data compression. However, its compression power drops significantly if the sequence lacks runs of consecutive repeated elements. In extreme cases, the output of the encoder may…

Data Structures and Algorithms · Computer Science 2023-12-29 Xutan Peng, Yi Zhang, Dejia Peng, Jiafa Zhu

Token-based video representation has emerged as a promising approach for enabling large language models (LLMs) to interpret video content. However, existing token reduction techniques, such as pruning and merging, often disrupt essential…

Computer Vision and Pattern Recognition · Computer Science 2025-09-30 Haichao Zhang, Yun Fu

In this work, we propose a novel approach to densely ground visual entities from a long caption. We leverage a large multimodal model (LMM) to extract semantic nouns, a class-agnostic segmentation model to generate entity-level…

Computer Vision and Pattern Recognition · Computer Science 2024-02-07 Lu Qi, Yi-Wen Chen, Lehan Yang, Tiancheng Shen, Xiangtai Li, Weidong Guo, Yu Xu, Ming-Hsuan Yang