Related papers: Modular Linear Tokenization (MLT)

200 papers

This work addresses the task of weakly-supervised object localization. The goal is to learn object localization using only image-level class labels, which are much easier to obtain compared to bounding box annotations. This task is…

Computer Vision and Pattern Recognition · Computer Science 2023-12-18 David Kim, Sinhae Cha, Byeongkeun Kang

We introduce the Byte Latent Transformer (BLT), a new byte-level LLM architecture that, for the first time, matches tokenization-based LLM performance at scale with significant improvements in inference efficiency and robustness. BLT…

The advent of pre-trained Vision-Language Models (VLMs) has significantly transformed Continual Learning (CL), mainly due to their zero-shot classification abilities. Such proficiency makes VLMs well-suited for real-world applications,…

Artificial Intelligence · Computer Science 2025-10-15 Aniello Panariello, Emanuele Frascaroli, Pietro Buzzega, Lorenzo Bonicelli, Angelo Porrello, Simone Calderara

Binary quantization represents the most extreme form of compression, reducing weights to +/-1 for maximal memory and computational efficiency. While recent sparsity-aware binarization achieves sub-1-bit compression via weight pruning, it…

Machine Learning · Computer Science 2026-04-10 Hao Gu, Lujun Li, Hao Wang, Lei Wang, Zheyu Wang, Bei Liu, Jiacheng Liu, Qiyuan Zhu, Sirui Han, Yike Guo

Frame rate is a crucial consideration in cardiac ultrasound imaging and 3D sonography. Several methods have been proposed in the medical ultrasound literature aiming at accelerating the image acquisition. In this paper, we consider one such…

Computer Vision and Pattern Recognition · Computer Science 2018-08-24 Sanketh Vedula, Ortal Senouf, Grigoriy Zurakhov, Alex M. Bronstein, Michael Zibulevsky, Oleg Michailovich, Dan Adam, Diana Gaitini

Transformers are slow to train on videos due to extremely large numbers of input tokens, even though many video tokens are repeated over time. Existing methods to remove such uninformative tokens either have significant overhead, negating…

Computer Vision and Pattern Recognition · Computer Science 2024-11-11 Rohan Choudhury, Guanglei Zhu, Sihan Liu, Koichiro Niinuma, Kris M. Kitani, László Jeni

High-cardinality categorical variables pose significant challenges in machine learning, particularly in terms of computational efficiency and model interpretability. Traditional one-hot encoding often results in high-dimensional sparse…

Machine Learning · Computer Science 2025-01-13 Zixuan Liang

Multimodal Large Language Models (MLLMs) have shown immense promise in universal multimodal retrieval, which aims to find relevant items of various modalities for a given query. But their practical application is often hindered by the…

Computer Vision and Pattern Recognition · Computer Science 2026-02-06 Qi Li, Yanzhe Zhao, Yongxin Zhou, Yameng Wang, Yandong Yang, Yuanjia Zhou, Jue Wang, Zuojian Wang, Jinxiang Liu

Multimodal Large Language Models (MLLMs) encounter significant computational and memory bottlenecks from the massive number of visual tokens generated by high-resolution images or multi-image inputs. Previous token compression techniques…

Computer Vision and Pattern Recognition · Computer Science 2025-10-21 Jiaying Zhu, Yurui Zhu, Xin Lu, Wenrui Yan, Dong Li, Kunlin Liu, Xueyang Fu, Zheng-Jun Zha

Recently, multimodal large language models (MLLMs) have emerged as a key approach in achieving artificial general intelligence. In particular, vision-language MLLMs have been developed to generate not only text but also visual outputs from…

Computer Vision and Pattern Recognition · Computer Science 2026-05-20 Donghwan Chi, Hyomin Kim, Yoonjin Oh, Yongjin Kim, Donghoon Lee, Daejin Jo, Jongmin Kim, Junyeob Baek, Sungjin Ahn, Sungwoong Kim

Self-attention and transformers have been widely used in deep learning. Recent efforts have been devoted to incorporating transformer blocks into different neural architectures, including those with convolutions, leading to various visual…

Computer Vision and Pattern Recognition · Computer Science 2025-07-22 Yancheng Wang, Yingzhen Yang

Purpose: Magnetic polarizability tensors (MPTs) provide an economical characterisation of conducting magnetic metallic objects and their spectral signature can aid in the solution of metal detection inverse problems, such as scrap metal…

Numerical Analysis · Mathematics 2024-05-01 James Elgy, Paul D. Ledger

Lithography, transferring chip design masks to the silicon wafer, is the most important phase in the modern semiconductor manufacturing flow. Due to the limitations of lithography systems, extensive design optimizations are required to tackle…

Machine Learning · Computer Science 2024-05-07 Haoyu Yang, Haoxing Ren

Vision Transformer (ViT) architectures traditionally employ a grid-based approach to tokenization independent of the semantic content of an image. We propose a modular superpixel tokenization strategy which decouples tokenization and…

Computer Vision and Pattern Recognition · Computer Science 2025-11-18 Marius Aasan, Odd Kolbjørnsen, Anne Schistad Solberg, Adín Ramirez Rivera

Automated front-end engineering drastically reduces development cycles and minimizes manual coding overhead. While Generative AI has shown promise in translating designs to code, current solutions often produce monolithic scripts, failing…

Information Retrieval · Computer Science 2025-12-23 Chong Liu, Ming Zhang, Fei Li, Hao Zhou, Xiaoshuang Chen, Ye Yuan

Pre-trained Transformer models like T5 and BART have advanced the state of the art on a wide range of text generation tasks. Compressing these models into smaller ones has become critically important for practical use. Common neural network…

Computation and Language · Computer Science 2023-06-06 Wangchunshu Zhou, Ronan Le Bras, Yejin Choi

The integration of visual inputs with large language models (LLMs) has led to remarkable advancements in multi-modal capabilities, giving rise to visual large language models (VLLMs). However, effectively harnessing VLLMs for intricate…

Computer Vision and Pattern Recognition · Computer Science 2023-11-14 Renjie Pi, Lewei Yao, Jiahui Gao, Jipeng Zhang, Tong Zhang

Target encoding is an effective technique to deliver better performance for conventional machine learning methods, and recently, for deep neural networks as well. However, the existing target encoding approaches require a significant increase…

Machine Learning · Computer Science 2019-10-22 Mayoore S. Jaiswal, Bumsoo Kang, Jinho Lee, Minsik Cho

In this paper, we propose Mixture of Layer-Wise Tokens (MoLT), a parameter- and memory-efficient adaptation framework for audio-visual learning. The key idea of MoLT is to replace conventional, computationally heavy sequential adaptation at…

Sound · Computer Science 2025-12-02 Kyeongha Rho, Hyeongkeun Lee, Jae Won Cho, Joon Son Chung

The Mapbox Vector Tile (MVT) format is widely considered the leading open standard for large-scale map visualization, as evidenced by its widespread adoption by major technology companies such as AWS, Meta, and Microsoft for their products…

Information Theory · Computer Science 2025-08-15 Markus Tremmel, Roland Zink