Related papers: Modular Linear Tokenization (MLT)

Multiscale Vision Transformer With Deep Clustering-Guided Refinement for Weakly Supervised Object Localization

This work addresses the task of weakly-supervised object localization. The goal is to learn object localization using only image-level class labels, which are much easier to obtain compared to bounding box annotations. This task is…

Computer Vision and Pattern Recognition · Computer Science 2023-12-18 David Kim, Sinhae Cha, Byeongkeun Kang

Byte Latent Transformer: Patches Scale Better Than Tokens

We introduce the Byte Latent Transformer (BLT), a new byte-level LLM architecture that, for the first time, matches tokenization-based LLM performance at scale with significant improvements in inference efficiency and robustness. BLT…

Computation and Language · Computer Science 2024-12-16 Artidoro Pagnoni, Ram Pasunuru, Pedro Rodriguez, John Nguyen, Benjamin Muller, Margaret Li, Chunting Zhou, Lili Yu, Jason Weston, Luke Zettlemoyer, Gargi Ghosh, Mike Lewis, Ari Holtzman, Srinivasan Iyer

Modular Embedding Recomposition for Incremental Learning

The advent of pre-trained Vision-Language Models (VLMs) has significantly transformed Continual Learning (CL), mainly due to their zero-shot classification abilities. Such proficiency makes VLMs well-suited for real-world applications,…

Artificial Intelligence · Computer Science 2025-10-15 Aniello Panariello, Emanuele Frascaroli, Pietro Buzzega, Lorenzo Bonicelli, Angelo Porrello, Simone Calderara

BTC-LLM: Efficient Sub-1-Bit LLM Quantization via Learnable Transformation and Binary Codebook

Binary quantization represents the most extreme form of compression, reducing weights to +/-1 for maximal memory and computational efficiency. While recent sparsity-aware binarization achieves sub-1-bit compression via weight pruning, it…

Machine Learning · Computer Science 2026-04-10 Hao Gu, Lujun Li, Hao Wang, Lei Wang, Zheyu Wang, Bei Liu, Jiacheng Liu, Qiyuan Zhu, Sirui Han, Yike Guo

High quality ultrasonic multi-line transmission through deep learning

Frame rate is a crucial consideration in cardiac ultrasound imaging and 3D sonography. Several methods have been proposed in the medical ultrasound literature aiming at accelerating the image acquisition. In this paper, we consider one such…

Computer Vision and Pattern Recognition · Computer Science 2018-08-24 Sanketh Vedula, Ortal Senouf, Grigoriy Zurakhov, Alex M. Bronstein, Michael Zibulevsky, Oleg Michailovich, Dan Adam, Diana Gaitini

Don't Look Twice: Faster Video Transformers with Run-Length Tokenization

Transformers are slow to train on videos due to extremely large numbers of input tokens, even though many video tokens are repeated over time. Existing methods to remove such uninformative tokens either have significant overhead, negating…

Computer Vision and Pattern Recognition · Computer Science 2024-11-11 Rohan Choudhury, Guanglei Zhu, Sihan Liu, Koichiro Niinuma, Kris M. Kitani, László Jeni

Efficient Representations for High-Cardinality Categorical Variables in Machine Learning

High-cardinality categorical variables pose significant challenges in machine learning, particularly in terms of computational efficiency and model interpretability. Traditional one-hot encoding often results in high-dimensional sparse…

Machine Learning · Computer Science 2025-01-13 Zixuan Liang

Magic-MM-Embedding: Towards Visual-Token-Efficient Universal Multimodal Embedding with MLLMs

Multimodal Large Language Models (MLLMs) have shown immense promise in universal multimodal retrieval, which aims to find relevant items of various modalities for a given query. But their practical application is often hindered by the…

Computer Vision and Pattern Recognition · Computer Science 2026-02-06 Qi Li, Yanzhe Zhao, Yongxin Zhou, Yameng Wang, Yandong Yang, Yuanjia Zhou, Jue Wang, Zuojian Wang, Jinxiang Liu

VisionSelector: End-to-End Learnable Visual Token Compression for Efficient Multimodal LLMs

Multimodal Large Language Models (MLLMs) encounter significant computational and memory bottlenecks from the massive number of visual tokens generated by high-resolution images or multi-image inputs. Previous token compression techniques…

Computer Vision and Pattern Recognition · Computer Science 2025-10-21 Jiaying Zhu, Yurui Zhu, Xin Lu, Wenrui Yan, Dong Li, Kunlin Liu, Xueyang Fu, Zheng-Jun Zha

Slot-MLLM: Object-Centric Visual Tokenization for Multimodal LLM

Recently, multimodal large language models (MLLMs) have emerged as a key approach in achieving artificial general intelligence. In particular, vision-language MLLMs have been developed to generate not only text but also visual outputs from…

Computer Vision and Pattern Recognition · Computer Science 2026-05-20 Donghwan Chi, Hyomin Kim, Yoonjin Oh, Yongjin Kim, Donghoon Lee, Daejin Jo, Jongmin Kim, Junyeob Baek, Sungjin Ahn, Sungwoong Kim

Efficient Visual Transformer by Learnable Token Merging

Self-attention and transformers have been widely used in deep learning. Recent efforts have been devoted to incorporating transformer blocks into different neural architectures, including those with convolutions, leading to various visual…

Computer Vision and Pattern Recognition · Computer Science 2025-07-22 Yancheng Wang, Yingzhen Yang

Efficient Computation of Magnetic Polarizability Tensor Spectral Signatures for Object Characterisation in Metal Detection

Purpose: Magnetic polarizability tensors (MPTs) provide an economical characterisation of conducting magnetic metallic objects and their spectral signature can aid in the solution of metal detection inverse problems, such as scrap metal…

Numerical Analysis · Mathematics 2024-05-01 James Elgy, Paul D. Ledger

ILILT: Implicit Learning of Inverse Lithography Technologies

Lithography, transferring chip design masks to the silicon wafer, is the most important phase in modern semiconductor manufacturing flow. Due to the limitations of lithography systems, extensive design optimizations are required to tackle…

Machine Learning · Computer Science 2024-05-07 Haoyu Yang, Haoxing Ren

A Spitting Image: Modular Superpixel Tokenization in Vision Transformers

Vision Transformer (ViT) architectures traditionally employ a grid-based approach to tokenization independent of the semantic content of an image. We propose a modular superpixel tokenization strategy which decouples tokenization and…

Computer Vision and Pattern Recognition · Computer Science 2025-11-18 Marius Aasan, Odd Kolbjørnsen, Anne Schistad Solberg, Adín Ramirez Rivera

Modular Layout Synthesis (MLS): Front-end Code via Structure Normalization and Constrained Generation

Automated front-end engineering drastically reduces development cycles and minimizes manual coding overhead. While Generative AI has shown promise in translating designs to code, current solutions often produce monolithic scripts, failing…

Information Retrieval · Computer Science 2025-12-23 Chong Liu, Ming Zhang, Fei Li, Hao Zhou, Xiaoshuang Chen, Ye Yuan

Modular Transformers: Compressing Transformers into Modularized Layers for Flexible Efficient Inference

Pre-trained Transformer models like T5 and BART have advanced the state of the art on a wide range of text generation tasks. Compressing these models into smaller ones has become critically important for practical use. Common neural network…

Computation and Language · Computer Science 2023-06-06 Wangchunshu Zhou, Ronan Le Bras, Yejin Choi

PerceptionGPT: Effectively Fusing Visual Perception into LLM

The integration of visual inputs with large language models (LLMs) has led to remarkable advancements in multi-modal capabilities, giving rise to visual large language models (VLLMs). However, effectively harnessing VLLMs for intricate…

Computer Vision and Pattern Recognition · Computer Science 2023-11-14 Renjie Pi, Lewei Yao, Jiahui Gao, Jipeng Zhang, Tong Zhang

Target encoding is an effective technique to deliver better performance for conventional machine learning methods, and recently, for deep neural networks as well. However, the existing target encoding approaches require significant increase…

Machine Learning · Computer Science 2019-10-22 Mayoore S. Jaiswal, Bumsoo Kang, Jinho Lee, Minsik Cho

MoLT: Mixture of Layer-Wise Tokens for Efficient Audio-Visual Learning

In this paper, we propose Mixture of Layer-Wise Tokens (MoLT), a parameter- and memory-efficient adaptation framework for audio-visual learning. The key idea of MoLT is to replace conventional, computationally heavy sequential adaptation at…

Sound · Computer Science 2025-12-02 Kyeongha Rho, Hyeongkeun Lee, Jae Won Cho, Joon Son Chung

MapLibre Tile: A Next Generation Vector Tile Format

The Mapbox Vector Tile (MVT) format is widely considered the leading open standard for large-scale map visualization, as evidenced by its widespread adoption by major technology companies such as AWS, Meta, and Microsoft for their products…

Information Theory · Computer Science 2025-08-15 Markus Tremmel, Roland Zink