Related papers: Finding Visual Task Vectors

Learning Visual Prompts for Guiding the Attention of Vision Transformers

Visual prompting infuses visual information into the input image to adapt models toward specific predictions and tasks. Recently, manually crafted markers such as red circles are shown to guide the model to attend to a target region on the…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-06 · Razieh Rezaei, Masoud Jalili Sabet, Jindong Gu, Daniel Rueckert, Philip Torr, Ashkan Khakzar

Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving

Multi-task learning has emerged as a powerful paradigm to solve a range of tasks simultaneously with good efficiency in both computation resources and inference time. However, these algorithms are designed for different tasks mostly not…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-06 · Xiwen Liang, Minzhe Niu, Jianhua Han, Hang Xu, Chunjing Xu, Xiaodan Liang

Vision-Language Models Create Cross-Modal Task Representations

Autoregressive vision-language models (VLMs) can handle many tasks within a single model, yet the representations that enable this capability remain opaque. We find that VLMs align conceptually equivalent inputs into a shared task vector,…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-08 · Grace Luo, Trevor Darrell, Amir Bar

Prompting Visual-Language Models for Efficient Video Understanding

Image-based visual-language (I-VL) pre-training has shown great success for learning joint visual-textual representations from large-scale web data, revealing remarkable ability for zero-shot generalisation. This paper presents a simple but…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-18 · Chen Ju, Tengda Han, Kunhao Zheng, Ya Zhang, Weidi Xie

Visual Prompting via Image Inpainting

How does one adapt a pre-trained visual model to novel downstream tasks without task-specific finetuning or any model modification? Inspired by prompting in NLP, this paper investigates visual prompting: given input-output image example(s)…

Computer Vision and Pattern Recognition · Computer Science · 2022-09-02 · Amir Bar, Yossi Gandelsman, Trevor Darrell, Amir Globerson, Alexei A. Efros

Rethinking Visual Prompt Learning as Masked Visual Token Modeling

Prompt learning has achieved great success in efficiently exploiting large-scale pre-trained models in natural language processing (NLP). It reformulates the downstream tasks as the generative pre-training ones to achieve consistency, thus…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-18 · Ning Liao, Bowen Shi, Xiaopeng Zhang, Min Cao, Junchi Yan, Qi Tian

Task Vectors in In-Context Learning: Emergence, Formation, and Benefit

In-context learning is a remarkable capability of transformers, referring to their ability to adapt to specific tasks based on a short history or context. Previous research has found that task-specific information is locally encoded within…

Machine Learning · Computer Science · 2025-01-17 · Liu Yang, Ziqian Lin, Kangwook Lee, Dimitris Papailiopoulos, Robert Nowak

Editing Models with Task Arithmetic

Changing how pre-trained models behave -- e.g., improving their performance on a downstream task or mitigating biases learned during pre-training -- is a common practice when developing machine learning systems. In this work, we propose a…

Machine Learning · Computer Science · 2023-04-03 · Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Suchin Gururangan, Ludwig Schmidt, Hannaneh Hajishirzi, Ali Farhadi

Do different prompting methods yield a common task representation in language models?

Demonstrations and instructions are two primary approaches for prompting language models to perform in-context learning (ICL) tasks. Do identical tasks elicited in different ways result in similar representations of the task? An improved…

Computation and Language · Computer Science · 2025-12-02 · Guy Davidson, Todd M. Gureckis, Brenden M. Lake, Adina Williams

Task2Vec: Task Embedding for Meta-Learning

We introduce a method to provide vectorial representations of visual classification tasks which can be used to reason about the nature of those tasks and their relations. Given a dataset with ground-truth labels and a loss function defined…

Machine Learning · Computer Science · 2019-02-12 · Alessandro Achille, Michael Lam, Rahul Tewari, Avinash Ravichandran, Subhransu Maji, Charless Fowlkes, Stefano Soatto, Pietro Perona

Learning to See and Act: Task-Aware Virtual View Exploration for Robotic Manipulation

Recent vision-language-action (VLA) models for multi-task robot manipulation often rely on fixed camera setups and shared visual encoders, which limit their performance under occlusions and during cross-task transfer. To address these…

Robotics · Computer Science · 2026-03-19 · Yongjie Bai, Zhouxia Wang, Yang Liu, Kaijun Luo, Yifan Wen, Mingtong Dai, Weixing Chen, Ziliang Chen, Lingbo Liu, Guanbin Li, Liang Lin

Decomposing Task Vectors for Refined Model Editing

Large pre-trained models have transformed machine learning, yet adapting these models effectively to exhibit precise, concept-specific behaviors remains a significant challenge. Task vectors, defined as the difference between fine-tuned and…

Machine Learning · Computer Science · 2025-12-30 · Hamed Damirchi, Ehsan Abbasnejad, Zhen Zhang, Javen Shi

Visual Robot Task Planning

Prospection, the act of predicting the consequences of many possible futures, is intrinsic to human planning and action, and may even be at the root of consciousness. Surprisingly, this idea has been explored comparatively little in…

Robotics · Computer Science · 2018-04-03 · Chris Paxton, Yotam Barnoy, Kapil Katyal, Raman Arora, Gregory D. Hager

Task Prompt Vectors: Effective Initialization through Multi-Task Soft-Prompt Transfer

Prompt tuning is an efficient solution for training large language models (LLMs). However, current soft-prompt-based methods often sacrifice multi-task modularity, requiring the training process to be fully or partially repeated for each…

Computation and Language · Computer Science · 2026-05-14 · Robert Belanec, Simon Ostermann, Ivan Srba, Maria Bielikova

Learning A Low-Level Vision Generalist via Visual Task Prompt

Building a unified model for general low-level vision tasks holds significant research and practical value. Current methods encounter several critical issues. Multi-task restoration approaches can address multiple degradation-to-clean…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-19 · Xiangyu Chen, Yihao Liu, Yuandong Pu, Wenlong Zhang, Jiantao Zhou, Yu Qiao, Chao Dong

Exploring Task-Level Optimal Prompts for Visual In-Context Learning

With the development of Vision Foundation Models (VFMs) in recent years, Visual In-Context Learning (VICL) has become a better choice compared to modifying models in most scenarios. Different from retraining or fine-tuning model, VICL does…

Artificial Intelligence · Computer Science · 2025-01-16 · Yan Zhu, Huan Ma, Changqing Zhang

Beyond Captioning: Task-Specific Prompting for Improved VLM Performance in Mathematical Reasoning

Vision-Language Models (VLMs) have transformed tasks requiring visual and reasoning abilities, such as image retrieval and Visual Question Answering (VQA). Despite their success, VLMs face significant challenges with tasks involving…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-10 · Ayush Singh, Mansi Gupta, Shivank Garg, Abhinav Kumar, Vansh Agrawal

Explicit Visual Prompting for Universal Foreground Segmentations

Foreground segmentation is a fundamental problem in computer vision, which includes salient object detection, forgery detection, defocus blur detection, shadow detection, and camouflage object detection. Previous works have typically relied…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-31 · Weihuang Liu, Xi Shen, Chi-Man Pun, Xiaodong Cun

Is Visual in-Context Learning for Compositional Medical Tasks within Reach?

In this paper, we explore the potential of visual in-context learning to enable a single model to handle multiple tasks and adapt to new tasks during test time without re-training. Unlike previous approaches, our focus is on training…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-03 · Simon Reiß, Zdravko Marinov, Alexander Jaus, Constantin Seibold, M. Saquib Sarfraz, Erik Rodner, Rainer Stiefelhagen

Variational Prototyping-Encoder: One-Shot Learning with Prototypical Images

In daily life, graphic symbols, such as traffic signs and brand logos, are ubiquitously utilized around us due to its intuitive expression beyond language boundary. We tackle an open-set graphic symbol recognition problem by one-shot…

Computer Vision and Pattern Recognition · Computer Science · 2019-04-19 · Junsik Kim, Tae-Hyun Oh, Seokju Lee, Fei Pan, In So Kweon