Kate Saenko — Scifaro

Breaking the Assistant Mold: Modeling Behavioral Variation in LLM Based Procedural Character Generation

Procedural content generation has enabled vast virtual worlds through levels, maps, and quests, but large-scale character generation remains underexplored. We identify two alignment-induced biases in existing methods: a positive moral bias,…

Computation and Language · Computer Science · 2026-05-05 · Maan Qraitem, Kate Saenko, Bryan A. Plummer

Mull-Tokens: Modality-Agnostic Latent Thinking

Reasoning goes beyond language; the real world requires reasoning about space, time, affordances, and much more that words alone cannot convey. Existing multimodal models exploring the potential of reasoning with images are brittle and do…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-01 · Arijit Ray, Ahmed Abdelkader, Chengzhi Mao, Bryan A. Plummer, Kate Saenko, Ranjay Krishna, Leonidas Guibas, Wen-Sheng Chu

BabyVLM-V2: Toward Developmentally Grounded Pretraining and Benchmarking of Vision Foundation Models

Children's early developmental trajectories set up a natural goal for sample-efficient pretraining of vision foundation models. We introduce BabyVLM-V2, a developmentally grounded framework for infant-inspired vision-language modeling that…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-31 · Shengao Wang, Wenqi Wang, Zecheng Wang, Max Whitton, Michael Wakeham, Arjun Chandra, Joey Huang, Pengyue Zhu, Helen Chen, David Li, Jeffrey Li, Shawn Li, Andrew Zagula, Amy Zhao, Andrew Zhu, Sayaka Nakamura, Yuki Yamamoto, Jerry Jun Yokono, Aaron Mueller, Bryan A. Plummer, Kate Saenko, Venkatesh Saligrama, Boqing Gong

SAM 3: Segment Anything with Concepts

We present Segment Anything Model (SAM) 3, a unified model that detects, segments, and tracks objects in images and videos based on concept prompts, which we define as either short noun phrases (e.g., "yellow school bus"), image exemplars,…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-31 · Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, Jie Lei, Tengyu Ma, Baishan Guo, Arpit Kalla, Markus Marks, Joseph Greer, Meng Wang, Peize Sun, Roman Rädle, Triantafyllos Afouras, Effrosyni Mavroudi, Katherine Xu, Tsung-Han Wu, Yu Zhou, Liliane Momeni, Rishi Hazra, Shuangrui Ding, Sagar Vaze, Francois Porcher, Feng Li, Siyuan Li, Aishwarya Kamath, Ho Kei Cheng, Piotr Dollár, Nikhila Ravi, Kate Saenko, Pengchuan Zhang, Christoph Feichtenhofer

SAT: Dynamic Spatial Aptitude Training for Multimodal Language Models

Reasoning about motion and space is a fundamental cognitive capability that is required by multiple real-world applications. While many studies highlight that large multimodal language models (MLMs) struggle to reason about space, they only…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-08 · Arijit Ray, Jiafei Duan, Ellis Brown, Reuben Tan, Dina Bashkirova, Rose Hendrix, Kiana Ehsani, Aniruddha Kembhavi, Bryan A. Plummer, Ranjay Krishna, Kuo-Hao Zeng, Kate Saenko

KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models

This paper investigates visual analogical reasoning in large multimodal models (LMMs) compared to human adults and children. A "visual analogy" is an abstract rule inferred from one image and applied to another. While benchmarks exist for…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-05 · Eunice Yiu, Maan Qraitem, Anisa Noor Majhi, Charlie Wong, Yutong Bai, Shiry Ginosar, Alison Gopnik, Kate Saenko

The SA-FARI Dataset: Segment Anything in Footage of Animals for Recognition and Identification

Automated video analysis is critical for wildlife conservation. A foundational task in this domain is multi-animal tracking (MAT), which underpins applications such as individual re-identification and behavior recognition. However, existing…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-25 · Dante Francisco Wasmuht, Otto Brookes, Maximillian Schall, Pablo Palencia, Chris Beirne, Tilo Burghardt, Majid Mirmehdi, Hjalmar Kühl, Mimi Arandjelovic, Sam Pottie, Peter Bermant, Brandon Asheim, Yi Jin Toh, Adam Elzinga, Jason Holmberg, Andrew Whitworth, Eleanor Flatt, Laura Gustafson, Chaitanya Ryali, Yuan-Ting Hu, Baishan Guo, Andrew Westbury, Kate Saenko, Didac Suris

Scaling Up Temporal Domain Generalization via Temporal Experts Averaging

Temporal Domain Generalization (TDG) aims to generalize across temporal distribution shifts, e.g., lexical change over time. Prior work often addresses this by predicting future model weights. However, full model prediction is prohibitively…

Machine Learning · Computer Science · 2025-10-01 · Aoming Liu, Kevin Miller, Venkatesh Saligrama, Kate Saenko, Boqing Gong, Ser-Nam Lim, Bryan A. Plummer

SCRAMBLe: Enhancing Multimodal LLM Compositionality with Synthetic Preference Data

Compositionality, or correctly recognizing scenes as compositions of atomic visual concepts, remains difficult for multimodal large language models (MLLMs). Even state-of-the-art MLLMs such as GPT-4o can make mistakes in distinguishing…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-30 · Samarth Mishra, Kate Saenko, Venkatesh Saligrama

Federated Adversarial Domain Adaptation

Federated learning improves data privacy and efficiency in machine learning performed over networks of distributed devices, such as mobile phones, IoT devices, and wearables. Yet models trained with federated learning can still fail to…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-26 · Xingchao Peng, Zijun Huang, Yizhe Zhu, Kate Saenko

Web Artifact Attacks Disrupt Vision Language Models

Vision-language models (VLMs) such as CLIP and LLaVA are trained on large-scale, lightly curated web datasets, leading them to learn unintended correlations between semantic concepts and unrelated visual signals. These associations degrade…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-05 · Maan Qraitem, Piotr Teterwak, Kate Saenko, Bryan A. Plummer

Sim-Anchored Learning for On-the-Fly Adaptation

Fine-tuning simulation-trained RL agents with real-world data often degrades crucial behaviors due to limited or skewed data distributions. We argue that designer priorities exist not just in reward functions, but also in simulation design…

Robotics · Computer Science · 2025-05-02 · Bassel El Mabsout, Shahin Roozkhosh, Siddharth Mysore, Kate Saenko, Renato Mancuso

Is Large-Scale Pretraining the Secret to Good Domain Generalization?

Multi-Source Domain Generalization (DG) is the task of training on multiple source domains and achieving high classification performance on unseen target domains. Recent methods combine robust features from web-scale pretrained backbones…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-23 · Piotr Teterwak, Kuniaki Saito, Theodoros Tsiligkaridis, Bryan A. Plummer, Kate Saenko

SPARC: Score Prompting and Adaptive Fusion for Zero-Shot Multi-Label Recognition in Vision-Language Models

Zero-shot multi-label recognition (MLR) with Vision-Language Models (VLMs) faces significant challenges without training data, model tuning, or architectural modifications. Existing approaches require prompt tuning or architectural…

Computer Vision and Pattern Recognition · Computer Science · 2025-02-25 · Kevin Miller, Samarth Mishra, Aditya Gangrade, Kate Saenko, Venkatesh Saligrama

Vision-LLMs Can Fool Themselves with Self-Generated Typographic Attacks

Typographic attacks, which add misleading text to images, can deceive large vision-language models (LVLMs). The susceptibility of recent LVLMs like GPT4-V to such attacks is understudied, raising concerns about amplified misinformation in…

Computer Vision and Pattern Recognition · Computer Science · 2025-02-14 · Maan Qraitem, Nazia Tasnim, Piotr Teterwak, Kate Saenko, Bryan A. Plummer

OP-LoRA: The Blessing of Dimensionality

Low-rank adapters enable fine-tuning of large models with only a small number of parameters, thus reducing storage costs and minimizing the risk of catastrophic forgetting. However, they often pose optimization challenges, with poor…

Machine Learning · Computer Science · 2024-12-16 · Piotr Teterwak, Kate Saenko, Bryan A. Plummer, Ser-Nam Lim

ERM++: An Improved Baseline for Domain Generalization

Domain Generalization (DG) aims to develop classifiers that can generalize to new, unseen data distributions, a critical capability when collecting new domain-specific data is impractical. A common DG baseline minimizes the empirical risk…

Machine Learning · Computer Science · 2024-12-11 · Piotr Teterwak, Kuniaki Saito, Theodoros Tsiligkaridis, Kate Saenko, Bryan A. Plummer

Tell Me What's Next: Textual Foresight for Generic UI Representations

Mobile app user interfaces (UIs) are rich with action, text, structure, and image content that can be utilized to learn generic UI representations for tasks like automating user commands, summarizing content, and evaluating the…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-09 · Andrea Burns, Kate Saenko, Bryan A. Plummer

From Fake to Real: Pretraining on Balanced Synthetic Images to Prevent Spurious Correlations in Image Recognition

Visual recognition models are prone to learning spurious correlations induced by a biased training set where certain conditions $B$ (e.g., Indoors) are over-represented in certain classes $Y$ (e.g., Big Dogs). Synthetic data from…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-18 · Maan Qraitem, Kate Saenko, Bryan A. Plummer

SLANT: Spurious Logo ANalysis Toolkit

Online content is filled with logos, from ads and social media posts to website branding and product placements. Consequently, these logos are prevalent in the extensive web-scraped datasets used to pretrain Vision-Language Models, which…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-04 · Maan Qraitem, Piotr Teterwak, Kate Saenko, Bryan A. Plummer