Pascal Tilli — Scifaro

Beyond Bag-of-Patches: Learning Global Layout via Textual Supervision for Late-Interaction Visual Document Retrieval

Visual Document Retrieval (VDR) models mostly rely on late-interaction architectures, in which documents are represented by a set of local patch embeddings that are then matched against query tokens. While efficient, this architecture…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-12 · Pascal Tilli, Mohsen Mesgar

HNC: Leveraging Hard Negative Captions towards Models with Fine-Grained Visual-Linguistic Comprehension Capabilities

Image-Text Matching (ITM) is one of the de facto methods for learning generalized representations from a large corpus in Vision and Language (VL). However, due to the weak association between web-collected image-text pairs, models fail to…

Computation and Language · Computer Science · 2026-05-08 · Esra Dönmez, Pascal Tilli, Hsiu-Yu Yang, Thang Vu, Carina Silberer

Explaining Caption-Image Interactions in CLIP Models with Second-Order Attributions

Dual-encoder architectures like CLIP models map two types of inputs into a shared embedding space and predict similarities between them. Despite their wide application, however, it is not understood how these models compare their two…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-14 · Lucas Möller, Pascal Tilli, Ngoc Thang Vu, Sebastian Padó

Discrete Subgraph Sampling for Interpretable Graph based Visual Question Answering

Explainable artificial intelligence (XAI) aims to make machine learning models more transparent. While many approaches focus on generating explanations post-hoc, interpretable approaches, which generate the explanations intrinsically…

Computation and Language · Computer Science · 2024-12-12 · Pascal Tilli, Ngoc Thang Vu

Intrinsic Subgraph Generation for Interpretable Graph based Visual Question Answering

The great success of deep-learning-based methods in Visual Question Answering (VQA) has concurrently increased the demand for explainable methods. Most methods in Explainable Artificial Intelligence (XAI) focus on generating post-hoc…

Computation and Language · Computer Science · 2024-03-28 · Pascal Tilli, Ngoc Thang Vu

Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions

Customizing voice and speaking style in a speech synthesis system with intuitive and fine-grained controls is challenging, given that little data with appropriate labels is available. Furthermore, editing an existing human's voice also…

Sound · Computer Science · 2023-10-27 · Florian Lux, Pascal Tilli, Sarina Meyer, Ngoc Thang Vu

Anonymizing Speech with Generative Adversarial Networks to Preserve Speaker Privacy

To protect the privacy of speech data, speaker anonymization aims to hide the identity of a speaker by changing the voice in speech recordings. This typically comes with a privacy-utility trade-off between protection of…

Sound · Computer Science · 2022-10-21 · Sarina Meyer, Pascal Tilli, Pavel Denisov, Florian Lux, Julia Koch, Ngoc Thang Vu

Speaker Anonymization with Phonetic Intermediate Representations

In this work, we propose a speaker anonymization pipeline that leverages high-quality automatic speech recognition and synthesis systems to generate speech conditioned on phonetic transcriptions and anonymized speaker embeddings. Using…

Sound · Computer Science · 2022-07-12 · Sarina Meyer, Florian Lux, Pavel Denisov, Julia Koch, Pascal Tilli, Ngoc Thang Vu

Beyond Accuracy: A Consolidated Tool for Visual Question Answering Benchmarking

On the way towards general Visual Question Answering (VQA) systems that are able to answer arbitrary questions, the need arises for evaluation beyond single-metric leaderboards for specific datasets. To this end, we propose a browser-based…

Computer Vision and Pattern Recognition · Computer Science · 2021-10-12 · Dirk Väth, Pascal Tilli, Ngoc Thang Vu