Related papers: Contextual Document Embeddings

200 papers

A limitation of modern document retrieval embedding methods is that they typically encode passages (chunks) from the same documents independently, often overlooking crucial contextual information from the rest of the document that could…

Information Retrieval · Computer Science 2025-06-09 Max Conti, Manuel Faysse, Gautier Viaud, Antoine Bosselut, Céline Hudelot, Pierre Colombo

Learning vectors that capture the meaning of concepts remains a fundamental challenge. Somewhat surprisingly, perhaps, pre-trained language models have thus far only enabled modest improvements to the quality of such concept embeddings.…

Computation and Language · Computer Science 2023-05-18 Na Li, Hanane Kteich, Zied Bouraoui, Steven Schockaert

Contrastive learning has been the dominant approach to training dense retrieval models. In this work, we investigate the impact of ranking context - an often overlooked aspect of learning dense retrieval models. In particular, we examine…

Information Retrieval · Computer Science 2023-10-24 George Zerveas, Navid Rekabsaz, Daniel Cohen, Carsten Eickhoff

Text embeddings are numerical representations of text data, where words, phrases, or entire documents are converted into vectors of real numbers. These embeddings capture semantic meanings and relationships between text elements in a…

Information Retrieval · Computer Science 2025-01-20 Fusheng Wei, Robert Neary, Han Qin, Qiang Mao, Jianping Zhang

Embedding-based similarity metrics between text sequences can be influenced not just by the content dimensions we most care about, but can also be biased by spurious attributes like the text's source or language. These document confounders…

Computation and Language · Computer Science 2025-09-25 Yu Fan, Yang Tian, Shauli Ravfogel, Mrinmaya Sachan, Elliott Ash, Alexander Hoyle

Dense retrieval conducts text retrieval in the embedding space and has shown many advantages compared to sparse retrieval. Existing dense retrievers optimize representations of queries and documents with contrastive training and map them to…

Information Retrieval · Computer Science 2021-07-19 Yizhi Li, Zhenghao Liu, Chenyan Xiong, Zhiyuan Liu

Text embedding representing natural language documents in a semantic vector space can be used for document retrieval using nearest neighbor lookup. In order to study the feasibility of neural models specialized for retrieval in a…

Information Retrieval · Computer Science 2019-05-03 Tolgahan Cakaloglu, Christian Szegedy, Xiaowei Xu

Learning semantically meaningful sentence embeddings is an open problem in natural language processing. In this work, we propose a sentence embedding learning approach that exploits both visual and textual information via a multimodal…

Computation and Language · Computer Science 2022-04-26 Miaoran Zhang, Marius Mosbach, David Ifeoluwa Adelani, Michael A. Hedderich, Dietrich Klakow

Sentence compression is the task of creating a shorter version of an input sentence while keeping important information. In this paper, we extend the task of compression by deletion with the use of contextual embeddings. Different from…

Information Retrieval · Computer Science 2020-06-08 Minh-Tien Nguyen, Bui Cong Minh, Dung Tien Le, Le Thai Linh

Current advances in Natural Language Processing (NLP) have made it increasingly feasible to build applications leveraging textual data. Generally, the core of these applications relies on having a good semantic representation of text into…

Computation and Language · Computer Science 2024-10-21 Thomas Uriot

A fundamental goal of search engines is to identify, given a query, documents that have relevant text. This is intrinsically difficult because the query and the document may use different vocabulary, or the document may contain query words…

Information Retrieval · Computer Science 2016-02-04 Bhaskar Mitra, Eric Nalisnick, Nick Craswell, Rich Caruana

We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus (e.g. a few hundred sentence pairs). Our method obtains word embeddings via an LSTM encoder-decoder model that…

Computation and Language · Computer Science 2021-10-22 Takashi Wada, Tomoharu Iwata, Yuji Matsumoto, Timothy Baldwin, Jey Han Lau

External knowledge is often useful for natural language understanding tasks. We introduce a contextual text representation model called Conceptual-Contextual (CC) embeddings, which incorporates structured knowledge into text…

Computation and Language · Computer Science 2020-03-13 Xiao Zhang, Dejing Dou, Ji Wu

Text embeddings are essential for many tasks, such as document retrieval, clustering, and semantic similarity assessment. In this paper, we study how to contrastively train text embedding models in a compute-optimal fashion, given a suite…

Machine Learning · Computer Science 2024-11-22 Alicja Ziarko, Albert Q. Jiang, Bartosz Piotrowski, Wenda Li, Mateja Jamnik, Piotr Miłoś

In comparison to the numerous debiasing methods proposed for the static non-contextualised word embeddings, the discriminative biases in contextualised embeddings have received relatively little attention. We propose a fine-tuning method…

Computation and Language · Computer Science 2021-01-26 Masahiro Kaneko, Danushka Bollegala

Automatic art analysis aims to classify and retrieve artistic representations from a collection of images by using computer vision and machine learning techniques. In this work, we propose to enhance visual representations from neural…

Computer Vision and Pattern Recognition · Computer Science 2019-04-11 Noa Garcia, Benjamin Renoust, Yuta Nakashima

Recent work in cross-lingual contextual word embedding learning cannot handle multi-sense words well. In this work, we explore the characteristics of contextual word embeddings and show the link between contextual word embeddings and word…

Computation and Language · Computer Science 2019-09-20 Zheng Zhang, Ruiqing Yin, Jun Zhu, Pierre Zweigenbaum

We study the settings for which deep contextual embeddings (e.g., BERT) give large improvements in performance relative to classic pretrained embeddings (e.g., GloVe), and an even simpler baseline (random word embeddings), focusing on the…

Computation and Language · Computer Science 2020-05-20 Simran Arora, Avner May, Jian Zhang, Christopher Ré

Deep language models learning a hierarchical representation proved to be a powerful tool for natural language processing, text mining and information retrieval. However, representations that perform well for retrieval must capture semantic…

Information Retrieval · Computer Science 2019-05-24 Tolgahan Cakaloglu, Xiaowei Xu

Many use cases require retrieving smaller portions of text, and dense vector-based retrieval systems often perform better with shorter text segments, as the semantics are less likely to be over-compressed in the embeddings. Consequently,…

Computation and Language · Computer Science 2025-07-08 Michael Günther, Isabelle Mohr, Daniel James Williams, Bo Wang, Han Xiao