Related papers: Hierarchical Document Encoder for Parallel Corpus …

200 papers

This paper presents an effective approach for parallel corpus mining using bilingual sentence embeddings. Our embedding models are trained to produce similar representations exclusively for bilingual sentence pairs that are translations of…

In this paper, we present an approach to learn multilingual sentence embeddings using a bi-directional dual-encoder with additive margin softmax. The embeddings are able to achieve state-of-the-art results on the United Nations (UN)…

Computation and Language · Computer Science 2019-06-18 Yinfei Yang, Gustavo Hernandez Abrego, Steve Yuan, Mandy Guo, Qinlan Shen, Daniel Cer, Yun-hsuan Sung, Brian Strope, Ray Kurzweil

Dense vector representations for textual data are crucial in modern NLP. Word embeddings and sentence embeddings estimated from raw texts are key in achieving state-of-the-art results in various tasks requiring semantic understanding.…

Computation and Language · Computer Science 2023-07-06 Sonal Sannigrahi, Josef van Genabith, Cristina Espana-Bonet

We present a family of neural-network-inspired models for computing continuous word representations, specifically designed to exploit both monolingual and multilingual text. This framework allows us to perform unsupervised training of…

Computation and Language · Computer Science 2016-12-15 Radu Soricut, Nan Ding

Hierarchical attention networks have recently achieved remarkable performance for document classification in a given language. However, when multilingual document collections are considered, training such models separately for each language…

Computation and Language · Computer Science 2017-09-18 Nikolaos Pappas, Andrei Popescu-Belis

Existing models of multilingual sentence embeddings require large parallel data resources which are not available for low-resource languages. We propose a novel unsupervised method to derive multilingual sentence embeddings relying only on…

Computation and Language · Computer Science 2021-05-24 Ivana Kvapilíková, Mikel Artetxe, Gorka Labaka, Eneko Agirre, Ondřej Bojar

Dense document embeddings are central to neural retrieval. The dominant paradigm is to train and construct embeddings by running encoders directly on individual documents. In this work, we argue that these embeddings, while effective, are…

Computation and Language · Computer Science 2024-11-11 John X. Morris, Alexander M. Rush

Generative retrieval employs sequence models for conditional generation of document IDs based on a query (DSI (Tay et al., 2022); NCI (Wang et al., 2022); inter alia). While this has led to improved performance in zero-shot retrieval, it is…

Information Retrieval · Computer Science 2025-02-27 Tongfei Chen, Ankita Sharma, Adam Pauls, Benjamin Van Durme

Visual data and text data are composed of information at multiple granularities. A video can describe a complex scene that is composed of multiple clips or shots, where each depicts a semantically coherent event or action. Similarly, a…

Computer Vision and Pattern Recognition · Computer Science 2018-10-18 Bowen Zhang, Hexiang Hu, Fei Sha

In countries that speak multiple main languages, mixing up different languages within a conversation is commonly called code-switching. Previous works addressing this challenge mainly focused on word-level aspects such as word embeddings.…

Computation and Language · Computer Science 2019-09-19 Genta Indra Winata, Zhaojiang Lin, Jamin Shin, Zihan Liu, Pascale Fung

Massively multilingual pretrained transformers (MMTs) have tremendously pushed the state of the art on multilingual NLP and cross-lingual transfer of NLP models in particular. While a large body of work leveraged MMTs to mine parallel data…

Computation and Language · Computer Science 2023-05-12 Onur Galoğlu, Robert Litschko, Goran Glavaš

Efficient distributed numerical word representation models (word embeddings) combined with modern machine learning algorithms have recently yielded considerable improvement on automatic document classification tasks. However, the…

Computation and Language · Computer Science 2018-09-07 Roger A. Stein, Patricia A. Jaques, Joao F. Valiati

Deep language models learning a hierarchical representation proved to be a powerful tool for natural language processing, text mining and information retrieval. However, representations that perform well for retrieval must capture semantic…

Information Retrieval · Computer Science 2019-05-24 Tolgahan Cakaloglu, Xiaowei Xu

We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus (e.g. a few hundred sentence pairs). Our method obtains word embeddings via an LSTM encoder-decoder model that…

Computation and Language · Computer Science 2021-10-22 Takashi Wada, Tomoharu Iwata, Yuji Matsumoto, Timothy Baldwin, Jey Han Lau

We propose a new model for learning bilingual word representations from non-parallel document-aligned data. Following the recent advances in word representation learning, our model learns dense real-valued word vectors, that is, bilingual…

Computation and Language · Computer Science 2016-03-01 Ivan Vulić, Marie-Francine Moens

Hierarchical neural architectures are often used to capture long-distance dependencies and have been applied to many document-level tasks such as summarization, document segmentation, and sentiment analysis. However, effective usage of such…

Computation and Language · Computer Science 2019-01-29 Ming-Wei Chang, Kristina Toutanova, Kenton Lee, Jacob Devlin

Sentence embedding tasks are important in natural language processing (NLP), but improving their performance while keeping them reliable is still hard. This paper presents a framework that combines pseudo-label generation and model ensemble…

Computation and Language · Computer Science 2025-01-28 Ziwei Liu, Qi Zhang, Lifu Gao

In retrieval applications, binary hashes are known to offer significant improvements in terms of both memory and speed. We investigate the compression of sentence embeddings using a neural encoder-decoder architecture, which is trained by…

Information Retrieval · Computer Science 2019-08-16 Felix Hamann, Nadja Kurz, Adrian Ulges

The importance of qualitative parallel data in machine translation has long been determined but it has always been very difficult to obtain such in sufficient quantity for the majority of world languages, mainly because of the associated…

In recent times, word embeddings have taken on a significant role in sentiment analysis. As the generation of word embeddings needs huge corpora, many applications use pretrained embeddings. In spite of the success, word embeddings suffer…

Computation and Language · Computer Science 2020-06-03 Satanik Mitra, Mamata Jenamani