Related papers: Exploiting Sentence Order in Document Alignment

200 papers

Multilingual sentence representations pose a great advantage for low-resource languages that do not have enough data to build monolingual models on their own. These multilingual sentence representations have been separately exploited by few…

Computation and Language · Computer Science 2021-06-15 Dilan Sachintha, Lakmali Piyarathna, Charith Rajitha, Surangika Ranathunga

Document alignment aims to identify pairs of documents in two distinct languages that are of comparable content or translations of each other. Such aligned data can be used for a variety of NLP tasks from training cross-lingual…

Computation and Language · Computer Science 2020-10-13 Ahmed El-Kishky, Francisco Guzmán

Document alignment is necessary for the hierarchical mining (Bañón et al., 2020; Morishita et al., 2022), which aligns documents across source and target languages within the same web domain. Several high precision sentence…

Computation and Language · Computer Science 2025-10-20 Xiaotian Wang, Takehito Utsuro, Masaaki Nagata

Document-level neural machine translation (NMT) has outperformed sentence-level NMT on a number of datasets. However, document-level NMT is still not widely adopted in real-world translation systems mainly due to the lack of large-scale…

Computation and Language · Computer Science 2023-04-21 Yusser Al Ghussin, Jingyi Zhang, Josef van Genabith

Learning semantically meaningful sentence embeddings is an open problem in natural language processing. In this work, we propose a sentence embedding learning approach that exploits both visual and textual information via a multimodal…

Computation and Language · Computer Science 2022-04-26 Miaoran Zhang, Marius Mosbach, David Ifeoluwa Adelani, Michael A. Hedderich, Dietrich Klakow

In this paper, we introduce a divide-and-conquer algorithm to improve sentence alignment speed. We utilize external bilingual sentence embeddings to find accurate hard delimiters for the parallel texts to be aligned. We use Monte Carlo…

Computation and Language · Computer Science 2022-01-19 Wu Zhang

In this paper, we present an approach to learn multilingual sentence embeddings using a bi-directional dual-encoder with additive margin softmax. The embeddings are able to achieve state-of-the-art results on the United Nations (UN)…

Computation and Language · Computer Science 2019-06-18 Yinfei Yang, Gustavo Hernandez Abrego, Steve Yuan, Mandy Guo, Qinlan Shen, Daniel Cer, Yun-hsuan Sung, Brian Strope, Ray Kurzweil

Text alignment is crucial to the accuracy of Machine Translation (MT) systems, some NLP tools or any other text processing tasks requiring bilingual data. This research proposes a language independent sentence alignment approach based on…

Computation and Language · Computer Science 2015-10-01 Krzysztof Wołk, Krzysztof Marasek

Discovering the logical sequence of events is one of the cornerstones in Natural Language Understanding. One approach to learn the sequence of events is to study the order of sentences in a coherent text. Sentence ordering can be applied in…

Computation and Language · Computer Science 2021-08-26 Melika Golestani, Seyedeh Zahra Razavi, Zeinab Borhanifard, Farnaz Tahmasebian, Hesham Faili

We present SentAlign, an accurate sentence alignment tool designed to handle very large parallel document pairs. Given user-defined parameters, the alignment algorithm evaluates all possible alignment paths in fairly large documents of…

Computation and Language · Computer Science 2023-11-16 Steinþór Steingrímsson, Hrafn Loftsson, Andy Way

This work improves monolingual sentence alignment for text simplification, specifically for text in standard and simple Wikipedia. We introduce a convolutional neural network structure to model similarity between two sentences. Due to the…

Computation and Language · Computer Science 2018-09-25 Yonghui Huang, Yunhui Li, Yi Luan

Machine translation is highly sensitive to the size and quality of the training data, which has led to an increasing interest in collecting and filtering large parallel corpora. In this paper, we propose a new method for this task based on…

Computation and Language · Computer Science 2021-12-28 Mikel Artetxe, Holger Schwenk

Word alignment over parallel corpora has a wide variety of applications, including learning translation lexicons, cross-lingual transfer of language processing tools, and automatic evaluation or analysis of translation outputs. The great…

Computation and Language · Computer Science 2021-08-13 Zi-Yi Dou, Graham Neubig

The success of a text simplification system heavily depends on the quality and quantity of complex-simple sentence pairs in the training corpus, which are extracted by aligning sentences between parallel articles. To evaluate and improve…

Computation and Language · Computer Science 2021-09-01 Chao Jiang, Mounica Maddela, Wuwei Lan, Yang Zhong, Wei Xu

Objective: Today's neural machine translation (NMT) can achieve near human-level translation quality and greatly facilitates international communications, but the lack of parallel corpora poses a key problem to the development of…

Computation and Language · Computer Science 2022-02-08 Shengxuan Luo, Huaiyuan Ying, Jiao Li, Sheng Yu

There are two main approaches to recent extractive summarization: the sentence-level framework, which selects sentences to include in a summary individually, and the summary-level framework, which generates multiple candidate summaries and…

Computation and Language · Computer Science 2025-02-25 Taewan Kwon, Sangyong Lee

Unsupervised extractive summarization aims to extract salient sentences from a document as the summary without labeled data. Recent work mostly studies how to leverage sentence similarity to rank sentences in order of salience.…

Computation and Language · Computer Science 2023-02-27 Shichao Sun, Ruifeng Yuan, Wenjie Li, Sujian Li

High-quality parallel corpora are essential for Machine Translation (MT) research and translation teaching. However, Arabic-English resources remain scarce and existing datasets mainly consist of simple one-to-one mappings. In this paper,…

Computation and Language · Computer Science 2026-01-05 Baorong Huang, Ali Asiri

Annotation projection is an important area in NLP that can greatly contribute to creating language resources for low-resource languages. Word alignment plays a key role in this setting. However, most of the existing word alignment methods…

Computation and Language · Computer Science 2021-06-17 Ehsaneddin Asgari, Masoud Jalili Sabet, Philipp Dufter, Christopher Ringlstetter, Hinrich Schütze

Sentence scoring aims at measuring the likelihood score of a sentence and is widely used in many natural language processing scenarios, like reranking, which is to select the best sentence from multiple candidates. Previous works on…

Computation and Language · Computer Science 2022-10-20 Kaitao Song, Yichong Leng, Xu Tan, Yicheng Zou, Tao Qin, Dongsheng Li