Related papers: BinaryAlign: Word Alignment as Binary Sequence Lab…

SilverAlign: MT-Based Silver Data Algorithm For Evaluating Word Alignment

Word alignments are essential for a variety of NLP tasks. Therefore, choosing the best approaches for their creation is crucial. However, the scarce availability of gold evaluation data makes the choice difficult. We propose SilverAlign, a…

Computation and Language · Computer Science 2023-03-29 Abdullatif Köksal, Silvia Severini, Hinrich Schütze

MirrorAlign: A Super Lightweight Unsupervised Word Alignment Model via Cross-Lingual Contrastive Learning

Word alignment is essential for the downstream cross-lingual language understanding and generation tasks. Recently, the performance of the neural word alignment models has exceeded that of statistical models. However, they heavily rely on…

Computation and Language · Computer Science 2022-05-11 Di Wu, Liang Ding, Shuo Yang, Mingyang Li

SentAlign: Accurate and Scalable Sentence Alignment

We present SentAlign, an accurate sentence alignment tool designed to handle very large parallel document pairs. Given user-defined parameters, the alignment algorithm evaluates all possible alignment paths in fairly large documents of…

Computation and Language · Computer Science 2023-11-16 Steinþór Steingrímsson, Hrafn Loftsson, Andy Way

SpeechAlign: a Framework for Speech Translation Alignment Evaluation

Speech-to-Speech and Speech-to-Text translation are currently dynamic areas of research. In our commitment to advance these fields, we present SpeechAlign, a framework designed to evaluate the underexplored field of source-target alignment…

Computation and Language · Computer Science 2024-04-26 Belen Alastruey, Aleix Sant, Gerard I. Gállego, David Dale, Marta R. Costa-jussà

PreAlign: Boosting Cross-Lingual Transfer by Early Establishment of Multilingual Alignment

Large language models demonstrate reasonable multilingual abilities, despite predominantly English-centric pretraining. However, the spontaneous multilingual alignment in these models is shown to be weak, leading to unsatisfactory…

Computation and Language · Computer Science 2024-11-19 Jiahuan Li, Shujian Huang, Aarron Ching, Xinyu Dai, Jiajun Chen

Adaptative Bilingual Aligning Using Multilingual Sentence Embedding

In this paper, we present an adaptive bitextual alignment system called AIlign. This aligner relies on sentence embeddings to extract reliable anchor points that can guide the alignment path, even for texts whose parallelism is fragmentary…

Computation and Language · Computer Science 2024-03-19 Olivier Kraif

Cross-Align: Modeling Deep Cross-lingual Interactions for Word Alignment

Word alignment, which aims to extract lexicon translation equivalents between source and target sentences, serves as a fundamental tool for natural language processing. Recent studies in this area have yielded substantial improvements by…

Computation and Language · Computer Science 2022-10-11 Siyu Lai, Zhen Yang, Fandong Meng, Yufeng Chen, Jinan Xu, Jie Zhou

A Model for Fine-Grained Alignment of Multilingual Texts

While alignment of texts on the sentential level is often seen as being too coarse, and word alignment as being too fine-grained, bi- or multilingual texts which are aligned on a level in-between are a useful resource for many purposes.…

Computation and Language · Computer Science 2007-05-23 Lea Cyrus, Hendrik Feddes

Word Alignment by Fine-tuning Embeddings on Parallel Corpora

Word alignment over parallel corpora has a wide variety of applications, including learning translation lexicons, cross-lingual transfer of language processing tools, and automatic evaluation or analysis of translation outputs. The great…

Computation and Language · Computer Science 2021-08-13 Zi-Yi Dou, Graham Neubig

Mask-Align: Self-Supervised Neural Word Alignment

Word alignment, which aims to align translationally equivalent words between source and target sentences, plays an important role in many natural language processing tasks. Current unsupervised neural alignment methods focus on inducing…

Computation and Language · Computer Science 2021-05-18 Chi Chen, Maosong Sun, Yang Liu

A Supervised Word Alignment Method based on Cross-Language Span Prediction using Multilingual BERT

We present a novel supervised word alignment method based on cross-language span prediction. We first formalize a word alignment problem as a collection of independent predictions from a token in the source sentence to a span in the target…

Computation and Language · Computer Science 2020-05-01 Masaaki Nagata, Chousa Katsuki, Masaaki Nishino

Bilingual Lexicon Induction via Unsupervised Bitext Construction and Word Alignment

Bilingual lexicons map words in one language to their translations in another, and are typically induced by learning linear projections to align monolingual word embedding spaces. In this paper, we show it is possible to produce much higher…

Computation and Language · Computer Science 2021-06-15 Haoyue Shi, Luke Zettlemoyer, Sida I. Wang

InstructAlign: High-and-Low Resource Language Alignment via Continual Crosslingual Instruction Tuning

Large language models (LLMs) that are tuned with instructions have demonstrated remarkable capabilities in various tasks and languages. However, their ability to generalize to underrepresented languages is limited due to the scarcity of…

Computation and Language · Computer Science 2023-10-25 Samuel Cahyawijaya, Holy Lovenia, Tiezheng Yu, Willy Chung, Pascale Fung

Bilingual alignment transfers to multilingual alignment for unsupervised parallel text mining

This work presents methods for learning cross-lingual sentence representations using paired or unpaired bilingual texts. We hypothesize that the cross-lingual alignment strategy is transferable, and therefore a model trained to align only…

Computation and Language · Computer Science 2022-03-17 Chih-chan Tien, Shane Steinert-Threlkeld

TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification

The crux of learning vision-language models is to extract semantically aligned information from visual and linguistic data. Existing attempts usually face the problem of coarse alignment, e.g., the vision encoder struggles in localizing an…

Computer Vision and Pattern Recognition · Computer Science 2024-03-27 Qinying Liu, Wei Wu, Kecheng Zheng, Zhan Tong, Jiawei Liu, Yu Liu, Wei Chen, Zilei Wang, Yujun Shen

Multilevel Text Alignment with Cross-Document Attention

Text alignment finds application in tasks such as citation recommendation and plagiarism detection. Existing alignment methods operate at a single, predefined level and cannot learn to align texts at, for example, sentence and document…

Computation and Language · Computer Science 2020-10-06 Xuhui Zhou, Nikolaos Pappas, Noah A. Smith

PairAlign: A Framework for Sequence Tokenization via Self-Alignment with Applications to Audio Tokenization

Many operations on sensory data -- comparison, memory, retrieval, and reasoning -- are naturally expressed over discrete symbolic structures. In language this interface is given by tokens; in audio, it must be learned. Existing audio…

Machine Learning · Computer Science 2026-05-08 Adhiraj Banerjee, Vipul Arora

WSPAlign: Word Alignment Pre-training via Large-Scale Weakly Supervised Span Prediction

Most existing word alignment methods rely on manual alignment datasets or parallel corpora, which limits their usefulness. Here, to mitigate the dependence on manual data, we broaden the source of supervision by relaxing the requirement for…

Computation and Language · Computer Science 2023-10-20 Qiyu Wu, Masaaki Nagata, Yoshimasa Tsuruoka

SimAlign: High Quality Word Alignments without Parallel Training Data using Static and Contextualized Embeddings

Word alignments are useful for tasks like statistical and neural machine translation (NMT) and cross-lingual annotation projection. Statistical word aligners perform well, as do methods that extract alignments jointly with translations in…

Computation and Language · Computer Science 2021-04-19 Masoud Jalili Sabet, Philipp Dufter, François Yvon, Hinrich Schütze

Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment

The cross-lingual language models are typically pretrained with masked language modeling on multilingual text or parallel sentences. In this paper, we introduce denoising word alignment as a new cross-lingual pre-training task.…

Computation and Language · Computer Science 2021-09-14 Zewen Chi, Li Dong, Bo Zheng, Shaohan Huang, Xian-Ling Mao, Heyan Huang, Furu Wei