Related papers: SilverAlign: MT-Based Silver Data Algorithm For Ev…

BinaryAlign: Word Alignment as Binary Sequence Labeling

Real-world deployments of word alignment are almost certain to cover both high- and low-resource languages. However, the state of the art for this task recommends a different model class depending on the availability of gold alignment…

Computation and Language · Computer Science · 2024-07-19 · Gaetan Lopez Latouche, Marc-André Carbonneau, Ben Swanson

Smelting Gold and Silver for Improved Multilingual AMR-to-Text Generation

Recent work on multilingual AMR-to-text generation has exclusively focused on data augmentation strategies that utilize silver AMR. However, this assumes a high quality of generated AMRs, potentially limiting the transferability to the…

Computation and Language · Computer Science · 2021-09-09 · Leonardo F. R. Ribeiro, Jonas Pfeiffer, Yue Zhang, Iryna Gurevych

Subword Sampling for Low Resource Word Alignment

Annotation projection is an important area in NLP that can greatly contribute to creating language resources for low-resource languages. Word alignment plays a key role in this setting. However, most of the existing word alignment methods…

Computation and Language · Computer Science · 2021-06-17 · Ehsaneddin Asgari, Masoud Jalili Sabet, Philipp Dufter, Christopher Ringlstetter, Hinrich Schütze

SpeechAlign: a Framework for Speech Translation Alignment Evaluation

Speech-to-Speech and Speech-to-Text translation are currently dynamic areas of research. In our commitment to advance these fields, we present SpeechAlign, a framework designed to evaluate the underexplored field of source-target alignment…

Computation and Language · Computer Science · 2024-04-26 · Belen Alastruey, Aleix Sant, Gerard I. Gállego, David Dale, Marta R. Costa-jussà

SimAlign: High Quality Word Alignments without Parallel Training Data using Static and Contextualized Embeddings

Word alignments are useful for tasks like statistical and neural machine translation (NMT) and cross-lingual annotation projection. Statistical word aligners perform well, as do methods that extract alignments jointly with translations in…

Computation and Language · Computer Science · 2021-04-19 · Masoud Jalili Sabet, Philipp Dufter, François Yvon, Hinrich Schütze

Leveraging Neural Machine Translation for Word Alignment

The most common tools for word-alignment rely on a large amount of parallel sentences, which are then usually processed according to one of the IBM model algorithms. The training data is, however, the same as for machine translation (MT)…

Computation and Language · Computer Science · 2021-04-01 · Vilém Zouhar, Daria Pylypenko

SentAlign: Accurate and Scalable Sentence Alignment

We present SentAlign, an accurate sentence alignment tool designed to handle very large parallel document pairs. Given user-defined parameters, the alignment algorithm evaluates all possible alignment paths in fairly large documents of…

Computation and Language · Computer Science · 2023-11-16 · Steinþór Steingrímsson, Hrafn Loftsson, Andy Way

WSPAlign: Word Alignment Pre-training via Large-Scale Weakly Supervised Span Prediction

Most existing word alignment methods rely on manual alignment datasets or parallel corpora, which limits their usefulness. Here, to mitigate the dependence on manual data, we broaden the source of supervision by relaxing the requirement for…

Computation and Language · Computer Science · 2023-10-20 · Qiyu Wu, Masaaki Nagata, Yoshimasa Tsuruoka

A Sentence Meaning Based Alignment Method for Parallel Text Corpora Preparation

Text alignment is crucial to the accuracy of Machine Translation (MT) systems, some NLP tools or any other text processing tasks requiring bilingual data. This research proposes a language independent sentence alignment approach based on…

Computation and Language · Computer Science · 2015-10-01 · Krzysztof Wołk, Krzysztof Marasek

From scratch to silver: Creating trustworthy training data for patent-SDG classification using Large Language Models

Classifying patents by their relevance to the UN Sustainable Development Goals (SDGs) is crucial for tracking how innovation addresses global challenges. However, the absence of a large, labeled dataset limits the use of supervised…

Computation and Language · Computer Science · 2025-09-12 · Grazia Sveva Ascione, Nicolò Tamagnone

A Supervised Word Alignment Method based on Cross-Language Span Prediction using Multilingual BERT

We present a novel supervised word alignment method based on cross-language span prediction. We first formalize a word alignment problem as a collection of independent predictions from a token in the source sentence to a span in the target…

Computation and Language · Computer Science · 2020-05-01 · Masaaki Nagata, Chousa Katsuki, Masaaki Nishino

MirrorAlign: A Super Lightweight Unsupervised Word Alignment Model via Cross-Lingual Contrastive Learning

Word alignment is essential for the downstream cross-lingual language understanding and generation tasks. Recently, the performance of the neural word alignment models has exceeded that of statistical models. However, they heavily rely on…

Computation and Language · Computer Science · 2022-05-11 · Di Wu, Liang Ding, Shuo Yang, Mingyang Li

Investigating Text Simplification Evaluation

Modern text simplification (TS) heavily relies on the availability of gold standard data to build machine learning models. However, existing studies show that parallel TS corpora contain inaccurate simplifications and incorrect alignments.…

Computation and Language · Computer Science · 2021-07-30 · Laura Vásquez-Rodríguez, Matthew Shardlow, Piotr Przybyła, Sophia Ananiadou

Neural Network-based Word Alignment through Score Aggregation

We present a simple neural network for word alignment that builds source and target word window representations to compute alignment scores for sentence pairs. To enable unsupervised training, we use an aggregation operation that summarizes…

Computation and Language · Computer Science · 2016-07-01 · Joel Legrand, Michael Auli, Ronan Collobert

SpecAlign: A Semantic Alignment Framework for SystemVerilog Assertion Generation

Existing Large Language Model (LLM) approaches to SystemVerilog Assertion (SVA) generation primarily focus on syntactic validity and formal verification outcomes, while semantic alignment between generated assertions and natural language…

Artificial Intelligence · Computer Science · 2026-05-26 · Jaime Rafael Imperial, Hao Zheng

Reformatted Alignment

The quality of finetuning data is crucial for aligning large language models (LLMs) with human values. Current methods to improve data quality are either labor-intensive or prone to factual errors caused by LLM hallucinations. This paper…

Computation and Language · Computer Science · 2024-04-18 · Run-Ze Fan, Xuefeng Li, Haoyang Zou, Junlong Li, Shwai He, Ethan Chern, Jiewen Hu, Pengfei Liu

Text Alignment Is An Efficient Unified Model for Massive NLP Tasks

Large language models (LLMs), typically designed as a function of next-word prediction, have excelled across extensive NLP tasks. Despite the generality, next-word prediction is often not an efficient formulation for many of the tasks,…

Computation and Language · Computer Science · 2023-11-03 · Yuheng Zha, Yichi Yang, Ruichen Li, Zhiting Hu

EVALALIGN: Supervised Fine-Tuning Multimodal LLMs with Human-Aligned Data for Evaluating Text-to-Image Models

The recent advancements in text-to-image generative models have been remarkable. Yet, the field suffers from a lack of evaluation metrics that accurately reflect the performance of these models, particularly lacking fine-grained metrics…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-11 · Zhiyu Tan, Xiaomeng Yang, Luozheng Qin, Mengping Yang, Cheng Zhang, Hao Li

Learning with Silver Standard Data for Zero-shot Relation Extraction

The superior performance of supervised relation extraction (RE) methods heavily relies on a large amount of gold standard data. Recent zero-shot relation extraction methods converted the RE task to other NLP tasks and used off-the-shelf…

Computation and Language · Computer Science · 2024-03-26 · Tianyin Wang, Jianwei Wang, Ziqian Zeng

Mask-Align: Self-Supervised Neural Word Alignment

Word alignment, which aims to align translationally equivalent words between source and target sentences, plays an important role in many natural language processing tasks. Current unsupervised neural alignment methods focus on inducing…

Computation and Language · Computer Science · 2021-05-18 · Chi Chen, Maosong Sun, Yang Liu