Related papers: ABL: Alignment-Based Learning

Bootstrapping Syntax and Recursion using Alignment-Based Learning

This paper introduces a new type of unsupervised learning algorithm, based on the alignment of sentences and Harris's (1951) notion of interchangeability. The algorithm is applied to an untagged, unstructured corpus of natural language…

Machine Learning · Computer Science 2009-09-25 Menno van Zaanen

Bootstrapping Structure using Similarity

In this paper a new similarity-based learning algorithm, inspired by string edit-distance (Wagner and Fischer, 1974), is applied to the problem of bootstrapping structure from scratch. The algorithm takes a corpus of unannotated sentences…

Machine Learning · Computer Science 2007-05-23 Menno van Zaanen

Bootstrapping Structure into Language: Alignment-Based Learning

This thesis introduces a new unsupervised learning framework, called Alignment-Based Learning, which is based on the alignment of sentences and Harris's (1951) notion of substitutability. Instances of the framework can be applied to an…

Machine Learning · Computer Science 2007-05-23 Menno M. van Zaanen

An Algorithm for Aligning Sentences in Bilingual Corpora Using Lexical Information

In this paper we describe an algorithm for aligning sentences with their translations in a bilingual corpus using lexical information of the languages. Existing efficient algorithms ignore word identities and consider only the sentence…

Computation and Language · Computer Science 2007-05-23 Akshar Bharati, V. Sriram, A. Vamshi Krishna, Rajeev Sangal, S. M. Bendre

Adaptative Bilingual Aligning Using Multilingual Sentence Embedding

In this paper, we present an adaptive bitextual alignment system called AIlign. This aligner relies on sentence embeddings to extract reliable anchor points that can guide the alignment path, even for texts whose parallelism is fragmentary…

Computation and Language · Computer Science 2024-03-19 Olivier Kraif

A Strong Baseline for Learning Cross-Lingual Word Embeddings from Sentence Alignments

While cross-lingual word embeddings have been studied extensively in recent years, the qualitative differences between the different algorithms remain vague. We observe that whether or not an algorithm uses a particular feature set…

Computation and Language · Computer Science 2017-01-11 Omer Levy, Anders Søgaard, Yoav Goldberg

Closed Form Word Embedding Alignment

We develop a family of techniques to align word embeddings which are derived from different source datasets or created using different mechanisms (e.g., GloVe or word2vec). Our methods are simple and have a closed form to optimally rotate,…

Computation and Language · Computer Science 2020-11-19 Sunipa Dev, Safia Hassan, Jeff M. Phillips

Align and Shine: Building High-Quality Sentence-Aligned Corpora for Multilingual Text Simplification

Text simplification plays a crucial role in improving the accessibility and comprehensibility of written information for diverse audiences, including language learners and readers with limited literacy. Despite its importance, large-scale,…

Computation and Language · Computer Science 2026-05-12 Kenji Hilasaca, Nouran Khallaf, Serge Sharoff

Learning Translation Rules From A Bilingual Corpus

This paper proposes a mechanism for learning pattern correspondences between two languages from a corpus of translated sentence pairs. The proposed mechanism uses analogical reasoning between two translations. Given a pair of translations,…

cmp-lg · Computer Science 2008-02-03 Ilyas Cicekli, H. Altay Guvenir

Neural Baselines for Word Alignment

Word alignments identify translational correspondences between words in a parallel sentence pair and are used, for instance, to learn bilingual dictionaries, to train statistical machine translation systems, or to perform quality…

Computation and Language · Computer Science 2020-09-29 Anh Khoa Ngo Ho, François Yvon

Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces

Recent research has shown that word embedding spaces learned from text corpora of different languages can be aligned without any parallel data supervision. Inspired by the success in unsupervised cross-lingual word embeddings, in this paper…

Computation and Language · Computer Science 2018-09-24 Yu-An Chung, Wei-Hung Weng, Schrasing Tong, James Glass

Simple, Interpretable and Stable Method for Detecting Words with Usage Change across Corpora

The problem of comparing two bodies of text and searching for words that differ in their usage between them arises often in digital humanities and computational social science. This is commonly approached by training word embeddings on each…

Computation and Language · Computer Science 2021-12-30 Hila Gonen, Ganesh Jawahar, Djamé Seddah, Yoav Goldberg

A statistical learning algorithm for word segmentation

In natural speech, the speaker does not pause between words, yet a human listener somehow perceives this continuous stream of phonemes as a series of distinct words. The detection of boundaries between spoken words is an instance of a…

Computation and Language · Computer Science 2011-06-28 Jerry R. Van Aken

SimCSE: Simple Contrastive Learning of Sentence Embeddings

This paper presents SimCSE, a simple contrastive learning framework that greatly advances state-of-the-art sentence embeddings. We first describe an unsupervised approach, which takes an input sentence and predicts itself in a contrastive…

Computation and Language · Computer Science 2022-05-19 Tianyu Gao, Xingcheng Yao, Danqi Chen

Explanation-Based Learning of Data Oriented Parsing

This paper presents a new view of Explanation-Based Learning (EBL) of natural language parsing. Rather than employing EBL for specializing parsers by inferring new ones, this paper suggests employing EBL for learning how to reduce ambiguity…

cmp-lg · Computer Science 2008-02-03 Khalil Sima'an

When Text and Images Don't Mix: Bias-Correcting Language-Image Similarity Scores for Anomaly Detection

Contrastive Language-Image Pre-training (CLIP) achieves remarkable performance in various downstream tasks through the alignment of image and text input embeddings and holds great promise for anomaly detection. However, our empirical…

Computer Vision and Pattern Recognition · Computer Science 2024-07-25 Adam Goodge, Bryan Hooi, Wee Siong Ng

Massively Multilingual Document Alignment with Cross-lingual Sentence-Mover's Distance

Document alignment aims to identify pairs of documents in two distinct languages that are of comparable content or translations of each other. Such aligned data can be used for a variety of NLP tasks from training cross-lingual…

Computation and Language · Computer Science 2020-10-13 Ahmed El-Kishky, Francisco Guzmán

Vicinity-Driven Paragraph and Sentence Alignment for Comparable Corpora

Parallel corpora have driven great progress in the field of Text Simplification. However, most sentence alignment algorithms either offer a limited range of alignment types supported, or simply ignore valuable clues present in comparable…

Computation and Language · Computer Science 2016-12-14 Gustavo Henrique Paetzold, Lucia Specia

Hybrid Alignment Training for Large Language Models

Alignment training is crucial for enabling large language models (LLMs) to cater to human intentions and preferences. It is typically performed based on two stages with different objectives: instruction-following alignment and…

Computation and Language · Computer Science 2024-06-24 Chenglong Wang, Hang Zhou, Kaiyan Chang, Bei Li, Yongyu Mu, Tong Xiao, Tongran Liu, Jingbo Zhu

Word Alignment by Fine-tuning Embeddings on Parallel Corpora

Word alignment over parallel corpora has a wide variety of applications, including learning translation lexicons, cross-lingual transfer of language processing tools, and automatic evaluation or analysis of translation outputs. The great…

Computation and Language · Computer Science 2021-08-13 Zi-Yi Dou, Graham Neubig