Related papers: Bootstrapping Structure using Similarity

Bootstrapping Syntax and Recursion using Alignment-Based Learning

This paper introduces a new type of unsupervised learning algorithm, based on the alignment of sentences and Harris's (1951) notion of interchangeability. The algorithm is applied to an untagged, unstructured corpus of natural language…

Machine Learning · Computer Science 2009-09-25 Menno van Zaanen

ABL: Alignment-Based Learning

This paper introduces a new type of grammar learning algorithm, inspired by string edit distance (Wagner and Fischer, 1974). The algorithm takes a corpus of flat sentences as input and returns a corpus of labelled, bracketed sentences. The…

Machine Learning · Computer Science 2007-05-23 Menno van Zaanen

Bootstrapping Structure into Language: Alignment-Based Learning

This thesis introduces a new unsupervised learning framework, called Alignment-Based Learning, which is based on the alignment of sentences and Harris's (1951) notion of substitutability. Instances of the framework can be applied to an…

Machine Learning · Computer Science 2007-05-23 Menno M. van Zaanen

Sentence Structure and Word Relationship Modeling for Emphasis Selection

Emphasis Selection is a newly proposed task which focuses on choosing words for emphasis in short sentences. Traditional methods only consider the sequence information of a sentence while ignoring the rich sentence structure and word…

Computation and Language · Computer Science 2021-08-31 Haoran Yang, Wai Lam

Learning string edit distance

In many applications, it is necessary to determine the similarity of two strings. A widely-used notion of string similarity is the edit distance: the minimum number of insertions, deletions, and substitutions required to transform one…

cmp-lg · Computer Science 2008-02-03 Eric Sven Ristad, Peter N. Yianilos

Structural-Aware Sentence Similarity with Recursive Optimal Transport

Measuring sentence similarity is a classic topic in natural language processing. Light-weighted similarities are still of particular practical significance even when deep learning models have succeeded in many other tasks. Some…

Computation and Language · Computer Science 2020-02-04 Zihao Wang, Yong Zhang, Hao Wu

Simple, Interpretable and Stable Method for Detecting Words with Usage Change across Corpora

The problem of comparing two bodies of text and searching for words that differ in their usage between them arises often in digital humanities and computational social science. This is commonly approached by training word embeddings on each…

Computation and Language · Computer Science 2021-12-30 Hila Gonen, Ganesh Jawahar, Djamé Seddah, Yoav Goldberg

An Algorithm for Aligning Sentences in Bilingual Corpora Using Lexical Information

In this paper we describe an algorithm for aligning sentences with their translations in a bilingual corpus using lexical information of the languages. Existing efficient algorithms ignore word identities and consider only the sentence…

Computation and Language · Computer Science 2007-05-23 Akshar Bharati, V. Sriram, A. Vamshi Krishna, Rajeev Sangal, S. M. Bendre

BiSECT: Learning to Split and Rephrase Sentences with Bitexts

An important task in NLP applications such as sentence simplification is the ability to take a long, complex sentence and split it into shorter sentences, rephrasing as necessary. We introduce a novel dataset and a new model for this `split…

Computation and Language · Computer Science 2021-09-13 Joongwon Kim, Mounica Maddela, Reno Kriz, Wei Xu, Chris Callison-Burch

Learning Translation Rules From A Bilingual Corpus

This paper proposes a mechanism for learning pattern correspondences between two languages from a corpus of translated sentence pairs. The proposed mechanism uses analogical reasoning between two translations. Given a pair of translations,…

cmp-lg · Computer Science 2008-02-03 Ilyas Cicekli, H. Altay Guvenir

A New Sentence Ordering Method Using BERT Pretrained Model

Building systems with capability of natural language understanding (NLU) has been one of the oldest areas of AI. An essential component of NLU is to detect logical succession of events contained in a text. The task of sentence ordering is…

Computation and Language · Computer Science 2021-08-30 Melika Golestani, Seyedeh Zahra Razavi, Heshaam Faili

Measuring Sentences Similarity: A Survey

This study is to review the approaches used for measuring sentences similarity. Measuring similarity between natural language sentences is a crucial task for many Natural Language Processing applications such as text classification,…

Computation and Language · Computer Science 2019-10-10 Mamdouh Farouk

Co-training an Unsupervised Constituency Parser with Weak Supervision

We introduce a method for unsupervised parsing that relies on bootstrapping classifiers to identify if a node dominates a specific span in a sentence. There are two types of classifiers, an inside classifier that acts on a span, and an…

Computation and Language · Computer Science 2022-03-22 Nickil Maveli, Shay B. Cohen

Unsupervised discovery of morphologically related words based on orthographic and semantic similarity

We present an algorithm that takes an unannotated corpus as its input, and returns a ranked list of probable morphologically related pairs as its output. The algorithm tries to discover morphologically related pairs by looking for pairs…

Computation and Language · Computer Science 2007-05-23 Marco Baroni, Johannes Matiasek, Harald Trost

Bootstrapped Adaptive Threshold Selection for Statistical Model Selection and Estimation

A central goal of neuroscience is to understand how activity in the nervous system is related to features of the external world, or to features of the nervous system itself. A common approach is to model neural responses as a weighted…

Machine Learning · Statistics 2015-05-14 Kristofer E. Bouchard

A Strong Baseline for Learning Cross-Lingual Word Embeddings from Sentence Alignments

While cross-lingual word embeddings have been studied extensively in recent years, the qualitative differences between the different algorithms remain vague. We observe that whether or not an algorithm uses a particular feature set…

Computation and Language · Computer Science 2017-01-11 Omer Levy, Anders Søgaard, Yoav Goldberg

Quootstrap: Scalable Unsupervised Extraction of Quotation-Speaker Pairs from Large News Corpora via Bootstrapping

We propose Quootstrap, a method for extracting quotations, as well as the names of the speakers who uttered them, from large news corpora. Whereas prior work has addressed this problem primarily with supervised machine learning, our…

Social and Information Networks · Computer Science 2018-04-10 Dario Pavllo, Tiziano Piccardi, Robert West

Learning Bilingual Sentence Embeddings via Autoencoding and Computing Similarities with a Multilayer Perceptron

We propose a novel model architecture and training algorithm to learn bilingual sentence embeddings from a combination of parallel and monolingual data. Our method connects autoencoding and neural machine translation to force the source and…

Computation and Language · Computer Science 2019-06-06 Yunsu Kim, Hendrik Rosendahl, Nick Rossenbach, Jan Rosendahl, Shahram Khadivi, Hermann Ney

AspectCSE: Sentence Embeddings for Aspect-based Semantic Textual Similarity Using Contrastive Learning and Structured Knowledge

Generic sentence embeddings provide a coarse-grained approximation of semantic textual similarity but ignore specific aspects that make texts similar. Conversely, aspect-based sentence embeddings provide similarities between texts based on…

Computation and Language · Computer Science 2023-09-26 Tim Schopf, Emanuel Gerber, Malte Ostendorff, Florian Matthes

STRASS: A Light and Effective Method for Extractive Summarization Based on Sentence Embeddings

This paper introduces STRASS: Summarization by TRAnsformation Selection and Scoring. It is an extractive text summarization method which leverages the semantic information in existing sentence embedding spaces. Our method creates an…

Computation and Language · Computer Science 2019-07-18 Léo Bouscarrat, Antoine Bonnefoy, Thomas Peel, Cécile Pereira