Related papers: Determining the Unithood of Word Sequences using a…

200 papers

Most works related to unithood were conducted as part of a larger effort for the determination of termhood. Consequently, the number of independent studies that examine the notion of unithood and produce dedicated techniques for measuring…

Artificial Intelligence · Computer Science 2008-10-02 Wilson Wong, Wei Liu, Mohammed Bennamoun

Word embedding, especially with its recent developments, promises a quantification of the similarity between terms. However, it is not clear to what extent this similarity value can be genuinely meaningful and useful for subsequent tasks.…

Computation and Language · Computer Science 2018-04-05 Navid Rekabsaz, Mihai Lupu, Allan Hanbury

In this paper I propose a new way of measuring linguistic productivity that objectively assesses the ability of an affix to be used to coin new complex words and, unlike other popular measures, is not directly dependent upon token…

Computation and Language · Computer Science 2023-08-25 Sergei Monakhov

Large language models (LLMs) are susceptible to memorizing training data, raising concerns about the potential extraction of sensitive information at generation time. Discoverable extraction is the most common method for measuring this…

Much like sentences are composed of words, words themselves are composed of smaller units. For example, the English word questionably can be analyzed as question+able+ly. However, this structural decomposition of the word does not directly…

Computation and Language · Computer Science 2018-11-13 Ryan Cotterell, Hinrich Schütze

Compositionality in language refers to how much the meaning of some phrase can be decomposed into the meaning of its constituents and the way these constituents are combined. Based on the premise that substitution by synonyms is…

Computation and Language · Computer Science 2017-03-13 Christina Lioma, Niels Dalum Hansen

We describe an implemented system for robust domain-independent syntactic parsing of English, using a unification-based grammar of part-of-speech and punctuation labels coupled with a probabilistic LR parser. We present evaluations of the…

cmp-lg · Computer Science 2008-02-03 John Carroll, Ted Briscoe

Sentence similarity is considered the basis of many natural language tasks such as information retrieval, question answering and text summarization. The semantic meaning between compared text fragments is based on the words semantic…

Information Retrieval · Computer Science 2016-10-17 Issa Atoum, Ahmed Otoom, Narayanan Kulathuramaiyer

We describe an approach to robust domain-independent syntactic parsing of unrestricted naturally-occurring (English) input. The technique involves parsing sequences of part-of-speech and punctuation labels using a unification-based grammar…

cmp-lg · Computer Science 2008-02-03 Ted Briscoe, John Carroll

In this paper we analyse the selectivity measure calculated from the complex network in the task of the automatic keyword extraction. Texts, collected from different web sources (portals, forums), are represented as directed and weighted…

Computation and Language · Computer Science 2014-07-15 Sabina Šišović, Sanda Martinčić-Ipšić, Ana Meštrović

Today's probabilistic language generators fall short when it comes to producing coherent and fluent text despite the fact that the underlying models perform well under standard metrics, e.g., perplexity. This discrepancy has puzzled the…

Computation and Language · Computer Science 2025-06-06 Clara Meister, Tiago Pimentel, Gian Wiher, Ryan Cotterell

Grammaticality and likelihood are distinct notions in human language. Pretrained language models (LMs), which are probabilistic models of language fitted to maximize corpus likelihood, generate grammatically well-formed text and…

Computation and Language · Computer Science 2026-05-07 Yingshan Susan Wang, Linlu Qiu, Zhaofeng Wu, Roger P. Levy, Yoon Kim

This paper presents a linguistically driven proof of concept for finding potentially euphemistic terms, or PETs. Acknowledging that PETs tend to be commonly used expressions for a certain range of sensitive topics, we make use of…

Computation and Language · Computer Science 2022-05-24 Patrick Lee, Martha Gavidia, Anna Feldman, Jing Peng

In many applications of natural language processing (NLP) it is necessary to determine the likelihood of a given word combination. For example, a speech recognizer may need to determine which of the two word combinations "eat a peach" and…

Computation and Language · Computer Science 2007-05-23 Ido Dagan, Lillian Lee, Fernando C. N. Pereira

In this project we propose a new approach for emotion recognition using web-based similarity (e.g. confidence, PMI and PMING). We aim to extract basic emotions from short sentences with emotional content (e.g. news titles, tweets,…

Computation and Language · Computer Science 2017-01-12 Valentina Franzoni, Giulio Biondi, Alfredo Milani, Yuanxi Li

Search techniques make use of elementary information such as term frequencies and document lengths in computation of similarity weighting. They can also exploit richer statistics, in particular the number of documents in which any two terms…

Information Retrieval · Computer Science 2020-07-20 Bodo Billerbeck, Justin Zobel, Nicholas Lester, Nick Craswell

This paper proposes to use distributed representation of words (word embeddings) in cross-language textual similarity detection. The main contributions of this paper are the following: (a) we introduce new cross-language similarity…

Computation and Language · Computer Science 2017-02-13 J. Ferrero, F. Agnes, L. Besacier, D. Schwab

Probabilistic word embeddings have shown effectiveness in capturing notions of generality and entailment, but there is very little work on doing the analogous type of investigation for sentences. In this paper we define probabilistic models…

Computation and Language · Computer Science 2020-05-19 Mingda Chen, Kevin Gimpel

This paper presents a model-based, unsupervised algorithm for recovering word boundaries in a natural-language text from which they have been deleted. The algorithm is derived from a probability model of the source that generated the text.…

Computation and Language · Computer Science 2007-05-23 Michael R. Brent

This study reviews the approaches used for measuring sentence similarity. Measuring similarity between natural language sentences is a crucial task for many Natural Language Processing applications such as text classification,…

Computation and Language · Computer Science 2019-10-10 Mamdouh Farouk