Related papers: Intention-based Segmentation: Human Reliability an…

Cross-lingual and cross-domain discourse segmentation of entire documents

Discourse segmentation is a crucial step in building end-to-end discourse parsers. However, discourse segmenters only exist for a few languages and domains. Typically they only detect intra-sentential segment boundaries, assuming gold…

Computation and Language · Computer Science 2017-04-25 Chloé Braud, Ophélie Lacroix, Anders Søgaard

Text Segmentation Using Exponential Models

This paper introduces a new statistical approach to partitioning text automatically into coherent segments. Our approach enlists both short-range and long-range language models to help it sniff out likely sites of topic changes in text. To…

cmp-lg · Computer Science 2008-02-03 Doug Beeferman, Adam Berger, John Lafferty

Automatic Discourse Segmentation: Review and Perspectives

Multilingual discourse parsing is a very prominent research topic. The first stage for discourse parsing is discourse segmentation. The study reported in this article addresses a review of two on-line available discourse segmenters (for…

Information Retrieval · Computer Science 2020-05-04 Iria da Cunha, Juan-Manuel Torres-Moreno

Using Contextual Information for Sentence-level Morpheme Segmentation

Recent advancements in morpheme segmentation primarily emphasize word-level segmentation, often neglecting the contextual relevance within the sentence. In this study, we redefine the morpheme segmentation task as a sequence-to-sequence…

Computation and Language · Computer Science 2024-12-18 Prabin Bhandari, Abhishek Paudel

Detecting Intentional Lexical Ambiguity in English Puns

The article describes a model of automatic analysis of puns, where a word is intentionally used in two meanings at the same time (the target word). We employ Roget's Thesaurus to discover two groups of words which, in a pun, form around two…

Computation and Language · Computer Science 2017-07-19 Elena Mikhalkova, Yuri Karyakin

Discourse-Driven Evaluation: Unveiling Factual Inconsistency in Long Document Summarization

Detecting factual inconsistency for long document summarization remains challenging, given the complex structure of the source article and long summary length. In this work, we study factual inconsistency errors and connect them with a line…

Computation and Language · Computer Science 2025-02-11 Yang Zhong, Diane Litman

Citations are not opinions: a corpus linguistics approach to understanding how citations are made

Citation content analysis seeks to understand citations based on the language used during the making of a citation. A key issue in citation content analysis is looking for linguistic structures that characterize distinct classes of…

Digital Libraries · Computer Science 2021-04-19 Domenic Rosati

Measuring Sentences Similarity: A Survey

This study is to review the approaches used for measuring sentences similarity. Measuring similarity between natural language sentences is a crucial task for many Natural Language Processing applications such as text classification,…

Computation and Language · Computer Science 2019-10-10 Mamdouh Farouk

Linear Segmentation and Segment Significance

We present a new method for discovering a segmental discourse structure of a document while categorizing segment function. We demonstrate how retrieval of noun phrases and pronominal forms, along with a zero-sum weighting scheme, determines…

Computation and Language · Computer Science 2007-05-23 Min-Yen Kan, Judith L. Klavans, Kathleen R. McKeown

Topic Segmentation Model Focusing on Local Context

Topic segmentation is important in understanding scientific documents since it can not only provide better readability but also facilitate downstream tasks such as information retrieval and question answering by creating appropriate…

Computation and Language · Computer Science 2023-01-06 Jeonghwan Lee, Jiyeong Han, Sunghoon Baek, Min Song

An Information Structural Approach to Spoken Language Generation

This paper presents an architecture for the generation of spoken monologues with contextually appropriate intonation. A two-tiered information structure representation is used in the high-level content planning and sentence planning stages…

cmp-lg · Computer Science 2008-02-03 Scott Prevost

Probing Natural Language Inference Models through Semantic Fragments

Do state-of-the-art models for language understanding already have, or can they easily learn, abilities such as boolean coordination, quantification, conditionals, comparatives, and monotonicity reasoning (i.e., reasoning about word…

Computation and Language · Computer Science 2019-12-03 Kyle Richardson, Hai Hu, Lawrence S. Moss, Ashish Sabharwal

The division of labor in communication: Speakers help listeners account for asymmetries in visual perspective

Recent debates over adults' theory of mind use have been fueled by surprising failures of perspective-taking in communication, suggesting that perspective-taking can be relatively effortful. How, then, should speakers and listeners allocate…

Computation and Language · Computer Science 2020-05-13 Robert D. Hawkins, Hyowon Gweon, Noah D. Goodman

Automating Easy Read Text Segmentation

Easy Read text is one of the main forms of access to information for people with reading difficulties. One of the key characteristics of this type of text is the requirement to split sentences into smaller grammatical segments, to…

Computation and Language · Computer Science 2025-07-21 Jesús Calleja, Thierry Etchegoyhen, David Ponce

Comparative Discourse Analysis of Parallel Texts

A quantitative representation of discourse structure can be computed by measuring lexical cohesion relations among adjacent blocks of text. These representations have been proposed to deal with sub-topic text segmentation. In a parallel…

cmp-lg · Computer Science 2008-02-03 Pim van der Eijk

Modeling Semantic Expectation: Using Script Knowledge for Referent Prediction

Recent research in psycholinguistics has provided increasing evidence that humans predict upcoming content. Prediction also affects perception and might be a key to robustness in human language processing. In this paper, we investigate the…

Computation and Language · Computer Science 2017-02-13 Ashutosh Modi, Ivan Titov, Vera Demberg, Asad Sayeed, Manfred Pinkal

Computational Analysis of Conversation Dynamics through Participant Responsivity

Growing literature explores toxicity and polarization in discourse, with comparatively less work on characterizing what makes dialogue prosocial and constructive. We explore conversational discourse and investigate a method for…

Computation and Language · Computer Science 2025-11-04 Margaret Hughes, Brandon Roy, Elinor Poole-Dayan, Deb Roy, Jad Kabbara

Topic Segmentation and Labeling in Asynchronous Conversations

Topic segmentation and labeling is often considered a prerequisite for higher-level conversation analysis and has been shown to be useful in many Natural Language Processing (NLP) applications. We present two new corpora of email and blog…

Computation and Language · Computer Science 2014-02-05 Shafiq Rayhan Joty, Giuseppe Carenini, Raymond T Ng

Linguistic Structure from a Bottleneck on Sequential Information Processing

Human language has a distinct systematic structure, where utterances break into individually meaningful words which are combined to form phrases. We show that natural-language-like systematicity arises in codes that are constrained by a…

Computation and Language · Computer Science 2025-11-19 Richard Futrell, Michael Hahn

Instruct-SCTG: Guiding Sequential Controlled Text Generation through Instructions

Instruction-tuned large language models have shown remarkable performance in aligning generated text with user intentions across various tasks. However, maintaining human-like discourse structure in the generated text remains a challenging…

Computation and Language · Computer Science 2023-12-20 Yinhong Liu, Yixuan Su, Ehsan Shareghi, Nigel Collier