Related papers: Intention-based Segmentation: Human Reliability an…

Inter-Speaker Relative Cues for Text-Guided Target Speech Extraction

We propose a novel approach that utilizes inter-speaker relative cues to distinguish target speakers and extract their voices from mixtures. Continuous cues (e.g., temporal order, age, pitch level) are grouped by relative differences, while…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-10 Wang Dai, Archontis Politis, Tuomas Virtanen

One Vector is Not Enough: Entity-Augmented Distributional Semantics for Discourse Relations

Discourse relations bind smaller linguistic units into coherent texts. However, automatically identifying discourse relations is difficult, because it requires understanding the semantics of the linked arguments. A more subtle challenge is…

Computation and Language · Computer Science 2014-11-26 Yangfeng Ji, Jacob Eisenstein

Segmentation of Expository Texts by Hierarchical Agglomerative Clustering

We propose a method for segmentation of expository texts based on hierarchical agglomerative clustering. The method uses paragraphs as the basic segments for identifying hierarchical discourse structure in the text, applying lexical…

cmp-lg · Computer Science 2016-08-31 Yaakov Yaari

Unsupervised Word Segmentation with Bi-directional Neural Language Model

We present an unsupervised word segmentation model, in which the learning objective is to maximize the generation probability of a sentence given all its possible segmentations. Such generation probability can be factorized into the…

Computation and Language · Computer Science 2021-03-03 Lihao Wang, Zongyi Li, Xiaoqing Zheng

Linguistic Versus Latent Relations for Modeling Coherent Flow in Paragraphs

Generating a long, coherent text such as a paragraph requires a high-level control of different levels of relations between sentences (e.g., tense, coreference). We call such a logical connection between sentences a (paragraph) flow. In…

Computation and Language · Computer Science 2019-09-02 Dongyeop Kang, Hiroaki Hayashi, Alan W Black, Eduard Hovy

Characterizing Similarities and Divergences in Conversational Tones in Humans and LLMs by Sampling with People

Conversational tones -- the manners and attitudes in which speakers communicate -- are essential to effective communication. Amidst the increasing popularization of Large Language Models (LLMs) over recent years, it becomes necessary to…

Computation and Language · Computer Science 2024-06-07 Dun-Ming Huang, Pol Van Rijn, Ilia Sucholutsky, Raja Marjieh, Nori Jacoby

Sequence Prediction with Neural Segmental Models

Segments that span contiguous parts of inputs, such as phonemes in speech, named entities in sentences, actions in videos, occur frequently in sequence prediction problems. Segmental models, a class of models that explicitly hypothesizes…

Computation and Language · Computer Science 2018-06-14 Hao Tang

Conversational Contextual Cues: The Case of Personalization and History for Response Ranking

We investigate the task of modeling open-domain, multi-turn, unstructured, multi-participant, conversational dialogue. We specifically study the effect of incorporating different elements of the conversation. Unlike previous efforts, which…

Computation and Language · Computer Science 2016-06-02 Rami Al-Rfou, Marc Pickett, Javier Snaider, Yun-hsuan Sung, Brian Strope, Ray Kurzweil

Investigating Confidence Estimation Measures for Speaker Diarization

Speaker diarization systems segment a conversation recording based on the speakers' identity. Such systems can misclassify the speaker of a portion of audio due to a variety of factors, such as speech pattern variation, background noise,…

Sound · Computer Science 2024-06-26 Anurag Chowdhury, Abhinav Misra, Mark C. Fuhs, Monika Woszczyna

Understanding confounding effects in linguistic coordination: an information-theoretic approach

We suggest an information-theoretic approach for measuring stylistic coordination in dialogues. The proposed measure has a simple predictive interpretation and can account for various confounding factors through proper conditioning. We…

Computation and Language · Computer Science 2015-08-28 Shuyang Gao, Greg Ver Steeg, Aram Galstyan

Utilisation des grammaires probabilistes dans les tâches de segmentation et d'annotation prosodique

In this contribution we present an approach, both symbolic and probabilistic, for extracting information about the segmentation of the speech signal from prosodic information. To do so, we use…

Machine Learning · Computer Science 2008-12-18 Irina Nesterenko, Stéphane Rauzy

Topic Segmentation of Semi-Structured and Unstructured Conversational Datasets using Language Models

Breaking down a document or a conversation into multiple contiguous segments based on its semantic structure is an important and challenging problem in NLP, which can assist many downstream tasks. However, current works on topic…

Computation and Language · Computer Science 2023-10-27 Reshmi Ghosh, Harjeet Singh Kajal, Sharanya Kamath, Dhuri Shrivastava, Samyadeep Basu, Hansi Zeng, Soundararajan Srinivasan

NaturalTurn: A Method to Segment Speech into Psychologically Meaningful Conversational Turns

Conversation is a subject of increasing interest in the social, cognitive, and computational sciences. Yet as conversational datasets continue to increase in size and complexity, researchers lack scalable methods to segment speech-to-text…

Computation and Language · Computer Science 2025-11-13 Gus Cooney, Andrew Reece

Linear Semantic Segmentation for Low-Resource Spoken Dialects

Semantic segmentation is a core component of discourse analysis, yet existing models are primarily developed and evaluated on high-resource written text, limiting their effectiveness on low-resource spoken varieties. In particular,…

Computation and Language · Computer Science 2026-05-08 Kirill Chirkunov, Younes Samih, Abed Alhakim Freihat, Hanan Aldarmaki

PropSegmEnt: A Large-Scale Corpus for Proposition-Level Segmentation and Entailment Recognition

The widely studied task of Natural Language Inference (NLI) requires a system to recognize whether one piece of text is textually entailed by another, i.e. whether the entirety of its meaning can be inferred from the other. In current NLI…

Computation and Language · Computer Science 2023-05-26 Sihao Chen, Senaka Buthpitiya, Alex Fabrikant, Dan Roth, Tal Schuster

Towards Semantic Query Segmentation

Query Segmentation is one of the critical components for understanding users' search intent in Information Retrieval tasks. It involves grouping tokens in the search query into meaningful phrases which help downstream tasks like search…

Information Retrieval · Computer Science 2017-07-26 Ajinkya Kale, Thrivikrama Taula, Sanjika Hewavitharana, Amit Srivastava

Exploring the Effect of Segmentation and Vocabulary Size on Speech Tokenization for Speech Language Models

The purpose of speech tokenization is to transform a speech signal into a sequence of discrete representations, serving as the foundation for speech language models (SLMs). While speech tokenization has many options, their effect on the…

Computation and Language · Computer Science 2025-06-03 Shunsuke Kando, Yusuke Miyao, Shinnosuke Takamichi

Probing Syntax in Large Language Models: Successes and Remaining Challenges

The syntactic structures of sentences can be readily read-out from the activations of large language models (LLMs). However, the "structural probes" that have been developed to reveal this phenomenon are typically evaluated on an…

Computation and Language · Computer Science 2025-08-12 Pablo J. Diego-Simón, Emmanuel Chemla, Jean-Rémi King, Yair Lakretz

Experiments on predictability of word in context and information rate in natural language

Based on data from a large-scale experiment with human subjects, we conclude that the logarithm of probability to guess a word in context (unpredictability) depends linearly on the word length. This result holds both for poetry and prose,…

Information Theory · Computer Science 2007-07-16 Dmitrii Manin

Sentence Segmentation in Narrative Transcripts from Neuropsychological Tests using Recurrent Convolutional Neural Networks

Automated discourse analysis tools based on Natural Language Processing (NLP) aiming at the diagnosis of language-impairing dementias generally extract several textual metrics of narrative transcripts. However, the absence of sentence…

Computation and Language · Computer Science 2017-08-17 Marcos Vinícius Treviso, Christopher Shulby, Sandra Maria Aluísio