Related papers: Language Segmentation

Text Segmentation as a Supervised Learning Task

Text segmentation, the task of dividing a document into contiguous segments based on its semantic structure, is a longstanding challenge in language understanding. Previous work on text segmentation focused on unsupervised methods such as…

Computation and Language · Computer Science 2018-03-28 Omri Koshorek, Adir Cohen, Noam Mor, Michael Rotman, Jonathan Berant

Segment First or Comprehend First? Explore the Limit of Unsupervised Word Segmentation with Large Language Models

Word segmentation stands as a cornerstone of Natural Language Processing (NLP). Based on the concept of "comprehend first, segment later", we propose a new framework to explore the limit of unsupervised word segmentation with Large Language…

Computation and Language · Computer Science 2025-05-27 Zihong Zhang, Liqi He, Zuchao Li, Lefei Zhang, Hai Zhao, Bo Du

On the Difficulty of Segmenting Words with Attention

Word segmentation, the problem of finding word boundaries in speech, is of interest for a range of tasks. Previous papers have suggested that for sequence-to-sequence models trained on tasks such as speech translation or speech recognition,…

Computation and Language · Computer Science 2021-09-22 Ramon Sanabria, Hao Tang, Sharon Goldwater

Comparing Neural- and N-Gram-Based Language Models for Word Segmentation

Word segmentation is the task of inserting or deleting word boundary characters in order to separate character sequences that correspond to words in some language. In this article we propose an approach based on a beam search algorithm and…

Computation and Language · Computer Science 2018-12-04 Yerai Doval, Carlos Gómez-Rodríguez

Leveraging Large Amounts of Weakly Supervised Data for Multi-Language Sentiment Classification

This paper presents a novel approach for multi-lingual sentiment classification in short texts. This is a challenging task as the amount of training data in languages other than English is very limited. Previously proposed multi-lingual…

Computation and Language · Computer Science 2017-03-08 Jan Deriu, Aurelien Lucchi, Valeria De Luca, Aliaksei Severyn, Simon Müller, Mark Cieliebak, Thomas Hofmann, Martin Jaggi

A Masked Segmental Language Model for Unsupervised Natural Language Segmentation

Segmentation remains an important preprocessing step both in languages where "words" or other important syntactic/semantic units (like morphemes) are not clearly delineated by white space, as well as when dealing with continuous speech…

Computation and Language · Computer Science 2021-09-07 C. M. Downey, Fei Xia, Gina-Anne Levow, Shane Steinert-Threlkeld

Universal Word Segmentation: Implementation and Interpretation

Word segmentation is a low-level NLP task that is non-trivial for a considerable number of languages. In this paper, we present a sequence tagging framework and apply it to word segmentation for a wide range of languages with different…

Computation and Language · Computer Science 2018-07-10 Yan Shao, Christian Hardmeier, Joakim Nivre

Unsupervised Speech Segmentation: A General Approach Using Speech Language Models

In this paper, we introduce an unsupervised approach for Speech Segmentation, which builds on previously researched approaches, e.g., Speaker Diarization, while being applicable to an inclusive set of acoustic-semantic distinctions, paving…

Computation and Language · Computer Science 2025-01-08 Avishai Elmakies, Omri Abend, Yossi Adi

Automating Easy Read Text Segmentation

Easy Read text is one of the main forms of access to information for people with reading difficulties. One of the key characteristics of this type of text is the requirement to split sentences into smaller grammatical segments, to…

Computation and Language · Computer Science 2025-07-21 Jesús Calleja, Thierry Etchegoyhen, David Ponce

Unsupervised Word Segmentation Using Temporal Gradient Pseudo-Labels

Unsupervised word segmentation in audio utterances is challenging as, in speech, there is typically no gap between words. In a preliminary experiment, we show that recent deep self-supervised features are very effective for word…

Audio and Speech Processing · Electrical Eng. & Systems 2023-04-04 Tzeviya Sylvia Fuchs, Yedid Hoshen

Weakly-Supervised Text Instance Segmentation

Text segmentation is a challenging vision task with many downstream applications. Current text segmentation methods require pixel-level annotations, which are expensive in the cost of human labor and limited in application scenarios. In…

Computer Vision and Pattern Recognition · Computer Science 2023-03-24 Xinyan Zu, Haiyang Yu, Bin Li, Xiangyang Xue

Unsupervised Word Segmentation with Bi-directional Neural Language Model

We present an unsupervised word segmentation model, in which the learning objective is to maximize the generation probability of a sentence given all its possible segmentations. Such generation probability can be factorized into the…

Computation and Language · Computer Science 2021-03-03 Lihao Wang, Zongyi Li, Xiaoqing Zheng

Training-Free Semantic Segmentation via LLM-Supervision

Recent advancements in open vocabulary models, like CLIP, have notably advanced zero-shot classification and segmentation by utilizing natural language for class-specific embeddings. However, most research has focused on improving model…

Computer Vision and Pattern Recognition · Computer Science 2024-04-02 Wenfang Sun, Yingjun Du, Gaowen Liu, Ramana Kompella, Cees G. M. Snoek

Unsupervised Word Discovery: Boundary Detection with Clustering vs. Dynamic Programming

We look at the long-standing problem of segmenting unlabeled speech into word-like segments and clustering these into a lexicon. Several previous methods use a scoring model coupled with dynamic programming to find an optimal segmentation.…

Audio and Speech Processing · Electrical Eng. & Systems 2025-01-14 Simon Malan, Benjamin van Niekerk, Herman Kamper

Large Language Models Enable Few-Shot Clustering

Unlike traditional unsupervised clustering, semi-supervised clustering allows users to provide meaningful structure to the data, which helps the clustering algorithm to match the user's intent. Existing approaches to semi-supervised…

Computation and Language · Computer Science 2023-07-04 Vijay Viswanathan, Kiril Gashteovski, Carolin Lawrence, Tongshuang Wu, Graham Neubig

Structured Summarization: Unified Text Segmentation and Segment Labeling as a Generation Task

Text segmentation aims to divide text into contiguous, semantically coherent segments, while segment labeling deals with producing labels for each segment. Past work has shown success in tackling segmentation and labeling for documents and…

Computation and Language · Computer Science 2022-09-29 Hakan Inan, Rashi Rungta, Yashar Mehdad

Unsupervised Cross-lingual Transfer of Word Embedding Spaces

Cross-lingual transfer of word embeddings aims to establish the semantic mappings among words in different languages by learning the transformation functions over the corresponding word embedding spaces. Successfully solving this problem…

Computation and Language · Computer Science 2018-09-12 Ruochen Xu, Yiming Yang, Naoki Otani, Yuexin Wu

Towards unsupervised phone and word segmentation using self-supervised vector-quantized neural networks

We investigate segmenting and clustering speech into low-bitrate phone-like sequences without supervision. We specifically constrain pretrained self-supervised vector-quantized (VQ) neural networks so that blocks of contiguous feature…

Computation and Language · Computer Science 2021-06-14 Herman Kamper, Benjamin van Niekerk

Weakly-Supervised Semantic Segmentation via Sub-category Exploration

Existing weakly-supervised semantic segmentation methods using image-level annotations typically rely on initial responses to locate object regions. However, such response maps generated by the classification network usually focus on…

Computer Vision and Pattern Recognition · Computer Science 2020-08-05 Yu-Ting Chang, Qiaosong Wang, Wei-Chih Hung, Robinson Piramuthu, Yi-Hsuan Tsai, Ming-Hsuan Yang

Weakly Supervised Text Classification using Supervision Signals from a Language Model

Solving text classification in a weakly supervised manner is important for real-world applications where human annotations are scarce. In this paper, we propose to query a masked language model with cloze style prompts to obtain supervision…

Computation and Language · Computer Science 2022-05-16 Ziqian Zeng, Weimin Ni, Tianqing Fang, Xiang Li, Xinran Zhao, Yangqiu Song