Related papers: Language Segmentation

200 papers

Text segmentation, the task of dividing a document into contiguous segments based on its semantic structure, is a longstanding challenge in language understanding. Previous work on text segmentation focused on unsupervised methods such as…

Computation and Language · Computer Science · 2018-03-28 · Omri Koshorek, Adir Cohen, Noam Mor, Michael Rotman, Jonathan Berant

Word segmentation stands as a cornerstone of Natural Language Processing (NLP). Based on the concept of "comprehend first, segment later", we propose a new framework to explore the limit of unsupervised word segmentation with Large Language…

Computation and Language · Computer Science · 2025-05-27 · Zihong Zhang, Liqi He, Zuchao Li, Lefei Zhang, Hai Zhao, Bo Du

Word segmentation, the problem of finding word boundaries in speech, is of interest for a range of tasks. Previous papers have suggested that for sequence-to-sequence models trained on tasks such as speech translation or speech recognition,…

Computation and Language · Computer Science · 2021-09-22 · Ramon Sanabria, Hao Tang, Sharon Goldwater

Word segmentation is the task of inserting or deleting word boundary characters in order to separate character sequences that correspond to words in some language. In this article we propose an approach based on a beam search algorithm and…

Computation and Language · Computer Science · 2018-12-04 · Yerai Doval, Carlos Gómez-Rodríguez

This paper presents a novel approach for multi-lingual sentiment classification in short texts. This is a challenging task as the amount of training data in languages other than English is very limited. Previously proposed multi-lingual…

Computation and Language · Computer Science · 2017-03-08 · Jan Deriu, Aurelien Lucchi, Valeria De Luca, Aliaksei Severyn, Simon Müller, Mark Cieliebak, Thomas Hofmann, Martin Jaggi

Segmentation remains an important preprocessing step both in languages where "words" or other important syntactic/semantic units (like morphemes) are not clearly delineated by white space, as well as when dealing with continuous speech…

Computation and Language · Computer Science · 2021-09-07 · C. M. Downey, Fei Xia, Gina-Anne Levow, Shane Steinert-Threlkeld

Word segmentation is a low-level NLP task that is non-trivial for a considerable number of languages. In this paper, we present a sequence tagging framework and apply it to word segmentation for a wide range of languages with different…

Computation and Language · Computer Science · 2018-07-10 · Yan Shao, Christian Hardmeier, Joakim Nivre

In this paper, we introduce an unsupervised approach for Speech Segmentation, which builds on previously researched approaches, e.g., Speaker Diarization, while being applicable to an inclusive set of acoustic-semantic distinctions, paving…

Computation and Language · Computer Science · 2025-01-08 · Avishai Elmakies, Omri Abend, Yossi Adi

Easy Read text is one of the main forms of access to information for people with reading difficulties. One of the key characteristics of this type of text is the requirement to split sentences into smaller grammatical segments, to…

Computation and Language · Computer Science · 2025-07-21 · Jesús Calleja, Thierry Etchegoyhen, David Ponce

Unsupervised word segmentation in audio utterances is challenging as, in speech, there is typically no gap between words. In a preliminary experiment, we show that recent deep self-supervised features are very effective for word…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-04-04 · Tzeviya Sylvia Fuchs, Yedid Hoshen

Text segmentation is a challenging vision task with many downstream applications. Current text segmentation methods require pixel-level annotations, which are costly in human labor and limited in application scenarios. In…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-24 · Xinyan Zu, Haiyang Yu, Bin Li, Xiangyang Xue

We present an unsupervised word segmentation model, in which the learning objective is to maximize the generation probability of a sentence given all of its possible segmentations. This generation probability can be factorized into the…

Computation and Language · Computer Science · 2021-03-03 · Lihao Wang, Zongyi Li, Xiaoqing Zheng

Recent advancements in open vocabulary models, like CLIP, have notably advanced zero-shot classification and segmentation by utilizing natural language for class-specific embeddings. However, most research has focused on improving model…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-02 · Wenfang Sun, Yingjun Du, Gaowen Liu, Ramana Kompella, Cees G. M. Snoek

We look at the long-standing problem of segmenting unlabeled speech into word-like segments and clustering these into a lexicon. Several previous methods use a scoring model coupled with dynamic programming to find an optimal segmentation.…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-01-14 · Simon Malan, Benjamin van Niekerk, Herman Kamper

Unlike traditional unsupervised clustering, semi-supervised clustering allows users to provide meaningful structure to the data, which helps the clustering algorithm to match the user's intent. Existing approaches to semi-supervised…

Computation and Language · Computer Science · 2023-07-04 · Vijay Viswanathan, Kiril Gashteovski, Carolin Lawrence, Tongshuang Wu, Graham Neubig

Text segmentation aims to divide text into contiguous, semantically coherent segments, while segment labeling deals with producing labels for each segment. Past work has shown success in tackling segmentation and labeling for documents and…

Computation and Language · Computer Science · 2022-09-29 · Hakan Inan, Rashi Rungta, Yashar Mehdad

Cross-lingual transfer of word embeddings aims to establish the semantic mappings among words in different languages by learning the transformation functions over the corresponding word embedding spaces. Successfully solving this problem…

Computation and Language · Computer Science · 2018-09-12 · Ruochen Xu, Yiming Yang, Naoki Otani, Yuexin Wu

We investigate segmenting and clustering speech into low-bitrate phone-like sequences without supervision. We specifically constrain pretrained self-supervised vector-quantized (VQ) neural networks so that blocks of contiguous feature…

Computation and Language · Computer Science · 2021-06-14 · Herman Kamper, Benjamin van Niekerk

Existing weakly-supervised semantic segmentation methods using image-level annotations typically rely on initial responses to locate object regions. However, such response maps generated by the classification network usually focus on…

Computer Vision and Pattern Recognition · Computer Science · 2020-08-05 · Yu-Ting Chang, Qiaosong Wang, Wei-Chih Hung, Robinson Piramuthu, Yi-Hsuan Tsai, Ming-Hsuan Yang

Solving text classification in a weakly supervised manner is important for real-world applications where human annotations are scarce. In this paper, we propose to query a masked language model with cloze style prompts to obtain supervision…

Computation and Language · Computer Science · 2022-05-16 · Ziqian Zeng, Weimin Ni, Tianqing Fang, Xiang Li, Xinran Zhao, Yangqiu Song