Related papers: Text segmentation on multilabel documents: A dista…

Text Segmentation as a Supervised Learning Task

Text segmentation, the task of dividing a document into contiguous segments based on its semantic structure, is a longstanding challenge in language understanding. Previous work on text segmentation focused on unsupervised methods such as…

Computation and Language · Computer Science 2018-03-28 Omri Koshorek, Adir Cohen, Noam Mor, Michael Rotman, Jonathan Berant

Structured Summarization: Unified Text Segmentation and Segment Labeling as a Generation Task

Text segmentation aims to divide text into contiguous, semantically coherent segments, while segment labeling deals with producing labels for each segment. Past work has shown success in tackling segmentation and labeling for documents and…

Computation and Language · Computer Science 2022-09-29 Hakan Inan, Rashi Rungta, Yashar Mehdad

Structural Text Segmentation of Legal Documents

The growing complexity of legal cases has led to an increasing interest in legal information retrieval systems that can effectively satisfy user-specific information needs. However, such downstream systems typically require documents to be…

Computation and Language · Computer Science 2021-05-18 Dennis Aumiller, Satya Almasian, Sebastian Lackner, Michael Gertz

One Size Does Not Fit All: Exploring Variable Thresholds for Distance-Based Multi-Label Text Classification

Distance-based unsupervised text classification is a method within text classification that leverages the semantic similarity between a label and a text to determine label relevance. This method provides numerous benefits, including fast…

Computation and Language · Computer Science 2025-10-14 Jens Van Nooten, Andriy Kosar, Guy De Pauw, Walter Daelemans

OntoSeg: a Novel Approach to Text Segmentation using Ontological Similarity

Text segmentation (TS) aims at dividing long text into coherent segments which reflect the subtopic structure of the text. It is beneficial to many natural language processing tasks, such as Information Retrieval (IR) and document…

Computation and Language · Computer Science 2015-11-30 Mostafa Bayomi, Killian Levacher, M. Rami Ghorab, Séamus Lawless

Toward Unifying Text Segmentation and Long Document Summarization

Text segmentation is important for signaling a document's structure. Without segmenting a long document into topically coherent sections, it is difficult for readers to comprehend the text, let alone find important information. The problem…

Computation and Language · Computer Science 2022-11-01 Sangwoo Cho, Kaiqiang Song, Xiaoyang Wang, Fei Liu, Dong Yu

Topic Segmentation Model Focusing on Local Context

Topic segmentation is important in understanding scientific documents since it can not only provide better readability but also facilitate downstream tasks such as information retrieval and question answering by creating appropriate…

Computation and Language · Computer Science 2023-01-06 Jeonghwan Lee, Jiyeong Han, Sunghoon Baek, Min Song

SelfDocSeg: A Self-Supervised vision-based Approach towards Document Segmentation

Document layout analysis is a known problem to the documents research community and has been vastly explored yielding a multitude of solutions ranging from text mining, and recognition to graph-based representation, visual feature…

Computer Vision and Pattern Recognition · Computer Science 2023-08-22 Subhajit Maity, Sanket Biswas, Siladittya Manna, Ayan Banerjee, Josep Lladós, Saumik Bhattacharya, Umapada Pal

Segmenting Messy Text: Detecting Boundaries in Text Derived from Historical Newspaper Images

Text segmentation, the task of dividing a document into sections, is often a prerequisite for performing additional natural language processing tasks. Existing text segmentation methods have typically been developed and tested using clean,…

Computer Vision and Pattern Recognition · Computer Science 2023-12-21 Carol Anderson, Phil Crone

Self-supervised Scene Text Segmentation with Object-centric Layered Representations Augmented by Text Regions

Text segmentation tasks have a very wide range of application values, such as image editing, style transfer, watermark removal, etc. However, existing public datasets are of poor quality of pixel-level labels that have been shown to be…

Computer Vision and Pattern Recognition · Computer Science 2023-08-28 Yibo Wang, Yunhu Ye, Yuanpeng Mao, Yanwei Yu, Yuanping Song

Minimally Supervised Categorization of Text with Metadata

Document categorization, which aims to assign a topic label to each document, plays a fundamental role in a wide variety of applications. Despite the success of existing studies in conventional supervised document classification, they are…

Computation and Language · Computer Science 2023-10-24 Yu Zhang, Yu Meng, Jiaxin Huang, Frank F. Xu, Xuan Wang, Jiawei Han

Weakly-Supervised Text Instance Segmentation

Text segmentation is a challenging vision task with many downstream applications. Current text segmentation methods require pixel-level annotations, which are expensive in the cost of human labor and limited in application scenarios. In…

Computer Vision and Pattern Recognition · Computer Science 2023-03-24 Xinyan Zu, Haiyang Yu, Bin Li, Xiangyang Xue

Unsupervised Label Refinement Improves Dataless Text Classification

Dataless text classification is capable of classifying documents into previously unseen labels by assigning a score to any document paired with a label description. While promising, it crucially relies on accurate descriptions of the label…

Computation and Language · Computer Science 2020-12-09 Zewei Chu, Karl Stratos, Kevin Gimpel

Using Contextual Information for Sentence-level Morpheme Segmentation

Recent advancements in morpheme segmentation primarily emphasize word-level segmentation, often neglecting the contextual relevance within the sentence. In this study, we redefine the morpheme segmentation task as a sequence-to-sequence…

Computation and Language · Computer Science 2024-12-18 Prabin Bhandari, Abhishek Paudel

Making Efficient Use of a Domain Expert's Time in Relation Extraction

Scarcity of labeled data is one of the most frequent problems faced in machine learning. This is particularly true in relation extraction in text mining, where large corpora of texts exist in many application domains, while labeling of…

Machine Learning · Computer Science 2018-07-13 Linara Adilova, Sven Giesselbach, Stefan Rüping

Label Dependencies-aware Set Prediction Networks for Multi-label Text Classification

Multi-label text classification involves extracting all relevant labels from a sentence. Given the unordered nature of these labels, we propose approaching the problem as a set prediction task. To address the correlation between labels, we…

Computation and Language · Computer Science 2024-03-15 Du Xinkai, Han Quanjie, Sun Yalin, Lv Chao, Sun Maosong

Topic Segmentation and Labeling in Asynchronous Conversations

Topic segmentation and labeling is often considered a prerequisite for higher-level conversation analysis and has been shown to be useful in many Natural Language Processing (NLP) applications. We present two new corpora of email and blog…

Computation and Language · Computer Science 2014-02-05 Shafiq Rayhan Joty, Giuseppe Carenini, Raymond T Ng

Improve Text Classification Accuracy with Intent Information

Text classification, a core component of task-oriented dialogue systems, attracts continuous research from both the research and industry communities, and has resulted in tremendous progress. However, existing methods do not consider the use…

Computation and Language · Computer Science 2022-12-16 Yifeng Xie

Two-Level Transformer and Auxiliary Coherence Modeling for Improved Text Segmentation

Breaking down the structure of long texts into semantically coherent segments makes the texts more readable and supports downstream applications like summarization and retrieval. Starting from an apparent link between text coherence and…

Computation and Language · Computer Science 2020-01-06 Goran Glavaš, Swapna Somasundaran

Knowledge Distillation for Semantic Segmentation: A Label Space Unification Approach

An increasing number of datasets sharing similar domains for semantic segmentation have been published over the past few years. But despite the growing amount of overall data, it is still difficult to train bigger and better models due to…

Computer Vision and Pattern Recognition · Computer Science 2025-02-27 Anton Backhaus, Thorsten Luettel, Mirko Maehlisch