Related papers: Automatic Page Segmentation Without Decompressing …

Word and character segmentation directly in run-length compressed handwritten document images

From the literature, it is demonstrated that performing text-line segmentation directly in the run-length compressed handwritten document images significantly reduces the computational time and memory space. In this paper, we investigate…

Computer Vision and Pattern Recognition · Computer Science 2019-09-12 Amarnath R, P. Nagabhushan, Mohammed Javed

Direct Processing of Document Images in Compressed Domain

With the rapid increase in the volume of Big data of this digital era, fax documents, invoices, receipts, etc. are traditionally subjected to compression for the efficiency of data storage and transfer. However, in order to process these…

Computer Vision and Pattern Recognition · Computer Science 2014-10-15 Mohammed Javed, P. Nagabhushan, B. B. Chaudhuri

Extraction of Line Word Character Segments Directly from Run Length Compressed Printed Text Documents

Segmentation of a text-document into lines, words and characters, which is considered to be the crucial pre-processing stage in Optical Character Recognition (OCR) is traditionally carried out on uncompressed documents, although most of the…

Computer Vision and Pattern Recognition · Computer Science 2014-04-01 Mohammed Javed, P. Nagabhushan, B. B. Chaudhuri

Spotting Separator Points at Line Terminals in Compressed Document Images for Text-line Segmentation

Line separators are used to segregate text-lines from one another in document image analysis. Finding the separator points at every line terminal in a document image would enable text-line segmentation. In particular, identifying the…

Computer Vision and Pattern Recognition · Computer Science 2017-08-21 Amarnath R, P. Nagabhushan

Text Line Segmentation of Historical Documents: a Survey

There is a huge amount of historical documents in libraries and in various National Archives that have not been exploited electronically. Although automatic reading of complete pages remains, in most cases, a long-term objective, tasks such…

Computer Vision and Pattern Recognition · Computer Science 2007-05-23 Laurence Likforman-Sulem, Abderrazak Zahour, Bruno Taconet

Automatic Text Line Segmentation Directly in JPEG Compressed Document Images

JPEG is one of the popular image compression algorithms that provide efficient storage and transmission capabilities in consumer electronics, and hence it is the most preferred image format over the internet world. In the present digital…

Computer Vision and Pattern Recognition · Computer Science 2019-07-30 Bulla Rajesh, Mohammed Javed, P Nagabhushan

Semi Automatic Color Segmentation of Document Pages

This paper presents a semi automatic method used to segment color documents into different uniform color plans. The practical application is dedicated to administrative documents segmentation. In these documents, like in many other cases,…

Computer Vision and Pattern Recognition · Computer Science 2016-09-28 Stéphane Bres, Véronique Eglin, Vincent Poulain

Text line Segmentation in Compressed Representation of Handwritten Document using Tunneling Algorithm

In this research work, we perform text line segmentation directly in compressed representation of an unconstrained handwritten document image. In this relation, we make use of text line terminal points which is the current state-of-the-art.…

Computer Vision and Pattern Recognition · Computer Science 2019-02-01 Amarnath R, P Nagabhushan

Segmenting Messy Text: Detecting Boundaries in Text Derived from Historical Newspaper Images

Text segmentation, the task of dividing a document into sections, is often a prerequisite for performing additional natural language processing tasks. Existing text segmentation methods have typically been developed and tested using clean,…

Computer Vision and Pattern Recognition · Computer Science 2023-12-21 Carol Anderson, Phil Crone

Direct Processing of Run Length Compressed Document Image for Segmentation and Characterization of a Specified Block

Extracting a block of interest referred to as segmenting a specified block in an image and studying its characteristics is of general research interest, and could be challenging if such a segmentation task has to be carried out directly…

Computer Vision and Pattern Recognition · Computer Science 2014-02-19 Mohammed Javed, P. Nagabhushan, B. B. Chaudhuri

Text Segmentation as a Supervised Learning Task

Text segmentation, the task of dividing a document into contiguous segments based on its semantic structure, is a longstanding challenge in language understanding. Previous work on text segmentation focused on unsupervised methods such as…

Computation and Language · Computer Science 2018-03-28 Omri Koshorek, Adir Cohen, Noam Mor, Michael Rotman, Jonathan Berant

Automatic Detection of Font Size Straight from Run Length Compressed Text Documents

Automatic detection of font size finds many applications in the area of intelligent OCRing and document image analysis, which has been traditionally practiced over uncompressed documents, although in real life the documents exist in…

Computer Vision and Pattern Recognition · Computer Science 2014-02-19 Mohammed Javed, P. Nagabhushan, B. B. Chaudhuri

Genetic Programming for Document Segmentation and Region Classification Using Discipulus

Document segmentation is a method of rending the document into distinct regions. A document is an assortment of information and a standard mode of conveying information to others. Pursuance of data from documents involves ton of human…

Computer Vision and Pattern Recognition · Computer Science 2013-03-05 N. Priyadharshini, M. S. Vijaya

Page Segmentation using Visual Adjacency Analysis

Page segmentation is a web page analysis process that divides a page into cohesive segments, such as sidebars, headers, and footers. Current page segmentation approaches use either the DOM, textual content, or rendering style information of…

Computer Vision and Pattern Recognition · Computer Science 2021-12-23 Mohammad Bajammal, Ali Mesbah

Split-Correctness in Information Extraction

Programs for extracting structured information from text, namely information extractors, often operate separately on document segments obtained from a generic splitting operation such as sentences, paragraphs, k-grams, HTTP requests, and so…

Databases · Computer Science 2021-05-21 Johannes Doleschal, Benny Kimelfeld, Wim Martens, Frank Neven, Matthias Niewerth

Recent Trends in Linear Text Segmentation: a Survey

Linear Text Segmentation is the task of automatically tagging text documents with topic shifts, i.e. the places in the text where the topics change. A well-established area of research in Natural Language Processing, drawing from…

Computation and Language · Computer Science 2024-11-26 Iacopo Ghinassi, Lin Wang, Chris Newell, Matthew Purver

Toward Unifying Text Segmentation and Long Document Summarization

Text segmentation is important for signaling a document's structure. Without segmenting a long document into topically coherent sections, it is difficult for readers to comprehend the text, let alone find important information. The problem…

Computation and Language · Computer Science 2022-11-01 Sangwoo Cho, Kaiqiang Song, Xiaoyang Wang, Fei Liu, Dong Yu

Document Summarization with Text Segmentation

In this paper, we exploit the innate document segment structure for improving the extractive summarization task. We build two text segmentation models and find the most optimal strategy to introduce their output predictions in an extractive…

Computation and Language · Computer Science 2023-01-24 Lesly Miculicich, Benjamin Han

CrossFormer: Cross-Segment Semantic Fusion for Document Segmentation

Text semantic segmentation involves partitioning a document into multiple paragraphs with continuous semantics based on the subject matter, contextual information, and document structure. Traditional approaches have typically relied on…

Computation and Language · Computer Science 2025-04-03 Tongke Ni, Yang Fan, Junru Zhou, Xiangping Wu, Qingcai Chen

Automating Easy Read Text Segmentation

Easy Read text is one of the main forms of access to information for people with reading difficulties. One of the key characteristics of this type of text is the requirement to split sentences into smaller grammatical segments, to…

Computation and Language · Computer Science 2025-07-21 Jesús Calleja, Thierry Etchegoyhen, David Ponce