Related papers: Structuring an unordered text document

Text Segmentation as a Supervised Learning Task

Text segmentation, the task of dividing a document into contiguous segments based on its semantic structure, is a longstanding challenge in language understanding. Previous work on text segmentation focused on unsupervised methods such as…

Computation and Language · Computer Science 2018-03-28 Omri Koshorek, Adir Cohen, Noam Mor, Michael Rotman, Jonathan Berant

Inferring Strategies for Sentence Ordering in Multidocument News Summarization

The problem of organizing information for multidocument summarization so that the generated summary is coherent has received relatively little attention. While sentence ordering for single document summarization can be determined from the…

Artificial Intelligence · Computer Science 2011-06-10 R. Barzilay, N. Elhadad

Computing and Exploiting Document Structure to Improve Unsupervised Extractive Summarization of Legal Case Decisions

Though many algorithms can be used to automatically summarize legal case decisions, most fail to incorporate domain knowledge about how important sentences in a legal decision relate to a representation of its document structure. For…

Computation and Language · Computer Science 2022-11-08 Yang Zhong, Diane Litman

An Unsupervised Semantic Sentence Ranking Scheme for Text Documents

This paper presents Semantic SentenceRank (SSR), an unsupervised scheme for automatically ranking sentences in a single document according to their relative importance. In particular, SSR extracts essential words and phrases from a text…

Information Retrieval · Computer Science 2020-05-06 Hao Zhang, Jie Wang

Unfolding the Structure of a Document using Deep Learning

Understanding and extracting information from large documents, such as business opportunities, academic articles, medical documents and technical reports, poses challenges not present in short documents. Such large documents may be…

Computation and Language · Computer Science 2019-10-10 Muhammad Mahbubur Rahman, Tim Finin

Structural Text Segmentation of Legal Documents

The growing complexity of legal cases has led to an increasing interest in legal information retrieval systems that can effectively satisfy user-specific information needs. However, such downstream systems typically require documents to be…

Computation and Language · Computer Science 2021-05-18 Dennis Aumiller, Satya Almasian, Sebastian Lackner, Michael Gertz

Automated Text Summarization Base on Lexicales Chain and graph Using of WordNet and Wikipedia Knowledge Base

The technology of automatic document summarization is maturing and may provide a solution to the information overload problem. Nowadays, document summarization plays an important role in information retrieval. With a large volume of…

Information Retrieval · Computer Science 2012-04-10 Mohsen Pourvali, Mohammad Saniee Abadeh

Test Model for Text Categorization and Text Summarization

Text Categorization is the task of automatically sorting a set of documents into categories from a predefined set and Text Summarization is a brief and accurate representation of input text such that the output covers the most important…

Information Retrieval · Computer Science 2013-05-14 Khushboo Thakkar, Urmila Shrawankar

Multi-Document Summarization using Distributed Bag-of-Words Model

As the number of documents on the web is growing exponentially, multi-document summarization is becoming more and more important since it can provide the main ideas in a document set in a short time. In this paper, we present an unsupervised…

Computation and Language · Computer Science 2018-06-12 Kaustubh Mani, Ishan Verma, Hardik Meisheri, Lipika Dey

Toward Unifying Text Segmentation and Long Document Summarization

Text segmentation is important for signaling a document's structure. Without segmenting a long document into topically coherent sections, it is difficult for readers to comprehend the text, let alone find important information. The problem…

Computation and Language · Computer Science 2022-11-01 Sangwoo Cho, Kaiqiang Song, Xiaoyang Wang, Fei Liu, Dong Yu

Understanding and representing the semantics of large structured documents

Understanding large, structured documents like scholarly articles, requests for proposals or business reports is a complex and difficult task. It involves discovering a document's overall purpose and subject(s), understanding the function…

Computation and Language · Computer Science 2018-07-27 Muhammad Mahbubur Rahman, Tim Finin

Document classification methods

Information on different fields which are collected by users requires appropriate management and organization to be structured in a standard way and retrieved fast and more easily. Document classification is a conventional method to…

Information Retrieval · Computer Science 2019-09-18 Madjid Khalilian, Shiva Hassanzadeh

Graph-based Semantical Extractive Text Analysis

In the past few decades, there has been an explosion in the amount of available data produced from various sources with different topics. The availability of this enormous data necessitates us to adopt effective computational tools to…

Computation and Language · Computer Science 2022-12-20 Mina Samizadeh

Keywords lie far from the mean of all words in local vector space

Keyword extraction is an important document process that aims at finding a small set of terms that concisely describe a document's topics. The most popular state-of-the-art unsupervised approaches belong to the family of the graph-based…

Computation and Language · Computer Science 2020-08-24 Eirini Papagiannopoulou, Grigorios Tsoumakas, Apostolos N. Papadopoulos

Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization

We consider the problem of modeling the content structure of texts within a specific domain, in terms of the topics the texts address and the order in which these topics appear. We first present an effective knowledge-lean method for…

Computation and Language · Computer Science 2007-05-23 Regina Barzilay, Lillian Lee

Plans for Evaluating Structured Generative Search Summaries

We propose a framework for evaluating structured generative search summaries that are placed atop organic web search results. A structured summary, generated by a large language model, typically consists of an overview, several sections…

Information Retrieval · Computer Science 2026-05-27 Tetsuya Sakai, Jina Lee, Hanpei Fang, Young-In Song

Making Sense of Unstructured Text Data

Many network analysis tasks in social sciences rely on pre-existing data sources that were created with explicit relations or interactions between entities under consideration. Examples include email logs, friends and followers networks on…

Social and Information Networks · Computer Science 2017-04-20 Lin Li, William M. Campbell, Cagri Dagli, Joseph P. Campbell

Integrating Unstructured Text into Causal Inference: Empirical Evidence from Real Data

Causal inference, a critical tool for informing business decisions, traditionally relies heavily on structured data. However, in many real-world scenarios, such data can be incomplete or unavailable. This paper presents a framework that…

Machine Learning · Computer Science 2026-02-17 Boning Zhou, Ziyu Wang, Han Hong, Haoqi Hu

Topological Sort for Sentence Ordering

Sentence ordering is the task of arranging the sentences of a given text in the correct order. Recent work using deep neural networks for this task has framed it as a sequence prediction problem. In this paper, we propose a new framing of…

Computation and Language · Computer Science 2020-05-04 Shrimai Prabhumoye, Ruslan Salakhutdinov, Alan W Black

A Personalized Reinforcement Learning Summarization Service for Learning Structure from Unstructured Data

The exponential growth of textual data has created a crucial need for tools that assist users in extracting meaningful insights. Traditional document summarization approaches often fail to meet individual user requirements and lack…

Information Retrieval · Computer Science 2023-07-13 Samira Ghodratnama, Amin Beheshti, Mehrdad Zakershahrak