Related papers: Unsupervised Data Extraction from Computer-generat…

200 papers

Automating information extraction from form-like documents at scale is a pressing need due to its potential impact on automating business workflows across many industries like financial services, insurance, and healthcare. The key challenge…

Machine Learning · Computer Science 2022-01-14 Beliz Gunel, Navneet Potti, Sandeep Tata, James B. Wendt, Marc Najork, Jing Xie

Procedures are an important knowledge component of documents that can be leveraged by cognitive assistants for automation, question-answering or driving a conversation. It is a challenging problem to parse big dense documents like product…

Artificial Intelligence · Computer Science 2020-10-21 Shivali Agarwal, Shubham Atreja, Vikas Agarwal

Keyphrase extraction aims at automatically extracting a list of "important" phrases representing the key concepts in a document. Prior approaches for unsupervised keyphrase extraction resorted to heuristic notions of phrase importance via…

Computation and Language · Computer Science 2023-02-20 Rishabh Joshi, Vidhisha Balachandran, Emily Saldanha, Maria Glenski, Svitlana Volkova, Yulia Tsvetkov

Improving data quality in unstructured documents is a long-standing challenge. Unstructured data, especially in textual form, inherently lacks defined semantics, which poses significant challenges for effective processing and for ensuring…

Databases · Computer Science 2025-02-26 Besat Kassaie, Frank Wm. Tompa

This technical memo describes Information Extraction from the point-of-view of a potential user of the technology. No knowledge of language processing is assumed. Information Extraction is a process which takes unseen texts as input and…

cmp-lg · Computer Science 2008-02-03 Hamish Cunningham

Keyphrase extraction is the task of automatically selecting a small set of phrases that best describe a given free text document. Supervised keyphrase extraction requires large amounts of labeled training data and generalizes very poorly…

Computation and Language · Computer Science 2018-09-07 Kamil Bennani-Smires, Claudiu Musat, Andreea Hossmann, Michael Baeriswyl, Martin Jaggi

While humans can extract information from unstructured text with high precision and recall, this is often too time-consuming to be practical. Automated approaches, on the other hand, produce nearly-immediate results, but may not be reliable…

Computation and Language · Computer Science 2023-02-21 Bradley Butcher, Miri Zilka, Darren Cook, Jiri Hron, Adrian Weller

Extracting information from documents usually relies on natural language processing methods working on one-dimensional sequences of text. In some cases, for example, for the extraction of key information from semi-structured documents, such…

Computation and Language · Computer Science 2021-06-29 Oliver Bensch, Mirela Popa, Constantin Spille

Information extraction (IE) from unstructured documents remains a critical challenge in data processing pipelines. Traditional optical character recognition (OCR) methods and conventional parsing engines demonstrate limited effectiveness…

Computer Vision and Pattern Recognition · Computer Science 2025-07-28 Aditya Parikh

With the recent developments in digitisation, there is an increasing number of documents available online. There are several information extraction tools that are available to extract information from digitised documents. However, identifying…

Information Retrieval · Computer Science 2021-11-08 Richi Nayak, Thirunavukarasu Balasubramaniam, Sangeetha Kutty, Sachindra Banduthilaka, Erin Peterson

The power of natural language generation models has provoked a flurry of interest in automatic methods to detect if a piece of text is human or machine-authored. The problem so far has been framed in a standard supervised way and consists…

Computation and Language · Computer Science 2021-11-05 Matthias Gallé, Jos Rozen, Germán Kruszewski, Hady Elsahar

Advances in large language models have notably enhanced the efficiency of information extraction from unstructured and semi-structured data sources. As these technologies become integral to various applications, establishing an objective…

Recently, automatically extracting information from visually rich documents (e.g., tickets and resumes) has become a hot and vital research topic due to its widespread commercial value. Most existing methods divide this task into two…

Computer Vision and Pattern Recognition · Computer Science 2022-07-15 Zhanzhan Cheng, Peng Zhang, Can Li, Qiao Liang, Yunlu Xu, Pengfei Li, Shiliang Pu, Yi Niu, Fei Wu

Many documents, which we call templatized documents, are programmatically generated by populating fields in a visual template. Effective data extraction from these documents is crucial to supporting downstream analytical tasks. Current data…

Databases · Computer Science 2025-01-14 Yiming Lin, Mawil Hasan, Rohan Kosalge, Alvin Cheung, Aditya G. Parameswaran

In recent years, text summarization methods have again attracted much attention thanks to research on neural network models. Most of the current text summarization methods based on neural network models are supervised methods which…

Computation and Language · Computer Science 2024-01-25 Dehao Tao, Yingzhu Xiong, Zhongliang Yang, Yongfeng Huang

We present a supervised learning approach for automatic extraction of keyphrases from single documents. Our solution uses simple-to-compute statistical and positional features of candidate phrases and does not rely on any external knowledge…

Information Retrieval · Computer Science 2024-04-12 Sriraghavendra Ramaswamy

Since real-world ubiquitous documents (e.g., invoices, tickets, resumes and leaflets) contain rich information, automatic document image understanding has become a hot topic. Most existing works decouple the problem into two separate tasks,…

Computer Vision and Pattern Recognition · Computer Science 2021-10-26 Peng Zhang, Yunlu Xu, Zhanzhan Cheng, Shiliang Pu, Jing Lu, Liang Qiao, Yi Niu, Fei Wu

We introduce an unsupervised discriminative model for the task of retrieving experts in online document collections. We exclusively employ textual evidence and avoid explicit feature engineering by learning distributed word representations…

Information Retrieval · Computer Science 2017-09-19 Christophe Van Gysel, Maarten de Rijke, Marcel Worring

Efficiently identifying keyphrases that represent a given document is a challenging task. In recent years, a plethora of keyword detection approaches have been proposed. These approaches can be based on statistical (frequency-based) properties…

Information Retrieval · Computer Science 2023-12-25 Blaž Škrlj, Boshko Koloski, Senja Pollak

Keyphrase extraction is a textual information processing task concerned with the automatic extraction of representative and characteristic phrases from a document that express all the key aspects of its content. Keyphrases constitute a…

Computation and Language · Computer Science 2019-07-31 Eirini Papagiannopoulou, Grigorios Tsoumakas