Related papers: Object Recognition from Scientific Document based …

200 papers

With the widespread use of the internet, it has become increasingly crucial to extract specific information from vast amounts of academic articles efficiently. Data mining techniques are generally employed to solve this issue. However, data…

Computer Vision and Pattern Recognition · Computer Science 2024-07-04 Jinghong Li, Koichi Ota, Wen Gu, Shinobu Hasegawa

We present an approach for adapting convolutional neural networks for object recognition and classification to scientific literature layout detection (SLLD), a shared subtask of several information extraction problems. Scientific…

Computer Vision and Pattern Recognition · Computer Science 2020-10-23 Huichen Yang, William H. Hsu

We study the problem of object detection over scanned images of scientific documents. We consider images that contain objects of varying aspect ratios and sizes and range from coarse elements such as tables and figures to fine elements such…

Computer Vision and Pattern Recognition · Computer Science 2019-10-31 Ankur Goswami, Joshua McGrath, Shanan Peters, Theodoros Rekatsinas

Current language understanding approaches focus on small documents, such as newswire articles, blog posts, product reviews and discussion forum entries. Understanding and extracting information from large documents like legal briefs,…

Computation and Language · Computer Science 2017-09-05 Muhammad Mahbubur Rahman, Tim Finin

This study explores three approaches to processing table data in scientific papers to enhance extractive question answering and develop a software tool for the systematic review process. The methods evaluated include: (1) Optical Character…

Information Retrieval · Computer Science 2025-08-27 Dongyoun Kim, Hyung-do Choi, Youngsun Jang, John Kim

Retrieving accurate details from documents is a crucial task, especially when handling a combination of scanned images and native digital formats. This document presents a combined framework for text extraction that merges Optical Character…

Computer Vision and Pattern Recognition · Computer Science 2025-06-16 Rasha Sinha, Rekha B S

Scientific documents contain tables that list important information in a concise fashion. Structure and content extraction from tables embedded within PDF research documents is a very challenging task due to the existence of visual features…

Information Retrieval · Computer Science 2022-11-01 Pratik Kayal, Mrinal Anand, Harsh Desai, Mayank Singh

This paper proposes OCR++, an open-source framework designed for a variety of information extraction tasks from scholarly articles including metadata (title, author names, affiliation and e-mail), structure (section headings and body text,…

Scientific retrieval is essential for advancing scientific knowledge discovery. Within this process, document reranking plays a critical role in refining first-stage retrieval results. However, standard LLM listwise reranking faces…

Information Retrieval · Computer Science 2025-08-19 Runchu Tian, Xueqiang Xu, Bowen Jin, SeongKu Kang, Jiawei Han

Scientific document understanding is challenging as the data is highly domain specific and diverse. However, datasets for tasks with scientific text require expensive manual annotation and tend to be small and limited to only one or a few…

Computation and Language · Computer Science 2021-05-26 Dustin Wright, Isabelle Augenstein

Understanding and extracting of information from large documents, such as business opportunities, academic articles, medical documents and technical reports, poses challenges not present in short documents. Such large documents may be…

Computation and Language · Computer Science 2019-10-10 Muhammad Mahbubur Rahman, Tim Finin

Distributed document representation is one of the basic problems in natural language processing. Currently distributed document representation methods mainly consider the context information of words or sentences. These methods do not take…

Computation and Language · Computer Science 2022-01-11 Shicheng Tan, Shu Zhao, Yanping Zhang

The scientific literature is growing faster than ever. Finding an expert in a particular scientific domain has never been as hard as today because of the increasing amount of publications and because of the ever growing diversity of…

Information Retrieval · Computer Science 2020-04-09 Robin Brochier, Antoine Gourru, Adrien Guille, Julien Velcin

We propose a new approach to extracting data items or field values from semi-structured documents. Examples of such problems include extracting passenger name, departure time and departure airport from a travel itinerary, or extracting…

Software Engineering · Computer Science 2022-04-12 Suresh Parthasarathy, Lincy Pattanaik, Anirudh Khatry, Arun Iyer, Arjun Radhakrishna, Sriram Rajamani, Mohammad Raza

Recently, automatically extracting information from visually rich documents (e.g., tickets and resumes) has become a hot and vital research topic due to its widespread commercial value. Most existing methods divide this task into two…

Computer Vision and Pattern Recognition · Computer Science 2022-07-15 Zhanzhan Cheng, Peng Zhang, Can Li, Qiao Liang, Yunlu Xu, Pengfei Li, Shiliang Pu, Yi Niu, Fei Wu

The exponential growth of scientific literature in PDF format necessitates advanced tools for efficient and accurate document understanding, summarization, and content optimization. Traditional methods fall short in handling complex layouts…

Computer Vision and Pattern Recognition · Computer Science 2025-08-12 Kun Qian, Wenjie Li, Tianyu Sun, Wenhong Wang, Wenhan Luo

Recent advancements in the area of Computer Vision with state-of-art Neural Networks has given a boost to Optical Character Recognition (OCR) accuracies. However, extracting characters/text alone is often insufficient for relevant…

Computer Vision and Pattern Recognition · Computer Science 2018-12-17 Vishwanath D, Rohit Rahul, Gunjan Sehgal, Swati, Arindam Chowdhury, Monika Sharma, Lovekesh Vig, Gautam Shroff, Ashwin Srinivasan

Effective science mapping relies on high-quality representations of scientific documents. As an important task in scientometrics and information studies, science mapping is often challenged by the complex and heterogeneous nature of…

Digital Libraries · Computer Science 2025-12-16 Zhentao Liang, Nees Jan van Eck, Xuehua Wu, Jin Mao, Gang Li

Information extraction (IE) from documents is an intensive area of research with a large set of industrial applications. Current state-of-the-art methods focus on scanned documents with approaches combining computer vision, natural language…

Computation and Language · Computer Science 2022-08-16 Ismail Oussaid, William Vanhuffel, Pirashanth Ratnamogan, Mhamed Hajaiej, Alexis Mathey, Thomas Gilles

The increasing volume of scholarly publications requires advanced tools for efficient knowledge discovery and management. This paper introduces ongoing work on a system using Large Language Models (LLMs) for the semantic extraction of key…

Digital Libraries · Computer Science 2025-10-07 Samy Ateia, Udo Kruschwitz, Melanie Scholz, Agnes Koschmider, Moayad Almohaishi