Related papers: Reference String Extraction Using Line-Based Condi…

Citation Data-set for Machine Learning Citation Styles and Entity Extraction from Citation Strings

Citation parsing is fundamental for search engines within academia and the protection of intellectual property. Meticulous extraction is further needed when evaluating the similarity of documents and calculating their citation impact.…

Digital Libraries · Computer Science 2018-05-23 Niall Martin Ryan

An Integrated, Conditional Model of Information Extraction and Coreference with Applications to Citation Matching

Although information extraction and coreference resolution appear together in many applications, most current systems perform them as independent steps. This paper describes an approach to integrated inference for extraction and coreference…

Machine Learning · Computer Science 2012-07-19 Ben Wellner, Andrew McCallum, Fuchun Peng, Michael Hay

Learning Soft Linear Constraints with Application to Citation Field Extraction

Accurately segmenting a citation string into fields for authors, titles, etc. is a challenging task because the output typically obeys various global constraints. Previous work has shown that modeling soft constraints, where the model is…

Computation and Language · Computer Science 2014-10-20 Sam Anzaroot, Alexandre Passos, David Belanger, Andrew McCallum

A Linguistic Model for Terminology Extraction based Conditional Random Fields

In this paper, we show the possibility of using a linear Conditional Random Fields (CRF) for terminology extraction from a specialized text corpus.

Computation and Language · Computer Science 2014-02-18 Fethi Fkih, Mohamed Nazih Omri, Imen Toumia

A Simple Extraction Procedure for Bibliographical Author Field

A procedure for bibliographic author metadata extraction from scholarly texts is presented. The author segments are identified based on capitalization and line break patterns. Two main author layout templates, which can retrieve from a…

Digital Libraries · Computer Science 2009-02-05 Pere Constans

Partition-based Field Normalization: An approach to highly specialized publication records

Field normalized citation rates are well-established indicators for research performance from the broadest aggregation levels such as countries, down to institutes and research teams. When applied to still more specialized publication sets…

Digital Libraries · Computer Science 2013-07-26 Nadine Rons

EXmatcher: Combining Features Based on Reference Strings and Segments to Enhance Citation Matching

Citation matching is a challenging task due to different problems such as the variety of citation styles, mistakes in reference strings and the quality of identified reference segments. The classic citation matching configuration used in…

Digital Libraries · Computer Science 2019-06-12 Behnam Ghavimi, Wolfgang Otto, Philipp Mayr

Relation extraction from clinical texts using domain invariant convolutional neural network

In recent years, extracting relevant information from biomedical and clinical texts such as research articles, discharge summaries, or electronic health records has been a subject of many research efforts and shared challenges. Relation…

Computation and Language · Computer Science 2016-07-01 Sunil Kumar Sahu, Ashish Anand, Krishnadev Oruganty, Mahanandeeshwar Gattu

Sequence-Based Extractive Summarisation for Scientific Articles

This paper presents the results of research on supervised extractive text summarisation for scientific articles. We show that a simple sequential tagging model based only on the text within a document achieves high results against a simple…

Computation and Language · Computer Science 2022-04-08 Daniel Kershaw, Rob Koeling

Preference Learning in Terminology Extraction: A ROC-based approach

A key data preparation step in Text Mining, Term Extraction selects the terms, or collocation of words, attached to specific concepts. In this paper, the task of extracting relevant collocations is achieved through a supervised learning…

Machine Learning · Computer Science 2016-08-16 Jérôme Azé, Mathieu Roche, Yves Kodratoff, Michèle Sebag

Dictionary based methods for information extraction

In this paper we present a general method for information extraction that exploits the features of data compression techniques. We first define and focus our attention on the so-called "dictionary" of a sequence. Dictionaries are…

Statistical Mechanics · Physics 2009-11-10 A. Baronchelli, E. Caglioti, V. Loreto, E. Pizzi

Citation Recommendation: Approaches and Datasets

Citation recommendation describes the task of recommending citations for a given text. Due to the overload of published scientific works in recent years on the one hand, and the need to cite the most appropriate publications when writing…

Information Retrieval · Computer Science 2020-09-09 Michael Färber, Adam Jatowt

Data Augmentation Techniques for Process Extraction from Scientific Publications

We present data augmentation techniques for process extraction tasks in scientific publications. We cast the process extraction task as a sequence labeling task where we identify all the entities in a sentence and label them according to…

Computation and Language · Computer Science 2025-04-16 Yuni Susanti

Enhancing Keyphrase Extraction from Academic Articles with their Reference Information

With the development of Internet technology, the phenomenon of information overload is becoming more and more obvious. It takes a lot of time for users to obtain the information they need. However, keyphrases that summarize document…

Information Retrieval · Computer Science 2021-12-01 Chengzhi Zhang, Lei Zhao, Mengyuan Zhao, Yingyi Zhang

Generating Extractive Summaries of Scientific Paradigms

Researchers and scientists increasingly find themselves in the position of having to quickly understand large amounts of technical material. Our goal is to effectively serve this need by using bibliometric text mining and summarization…

Information Retrieval · Computer Science 2014-02-05 Vahed Qazvinian, Dragomir R. Radev, Saif M. Mohammad, Bonnie Dorr, David Zajic, Michael Whidby, Taesun Moon

Relation Extraction using Explicit Context Conditioning

Relation Extraction (RE) aims to label relations between groups of marked entities in raw text. Most current RE models learn context-aware representations of the target entities that are then used to establish relation between them. This…

Computation and Language · Computer Science 2019-02-26 Gaurav Singh, Parminder Bhatia

Which techniques does your application use?: An information extraction framework for scientific articles

Every field of research consists of multiple application areas with various techniques routinely used to solve problems in this wide range of application areas. With the exponential growth in research volumes, it has become difficult to…

Computation and Language · Computer Science 2016-08-24 Soham Dan, Sanyam Agarwal, Mayank Singh, Pawan Goyal, Animesh Mukherjee

CitationIE: Leveraging the Citation Graph for Scientific Information Extraction

Automatically extracting key information from scientific documents has the potential to help scientists work more efficiently and accelerate the pace of scientific progress. Prior work has considered extracting document-level entity…

Digital Libraries · Computer Science 2021-06-04 Vijay Viswanathan, Graham Neubig, Pengfei Liu

Are Triggers Needed for Document-Level Event Extraction?

Most existing work on event extraction has focused on sentence-level texts and presumes the identification of a trigger-span -- a word or phrase in the input that evokes the occurrence of an event of interest. Event arguments are then…

Computation and Language · Computer Science 2025-06-30 Shaden Shaar, Wayne Chen, Maitreyi Chatterjee, Barry Wang, Wenting Zhao, Claire Cardie

ACM-CR: A Manually Annotated Test Collection for Citation Recommendation

Citation recommendation is intended to assist researchers in the process of searching for relevant papers to cite by recommending appropriate citations for a given input text. Existing test collections for this task are noisy and unreliable…

Information Retrieval · Computer Science 2021-08-18 Florian Boudin