Related papers: Hypertext Entity Extraction in Webpage

Entity Extraction with Knowledge from Web Scale Corpora

Entity extraction is an important task in text mining and natural language processing. A popular method for entity extraction is by comparing substrings from free text against a dictionary of entities. In this paper, we present several…

Computation and Language · Computer Science 2019-11-22 Zeyi Wen, Zeyu Huang, Rui Zhang

A Web Scale Entity Extraction System

Understanding the semantic meaning of content on the web through the lens of entities and concepts has many practical advantages. However, when building large-scale entity extraction systems, practitioners are facing unique challenges…

Computation and Language · Computer Science 2021-10-04 Xuanting Cai, Quanbin Ma, Pan Li, Jianyu Liu, Qi Zeng, Zhengkan Yang, Pushkar Tripathi

Learning to Extract Structured Entities Using Language Models

Recent advances in machine learning have significantly impacted the field of information extraction, with Language Models (LMs) playing a pivotal role in extracting structured information from unstructured text. Prior works typically…

Computation and Language · Computer Science 2024-10-03 Haolun Wu, Ye Yuan, Liana Mikaelyan, Alexander Meulemans, Xue Liu, James Hensman, Bhaskar Mitra

A Technical Report: Entity Extraction using Both Character-based and Token-based Similarity

Entity extraction is fundamental to many text mining tasks such as organisation name recognition. A popular approach to entity extraction is based on matching sub-string candidates in a document against a dictionary of entities. To handle…

Databases · Computer Science 2017-02-14 Zeyi Wen, Dong Deng, Rui Zhang, Kotagiri Ramamohanarao

A New Entity Extraction Method Based on Machine Reading Comprehension

Entity extraction is a key technology for obtaining information from massive texts in natural language processing. The further interaction between them does not meet the standards of human reading comprehension, thus limiting the…

Computation and Language · Computer Science 2021-08-23 Xiaobo Jiang, Kun He, Jiajun He, Guangyu Yan

Entity Context Graph: Learning Entity Representations from Semi-Structured Textual Sources on the Web

Knowledge is captured in the form of entities and their relationships and stored in knowledge graphs. Knowledge graphs enhance the capabilities of applications in many different areas including Web search, recommendation, and natural…

Machine Learning · Computer Science 2021-03-31 Kalpa Gunaratna, Yu Wang, Hongxia Jin

Show, Write, and Retrieve: Entity-aware Article Generation and Retrieval

Article comprehension is an important challenge in natural language processing with many applications such as article generation or image-to-article retrieval. Prior work typically encodes all tokens in articles uniformly using pretrained…

Computation and Language · Computer Science 2023-10-24 Zhongping Zhang, Yiwen Gu, Bryan A. Plummer

Document-level Entity-based Extraction as Template Generation

Document-level entity-based extraction (EE), aiming at extracting entity-centric information such as entity roles and entity relations, is key to automatic knowledge acquisition from text corpora for various domains. Most document-level EE…

Computation and Language · Computer Science 2021-09-13 Kung-Hsiang Huang, Sam Tang, Nanyun Peng

Toward Socially-Infused Information Extraction: Embedding Authors, Mentions, and Entities

Entity linking is the task of identifying mentions of entities in text, and linking them to entries in a knowledge base. This task is especially difficult in microblogs, as there is little additional text to provide disambiguating context;…

Computation and Language · Computer Science 2016-09-27 Yi Yang, Ming-Wei Chang, Jacob Eisenstein

Leveraging Lightweight Entity Extraction for Scalable Event-Based Image Retrieval

Retrieving images from natural language descriptions is a core task at the intersection of computer vision and natural language processing, with wide-ranging applications in search engines, media archiving, and digital content management.…

Computer Vision and Pattern Recognition · Computer Science 2025-12-25 Dao Sy Duy Minh, Huynh Trung Kiet, Nguyen Lam Phu Quy, Phu-Hoa Pham, Tran Chi Nguyen

Leveraging Contextual Information for Effective Entity Salience Detection

In text documents such as news articles, the content and key events usually revolve around a subset of all the entities mentioned in a document. These entities, often deemed as salient entities, provide useful cues of the aboutness of a…

Computation and Language · Computer Science 2024-04-04 Rajarshi Bhowmik, Marco Ponza, Atharva Tendle, Anant Gupta, Rebecca Jiang, Xingyu Lu, Qian Zhao, Daniel Preotiuc-Pietro

Extraction of Core Contents from Web Pages

The information available on web pages mostly contains semi-structured text documents which are represented either in XML, or HTML, or XHTML format that lacks formatted document structure. The document does not discriminate between the text…

Information Retrieval · Computer Science 2014-03-11 Sandeep Sirsat

Web Page Content Extraction Based on Multi-feature Fusion

With the rapid development of Internet technology, people have more and more access to a variety of web page resources. At the same time, the current rapid development of deep learning technology is often inseparable from the huge amount of…

Information Retrieval · Computer Science 2022-10-27 Bowen Yu, Junping Du, Yingxia Shao

AIFB-WebScience at SemEval-2022 Task 12: Relation Extraction First -- Using Relation Extraction to Identify Entities

In this paper, we present an end-to-end joint entity and relation extraction approach based on transformer-based language models. We apply the model to the task of linking mathematical symbols to their descriptions in LaTeX documents. In…

Computation and Language · Computer Science 2022-05-05 Nicholas Popovic, Walter Laurito, Michael Färber

Entity Tagging: Extracting Entities in Text Without Mention Supervision

Detection and disambiguation of all entities in text is a crucial task for a wide range of applications. The typical formulation of the problem involves two stages: detect mention boundaries and link all mentions to a knowledge base. For a…

Information Retrieval · Computer Science 2022-09-14 Christina Du, Kashyap Popat, Louis Martin, Fabio Petroni

Joint Extraction of Events and Entities within a Document Context

Events and entities are closely related; entities are often actors or participants in events and events without entities are uncommon. The interpretation of events and entities is highly contextually dependent. Existing work in information…

Computation and Language · Computer Science 2016-09-14 Bishan Yang, Tom Mitchell

Lessons from Deep Learning applied to Scholarly Information Extraction: What Works, What Doesn't, and Future Directions

Understanding key insights from full-text scholarly articles is essential as it enables us to determine interesting trends, give insight into the research and development, and build knowledge graphs. However, some of the interesting key…

Information Retrieval · Computer Science 2022-07-11 Raquib Bin Yousuf, Subhodip Biswas, Kulendra Kumar Kaushal, James Dunham, Rebecca Gelles, Sathappan Muthiah, Nathan Self, Patrick Butler, Naren Ramakrishnan

WebIE: Faithful and Robust Information Extraction on the Web

Extracting structured and grounded fact triples from raw text is a fundamental task in Information Extraction (IE). Existing IE datasets are typically collected from Wikipedia articles, using hyperlinks to link entities to the Wikidata…

Computation and Language · Computer Science 2023-06-16 Chenxi Whitehouse, Clara Vania, Alham Fikri Aji, Christos Christodoulopoulos, Andrea Pierleoni

Multi-Relation Extraction in Entity Pairs using Global Context

In document-level relation extraction, entities may appear multiple times in a document, and their relationships can shift from one context to another. Accurate prediction of the relationship between two entities across an entire document…

Computation and Language · Computer Science 2025-08-01 Nilesh, Atul Gupta, Avinash C Panday

Hypergraph-of-Entity: A General Model for Entity-Oriented Search

The hypergraph-of-entity was conceptually proposed as a general model for entity-oriented search. However, only the performance for ad hoc document retrieval had been assessed. We continue this line of research by also evaluating ad hoc…

Information Retrieval · Computer Science 2021-09-02 José Devezas, Sérgio Nunes