Related papers: Hypertext Entity Extraction in Webpage
Entity extraction is an important task in text mining and natural language processing. A popular method for entity extraction is by comparing substrings from free text against a dictionary of entities. In this paper, we present several…
Understanding the semantic meaning of content on the web through the lens of entities and concepts has many practical advantages. However, when building large-scale entity extraction systems, practitioners are facing unique challenges…
Recent advances in machine learning have significantly impacted the field of information extraction, with Language Models (LMs) playing a pivotal role in extracting structured information from unstructured text. Prior works typically…
Entity extraction is fundamental to many text mining tasks such as organisation name recognition. A popular approach to entity extraction is based on matching sub-string candidates in a document against a dictionary of entities. To handle…
Entity extraction is a key technology for obtaining information from massive texts in natural language processing. The further interaction between them does not meet the standards of human reading comprehension, thus limiting the…
Knowledge is captured in the form of entities and their relationships and stored in knowledge graphs. Knowledge graphs enhance the capabilities of applications in many different areas including Web search, recommendation, and natural…
Article comprehension is an important challenge in natural language processing with many applications such as article generation or image-to-article retrieval. Prior work typically encodes all tokens in articles uniformly using pretrained…
Document-level entity-based extraction (EE), aiming at extracting entity-centric information such as entity roles and entity relations, is key to automatic knowledge acquisition from text corpora for various domains. Most document-level EE…
Entity linking is the task of identifying mentions of entities in text, and linking them to entries in a knowledge base. This task is especially difficult in microblogs, as there is little additional text to provide disambiguating context;…
Retrieving images from natural language descriptions is a core task at the intersection of computer vision and natural language processing, with wide-ranging applications in search engines, media archiving, and digital content management.…
In text documents such as news articles, the content and key events usually revolve around a subset of all the entities mentioned in a document. These entities, often deemed as salient entities, provide useful cues of the aboutness of a…
The information available on web pages mostly contains semi-structured text documents which are represented either in XML, or HTML, or XHTML format that lacks formatted document structure. The document does not discriminate between the text…
With the rapid development of Internet technology, people have more and more access to a variety of web page resources. At the same time, the current rapid development of deep learning technology is often inseparable from the huge amount of…
In this paper, we present an end-to-end joint entity and relation extraction approach based on transformer-based language models. We apply the model to the task of linking mathematical symbols to their descriptions in LaTeX documents. In…
Detection and disambiguation of all entities in text is a crucial task for a wide range of applications. The typical formulation of the problem involves two stages: detect mention boundaries and link all mentions to a knowledge base. For a…
Events and entities are closely related; entities are often actors or participants in events and events without entities are uncommon. The interpretation of events and entities is highly contextually dependent. Existing work in information…
Understanding key insights from full-text scholarly articles is essential as it enables us to determine interesting trends, give insight into the research and development, and build knowledge graphs. However, some of the interesting key…
Extracting structured and grounded fact triples from raw text is a fundamental task in Information Extraction (IE). Existing IE datasets are typically collected from Wikipedia articles, using hyperlinks to link entities to the Wikidata…
In document-level relation extraction, entities may appear multiple times in a document, and their relationships can shift from one context to another. Accurate prediction of the relationship between two entities across an entire document…
The hypergraph-of-entity was conceptually proposed as a general model for entity-oriented search. However, only the performance for ad hoc document retrieval had been assessed. We continue this line of research by also evaluating ad hoc…