Related papers: WebRelate: Integrating Web Data with Spreadsheets …

WebSRC: A Dataset for Web-Based Structural Reading Comprehension

Web search is an essential way for humans to obtain information, but it's still a great challenge for machines to understand the contents of web pages. In this paper, we introduce the task of structural reading comprehension (SRC) on web.…

Computation and Language · Computer Science · 2021-11-09 · Xingyu Chen, Zihan Zhao, Lu Chen, Danyang Zhang, Jiabao Ji, Ao Luo, Yuxuan Xiong, Kai Yu

WebRED: Effective Pretraining And Finetuning For Relation Extraction On The Web

Relation extraction is used to populate knowledge bases that are important to many applications. Prior datasets used to train relation extraction models either suffer from noisy labels due to distant supervision, are limited to certain…

Computation and Language · Computer Science · 2021-02-22 · Robert Ormandi, Mohammad Saleh, Erin Winter, Vinay Rao

Relationships are Complicated! An Analysis of Relationships Between Datasets on the Web

The Web today has millions of datasets, and the number of datasets continues to grow at a rapid pace. These datasets are not standalone entities; rather, they are intricately connected through complex relationships. Semantic relationships…

Information Retrieval · Computer Science · 2024-08-28 · Kate Lin, Tarfah Alrashed, Natasha Noy

Recommending Related Tables

Tables are an extremely powerful visual and interactive tool for structuring and manipulating data, making spreadsheet programs one of the most popular computer applications. In this paper we introduce and address the task of recommending…

Information Retrieval · Computer Science · 2019-07-26 · Shuo Zhang, Krisztian Balog

Maximizing Relation Extraction Potential: A Data-Centric Study to Unveil Challenges and Opportunities

Relation extraction is a Natural Language Processing task that aims to extract relationships from textual data. It is a critical step for information extraction. Due to its wide-scale applicability, research in relation extraction has…

Computation and Language · Computer Science · 2024-11-27 · Anushka Swarup, Avanti Bhandarkar, Olivia P. Dizon-Paradis, Ronald Wilson, Damon L. Woodard

An Annotated Corpus of Webtables for Information Extraction Tasks

Information Extraction is a well-researched area of Natural Language Processing with applications in web search and question answering concerned with identifying entities and relationships between them as expressed in a given context,…

Information Retrieval · Computer Science · 2020-11-17 · Erin Macdonald, Denilson Barbosa

Public Data Integration with WebSmatch

Integrating open data sources can yield high value information but raises major problems in terms of metadata extraction, data source integration and visualization of integrated data. In this paper, we describe WebSmatch, a flexible…

Digital Libraries · Computer Science · 2012-05-16 · R. Coletta, E. Castanier, P. Valduriez, C. Frisch, D. Ngo, Z. Bellahsene

Web Data Extraction, Applications and Techniques: A Survey

Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and…

Information Retrieval · Computer Science · 2017-03-07 · Emilio Ferrara, Pasquale De Meo, Giacomo Fiumara, Robert Baumgartner

RTE: A Tool for Annotating Relation Triplets from Text

In this work, we present a Web-based annotation tool 'Relation Triplets Extractor' (RTE; https://abera87.github.io/annotate/) for annotating relation triplets from the text. Relation extraction is an important task for extracting…

Computation and Language · Computer Science · 2021-08-19 · Ankan Mullick, Animesh Bera, Tapas Nayak

WebDS: An End-to-End Benchmark for Web-based Data Science

Many real-world data science tasks involve complex web-based interactions: finding appropriate data available on the internet, synthesizing multimodal data from different locations, and producing summarized analyses. Existing web benchmarks…

Computation and Language · Computer Science · 2026-03-05 · Ethan Hsu, Hong Meng Yam, Ines Bouissou, Aaron Murali John, Raj Thota, Josh Koe, Vivek Sarath Putta, G K Dharesan, Alexander Spangher, Shikhar Murty, Tenghao Huang, Christopher D. Manning

Relating Web pages to enable information-gathering tasks

We argue that relationships between Web pages are functions of the user's intent. We identify a class of Web tasks (information-gathering) that can be facilitated by a search engine that provides links to pages which are related to the…

Information Retrieval · Computer Science · 2010-05-20 · Amitabha Bagchi, Garima Lahoti

Content-Based Table Retrieval for Web Queries

Understanding the connections between unstructured text and semi-structured table is an important yet neglected problem in natural language processing. In this work, we focus on content-based table retrieval. Given a query, the task is to…

Computation and Language · Computer Science · 2017-06-09 · Zhao Yan, Duyu Tang, Nan Duan, Junwei Bao, Yuanhua Lv, Ming Zhou, Zhoujun Li

Indexing Data on the Web: A Comparison of Schema-level Indices for Data Search -- Extended Technical Report

Indexing the Web of Data offers many opportunities, in particular, to find and explore data sources. One major design decision when indexing the Web of Data is to find a suitable index model, i.e., how to index and summarize data. Various…

Databases · Computer Science · 2020-06-15 · Till Blume, Ansgar Scherp

Web Service Interface for Data Collection

Data collection is a key component of an information system. The widespread penetration of ICT tools in organizations and institutions has resulted in a shift in the way the data is collected. Data may be collected in printed-form, by…

Computers and Society · Computer Science · 2013-03-27 · Ruchika Thukral, Anita Goel

ObjTables: structured spreadsheets that promote data quality, reuse, and integration

A central challenge in science is to understand how systems behaviors emerge from complex networks. This often requires aggregating, reusing, and integrating heterogeneous information. Supplementary spreadsheets to articles are a key data…

Databases · Computer Science · 2020-08-10 · Jonathan R. Karr, Wolfram Liebermeister, Arthur P. Goldberg, John A. P. Sekar, Bilal Shaikh

Fusing Data with Correlations

Many applications rely on Web data and extraction systems to accomplish knowledge-driven tasks. Web information is not curated, so many sources provide inaccurate, or conflicting information. Moreover, extraction systems introduce…

Databases · Computer Science · 2015-03-03 · Ravali Pochampally, Anish Das Sarma, Xin Luna Dong, Alexandra Meliou, Divesh Srivastava

DataJoint: A Simpler Relational Data Model

The relational data model offers unrivaled rigor and precision in defining data structure and querying complex data. Yet the use of relational databases in scientific data pipelines is limited due to their perceived unwieldiness. We propose…

Databases · Computer Science · 2018-07-31 · Dimitri Yatsenko, Edgar Y. Walker, Andreas S. Tolias

Untidy Data: The Unreasonable Effectiveness of Tables

Working with data in table form is usually considered a preparatory and tedious step in the sensemaking pipeline; a way of getting the data ready for more sophisticated visualization and analytical tools. But for many people, spreadsheets…

Human-Computer Interaction · Computer Science · 2021-06-30 · Lyn Bartram, Michael Correll, Melanie Tory

WebDART: Dynamic Decomposition and Re-planning for Complex Web Tasks

Large language model (LLM) agents are becoming competent at straightforward web tasks, such as opening an item page or submitting a form, but still struggle with objectives that require long horizon navigation, large scale information…

Artificial Intelligence · Computer Science · 2025-10-09 · Jingbo Yang, Bairu Hou, Wei Wei, Shiyu Chang, Yujia Bao

Document-Level Relation Extraction with Relation Correlation Enhancement

Document-level relation extraction (DocRE) is a task that focuses on identifying relations between entities within a document. However, existing DocRE models often overlook the correlation between relations and lack a quantitative analysis…

Information Retrieval · Computer Science · 2023-10-23 · Yusheng Huang, Zhouhan Lin