Aggregate Estimation Over Dynamic Hidden Web Databases

Weimo Liu; Saravanan Thirumuruganathan; Nan Zhang; Gautam Das


Databases 2014-05-02 v2


Abstract

Many databases on the web are "hidden" behind (i.e., accessible only through) their restrictive, form-like search interfaces. Recent studies have shown that it is possible to estimate aggregate query answers over such hidden web databases by issuing a small number of carefully designed search queries through the restrictive web interface. A problem with this existing work, however, is that it assumes the underlying database to be static, while most real-world web databases (e.g., Amazon, eBay) are frequently updated. In this paper, we study the novel problem of estimating and tracking aggregates over dynamic hidden web databases while adhering to the stringent query-cost limits these databases enforce (e.g., at most 1,000 search queries per day). Theoretical analysis and extensive real-world experiments demonstrate the effectiveness of our proposed algorithms and their superiority over baseline solutions (e.g., the repeated execution of algorithms designed for static web databases).
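To make the baseline mentioned above concrete, here is a minimal sketch, under stated assumptions, of "repeated execution of a static algorithm" under a per-day query budget. It assumes a simulated top-k form interface over boolean attributes that returns at most k matches plus an overflow flag, and it uses a random drill-down COUNT(*) estimator of the kind used for static hidden databases; that choice of static estimator is an assumption, and this is not the authors' proposed dynamic algorithm. The names `TopKInterface`, `drill_down_count`, and `repeated_static_estimate` are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Row = Tuple[int, ...]


class TopKInterface:
    """Hypothetical simulated form interface (not a real API).

    Answers conjunctive equality queries over boolean attributes with at most
    k matching rows plus an overflow flag, charging one unit of cost per query.
    """

    def __init__(self, rows: List[Row], k: int):
        self.rows = rows
        self.k = k
        self.cost = 0

    def search(self, query: Dict[int, int]) -> Tuple[List[Row], bool]:
        self.cost += 1
        matches = [r for r in self.rows if all(r[a] == v for a, v in query.items())]
        return matches[: self.k], len(matches) > self.k


def drill_down_count(db: TopKInterface, num_attrs: int, rng: random.Random) -> float:
    """One random drill-down; an unbiased estimate of COUNT(*).

    The walk fixes attributes in order with random values and stops at the
    first non-overflowing query. Those stopping nodes partition the tuples, and
    each is reached with probability 2**-depth, so |result| / prob is unbiased.
    """
    query: Dict[int, int] = {}
    prob = 1.0  # probability of reaching the current query-tree node
    for attr in range(num_attrs + 1):
        results, overflow = db.search(query)
        if not overflow:
            return len(results) / prob  # complete answer (possibly empty)
        if attr < num_attrs:
            query[attr] = rng.randint(0, 1)
            prob *= 0.5
    raise ValueError("a fully specified query still overflows; increase k")


def repeated_static_estimate(db: TopKInterface, num_attrs: int,
                             budget: int, rng: random.Random) -> float:
    """Baseline: spend one round's budget on fresh drill-downs and average them.

    The last walk may overshoot the budget by at most num_attrs queries.
    """
    start = db.cost
    estimates: List[float] = []
    while db.cost - start < budget:
        estimates.append(drill_down_count(db, num_attrs, rng))
    return sum(estimates) / len(estimates)


if __name__ == "__main__":
    rng = random.Random(0)
    num_attrs = 10
    rows = [tuple(rng.randint(0, 1) for _ in range(num_attrs)) for _ in range(500)]
    db = TopKInterface(rows, k=50)
    for day in range(3):
        est = repeated_static_estimate(db, num_attrs, budget=200, rng=rng)
        print(f"day {day}: true={len(db.rows)} estimate={est:.1f}")
        # Simulate the database changing between rounds (insertions only).
        db.rows.extend(tuple(rng.randint(0, 1) for _ in range(num_attrs)) for _ in range(100))
```

Each round's estimate is computed from scratch, so the queries spent on earlier rounds are never reused even when the database changes only slightly. The abstract reports that the proposed algorithms outperform this kind of baseline; their details are not given in this section.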

Keywords

relational database, association rule mining, web browsing

Cite

@article{arxiv.1403.2763,
  title  = {Aggregate Estimation Over Dynamic Hidden Web Databases},
  author = {Weimo Liu and Saravanan Thirumuruganathan and Nan Zhang and Gautam Das},
  journal= {arXiv preprint arXiv:1403.2763},
  year   = {2014}
}
