Aggregate Estimation Over Dynamic Hidden Web Databases
Abstract
Many databases on the web are "hidden" behind (i.e., accessible only through) their restrictive, form-like search interfaces. Recent studies have shown that it is possible to estimate aggregate query answers over such hidden web databases by issuing a small number of carefully designed search queries through the restrictive web interface. A problem with this existing work, however, is that it assumes the underlying database to be static, while most real-world web databases (e.g., Amazon, eBay) are frequently updated. In this paper, we study the novel problem of estimating and tracking aggregates over dynamic hidden web databases while adhering to the stringent query-cost limitations they enforce (e.g., at most 1,000 search queries per day). Theoretical analysis and extensive real-world experiments demonstrate the effectiveness of our proposed algorithms and their superiority over baseline solutions (e.g., the repeated execution of algorithms designed for static web databases).
Cite
@article{arxiv.1403.2763,
title = {Aggregate Estimation Over Dynamic Hidden Web Databases},
author = {Weimo Liu and Saravanan Thirumuruganathan and Nan Zhang and Gautam Das},
journal = {arXiv preprint arXiv:1403.2763},
year = {2014}
}