Related papers: Discover Aggregates Exceptions over Hidden Web Dat…
Many databases on the web are "hidden" behind (i.e., accessible only through) their restrictive, form-like, search interfaces. Recent studies have shown that it is possible to estimate aggregate query answers over such hidden web databases…
A hidden database refers to a dataset that an organization makes accessible on the web by allowing users to issue queries through a search interface. In other words, data acquisition from such a source is not by following static…
Data-mining techniques have frequently been developed for Spontaneous reporting databases. These techniques aim to find adverse drug events accurately and efficiently. Spontaneous reporting databases are prone to missing information, under…
The article describes a practical method for detecting outlier database connections in real-time. Outlier connections are detected with a specified level of confidence. The method is based on generalized security rules and a simple but…
Today, data analysts largely rely on intuition to determine whether missing or withheld rows of a dataset significantly affect their analyses. We propose a framework that can produce automatic contingency analysis, i.e., the range of values…
The increasing connectivity of data and cyber-physical systems has resulted in a growing number of cyber-attacks. Real-time detection of such attacks, through the identification of anomalous activity, is required so that mitigation and…
Efficient search operations in databases are paramount for timely retrieval of information various applications. This research introduces a novel approach, combining dynamicalgorithm1 selection and caching2 strategies, to optimize search…
Certain answers are a principled method for coping with the uncertainty that arises in many practical data management tasks. Unfortunately, this method is expensive and may exclude useful (if uncertain) answers. Prior work introduced…
Many web databases are "hidden" behind proprietary search interfaces that enforce the top-$k$ output constraint, i.e., each query returns at most $k$ of all matching tuples, preferentially selected and returned according to a proprietary…
Crawler-based search engines are the mostly used search engines among web and Internet users, involve web crawling, storing in database, ranking, indexing and displaying to the user. But it is noteworthy that because of increasing changes…
Current extragalactic databases are reviewed, including object-oriented databases, astronomical catalogues and compilations, as well as image archives and object catalogues from large-scale surveys. One challenge of the future will be to…
Anomaly detection methods are part of the systems where rare events may endanger an operation's profitability, safety, and environmental aspects. Although many state-of-the-art anomaly detection methods were developed to date, their…
Differential privacy (DP) is the state-of-the-art and rigorous notion of privacy for answering aggregate database queries while preserving the privacy of sensitive information in the data. In today's era of data analysis, however, it poses…
In an era of ubiquitous large-scale streaming data, the availability of data far exceeds the capacity of expert human analysts. In many settings, such data is either discarded or stored unprocessed in datacenters. This paper proposes a…
In recent years, an increasing amount of data is collected in different and often, not cooperative, databases. The problem of privacy-preserving, distributed calculations over separated databases and, a relative to it, issue of private data…
We study the design of differentially private algorithms for adaptive analysis of dynamically growing databases, where a database accumulates new data entries while the analysis is ongoing. We provide a collection of tools for machine…
Monitoring network traffic data to detect any hidden patterns of anomalies is a challenging and time-consuming task that requires high computing resources. To this end, an appropriate summarization technique is of great importance, where it…
Data leakage and theft from databases is a dangerous threat to organizations. Data Security and Data Privacy protection systems (DSDP) monitor data access and usage to identify leakage or suspicious activities that should be investigated.…
Many text databases on the web are "hidden" behind search interfaces, and their documents are only accessible through querying. Search engines typically ignore the contents of such search-only databases. Recently, Yahoo-like directories…
The growing expanse of e-commerce and the widespread availability of online databases raise many fears regarding loss of privacy and many statistical challenges. Even with encryption and other nominal forms of protection for individual…