Related papers: Exact Selectivity Computation for Modern In-Memory…
Selectivity estimation is important in query optimization; however, accurate estimation is difficult when predicates are complex. Instead of relying on existing database synopses and statistics, which are not helpful for such cases, we introduce a new approach to…
Discrete optimization is a central problem in artificial intelligence. The optimization of the aggregated cost of a network of cost functions arises in a variety of problems including (W)CSP, DCOP, as well as optimization in stochastic…
Scalability plays a crucial role in productionizing modern recommender systems. Even lightweight architectures may suffer from high computational overload due to intermediate calculations, limiting their practicality in real-world…
Cost-based query optimization remains a critical task in relational databases even after decades of research and industrial development. Query optimizers rely on a large range of statistical synopses -- including attribute-level histograms…
Differential computation (DC) is a highly general incremental computation/view maintenance technique that can maintain the output of an arbitrary and possibly recursive dataflow computation upon changes to its base inputs. As such, it is a…
Selectivity estimation aims at estimating the number of database objects that satisfy a selection criterion. Answering this problem accurately and efficiently is essential to many applications, such as density estimation, outlier detection,…
In modern large-scale distributed systems, analytics jobs submitted by various users often share similar work, for example scanning and processing the same subset of data. Instead of optimizing jobs independently, which may result in…
Selectivity estimation - the problem of estimating the result size of queries - is a fundamental problem in databases. Accurate estimation of query selectivity involving multiple correlated attributes is especially challenging. Poor…
Selectivity estimation refers to the ability of the SQL query optimizer to estimate the size of the results of a predicate in the query. It is the main calculation, based on which the optimizer can select the cheapest plan to execute. While…
Compared with the fixed-run designs, the sequential adaptive designs (SAD) are thought to be more efficient and effective. Efficient global optimization (EGO) is one of the most popular SAD methods for expensive black-box optimization…
Estimating the selectivity of a query is a key step in almost any cost-based query optimizer. Most of today's databases rely on histograms or samples that are periodically refreshed by re-scanning the data as the underlying data changes.…
Traditionally, query optimizers have been designed for computer systems that share a common architecture, consisting of a CPU, main memory and disk subsystem. The efficiency of query optimizers and their successful employment relied on the…
Traditional query optimizers are designed to be fast and stateless: each query is quickly optimized using approximate statistics, sent off to the execution engine, and promptly forgotten. Recent work on learned query optimization has shown…
Efficient exploitation of exascale architectures requires rethinking of the numerical algorithms used in many large-scale applications. These architectures favor algorithms that expose ultra fine-grain parallelism and maximize the ratio of…
View materialization, index selection, and plan caching are well-known techniques for optimization of query processing in database systems. The essence of these tasks is to select and save a subset of the most useful candidates…
Modern language models (LMs) increasingly require two critical resources: computational resources and data resources. Data selection techniques can effectively reduce the amount of training data required for fine-tuning LMs. However, their…
Query processing in search engines can be optimized for use for all queries. For this, system component parameters such as the weighting function or the automatic query expansion model can be optimized or learned from past queries. However,…
Efficient search operations in databases are paramount for timely retrieval of information in various applications. This research introduces a novel approach, combining dynamic algorithm selection and caching strategies, to optimize search…
We propose a novel information-theoretic approach for Bayesian optimization called Predictive Entropy Search (PES). At each iteration, PES selects the next evaluation point that maximizes the expected information gained with respect to the…
Main memory column-stores have proven to be efficient for processing analytical queries. Still, there has been much less work in the context of clusters. Using only a single machine poses several restrictions: Processing power and data…