Related papers: Forecasting SQL Query Cost at Twitter
Formulating efficient SQL queries requires several cycles of tuning and execution, particularly for inexperienced users. We examine methods that can accelerate and improve this interaction by providing insights about SQL queries prior to…
The ability to estimate resource consumption of SQL queries is crucial for a number of tasks in a database system such as admission control, query scheduling and costing during query optimization. Recent work has explored the use of…
Lakehouse systems enable the same data to be queried with multiple execution engines. However, selecting the engine best suited to run a SQL query still requires a priori knowledge of the query's computational requirements and an engine…
Having access to realistic workloads for a given database instance is extremely important to enable stress and vulnerability testing, as well as to optimize for cost and performance. Recent advances in learned cost models have shown that…
Query processing over big data is ubiquitous in modern clouds, where the system takes care of picking both the physical query execution plans and the resources needed to run those plans, using a cost-based query optimizer. A good cost…
Accurate query runtime prediction is a critical component of effective query optimization in modern database systems. Traditional cost models, such as those used in PostgreSQL, rely on static heuristics that often fail to reflect actual…
Predicting query execution time is a fundamental issue underlying many database management tasks. Existing predictors rely on information such as cardinality estimates and system performance constants that are difficult to know exactly. As…
Query performance prediction, the task of predicting the latency of a query, is one of the most challenging problems in database management systems. Existing approaches rely on features and performance models engineered by human experts, but…
Cloud data warehouses bill compute based on slot-time consumed. In shared multi-tenant environments, query cost is highly variable and hard to estimate before execution, causing budget overruns and degraded scheduling. Static query-planner…
We present an efficient, parametric modeling framework for predictive resource allocations, focusing on the amount of computational resources, that can optimize for a range of price-performance objectives for data analytics in serverless…
While Text-to-SQL systems achieve high accuracy, existing efficiency metrics like the Valid Efficiency Score prioritize execution time, a metric we show is fundamentally decoupled from consumption-based cloud billing. This paper evaluates…
Query optimization is a pivotal part of every database management system (DBMS) since it determines the efficiency of query execution. Numerous works have introduced Machine Learning (ML) techniques to cost modeling, cardinality estimation,…
Optimizing resource allocation for analytical workloads is vital for reducing costs of cloud-data services. At the same time, it is incredibly hard for users to allocate resources per query in serverless processing systems, and they…
Traditionally, query optimizers rely on cost models to choose the best execution plan from several candidates, making precise cost estimates critical for efficient query execution. In recent years, cost models based on machine learning have…
Predictive models are often used for real-time decision making. However, typical machine learning techniques ignore feature evaluation cost, and focus solely on the accuracy of the machine learning models obtained utilizing all the features…
Data lakes hold a growing amount of cold data that is infrequently accessed, yet require interactive response times. Serverless functions are seen as a way to address this use case since they offer an appealing alternative to maintaining…
Predicting the execution time of queries is an important problem with applications in scheduling, service level agreements and error detection. During query planning, a cost is associated with the chosen execution plan and used to rank…
The principal component of conventional database query optimizers is a cost model that is used to estimate expected performance of query plans. The accuracy of the cost model has direct impact on the optimality of execution plans selected…
Modern database systems rely on cost-based query optimizers to come up with good execution plans for input queries. Such query optimizers rely on cost models to estimate the costs of candidate query execution plans. A cost model represents…
Database management systems (DBMSs) carefully optimize complex multi-join queries to avoid expensive disk I/O. As servers today feature tens or hundreds of gigabytes of RAM, a significant fraction of many analytic databases becomes…