Related papers: Fine-Tuning Data Structures for Analytical Query P…
Widespread adoption of AI systems hinges on their ability to generate economic value that outweighs their inference costs. Evaluating this tradeoff requires metrics accounting for both performance and costs. Building on production theory,…
Database applications are typically written using a mixture of imperative languages and declarative frameworks for data processing. Application logic gets distributed across the declarative and imperative parts of a program. Often, there is…
As declarative query processing techniques expand in scope --- to the Web, data streams, network routers, and cloud platforms --- there is an increasing need for adaptive query processing techniques that can re-plan in the presence of…
Several data warehouse and database providers have recently introduced extensions to SQL called AI Queries, enabling users to specify functions and conditions in SQL that are evaluated by LLMs, thereby broadening significantly the kinds of…
Deep learning compiler frameworks are gaining ground as a more portable back-end for deep learning applications on increasingly diverse hardware. However, they face the daunting challenge of matching performance offered by hand-tuned…
Declarative large-scale machine learning (ML) aims at the specification of ML algorithms in a high-level language and automatic generation of hybrid runtime execution plans ranging from single node, in-memory computations to distributed…
Query processing over big data is ubiquitous in modern clouds, where the system takes care of picking both the physical query execution plans and the resources needed to run those plans, using a cost-based query optimizer. A good cost…
Deep learning models are widely used across computer vision and other domains. When working on the model induction, selecting the right architecture for a given dataset often relies on repetitive trial-and-error procedures. This procedure…
Lakehouse systems enable the same data to be queried with multiple execution engines. However, selecting the engine best suited to run a SQL query still requires a priori knowledge of the query computational requirements and an engine…
We propose a computationally efficient and high-performance classification algorithm by incorporating class structural information in analysis dictionary learning. To achieve more consistent classification, we associate a class…
Ensuring data quality in large tabular datasets is a critical challenge, typically addressed through data wrangling tasks. Traditional statistical methods, though efficient, cannot often understand the semantic context and deep learning…
This paper introduces a cost-efficient active learning (AL) framework for classification, featuring a novel query design called candidate set query. Unlike traditional AL queries requiring the oracle to examine all possible classes, our…
We present Deep-n-Cheap -- an open-source AutoML framework to search for deep learning models. This search includes both architecture and training hyperparameters, and supports convolutional neural networks and multi-layer perceptrons. Our…
This work investigates how the traditional image classification pipelines can be extended into a deep architecture, inspired by recent successes of deep neural networks. We propose a deep boosting framework based on layer-by-layer joint…
The principal component of conventional database query optimizers is a cost model that is used to estimate expected performance of query plans. The accuracy of the cost model has direct impact on the optimality of execution plans selected…
Cost and cardinality estimation is vital to query optimizer, which can guide the plan selection. However traditional empirical cost and cardinality estimation techniques cannot provide high-quality estimation, because they cannot capture…
Nowadays, as machine-learned software quickly permeates our society, we are becoming increasingly vulnerable to programming errors in the data pre-processing or training software, as well as errors in the data itself. In this paper, we…
We consider a framework for structured prediction based on search in the space of complete structured outputs. Given a structured input, an output is produced by running a time-bounded search procedure guided by a learned cost function, and…
Large organizations have seamlessly incorporated data-driven decision making in their operations. However, as data volumes increase, expensive big data infrastructures are called to rescue. In this setting, analytics tasks become very…
The organization of latent knowledge within large-scale models poses unique challenges when addressing overlapping representations and optimizing contextual accuracy. Conceptual redundancies embedded across layers often result in…