Related papers: DeepDB: Learn from Data, not from Queries!
This tutorial overviews the state of the art in learning models over relational databases and makes the case for a first-principles approach that exploits recent developments in database research. The input to learning classification and…
Workload management for cloud databases must deal with the tasks of resource provisioning, query placement and query scheduling in a manner that meets the application's performance goals while minimizing the cost of using cloud resources.…
Use of machine learning to perform database operations, such as indexing, cardinality estimation, and sorting, is shown to provide substantial performance benefits. However, when datasets change and data distribution shifts, empirical…
Signal processing traditionally relies on classical statistical modeling techniques. Such model-based methods utilize mathematical formulations that represent the underlying physics, prior information and additional domain knowledge. Simple…
Having access to realistic workloads for a given database instance is extremely important to enable stress and vulnerability testing, as well as to optimize for cost and performance. Recent advances in learned cost models have shown that…
Deep Learning (DL) applications are being used to solve problems in critical domains (e.g., autonomous driving or medical diagnosis systems). Thus, developers need to debug their systems to ensure that the expected behavior is delivered.…
Machine learning methods adapt the parameters of a model, constrained to lie in a given model class, by using a fixed learning procedure based on data or active observations. Adaptation is done on a per-task basis, and retraining is needed…
When selecting data for training large-scale models, standard practice is to filter for examples that match human notions of data quality. Such filtering yields qualitatively clean datapoints that intuitively should improve model behavior.…
The essence of deep learning is to exploit data to train a deep neural network (DNN) model. This work explores the reverse process of generating data from a model, attempting to reveal the relationship between the data and the model. We…
Large organizations have seamlessly incorporated data-driven decision making in their operations. However, as data volumes increase, expensive big data infrastructures are called to rescue. In this setting, analytics tasks become very…
The success of the machine learning field has reliably depended on training on large datasets. While effective, this trend comes at an extraordinary cost. This is due to two deeply intertwined factors: the size of models and the size of…
There are significant benefits to serve deep learning models from relational databases. First, features extracted from databases do not need to be transferred to any decoupled deep learning systems for inferences, and thus the system…
Signal processing, communications, and control have traditionally relied on classical statistical modeling techniques. Such model-based methods utilize mathematical formulations that represent the underlying physics, prior information and…
Business process simulation is a well-known approach to estimate the impact of changes to a process with respect to time and cost measures -- a practice known as what-if process analysis. The usefulness of such estimations hinges on the…
Real-world datasets are dirty and contain many errors. Examples of these issues are violations of integrity constraints, duplicates, and inconsistencies in representing data values and entities. Learning over dirty databases may result in…
Deep Neural Networks are well known for efficiently fitting training data, yet experiencing poor generalization capabilities whenever some kind of bias dominates over the actual task labels, resulting in models learning "shortcuts". In…
Deep learning models are often deployed in downstream tasks that the training procedure may not be aware of. For example, models solely trained to achieve accurate predictions may struggle to perform well on downstream tasks because…
Modern deep models are trained on large real-world datasets, where data quality varies and redundancy is common. Data-centric approaches such as dataset pruning have shown promise in improving training efficiency and model performance.…
Although learning from data is effective and has achieved significant milestones, it has many challenges and limitations. Learning from data starts from observations and then proceeds to broader generalizations. This framework is…
Machine learning models based on neural networks (NNs) are enjoying ever-increasing attention in the DB community. However, an important issue has been largely overlooked, namely the challenge of dealing with the highly dynamic nature of…