Related papers: Opening the Black Boxes in Data Flow Optimization
In many massively parallel data management platforms, programs are represented as small imperative pieces of code connected in a data flow. This popular abstraction makes it hard to apply algebraic reordering techniques employed by…
Analyzing big data in a highly dynamic environment becomes more and more critical because of the increasing need for end-to-end processing of this data. Modern data flows are quite complex and there are no efficient, cost-based,…
Traditionally, query optimizers have been designed for computer systems that share a common architecture, consisting of a CPU, main memory and disk subsystem. The efficiency of query optimizers and their successful employment relied on the…
Black-box optimization is often encountered for decision-making in complex systems management, where knowledge of the system is limited. Under these circumstances, it is essential to balance the utilization of new information with…
Data processing systems offer an ever-increasing degree of parallelism on the levels of cores, CPUs, and processing nodes. Query optimization must exploit high degrees of parallelism in order not to gradually become the bottleneck of query…
The diverse world of machine learning applications has given rise to a plethora of algorithms and optimization methods, finely tuned to the specific regression or classification task at hand. We reduce the complexity of algorithm design for…
Black-box optimization (BBO) has a broad range of applications, including automatic machine learning, engineering, physics, and experimental design. However, it remains a challenge for users to apply BBO methods to their problems at hand…
In their hunt for highlights, i.e., interesting patterns in the data, data analysts have to issue groups of related queries and manually combine their results. To the extent that the analyst's goals are based on an intention on what to…
We consider black-box optimization in which only an extremely limited number of function evaluations, on the order of around 100, are affordable and the function evaluations must be performed in even fewer batches of a limited number of…
Interprocedural data-flow analyses form an expressive and useful paradigm of numerous static analysis applications, such as live variables analysis, alias analysis and null pointers analysis. The most widely-used framework for…
Recent years have seen an increased interest in large-scale analytical dataflows on non-relational data. These dataflows are compiled into execution graphs scheduled on large compute clusters. In many novel application areas the predominant…
LLMs enable an exciting new class of data processing applications over large collections of unstructured documents. Several new programming frameworks have enabled developers to build these applications by composing them out of semantic…
Many planning applications involve complex relationships defined on high-dimensional, continuous variables. For example, robotic manipulation requires planning with kinematic, collision, visibility, and motion constraints involving robot…
A calculus is presented for the stepwise refinement of abstract information flow architectures. We give a mathematical model for information flow components based on relations between input and output communication histories, and describe…
Quantum computing has shown promise for solving complex optimization problems in databases, such as join ordering and index selection. Prior work often submits formulated problems directly to black-box quantum or quantum-inspired solvers…
This paper concerns the analysis of information leaks in security systems. We address the problem of specifying and analyzing large systems in the (standard) channel model used in quantitative information flow (QIF). We propose several…
Data-driven optimization uses contextual information and machine learning algorithms to find solutions to decision problems with uncertain parameters. While a vast body of work is dedicated to interpreting machine learning models in the…
Despite the success statistical physics has enjoyed at predicting the properties of materials for given parameters, the inverse problem, identifying which material parameters produce given, desired properties, is only beginning to be…
This article describes a very high-level language for clear description of distributed algorithms and optimizations necessary for generating efficient implementations. The language supports high-level control flows where complex…
Many large-scale machine learning (ML) systems allow specifying custom ML algorithms by means of linear algebra programs, and then automatically generate efficient execution plans. In this context, optimization opportunities for fused…