Tupleware: Redefining Modern Analytics
Abstract
There is a fundamental discrepancy between the targeted and actual users of current analytics frameworks. Most systems are designed for the data and infrastructure of the Googles and Facebooks of the world: petabytes of data distributed across large cloud deployments consisting of thousands of cheap commodity machines. Yet the vast majority of users operate clusters ranging from a few to a few dozen nodes, analyze relatively small datasets of up to a few terabytes, and perform primarily compute-intensive operations. Targeting these users fundamentally changes the way we should build analytics systems. This paper describes the design of Tupleware, a new system specifically aimed at the challenges faced by the typical user. Tupleware's architecture brings together ideas from the database, compiler, and programming languages communities to create a powerful end-to-end solution for data analysis. We propose novel techniques that consider the data, computations, and hardware together to achieve maximum performance on a case-by-case basis. Our experimental evaluation quantifies the impact of our novel techniques and shows orders of magnitude performance improvement over alternative systems.
Cite
@article{arxiv.1406.6667,
title = {Tupleware: Redefining Modern Analytics},
author = {Andrew Crotty and Alex Galakatos and Kayhan Dursun and Tim Kraska and Ugur Cetintemel and Stan Zdonik},
journal = {arXiv preprint arXiv:1406.6667},
year = {2014}
}