Serverless architectures organized around loosely coupled function invocations represent an emerging design for many applications. Recent work mostly focuses on user-facing products and event-driven processing pipelines. In this paper, we explore a completely different part of the application space and examine the feasibility of analytical processing on big data using a serverless architecture. We present Flint, a prototype Spark execution engine that takes advantage of AWS Lambda to provide a pure pay-as-you-go cost model. With Flint, a developer uses PySpark exactly as before, but without needing an actual Spark cluster. We describe the design, implementation, and performance of Flint, along with the challenges associated with serverless analytics.
@article{arxiv.1803.06354,
title = {Serverless Data Analytics with Flint},
author = {Youngbin Kim and Jimmy Lin},
journal = {arXiv preprint arXiv:1803.06354},
year = {2018}
}
Comments
Published in the Proceedings of the 2018 IEEE 11th International Conference on Cloud Computing (CLOUD 2018)