Related papers: In-Browser Split-Execution Support for Interactive…
This paper explores the novel and unconventional idea of implementing an analytical RDBMS in pure JavaScript so that it runs completely inside a browser with no external dependencies. Our prototype, called Afterburner, generates compiled…
This paper addresses how the benefits of cloud-based infrastructure can be harnessed for analytical workloads. Often the software handling analytical workloads is not developed by a professional programmer, but on an ad hoc basis by…
An increasing number of Analytics-as-a-Service solutions has recently seen the light, in the landscape of cloud-based services. These services allow flexible composition of compute and storage components, that create powerful data ingestion…
The increasing use of statistical data analysis in enterprise applications has created an arms race among database vendors to offer ever more sophisticated in-database analytics. One challenge in this race is that each new statistical…
Cloud computing has become a major approach to help reproduce computational experiments. Yet there are still two main difficulties in reproducing batch based big data analytics (including descriptive and predictive analytics) in the cloud.…
This paper proposes a visual analytics framework that addresses the complex user interactions required through a command-line interface to run analyses in distributed data analysis systems. The visual analytics framework facilitates the…
While cluster computing frameworks are continuously evolving to provide real-time data analysis capabilities, Apache Spark has managed to be at the forefront of big data analytics for being a unified framework for both, batch and stream…
While several attempts have been made to construct a scalable and flexible architecture for analysis of streaming data, no general model to tackle this task exists. Thus, our goal is to build a scalable and maintainable architecture for…
Online and offline analytics have been traditionally treated separately in software architecture design, and there is no existing general architecture that can support both. Our objective is to go beyond and introduce a scalable and…
As modern data pipelines continue to collect, produce, and store a variety of data formats, extracting and combining value from traditional and context-rich sources such as strings, text, video, audio, and logs becomes a manual process…
A large number of cloud middleware platforms and tools are deployed to support a variety of Internet of Things (IoT) data analytics tasks. It is a common practice that such cloud platforms are only used by its owners to achieve their…
JavaScript engines inside modern browsers are capable of running sophisticated multi-player games, rendering impressive 3D scenes, and supporting complex, interactive visualizations. Can this processing power be harnessed for information…
Reusable data/code and reproducible analyses are foundational to quality research. This aspect, however, is often overlooked when designing interactive stream analysis workflows for time-series data (e.g., eye-tracking data). A mechanism to…
A Hybrid cloud is an integration of resources between private and public clouds. It enables users to horizontally scale their on-premises infrastructure up to public clouds in order to improve performance and cut up-front investment cost.…
Cloud computing recently developed into a viable alternative to on-premises systems for executing high-performance computing (HPC) applications. With the emergence of new vendors and hardware options, there is now a growing need to…
Modern enterprises rely on data management systems to collect, store, and analyze vast amounts of data related with their operations. Nowadays, clusters and hardware accelerators (e.g., GPUs, TPUs) have become a necessity to scale with the…
Process Execution Engines are a vital part of Business Process Management (BPM) and Manufacturing Orchestration Management (MOM), as they allow the business or manufacturing logic (expressed in a graphical notation such as BPMN) to be…
Understanding and tuning the performance of extreme-scale parallel computing systems demands a streaming approach due to the computational cost of applying offline algorithms to vast amounts of performance log data. Analyzing large…
The enormous quantity of data produced every day together with advances in data analytics has led to a proliferation of data management and analysis systems. Typically, these systems are built around highly specialized monolithic operators…
One of the factors that limits the scale, performance, and sophistication of distributed applications is the difficulty of concurrently executing them on multiple distributed computing resources. In part, this is due to a poor understanding…