Related papers: A Synopses Data Engine for Interactive Extreme-Sca…
Stochastic algorithms are efficient approaches to solving machine learning and optimization problems. In this paper, we propose a general framework called Splash for parallelizing stochastic algorithms on multi-node distributed systems.…
We present S+t-SNE, an adaptation of the t-SNE algorithm designed to handle infinite data streams. The core idea behind S+t-SNE is to update the t-SNE embedding incrementally as new data arrives, ensuring scalability and adaptability to…
Motivated from the Context Aware Computing, and more particularly from the Data-Driven Process Adaptation approach, we propose the Semantic Context Space (SCS) Engine which aims to facilitate the provision of adaptable business processes.…
Data communication in cloud-based distributed stream data analytics often involves a collection of parallel and pipelined TCP flows. As the standard TCP congestion control mechanism is designed for achieving "fairness" among competing flows…
The combination of the infrastructure provided by the Internet of Things (IoT) with numerous processing nodes present at the Edge Computing (EC) ecosystem opens up new pathways to support intelligent applications. Such applications can be…
Pervasive computing involves the placement of processing services close to end users to support intelligent applications. With the advent of the Internet of Things (IoT) and the Edge Computing (EC), one can find room for placing services at…
Stream processing is extensively used in the IoT-to-Cloud spectrum to distill information from continuous streams of data. Streaming applications usually run in dedicated Stream Processing Engines (SPEs) that adopt the DataFlow model, which…
The advent of the Internet of Things (IoT) gives the opportunity to numerous devices to interact with their environment, collect and process data. Data are transferred, in an upwards mode, to the Cloud through the Edge Computing (EC)…
While Large Language Models (LLMs) excel at generalized reasoning, standard retrieval-augmented approaches fail to address the disconnected nature of long-term agentic memory. To bridge this gap, we introduce Synapse (Synergistic…
We investigate solutions to subgraph matching within a temporal stream of data. We present a high-level language for describing temporal subgraphs of interest, the Streaming Analytics Language (SAL). SAL programs are translated into C++…
As data volumes grow across applications, analytics of large amounts of data is becoming increasingly important. Big data processing frameworks such as Apache Hadoop, Apache AsterixDB, and Apache Spark have been built to meet this demand. A…
Approximate computing aims for efficient execution of workflows where an approximate output is sufficient instead of the exact output. The idea behind approximate computing is to compute over a representative sample instead of the entire…
The need for scalable and efficient stream analysis has led to the development of many open-source streaming data processing systems (SDPSs) with highly diverging capabilities and performance characteristics. While first initiatives try to…
The explosive increase in volume, velocity, variety, and veracity of data generated by distributed and heterogeneous nodes such as IoT and other devices, continuously challenge the state of art in big data processing platforms and mining…
Real-world data from diverse domains require real-time scalable analysis. Large-scale data processing frameworks or engines such as Hadoop fall short when results are needed on-the-fly. Apache Spark's streaming library is increasingly…
This paper addresses online query processing for large-scale, incremental data analysis on a distributed stream processing engine (DSPE). Our goal is to convert any SQL-like query to an incremental DSPE program automatically. In contrast to…
Mixture-of-Experts is a promising approach for edge AI with low-batch inference. Yet, on-device deployments often face limited on-chip memory and severe workload imbalance; the prevalent use of offloading further incurs off-chip memory…
In the near future, massively parallel computing systems will be necessary to solve computation intensive applications. The key bottleneck in massively parallel implementation of numerical algorithms is the synchronization of data across…
Scalable distributed dataflow systems have recently experienced widespread adoption, with commodity dataflow engines such as Hadoop and Spark, and even commodity SQL engines routinely supporting increasingly sophisticated analytics tasks…
A key need in different disciplines is to perform analytics over fast-paced data streams, similar in nature to the traditional OLAP analytics in relational databases i.e., with filters and aggregates. Storing unbounded streams, however, is…