Related papers: Tupleware: Redefining Modern Analytics

200 papers

Analytics on personal data, such as individuals' mobility, financial, and health data, can be of significant benefit to society. Such data is already collected by smartphones, apps and services today, but liberal societies have so far…

Topological Data Analysis (TDA) is a recent approach to analyze data sets from the perspective of their topological structure. Its use for time series data has been limited. In this work, a system developed for a leading provider of cloud…

Machine Learning · Computer Science 2020-09-09 Rodrigo Rivera-Castro, Aleksandr Pletnev, Polina Pilyugina, Grecia Diaz, Ivan Nazarov, Wanyi Zhu, Evgeny Burnaev

The development of cluster computing frameworks has allowed practitioners to scale out various statistical estimation and machine learning algorithms with minimal programming effort. This is especially true for machine learning problems…

Machine Learning · Statistics 2019-06-24 Robin Vogel, Aurélien Bellet, Stephan Clémençon, Ons Jelassi, Guillaume Papa

The rise of big data systems has created a need for benchmarks to measure and compare the capabilities of these systems. Big data benchmarks present unique scalability challenges. The supercomputing community has wrestled with these…

Performance · Computer Science 2016-12-13 Patrick Dreher, Chansup Byun, Chris Hill, Vijay Gadepally, Bradley Kuszmaul, Jeremy Kepner

For the past two decades, the DB community has devoted substantial research to take advantage of cheap clusters of machines for distributed data analytics -- we believe that we are at the beginning of a paradigm shift. The scaling laws and…

Databases · Computer Science 2025-08-05 Bowen Wu, Wei Cui, Carlo Curino, Matteo Interlandi, Rathijit Sen

As new technologies move to the fore, our understanding of the world may seem to have shrunk in comparison, for despite new developments in research, much of it is reduced or rather, abstracted for marketability. Thus, the purpose of this…

Computers and Society · Computer Science 2017-01-24 Katherine Hughes

There is an increasing interest in executing complex analyses over large graphs, many of which require processing a large number of multi-hop neighborhoods or subgraphs. Examples include ego network analysis, motif counting, personalized…

Databases · Computer Science 2015-10-01 Abdul Quamar, Amol Deshpande, Jimmy Lin

Tabular data is the most abundant data type in the world, powering systems in finance, healthcare, e-commerce, and beyond. As tabular datasets grow and span multiple related targets, there is an increasing need to exploit shared task…

Machine Learning · Computer Science 2025-11-14 Dimitrios Sinodinos, Jack Yi Wei, Narges Armanfard

Tensor Processing Units (TPUs) are specialized hardware accelerators for deep learning developed by Google. This paper aims to explore TPUs in cloud and edge computing, focusing on their applications in AI. We provide an overview of TPUs,…

Hardware Architecture · Computer Science 2023-11-15 Diego Sanmartín Carrión, Vera Prohaska

Cloud data centers are evolving fast. At the same time, today's large-scale data analytics applications require non-trivial performance tuning that is often specific to the applications, workloads, and data center infrastructure. We propose…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-01-11 Qizhen Zhang, Jiacheng Wu, Ang Chen, Vincent Liu, Boon Thau Loo

Clustering algorithms aim to organize data into groups or clusters based on the inherent patterns and similarities within the data. They play an important role in today's life, such as in marketing and e-commerce, healthcare, data…

Machine Learning · Computer Science 2024-01-17 Hui Yin, Amir Aryani, Stephen Petrie, Aishwarya Nambissan, Aland Astudillo, Shengyuan Cao

Data lakes have emerged as a flexible and scalable solution for storing and analyzing large volumes of heterogeneous data, including structured, semi-structured, and unstructured formats. Despite their growing adoption in both industry and…

Databases · Computer Science 2026-01-28 Yi Lyu, Pei-Chieh Lo, Natan Lidukhover

What is a systematic way to efficiently apply a wide spectrum of advanced ML programs to industrial scale problems, using Big Models (up to 100s of billions of parameters) on Big Data (up to terabytes or petabytes)? Modern parallelization…

A large number of cloud middleware platforms and tools are deployed to support a variety of Internet of Things (IoT) data analytics tasks. It is a common practice that such cloud platforms are only used by their owners to achieve their…

Networking and Internet Architecture · Computer Science 2016-06-28 Prem Prakash Jayaraman, Charith Perera, Dimitrios Georgakopoulos, Schahram Dustdar, Dhavalkumar Thakker, Rajiv Ranjan

Astronomy is undergoing a methodological revolution triggered by an unprecedented wealth of complex and accurate data. The new panchromatic, synoptic sky surveys require advanced tools for discovering patterns and trends hidden…

As software systems increase in complexity, conventional monitoring methods struggle to provide a comprehensive overview or identify performance issues, often missing unexpected problems. Observability, however, offers a holistic approach,…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-08-29 Bartosz Balis, Konrad Czerepak, Albert Kuzma, Jan Meizner, Lukasz Wronski

As more and more users begin to use the cloud for their computing needs, datacenter operators are increasingly pressed to effectively allocate their resources among these client users. Yet while much work has been done in this area,…

Computers and Society · Computer Science 2012-12-11 Carlee Joe-Wong, Soumya Sen

The analyst effort in data cleaning is gradually shifting away from the design of hand-written scripts to building and tuning complex pipelines of automated data cleaning libraries. Hyper-parameter tuning for data cleaning is very different…

Databases · Computer Science 2019-05-08 Sanjay Krishnan, Eugene Wu

Matrix is a new message-oriented data synchronization middleware, used as a federated platform for near real-time decentralized applications. It features a novel approach for inter-server communication based on synchronizing message history…

Networking and Internet Architecture · Computer Science 2019-12-02 Florian Jacob, Jan Grashöfer, Hannes Hartenstein

In this paper we describe our work on designing a web-based, distributed data analysis system built on the popular MapReduce framework, deployed on a small cloud and developed specifically for analyzing web server logs. The log analysis system…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-02-13 Galip Aydin, Ibrahim Riza Hallac