Related papers: LocationSpark: In-memory Distributed Spatial Query…

200 papers

Recently, MapReduce based spatial query systems have emerged as a cost effective and scalable solution to large scale spatial data processing and analytics. MapReduce based systems achieve massive scalability by partitioning the data and…

Databases · Computer Science 2015-09-04 Ablimit Aji, Vo Hoang, Fusheng Wang

The use of large-scale machine learning methods is becoming ubiquitous in many applications ranging from business intelligence to self-driving cars. These methods require a complex computation pipeline consisting of various types of…

Databases · Computer Science 2021-11-10 Yongyang Yu, Mingjie Tang, Walid G. Aref

Distributed dataflow systems such as Apache Spark or Apache Flink enable parallel, in-memory data processing on large clusters of commodity hardware. Consequently, the appropriate amount of memory to allocate to the cluster is a crucial…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-06-08 Jonathan Will, Lauritz Thamsen, Dominik Scheinert, Odej Kao

In modern large-scale distributed systems, analytics jobs submitted by various users often share similar work, for example scanning and processing the same subset of data. Instead of optimizing jobs independently, which may result in…

Databases · Computer Science 2018-05-23 Pietro Michiardi, Damiano Carra, Sara Migliorini

Many machine learning models, such as logistic regression (LR) and support vector machine (SVM), can be formulated as composite optimization problems. Recently, many distributed stochastic optimization (DSO) methods have been proposed to…

Machine Learning · Statistics 2016-12-13 Shen-Yi Zhao, Ru Xiang, Ying-Hao Shi, Peng Gao, Wu-Jun Li

Spatial data is ubiquitous. Massive amounts of data are generated every day from a plethora of sources such as billions of GPS-enabled devices (e.g., cell phones, cars, and sensors), consumer-based applications (e.g., Uber and Strava), and…

Scientists increasingly rely on Python tools to perform scalable distributed memory array operations using rich, NumPy-like expressions. However, many of these tools rely on dynamic schedulers optimized for abstract task graphs, which often…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-07-14 Melih Elibol, Vinamra Benara, Samyu Yagati, Lianmin Zheng, Alvin Cheung, Michael I. Jordan, Ion Stoica

Scheduling query execution plans is a particularly complex problem in shared-nothing parallel systems, where each site consists of a collection of local time-shared (e.g., CPU(s) or disk(s)) and space-shared (e.g., memory) resources and…

Databases · Computer Science 2014-04-01 Minos Garofalakis, Yannis Ioannidis

The number of mobile devices (e.g., smartphones, wearable technologies) is rapidly growing. In line with this trend, a massive amount of spatial data is being collected since these devices allow users to geo-tag user-generated content.…

Databases · Computer Science 2016-12-20 Afsin Akdogan

Trajectory similarity computation is a fundamental component in a variety of real-world applications, such as ridesharing, road planning, and transportation optimization. Recent advances in mobile devices have enabled an unprecedented…

Databases · Computer Science 2021-01-27 Bolong Zheng, Lianggui Weng, Xi Zhao, Kai Zeng, Xiaofang Zhou, Christian S. Jensen

As the size of modern data sets exceeds the disk and memory capacities of a single computer, machine learning practitioners have resorted to parallel and distributed computing. Given that optimization is one of the pillars of machine…

Machine Learning · Statistics 2019-12-10 Biyi Fang, Diego Klabjan

Increasing need for large-scale data analytics in a number of application domains has led to a dramatic rise in the number of distributed data management systems, both parallel relational databases, and systems that support alternative…

Databases · Computer Science 2013-02-19 K. Ashwin Kumar, Amol Deshpande, Samir Khuller

Scalable ordered maps must ensure that range queries, which operate over many consecutive keys, provide intuitive semantics (e.g., linearizability) without degrading the performance of concurrent insertions and removals. These goals are…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-10-11 Matthew Rodriguez, Vitaly Aksenov, Michael Spear

Spark is an in-memory analytics platform that targets commodity server environments today. It relies on the Hadoop Distributed File System (HDFS) to persist intermediate checkpoint states and final processing results. In Spark, immutable…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-08-22 Mijung Kim, Jun Li, Haris Volos, Manish Marwah, Alexander Ulanov, Kimberly Keeton, Joseph Tucek, Lucy Cherkasova, Le Xu, Pradeep Fernando

Modern cloud databases present scaling as a binary decision: scale-out by adding nodes or scale-up by increasing per-node resources. This one-dimensional view is limiting because database performance, cost, and coordination overhead emerge…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-05 Shahir Abdullah, Syed Rohit Zaman

Modern big data systems run on cloud environments where resources are shared amongst several users and applications. As a result, declarative user queries in these environments need to be optimized and executed over resources that…

Databases · Computer Science 2019-06-18 Alekh Jindal, Lalitha Viswanathan, Konstantinos Karanasos

With the explosive increase of big data in industry and academic fields, it is necessary to apply large-scale data processing systems to analyze Big Data. Arguably, Spark is state of the art in large-scale data computing systems nowadays,…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-12-17 Shanjiang Tang, Bingsheng He, Ce Yu, Yusen Li, Kun Li

Spatial objects often come with textual information, such as Points of Interest (POIs) with their descriptions, which are referred to as geo-textual data. To retrieve such data, spatial keyword queries that take into account both spatial…

Databases · Computer Science 2023-04-17 Yufan Sheng, Xin Cao, Yixiang Fang, Kaiqi Zhao, Jianzhong Qi, Gao Cong, Wenjie Zhang

Access plan recommendation is a query optimization approach that executes new queries using previously created query execution plans (QEPs). In this method, the query optimizer divides the query space into clusters. However, traditional…

Databases · Computer Science 2022-10-14 Elham Azhir, Mehdi Hosseinzadeh, Faheem Khan, Amir Mosavi

The importance of geo-spatial data in critical applications such as emergency response, transportation, and agriculture has prompted the adoption of the recent GeoSPARQL standard in many RDF processing engines. In addition to large…

Databases · Computer Science 2017-10-23 Jyoti Leeka, Srikanta Bedathur, Debajyoti Bera, Sriram Lakshminarasimhan