English
Related papers

Related papers: Scalable Relational Query Processing on Big Matrix…

200 papers

Due to the ubiquity of spatial data applications and the large amounts of spatial data that these applications generate and process, there is a pressing need for scalable spatial query processing. In this paper, we present new techniques…

Databases · Computer Science 2019-07-17 Mingjie Tang , Yongyang Yu , Walid G. Aref , Ahmed R. Mahmood , Qutaibah M. Malluhi , Mourad Ouzzani

In modern large-scale distributed systems, analytics jobs submitted by various users often share similar work, for example scanning and processing the same subset of data. Instead of optimizing jobs independently, which may result in…

Databases · Computer Science 2018-05-23 Pietro Michiardi , Damiano Carra , Sara Migliorini

Distributed dataflow systems such as Apache Spark or Apache Flink enable parallel, in-memory data processing on large clusters of commodity hardware. Consequently, the appropriate amount of memory to allocate to the cluster is a crucial…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-06-08 Jonathan Will , Lauritz Thamsen , Dominik Scheinert , Odej Kao

The relational data model was designed to facilitate large-scale data management and analytics. We consider the problem of how to differentiate computations expressed relationally. We show experimentally that a relational engine running an…

Machine Learning · Computer Science 2023-06-08 Yuxin Tang , Zhimin Ding , Dimitrije Jankov , Binhang Yuan , Daniel Bourgeois , Chris Jermaine

We describe matrix computations available in the cluster programming framework, Apache Spark. Out of the box, Spark provides abstractions and implementations for distributed matrices and optimization routines using these matrices. When…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-07-14 Reza Bosagh Zadeh , Xiangrui Meng , Aaron Staple , Burak Yavuz , Li Pu , Shivaram Venkataraman , Evan Sparks , Alexander Ulanov , Matei Zaharia

Recursive query processing has experienced a recent resurgence, as a result of its use in many modern application domains, including data integration, graph analytics, security, program analysis, networking and decision making. Due to the…

Databases · Computer Science 2018-12-11 Zhiwei Fan , Jianqiao Zhu , Zuyu Zhang , Aws Albarghouthi , Paraschos Koutris , Jignesh Patel

Deploying Machine Learning (ML) algorithms within databases is a challenge due to the varied computational footprints of modern ML algorithms and the myriad of database technologies each with its own restrictive syntax. We introduce an…

Many organizations routinely analyze large datasets using systems for distributed data-parallel processing and clusters of commodity resources. Yet, users need to configure adequate resources for their data processing jobs. This requires…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-06-02 Lauritz Thamsen , Dominik Scheinert , Jonathan Will , Jonathan Bader , Odej Kao

As RDF becomes more widely established and the amount of linked data is rapidly increasing, the efficient querying of large amount of data becomes a significant challenge. In this paper, we propose a family of algorithms for querying large…

Databases · Computer Science 2022-09-13 Eleftherios Kalogeros , Manolis Gergatsoulis , Matthew Damigos , Christos Nomikos

Recently, MapReduce based spatial query systems have emerged as a cost effective and scalable solution to large scale spatial data processing and analytics. MapReduce based systems achieve massive scalability by partitioning the data and…

Databases · Computer Science 2015-09-04 Ablimit Aji , Vo Hoang , Fusheng Wang

Modern distributed data processing systems struggle to balance performance, maintainability, and developer productivity when integrating machine learning at scale. These challenges intensify in large collaborative environments due to high…

With the explosive increase of big data in industry and academic fields, it is necessary to apply large-scale data processing systems to analysis Big Data. Arguably, Spark is state of the art in large-scale data computing systems nowadays,…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-12-17 Shanjiang Tang , Bingsheng He , Ce Yu , Yusen Li , Kun Li

The growth of big data in domains such as Earth Sciences, Social Networks, Physical Sciences, etc. has lead to an immense need for efficient and scalable linear algebra operations, e.g. Matrix inversion. Existing methods for efficient and…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-16 Chandan Misra , Sourangshu Bhattacharya , Soumya K. Ghosh

The number of linked data sources and the size of the linked open data graph keep growing every day. As a consequence, semantic RDF services are more and more confronted to various "big data" problems. Query processing is one of them and…

Databases · Computer Science 2016-11-04 Hubert Naacke , Olivier Curé , Bernd Amann

The wide use of XML for document management and data exchange has created the need to query large repositories of XML data. To efficiently query such large data collections and take advantage of parallelism, we have implemented Apache…

Databases · Computer Science 2015-04-02 E. Preston Carman , Till Westmann , Vinayak R. Borkar , Michael J. Carey , Vassilis J. Tsotras

The need for modern data analytics to combine relational, procedural, and map-reduce-style functional processing is widely recognized. State-of-the-art systems like Spark have added SQL front-ends and relational query optimization, which…

Access plan recommendation is a query optimization approach that executes new queries using prior created query execution plans (QEPs). The query optimizer divides the query space into clusters in the mentioned method. However, traditional…

Databases · Computer Science 2022-10-14 Elham Azhir , Mehdi Hosseinzadeh , Faheem Khan , Amir Mosavi

Large language models (LLMs) have become essential for applications such as text summarization, sentiment analysis, and automated question-answering. Recently, LLMs have also been integrated into relational database management systems to…

Databases · Computer Science 2025-08-29 Kerem Akillioglu , Anurag Chakraborty , Sairaj Voruganti , M. Tamer Özsu

The evolution of the Internet and computer applications have generated colossal amount of data. They are referred to as Big Data and they consist of huge volume, high velocity, and variable datasets that need to be managed at the right…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-08-13 Youssef Bassil

Co-clustering simultaneously clusters rows and columns, revealing more fine-grained groups. However, existing co-clustering methods suffer from poor scalability and cannot handle large-scale data. This paper presents a novel and scalable…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-20 Zihan Wu , Zhaoke Huang , Hong Yan
‹ Prev 1 2 3 10 Next ›