Related papers: A New Framework for Expressing, Parallelizing and …

200 papers

What is a systematic way to efficiently apply a wide spectrum of advanced ML programs to industrial scale problems, using Big Models (up to 100s of billions of parameters) on Big Data (up to terabytes or petabytes)? Modern parallelization…

This paper presents a comparative analysis of different optimization techniques for the K-means algorithm in the context of big data. K-means is a widely used clustering algorithm, but it can suffer from scalability issues when dealing with…

Machine Learning · Computer Science 2024-05-21 Ravil Mussabayev, Rustam Mussabayev

Recently, increasingly large amounts of data are generated from a variety of sources. Existing data processing technologies are not suitable to cope with the huge amounts of generated data. Yet, many research works focus on Big Data, a…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-06-07 Wissem Inoubli, Sabeur Aridhi, Haithem Mezni, Mondher Maddouri, Engelbert Mephu Nguifo

The rise of big data systems has created a need for benchmarks to measure and compare the capabilities of these systems. Big data benchmarks present unique scalability challenges. The supercomputing community has wrestled with these…

Performance · Computer Science 2016-12-13 Patrick Dreher, Chansup Byun, Chris Hill, Vijay Gadepally, Bradley Kuszmaul, Jeremy Kepner

Clustering plays an important role in mining big data both as a modeling technique and a preprocessing step in many data mining process implementations. Fuzzy clustering provides more flexibility than non-fuzzy methods by allowing each data…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-11-26 Nasser Ghadiri, Meysam Ghaffari, Mohammad Amin Nikbakht

The data science community today has embraced the concept of Dataframes as the de facto standard for data representation and manipulation. Ease of use, massive operator coverage, and popularization of R and Python languages have heavily…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-07-06 Niranda Perera, Supun Kamburugamuve, Chathura Widanage, Vibhatha Abeykoon, Ahmet Uyar, Kaiying Shan, Hasara Maithree, Damitha Lenadora, Thejaka Amila Kanewala, Geoffrey Fox

This paper introduces ENFrame, a unified data processing platform for querying and mining probabilistic data. Using ENFrame, users can write programs in a fragment of Python with constructs such as bounded-range loops, list comprehension,…

Databases · Computer Science 2013-09-03 Sebastiaan J. van Schaik, Dan Olteanu, Robert Fink

This paper introduces a novel K-means clustering algorithm, an advancement on the conventional Big-means methodology. The proposed method efficiently integrates parallel processing, stochastic sampling, and competitive optimization to…

Machine Learning · Computer Science 2024-03-28 Rustam Mussabayev, Ravil Mussabayev

K-means plays a vital role in data mining and is the simplest and most widely used algorithm under the Euclidean Minimum Sum-of-Squares Clustering (MSSC) model. However, its performance drastically drops when applied to vast amounts of…

Machine Learning · Computer Science 2023-11-27 Rustam Mussabayev, Nenad Mladenovic, Bassem Jarboui, Ravil Mussabayev

Data originating from the Web, sensor readings and social media result in increasingly huge datasets. The so-called Big Data comes with new scientific and technological challenges while creating new opportunities, hence the increasing…

Artificial Intelligence · Computer Science 2020-02-19 Ilias Tachmazidis, Grigoris Antoniou, Wolfgang Faber

Modern advanced analytics applications make use of machine learning techniques and contain multiple steps of domain-specific and general-purpose processing with high resource requirements. We present KeystoneML, a system that captures and…

Machine Learning · Computer Science 2016-11-01 Evan R. Sparks, Shivaram Venkataraman, Tomer Kaftan, Michael J. Franklin, Benjamin Recht

Analyzing the increasingly large volumes of data that are available today, possibly including the application of custom machine learning models, requires the utilization of distributed frameworks. This can result in serious productivity…

Databases · Computer Science 2019-08-20 Phanwadee Sinthong, Michael J. Carey

HEP-Frame is a new C++ package designed to efficiently perform analyses of data sets from a very large number of events, like those available at the Large Hadron Collider (LHC) at CERN, Geneva. It mainly targets high performance servers and…

High Energy Physics - Experiment · Physics 2023-03-10 A. Pereira, A. Onofre, A. Proenca

In this paper, we draw the specifications of a novel benchmark for comparing parallel processing frameworks in the context of big data applications hosted in the cloud. We aim at filling several gaps in already existing cloud data…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-12-24 Jaume Ferrarons, Mulu Adhana, Carlos Colmenares, Sandra Pietrowska, Fadila Bentayeb, Jérôme Darmont

With the rising quantity of textual data available in electronic format, the need to organize it has become a highly challenging task. In the present paper, we explore a document organization framework that exploits an intelligent hierarchical…

Information Retrieval · Computer Science 2015-04-02 Rajendra Kumar Roul, Shubham Rohan Asthana, Sanjay Kumar Sahay

The k-means algorithm is one of the most common clustering algorithms and widely used in data mining and pattern recognition. The increasing computational requirement of big data applications makes hardware acceleration for the k-means…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-11-23 Zhehao Li, Jifang Jin, Lingli Wang

Several recently devised machine learning (ML) algorithms have shown improved accuracy for various predictive problems. Model searches, which explore to find an optimal ML algorithm and hyperparameter values for the target problem, play a…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-08-28 Yoshiki Takahashi, Masato Asahara, Kazuyuki Shudo

C is the lingua franca of programming and almost any device can be programmed using C. However, programming modern heterogeneous architectures such as multi-core CPUs and GPUs requires explicitly expressing parallelism as well as…

Utilizing large language models (LLMs) for document reranking has been a popular and promising research direction in recent years, and many studies are dedicated to improving the performance and efficiency of using LLMs for reranking. Besides,…

Information Retrieval · Computer Science 2025-04-11 Qi Liu, Haozhe Duan, Yiqun Chen, Quanfeng Lu, Weiwei Sun, Jiaxin Mao

Pioneered by Google's Pregel, many distributed systems have been developed for large-scale graph analytics. These systems expose the user-friendly "think like a vertex" programming interface to users, and exhibit good horizontal…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-01-26 Da Yan, James Cheng, M. Tamer Özsu, Fan Yang, Yi Lu, John C. S. Lui, Qizhen Zhang, Wilfred Ng