English
Related papers

Related papers: Scalable Formal Concept Analysis algorithm for lar…

200 papers

With the spreading prevalence of Big Data, many advances have recently been made in this field. Frameworks such as Apache Hadoop and Apache Spark have gained a lot of traction over the past decades and have become massively popular,…

Databases · Computer Science 2017-11-28 Anand Gupta , Hardeo Thakur , Ritvik Shrivastava , Pulkit Kumar , Sreyashi Nag

This document was created in order to study the algorithms for the categorization of phrases and rank them using the facilities provided by the framework Apache Spark. Starting from the study illustrated in the publication "Classifying…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-08-22 Marco Covelli , Massimiliano Morrelli

While many existing formal concept analysis algorithms are efficient, they are typically unsuitable for distributed implementation. Taking the MapReduce (MR) framework as our inspiration we introduce a distributed approach for performing…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-10-10 Biao Xu , Ruairí de Fréin , Eric Robson , Mícheál Ó Foghlú

With the advent of extremely high dimensional datasets, dimensionality reduction techniques are becoming mandatory. Among many techniques, feature selection has been growing in interest as an important tool to identify relevant features on…

In this paper, we evaluate Apache Spark for a data-intensive machine learning problem. Our use case focuses on policy diffusion detection across the state legislatures in the United States over time. Previous work on policy diffusion has…

Computation and Language · Computer Science 2019-12-03 Alexey Svyatkovskiy , Kosuke Imai , Mary Kroeger , Yuki Shiraito

There is increasing focus on analyzing data represented as hypergraphs, which are better able to express complex relationships amongst entities than are graphs. Much of the critical information about hypergraph structure is available only…

Data Structures and Algorithms · Computer Science 2023-07-24 Michael G. Rawson , Audun Myers , Robert Green , Michael Robinson , Cliff Joslyn

Querying very large RDF data sets in an efficient manner requires a sophisticated distribution strategy. Several innovative solutions have recently been proposed for optimizing data distribution with predefined query workloads. This paper…

Databases · Computer Science 2015-07-10 Olivier Curé , Hubert Naacke , Mohamed-Amine Baazizi , Bernd Amann

Network embedding has been widely used in social recommendation and network analysis, such as recommendation systems and anomaly detection with graphs. However, most of previous approaches cannot handle large graphs efficiently, due to that…

Social and Information Networks · Computer Science 2025-10-30 Wenqing Lin

Programming systems incorporating aspects of functional programming, e.g., higher-order functions, are becoming increasingly popular for large-scale distributed programming. New frameworks such as Apache Spark leverage functional techniques…

Programming Languages · Computer Science 2016-02-12 Philipp Haller , Heather Miller

Graphs, consisting of vertices and edges, are vital for representing complex relationships in fields like social networks, finance, and blockchain. Visualizing these graphs helps analysts identify structural patterns, with readability…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-11-18 Sanggeon Yun

Present day machine learning is computationally intensive and processes large amounts of data. It is implemented in a distributed fashion in order to address these scalability issues. The work is parallelized across a number of computing…

Machine Learning · Computer Science 2017-03-28 Alexander Ulanov , Andrey Simanovsky , Manish Marwah

Modern distributed data processing systems struggle to balance performance, maintainability, and developer productivity when integrating machine learning at scale. These challenges intensify in large collaborative environments due to high…

This paper presents a Spark-based modular LangGraph framework, designed to enhance machine learning workflows through scalability, visualization, and intelligent process optimization. At its core, the framework introduces Agent AI, a…

Artificial Intelligence · Computer Science 2024-12-09 Jialin Wang , Zhihua Duan

To process data more efficiently, big data frameworks provide data abstractions to developers. However, due to the abstraction, there may be many challenges for developers to understand and debug the data processing code. To uncover the…

Software Engineering · Computer Science 2021-03-29 Zehao Wang

Automatic Term Recognition is used to extract domain-specific terms that belong to a given domain. In order to be accurate, these corpus and language-dependent methods require large volumes of textual data that need to be processed to…

Computation and Language · Computer Science 2023-05-29 Ciprian-Octavian Truică , Neculai-Ovidiu Istrate , Elena-Simona Apostol

Network embedding is an important step in many different computations based on graph data. However, existing approaches are limited to small or middle size graphs with fewer than a million edges. In practice, web or social network graphs…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-06-09 Sara Riazi , Boyana Norris

The notion of concept has been studied for centuries, by philosophers, linguists, cognitive scientists, and researchers in artificial intelligence (Margolis & Laurence, 1999). There is a large literature on formal, mathematical models of…

Artificial Intelligence · Computer Science 2021-01-14 Stephen Clark , Alexander Lerchner , Tamara von Glehn , Olivier Tieleman , Richard Tanburn , Misha Dashevskiy , Matko Bosnjak

Evaluating large language models at scale remains a practical bottleneck for many organizations. While existing evaluation frameworks work well for thousands of examples, they struggle when datasets grow to hundreds of thousands or millions…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-01 Subhadip Mitra

The proliferation of mobile devices, such as smartphones and Internet of Things (IoT) gadgets, results in the recent mobile big data (MBD) era. Collecting MBD is unprofitable unless suitable analytics and learning methods are utilized for…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-08-16 Mohammad Abu Alsheikh , Dusit Niyato , Shaowei Lin , Hwee-Pink Tan , Zhu Han

Deploying Machine Learning (ML) algorithms within databases is a challenge due to the varied computational footprints of modern ML algorithms and the myriad of database technologies each with its own restrictive syntax. We introduce an…

‹ Prev 1 2 3 10 Next ›