Related papers: Efficient Iterative Programs with Distributed Data…

An Abstract View of Big Data Processing Programs

This paper proposes a model for specifying data flow based parallel data processing programs agnostic of target Big Data processing frameworks. The paper focuses on the formal abstract specification of non-iterative and iterative programs,…

Software Engineering · Computer Science 2021-08-06 Joao Batista de Souza Neto , Anamaria Martins Moreira , Genoveva Vargas-Solar , Martin A. Musicante

Precision-Aware Iterative Algorithms Based on Group-Shared Exponents of Floating-Point Numbers

Iterative solvers are frequently used in scientific applications and engineering computations. However, the memory-bound Sparse Matrix-Vector (SpMV) kernel computation hinders the efficiency of iterative algorithms. As modern hardware…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-11-08 Jianhua Gao , Jiayuan Shen , Yuxiang Zhang , Weixing Ji , Hua Huang

Towards scalable pattern-based optimization for dense linear algebra

Linear algebraic expressions are the essence of many computationally intensive problems, including scientific simulations and machine learning applications. However, translating high-level formulations of these expressions to efficient…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-03-22 Dániel Berényi , András Leitereg , Gábor Lehel

Monotonic Properties of Completed Aggregates in Recursive Queries

The use of aggregates in recursion enables efficient and scalable support for a wide range of BigData algorithms, including those used in graph applications, KDD applications, and ML applications, which have proven difficult to express…

Databases · Computer Science 2019-10-22 Carlo Zaniolo , Ariyam Das , Jiaqi Gu , Youfu Li , Mingda Li , Jin Wang

Scalable data abstractions for distributed parallel computations

The ability to express a program as a hierarchical composition of parts is an essential tool in managing the complexity of software and a key abstraction this provides is to separate the representation of data from the computation. Many…

Programming Languages · Computer Science 2012-10-04 James Hanlon , Simon J. Hollis , David May

Scalable Querying of Nested Data

While large-scale distributed data processing platforms have become an attractive target for query processing, these systems are problematic for applications that deal with nested collections. Programmers are forced either to perform…

Databases · Computer Science 2020-11-13 Jaclyn Smith , Michael Benedikt , Milos Nikolic , Amir Shaikhha

Generalized Optimal Classification Trees: A Mixed-Integer Programming Approach

Global optimization of decision trees is a long-standing challenge in combinatorial optimization, yet such models play an important role in interpretable machine learning. Although the problem has been investigated for several decades, only…

Machine Learning · Computer Science 2026-02-03 Jiancheng Tu , Wenqi Fan , Zhibin Wu

Iterative MapReduce for Large Scale Machine Learning

Large datasets ("Big Data") are becoming ubiquitous because the potential value in deriving insights from data, across a wide range of business and scientific applications, is increasingly recognized. In particular, machine learning - one…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-03-15 Joshua Rosen , Neoklis Polyzotis , Vinayak Borkar , Yingyi Bu , Michael J. Carey , Markus Weimer , Tyson Condie , Raghu Ramakrishnan

Optimizing Automata Learning via Monads

Automata learning has been successfully applied in the verification of hardware and software. The size of the automaton model learned is a bottleneck for scalability, and hence optimizations that enable learning of compact representations…

Formal Languages and Automata Theory · Computer Science 2019-11-04 Gerco van Heerdt , Matteo Sammartino , Alexandra Silva

Fast Multilevel Support Vector Machines

Solving different types of optimization models (including parameter fitting) for support vector machines on large-scale training data is often an expensive computational task. This paper proposes a multilevel algorithmic framework that…

Machine Learning · Statistics 2014-10-14 Talayeh Razzaghi , Ilya Safro

Repr Types: One Abstraction to Rule Them All

The choice of how to represent an abstract type can have a major impact on the performance of a program, yet mainstream compilers cannot perform optimizations at such a high level. When dealing with optimizations of data type…

Programming Languages · Computer Science 2024-09-13 Viktor Palmkvist , Anders Ågren Thuné , Elias Castegren , David Broman

Distributed Optimization with Arbitrary Local Solvers

With the growth of data and necessity for distributed optimization methods, solvers that work well on a single machine must be re-designed to leverage distributed computation. Recent work in this area has been limited by focusing heavily on…

Machine Learning · Computer Science 2016-08-04 Chenxin Ma , Jakub Konečný , Martin Jaggi , Virginia Smith , Michael I. Jordan , Peter Richtárik , Martin Takáč

BigData Applications from Graph Analytics to Machine Learning by Aggregates in Recursion

In the past, the semantic issues raised by the non-monotonic nature of aggregates often prevented their use in the recursive statements of logic programs and deductive databases. However, the recently introduced notion of Pre-mappability…

Logic in Computer Science · Computer Science 2019-09-19 Ariyam Das , Youfu Li , Jin Wang , Mingda Li , Carlo Zaniolo

Towards Verified Compilation of Floating-point Optimization in Scientific Computing Programs

Scientific computing programs often undergo aggressive compiler optimization to achieve high performance and efficient resource utilization. While performance is critical, we also need to ensure that these optimizations are correct. In this…

Programming Languages · Computer Science 2025-09-12 Mohit Tekriwal , John Sarracino

Multi-objective integer programming: An improved recursive algorithm

This paper introduces an improved recursive algorithm to generate the set of all nondominated objective vectors for the Multi-Objective Integer Programming (MOIP) problem. We significantly improve the earlier recursive algorithm of Özlen…

Optimization and Control · Mathematics 2014-03-25 Melih Ozlen , Benjamin A. Burton , Cameron A. G. MacRae

SPUDD: Stochastic Planning using Decision Diagrams

Markov decision processes (MDPs) are becoming increasingly popular as models of decision-theoretic planning. While traditional dynamic programming methods perform well for problems with small state spaces, structured methods are needed for…

Artificial Intelligence · Computer Science 2013-01-30 Jesse Hoey , Robert St-Aubin , Alan Hu , Craig Boutilier

FedSplit: An algorithmic framework for fast federated optimization

Motivated by federated learning, we consider the hub-and-spoke model of distributed optimization in which a central authority coordinates the computation of a solution among many agents while limiting communication. We first study some past…

Machine Learning · Computer Science 2020-05-12 Reese Pathak , Martin J. Wainwright

A Stochastic Large-scale Machine Learning Algorithm for Distributed Features and Observations

As the size of modern data sets exceeds the disk and memory capacities of a single computer, machine learning practitioners have resorted to parallel and distributed computing. Given that optimization is one of the pillars of machine…

Machine Learning · Statistics 2019-12-10 Biyi Fang , Diego Klabjan

Scaling Datalog for Machine Learning on Big Data

In this paper, we present the case for a declarative foundation for data-intensive machine learning systems. Instead of creating a new system for each specific flavor of machine learning task, or hardcoding new optimizations, we argue for…

Databases · Computer Science 2012-03-05 Yingyi Bu , Vinayak Borkar , Michael J. Carey , Joshua Rosen , Neoklis Polyzotis , Tyson Condie , Markus Weimer , Raghu Ramakrishnan

Optimization for Large-Scale Machine Learning with Distributed Features and Observations

As the size of modern data sets exceeds the disk and memory capacities of a single computer, machine learning practitioners have resorted to parallel and distributed computing. Given that optimization is one of the pillars of machine…

Machine Learning · Statistics 2017-04-18 Alexandros Nathan , Diego Klabjan