Related papers: Data accounting and error counting

BugDoc: Algorithms to Debug Computational Processes

Data analysis for scientific experiments and enterprises, large-scale simulations, and machine learning tasks all entail the use of complex computational pipelines to reach quantitative and qualitative conclusions. If some of the activities…

Databases · Computer Science 2020-04-15 Raoni Lourenço, Juliana Freire, Dennis Shasha

Enabling Operator Reordering in Data Flow Programs Through Static Code Analysis

In many massively parallel data management platforms, programs are represented as small imperative pieces of code connected in a data flow. This popular abstraction makes it hard to apply algebraic reordering techniques employed by…

Databases · Computer Science 2013-01-18 Fabian Hueske, Aljoscha Krettek, Kostas Tzoumas

Statistical Validity and Consistency of Big Data Analytics: A General Framework

Informatics and technological advancements have triggered the generation of huge volumes of data with varied complexity in its management and analysis. Big Data analytics is the practice of revealing hidden aspects of such data and making…

Databases · Computer Science 2018-03-30 Bikram Karmakar, Indranil Mukhopadhyay

Algebra of Data Reconciliation

With distributed computing and mobile applications becoming ever more prevalent, synchronizing diverging replicas of the same data is a common problem. Reconciliation -- bringing two replicas of the same data structure as close as possible…

Information Theory · Computer Science 2022-08-10 Elod P. Csirmaz, Laszlo Csirmaz

Explainable Data Imputation using Constraints

Data values in a dataset can be missing or anomalous due to mishandling or human error. Analysing data with missing values can create bias and affect the inferences. Several analysis methods, such as principal components analysis or…

Artificial Intelligence · Computer Science 2022-05-11 Sandeep Hans, Diptikalyan Saha, Aniya Aggarwal

Amortized Analysis via Coalgebra

Amortized analysis is a cost analysis technique for data structures in which cost is studied in aggregate: rather than considering the maximum cost of a single operation, one bounds the total cost encountered throughout a session.…

Programming Languages · Computer Science 2024-12-18 Harrison Grodin, Robert Harper

Algebraic File Synchronization: Adequacy and Completeness

With distributed computing and mobile applications, synchronizing diverging replicas of data structures is a more and more common problem. We use algebraic methods to reason about filesystem operations, and introduce a simplified definition…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-07-24 Elod Pal Csirmaz

On Data Analysis Pipelines and Modular Bayesian Modeling

The most common approach to implementing data analysis pipelines involves obtaining point estimates from the upstream modules and then treating these as known quantities when working with the downstream ones. This approach is…

Methodology · Statistics 2024-02-19 Erin Lipman, Abel Rodriguez

Exactly mergeable summaries

In the analysis of large/big data sets, aggregation (replacing values of a variable over a group by a single value) is a standard way of reducing the size (complexity) of the data. Data analysis programs provide different aggregation…

Machine Learning · Computer Science 2023-03-29 Vladimir Batagelj

Mathematical Computation on High-dimensional Data via Array Programming and Parallel Acceleration

While deep learning excels in natural image and language processing, its application to high-dimensional data faces computational challenges due to the dimensionality curse. Current large-scale data tools focus on business-oriented…

Machine Learning · Computer Science 2025-07-01 Chen Zhang

Balancing Statistical and Computational Precision: A General Theory and Applications to Sparse Regression

Modern technologies are generating ever-increasing amounts of data. Making use of these data requires methods that are both statistically sound and computationally efficient. Typically, the statistical and computational aspects are treated…

Methodology · Statistics 2022-09-15 Mahsa Taheri, Néhémy Lim, Johannes Lederer

Fighting Accounting Fraud Through Forensic Data Analytics

Accounting fraud is a global concern representing a significant threat to the financial system stability due to the resulting diminishing of the market confidence and trust of regulatory authorities. Several tricks can be used to commit…

Machine Learning · Statistics 2018-05-09 Maria Jofre, Richard Gerlach

Process Algebra as Abstract Data Types

In this paper we introduced an algebraic semantics for process algebra in the form of abstract data types. For that purpose, we developed a particular type of algebra, the seed algebra, which describes exactly the behavior of a process within a…

Programming Languages · Computer Science 2010-01-08 Ruqian Lu, Lixing Li, Yun Shang, Xiaoyu Li

Repairing Inconsistent Databases: A Model-Theoretic Approach and Abductive Reasoning

In this paper we consider two points of view on the problem of coherent integration of distributed data. First we give a pure model-theoretic analysis of the possible ways to 'repair' a database. We do so by characterizing the…

Logic in Computer Science · Computer Science 2007-05-23 Ofer Arieli, Marc Denecker, Bert Van Nuffelen, Maurice Bruynooghe

Support for Debugging Automatically Parallelized Programs

We describe a system that simplifies the process of debugging programs produced by computer-aided parallelization tools. The system uses relative debugging techniques to compare serial and parallel executions in order to show where the…

Software Engineering · Computer Science 2007-05-23 Robert Hood, Gabriele Jost

Algebraic Machine Learning: Learning as computing an algebraic decomposition of a task

Statistics and Optimization are foundational to modern Machine Learning. Here, we propose an alternative foundation based on Abstract Algebra, with mathematics that facilitates the analysis of learning. In this approach, the goal of the…

Machine Learning · Computer Science 2025-02-28 Fernando Martin-Maroto, Nabil Abderrahaman, David Mendez, Gonzalo G. de Polavieja

Contributions to Biclustering of Microarray Data Using Formal Concept Analysis

Biclustering is an unsupervised data mining technique that aims to unveil patterns (biclusters) from gene expression data matrices. In the framework of this thesis, we propose new biclustering algorithms for microarray data. The latter is…

Machine Learning · Computer Science 2018-11-26 Amina Houari

Process Analytics -- Data-driven Business Process Management

Data-driven analysis of business processes has a long tradition in research. However, recently the term process mining is mostly used when referring to data-driven process analysis. As a consequence, awareness for the many facets of…

Software Engineering · Computer Science 2025-12-25 Matthias Stierle, Karsten Kraume, Martin Matzner

More Software Analytics Patterns: Broad-Spectrum Diagnostic and Embedded Improvements

Software analytics is a data-driven approach to decision making, which allows software practitioners to leverage valuable insights from data about software to achieve higher development process productivity and improve different aspects of…

Software Engineering · Computer Science 2022-01-12 Duarte Oliveira, João Fidalgo, Joelma Choma, Eduardo Guerra, Filipe Correia

Relational Expressions for Data Transformation and Computation

Separate programming models for data transformation (declarative) and computation (procedural) impact programmer ergonomics, code reusability and database efficiency. To eliminate the necessity for two models or paradigms, we propose a…

Databases · Computer Science 2023-11-09 David Robert Pratten, Luke Mathieson