Related papers: Towards Linear Algebra over Normalized Data

Algebraic Machine Learning

Machine learning algorithms use error function minimization to fit a large set of parameters in a preexisting model. However, error minimization eventually leads to a memorization of the training dataset, losing the ability to generalize to…

Machine Learning · Computer Science 2018-03-16 Fernando Martin-Maroto, Gonzalo G. de Polavieja

Algebraic Machine Learning: Learning as computing an algebraic decomposition of a task

Statistics and Optimization are foundational to modern Machine Learning. Here, we propose an alternative foundation based on Abstract Algebra, with mathematics that facilitates the analysis of learning. In this approach, the goal of the…

Machine Learning · Computer Science 2025-02-28 Fernando Martin-Maroto, Nabil Abderrahaman, David Mendez, Gonzalo G. de Polavieja

Efficient Construction of Nonlinear Models over Normalized Data

Machine Learning (ML) applications are proliferating in the enterprise. Relational data which are prevalent in enterprise applications are typically normalized; as a result, data has to be denormalized via primary/foreign-key joins to be…

Machine Learning · Computer Science 2021-03-22 Zhaoyue Chen, Nick Koudas, Zhe Zhang, Xiaohui Yu

Plugging Schema Graph into Multi-Table QA: A Human-Guided Framework for Reducing LLM Reliance

Large language models (LLMs) have shown promise in table Question Answering (Table QA). However, extending these capabilities to multi-table QA remains challenging due to unreliable schema linking across complex tables. Existing methods…

Artificial Intelligence · Computer Science 2025-11-25 Xixi Wang, Miguel Costa, Jordanka Kovaceva, Shuai Wang, Francisco C. Pereira

Lara: A Key-Value Algebra underlying Arrays and Relations

Data processing systems roughly group into families such as relational, array, graph, and key-value. Many data processing tasks exceed the capabilities of any one family, require data stored across families, or run faster when partitioned…

Databases · Computer Science 2016-04-14 Dylan Hutchison, Bill Howe, Dan Suciu

How to get Rid of SQL, Relational Algebra, the Relational Model, ERM, and ORMs in a Single Paper -- A Thought Experiment

Without any doubt, the relational paradigm has been a huge success. At the same time, we believe that the time is ripe to rethink how database systems could look like if we designed them from scratch. Would we really end up with the same…

Databases · Computer Science 2025-04-18 Jens Dittrich

The Programming of Algebra

We present module theory and linear maps as a powerful generalised and computationally efficient framework for the relational data model, which underpins today's relational database systems. Based on universal constructions of modules we…

Programming Languages · Computer Science 2022-07-05 Fritz Henglein, Robin Kaarsgaard, Mikkel Kragh Mathiesen

$C^*$-Algebraic Machine Learning: Moving in a New Direction

Machine learning has a long collaborative tradition with several fields of mathematics, such as statistics, probability and linear algebra. We propose a new direction for machine learning research: $C^*$-algebraic ML - a…

Machine Learning · Computer Science 2024-06-10 Yuka Hashimoto, Masahiro Ikeda, Hachem Kadri

RL4RLA: Teaching ML to Discover Randomized Linear Algebra Algorithms Through Curriculum Design and Graph-Based Search

Randomized linear algebra (RLA) algorithms are a modern class of numerical linear algebra techniques that play an essential role in scientific computing and machine learning, with broad and growing adoption. However, their discovery remains…

Machine Learning · Computer Science 2026-05-19 Jinglong Xiong, Xiaotian Liu, Ruoxin Wang, Zihang Liu, Yefan Zhou, Yujun Yan, Yaoqing Yang

Towards scalable pattern-based optimization for dense linear algebra

Linear algebraic expressions are the essence of many computationally intensive problems, including scientific simulations and machine learning applications. However, translating high-level formulations of these expressions to efficient…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-03-22 Dániel Berényi, András Leitereg, Gábor Lehel

A Survey on Large-scale Machine Learning

Machine learning can provide deep insights into data, allowing machines to make high-quality predictions and having been widely used in real-world applications, such as text mining, visual classification, and recommender systems. However,…

Machine Learning · Computer Science 2020-08-11 Meng Wang, Weijie Fu, Xiangnan He, Shijie Hao, Xindong Wu

Human-LLM Collaborative Feature Engineering for Tabular Data

Large language models (LLMs) are increasingly used to automate feature engineering in tabular learning. Given task-specific information, LLMs can propose diverse feature transformation operations to enhance downstream model performance.…

Machine Learning · Computer Science 2026-01-30 Zhuoyan Li, Aditya Bansal, Jinzhao Li, Shishuang He, Zhuoran Lu, Mutian Zhang, Qin Liu, Yiwei Yang, Swati Jain, Ming Yin, Yunyao Li

Ilargi: a GPU Compatible Factorized ML Model Training Framework

The machine learning (ML) training over disparate data sources traditionally involves materialization, which can impose substantial time and space overhead due to data movement and replication. Factorized learning, which leverages direct…

Machine Learning · Computer Science 2025-02-05 Wenbo Sun, Rihan Hai

Typed Linear Algebra for Efficient Analytical Querying

This paper uses typed linear algebra (LA) to represent data and perform analytical querying in a single, unified framework. The typed approach offers strong type checking (as in modern programming languages) and a diagrammatic way of…

Databases · Computer Science 2018-09-05 João M. Afonso, Gabriel D. Fernandes, João P. Fernandes, Filipe Oliveira, Bruno M. Ribeiro, Rogério Pontes, José N. Oliveira, Alberto J. Proença

Lifted Relational Algebra with Recursion and Connections to Modal Logic

We propose a new formalism for specifying and reasoning about problems that involve heterogeneous "pieces of information" -- large collections of data, decision procedures of any kind and complexity and connections between them. The essence…

Logic in Computer Science · Computer Science 2016-12-30 Eugenia Ternovska

NormTab: Improving Symbolic Reasoning in LLMs Through Tabular Data Normalization

In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities in parsing textual data and generating code. However, their performance in tasks involving tabular data, especially those requiring symbolic reasoning,…

Computation and Language · Computer Science 2025-04-04 Md Mahadi Hasan Nahid, Davood Rafiei

Learning Models over Relational Data: A Brief Tutorial

This tutorial overviews the state of the art in learning models over relational databases and makes the case for a first-principles approach that exploits recent developments in database research. The input to learning classification and…

Databases · Computer Science 2019-11-18 Maximilian Schleich, Dan Olteanu, Mahmoud Abo-Khamis, Hung Q. Ngo, XuanLong Nguyen

General Data Analytics with Applications to Visual Information Analysis: A Provable Backward-Compatible Semisimple Paradigm over T-Algebra

We consider a novel backward-compatible paradigm of general data analytics over a recently-reported semisimple algebra (called t-algebra). We study the abstract algebraic framework over the t-algebra by representing the elements of…

Computer Vision and Pattern Recognition · Computer Science 2021-05-04 Liang Liao, Stephen John Maybank

Petuum: A New Platform for Distributed Machine Learning on Big Data

What is a systematic way to efficiently apply a wide spectrum of advanced ML programs to industrial scale problems, using Big Models (up to 100s of billions of parameters) on Big Data (up to terabytes or petabytes)? Modern parallelization…

Machine Learning · Statistics 2015-05-18 Eric P. Xing, Qirong Ho, Wei Dai, Jin Kyu Kim, Jinliang Wei, Seunghak Lee, Xun Zheng, Pengtao Xie, Abhimanu Kumar, Yaoliang Yu

Multi-Relational Algebra for Multi-Granular Data Analytics

In modern data analytics, analysts frequently face the challenge of searching for desirable entities by evaluating, for each entity, a collection of its feature relations to derive key analytical properties. This search is challenging…

Databases · Computer Science 2025-07-25 Xi Wu, Eugene Wu, Zichen Zhu, Fengan Li, Jeffrey F. Naughton