Related papers: Benchmarking Declarative Approximate Selection Pre…

Declarative Data Analytics: a Survey

The area of declarative data analytics explores the application of the declarative paradigm on data science and machine learning. It proposes declarative languages for expressing data analysis tasks and develops systems which optimize…

Databases · Computer Science 2019-02-05 Nantia Makrynioti, Vasilis Vassalos

Data Quality Principles in the Semantic Web

The increasing size and availability of web data make data quality a core challenge in many applications. Principles of data quality are recognized as essential to ensure that data fit for their intended use in operations, decision-making,…

Digital Libraries · Computer Science 2013-05-20 Ahmad Assaf, Aline Senart

Data Cleaning and Query Answering with Matching Dependencies and Matching Functions

Matching dependencies were recently introduced as declarative rules for data cleaning and entity resolution. Enforcing a matching dependency on a database instance identifies the values of some attributes for two tuples, provided that the…

Databases · Computer Science 2010-08-24 Leopoldo Bertossi, Solmaz Kolahi, Laks V. S. Lakshmanan

Declarative Machine Learning - A Classification of Basic Properties and Types

Declarative machine learning (ML) aims at the high-level specification of ML tasks or algorithms, and automatic generation of optimized execution plans from these specifications. The fundamental goal is to simplify the usage and/or…

Databases · Computer Science 2016-05-20 Matthias Boehm, Alexandre V. Evfimievski, Niketan Pansare, Berthold Reinwald

Feature Selection: A Data Perspective

Feature selection, as a data preprocessing strategy, has been proven to be effective and efficient in preparing data (especially high-dimensional data) for various data mining and machine learning problems. The objectives of feature…

Machine Learning · Computer Science 2018-08-28 Jundong Li, Kewei Cheng, Suhang Wang, Fred Morstatter, Robert P. Trevino, Jiliang Tang, Huan Liu

Predicate Abstraction with Indexed Predicates

Predicate abstraction provides a powerful tool for verifying properties of infinite-state systems using a combination of a decision procedure for a subset of first-order logic and symbolic methods originally developed for finite-state model…

Logic in Computer Science · Computer Science 2007-05-23 Shuvendu K. Lahiri, Randal E. Bryant

Document Retrieval using Predication Similarity

Document retrieval has been an important research problem over many years in the information retrieval community. State-of-the-art techniques utilize various methods in matching documents to a given document including keywords, phrases, and…

Information Retrieval · Computer Science 2016-04-21 Kalpa Gunaratna

Multi-Attribute Selectivity Estimation Using Deep Learning

Selectivity estimation - the problem of estimating the result size of queries - is a fundamental problem in databases. Accurate estimation of query selectivity involving multiple correlated attributes is especially challenging. Poor…

Databases · Computer Science 2019-06-19 Shohedul Hasan, Saravanan Thirumuruganathan, Jees Augustine, Nick Koudas, Gautam Das

Feature Selection Using Classifier in High Dimensional Data

Feature selection is frequently used as a pre-processing step to machine learning. It is a process of choosing a subset of original features so that the feature space is optimally reduced according to a certain evaluation criterion. The…

Computer Vision and Pattern Recognition · Computer Science 2014-01-07 Vijendra Singh, Shivani Pathak

Preference Elicitation with Soft Attributes in Interactive Recommendation

Preference elicitation plays a central role in interactive recommender systems. Most preference elicitation approaches use either item queries that ask users to select preferred items from a slate, or attribute queries that ask them to…

Information Retrieval · Computer Science 2023-11-07 Erdem Biyik, Fan Yao, Yinlam Chow, Alex Haig, Chih-wei Hsu, Mohammad Ghavamzadeh, Craig Boutilier

Declarative Statistical Modeling with Datalog

Formalisms for specifying statistical models, such as probabilistic-programming languages, typically consist of two components: a specification of a stochastic process (the prior), and a specification of observations that restrict the…

Databases · Computer Science 2015-01-06 Vince Barany, Balder ten Cate, Benny Kimelfeld, Dan Olteanu, Zografoula Vagena

A Novel Metric for Measuring Data Quality in Classification Applications (extended version)

Data quality is a key element for building and optimizing good learning models. Despite many attempts to characterize data quality, there is still a need for rigorous formalization and an efficient measure of the quality from available…

Machine Learning · Computer Science 2023-12-14 Jouseau Roxane, Salva Sébastien, Samir Chafik

Attribute Oriented Induction with simple select SQL statement

Searching learning or rules in relational database for data mining purposes with characteristic or classification/discriminant rule in attribute oriented induction technique can be quicker, easy, and simple with simple SQL statement. With…

Databases · Computer Science 2010-06-10 Spits Warnars

Data Quality Measures and Efficient Evaluation Algorithms for Large-Scale High-Dimensional Data

Machine learning has been proven to be effective in various application areas, such as object and speech recognition on mobile systems. Since a critical key to machine learning success is the availability of large training data, many…

Machine Learning · Computer Science 2021-01-06 Hyeongmin Cho, Sangkyun Lee

Open Data Quality

The research discusses how (open) data quality could be described, what should be considered developing a data quality management solution and how it could be applied to open data to check its quality. The proposed approach focuses on…

Databases · Computer Science 2022-06-16 Anastasija Nikiforova

ScaleDoc: Scaling LLM-based Predicates over Large Document Collections

Predicates are foundational components in data analysis systems. However, modern workloads increasingly involve unstructured documents, which demands semantic understanding, beyond traditional value-based predicates. Given enormous…

Databases · Computer Science 2026-05-22 Hengrui Zhang, Yulong Hui, Yihao Liu, Huanchen Zhang

DsDm: Model-Aware Dataset Selection with Datamodels

When selecting data for training large-scale models, standard practice is to filter for examples that match human notions of data quality. Such filtering yields qualitatively clean datapoints that intuitively should improve model behavior.…

Machine Learning · Computer Science 2024-01-24 Logan Engstrom, Axel Feldmann, Aleksander Madry

We consider the problem of classification using similarity/distance functions over data. Specifically, we propose a framework for defining the goodness of a (dis)similarity function with respect to a given learning task and propose…

Machine Learning · Computer Science 2015-03-19 Purushottam Kar, Prateek Jain

Contexts and Data Quality Assessment

The quality of data is context dependent. Starting from this intuition and experience, we propose and develop a conceptual framework that captures in formal terms the notion of "context-dependent data quality". We start by proposing a…

Databases · Computer Science 2016-08-16 Leopoldo Bertossi, Flavio Rizzolo

Predictive Performance Comparison of Decision Policies Under Confounding

Predictive models are often introduced to decision-making tasks under the rationale that they improve performance over an existing decision-making policy. However, it is challenging to compare predictive performance against an existing…

Machine Learning · Computer Science 2024-06-13 Luke Guerdan, Amanda Coston, Kenneth Holstein, Zhiwei Steven Wu