Related papers: Automatic String Data Validation with Pattern Disc…

200 papers

Complex data pipelines are increasingly common in diverse applications such as BI reporting and ML modeling. These pipelines often recur regularly (e.g., daily or weekly), as BI reports need to be refreshed, and ML models need to be…

Databases · Computer Science 2021-04-14 Jie Song, Yeye He

We discuss how VMware is solving the following challenges to harness data to operate our ML-based anomaly detection system to detect performance issues in our Software Defined Data Center (SDDC) enterprise deployments: (i) label scarcity…

Large language models (LLMs) are being increasingly deployed as part of pipelines that repeatedly process or generate data of some sort. However, a common barrier to deployment is the frequent and often unpredictable errors that plague…

Background: Data errors are a common challenge in machine learning (ML) projects and generally cause significant performance degradation in ML-enabled software systems. To ensure early detection of erroneous data and avoid training ML…

Software Engineering · Computer Science 2021-03-09 Lucy Ellen Lwakatare, Ellinor Rånge, Ivica Crnkovic, Jan Bosch

Data pipelines are widely employed in modern enterprises to power a variety of Machine-Learning (ML) and Business-Intelligence (BI) applications. Crucially, these pipelines are recurring (e.g., daily or hourly) in production settings…

Databases · Computer Science 2023-06-06 Dezhan Tu, Yeye He, Weiwei Cui, Song Ge, Haidong Zhang, Han Shi, Dongmei Zhang, Surajit Chaudhuri

Machine learning (ML) models in production pipelines are frequently retrained on the latest partitions of large, continually-growing datasets. Due to engineering bugs, partitions in such datasets almost always have some corrupted features;…

Databases · Computer Science 2023-03-13 Shreya Shankar, Labib Fawaz, Karl Gyllstrom, Aditya G. Parameswaran

Uncertain data streams have been widely generated in many Web applications. The uncertainty in data streams makes anomaly detection from sensor data streams far more challenging. In this paper, we present a novel framework that supports…

Artificial Intelligence · Computer Science 2016-07-21 Jiangang Ma, Le Sun, Hua Wang, Yanchun Zhang, Uwe Aickelin

Modern web dashboards and enterprise applications increasingly rely on complex, distributed microservices architectures. While these architectures offer scalability, they introduce significant challenges in debugging and observability. When…

Software Engineering · Computer Science 2026-02-18 Devendra Tata, Mona Rajhans

With the advent of large-scale heterogeneous search engines comes the problem of unified search control resulting in mismatches that could have otherwise been avoided. A mechanism is needed to determine exact patterns in web mining and…

Cryptography and Security · Computer Science 2019-01-28 Nazim Uddin Sheikh, Hasina Rahman, Hamid Al-Qahtani

An anomaly detection method based on deep autoencoders is proposed to address anomalies that often occur in enterprise-level ETL data streams. The study first analyzes multiple types of anomalies in ETL processes, including delays, missing…

Machine Learning · Computer Science 2025-11-04 Xin Chen, Saili Uday Gadgil, Kangning Gao, Yi Hu, Cong Nie

Users around the world rely on software-intensive systems in their day-to-day activities. These systems regularly contain bugs and security vulnerabilities. To facilitate bug fixing, data-driven models of automatic program repair use pairs…

Software Engineering · Computer Science 2022-02-08 Anastasiia Grishina

A key obstacle in automated analytics and meta-learning is the inability to recognize when different datasets contain measurements of the same variable. Because provided attribute labels are often uninformative in practice, this task may be…

Machine Learning · Computer Science 2019-09-12 Jonas Mueller, Alex Smola

The emergence of database-as-a-service platforms has made deploying database applications easier than before. Now, developers can quickly create scalable applications. However, designing performant, maintainable, and accurate applications…

Databases · Computer Science 2020-04-23 Visweswara Sai Prashanth Dintyala, Arpit Narechania, Joy Arulraj

Maintaining software artifacts is among the hardest tasks an engineer faces. Like any other piece of code, model transformations developed by engineers are also subject to maintenance. To facilitate the comprehension of programs, software…

Software Engineering · Computer Science 2020-10-13 Chihab eddine Mokaddem, Houari Sahraoui, Eugene Syriani

The quality of underlying training data is very crucial for building performant machine learning models with wider generalizability. However, current machine learning (ML) tools lack streamlined processes for improving the data quality. So,…

Machine Learning · Computer Science 2021-12-16 Atindriyo Sanyal, Vikram Chatterji, Nidhi Vyas, Ben Epstein, Nikita Demir, Anthony Corletti

Development of new machine learning models is typically done on manually curated data sets, making them unsuitable for evaluating the models' performance during operations, where the evaluation needs to be performed automatically on…

Machine Learning · Computer Science 2021-10-15 Awalin Sopan, Konstantin Berlin

In the era of big data, ensuring the quality of datasets has become increasingly crucial across various domains. We propose a comprehensive framework designed to automatically assess and rectify data quality issues in any given dataset,…

Databases · Computer Science 2024-09-17 Djibril Sarr

Automatic log file analysis enables early detection of relevant incidents such as system failures. In particular, self-learning anomaly detection techniques capture patterns in log data and subsequently report unexpected log event…

Machine Learning · Computer Science 2023-05-16 Max Landauer, Sebastian Onder, Florian Skopik, Markus Wurzenberger

Organizations rely heavily on time series metrics to measure and model key aspects of operational and business performance. The ability to reliably detect issues with these metrics is imperative to identifying early indicators of major…

Machine Learning · Computer Science 2020-11-11 Sayan Chakraborty, Smit Shah, Kiumars Soltani, Anna Swigart, Luyao Yang, Kyle Buckingham

As machine learning systems become democratized, it becomes increasingly important to help users easily debug their models. However, current data tools are still primitive when it comes to helping users trace model performance problems all…

Databases · Computer Science 2019-01-08 Yeounoh Chung, Tim Kraska, Neoklis Polyzotis, Ki Hyun Tae, Steven Euijong Whang