Related papers: Abstract Interpretation-Based Data Leakage Static …

Data Leakage in Notebooks: Static Detection and Better Processes

Data science pipelines to train and evaluate models with machine learning may contain bugs just like any other code. Leakage between training and test data can lead to overestimating the model's accuracy during offline evaluations, possibly…

Software Engineering · Computer Science · 2022-09-08 · Chenyang Yang, Rachel A Brower-Sinning, Grace A. Lewis, Christian Kästner

Data Leakage in Visual Datasets

We analyze data leakage in visual datasets. Data leakage refers to images in evaluation benchmarks that have been seen during training, compromising fair model evaluation. Given that large-scale datasets are often sourced from the internet,…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-26 · Patrick Ramos, Ryan Ramos, Noa Garcia

Data Leakage in Automotive Perception: Practitioners' Insights

Data leakage is the inadvertent transfer of information between training and evaluation datasets that poses a subtle, yet critical, risk to the reliability of machine learning (ML) models in safety-critical systems such as automotive…

Cryptography and Security · Computer Science · 2026-04-09 · Md Abu Ahammed Babu, Sushant Kumar Pandey, Darko Durisic, Andras Balint, Miroslaw Staron

LeakageDetector 2.0: Analyzing Data Leakage in Jupyter-Driven Machine Learning Pipelines

In software development environments, code quality is crucial. This study aims to assist Machine Learning (ML) engineers in enhancing their code by identifying and correcting Data Leakage issues within their models. Data Leakage occurs when…

Software Engineering · Computer Science · 2025-09-22 · Owen Truong, Terrence Zhang, Arnav Marchareddy, Ryan Lee, Jeffery Busold, Michael Socas, Eman Abdullah AlOmar

Learning a Static Analyzer from Data

To be practically useful, modern static analyzers must precisely model the effect of both statements in the programming language and frameworks used by the program under analysis. While important, manually addressing these…

Programming Languages · Computer Science · 2017-06-27 · Pavol Bielik, Veselin Raychev, Martin Vechev

From Data Leak to Secret Misses: The Impact of Data Leakage on Secret Detection Models

Machine learning models are increasingly used for software security tasks. These models are commonly trained and evaluated on large Internet-derived datasets, which often contain duplicated or highly similar samples. When such samples are…

Cryptography and Security · Computer Science · 2026-02-02 · Farnaz Soltaniani, Mohammad Ghafari

Packet flow analysis in IP networks via abstract interpretation

Static analysis (aka offline analysis) of a model of an IP network is useful for understanding, debugging, and verifying packet flow properties of the network. There have been static analysis approaches proposed in the literature for…

Networking and Internet Architecture · Computer Science · 2011-11-30 · Raghavan Komondoor, K. Vasanta Lakshmi, Deva P. Seetharam, Sudha Balodia

Don't Push the Button! Exploring Data Leakage Risks in Machine Learning and Transfer Learning

Machine Learning (ML) has revolutionized various domains, offering predictive capabilities in several areas. However, with the increasing accessibility of ML tools, many practitioners, lacking deep ML expertise, adopt a "push the button"…

Machine Learning · Computer Science · 2025-08-21 · Andrea Apicella, Francesco Isgrò, Roberto Prevete

Combining Abstract Argumentation and Machine Learning for Efficiently Analyzing Low-Level Process Event Streams

Monitoring and analyzing process traces is a critical task for modern companies and organizations. In scenarios where there is a gap between trace events and reference business activities, this entails an interpretation problem, amounting…

Artificial Intelligence · Computer Science · 2026-05-26 · Bettina Fazzinga, Sergio Flesca, Filippo Furfaro, Luigi Pontieri, Francesco Scala

Checkification: A Practical Approach for Testing Static Analysis Truths

Static analysis is an essential component of many modern software development tools. Unfortunately, the ever-increasing complexity of static analyzers makes their coding error-prone. Even analysis tools based on rigorous mathematical…

Software Engineering · Computer Science · 2025-05-08 · Daniela Ferreiro, Ignacio Casso, Jose F. Morales, Pedro López-García, Manuel V. Hermenegildo

Automating Abstract Interpretation of Abstract Machines

Static program analysis is a valuable tool for any programming language that people write programs in. The prevalence of scripting languages in the world suggests programming language interpreters are relatively easy to write. Users of…

Programming Languages · Computer Science · 2015-05-01 · James Ian Johnson

Trust Me, I Know This Function: Hijacking LLM Static Analysis using Bias

Large Language Models (LLMs) are increasingly trusted to perform automated code review and static analysis at scale, supporting tasks such as vulnerability detection, summarization, and refactoring. In this paper, we identify and exploit a…

Machine Learning · Computer Science · 2025-12-19 · Shir Bernstein, David Beste, Daniel Ayzenshteyn, Lea Schonherr, Yisroel Mirsky

Leakage and Interpretability in Concept-Based Models

Concept-based Models aim to improve interpretability by predicting high-level intermediate concepts, representing a promising approach for deployment in high-risk scenarios. However, they are known to suffer from information leakage,…

Machine Learning · Computer Science · 2026-03-25 · Enrico Parisini, Tapabrata Chakraborti, Chris Harbron, Ben D. MacArthur, Christopher R. S. Banerji

A Static Analyzer for Large Safety-Critical Software

We show that abstract interpretation-based static program analysis can be made efficient and precise enough to formally verify a class of properties for a family of large programs with few or no false alarms. This is achieved by refinement…

Programming Languages · Computer Science · 2016-08-14 · Bruno Blanchet, Patrick Cousot, Radhia Cousot, Jérôme Feret, Laurent Mauborgne, Antoine Miné, David Monniaux, Xavier Rival

Scaling Down Semantic Leakage: Investigating Associative Bias in Smaller Language Models

Semantic leakage is a phenomenon recently introduced by Gonen et al. (2024). It refers to a situation in which associations learnt from the training data emerge in language model generations in an unexpected and sometimes undesired way.…

Computation and Language · Computer Science · 2025-01-14 · Veronika Smilga

Context-Sensitive Abstract Interpretation of Dynamic Languages

There is a vast gap in the quality of IDE tooling between static languages like Java and dynamic languages like Python or JavaScript. Modern frameworks and libraries in these languages heavily use their dynamic capabilities to achieve the…

Programming Languages · Computer Science · 2024-02-01 · Franciszek Piszcz

Interactive Abstract Interpretation: Reanalyzing Whole Programs for Cheap

To put static program analysis at the fingertips of the software developer, we propose a framework for interactive abstract interpretation. While providing sound analysis results, abstract interpretation in general can be quite costly. To…

Programming Languages · Computer Science · 2022-11-28 · Julian Erhard, Simmo Saan, Sarah Tilscher, Michael Schwarz, Karoliine Holter, Vesal Vojdani, Helmut Seidl

Verifying Safety-Critical Timing and Memory-Usage Properties of Embedded Software by Abstract Interpretation

Static program analysis by abstract interpretation is an efficient method to determine properties of embedded software. One example is value analysis, which determines the values stored in the processor registers. Its results are used as…

Logic in Computer Science · Computer Science · 2011-11-09 · Reinhold Heckmann, Christian Ferdinand

Static Analyzers and Potential Future Research Directions for Scala: An Overview

Static analyzers are tool sets which are proving to be indispensable to modern programmers. These enable the programmers to detect possible errors and security defects present in the current code base within the implementation phase of the…

Software Engineering · Computer Science · 2019-05-14 · Eljose E Sajan, Yunpeng Zhang, Liang-Chieh Cheng

Reliable and Interpretable Drift Detection in Streams of Short Texts

Data drift, a change in the model's input data, is one of the key factors that degrade machine learning models' performance over time. Monitoring drift helps detect these issues and prevent their harmful consequences.…

Computation and Language · Computer Science · 2023-05-30 · Ella Rabinovich, Matan Vetzler, Samuel Ackerman, Ateret Anaby-Tavor