Related papers: PAC learning PDFA from data streams
State machines are popular models to model and visualize discrete systems such as software systems, and to represent regular grammars. Most algorithms that passively learn state machines from data assume all the data to be available from…
Intrusion detection systems are crucial for network security. Verification of these systems is complicated by various factors, including the heterogeneity of network platforms and the continuously changing landscape of cyber threats. In…
Traditional generalization results in statistical learning require a training data set made of independently drawn examples. Most of the recent efforts to relax this independence assumption have considered either purely temporal (mixing)…
We study the generalization error of statistical learning algorithms in a non-i.i.d. setting, where the training data is sampled from a stationary mixing process. We develop an analytic framework for this scenario based on a reduction to…
We present a probabilistic model for stochastic iterative algorithms with the use case of optimization algorithms in mind. Based on this model, we present PAC-Bayesian generalization bounds for functions that are defined on the trajectory…
We present a new framework for deriving bounds on the generalization bound of statistical learning algorithms from the perspective of online learning. Specifically, we construct an online learning game called the "generalization game",…
We present an interactive version of an evidence-driven state-merging (EDSM) algorithm for learning variants of finite state automata. Learning these automata often amounts to recovering or reverse engineering the model generating the data…
We informally call a stochastic process learnable if it admits a generalization error approaching zero in probability for any concept class with finite VC-dimension (IID processes are the simplest example). A mixture of learnable processes…
We generalize the PAC (probably approximately correct) learning model to the quantum world by generalizing the concepts from classical functions to quantum processes, defining the problem of \emph{PAC learning quantum process}, and study…
In this paper, we propose a new approach to infer state machine models from protocol implementations. Our method, STATEINSPECTOR, learns protocol states by using novel program analyses to combine observations of run-time memory and I/O. It…
Monotone learning describes learning processes in which expected performance consistently improves as the amount of training data increases. However, recent studies challenge this conventional wisdom, revealing significant gaps in the…
In document classification, graph-based models effectively capture document structure, overcoming sequence length limitations and enhancing contextual understanding. However, most existing graph document representations rely on heuristics,…
We propose a neural machine-reading model that constructs dynamic knowledge graphs from procedural text. It builds these graphs recurrently for each step of the described procedure, and uses them to track the evolving states of participant…
A fundamental question in theoretical machine learning is generalization. Over the past decades, the PAC-Bayesian approach has been established as a flexible framework to address the generalization capabilities of machine learning…
We introduce a novel technique for verification and model synthesis of sequential programs. Our technique is based on learning a regular model of the set of feasible paths in a program, and testing whether this model contains an incorrect…
Developing state-machine replication protocols for practical use is a complex and labor-intensive process because of the myriad of essential tasks (e.g., deployment, communication, recovery) that need to be taken into account in an…
In this paper we extend the work of Smith and Papamichail (1999) and present fast approximate Bayesian algorithms for learning in complex scenarios where at any time frame, the relationships between explanatory state space variables can be…
In many, if not most, machine learning applications the training data is naturally heterogeneous (e.g. federated learning, adversarial attacks and domain adaptation in neural net training). Data heterogeneity is identified as one of the…
When confronted with massive data streams, summarizing data with dimension reduction methods such as PCA raises theoretical and algorithmic pitfalls. Principal curves act as a nonlinear generalization of PCA and the present paper proposes a…
Statistical performance bounds for reinforcement learning (RL) algorithms can be critical for high-stakes applications like healthcare. This paper introduces a new framework for theoretically measuring the performance of such algorithms…