Related papers: SimpliPy: A Source-Tracking Notional Machine for S…
Python's dynamic type system, while offering significant flexibility and expressiveness, poses substantial challenges for static analysis and automated tooling, particularly in unannotated or partially annotated codebases. Existing type…
Graph representations of programs are commonly a central element of machine learning for code research. We introduce an open source Python library python_graphs that applies static analysis to construct graph representations of Python…
PaPy, which stands for parallel pipelines in Python, is a highly flexible framework that enables the construction of robust, scalable workflows for either generating or processing voluminous datasets. A workflow is created from user-written…
Python is a popular high-level general-purpose programming language also heavily used by the scientific community. It supports a variety of different programming paradigms and is preferred by many for its ease of use. With the vision of…
Execution of concurrent programs implies frequent switching between different thread contexts. This property perplexes analyzing and reasoning about concurrent programs. Trace simplification is a technique that aims at alleviating this…
Predicting program behavior without execution is a critical task in software engineering. Existing models often fall short in capturing the dynamic dependencies among program elements. To address this, we present CodeFlow, a novel machine…
In this paper, we present a new Python library called mPyPl, which is intended to simplify complex data processing tasks using functional approach. This library defines operations on lazy data streams of named dictionaries represented as…
We present scg-cli, a~command line tool facilitating software comprehension. The tool extracts semantic information about code structure and dependencies from the Java and Scala projects, and structures it as a~Semantic Code Graph (SCG), an…
We develop the first theory of control-flow graphs from first principles, and use it to create an algorithm for automatically synthesizing many variants of control-flow graph generators from a language's operational semantics. Our approach…
HolPy is an interactive theorem proving system implemented in Python. It uses higher-order logic as the logical foundation. Its main features include a pervasive use of macros in producing, checking, and storing proofs, a JSON-based format…
Compilers use control flow graph (CFG) representations of low-level programs because they are suited to program analysis and optimizations. However, formalizing the behavior and metatheory of CFG programs is non-trivial: CFG programs don't…
GraphFlow is a visual workflow system designed to improve the reliability of agentic AI automation in multi-step, mission-critical processes. In these workflows, small errors compound rapidly: under an idealized model of independent steps,…
CurvPy is an open-source Python library for automated curve fitting and regression analysis, aiming to make advanced statistical and machine learning techniques more accessible. This paper explores the mathematical foundations and…
Proof autoformalization, the task of translating natural language theorems and proofs into machine-verifiable code, is a critical step for integrating large language models into rigorous mathematical workflows. Current approaches focus on…
We present DataFlow, a computational framework for building, testing, and deploying high-performance machine learning systems on unbounded time-series data. Traditional data science workflows assume finite datasets and require substantial…
The advent of modern data processing has led to an increasing tendency towards interdisciplinarity, which frequently involves the importation of different technical approaches. Consequently, there is an urgent need for a unified data…
Many universities have courses and projects revolving around compiler or interpreter implementation as part of their degree programmes in computer science. In such teaching activities, tool support can be highly beneficial. While there are…
Program comprehension is a fundamental task in software development and maintenance processes. Software developers often need to understand a large amount of existing code before they can develop new features or fix bugs in existing…
Software comprehension can be extremely time-consuming due to the ever-growing size of codebases. Consequently, there is an increasing need to accelerate the code comprehension process to facilitate maintenance and reduce associated costs.…
In this report, we present a new programming model based on Pipelines and Operators, which are the building blocks of programs written in PiCo, a DSL for Data Analytics Pipelines. In the model we propose, we use the term Pipeline to denote…