Related papers: DyPyBench: A Benchmark of Executable Python Softwa…
Despite being the most popular programming language, Python has not yet received enough attention from the community. To the best of our knowledge, there is no general static analysis framework proposed to facilitate the implementation of…
We introduce, a large-scale synthetic benchmark of 15,045 university-level physics problems (90/10% train/test split). Each problem is fully parameterized, supporting an effectively infinite range of input configurations, and is accompanied…
Python is one of the fastest-growing programming languages and currently ranks as the top language in many lists, even recently overtaking JavaScript as the top language on GitHub. Given its importance in data science and machine learning,…
Python has become the prime language for application development in the Data Science and Machine Learning domains. However, data scientists are not necessarily experienced programmers. While Python lets them quickly implement their…
Call graphs play an important role in different contexts, such as profiling and vulnerability propagation analysis. Generating call graphs in an efficient manner can be a challenging task when it comes to high-level languages that are…
In the rapidly evolving software development landscape, Python stands out for its simplicity, versatility, and extensive ecosystem. Python packages, as units of organization, reusability, and distribution, have become a pressing concern,…
Malicious Python packages make software supply chains vulnerable by exploiting trust in open-source repositories like Python Package Index (PyPI). Lack of real-time behavioral monitoring makes metadata inspection and static code analysis…
This work proposes a framework of benchmark functions designed to facilitate the creation of test cases for numerical optimisation techniques. The framework, written in Python 3, is designed to be easy to install, use, and expand. The…
Dynamically typed languages such as Python have become very popular. Among other strengths, Python's dynamic nature and its straightforward linking to native code have made it the de-facto language for many research areas such as Artificial…
Python is a popular, widely used, and general-purpose programming language. In spite of its ever-growing community, researchers have not performed much analysis on Python's topics, trends, and technologies which provides insights for…
We introduce Dynabench, an open-source platform for dynamic dataset creation and model benchmarking. Dynabench runs in a web browser and supports human-and-model-in-the-loop dataset creation: annotators seek to create examples that a target…
Existing code generation benchmarks primarily evaluate functional correctness, with limited focus on code efficiency and often restricted to a single language like Python. To address this gap, we introduce EffiBench-X, the first…
As AI-driven document understanding and processing tools become increasingly prevalent in real-world applications, the need for rigorous evaluation standards has grown increasingly urgent. Existing benchmarks and evaluations often focus on…
Python is rapidly becoming the lingua franca of machine learning and scientific computing. With the broad use of frameworks such as Numpy, SciPy, and TensorFlow, scientific computing and machine learning are seeing a productivity boost on…
Despite advancements in the areas of parallel and distributed computing, the complexity of programming on High Performance Computing (HPC) resources has deterred many domain experts, especially in the areas of machine learning and…
This paper presents a systematic review of Python packages with a focus on time series analysis. The objective is to provide (1) an overview of the different time series analysis tasks and preprocessing methods implemented, and (2) an…
The LLM Agent, equipped with a code interpreter, is capable of automatically solving real-world coding tasks, such as data analysis and image editing. However, existing benchmarks primarily focus on either simplistic tasks, such as…
Large language models (LLMs) have achieved remarkable performance in various evaluation benchmarks. However, concerns are raised about potential data contamination in their considerable volume of training corpus. Moreover, the static nature…
The study of complex many-body systems via analysis of the trajectories of the units that dynamically move and interact within them is a non-trivial task. The workflow for extracting meaningful information from the raw trajectory data is…
Active inference is an account of cognition and behavior in complex systems which brings together action, perception, and learning under the theoretical mantle of Bayesian inference. Active inference has seen growing applications in…