Related papers: EasyNData: A simple tool to extract numerical valu…
Automated data extraction from research texts has been steadily improving, with the emergence of large language models (LLMs) accelerating progress even further. Extracting data from plots in research papers, however, has been such a…
The table analysis application TOPCAT uses a custom Java plotting library for highly configurable high-performance interactive or exported visualisations in two and three dimensions. We present here a variety of ways for end users or…
Charts are an excellent way to convey patterns and trends in data, but they do not facilitate further modeling of the data or close inspection of individual data points. We present a fully automated system for extracting the numerical…
Program understanding is an important aspect of Software Maintenance and Reengineering. Understanding a program involves its execution behaviour and the relationships among the variables involved in it. The task of finding all statements in…
There are plenty of excellent plotting libraries. Each excels at a different use case: one is good for printed 2D publication figures, the other at interactive 3D graphics, a third has excellent LaTeX integration or is good for creating…
It is common for authors to communicate their results in graphical figures, but those data are frequently unavailable for reanalysis. Reconstructing data points from a figure manually requires the author to measure the coordinates either on…
Nonparametric statistical tests are useful procedures that can be applied in a wide range of situations, such as testing randomness or goodness of fit, one-sample, two-sample and multiple-sample analysis, association between bivariate…
Presented here are algorithms for converting between (decimal) scientific-notation and (binary) IEEE-754 double-precision floating-point numbers. By employing a rounding integer quotient operation these algorithms are much simpler than…
We present an application, Superplot, for calculating and plotting statistical quantities relevant to parameter inference from a "chain" of samples drawn from a parameter space, produced by e.g. MultiNest. A simple graphical interface…
This paper describes a new modelling language for the effective design of Java annotations. Since their inclusion in the 5th edition of Java, annotations have grown from a useful tool for the addition of meta-data to play a central role in…
This paper presents an open tool for standardizing the evaluation process of the layout analysis task of document images at pixel level. We introduce a new evaluation tool that is both available as a standalone Java application and as a…
Java implementations of algorithms used by spreadsheets to automatically recompute the set of cells dependent on a changed cell are described using a mathematical model for spreadsheets based on graph theory. These solutions comprise part…
tabulapdf is an R package that utilizes the Tabula Java library to import tables from PDF files directly into R. This tool can reduce time and effort in data extraction processes in fields like investigative journalism. It allows for…
Dependencies between types in object-oriented software can be viewed as directed graphs, with types as nodes and dependencies as edges. The in-degree and out-degree distributions of such graphs have quite different forms, with the former…
Proponents of software verification have argued that simpler code is easier to verify: that is, that verification tools issue fewer false positives and require less human intervention when analyzing simpler code. We empirically validate…
Most search engines index the textual content of documents in digital libraries. However, scholarly articles frequently report important findings in figures for visual impact and the contents of these figures are not indexed. These contents…
Numerical data processing is a key task across many fields of computing. However, even simple summation of values is not exact because of floating-point representation. This paper presents a practical algorithm for…
The paper advocates the use of a statistical tool dedicated to the exploration of data samples populated by several sources of events. This new technique, called sPlot, is able to unfold the contributions of the different sources to the…
Scientific software is one of the key elements for reproducible research. However, classic publications and related scientific software are typically not (sufficiently) linked, and tools to jointly explore these artefacts are lacking. In this…
One major challenge in science is to make all results potentially reproducible. Thus, along with the raw data, every step from basic processing of the data, evaluation, to the generation of the figures, has to be documented as clearly as…