Related papers: A method for deriving information from running R c…
Extracting information from r\'esum\'es is typically formulated as a two-stage problem, where the document is first segmented into sections and then each section is processed individually to extract the target entities. Instead, we cast the…
We present the first technique to synthesize programs that compose side-effecting functions, pure functions, and control flow, from partial traces containing records of only the side-effecting functions. This technique can be applied to…
To efficiently execute dynamically typed languages, many language implementations have adopted a two-tier architecture. The first tier aims for low-latency startup times and collects dynamic profiles, such as the dynamic types of variables.…
Reproducibility is increasingly important to statistical research, but many details are often omitted from the published version of complex statistical analyses. A reader's comprehension is limited to what the author concludes, without…
This technical memo describes Information Extraction from the point-of-view of a potential user of the technology. No knowledge of language processing is assumed. Information Extraction is a process which takes unseen texts as input and…
Research is an incremental, iterative process, with new results relying and building upon previous ones. Scientists need to find, retrieve, understand, and verify results in order to confidently extend them, even when the results are their…
CONTEXT The R programming language has a huge and active community, especially in the area of statistical computing. Its interpreted nature allows for several interesting constructs, like the manipulation of functions at run-time, that…
We show how to use aspect-oriented programming to separate security and trust issues from the logical design of mobile, distributed systems. The main challenge is how to enforce various types of security policies, in particular predictive…
Implicit feedback is collecting information about software usage to understand how and when the software is used. This research tackles implicit feedback in Software Product Lines (SPLs). The need for platform-centric feedback makes SPL…
Referring image segmentation (RIS) aims to find a segmentation mask given a referring expression grounded to a region of the input image. Collecting labelled datasets for this task, however, is notoriously costly and labor-intensive. To…
I describe an approach to compiling common idioms in R code directly to native machine code and illustrate it with several examples. Not only can this yield significant performance gains, but it allows us to use new approaches to computing…
The presence of undesired background areas associated with potential noise and unknown spectral characteristics degrades the performance of hyperspectral data processing. Masking out unwanted regions is key to addressing this issue.…
Software is increasingly produced in the form of ecosystems, collections of interdependent components maintained by a distributed community. These ecosystems act as network organizations, not markets, and thus often lack actionable…
Hashing produces compact representations for documents, to perform tasks like classification or retrieval based on these short codes. When hashing is supervised, the codes are trained using labels on the training data. This paper first…
Session-based recommender systems typically focus on using only the triplet (user_id, timestamp, item_id) to make predictions of users' next actions. In this paper, we aim to utilize side information to help recommender systems catch…
In this paper, we present a tertiary systematic literature review of previous surveys, secondary systematic literature reviews, and systematic mappings. We identify the main observations (what we know) and challenges (what we do not know)…
Profiling techniques are used extensively at different parts of the computing stack to achieve many goals. One major goal is to make a piece of software execute more efficiently on a specific hardware platform, where efficiency spans…
The canonical analytics architecture today consists of a browser connected to a backend in the cloud. In all deployments that we are aware of, the browser is simply a dumb rendering endpoint. As an alternative, this paper explores…
To derive a program for a given specification R means to find an artifact P that satisfies two conditions: P is executable in some programming language; and P is correct with respect to R. Refinement-based program derivation achieves this…
Field data is an invaluable source of information for testers and developers because it witnesses how software systems operate in real environments, capturing scenarios and configurations relevant to end-users. Unfortunately, collecting…