Related papers: Object-oriented Neural Programming (OONP) for Docu…
Understanding large, structured documents like scholarly articles, requests for proposals or business reports is a complex and difficult task. It involves discovering a document's overall purpose and subject(s), understanding the function…
Object-Oriented Programming (OOP) has become a crucial paradigm for managing the growing complexity of modern software systems, particularly in fields like machine learning, deep learning, large language models (LLM), and data analytics.…
Generalization has been one of the major challenges for learning dynamics models in model-based reinforcement learning. However, previous work on action-conditioned dynamics prediction focuses on learning the pixel-level motion and thus…
Semantic parsing is the task of converting natural language utterances into machine interpretable meaning representations which can be executed against a real-world environment such as a database. Scaling semantic parsing to arbitrary…
Operational semantics has established itself as a flexible but rigorous means to describe the meaning of programming languages. Oftentimes, it is felt necessary to keep a semantics small, for example to facilitate its use for model checking…
Documents are central to many business systems, and include forms, reports, contracts, invoices or purchase orders. The information in documents is typically in natural language, but can be organized in various layouts and formats. There…
Semantic understanding of programs is a fundamental problem for programming language processing (PLP). Recent works that learn representations of code based on pre-training techniques in NLP have pushed the frontiers in this direction.…
We propose Universal Document Processing (UDOP), a foundation Document AI model which unifies text, image, and layout modalities together with varied task formats, including document understanding and generation. UDOP leverages the spatial…
Automatic (i.e., computer-assisted) theorem proving (ATP) can come in many flavors. This document presents early steps in our effort towards defining object-oriented theorem proving (OOTP) as a new style of ATP. Traditional theorem proving…
The paper introduces the principles of object-oriented translation for target machine which provides executing the sequences of elementary operations on persistent data presented as a set of relations (programmable relational system). The…
Scientific knowledge is predominantly stored in books and scientific journals, often in the form of PDFs. However, the PDF format leads to a loss of semantic information, particularly for mathematical expressions. We propose Nougat (Neural…
Object-oriented programming (OOP) is aimed at describing the structure and behaviour of objects by hiding the mechanism of their representation and access in primitive references. In this article we describe an approach, called…
The representation of workflows and processes is essential in materials science engineering, where experimental and computational reproducibility depend on structured and semantically coherent process models. Although numerous ontologies…
The evolution of programming languages from low-level assembly to high-level abstractions demonstrates a fundamental principle: by constraining how programmers express computation and enriching semantic information at the language level, we…
The heterogeneity of data poses a great challenge when data from different sources is to be merged for one application. Solutions for this are offered, for example, by ontology-based data management (OBDM). A challenge of OBDM is the…
Previous studies on program comprehension were carried out largely in the context of procedural languages. Our purpose is to develop and evaluate a cognitive model of object-oriented (OO) program understanding. Our model is based on the van…
In this paper, we introduce a novel interpreting framework that learns an interpretable model based on an ontology-based sampling technique to explain agnostic prediction models. Different from existing approaches, our algorithm considers…
We propose a framework grounded in Logic Programming for representing and reasoning about business processes from both the procedural and ontological point of views. In particular, our goal is threefold: (1) define a logical language and a…
Representing structured text from complex documents typically calls for different machine learning techniques, such as language models for paragraphs and convolutional neural networks (CNNs) for table extraction, which prohibits drawing…
Document parsing (DP) transforms unstructured or semi-structured documents into structured, machine-readable representations, enabling downstream applications such as knowledge base construction and retrieval-augmented generation (RAG).…