Related papers: PDF/A standard for long term archiving
Initially developed and considered for providing authentication and integrity functions, digital signatures are studied nowadays in relation to electronic documents (edocs) so that they can be considered equivalent to handwritten signatures…
Including LaTeX source of mathematical expressions, within the PDF document of a text-book or research paper, has definite benefits regarding `Accessibility' considerations. Here we describe three ways in which this can be done, fully…
In recent years, as electronic files include personal records and business activities, these files can be used as important evidences in a digital forensic investigation process. In general, the data that can be verified using its own…
A file system standard for use with write-once media such as digital compact disks is proposed. The file system is designed to work with any operating system and a variety of physical media. Although the implementation is simple, it…
The field of digital preservation is being defined by a set of standards developed top-down, starting with an abstract reference model (OAIS) and gradually adding more specific detail. Systems claiming conformance to these standards are…
In the recent years, Portable Document Format, commonly known as PDF, has become a democratized standard for document exchange and dissemination. This trend has been due to its characteristics such as its flexibility and portability across…
In order to allow different software applications, in constant evolution, to interact and exchange data, flexible file formats are needed. A file format specification for different types of content has been elaborated to allow communication…
Portable Document Format (PDF) is a file format which is used worldwide as de-facto standard for exchanging documents. In fact this document that you are currently reading has been uploaded as a PDF. Confidential information is also…
Information on different fields which are collected by users requires appropriate management and organization to be structured in a standard way and retrieved fast and more easily. Document classification is a conventional method to…
PDFs are the second-most used document type on the internet (after HTML). Yet, existing QA datasets commonly start from text sources or only address specific domains. In this paper, we present pdfQA, a multi-domain 2K human-annotated…
Good software documentation encourages good software engineering, but the meaning of "good" documentation is vaguely defined in the software engineering literature. To clarify this ambiguity, we draw on work from the data and information…
Identifying how a file has been created is often interesting in security. It can be used by both attackers and defenders. Attackers can exploit this information to tune their attacks and defenders can understand how a malicious file has…
How can an author store digital information so that it will be reliably useful, even years later when he is no longer available to answer questions? Methods that might work are not good enough; what is preserved today should be reliably…
Under ideal conditions, the probability density function (PDF) of a random variable, such as a sensor measurement, would be well known and amenable to computation and communication tasks. However, this is often not the case, so the user…
Preserving access to file content requires preserving not just bits but also meaningful logical structures. The ongoing development of the Data Format Description Language (DFDL) is a completely general standard that addresses this need.…
Tampering or forgery of digital documents has become widespread, most commonly through altering images without any malicious intent such as enhancing the overall appearance of the image. However, there are occasions when tampering of…
This article aims at reengineering of PDF-based complex documents, where specifications of the Object Management Group (OMG) are our initial targets. Our motivation is that such specifications are dense and intricate to use, and tend to…
A standard file format is proposed to store process and event information, primarily output from parton-level event generators for further use by general-purpose ones. The information content is identical with what was already defined by…
The Windsor Study Group on Digital Archiving was commissioned to recommend strategies, policies, and technologies necessary for ensuring the integrity and longevity of electronic publications. The goal of this work is to inform institutions…
Document-based Visual Question Answering examines the document understanding of document images in conditions of natural language questions. We proposed a new document-based VQA dataset, PDF-VQA, to comprehensively examine the document…