Digital Libraries
I present a summary of recent use of the Open Archives Initiative (OAI) registration and validation services for data-providers. The registration service has seen a steady stream of registrations since its launch in 2002, and there are now…
Pre-print repositories have seen a significant increase in use over the past fifteen years across multiple research domains. Researchers are beginning to develop applications capable of using these repositories to assist the scientific…
We describe mod_oai, an Apache 2.0 module that implements the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). OAIPMH is the de facto standard for metadata exchange in digital libraries and allows repositories to expose…
This paper introduces the write-once/read-many XMLtape/ARC storage approach for Digital Objects and their constituent datastreams. The approach combines two interconnected file-based storage mechanisms that are made accessible in a…
We generated networks of journal relationships from citation and download data, and determined journal impact rankings from these networks using a set of social network centrality metrics. The resulting journal impact rankings were compared…
This paper describes the aDORe repository architecture, designed and implemented for ingesting, storing, and accessing a vast collection of Digital Objects at the Research Library of the Los Alamos National Laboratory. The aDORe…
Harvested metadata often suffers from uneven quality to the point that utility is compromised. Although some aggregators have developed methods for evaluating and repairing specific metadata problems, it has been unclear how these methods…
We describe the underlying data model and implementation of a new architecture for the National Science Digital Library (NSDL) by the Core Integration Team (CI). The architecture is based on the notion of an information network overlay.…
SPIRES is the largest database of scientific papers in the subject field of high energy and nuclear physics. It contains information on the citation graph of more than half a million of papers (vertexes of the citation graph). We outline…
The Fedora architecture is an extensible framework for the storage, management, and dissemination of complex objects and the relationships among them. Fedora accommodates the aggregation of local and distributed content into digital objects…
Categorical data clustering (CDC) and link clustering (LC) have been considered as separate research and application areas. The main focus of this paper is to investigate the commonalities between these two problems and the uses of these…
In this paper we present Eurydice, a platform dedicated to provide a unified gateway to documents. Its basic functionalities about collecting documents have been designed based on a long experience about the management of scientific…
How can an author store digital information so that it will be reliably useful, even years later when he is no longer available to answer questions? Methods that might work are not good enough; what is preserved today should be reliably…
The immense investments in creating and disseminating digitally represented information have not been accompanied by commensurate effort to ensure the longevity of information of permanent interest. Asserted difficulties with long-term…
The design of the defenses Internet systems can deploy against attack, especially adaptive and resilient defenses, must start from a realistic model of the threat. This requires an assessment of the capabilities of the adversary. The design…
The LOCKSS digital preservation system collects content by crawling the web and preserves it in the format supplied by the publisher. Eventually, browsers will no longer understand that format. A process called format migration converts it…
Current approaches to the annotation process focus on annotation schemas, languages for annotation, or are very application driven. In this paper it is proposed that a more flexible architecture for annotation requires a knowledge component…
We discuss long-term preservation of and access to relational databases. The focus is on national archives and science data archives which have to ingest and integrate data from a broad spectrum of vendor-specific relational database…
We present an approach to automatically generate interfaces supporting personalized interaction with digital libraries; these interfaces augment the user-DL dialog by empowering the user to (optionally) supply out-of-turn information during…
We discuss a methodology to dynamically generate links among digital objects by means of an unsupervised learning mechanism which analyzes user link traversal patterns. We performed an experiment with a test bed of 150 complex data objects,…