Digital Libraries
We describe a system used by the NASA Astrophysics Data System to identify bibliographic references obtained from scanned article pages by OCR methods with records in a bibliographic database. We analyze the process generating the noisy…
The proliferation of the Open Archive Initiative Protocol for Metadata Harvesting (OAI-PMH) has resulted in the creation of a large number of service providers, all harvesting from either data providers or aggregators. If data were…
This paper presents an entirely new search system that builds the information retrieval infrastructure for the Internet. Most search engine companies are now mainly concerned with how to profit from company users through advertisement and ranking…
We give a simple proof of the finite presentation of Sela's limit groups by using free actions on $\mathbb{R}^n$-trees. We first prove that Sela's limit groups do have a free action on an $\mathbb{R}^n$-tree. We then prove that a finitely generated…
The Open Archives Initiative (OAI) was created as a practical way to promote interoperability between eprint repositories. Although the scope of the OAI has been broadened, eprint repositories still represent a significant fraction of OAI…
The UQ Flint Archive houses the field notes and elicitation recordings made by Elwyn Flint in the 1950s and 1960s during extensive linguistic survey work across Queensland, Australia. The process of digitizing the contents of the UQ Flint…
Numerous systems for dissemination, retrieval, and archiving of documents have been developed in the past. Those systems often focus on one of these aspects and are hard to extend and combine. Typically, the transmission protocols, query…
This paper describes technological and methodological options for achieving interoperability in accessing electronic information resources available on the Internet, within the scope of the Brazilian Digital Library in Science and Technology Project -…
Through a collaborative effort, the Fermilab Information Resources Department and Computing Division have created a "virtual library" of technical publications that provides public access to electronic full-text documents. This paper will…
This article explores the design and construction of a geo-spatial Internet web service application from the host web site perspective and from the perspective of an application using the web service. The TerraService.NET web service was…
This paper discusses the requirements of current and emerging applications based on the Open Archives Initiative (OAI) and emphasizes the need for a common infrastructure to support them. Inspired by HTTP proxy, cache, gateway and web…
The SkyServer provides Internet access to the public Sloan Digital Sky Survey (SDSS) data for both astronomers and for science education. This paper describes the SkyServer goals and architecture. It also describes our experience…
We describe work leading toward specification of a technical architecture for the National Science, Mathematics, Engineering, and Technology Education Digital Library (NSDL). This includes a technical scope and a functional model, with some…
We describe the core components of the architecture for the National Science, Mathematics, Engineering, and Technology Education Digital Library (NSDL). Over time the NSDL will include heterogeneous users, content, and services. To…
With the increasing technical sophistication of both information consumers and providers, there is increasing demand for more meaningful experiences of digital information. We present a framework that separates digital object experience, or…
The SkyServer provides Internet access to the public Sloan Digital Sky Survey (SDSS) data for both astronomers and for science education. This paper describes the SkyServer goals and architecture. It also describes our experience operating…
I outline the involvement of the Los Alamos e-print archive (arXiv) within the Open Archives Initiative (OAI) and describe the implementation of the data provider side of the OAI protocol v1.0. I highlight the ways in which we map the…
Based on an empirical analysis of author usage of CoRR, and of its predecessor in the Los Alamos eprint archives, it is shown that CoRR has not yet been able to match the early growth of the Los Alamos physics archives. Some of the reasons…
Text mining is about looking for patterns in natural language text, and may be defined as the process of analyzing text to extract information from it for particular purposes. In previous work, we claimed that compression is a key…
This is a response to the commentaries on "CoRR: A Computing Research Repository".