Related papers: Routing Memento Requests Using Binary Classifiers

200 papers

Memento aggregators enable users to query multiple web archives for captures of a URI in time through a single HTTP endpoint. While this one-to-many access point is useful for researchers and end-users, aggregators are in a position to…

Digital Libraries · Computer Science 2023-01-10 Mat Kelly

Services and applications based on the Memento Aggregator can suffer from slow response times due to the federated search across web archives performed by the Memento infrastructure. In an effort to decrease the response times, we…

Information Retrieval · Computer Science 2019-06-04 Martin Klein, Lyudmila Balakireva, Harihar Shankar

The Memento aggregator currently polls every known public web archive when serving a request for an archived web page, even though some web archives focus on only specific domains and ignore the others. Similar to query routing in…

Digital Libraries · Computer Science 2013-09-17 Ahmed AlSum, Michele C. Weigle, Michael L. Nelson, Herbert Van de Sompel

Web archiving frameworks are commonly assessed by the quality of their archival records and by their ability to operate at scale. The ubiquity of dynamic web content poses a significant challenge for crawler-based solutions such as the…

Digital Libraries · Computer Science 2019-09-11 Martin Klein, Harihar Shankar, Lyudmila Balakireva, Herbert Van de Sompel

In this work we propose MementoMap, a flexible and adaptive framework to efficiently summarize holdings of a web archive. We described a simple, yet extensible, file format suitable for MementoMap. We used the complete index of the…

Digital Libraries · Computer Science 2019-05-30 Sawood Alam, Michele C. Weigle, Michael L. Nelson, Fernando Melo, Daniel Bicho, Daniel Gomes

When a user requests a web page from a web archive, the user will typically either get an HTTP 200 if the page is available, or an HTTP 404 if the web page has not been archived. This is because web archives are typically accessed by URI…

Digital Libraries · Computer Science 2019-08-09 Lulwah M. Alkwai, Michael L. Nelson, Michele C. Weigle

The Web is ephemeral. Many resources have representations that change over time, and many of those representations are lost forever. A lucky few manage to reappear as archived resources that carry their own URIs. For example, some content…

Personal and private Web archives are proliferating due to the increase in the tools to create them and the realization that Internet Archive and other public Web archives are unable to capture personalized (e.g., Facebook) and private…

Digital Libraries · Computer Science 2018-06-05 Mat Kelly, Michael L. Nelson, Michele C. Weigle

We document the creation of a data set of 16,627 archived web pages, or mementos, of 3,698 unique live web URIs (Uniform Resource Identifiers) from 17 public web archives. We used four different methods to collect the dataset. First, we…

Digital Libraries · Computer Science 2019-05-13 Mohamed Aturban, Michael L. Nelson, Michele C. Weigle, Martin Klein, Herbert Van de Sompel

Most text retrievers generate one query vector to retrieve relevant documents. Yet, the conditional distribution of relevant documents for the query may be multimodal, e.g., representing different interpretations of the query. We…

Computation and Language · Computer Science 2025-11-05 Hung-Ting Chen, Xiang Liu, Shauli Ravfogel, Eunsol Choi

A content-addressable-memory compares an input search word against all rows of stored words in an array in a highly parallel manner. While supplying a very powerful functionality for many applications in pattern matching and search, it…

Emerging Technologies · Computer Science 2020-04-08 Can Li, Catherine E. Graves, Xia Sheng, Darrin Miller, Martin Foltin, Giacomo Pedretti, John Paul Strachan

In this paper, we present a generic, query-efficient black-box attack against API call-based machine learning malware classifiers. We generate adversarial examples by modifying the malware's API call sequences and non-sequential features…

Cryptography and Security · Computer Science 2020-10-06 Ishai Rosenberg, Asaf Shabtai, Yuval Elovici, Lior Rokach

Event collections are frequently built by crawling the live web on the basis of seed URIs nominated by human experts. Focused web crawling is a technique where the crawler is guided by reference content pertaining to the event. Given the…

Digital Libraries · Computer Science 2018-04-06 Martin Klein, Lyudmila Balakireva, Herbert Van de Sompel

Modeling of long history data suffers from long-context window attention dilution, system efficiency, and catastrophic forgetting problems, where a naive linear scaling approach like LastN would fail. We introduce Memento, a personalized…

In this paper we present the results of a study into the persistence and availability of web resources referenced from papers in scholarly repositories. Two repositories with different characteristics, arXiv and the UNT digital library, are…

Digital Libraries · Computer Science 2011-05-18 Robert Sanderson, Mark Phillips, Herbert Van de Sompel

Current state-of-the-art document retrieval solutions mainly follow an index-retrieve paradigm, where the index is hard to optimize directly for the final retrieval target. In this paper, we aim to show that an end-to-end deep neural…

Prior work on web archive profiling focused on Archival Holdings to describe what is present in an archive. This work defines and explores Archival Voids to establish a means to represent portions of URI spaces that are not present in…

Digital Libraries · Computer Science 2021-08-10 Sawood Alam, Michele C. Weigle, Michael L. Nelson

Embedding-based dense retrieval has become the cornerstone of many critical applications, where approximate nearest neighbor search (ANNS) queries are often combined with filters on labels such as dates and price ranges. Graph-based indexes…

Databases · Computer Science 2026-01-13 Yicheng Jin, Yongji Wu, Wenjun Hu, Bruce M. Maggs, Jun Yang, Xiao Zhang, Danyang Zhuo

Generative retrieval represents a novel approach to information retrieval. It uses an encoder-decoder architecture to directly produce relevant document identifiers (docids) for queries. While this method offers benefits, current approaches…

Information Retrieval · Computer Science 2024-09-30 Yubao Tang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Wei Chen, Xueqi Cheng

We study an indexing architecture to store and search in a database of high-dimensional vectors from the perspective of statistical signal processing and decision theory. This architecture is composed of several memory units, each of which…

Computer Vision and Pattern Recognition · Computer Science 2017-03-03 Ahmet Iscen, Teddy Furon, Vincent Gripon, Michael Rabbat, Hervé Jégou