Related papers: Routing Memento Requests Using Binary Classifiers

Aggregator Reuse and Extension for Richer Web Archive Interaction

Memento aggregators enable users to query multiple web archives for captures of a URI in time through a single HTTP endpoint. While this one-to-many access point is useful for researchers and end-users, aggregators are in a position to…

Digital Libraries · Computer Science 2023-01-10 Mat Kelly

Evaluating Memento Service Optimizations

Services and applications based on the Memento Aggregator can suffer from slow response times due to the federated search across web archives performed by the Memento infrastructure. In an effort to decrease the response times, we…

Information Retrieval · Computer Science 2019-06-04 Martin Klein, Lyudmila Balakireva, Harihar Shankar

Profiling Web Archive Coverage for Top-Level Domain and Content Language

The Memento aggregator currently polls every known public web archive when serving a request for an archived web page, even though some web archives focus on only specific domains and ignore the others. Similar to query routing in…

Digital Libraries · Computer Science 2013-09-17 Ahmed AlSum, Michele C. Weigle, Michael L. Nelson, Herbert Van de Sompel

The Memento Tracer Framework: Balancing Quality and Scalability for Web Archiving

Web archiving frameworks are commonly assessed by the quality of their archival records and by their ability to operate at scale. The ubiquity of dynamic web content poses a significant challenge for crawler-based solutions such as the…

Digital Libraries · Computer Science 2019-09-11 Martin Klein, Harihar Shankar, Lyudmila Balakireva, Herbert Van de Sompel

MementoMap Framework for Flexible and Adaptive Web Archive Profiling

In this work we propose MementoMap, a flexible and adaptive framework to efficiently summarize holdings of a web archive. We described a simple, yet extensible, file format suitable for MementoMap. We used the complete index of the…

Digital Libraries · Computer Science 2019-05-30 Sawood Alam, Michele C. Weigle, Michael L. Nelson, Fernando Melo, Daniel Bicho, Daniel Gomes

Making Recommendations from Web Archives for "Lost" Web Pages

When a user requests a web page from a web archive, the user will typically either get an HTTP 200 if the page is available, or an HTTP 404 if the web page has not been archived. This is because web archives are typically accessed by URI…

Digital Libraries · Computer Science 2019-08-09 Lulwah M. Alkwai, Michael L. Nelson, Michele C. Weigle

Memento: Time Travel for the Web

The Web is ephemeral. Many resources have representations that change over time, and many of those representations are lost forever. A lucky few manage to reappear as archived resources that carry their own URIs. For example, some content…

Information Retrieval · Computer Science 2009-11-06 Herbert Van de Sompel, Michael L. Nelson, Robert Sanderson, Lyudmila L. Balakireva, Scott Ainsworth, Harihar Shankar

A Framework for Aggregating Private and Public Web Archives

Personal and private Web archives are proliferating due to the increase in the tools to create them and the realization that Internet Archive and other public Web archives are unable to capture personalized (e.g., Facebook) and private…

Digital Libraries · Computer Science 2018-06-05 Mat Kelly, Michael L. Nelson, Michele C. Weigle

Collecting 16K archived web pages from 17 public web archives

We document the creation of a data set of 16,627 archived web pages, or mementos, of 3,698 unique live web URIs (Uniform Resource Identifiers) from 17 public web archives. We used four different methods to collect the dataset. First, we…

Digital Libraries · Computer Science 2019-05-13 Mohamed Aturban, Michael L. Nelson, Michele C. Weigle, Martin Klein, Herbert Van de Sompel

Beyond Single Embeddings: Capturing Diverse Targets with Multi-Query Retrieval

Most text retrievers generate one query vector to retrieve relevant documents. Yet, the conditional distribution of relevant documents for the query may be multimodal, e.g., representing different interpretations of the query. We…

Computation and Language · Computer Science 2025-11-05 Hung-Ting Chen, Xiang Liu, Shauli Ravfogel, Eunsol Choi

Analog content addressable memories with memristors

A content-addressable-memory compares an input search word against all rows of stored words in an array in a highly parallel manner. While supplying a very powerful functionality for many applications in pattern matching and search, it…

Emerging Technologies · Computer Science 2020-04-08 Can Li, Catherine E. Graves, Xia Sheng, Darrin Miller, Martin Foltin, Giacomo Pedretti, John Paul Strachan

Query-Efficient Black-Box Attack Against Sequence-Based Malware Classifiers

In this paper, we present a generic, query-efficient black-box attack against API call-based machine learning malware classifiers. We generate adversarial examples by modifying the malware's API call sequences and non-sequential features…

Cryptography and Security · Computer Science 2020-10-06 Ishai Rosenberg, Asaf Shabtai, Yuval Elovici, Lior Rokach

Focused Crawl of Web Archives to Build Event Collections

Event collections are frequently built by crawling the live web on the basis of seed URIs nominated by human experts. Focused web crawling is a technique where the crawler is guided by reference content pertaining to the event. Given the…

Digital Libraries · Computer Science 2018-04-06 Martin Klein, Lyudmila Balakireva, Herbert Van de Sompel

Memento: Personalized RAG-Style Long-Retention Data Scaling for META Ads Recommendation

Modeling of long history data suffers from long-context window attention dilution, system efficiency and catastrophic forgetting problems, where naive linear scaling approach like LastN would fail. We introduce Memento, a personalized…

Information Retrieval · Computer Science 2026-05-26 Xiaoyu Chen, Ruichen Wang, Jieming Di, Suofei Feng, Nafis Abrar, Lilly Kumari, Tony Tsui, Yilin Liu, Yu Lu, Sowmya Patapati, Junwei Xiong, Qiao Yang, Dorothy Sun, Yang Cao, Victor Chen, Pan Chen, Ramsundar Sundarkumar, Shivendra Pratap Singh, Arnold Overwijk, Ling Leng, Dinesh Ramasamy, Sri Reddy, Robert Malkin, Sandeep Pandey

Analyzing the Persistence of Referenced Web Resources with Memento

In this paper we present the results of a study into the persistence and availability of web resources referenced from papers in scholarly repositories. Two repositories with different characteristics, arXiv and the UNT digital library, are…

Digital Libraries · Computer Science 2011-05-18 Robert Sanderson, Mark Phillips, Herbert Van de Sompel

A Neural Corpus Indexer for Document Retrieval

Current state-of-the-art document retrieval solutions mainly follow an index-retrieve paradigm, where the index is hard to be directly optimized for the final retrieval target. In this paper, we aim to show that an end-to-end deep neural…

Information Retrieval · Computer Science 2023-02-14 Yujing Wang, Yingyan Hou, Haonan Wang, Ziming Miao, Shibin Wu, Hao Sun, Qi Chen, Yuqing Xia, Chengmin Chi, Guoshuai Zhao, Zheng Liu, Xing Xie, Hao Allen Sun, Weiwei Deng, Qi Zhang, Mao Yang

Profiling Web Archival Voids for Memento Routing

Prior work on web archive profiling was focused on Archival Holdings to describe what is present in an archive. This work defines and explores Archival Voids to establish a means to represent portions of URI spaces that are not present in…

Digital Libraries · Computer Science 2021-08-10 Sawood Alam, Michele C. Weigle, Michael L. Nelson

Curator: Efficient Vector Search with Low-Selectivity Filters

Embedding-based dense retrieval has become the cornerstone of many critical applications, where approximate nearest neighbor search (ANNS) queries are often combined with filters on labels such as dates and price ranges. Graph-based indexes…

Databases · Computer Science 2026-01-13 Yicheng Jin, Yongji Wu, Wenjun Hu, Bruce M. Maggs, Jun Yang, Xiao Zhang, Danyang Zhuo

Generative Retrieval Meets Multi-Graded Relevance

Generative retrieval represents a novel approach to information retrieval. It uses an encoder-decoder architecture to directly produce relevant document identifiers (docids) for queries. While this method offers benefits, current approaches…

Information Retrieval · Computer Science 2024-09-30 Yubao Tang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Wei Chen, Xueqi Cheng

Memory vectors for similarity search in high-dimensional spaces

We study an indexing architecture to store and search in a database of high-dimensional vectors from the perspective of statistical signal processing and decision theory. This architecture is composed of several memory units, each of which…

Computer Vision and Pattern Recognition · Computer Science 2017-03-03 Ahmet Iscen, Teddy Furon, Vincent Gripon, Michael Rabbat, Hervé Jégou