Related papers: Web Archive Analytics

200 papers

Common Crawl is a multi-petabyte longitudinal dataset containing over 100 billion web pages which is widely used as a source of language data for sequence model training and in web science research. Each of its constituent archives is on…

Networking and Internet Architecture · Computer Science 2024-04-16 Henry S. Thompson

Curated web archive collections contain focused digital content which is collected by archiving organizations, groups, and individuals to provide a representative sample covering specific topics and events to preserve them for future…

Digital Libraries · Computer Science 2017-02-03 Zeon Trevor Fernando, Ivana Marenzi, Wolfgang Nejdl

Curated web archive collections contain focused digital contents which are collected by archiving organizations to provide a representative sample covering specific topics and events to preserve them for future exploration and analysis. In…

Digital Libraries · Computer Science 2017-02-02 Zeon Trevor Fernando, Ivana Marenzi, Wolfgang Nejdl, Rishita Kalyani

Web archives are a valuable resource for researchers of various disciplines. However, to use them as a scholarly source, researchers require a tool that provides efficient access to Web archive data for extraction and derivation of smaller…

Digital Libraries · Computer Science 2017-02-06 Helge Holzmann, Vinay Goel, Avishek Anand

Web archives preserve unique and historically valuable information. They hold a record of past events and memories published by all kinds of people, such as journalists, politicians and ordinary people who have shared their testimony and…

Digital Libraries · Computer Science 2021-08-04 Miguel Costa, Julien Masanès

Web archives capture the history of the Web and are therefore an important source to study how societal developments have been reflected on the Web. However, the large size of Web archives and their temporal nature pose many challenges to…

Digital Libraries · Computer Science 2016-12-20 Gerhard Gossen, Elena Demidova, Thomas Risse

The field of web archiving provides a unique mix of human and automated agents collaborating to achieve the preservation of the web. Centuries old theories of archival appraisal are being transplanted into the sociotechnical environment of…

Digital Libraries · Computer Science 2016-11-09 Ed Summers, Ricardo Punzalan

Although the Internet Archive's Wayback Machine is the largest and most well-known web archive, there have been a number of public web archives that have emerged in the last several years. With varying resources, audiences and collection…

Digital Libraries · Computer Science 2013-01-08 Scott G. Ainsworth, Ahmed AlSum, Hany SalahEldeen, Michele C. Weigle, Michael L. Nelson

We present a framework for web-scale archiving of the dark web. While commonly associated with illicit and illegal activity, the dark web provides a way to privately access web information. This is a valuable and socially beneficial tool to…

Digital Libraries · Computer Science 2021-07-12 Justin F. Brunelle, Ryan Farley, Grant Atkins, Trevor Bostic, Marites Hendrix, Zak Zebrowski

The vastness of the web imposes a prohibitive cost on building large-scale search engines with limited resources. Crawl frontiers thus need to be optimized to improve the coverage and freshness of crawled content. In this paper, we propose…

The World Wide Web is the most widely known information source that is easily available and searchable. It consists of billions of interconnected documents. Web pages are authored by millions of people. Accesses made by various users to pages…

Databases · Computer Science 2014-08-26 Priyanka Verma, Nishtha Kesswani

An enormous volume of security-relevant information is present on the Web, for instance in the content produced each day by millions of bloggers worldwide, but discovering and making sense of these data is very challenging. This paper…

Social and Information Networks · Computer Science 2013-01-01 Kristin Glass, Richard Colbaugh

Web archiving services play an increasingly important role in today's information ecosystem, by ensuring the continuing availability of information, or by deliberately caching content that might get deleted or removed. Among these, the…

Computers and Society · Computer Science 2018-04-10 Savvas Zannettou, Jeremy Blackburn, Emiliano De Cristofaro, Michael Sirivianos, Gianluca Stringhini

Web archiving is the process of collecting portions of the Web to ensure that the information is preserved for future exploitation. However, despite the increasing number of web archives worldwide, the absence of efficient and meaningful…

Digital Libraries · Computer Science 2018-10-25 Pavlos Fafalios, Helge Holzmann, Vaibhav Kasturia, Wolfgang Nejdl

Archiving Web pages into themed collections is a method for ensuring these resources are available for posterity. Services such as Archive-It exist to allow institutions to develop, curate, and preserve collections of Web resources.…

Digital Libraries · Computer Science 2017-05-18 Yasmin AlNoamany, Michele C. Weigle, Michael L. Nelson

Archiving the web is socially and culturally critical, but presents problems of scale. The Internet Archive's Wayback Machine can replay captured web pages as they existed at a certain point in time, but it has limited ability to provide…

Information Retrieval · Computer Science 2013-06-12 Ahmed AlSum, Michael L. Nelson

Web traffic is a valuable data source, typically used in the marketing space to track brand awareness and advertising effectiveness. However, web traffic is also a rich source of information for cybersecurity monitoring efforts. To better…

Information Retrieval · Computer Science 2019-04-04 Han Qin, Kit Riehle, Haozhen Zhao

In recent years, journalists and other researchers have used web archives as an important resource for their study of disinformation. This paper provides several examples of this use and also brings together some of the work that the Old…

Digital Libraries · Computer Science 2023-06-19 Michele C. Weigle

Web archives, a key area of digital preservation, meet the needs of journalists, social scientists, historians, and government organizations. The use cases for these groups often require that they guide the archiving process themselves,…

Digital Libraries · Computer Science 2021-01-26 Shawn M. Jones, Alexander Nwala, Michele C. Weigle, Michael L. Nelson

In order to evaluate, compare, and tune graph algorithms, experiments on well designed benchmark sets have to be performed. Together with the goal of reproducibility of experimental results, this creates a demand for a public archive to…
