Related papers: Ten weeks in the life of an eDonkey server
Collecting information about user activity in peer-to-peer systems is a key but challenging task. We describe here a distributed platform for doing so on the eDonkey network, relying on a group of honeypot peers which claim to have certain…
Dockerfiles are one of the most prevalent kinds of DevOps artifacts used in industry. Despite their prevalence, there is a lack of sophisticated semantics-aware static analysis of Dockerfiles. In this paper, we introduce a dataset of…
This paper purpose is to investigate exponential behavior conditions for the infinite servers queue with Poisson arrivals busy period length distribution. It is presented a general theoretical result that is the basis of this work. The…
Computer users spend time every day interacting with digital files and folders, including downloading, moving, naming, navigating to, searching for, sharing, and deleting them. Such file management has been the focus of many studies across…
The SkyServer is an Internet portal to the Sloan Digital Sky Survey Catalog Archive Server. From 2001 to 2006, there were a million visitors in 3 million sessions generating 170 million Web hits, 16 million ad-hoc SQL queries, and 62…
Financial markets are extremely data-driven and regulated. Participants rely on notifications about significant events and background information that meet their requirements regarding timeliness, accuracy, and completeness. As one of…
As the Distributed Collection Manager's work on building tools to support users maintaining collections of changing web-based resources has progressed, questions about the characteristics of people's collections of web pages have arisen.…
The Archive Query Log (AQL) is a previously unused, comprehensive query log collected at the Internet Archive over the last 25 years. Its first version includes 356 million queries, 166 million search result pages, and 1.7 billion search…
We have analyzed the fully-anonymized headers of 362 million messages exchanged by 4.2 million users of Facebook, an online social network of college students, during a 26 month interval. The data reveal a number of strong daily and weekly…
Increasing knowledge of paedophile activity in P2P systems is a crucial societal concern, with important consequences on child protection, policy making, and internet regulation. Because of a lack of traces of P2P exchanges and rigorous…
Email is a ubiquitous communications tool in the workplace and plays an important role in social interactions. Previous studies of email were largely based on surveys and limited to relatively small populations of email users within…
Human conversations are complicated and building a human-like dialogue agent is an extremely challenging task. With the rapid development of deep learning techniques, data-driven models become more and more prevalent which need a huge…
WhatsApp is a popular messaging app used by over a billion users around the globe. Due to this popularity, understanding misbehavior on WhatsApp is an important issue. The sending of unwanted junk messages by unknown contacts via WhatsApp…
Discord has evolved from a gaming-focused communication tool into a versatile platform supporting diverse online communities. Despite its large user base and active public servers, academic research on Discord remains limited due to data…
Despite the growing popularity of video streaming over the Internet, problems such as re-buffering and high startup latency continue to plague users. In this paper, we present an end-to-end characterization of Yahoo's video streaming…
Enterprise networks are one of the major targets for cyber attacks due to the vast amount of sensitive and valuable data they contain. A common approach to detecting attacks in the enterprise environment relies on modeling the behavior of…
An earlier paper (Szalay et. al. "Designing and Mining MultiTerabyte Astronomy Archives: The Sloan Digital Sky Survey," ACM SIGMOD 2000) described the Sloan Digital Sky Survey's (SDSS) data management needs by defining twenty database…
Communication surrounding the development of an open source project largely occurs outside the software repository itself. Historically, large communities often used a collection of mailing lists to discuss the different aspects of their…
Cloud-based documents are inherently valuable, due to the volume and nature of sensitive personal and business content stored in them. Despite the importance of such documents to Internet users, there are still large gaps in the…
Spam messages muddle up users inbox, consume network resources, and build up DDoS attacks, spread malware. Our goal is to present a definite figure about the characteristics of spam and spam vulnerable email accounts. These evaluations help…