Related papers: A novel multi-threaded web crawling model
With the ever-proliferating size and scale of the WWW [1], efficient ways of exploring its content are increasingly important. How can we efficiently retrieve information from it through crawling? And in this era of tera and multi-core…
In a technological landscape that is quickly moving toward dense multi-CPU and multi-core computer systems, where using multithreading is an increasingly popular application design decision, it is important to choose a proper model for…
Crawling parallel texts -- texts that are mutual translations -- from the Internet is usually done following a brute-force approach: documents are massively downloaded in an unguided process, and only a fraction of them end up leading to…
Multi-threaded programs are expected to improve responsiveness and conserve resources by dividing an application process into multiple threads for concurrent processing. However, due to scheduling and the interaction of multiple threads,…
Performance evaluation of caching systems is an old and widely investigated research topic. The research community is once again actively working on this topic because the Internet is evolving towards new transfer modes, which envisage to…
Modeling data sharing in GPU programs is a challenging task because of the massive parallelism and complex data sharing patterns provided by GPU architectures. Better GPU caching efficiency can be achieved through careful task scheduling…
This document is the final project report for our advanced operating systems class. During this project, we mainly focused on applying multiprocessing and multithreading throughout the whole project and on using the map-reduce algorithm…
The World Wide Web (WWW) has grown from an intangible quantity of web pages into a gigantic hub of web information, which gradually increases the complexity of the crawling process in a search engine. A search engine handles a…
Today, the World Wide Web (WWW) has become a huge ocean of information, and it grows in size every day. Downloading even a fraction of this mammoth volume of data is like sailing across a huge ocean, and it is a challenging task indeed. In order to…
The evolution of the Internet and of computer applications has generated a colossal amount of data. This data is referred to as Big Data and consists of high-volume, high-velocity, and highly variable datasets that need to be managed at the right…
In recent years, predicting a user's next request in web navigation has received much attention. One information source for this problem is the information left by previous web users in the web access log…
Clickstreams on individual websites have been studied for decades to gain insights into user interests and to improve website experiences. This paper proposes and examines a novel sequence modeling approach for web clickstreams that also…
In this paper, we study the scheduling of parallel and real-time recurrent tasks. First, we propose a new parallel task model that allows recurrent tasks to be composed of several threads, each of which requires a single processor for…
Indexing the Web of Data offers many opportunities, in particular, to find and explore data sources. One major design decision when indexing the Web of Data is to find a suitable index model, i.e., how to index and summarize data. Various…
As deep learning becomes more expensive, both in terms of time and compute, inefficiencies in machine learning (ML) training prevent practical usage of state-of-the-art models for most users. The newest model architectures are simply too…
In this paper, we propose a new approach to Big Data mining and analysis. The approach works well on distributed datasets and addresses the data clustering task of the analysis. It consists of two main phases, the first phase…
Graphs are ubiquitous data structures with a wide range of applications involving social networks, knowledge bases, and biological interactions. The evolution of a graph in such scenarios can yield important insights…
The area of online machine learning on big data streams covers algorithms that (1) are distributed and (2) work from data streams with only a limited ability to store past data. The first requirement mostly concerns software…
Streaming computation plays an important role in large-scale data analysis. The sliding window model is a model of streaming computation which also captures the recency of the data. In this model, data arrives one item at a time, but only…