Related papers: A novel multi-threaded web crawling model

WebParF: A Web partitioning framework for Parallel Crawlers

With the ever-proliferating size and scale of the WWW [1], efficient ways of exploring content are of increasing importance. How can we efficiently retrieve information from it through crawling? And in this era of tera and multi-core…

Information Retrieval · Computer Science 2014-06-24 Sonali Gupta, Komal Kumar Bhatia, Pikakshi Manchanda

Characteristics of multithreading models for high-performance IO driven network applications

In a technological landscape that is quickly moving toward dense multi-CPU and multi-core computer systems, where using multithreading is an increasingly popular application design decision, it is important to choose a proper model for…

Networking and Internet Architecture · Computer Science 2009-09-29 Ivan Voras, Mario Zagar

Smart Bilingual Focused Crawling of Parallel Documents

Crawling parallel texts -- texts that are mutual translations -- from the Internet is usually done following a brute-force approach: documents are massively downloaded in an unguided process, and only a fraction of them end up leading to…

Computation and Language · Computer Science 2026-04-22 Cristian García-Romero, Miquel Esplà-Gomis, Felipe Sánchez-Martínez

A Learning Support Method for Multi-threaded Programs Using Trace Tables

Multi-threaded programs are expected to improve responsiveness and conserve resources by dividing an application process into multiple threads for concurrent processing. However, due to scheduling and the interaction of multiple threads,…

Software Engineering · Computer Science 2024-09-26 Takumi Murata, Hiroaki Hashiura

A General, Tractable and Accurate Model for a Cascade of Caches

Performance evaluation of caching systems is an old and widely investigated research topic. The research community is once again actively working on this topic because the Internet is evolving towards new transfer modes, which envisage to…

Networking and Internet Architecture · Computer Science 2013-09-04 G. Bianchi, N. Blefari Melazzi, A. Caponi, A. Detti

A Graph-based Model for GPU Caching Problems

Modeling data sharing in GPU programs is a challenging task because of the massive parallelism and complex data sharing patterns provided by GPU architectures. Better GPU caching efficiency can be achieved through careful task scheduling…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-10-04 Lingda Li, Ari B. Hayes, Stephen A. Hackler, Eddy Z. Zhang, Mario Szegedy, Shuaiwen Leon Song

Map-Reduce for Multiprocessing Large Data and Multi-threading for Data Scraping

This document is the final project report for our advanced operating system class. During this project, we mainly focused on applying multiprocessing and multi-threading technology to our whole project and utilized the map-reduce algorithm…

Numerical Analysis · Mathematics 2023-12-27 Zefeng Qiu, Prashanth Umapathy, Qingquan Zhang, Guanqun Song, Ting Zhu

Analysis of a Statistical Hypothesis Based Learning Mechanism for Faster crawling

The growth of the World Wide Web (WWW) spreads its wings from an intangible quantity of web pages to a gigantic hub of web information, which gradually increases the complexity of the crawling process in a search engine. A search engine handles a…

Machine Learning · Computer Science 2012-08-15 Sudarshan Nandy, Partha Pratim Sarkar, Achintya Das

Architecture of A Scalable Dynamic Parallel WebCrawler with High Speed Downloadable Capability for a Web Search Engine

Today the World Wide Web (WWW) has become a huge ocean of information, and it is growing in size every day. Downloading even a fraction of this mammoth data is like sailing through a huge ocean, and it is a challenging task indeed. In order to…

Information Retrieval · Computer Science 2011-02-04 Debajyoti Mukhopadhyay, Sajal Mukherjee, Soumya Ghosh, Saheli Kar, Young-Chon Kim

Memory-Based Multi-Processing Method For Big Data Computation

The evolution of the Internet and computer applications has generated a colossal amount of data. This data is referred to as Big Data, and it consists of huge-volume, high-velocity, and variable datasets that need to be managed at the right…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-08-13 Youssef Bassil

A New Clustering Approach based on Page's Path Similarity for Navigation Patterns Mining

In recent years, predicting the user's next request in web navigation has received much attention. One information source for dealing with this problem is the information left by previous web users, stored in the web access log…

Machine Learning · Computer Science 2010-04-28 Heidar Mamosian, Amir Masoud Rahmani, Mashalla Abbasi Dezfouli

Modeling Web Browsing Behavior across Tabs and Websites with Tracking and Prediction on the Client Side

Clickstreams on individual websites have been studied for decades to gain insights into user interests and to improve website experiences. This paper proposes and examines a novel sequence modeling approach for web clickstreams that also…

Human-Computer Interaction · Computer Science 2021-03-09 Changkun Ou, Daniel Buschek, Malin Eiband, Andreas Butz

Analysis of Statistical Hypothesis based Learning Mechanism for Faster Crawling

The growth of the World Wide Web (WWW) spreads its wings from an intangible quantity of web pages to a gigantic hub of web information, which gradually increases the complexity of the crawling process in a search engine. A search engine handles a…

Information Retrieval · Computer Science 2012-08-14 Sudarshan Nandy, Partha Pratim Sarkar, Achintya Das

Scheduling of Hard Real-Time Multi-Thread Periodic Tasks

In this paper we study the scheduling of parallel and real-time recurrent tasks. Firstly, we propose a new parallel task model which allows recurrent tasks to be composed of several threads, with each thread requiring a single processor for…

Operating Systems · Computer Science 2015-03-19 Irina Iulia Lupu, Joël Goossens

Indexing Data on the Web: A Comparison of Schema-level Indices for Data Search -- Extended Technical Report

Indexing the Web of Data offers many opportunities, in particular, to find and explore data sources. One major design decision when indexing the Web of Data is to find a suitable index model, i.e., how to index and summarize data. Various…

Databases · Computer Science 2020-06-15 Till Blume, Ansgar Scherp

Model-Parallel Model Selection for Deep Learning Systems

As deep learning becomes more expensive, both in terms of time and compute, inefficiencies in machine learning (ML) training prevent practical usage of state-of-the-art models for most users. The newest model architectures are simply too…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-07-15 Kabir Nagrecha

Distributed Spatial Data Clustering as a New Approach for Big Data Analysis

In this paper we propose a new approach for Big Data mining and analysis. This new approach works well on distributed datasets and deals with the data clustering task of the analysis. The approach consists of two main phases, the first phase…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-03-05 Malika Bendechache, Nhien-An Le-Khac, M-Tahar Kechadi

Efficient Continuous Multi-Query Processing over Graph Streams

Graphs are ubiquitous and ever-present data structures that have a wide range of applications involving social networks, knowledge bases and biological interactions. The evolution of a graph in such scenarios can yield important insights…

Data Structures and Algorithms · Computer Science 2019-02-15 Lefteris Zervakis, Vinay Setty, Christos Tryfonopoulos, Katja Hose

Online Machine Learning in Big Data Streams

The area of online machine learning in big data streams covers algorithms that are (1) distributed and (2) work from data streams with only a limited possibility to store past data. The first requirement mostly concerns software…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-02-19 András A. Benczúr, Levente Kocsis, Róbert Pálovics

Improved Sliding Window Algorithms for Clustering and Coverage via Bucketing-Based Sketches

Streaming computation plays an important role in large-scale data analysis. The sliding window model is a model of streaming computation which also captures the recency of the data. In this model, data arrives one item at a time, but only…

Data Structures and Algorithms · Computer Science 2021-11-01 Alessandro Epasto, Mohammad Mahdian, Vahab Mirrokni, Peilin Zhong