Related papers: A Simple Mechanism for Focused Web-harvesting

Penerapan teknik web scraping pada mesin pencari artikel ilmiah (Application of Web Scraping Techniques in a Scientific Article Search Engine)

Search engines are a combination of hardware and computer software provided by a particular company through a designated website. Search engines collect information from the web through bots, or web crawlers, that crawl the…

Information Retrieval · Computer Science 2014-10-22 Ahmad Josi, Leon Andretti Abdillah, Suryayusra
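A crawler of the kind this abstract describes has to pull outgoing links out of every page it fetches before it can visit them. The following is a minimal sketch using only Python's standard library (`html.parser.HTMLParser` and `urllib.parse.urljoin`); the class name `LinkExtractor`, the function `extract_links`, and the sample HTML are hypothetical illustrations, not the scraping method of Josi et al.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags (hypothetical helper)."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page being parsed.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> list[str]:
    """Return every hyperlink in `html`, made absolute relative to `base_url`."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = '<p><a href="/paper/1">One</a> <a href="https://example.org/x">X</a></p>'
    print(extract_links(page, "https://example.com/search"))
    # → ['https://example.com/paper/1', 'https://example.org/x']
```

Fetching pages (for example with `urllib.request.urlopen`), honoring `robots.txt`, and deduplicating URLs are left out to keep the sketch offline and self-contained.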

Effective Focused Crawling Based on Content and Link Structure Analysis

A focused crawler traverses the web, selecting pages relevant to a predefined topic and ignoring those outside its scope. While surfing the Internet, it is difficult to deal with irrelevant pages and to predict which links lead to quality…

Information Retrieval · Computer Science 2009-06-30 Anshika Pal, Deepak Singh Tomar, S. C. Shrivastava
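The core loop of a focused crawler can be sketched as a best-first search: keep the frontier in a priority queue ordered by topic relevance and stop expanding pages that fall below a threshold. This is a generic illustration, assuming a crude keyword-overlap score and hypothetical in-memory `pages`/`links` dictionaries in place of real HTTP fetches; it is not the content and link structure analysis proposed by Pal et al.

```python
import heapq


def relevance(text: str, topic_terms: set[str]) -> float:
    """Fraction of topic terms that occur in the text (a crude, assumed score)."""
    words = set(text.lower().split())
    return len(words & topic_terms) / len(topic_terms)


def focused_crawl(start, pages, links, topic_terms, threshold=0.5, max_pages=10):
    """Best-first crawl that always expands the most relevant frontier page.

    pages: url -> page text; links: url -> list of outgoing urls.
    Both are hypothetical stand-ins for downloading pages from the web.
    """
    # heapq is a min-heap, so scores are negated to pop the best page first.
    frontier = [(-relevance(pages[start], topic_terms), start)]
    seen = {start}
    relevant = []
    while frontier and len(relevant) < max_pages:
        neg_score, url = heapq.heappop(frontier)
        if -neg_score < threshold:
            continue  # irrelevant page: neither keep it nor follow its links
        relevant.append(url)
        for nxt in links.get(url, []):
            if nxt not in seen and nxt in pages:
                seen.add(nxt)
                heapq.heappush(frontier, (-relevance(pages[nxt], topic_terms), nxt))
    return relevant


if __name__ == "__main__":
    pages = {
        "a": "focused crawler web topic",
        "b": "crawler topic relevance",
        "c": "cooking recipes pasta",
        "d": "web crawler focused topic search",
    }
    links = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
    topic = {"focused", "crawler", "topic", "web"}
    print(focused_crawl("a", pages, links, topic))  # → ['a', 'b', 'd']
```

One simplification is worth noting: the sketch scores a page from its own text, which a real crawler only knows after downloading it. Practical focused crawlers instead estimate a link's priority before fetching, from cues such as anchor text or the relevance of the parent page.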

An Innovative Approach for online Meta Search Engine Optimization

This paper presents an approach to identify efficient techniques used in Web Search Engine Optimization (SEO). Understanding the SEO factors that can influence page ranking in search engines is important for webmasters who wish to attract…

Information Retrieval · Computer Science 2015-09-29 Jai Manral, Mohammed Alamgir Hossain

Webpage Segmentation for Extracting Images and Their Surrounding Contextual Information

Web images come hand in hand with valuable contextual information. Although this information has long been mined for various uses such as image annotation, clustering of images, inference of image semantic content, etc., insufficient attention…

Multimedia · Computer Science 2020-05-21 F. Fauzi, H. J. Long, M. Belkhatir

Query sensitive comparative summarization of search results using concept based segmentation

Query-sensitive summarization aims at providing users with a summary of the contents of single or multiple web pages based on the search query. This paper proposes a novel idea of generating a comparative summary from a set of URLs…

Information Retrieval · Computer Science 2012-01-12 P. Chitra, R. Baskaran, K. Sarukesi

Web-page Indexing based on the Prioritize Ontology Terms

Globalization has become a fundamental and widespread human trend. To share information globally, people publish documents on the Internet; as a result, the volume of information on the Internet has become huge. To handle that…

Information Retrieval · Computer Science 2013-11-26 Sukanta Sinha, Rana Dattagupta, Debajyoti Mukhopadhyay

Bundle Fragments into a Whole: Mining More Complete Clusters via Submodular Selection of Interesting webpages for Web Topic Detection

Organizing interesting webpages into hot topics is one of the key steps to understanding the trends of multimodal web data. A state-of-the-art solution first organizes webpages into a large volume of multi-granularity topic candidates; hot…

Information Retrieval · Computer Science 2024-09-20 Junbiao Pang, Anjing Hu, Qingming Huang

Harvest -- An Open Source Toolkit for Extracting Posts and Post Metadata from Web Forums

Automatic extraction of forum posts and metadata is a crucial but challenging task since forums do not expose their content in a standardized structure. Content extraction methods, therefore, often need customizations such as adaptations to…

Information Retrieval · Computer Science 2021-08-05 Albert Weichselbraun, Adrian M. P. Brasoveanu, Roger Waldvogel, Fabian Odoni

Optimal Threshold Control by the Robots of Web Search Engines with Obsolescence of Documents

A typical web search engine consists of three principal parts: crawling engine, indexing engine, and searching engine. The present work aims to optimize the performance of the crawling engine. The crawling engine finds new web pages and…

Networking and Internet Architecture · Computer Science 2012-01-20 Konstantin Avrachenkov, Alexander Dudin, Valentina Klimenok, Philippe Nain, Olga Semenova

Web Analytics for Security Informatics

An enormous volume of security-relevant information is present on the Web, for instance in the content produced each day by millions of bloggers worldwide, but discovering and making sense of these data is very challenging. This paper…

Social and Information Networks · Computer Science 2013-01-01 Kristin Glass, Richard Colbaugh

A Novel Approach for Web Page Set Mining

One of the most time-consuming steps in association rule mining is computing how frequently itemsets occur in the database. The hash table index approach converts a transaction database to a hash index tree by…

Databases · Computer Science 2011-11-14 R. B. Geeta, Omkar Mamillapalli, Shasikumar G. Totad, Prasad Reddy P. V. G. D
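The frequency-counting step this abstract identifies as the bottleneck can be illustrated with a plain hash table: enumerate each transaction's k-itemsets once and increment a counter keyed by the itemset. This is a generic single-pass support count using `collections.Counter`, not the hash index tree of Geeta et al.; the function names and the sample browsing sessions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(transactions, k):
    """Count how many transactions contain each k-itemset.

    Itemsets are stored as sorted tuples, so the Counter (a hash table)
    maps each itemset to its support count in one pass over the data.
    """
    counts = Counter()
    for t in transactions:
        # set() removes repeated items; sorting gives each itemset one canonical key.
        for itemset in combinations(sorted(set(t)), k):
            counts[itemset] += 1
    return counts


def frequent_itemsets(transactions, k, min_support):
    """Keep only the k-itemsets whose support reaches min_support."""
    counts = count_itemsets(transactions, k)
    return {s: c for s, c in counts.items() if c >= min_support}


if __name__ == "__main__":
    # Hypothetical sets of web pages visited in four sessions.
    sessions = [
        ["home", "papers", "search"],
        ["home", "search"],
        ["papers", "search"],
        ["home", "papers", "search"],
    ]
    print(frequent_itemsets(sessions, 2, min_support=3))
    # → {('home', 'search'): 3, ('papers', 'search'): 3}
```

Enumerating all k-combinations per transaction grows quickly with transaction length, which is exactly why index structures such as the hash tree mentioned above are used to prune the candidates that need counting.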

A new approach for scientific data dissemination in developing countries: a case of Indonesia

This short paper is intended as an additional progress report to share our experiences in Indonesia on collecting, integrating and disseminating both global and local scientific data across the country through the web technology. Our recent…

Computers and Society · Computer Science 2009-03-05 L. T. Handoko

Analysis and Evaluation of the Link and Content Based Focused Treasure-Crawler

Indexing the Web is becoming a laborious task for search engines as the Web exponentially grows in size and distribution. Presently, the most effective known approach to overcome this problem is the use of focused crawlers. A focused…

Information Retrieval · Computer Science 2015-10-02 Ali Seyfi

To Click or not to Click? The Role of Contextualized and User-Centric Web Snippets

When searching the web, it is often possible that there are too many results available for ambiguous queries. Text snippets, extracted from the retrieved pages, are an indicator of the pages' usefulness to the query intention and can be…

Information Retrieval · Computer Science 2009-03-24 N. Zotos, P. Tzekou, G. Tsatsaronis, L. Kozanidis, S. Stamou, I. Varlamis

CoRank: A clustering cum graph ranking approach for extractive summarization

Online information has increased tremendously in today's age of the Internet. As a result, the need has arisen to extract relevant content from the plethora of available information. Researchers are widely using automatic text summarization…

Social and Information Networks · Computer Science 2021-06-02 Mohd Khizir Siddiqui, Amreen Ahmad, Om Pal, Tanvir Ahmad

A Focused Crawler Combinatory Link and Content Model Based on T-Graph Principles

The two significant tasks of a focused Web crawler are finding relevant topic-specific documents on the Web and analytically prioritizing them for later effective and reliable download. For the first task, we propose a sophisticated custom…

Information Retrieval · Computer Science 2015-10-02 Ali Seyfi

Intelligent Search Optimization using Artificial Fuzzy Logics

Information on the web is prodigious; searching for relevant information is difficult, which makes web users rely on search engines to find relevant information on the web. Search engines index and categorize web pages according to their…

Artificial Intelligence · Computer Science 2015-10-06 Jai Manral

Indexing Data on the Web: A Comparison of Schema-level Indices for Data Search -- Extended Technical Report

Indexing the Web of Data offers many opportunities, in particular, to find and explore data sources. One major design decision when indexing the Web of Data is to find a suitable index model, i.e., how to index and summarize data. Various…

Databases · Computer Science 2020-06-15 Till Blume, Ansgar Scherp

Optimal Algorithms for Crawling a Hidden Database in the Web

A hidden database refers to a dataset that an organization makes accessible on the web by allowing users to issue queries through a search interface. In other words, data acquisition from such a source is not by following static…

Databases · Computer Science 2012-08-02 Cheng Sheng, Nan Zhang, Yufei Tao, Xin Jin

Efficient PageRank Computation via Distributed Algorithms with Web Clustering

PageRank is a well-known centrality measure for the web used in search engines, representing the importance of each web page. In this paper, we follow the line of recent research on the development of distributed algorithms for computation…

Systems and Control · Electrical Eng. & Systems 2019-07-24 Atsushi Suzuki, Hideaki Ishii
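For reference, the quantity that the distributed algorithms in this abstract compute is the standard PageRank vector, which a single machine can obtain by power iteration. The sketch below is the textbook centralized method with an assumed damping factor of 0.85 and uniform redistribution of rank from dangling pages; it is not the clustered, distributed scheme of Suzuki and Ishii, and the `links` graph is a hypothetical example.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank on a small link graph.

    links maps each page to the list of pages it links to. Pages without
    outgoing links (dangling nodes) spread their rank uniformly over all pages.
    """
    nodes = set(links) | {v for outs in links.values() for v in outs}
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        # Teleportation term plus the uniformly redistributed dangling mass.
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u, outs in links.items():
            if outs:
                share = damping * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(new[u] - rank[u]) for u in nodes) < tol:
            return new
        rank = new
    return rank


if __name__ == "__main__":
    links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    ranks = pagerank(links)
    for page in sorted(ranks):
        print(page, round(ranks[page], 4))
```

Each iteration touches every link once, so the cost per step is proportional to the number of edges; distributing that work across machines, as the paper investigates, matters when the graph is web-scale.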