Related papers: A Simple Mechanism for Focused Web-harvesting

200 papers

Search engines are a combination of hardware and computer software supplied by a particular company through a designated website. Search engines collect information from the web through bots or web crawlers that crawl the…

Information Retrieval · Computer Science 2014-10-22 Ahmad Josi, Leon Andretti Abdillah, Suryayusra

A focused crawler traverses the web, selecting pages relevant to a predefined topic and ignoring those outside its scope. While surfing the internet it is difficult to deal with irrelevant pages and to predict which links lead to quality…

Information Retrieval · Computer Science 2009-06-30 Anshika Pal, Deepak Singh Tomar, S. C. Shrivastava

This paper presents an approach to identify efficient techniques used in Web Search Engine Optimization (SEO). Understanding the SEO factors that can influence page ranking in search engines is significant for webmasters who wish to attract…

Information Retrieval · Computer Science 2015-09-29 Jai Manral, Mohammed Alamgir Hossain

Web images come hand in hand with valuable contextual information. Although this information has long been mined for various uses such as image annotation, clustering of images, inference of image semantic content, etc., insufficient attention…

Multimedia · Computer Science 2020-05-21 F. Fauzi, H. J. Long, M. Belkhatir

Query-sensitive summarization aims to provide users with a summary of the contents of single or multiple web pages based on the search query. This paper proposes a novel idea of generating a comparative summary from a set of URLs…

Information Retrieval · Computer Science 2012-01-12 P. Chitra, R. Baskaran, K. Sarukesi

Globalization has become a basic and highly popular human trend. To globalize information, people publish documents on the internet. As a result, the volume of information on the internet has become huge. To handle that…

Information Retrieval · Computer Science 2013-11-26 Sukanta Sinha, Rana Dattagupta, Debajyoti Mukhopadhyay

Organizing interesting webpages into hot topics is one of the key steps to understanding the trends of multimodal web data. A state-of-the-art solution is first to organize webpages into a large volume of multi-granularity topic candidates; hot…

Information Retrieval · Computer Science 2024-09-20 Junbiao Pang, Anjing Hu, Qingming Huang

Automatic extraction of forum posts and metadata is a crucial but challenging task since forums do not expose their content in a standardized structure. Content extraction methods, therefore, often need customizations such as adaptations to…

Information Retrieval · Computer Science 2021-08-05 Albert Weichselbraun, Adrian M. P. Brasoveanu, Roger Waldvogel, Fabian Odoni

A typical web search engine consists of three principal parts: crawling engine, indexing engine, and searching engine. The present work aims to optimize the performance of the crawling engine. The crawling engine finds new web pages and…

Networking and Internet Architecture · Computer Science 2012-01-20 Konstantin Avrachenkov, Alexander Dudin, Valentina Klimenok, Philippe Nain, Olga Semenova

An enormous volume of security-relevant information is present on the Web, for instance in the content produced each day by millions of bloggers worldwide, but discovering and making sense of these data is very challenging. This paper…

Social and Information Networks · Computer Science 2013-01-01 Kristin Glass, Richard Colbaugh

One of the most time-consuming steps in association rule mining is computing the frequency of occurrences of itemsets in the database. The hash table index approach converts a transaction database to a hash index tree by…

Databases · Computer Science 2011-11-14 R. B. Geeta, Omkar Mamillapalli, Shasikumar G. Totad, Prasad Reddy P. V. G. D

This short paper is intended as an additional progress report to share our experiences in Indonesia on collecting, integrating and disseminating both global and local scientific data across the country through web technology. Our recent…

Computers and Society · Computer Science 2009-03-05 L. T. Handoko

Indexing the Web is becoming a laborious task for search engines as the Web exponentially grows in size and distribution. Presently, the most effective known approach to overcome this problem is the use of focused crawlers. A focused…

Information Retrieval · Computer Science 2015-10-02 Ali Seyfi

When searching the web, ambiguous queries often return too many results. Text snippets, extracted from the retrieved pages, are an indicator of the pages' usefulness to the query intention and can be…

Information Retrieval · Computer Science 2009-03-24 N. Zotos, P. Tzekou, G. Tsatsaronis, L. Kozanidis, S. Stamou, I. Varlamis

Online information has increased tremendously in today's Internet age. As a result, the need has arisen to extract relevant content from the plethora of available information. Researchers are widely using automatic text summarization…

Social and Information Networks · Computer Science 2021-06-02 Mohd Khizir Siddiqui, Amreen Ahmad, Om Pal, Tanvir Ahmad

The two significant tasks of a focused Web crawler are finding relevant topic-specific documents on the Web and analytically prioritizing them for later effective and reliable download. For the first task, we propose a sophisticated custom…

Information Retrieval · Computer Science 2015-10-02 Ali Seyfi

Information on the web is prodigious; searching for relevant information is difficult, which makes web users rely on search engines to find it. Search engines index and categorize web pages according to their…

Artificial Intelligence · Computer Science 2015-10-06 Jai Manral

Indexing the Web of Data offers many opportunities, in particular, to find and explore data sources. One major design decision when indexing the Web of Data is to find a suitable index model, i.e., how to index and summarize data. Various…

Databases · Computer Science 2020-06-15 Till Blume, Ansgar Scherp

A hidden database refers to a dataset that an organization makes accessible on the web by allowing users to issue queries through a search interface. In other words, data acquisition from such a source is not by following static…

Databases · Computer Science 2012-08-02 Cheng Sheng, Nan Zhang, Yufei Tao, Xin Jin

PageRank is a well-known centrality measure for the web used in search engines, representing the importance of each web page. In this paper, we follow the line of recent research on the development of distributed algorithms for computation…

Systems and Control · Electrical Eng. & Systems 2019-07-24 Atsushi Suzuki, Hideaki Ishii