Related papers: Interstitial Content Detection
The contextual information of Web images is investigated to address the issue of characterizing their content with semantic descriptors and therefore bridge the semantic gap, i.e. the gap between their automated low-level representation in…
Template detection and content extraction are two of the main areas of information retrieval applied to the Web. They perform different analyses over the structure and content of webpages to extract some part of the document. However, their…
Table of content (TOC) detection has drawn attention now a day because it plays an important role in digitization of multipage document. Generally book document is multipage document. So it becomes necessary to detect Table of Content page…
Internet censors seek ways to identify and block internet access to information they deem objectionable. Increasingly, censors deploy advanced networking tools such as deep-packet inspection (DPI) to identify such connections. In response,…
The internet can be broadly divided into three parts: surface, deep and dark. The dark web has become notorious in the media for being a hidden part of the web where all manner of illegal activities take place. This review investigates how…
Textual content appearing in videos represents an interesting index for semantic retrieval of videos (from archives), generation of alerts (live streams) as well as high level applications like opinion mining and content summarization. One…
Information on any given topic is often scattered across the web. Previously this scatter has been characterized through the distribution of a set of facts (i.e. pieces of information) across web pages, showing that typically a few pages…
Convolutional neural network based face forgery detection methods have achieved remarkable results during training, but struggled to maintain comparable performance during testing. We observe that the detector is prone to focus more on…
Malicious web content is a serious problem on the Internet today. In this paper we propose a deep learning approach to detecting malevolent web pages. While past work on web content detection has relied on syntactic parsing or on emulation…
Online users today are exposed to misleading and propagandistic news articles and media posts on a daily basis. To counter thus, a number of approaches have been designed aiming to achieve a healthier and safer online news and media…
The Digital Forgeries though not visibly identifiable to human perception it may alter or meddle with underlying natural statistics of digital content. Tampering involves fiddling with video content in order to cause damage or make…
Content fingerprinting and digital watermarking are techniques that are used for content protection and distribution monitoring. Over the past few years, both techniques have been well studied and their shortcomings understood. Recently, a…
Dark patterns are deceptive user interfaces employed by e-commerce websites to manipulate user's behavior in a way that benefits the website, often unethically. This study investigates the detection of such dark patterns. Existing solutions…
Surface cracks are a very common indicator of potential structural faults. Their early detection and monitoring is an important factor in structural health monitoring. Left untreated, they can grow in size over time and require expensive…
As the information contained within the web is increasing day by day, organizing this information could be a necessary requirement.The data mining process is to extract information from a data set and transform it into an understandable…
Nowadays, the Web has become one of the most widespread platforms for information change and retrieval. As it becomes easier to publish documents, as the number of users, and thus publishers, increases and as the number of documents grows,…
Content based image retrieval, a technique which uses visual contents of image to search images from large scale image databases according to users' interests. This paper provides a comprehensive survey on recent technology used in the area…
Since the proliferation of LLMs, there have been concerns about their misuse for harmful content creation and spreading. Recent studies justify such fears, providing evidence of LLM vulnerabilities and high potential of their misuse. Humans…
Modern websites include various types of third-party content such as JavaScript, images, stylesheets, and Flash objects in order to create interactive user interfaces. In addition to explicit inclusion of third-party content by website…
Podcast episodes often contain material extraneous to the main content, such as advertisements, interleaved within the audio and the written descriptions. We present classifiers that leverage both textual and listening patterns in order to…