Related papers: Wrapper Maintenance: A Machine Learning Approach

200 papers

The amount of information available on the Web grows at an incredibly high rate. Systems and procedures devised to extract these data from Web sources already exist, and different approaches and techniques have been investigated during the…

Artificial Intelligence · Computer Science 2012-02-13 Emilio Ferrara, Robert Baumgartner

Nowadays, the huge amount of information distributed through the Web motivates studying techniques to be adopted in order to extract relevant data in an efficient and reliable way. Both academia and enterprises developed several approaches…

Artificial Intelligence · Computer Science 2013-06-06 Emilio Ferrara, Robert Baumgartner

Information distributed through the Web keeps growing faster day by day, and for this reason, several techniques for extracting Web data have been suggested in recent years. Often, extraction tasks are performed through so-called…

Artificial Intelligence · Computer Science 2013-06-06 Emilio Ferrara, Robert Baumgartner

We present a generic framework to make wrapper induction algorithms tolerant to noise in the training data. This enables us to learn wrappers in a completely unsupervised manner from automatically and cheaply obtained noisy training data,…

Databases · Computer Science 2011-03-15 Nilesh Dalvi, Ravi Kumar, Mohamed Soliman

Visual content has become the primary source of information, as evident in the billions of images and videos, shared and uploaded on the Internet every single day. This has led to an increase in alterations in images and videos to make them…

Computer Vision and Pattern Recognition · Computer Science 2020-01-22 Prabhat Kumar, Mayank Vatsa, Richa Singh

In this paper, we present a meta-analysis of several Web content extraction algorithms, and make recommendations for the future of content extraction on the Web. First, we find that nearly all Web content extractors do not consider a very…

Information Retrieval · Computer Science 2015-08-19 Tim Weninger, Rodrigo Palacios, Valter Crescenzi, Thomas Gottron, Paolo Merialdo

The extraction of multi-attribute objects from the deep web is the bridge between the unstructured web and structured data. Existing approaches either induce wrappers from a set of human-annotated pages or leverage repeated structures on…

Databases · Computer Science 2012-10-23 Tim Furche, Georg Gottlob, Giovanni Grasso, Giorgio Orsi, Christian Schallhart, Cheng Wang

The increasing adoption of econometric and machine-learning approaches by empirical researchers has led to a widespread use of one data collection method: web scraping. Web scraping refers to the use of automated computer programs to access…

General Economics · Economics 2023-08-07 Jens Foerderer

In this paper, we present a general scheme for building reproducible and extensible datasets for website phishing detection. The aim is to (1) enable comparison of systems using different features, (2) overtake the short-lived nature of…

Cryptography and Security · Computer Science 2024-04-24 Abdelhakim Hannousse, Salima Yahiouche

Efficient querying and retrieval of healthcare data poses a critical challenge today, with numerous connected devices continuously generating petabytes of images, text, and internet of things (IoT) sensor data. One approach to…

Machine Learning · Computer Science 2023-02-28 Sazia Mahfuz, Farhana Zulkernine

Can we preserve the accuracy of neural models while also providing faithful explanations of model decisions to training data? We propose a "wrapper box" pipeline: training a neural model as usual and then using its learned feature…

Machine Learning · Computer Science 2024-10-07 Yiheng Su, Junyi Jessy Li, Matthew Lease

This paper proposes an iterative inference algorithm for multi-hop explanation regeneration that retrieves relevant factual evidence in the form of text snippets, given a natural language question and its answer. Combining multiple sources…

Information Retrieval · Computer Science 2020-12-22 Ruben Cartuyvels, Graham Spinks, Marie-Francine Moens

Neural networks have in recent years shown promise for helping software engineers write programs and even formally verify them. While semantic information plays a crucial part in these processes, it remains unclear to what degree popular…

Machine Learning · Computer Science 2023-06-27 Shizhuo Dylan Zhang, Curt Tigges, Stella Biderman, Maxim Raginsky, Talia Ringer

Systems and machines undergo various failure modes that result in machine health degradation, so maintenance actions are required to restore them back to a state where they can perform their expected functions. Since maintenance tasks are…

Machine Learning · Computer Science 2023-07-11 Oluwaseyi Ogunfowora, Homayoun Najjaran

Dynamic sampling mechanisms in deep learning architectures have demonstrated utility across many computer vision models, though the theoretical analysis of these structures has not yet been unified. In this paper we connect the various…

Computer Vision and Pattern Recognition · Computer Science 2025-11-27 Dario Morle, Reid Zaffino

Context: Machine Learning (ML) is integrated into a growing number of systems for various applications. Because the performance of an ML model is highly dependent on the quality of the data it has been trained on, there is a growing…

Machine Learning · Computer Science 2024-06-03 Pierre-Olivier Côté, Amin Nikanjam, Nafisa Ahmed, Dmytro Humeniuk, Foutse Khomh

Machine learning models are essential tools in various domains, but their performance can degrade over time due to changes in data distribution or other factors. On one hand, detecting and addressing such degradations is crucial for…

Machine Learning · Computer Science 2023-09-28 Florian Heinrichs

The employment of convolutional neural networks has achieved unprecedented performance in the task of image restoration for a variety of degradation factors. However, high-performance networks have been specifically designed for a single…

Computer Vision and Pattern Recognition · Computer Science 2020-01-22 Xing Liu, Masanori Suganuma, Xiyang Luo, Takayuki Okatani

Efficient learning from streaming data is important for modern data analysis due to the continuous and rapid evolution of data streams. Despite significant advancements in stream pattern mining, challenges persist, particularly in managing…

Machine Learning · Computer Science 2024-11-04 Lamine Diop, Marc Plantevit, Arnaud Soulet

State-of-the-art document dewarping techniques learn to predict 3-dimensional information of documents which are prone to errors while dealing with documents with irregular distortions or large variations in depth. This paper presents…

Computer Vision and Pattern Recognition · Computer Science 2022-03-21 Chuhui Xue, Zichen Tian, Fangneng Zhan, Shijian Lu, Song Bai