Related papers: Fusing Data with Correlations

A Bayesian Approach to Discovering Truth from Conflicting Sources for Data Integration

In practical data integration systems, it is common for the data sources being integrated to provide conflicting information about the same entity. Consequently, a major challenge for data integration is to derive the most complete and…

Databases · Computer Science 2012-03-05 Bo Zhao, Benjamin I. P. Rubinstein, Jim Gemmell, Jiawei Han

Data Fusion: Resolving Conflicts from Multiple Sources

Many data management applications, such as setting up Web portals, managing enterprise data, managing community data, and sharing scientific data, require integrating data from multiple sources. Each of these sources provides a set of…

Databases · Computer Science 2015-03-03 Xin Luna Dong, Laure Berti-Equille, Divesh Srivastava

A Semi-automatic Data Extraction System for Heterogeneous Data Sources: A Case Study from Cotton Industry

With the recent developments in digitisation, there is an increasing number of documents available online. There are several information extraction tools that are available to extract information from digitised documents. However, identifying…

Information Retrieval · Computer Science 2021-11-08 Richi Nayak, Thirunavukarasu Balasubramaniam, Sangeetha Kutty, Sachindra Banduthilaka, Erin Peterson

Querying with Conflicts of Interest

Conflicts of interest often arise between data sources and their users regarding how the users' information needs should be interpreted by the data source. For example, an online product search might be biased towards presenting certain…

Databases · Computer Science 2026-03-09 Nischal Aryal, Arash Termehchy, Marianne Winslett

Should we trust web-scraped data?

The increasing adoption of econometric and machine-learning approaches by empirical researchers has led to a widespread use of one data collection method: web scraping. Web scraping refers to the use of automated computer programs to access…

General Economics · Economics 2023-08-07 Jens Foerderer

An Integrated, Conditional Model of Information Extraction and Coreference with Applications to Citation Matching

Although information extraction and coreference resolution appear together in many applications, most current systems perform them as independent steps. This paper describes an approach to integrated inference for extraction and coreference…

Machine Learning · Computer Science 2012-07-19 Ben Wellner, Andrew McCallum, Fuchun Peng, Michael Hay

Data Source Selection for Information Integration in Big Data Era

In the Big Data era, information integration often requires abundant data extracted from massive data sources. Due to the large number of data sources, data source selection plays a crucial role in information integration, since it is costly and…

Databases · Computer Science 2016-11-01 Yiming Lin, Hongzhi Wang, Jianzhong Li, Hong Gao

Sampling Correctors

In many situations, sample data is obtained from a noisy or imperfect source. In order to address such corruptions, this paper introduces the concept of a sampling corrector. Such algorithms use structure that the distribution is purported…

Data Structures and Algorithms · Computer Science 2018-04-03 Clément Canonne, Themis Gouleakis, Ronitt Rubinfeld

An Efficient Approach for Statistical Matching of Survey Data Through Calibration, Optimal Transport and Balanced Sampling

Statistical matching aims to integrate two statistical sources. These sources can be two samples or a sample and the entire population. If two samples have been selected from the same population and information has been collected on…

Methodology · Statistics 2023-01-04 Raphaël Jauslin, Yves Tillé

"Don't quote me on that": Finding Mixtures of Sources in News Articles

Journalists publish statements provided by people, or sources, to contextualize current events, help voters make informed decisions, and hold powerful individuals accountable. In this work, we construct an ontological labeling…

Computation and Language · Computer Science 2021-04-21 Alexander Spangher, Nanyun Peng, Jonathan May, Emilio Ferrara

Faithful to the Document or to the World? Mitigating Hallucinations via Entity-linked Knowledge in Abstractive Summarization

Despite recent advances in abstractive summarization, current summarization systems still suffer from content hallucinations where models generate text that is either irrelevant or contradictory to the source document. However, prior work…

Computation and Language · Computer Science 2022-05-02 Yue Dong, John Wieting, Pat Verga

Scaling up Copy Detection

Recent research shows that copying is prevalent for Deep-Web data and considering copying can significantly improve truth finding from conflicting values. However, existing copy detection techniques do not scale for large sizes and numbers…

Databases · Computer Science 2015-03-03 Xian Li, Xin Luna Dong, Kenneth B. Lyons, Weiyi Meng, Divesh Srivastava

Design of Automatically Adaptable Web Wrappers

Nowadays, the huge amount of information distributed through the Web motivates studying techniques to be adopted in order to extract relevant data in an efficient and reliable way. Both academia and enterprises developed several approaches…

Artificial Intelligence · Computer Science 2013-06-06 Emilio Ferrara, Robert Baumgartner

A Survey on Truth Discovery

Thanks to the information explosion, data for the objects of interest can be collected from an increasing number of sources. However, for the same object, there usually exist conflicts among the collected multi-source information. To tackle this…

Databases · Computer Science 2015-11-05 Yaliang Li, Jing Gao, Chuishi Meng, Qi Li, Lu Su, Bo Zhao, Wei Fan, Jiawei Han

Combining Data from Surveys and Related Sources

To improve the precision of inferences and reduce costs there is considerable interest in combining data from several sources such as sample surveys and administrative data. Appropriate methodology is required to ensure satisfactory…

Methodology · Statistics 2022-10-21 Dexter Cahoy, Joseph Sedransk

Conformal Prediction for Multi-Source Detection on a Network

Detecting the origin of information or infection spread in networks is a fundamental challenge with applications in misinformation tracking, epidemiology, and beyond. We study the multi-source detection problem: given snapshot observations…

Social and Information Networks · Computer Science 2025-12-02 Xingchao Jian, Purui Zhang, Lan Tian, Feng Ji, Wenfei Liang, Wee Peng Tay, Bihan Wen, Felix Krahmer

Adaptive Question Answering: Enhancing Language Model Proficiency for Addressing Knowledge Conflicts with Source Citations

Resolving knowledge conflicts is a crucial challenge in Question Answering (QA) tasks, as the internet contains numerous conflicting facts and opinions. While some research has made progress in tackling ambiguous settings where multiple…

Computation and Language · Computer Science 2024-10-30 Sagi Shaier, Ari Kobren, Philip Ogren

Distilling Information Reliability and Source Trustworthiness from Digital Traces

Online knowledge repositories typically rely on their users or dedicated editors to evaluate the reliability of their content. These evaluations can be viewed as noisy measurements of both information reliability and information source…

Social and Information Networks · Computer Science 2017-04-04 Behzad Tabibian, Isabel Valera, Mehrdad Farajtabar, Le Song, Bernhard Schölkopf, Manuel Gomez-Rodriguez

Leveraging Association Rules for Better Predictions and Better Explanations

We present a new approach to classification that combines data and knowledge. In this approach, data mining is used to derive association rules (possibly with negations) from data. Those rules are leveraged to increase the predictive…

Artificial Intelligence · Computer Science 2025-10-22 Gilles Audemard, Sylvie Coste-Marquis, Pierre Marquis, Mehdi Sabiri, Nicolas Szczepanski

Converging Dimensions: Information Extraction and Summarization through Multisource, Multimodal, and Multilingual Fusion

Recent advances in large language models (LLMs) have led to new summarization strategies, offering an extensive toolkit for extracting important information. However, these approaches are frequently limited by their reliance on isolated…

Artificial Intelligence · Computer Science 2024-06-21 Pranav Janjani, Mayank Palan, Sarvesh Shirude, Ninad Shegokar, Sunny Kumar, Faruk Kazi