Related papers: OpenForge: Probabilistic Metadata Integration

Meta Fusion: A Unified Framework For Multimodality Fusion with Mutual Learning

Developing effective multimodal data fusion strategies has become increasingly essential for improving the predictive power of statistical machine learning methods across a wide range of applications, from autonomous driving to medical…

Machine Learning · Computer Science 2025-07-29 Ziyi Liang , Annie Qu , Babak Shahbaba

Compositional Inference Metaprogramming with Convergence Guarantees

Inference metaprogramming enables effective probabilistic programming by supporting the decomposition of executions of probabilistic programs into subproblems and the deployment of hybrid probabilistic inference algorithms that apply…

Programming Languages · Computer Science 2019-07-16 Shivam Handa , Vikash Mansinghka , Martin Rinard

A metadata model for profiling multidimensional sources in data ecosystems

The Big Data landscape poses challenges in managing diverse data formats, requiring efficient storage and processing for high-quality analysis. Effective metadata management is crucial for organizing, accessing, and reusing data within…

Databases · Computer Science 2025-03-21 Claudia Diamantini , Alessandro Mele , Domenico Potena , Cristina Rossetti , Emanuele Storti

Progressive Fusion for Multimodal Integration

Integration of multimodal information from various sources has been shown to boost the performance of machine learning models and thus has received increased attention in recent years. Often such models use deep modality-specific networks…

Machine Learning · Computer Science 2022-11-22 Shiv Shankar , Laure Thompson , Madalina Fiterau

Utilizing Metadata for Better Retrieval-Augmented Generation

Retrieval-Augmented Generation systems depend on retrieving semantically relevant document chunks to support accurate, grounded outputs from large language models. In structured and repetitive corpora such as regulatory filings, chunk…

Information Retrieval · Computer Science 2026-01-21 Raquib Bin Yousuf , Shengzhe Xu , Mandar Sharma , Andrew Neeser , Chris Latimer , Naren Ramakrishnan

Fast and Reliable Probabilistic Face Embeddings in the Wild

Probabilistic Face Embeddings (PFE) can improve face recognition performance in unconstrained scenarios by integrating data uncertainty into the feature representation. However, existing PFE methods tend to be over-confident in estimating…

Computer Vision and Pattern Recognition · Computer Science 2021-06-23 Kai Chen , Qi Lv , Taihe Yi

FRAG: Toward Federated Vector Database Management for Collaborative and Secure Retrieval-Augmented Generation

This paper introduces Federated Retrieval-Augmented Generation (FRAG), a novel database management paradigm tailored for the growing needs of retrieval-augmented generation (RAG) systems, which are increasingly powered by…

Cryptography and Security · Computer Science 2024-10-18 Dongfang Zhao

ProbFuse: A Probabilistic Approach to Data Fusion

Data fusion is the combination of the results of independent searches on a document collection into one single output result set. It has been shown in the past that this can greatly improve retrieval effectiveness over that of the…

Information Retrieval · Computer Science 2014-10-01 David Lillis , Fergus Toolan , Rem Collier , John Dunnion

Progressive Evidence Refinement for Open-domain Multimodal Retrieval Question Answering

Pre-trained multimodal models have achieved significant success in retrieval-based question answering. However, current multimodal retrieval question-answering models face two main challenges. Firstly, utilizing compressed evidence features…

Artificial Intelligence · Computer Science 2023-10-17 Shuwen Yang , Anran Wu , Xingjiao Wu , Luwei Xiao , Tianlong Ma , Cheng Jin , Liang He

MedForge: Building Medical Foundation Models Like Open Source Software Development

Foundational models (FMs) have made significant strides in the healthcare domain. Yet the data silo challenge and privacy concern remain in healthcare systems, hindering safe medical data sharing and collaborative model development among…

Machine Learning · Computer Science 2025-02-27 Zheling Tan , Kexin Ding , Jin Gao , Mu Zhou , Dimitris Metaxas , Shaoting Zhang , Dequan Wang

AlphaForge: A Framework to Mine and Dynamically Combine Formulaic Alpha Factors

The complexity of financial data, characterized by its variability and low signal-to-noise ratio, necessitates advanced methods in quantitative investment that prioritize both performance and interpretability. Transitioning from early manual…

Computational Finance · Quantitative Finance 2024-12-13 Hao Shi , Weili Song , Xinting Zhang , Jiahe Shi , Cuicui Luo , Xiang Ao , Hamid Arian , Luis Seco

Backward-Compatible Prediction Updates: A Probabilistic Approach

When machine learning systems meet real world applications, accuracy is only one of several requirements. In this paper, we assay a complementary perspective originating from the increasing availability of pre-trained and regularly…

Machine Learning · Computer Science 2021-07-05 Frederik Träuble , Julius von Kügelgen , Matthäus Kleindessner , Francesco Locatello , Bernhard Schölkopf , Peter Gehler

Metamorphic Relation Prioritization for Effective Regression Testing

Metamorphic testing (MT) is widely used for testing programs that face the oracle problem. It uses a set of metamorphic relations (MRs), which are relations among multiple inputs and their corresponding outputs to determine whether the…

Software Engineering · Computer Science 2021-09-22 Madhusudan Srinivasan , Upulee Kanewala

HP-MDR: High-performance and Portable Data Refactoring and Progressive Retrieval with Advanced GPUs

Scientific applications produce vast amounts of data, posing grand challenges in the underlying data management and analytic tasks. Progressive compression is a promising way to address this problem, as it allows for on-demand data…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-02 Yanliang Li , Wenbo Li , Qian Gong , Qing Liu , Norbert Podhorszki , Scott Klasky , Xin Liang , Jieyang Chen

Metadata-based Data Exploration with Retrieval-Augmented Generation for Large Language Models

Developing the capacity to effectively search for requisite datasets is an urgent requirement to assist data users in identifying relevant datasets considering the very limited available metadata. For this challenge, the utilization of…

Information Retrieval · Computer Science 2024-10-08 Teruaki Hayashi , Hiroki Sakaji , Jiayi Dai , Randy Goebel

A Replication Study on Predicting Metamorphic Relations at Unit Testing Level

Metamorphic Testing (MT) addresses the test oracle problem by examining the relations between inputs and outputs of test executions. Such relations are known as Metamorphic Relations (MRs). In current practice, identifying and selecting…

Software Engineering · Computer Science 2022-07-28 Alejandra Duque-Torres , Dietmar Pfahl , Rudolf Ramler , Claus Klammer

Towards Cleaning-up Open Data Portals: A Metadata Reconciliation Approach

This paper presents an approach for metadata reconciliation, curation and linking for Open Governmental Data Portals (ODPs). ODPs have been lately the standard solution for governments willing to put their public data available for the…

Information Retrieval · Computer Science 2015-10-16 Alan Tygel , Sören Auer , Jeremy Debattista , Fabrizio Orlandi , Maria Luiza Machado Campos

Data Conflict Resolution Using Trust Mappings

In massively collaborative projects such as scientific or community databases, users often need to agree or disagree on the content of individual data items. On the other hand, trust relationships often exist between users, allowing them to…

Databases · Computer Science 2015-03-17 Wolfgang Gatterbauer , Dan Suciu

Propositionalization and Embeddings: Two Sides of the Same Coin

Data preprocessing is an important component of machine learning pipelines, which requires ample time and resources. An integral part of preprocessing is data transformation into the format required by a given learning algorithm. This paper…

Machine Learning · Computer Science 2020-10-30 Nada Lavrač , Blaž Škrlj , Marko Robnik-Šikonja

Combining Heterogeneous Classifiers for Relational Databases

Most enterprise data is distributed in multiple relational databases with expert-designed schema. Using traditional single-table machine learning techniques over such data not only incurs a computational penalty for converting to a 'flat'…

Machine Learning · Computer Science 2012-03-14 Geetha Manjunatha , M Narasimha Murty , Dinkar Sitaram