Related papers: OpenForge: Probabilistic Metadata Integration

200 papers

Developing effective multimodal data fusion strategies has become increasingly essential for improving the predictive power of statistical machine learning methods across a wide range of applications, from autonomous driving to medical…

Machine Learning · Computer Science 2025-07-29 Ziyi Liang, Annie Qu, Babak Shahbaba

Inference metaprogramming enables effective probabilistic programming by supporting the decomposition of executions of probabilistic programs into subproblems and the deployment of hybrid probabilistic inference algorithms that apply…

Programming Languages · Computer Science 2019-07-16 Shivam Handa, Vikash Mansinghka, Martin Rinard

The Big Data landscape poses challenges in managing diverse data formats, requiring efficient storage and processing for high-quality analysis. Effective metadata management is crucial for organizing, accessing, and reusing data within…

Databases · Computer Science 2025-03-21 Claudia Diamantini, Alessandro Mele, Domenico Potena, Cristina Rossetti, Emanuele Storti

Integration of multimodal information from various sources has been shown to boost the performance of machine learning models and thus has received increased attention in recent years. Often such models use deep modality-specific networks…

Machine Learning · Computer Science 2022-11-22 Shiv Shankar, Laure Thompson, Madalina Fiterau

Retrieval-Augmented Generation systems depend on retrieving semantically relevant document chunks to support accurate, grounded outputs from large language models. In structured and repetitive corpora such as regulatory filings, chunk…

Information Retrieval · Computer Science 2026-01-21 Raquib Bin Yousuf, Shengzhe Xu, Mandar Sharma, Andrew Neeser, Chris Latimer, Naren Ramakrishnan

Probabilistic Face Embeddings (PFE) can improve face recognition performance in unconstrained scenarios by integrating data uncertainty into the feature representation. However, existing PFE methods tend to be over-confident in estimating…

Computer Vision and Pattern Recognition · Computer Science 2021-06-23 Kai Chen, Qi Lv, Taihe Yi

This paper introduces Federated Retrieval-Augmented Generation (FRAG), a novel database management paradigm tailored for the growing needs of retrieval-augmented generation (RAG) systems, which are increasingly powered by…

Cryptography and Security · Computer Science 2024-10-18 Dongfang Zhao

Data fusion is the combination of the results of independent searches on a document collection into one single output result set. It has been shown in the past that this can greatly improve retrieval effectiveness over that of the…

Information Retrieval · Computer Science 2014-10-01 David Lillis, Fergus Toolan, Rem Collier, John Dunnion

Pre-trained multimodal models have achieved significant success in retrieval-based question answering. However, current multimodal retrieval question-answering models face two main challenges. Firstly, utilizing compressed evidence features…

Artificial Intelligence · Computer Science 2023-10-17 Shuwen Yang, Anran Wu, Xingjiao Wu, Luwei Xiao, Tianlong Ma, Cheng Jin, Liang He

Foundational models (FMs) have made significant strides in the healthcare domain. Yet the data silo challenge and privacy concern remain in healthcare systems, hindering safe medical data sharing and collaborative model development among…

Machine Learning · Computer Science 2025-02-27 Zheling Tan, Kexin Ding, Jin Gao, Mu Zhou, Dimitris Metaxas, Shaoting Zhang, Dequan Wang

The complexity of financial data, characterized by its variability and low signal-to-noise ratio, necessitates advanced methods in quantitative investment that prioritize both performance and interpretability. Transitioning from early manual…

Computational Finance · Quantitative Finance 2024-12-13 Hao Shi, Weili Song, Xinting Zhang, Jiahe Shi, Cuicui Luo, Xiang Ao, Hamid Arian, Luis Seco

When machine learning systems meet real world applications, accuracy is only one of several requirements. In this paper, we assay a complementary perspective originating from the increasing availability of pre-trained and regularly…

Metamorphic testing (MT) is widely used for testing programs that face the oracle problem. It uses a set of metamorphic relations (MRs), which are relations among multiple inputs and their corresponding outputs to determine whether the…

Software Engineering · Computer Science 2021-09-22 Madhusudan Srinivasan, Upulee Kanewala

Scientific applications produce vast amounts of data, posing grand challenges in the underlying data management and analytic tasks. Progressive compression is a promising way to address this problem, as it allows for on-demand data…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-02 Yanliang Li, Wenbo Li, Qian Gong, Qing Liu, Norbert Podhorszki, Scott Klasky, Xin Liang, Jieyang Chen

Developing the capacity to effectively search for requisite datasets is an urgent requirement to assist data users in identifying relevant datasets considering the very limited available metadata. For this challenge, the utilization of…

Information Retrieval · Computer Science 2024-10-08 Teruaki Hayashi, Hiroki Sakaji, Jiayi Dai, Randy Goebel

Metamorphic Testing (MT) addresses the test oracle problem by examining the relations between inputs and outputs of test executions. Such relations are known as Metamorphic Relations (MRs). In current practice, identifying and selecting…

Software Engineering · Computer Science 2022-07-28 Alejandra Duque-Torres, Dietmar Pfahl, Rudolf Ramler, Claus Klammer

This paper presents an approach for metadata reconciliation, curation and linking for Open Governmental Data Portals (ODPs). ODPs have lately been the standard solution for governments willing to make their public data available for the…

Information Retrieval · Computer Science 2015-10-16 Alan Tygel, Sören Auer, Jeremy Debattista, Fabrizio Orlandi, Maria Luiza Machado Campos

In massively collaborative projects such as scientific or community databases, users often need to agree or disagree on the content of individual data items. On the other hand, trust relationships often exist between users, allowing them to…

Databases · Computer Science 2015-03-17 Wolfgang Gatterbauer, Dan Suciu

Data preprocessing is an important component of machine learning pipelines, which requires ample time and resources. An integral part of preprocessing is data transformation into the format required by a given learning algorithm. This paper…

Machine Learning · Computer Science 2020-10-30 Nada Lavrač, Blaž Škrlj, Marko Robnik-Šikonja

Most enterprise data is distributed in multiple relational databases with expert-designed schema. Using traditional single-table machine learning techniques over such data not only incurs a computational penalty for converting to a 'flat'…

Machine Learning · Computer Science 2012-03-14 Geetha Manjunatha, M Narasimha Murty, Dinkar Sitaram