Related papers: Multimodal Neural Databases

200 papers

In this paper, we propose Multi-Modal Databases (MMDBs), which is a new class of database systems that can seamlessly query text and tables using SQL. To enable seamless querying of textual data using SQL in an MMDB, we propose to extend…

Databases · Computer Science 2023-05-01 Matthias Urban, Carsten Binnig

Deep Learning has implemented a wide range of applications and has become increasingly popular in recent years. The goal of multimodal deep learning (MMDL) is to create models that can process and link information using various modalities.…

Machine Learning · Computer Science 2022-02-21 Jabeen Summaira, Xi Li, Amin Muhammad Shoib, Jabbar Abdul

The exploration of multimodal language models integrates multiple data types, such as images, text, language, audio, and other heterogeneity. While the latest large language models excel in text-based tasks, they often struggle to…

Artificial Intelligence · Computer Science 2023-11-23 Jiayang Wu, Wensheng Gan, Zefeng Chen, Shicheng Wan, Philip S. Yu

Designing effective neural networks is fundamentally important in deep multimodal learning. Most existing works focus on a single task and design neural architectures manually, which are highly task-specific and hard to generalize to…

Computer Vision and Pattern Recognition · Computer Science 2020-10-13 Zhou Yu, Yuhao Cui, Jun Yu, Meng Wang, Dacheng Tao, Qi Tian

Multimodal deep learning systems which employ multiple modalities like text, image, audio, video, etc., are showing better performance in comparison with individual modalities (i.e., unimodal) systems. Multimodal machine learning involves…

Machine Learning · Computer Science 2022-01-19 Anil Rahate, Rahee Walambe, Sheela Ramanna, Ketan Kotecha

The continually increasing number of complex datasets each year necessitates ever improving machine learning methods for robust and accurate categorization of these data. This paper introduces Random Multimodel Deep Learning (RMDL): a new…

Machine Learning · Computer Science 2018-06-01 Kamran Kowsari, Mojtaba Heidarysafa, Donald E. Brown, Kiana Jafari Meimandi, Laura E. Barnes

Tabular data is frequently captured in image form across a wide range of real-world scenarios such as financial reports, handwritten records, and document scans. These visual representations pose unique challenges for machine understanding,…

Artificial Intelligence · Computer Science 2026-02-10 Zhuoyan Xu, Haoyang Fang, Boran Han, Bonan Min, Bernie Wang, Cuixiong Hu, Shuai Zhang

Deep Learning has implemented a wide range of applications and has become increasingly popular in recent years. The goal of multimodal deep learning is to create models that can process and link information using various modalities. Despite…

Computer Vision and Pattern Recognition · Computer Science 2021-05-25 Jabeen Summaira, Xi Li, Amin Muhammad Shoib, Songyuan Li, Jabbar Abdul

Multimodal learning, a rapidly evolving field in artificial intelligence, seeks to construct more versatile and robust systems by integrating and analyzing diverse types of data, including text, images, audio, and video. Inspired by the…

Artificial Intelligence · Computer Science 2024-12-24 Priyaranjan Pattnayak, Hitesh Laxmichand Patel, Bhargava Kumar, Amit Agarwal, Ishan Banerjee, Srikant Panda, Tejaswini Kumar

The recent advancements in generative language models have demonstrated their ability to memorize knowledge from documents and recall knowledge to respond to user queries effectively. Building upon this capability, we propose to enable…

Multimedia · Computer Science 2024-02-19 Yongqi Li, Wenjie Wang, Leigang Qu, Liqiang Nie, Wenjie Li, Tat-Seng Chua

Multimodal Large Language Models (MLLMs) in real-world applications require access to external knowledge sources and must remain responsive to the dynamic and ever-changing real-world information in order to address information-seeking and…

Computer Vision and Pattern Recognition · Computer Science 2025-10-15 Kartik Narayan, Yang Xu, Tian Cao, Kavya Nerella, Vishal M. Patel, Navid Shiee, Peter Grasch, Chao Jia, Yinfei Yang, Zhe Gan

State-of-the-art retrieval models typically address a straightforward search scenario, in which retrieval tasks are fixed (e.g., finding a passage to answer a specific question) and only a single modality is supported for both queries and…

Computation and Language · Computer Science 2025-02-25 Sheng-Chieh Lin, Chankyu Lee, Mohammad Shoeybi, Jimmy Lin, Bryan Catanzaro, Wei Ping

Universal multimodal embedding models have achieved great success in capturing semantic relevance between queries and candidates. However, current methods either condense queries and candidates into a single vector, potentially limiting the…

Information Retrieval · Computer Science 2026-04-08 Zilin Xiao, Qi Ma, Mengting Gu, Chun-cheng Jason Chen, Xintao Chen, Vicente Ordonez, Vijai Mohan

Multi-modal retrieval has seen tremendous progress with the development of vision-language models. However, further improving these models requires additional labelled data, which is a huge manual effort. In this paper, we propose a framework…

Computer Vision and Pattern Recognition · Computer Science 2023-09-26 Avinash Madasu, Estelle Aflalo, Gabriela Ben Melech Stan, Shachar Rosenman, Shao-Yen Tseng, Gedas Bertasius, Vasudev Lal

Multimodal Large Language Models (MLLM) have made significant progress in the field of document analysis. Despite this, existing benchmarks typically focus only on extracting text and simple layout information, neglecting the complex…

Computer Vision and Pattern Recognition · Computer Science 2024-07-04 Lei Chen, Feng Yan, Yujie Zhong, Shaoxiang Chen, Zequn Jie, Lin Ma

In an era defined by the explosive growth of data and rapid technological advancements, Multimodal Large Language Models (MLLMs) stand at the forefront of artificial intelligence (AI) systems. Designed to seamlessly integrate diverse data…

Multimodal document retrieval systems have shown strong progress in aligning visual and textual content for semantic search. However, most existing approaches remain heavily English-centric, limiting their effectiveness in multilingual…

Information Retrieval · Computer Science 2025-12-04 Adithya S Kolavi, Vyoman Jain

Neural network architectures with memory and attention mechanisms exhibit certain reasoning capabilities required for question answering. One such architecture, the dynamic memory network (DMN), obtained high accuracy on a variety of…

Neural and Evolutionary Computing · Computer Science 2016-03-07 Caiming Xiong, Stephen Merity, Richard Socher

Although great progress has been made by previous table understanding methods including recent approaches based on large language models (LLMs), they rely heavily on the premise that given tables must be converted into a certain text…

Computation and Language · Computer Science 2024-06-13 Mingyu Zheng, Xinwei Feng, Qingyi Si, Qiaoqiao She, Zheng Lin, Wenbin Jiang, Weiping Wang

Existing datasets for tabular question answering typically focus exclusively on text within cells. However, real-world data is inherently multimodal, often blending images such as symbols, faces, icons, patterns, and charts with textual…
