Related papers: Multiple Random Masking Autoencoder Ensembles for …

Multi-View Masked World Models for Visual Robotic Manipulation

Visual robotic manipulation research and applications often use multiple cameras, or views, to better perceive the world. How else can we utilize the richness of multi-view data? In this paper, we investigate how to learn good…

Robotics · Computer Science 2023-06-01 Younggyo Seo, Junsu Kim, Stephen James, Kimin Lee, Jinwoo Shin, Pieter Abbeel

Recommendations for Comprehensive and Independent Evaluation of Machine Learning-Based Earth System Models

Machine learning (ML) is a revolutionary technology with demonstrable applications across multiple disciplines. Within the Earth science community, ML has been most visible for weather forecasting, producing forecasts that rival modern…

Machine Learning · Computer Science 2025-01-08 Paul A. Ullrich, Elizabeth A. Barnes, William D. Collins, Katherine Dagon, Shiheng Duan, Joshua Elms, Jiwoo Lee, L. Ruby Leung, Dan Lu, Maria J. Molina, Travis A. O'Brien, Finn O. Rebassoo

Unsupervised Multimodal Language Representations using Convolutional Autoencoders

Multimodal Language Analysis is a demanding area of research, since it is associated with two requirements: combining different modalities and capturing temporal information. During the last years, several works have been proposed in the…

Computation and Language · Computer Science 2022-01-10 Panagiotis Koromilas, Theodoros Giannakopoulos

Multimodal learning-based inversion models for the space-time reconstruction of satellite-derived geophysical fields

For numerous earth observation applications, one may benefit from various satellite sensors to address the reconstruction of some process or information of interest. A variety of satellite sensors deliver observation data with different…

Computer Vision and Pattern Recognition · Computer Science 2022-03-22 Ronan Fablet, Bertrand Chapron

Multi-modal Co-learning for Earth Observation: Enhancing single-modality models via modality collaboration

Multi-modal co-learning is emerging as an effective paradigm in machine learning, enabling models to collaboratively learn from different modalities to enhance single-modality predictions. Earth Observation (EO) represents a quintessential…

Computer Vision and Pattern Recognition · Computer Science 2025-11-20 Francisco Mena, Dino Ienco, Cassio F. Dantas, Roberto Interdonato, Andreas Dengel

Position Prediction Self-Supervised Learning for Multimodal Satellite Imagery Semantic Segmentation

Semantic segmentation of satellite imagery is crucial for Earth observation applications, but remains constrained by limited labelled training data. While self-supervised pretraining methods like Masked Autoencoders (MAE) have shown…

Computer Vision and Pattern Recognition · Computer Science 2025-07-17 John Waithaka, Moise Busogi

Training Multimodal Systems for Classification with Multiple Objectives

We learn about the world from a diverse range of sensory information. Automated systems lack this ability as investigation has centred on processing information presented in a single form. Adapting architectures to learn from multiple…

Machine Learning · Computer Science 2020-10-27 Jason Armitage, Shramana Thakur, Rishi Tripathi, Jens Lehmann, Maria Maleshkova

Multimodal Representation Learning and Fusion

Multi-modal learning is a fast growing area in artificial intelligence. It tries to help machines understand complex things by combining information from different sources, like images, text, and audio. By using the strengths of each…

Machine Learning · Computer Science 2025-12-22 Qihang Jin, Enze Ge, Yuhang Xie, Hongying Luo, Junhao Song, Ziqian Bi, Chia Xin Liang, Jibin Guan, Joe Yeong, Xinyuan Song, Junfeng Hao

NeighborMAE: Exploiting Spatial Dependencies between Neighboring Earth Observation Images in Masked Autoencoders Pretraining

Masked Image Modeling has been one of the most popular self-supervised learning paradigms to learn representations from large-scale, unlabeled Earth Observation images. While incorporating multi-modal and multi-temporal Earth Observation…

Computer Vision and Pattern Recognition · Computer Science 2026-03-04 Liang Zeng, Valerio Marsocci, Wufan Zhao, Andrea Nascetti, Maarten Vergauwen

When Neural Networks Using Different Sensors Create Similar Features

Multimodal problems are omnipresent in the real world: autonomous driving, robotic grasping, scene understanding, etc. We draw from the well-developed analysis of similarity to provide an example of a problem where neural networks are…

Machine Learning · Computer Science 2021-11-05 Hugues Moreau, Andréa Vassilev, Liming Chen

MAESTRO: Masked AutoEncoders for Multimodal, Multitemporal, and Multispectral Earth Observation Data

Self-supervised learning holds great promise for remote sensing, but standard self-supervised methods must be adapted to the unique characteristics of Earth observation data. We take a step in this direction by conducting a comprehensive…

Computer Vision and Pattern Recognition · Computer Science 2025-10-10 Antoine Labatie, Michael Vaccaro, Nina Lardiere, Anatol Garioud, Nicolas Gonthier

Semi-supervised Classification using Attention-based Regularization on Coarse-resolution Data

Many real-world phenomena are observed at multiple resolutions. Predictive models designed to predict these phenomena typically consider different resolutions separately. This approach might be limiting in applications where predictions are…

Machine Learning · Computer Science 2020-01-07 Guruprasad Nayak, Rahul Ghosh, Xiaowei Jia, Varun Mithal, Vipin Kumar

Multimodal Machine Learning: A Survey and Taxonomy

Our experience of the world is multimodal - we see objects, hear sounds, feel texture, smell odors, and taste flavors. Modality refers to the way in which something happens or is experienced and a research problem is characterized as…

Machine Learning · Computer Science 2017-08-02 Tadas Baltrušaitis, Chaitanya Ahuja, Louis-Philippe Morency

Multi-Task Hypergraphs for Semi-supervised Learning using Earth Observations

There are many ways of interpreting the world and they are highly interdependent. We exploit such complex dependencies and introduce a powerful multi-task hypergraph, in which every node is a task and different paths through the hypergraph…

Computer Vision and Pattern Recognition · Computer Science 2023-08-23 Mihai Pirvu, Alina Marcu, Alexandra Dobrescu, Nabil Belbachir, Marius Leordeanu

Continual Learning of Visual Concepts for Robots through Limited Supervision

For many real-world robotics applications, robots need to continually adapt and learn new concepts. Further, robots need to learn through limited data because of scarcity of labeled data in the real-world environments. To this end, my…

Robotics · Computer Science 2021-01-27 Ali Ayub, Alan R. Wagner

Interpretable Climate Change Modeling With Progressive Cascade Networks

Typical deep learning approaches to modeling high-dimensional data often result in complex models that do not easily reveal a new understanding of the data. Research in the deep learning field is very actively pursuing new methods to…

Machine Learning · Computer Science 2022-05-16 Charles Anderson, Jason Stock, David Anderson

Layer-Wise Multi-View Learning for Neural Machine Translation

Traditional neural machine translation is limited to the topmost encoder layer's context representation and cannot directly perceive the lower encoder layers. Existing solutions usually rely on the adjustment of network architecture, making…

Computation and Language · Computer Science 2020-11-04 Qiang Wang, Changliang Li, Yue Zhang, Tong Xiao, Jingbo Zhu

Category-Learning with Context-Augmented Autoencoder

Finding an interpretable non-redundant representation of real-world data is one of the key problems in Machine Learning. Biological neural networks are known to solve this problem quite well in unsupervised manner, yet unsupervised…

Machine Learning · Computer Science 2020-10-13 Denis Kuzminykh, Laida Kushnareva, Timofey Grigoryev, Alexander Zatolokin

Fusing Climate Data Products using a Spatially Varying Autoencoder

Autoencoders are powerful machine learning models used to compress information from multiple data sources. However, autoencoders, like all artificial neural networks, are often unidentifiable and uninterpretable. This research focuses on…

Applications · Statistics 2024-03-13 Jacob A. Johnson, Matthew J. Heaton, William F. Christensen, Lynsie R. Warr, Summer B. Rupper

Common Practices and Taxonomy in Deep Multi-view Fusion for Remote Sensing Applications

The advances in remote sensing technologies have boosted applications for Earth observation. These technologies provide multiple observations or views with different levels of information. They might contain static or temporary views with…

Computer Vision and Pattern Recognition · Computer Science 2024-02-06 Francisco Mena, Diego Arenas, Marlon Nuske, Andreas Dengel