Related papers: Multi-Modality Spatio-Temporal Forecasting via Sel…

MuST: Maximizing Latent Capacity of Spatial Transcriptomics Data

Spatial transcriptomics (ST) technologies have revolutionized the study of gene expression patterns in tissues by providing multimodal data spanning transcriptomic, spatial, and morphological modalities, offering opportunities for understanding tissue…

Computational Engineering, Finance, and Science · Computer Science 2024-01-17 Zelin Zang , Liangyu Li , Yongjie Xu , Chenrui Duan , Kai Wang , Yang You , Yi Sun , Stan Z. Li

Towards Effective Fusion and Forecasting of Multimodal Spatio-temporal Data for Smart Mobility

With the rapid development of location based services, multimodal spatio-temporal (ST) data including trajectories, transportation modes, traffic flow and social check-ins are being collected for deep learning based methods. These deep…

Machine Learning · Computer Science 2024-07-24 Chenxing Wang

Spatio-Temporal Self-Supervised Learning for Traffic Flow Prediction

Robust prediction of citywide traffic flows at different time periods plays a crucial role in intelligent transportation systems. While previous work has made great efforts to model spatio-temporal correlations, existing methods still…

Machine Learning · Computer Science 2024-03-07 Jiahao Ji , Jingyuan Wang , Chao Huang , Junjie Wu , Boren Xu , Zhenhe Wu , Junbo Zhang , Yu Zheng

Multi-modal Spatio-Temporal Transformer for High-resolution Land Subsidence Prediction

Forecasting high-resolution land subsidence is a critical yet challenging task due to its complex, non-linear dynamics. While standard architectures like ConvLSTM often fail to model long-range dependencies, we argue that a more fundamental…

Computer Vision and Pattern Recognition · Computer Science 2025-10-02 Wendong Yao , Binhua Huang , Soumyabrata Dev

Unlocking the Power of Spatial and Temporal Information in Medical Multimodal Pre-training

Medical vision-language pre-training methods mainly leverage the correspondence between paired medical images and radiological reports. Although multi-view spatial images and temporal sequences of image-report pairs are available in…

Artificial Intelligence · Computer Science 2024-05-31 Jinxia Yang , Bing Su , Wayne Xin Zhao , Ji-Rong Wen

Learning Multi-Modal Mobility Dynamics for Generalized Next Location Recommendation

The precise prediction of human mobility has produced significant socioeconomic impacts, such as location recommendations and evacuation suggestions. However, existing methods suffer from limited generalization capability: unimodal…

Artificial Intelligence · Computer Science 2025-12-30 Junshu Dai , Yu Wang , Tongya Zheng , Wei Ji , Qinghong Guo , Ji Cao , Jie Song , Canghong Jin , Mingli Song

Multi-Modal Self-Supervised Learning for Recommendation

The online emergence of multi-modal sharing platforms (e.g., TikTok, YouTube) is powering personalized recommender systems to incorporate various modalities (e.g., visual, textual, and acoustic) into the latent user representations. While…

Information Retrieval · Computer Science 2023-07-19 Wei Wei , Chao Huang , Lianghao Xia , Chuxu Zhang

AutoSTL: Automated Spatio-Temporal Multi-Task Learning

Spatio-Temporal prediction plays a critical role in smart city construction. Jointly modeling multiple spatio-temporal tasks can further promote an intelligent city life by integrating their inseparable relationship. However, existing…

Machine Learning · Computer Science 2023-04-20 Zijian Zhang , Xiangyu Zhao , Hao Miao , Chunxu Zhang , Hongwei Zhao , Junbo Zhang

MoST: Mixing Speech and Text with Modality-Aware Mixture of Experts

We present MoST (Mixture of Speech and Text), a novel multimodal large language model that seamlessly integrates speech and text processing through our proposed Modality-Aware Mixture of Experts (MAMoE) architecture. While current…

Computation and Language · Computer Science 2026-01-16 Yuxuan Lou , Kai Yang , Yang You

Disentangled Mode-Specific Representations for Tensor Time Series via Contrastive Learning

Multi-mode tensor time series (TTS) can be found in many domains, such as search engines and environmental monitoring systems. Learning representations of a TTS benefits various applications, but it is also challenging since the…

Machine Learning · Computer Science 2026-03-02 Kohei Obata , Taichi Murayama , Zheng Chen , Yasuko Matsubara , Yasushi Sakurai

Event-Aware Multimodal Mobility Nowcasting

As a decisive part in the success of Mobility-as-a-Service (MaaS), spatio-temporal predictive modeling for crowd movements is a challenging task particularly considering scenarios where societal events drive mobility behavior deviated from…

Machine Learning · Computer Science 2021-12-17 Zhaonan Wang , Renhe Jiang , Hao Xue , Flora D. Salim , Xuan Song , Ryosuke Shibasaki

Bootstrap Motion Forecasting With Self-Consistent Constraints

We present a novel framework to bootstrap Motion forecasting with Self-consistent Constraints (MISC). The motion forecasting task aims at predicting future trajectories of vehicles by incorporating spatial and temporal information from the…

Computer Vision and Pattern Recognition · Computer Science 2023-11-28 Maosheng Ye , Jiamiao Xu , Xunnong Xu , Tengfei Wang , Tongyi Cao , Qifeng Chen

Enhancing Spatio-temporal Quantile Forecasting with Curriculum Learning: Lessons Learned

Training models on spatio-temporal (ST) data poses an open problem due to the complicated and diverse nature of the data itself, and it is challenging to ensure the model's performance directly trained on the original ST data. While…

Machine Learning · Computer Science 2024-09-17 Du Yin , Jinliang Deng , Shuang Ao , Zechen Li , Hao Xue , Arian Prabowo , Renhe Jiang , Xuan Song , Flora Salim

How Different from the Past? Spatio-Temporal Time Series Forecasting with Self-Supervised Deviation Learning

Spatio-temporal forecasting is essential for real-world applications such as traffic management and urban computing. Although recent methods have shown improved accuracy, they often fail to account for dynamic deviations between current…

Machine Learning · Computer Science 2025-10-07 Haotian Gao , Zheng Dong , Jiawei Yong , Shintaro Fukushima , Kenjiro Taura , Renhe Jiang

MoTime: A Dataset Suite for Multimodal Time Series Forecasting

While multimodal data sources are increasingly available from real-world forecasting, most existing research remains on unimodal time series. In this work, we present MoTime, a suite of multimodal time series forecasting datasets that pair…

Machine Learning · Computer Science 2025-06-02 Xin Zhou , Weiqing Wang , Francisco J. Baldán , Wray Buntine , Christoph Bergmeir

Learning Sequential Latent Variable Models from Multimodal Time Series Data

Sequential modelling of high-dimensional data is an important problem that appears in many domains including model-based reinforcement learning and dynamics identification for control. Latent variable models applied to sequential data…

Machine Learning · Computer Science 2023-01-23 Oliver Limoyo , Trevor Ablett , Jonathan Kelly

OpenSTL: A Comprehensive Benchmark of Spatio-Temporal Predictive Learning

Spatio-temporal predictive learning is a learning paradigm that enables models to learn spatial and temporal patterns by predicting future frames from given past frames in an unsupervised manner. Despite remarkable progress in recent years,…

Computer Vision and Pattern Recognition · Computer Science 2023-10-19 Cheng Tan , Siyuan Li , Zhangyang Gao , Wenfei Guan , Zedong Wang , Zicheng Liu , Lirong Wu , Stan Z. Li

Rethinking Spatio-Temporal Transformer for Traffic Prediction: Multi-level Multi-view Augmented Learning Framework

Traffic prediction is a challenging spatio-temporal forecasting problem that involves highly complex spatio-temporal correlations. This paper proposes a Multi-level Multi-view Augmented Spatio-temporal Transformer (LVSTformer) for traffic…

Machine Learning · Computer Science 2024-06-19 Jiaqi Lin , Qianqian Ren

Exploring High-Order Self-Similarity for Video Understanding

Space-time self-similarity (STSS), which captures visual correspondences across frames, provides an effective way to represent temporal dynamics for video understanding. In this work, we explore higher-order STSS and demonstrate how STSSs…

Computer Vision and Pattern Recognition · Computer Science 2026-04-23 Manjin Kim , Heeseung Kwon , Karteek Alahari , Minsu Cho

GPT-ST: Generative Pre-Training of Spatio-Temporal Graph Neural Networks

In recent years, there has been a rapid development of spatio-temporal prediction techniques in response to the increasing demands of traffic management and travel planning. While advanced end-to-end models have achieved notable success in…

Machine Learning · Computer Science 2023-11-09 Zhonghang Li , Lianghao Xia , Yong Xu , Chao Huang