Related papers: Multi-View Broad Learning System for Primate Oculo…

End-to-End Multi-View Lipreading

Non-frontal lip views contain useful information which can be used to enhance the performance of frontal view lipreading. However, the vast majority of recent lipreading works, including the deep learning approaches which significantly…

Computer Vision and Pattern Recognition · Computer Science 2017-09-05 Stavros Petridis, Yujiang Wang, Zuwei Li, Maja Pantic

Using economic value signals from primate prefrontal cortex in neuro-engineering applications

Neural signals related to movement can be measured from intracranial recordings and used in brain-machine interface devices (BMI) to restore physical function in impaired patients. In this study, we explore the use of more abstract neural…

Neurons and Cognition · Quantitative Biology 2025-02-18 Tevin C. Rouse, Shira M. Lupkin, Vincent B. McGinty

Monkey Perceptogram: Reconstructing Visual Representation and Presumptive Neural Preference from Monkey Multi-electrode Arrays

Understanding how the primate brain transforms complex visual scenes into coherent perceptual experiences remains a central challenge in neuroscience. Here, we present a comprehensive framework for interpreting monkey visual processing by…

Neurons and Cognition · Quantitative Biology 2025-10-10 Teng Fei, Srinivas Ravishankar, Hoko Nakada, Abhinav Uppal, Ian Jackson, Garrison W. Cottrell, Ryusuke Hayashi, Virginia R. de Sa

Blink: Dynamic Visual Token Resolution for Enhanced Multimodal Understanding

Multimodal large language models (MLLMs) have achieved remarkable progress on various vision-language tasks, yet their visual perception remains limited. Humans, in comparison, perceive complex scenes efficiently by dynamically scanning and…

Computer Vision and Pattern Recognition · Computer Science 2026-05-26 Yuchen Feng, Zhenyu Zhang, Naibin Gu, Yilong Chen, Peng Fu, Zheng Lin, Shuohuan Wang, Yu Sun, Hua Wu, Weiping Wang, Haifeng Wang

Binocular Mutual Learning for Improving Few-shot Classification

Most of the few-shot learning methods learn to transfer knowledge from datasets with abundant labeled data (i.e., the base set). From the perspective of class space on base set, existing methods either focus on utilizing all classes under a…

Computer Vision and Pattern Recognition · Computer Science 2021-08-30 Ziqi Zhou, Xi Qiu, Jiangtao Xie, Jianan Wu, Chi Zhang

Decoding Visual Neural Representations by Multimodal Learning of Brain-Visual-Linguistic Features

Decoding human visual neural representations is a challenging task with great scientific significance in revealing vision-processing mechanisms and developing brain-like intelligent machines. Most existing methods are difficult to…

Computer Vision and Pattern Recognition · Computer Science 2023-03-31 Changde Du, Kaicheng Fu, Jinpeng Li, Huiguang He

BEVLM: Distilling Semantic Knowledge from LLMs into Bird's-Eye View Representations

The integration of Large Language Models (LLMs) into autonomous driving has attracted growing interest for their strong reasoning and semantic understanding abilities, which are essential for handling complex decision-making and long-tail…

Computer Vision and Pattern Recognition · Computer Science 2026-03-09 Thomas Monninger, Shaoyuan Xie, Qi Alfred Chen, Sihao Ding

Aligning brain functions boosts the decoding of visual semantics in novel subjects

Deep learning is leading to major advances in the realm of brain decoding from functional Magnetic Resonance Imaging (fMRI). However, the large inter-subject variability in brain characteristics has limited most studies to train models on…

Machine Learning · Computer Science 2023-12-12 Alexis Thual, Yohann Benchetrit, Felix Geilert, Jérémy Rapin, Iurii Makarov, Hubert Banville, Jean-Rémi King

Simple Models, Rich Representations: Visual Decoding from Primate Intracortical Neural Signals

Understanding how neural activity gives rise to perception is a central challenge in neuroscience. We address the problem of decoding visual information from high-density intracortical recordings in primates, using the THINGS Ventral Stream…

Neurons and Cognition · Quantitative Biology 2026-01-19 Matteo Ciferri, Matteo Ferrante, Nicola Toschi

Dual Thinking and Logical Processing -- Are Multi-modal Large Language Models Closing the Gap with Human Vision?

The dual thinking framework considers fast, intuitive, and slower logical processing. The perception of dual thinking in vision requires images where inferences from intuitive and logical processing differ, and the latter is under-explored…

Computer Vision and Pattern Recognition · Computer Science 2025-06-23 Kailas Dayanandan, Nikhil Kumar, Anand Sinha, Brejesh Lall

A Survey on Multi-view Learning

In recent years, a great many methods of learning from multi-view data by considering the diversity of different views have been proposed. These views may be obtained from multiple sources or different feature subsets. In trying to organize…

Machine Learning · Computer Science 2013-04-23 Chang Xu, Dacheng Tao, Chao Xu

Efficient Multimodal Learning from Data-centric Perspective

Multimodal Large Language Models (MLLMs) have demonstrated notable capabilities in general visual understanding and reasoning tasks. However, their deployment is hindered by substantial computational costs in both training and inference,…

Computer Vision and Pattern Recognition · Computer Science 2024-07-23 Muyang He, Yexin Liu, Boya Wu, Jianhao Yuan, Yueze Wang, Tiejun Huang, Bo Zhao

Browse and Concentrate: Comprehending Multimodal Content via prior-LLM Context Fusion

With the bloom of Large Language Models (LLMs), Multimodal Large Language Models (MLLMs) that incorporate LLMs with pre-trained vision models have recently demonstrated impressive performance across diverse vision-language tasks. However,…

Computation and Language · Computer Science 2026-01-13 Ziyue Wang, Chi Chen, Yiqi Zhu, Fuwen Luo, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Maosong Sun, Yang Liu

Embedded Deep Bilinear Interactive Information and Selective Fusion for Multi-view Learning

As a concrete application of multi-view learning, multi-view classification improves the traditional classification methods significantly by integrating various views optimally. Although most of the previous efforts have been demonstrated…

Computer Vision and Pattern Recognition · Computer Science 2020-07-14 Jinglin Xu, Wenbin Li, Jiantao Shen, Xinwang Liu, Peicheng Zhou, Xiangsen Zhang, Xiwen Yao, Junwei Han

From Seeing to Thinking: Decoupling Perception and Reasoning Improves Post-Training of Vision-Language Models

Recent advances in vision-language models (VLMs) emphasize long chain-of-thought reasoning; yet, we find that their performance on visual tasks is primarily limited by a lack of visual perception as opposed to reasoning itself. In this…

Computation and Language · Computer Science 2026-05-20 Juncheng Wu, Hardy Chen, Haoqin Tu, Xianfeng Tang, Freda Shi, Hui Liu, Hanqing Lu, Cihang Xie, Yuyin Zhou

Stacked Penalized Logistic Regression for Selecting Views in Multi-View Learning

In biomedical research, many different types of patient data can be collected, such as various types of omics data and medical imaging modalities. Applying multi-view learning to these different sources of information can increase the…

Machine Learning · Statistics 2020-05-13 Wouter van Loon, Marjolein Fokkema, Botond Szabo, Mark de Rooij

BMIP: Bi-directional Modality Interaction Prompt Learning for VLM

Vision-language models (VLMs) have exhibited remarkable generalization capabilities, and prompt learning for VLMs has attracted great attention for the ability to adapt pre-trained VLMs to specific downstream tasks. However, existing…

Machine Learning · Computer Science 2025-01-15 Song-Lin Lv, Yu-Yang Chen, Zhi Zhou, Ming Yang, Lan-Zhe Guo

DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models

We present DualFocus, a novel framework for integrating macro and micro perspectives within multi-modal large language models (MLLMs) to enhance vision-language task performance. Current MLLMs typically singularly focus on inputs at a…

Computer Vision and Pattern Recognition · Computer Science 2024-02-23 Yuhang Cao, Pan Zhang, Xiaoyi Dong, Dahua Lin, Jiaqi Wang

Unified Multimodal Understanding via Byte-Pair Visual Encoding

Multimodal large language models (MLLMs) have made significant progress in vision-language understanding, yet effectively aligning different modalities remains a fundamental challenge. We present a framework that unifies multimodal…

Computer Vision and Pattern Recognition · Computer Science 2025-07-01 Wanpeng Zhang, Yicheng Feng, Hao Luo, Yijiang Li, Zihao Yue, Sipeng Zheng, Zongqing Lu

Guided Co-training for Large-Scale Multi-View Spectral Clustering

In many real-world applications, we have access to multiple views of the data, each of which characterizes the data from a distinct aspect. Several previous algorithms have demonstrated that one can achieve better clustering accuracy by…

Computer Vision and Pattern Recognition · Computer Science 2017-08-01 Tyng-Luh Liu