Related papers: Multi-Granularity Modularized Network for Abstract…
While achieving unmatched performance on many well-defined tasks, deep learning models have also been used to solve visual abstract reasoning tasks, which are relatively less well-defined, and have been widely used to measure human…
Machine learning models have achieved significant milestones in various domains, for example, computer vision models have an exceptional result in object recognition, and in natural language processing, where Large Language Models (LLM)…
A key aspect of human intelligence is the ability to imagine -- composing learned concepts in novel ways -- to make sense of new scenarios. Such capacity is not yet attained for machine learning systems. In this work, in the context of…
Neural Module Networks (NMN) are a compelling method for visual question answering, enabling the translation of a question into a program consisting of a series of reasoning sub-tasks that are sequentially executed on the image to produce…
Answering questions that require reading texts in an image is challenging for current models. One key difficulty of this task is that rare, polysemous, and ambiguous words frequently appear in images, e.g., names of places, products, and…
A core component of human intelligence is the ability to identify abstract patterns inherent in complex, high-dimensional perceptual data, as exemplified by visual reasoning tasks such as Raven's Progressive Matrices (RPM). Motivated by the…
Multimodal Mathematical Reasoning (MMR) has recently attracted increasing attention for its capability to solve mathematical problems involving both textual and visual modalities. However, current models still face significant challenges in…
Dramatic progress has been witnessed in basic vision tasks involving low-level perception, such as object recognition, detection, and tracking. Unfortunately, there is still an enormous performance gap between artificial vision systems and…
Learning to perform abstract reasoning often requires decomposing the task in question into intermediate subgoals that are not specified upfront, but need to be autonomously devised by the learner. In Raven Progressive Matrices (RPM), the…
Psychologists recognize Raven's Progressive Matrices as a very effective test of general human intelligence. While many computational models have been developed by the AI community to investigate different forms of top-down, deliberative…
Computational learning approaches to solving visual reasoning tests, such as Raven's Progressive Matrices (RPM), critically depend on the ability to identify the visual concepts used in the test (i.e., the representation) as well as the…
A hallmark of human intelligence is the ability to infer abstract rules from limited experience and apply these rules to unfamiliar situations. This capacity is widely studied in the visual domain using the Raven's Progressive Matrices.…
Neural Module Network (NMN) exhibits strong interpretability and compositionality thanks to its handcrafted neural modules with explicit multi-hop reasoning capability. However, most NMNs suffer from two critical drawbacks: 1) scalability:…
We propose Modal Logical Neural Networks (MLNNs), a neurosymbolic framework that integrates deep learning with the formal semantics of modal logic, enabling reasoning about necessity and possibility. Drawing on Kripke semantics, we…
Visual reasoning is essential for building intelligent agents that understand the world and perform problem-solving beyond perception. Differentiable forward reasoning has been developed to integrate reasoning with gradient-based machine…
One of the primary challenges faced by deep learning is the degree to which current methods exploit superficial statistics and dataset bias, rather than learning to generalise over the specific representations they have experienced. This is…
Raven's Progressive Matrices (RPM) is highly correlated with human intelligence, and it has been widely used to measure the abstract reasoning ability of humans. In this paper, to study the abstract reasoning capability of deep neural…
In order to achieve a general visual question answering (VQA) system, it is essential to learn to answer deeper questions that require compositional reasoning on the image and external knowledge. Meanwhile, the reasoning process should be…
The collaborative reasoning for understanding each image-question pair is very critical but under-explored for an interpretable Visual Question Answering (VQA) system. Although very recent works also tried the explicit compositional…
Bilinear pooling achieves great success in fine-grained visual recognition (FGVC). Recent methods have shown that the matrix power normalization can stabilize the second-order information in bilinear features, but some problems, e.g.,…