Related papers: Beyond Transformers for Function Learning
Understanding how agents learn to generalize -- and, in particular, to extrapolate -- in high-dimensional, naturalistic environments remains a challenge for both machine learning and the study of biological agents. One approach to this has…
A fascinating hypothesis is that human and animal intelligence could be explained by a few principles (rather than an encyclopedic list of heuristics). If that hypothesis was correct, we could more easily both understand our own…
Despite incredible progress, many neural architectures fail to properly generalize beyond their training distribution. As such, learning to reason in a correct and generalizable way is one of the current fundamental challenges in machine…
A fundamental feature of human intelligence is that we accumulate and transfer knowledge as a society and across generations. We describe here a network architecture for the human brain that may support this feature and suggest that two key…
Artificial intelligence algorithms are capable of fantastic exploits, yet they are still grossly inefficient compared with the brain's ability to learn from few exemplars or solve problems that have not been explicitly defined. What is the…
Strong inductive biases give humans the ability to quickly learn to perform a variety of tasks. Although meta-learning is a method to endow neural networks with useful inductive biases, agents trained by meta-learning may sometimes acquire…
Extrapolation -- the ability to make inferences that go beyond the scope of one's experiences -- is a hallmark of human intelligence. By contrast, the generalization exhibited by contemporary neural network algorithms is largely limited to…
Borrowing from the transformer models that revolutionized the field of natural language processing, self-supervised feature learning for visual tasks has also seen state-of-the-art success using these extremely deep, isotropic networks.…
Empirical studies have identified a range of learnability biases and limitations of transformers, such as a persistent difficulty in learning to compute simple formal languages such as PARITY, and a bias towards low-degree functions.…
This paper studies interpretable and fair artificial intelligence architectures for understanding English reading. Introduced transformer-based models, integrating advanced attention mechanisms and gradient-based feature attribution. The…
In inductive transfer learning, fine-tuning pre-trained convolutional networks substantially outperforms training from scratch. When using fine-tuning, the underlying assumption is that the pre-trained model extracts generic features, which…
Models need appropriate inductive biases to effectively learn from small amounts of data and generalize systematically outside of the training distribution. While Transformers are highly versatile and powerful, they can still benefit from…
Machine learning systems, especially with overparameterized deep neural networks, can generalize to novel test instances drawn from the same distribution as the training data. However, they fare poorly when evaluated on out-of-support test…
People use rich prior knowledge about the world in order to efficiently learn new concepts. These priors - also known as "inductive biases" - pertain to the space of internal models considered by a learner, and they help the learner make…
In human perception and cognition, a fundamental operation that brains perform is interpretation: constructing coherent neural states from noisy, incomplete, and intrinsically ambiguous evidence. The problem of interpretation is well…
Can general-purpose AI architectures go beyond prediction to discover the physical laws governing the universe? True intelligence relies on "world models" -- causal abstractions that allow an agent to not only predict future states but…
An extension of Transformers is proposed that enables explicit relational reasoning through a novel module called the Abstractor. At the core of the Abstractor is a variant of attention called relational cross-attention. The approach is…
While current deep learning systems excel at tasks such as object classification, language processing, and gameplay, few can construct or modify a complex system such as a tower of blocks. We hypothesize that what these systems lack is a…
The bias/variance tradeoff is fundamental to learning: increasing a model's complexity can improve its fit on training data, but potentially worsens performance on future samples. Remarkably, however, the human brain effortlessly handles a…
The transformer is a neural network component that can be used to learn useful representations of sequences or sets of data-points. The transformer has driven recent advances in natural language processing, computer vision, and…