Related papers: Rethinking Encoder-Decoder Flow Through Shared Str…
Transformer-based encoder-decoder models produce a fused token-wise representation after every encoder layer. We investigate the effects of allowing the encoder to preserve and explore alternative hypotheses, combined at the end of the…
Multi-task learning of dense prediction tasks, by sharing both the encoder and decoder, as opposed to sharing only the encoder, provides an attractive front to increase both accuracy and computational efficiency. When the tasks are similar,…
In this paper, we present a novel deep learning architecture for infrared and visible images fusion problem. In contrast to conventional convolutional networks, our encoding network is combined by convolutional layers, fusion layer and…
Single encoder-decoder methodologies for semantic segmentation are reaching their peak in terms of segmentation quality and efficiency per number of layers. To address these limitations, we propose a new architecture based on a decoder…
The Encoder-Decoder architecture is a main stream deep learning model for biomedical image segmentation. The encoder fully compresses the input and generates encoded features, and the decoder then produces dense predictions using encoded…
Deep network architectures struggle to continually learn new tasks without forgetting the previous tasks. A recent trend indicates that dynamic architectures based on an expansion of the parameters can reduce catastrophic forgetting…
Discrete diffusion models enable parallel token sampling for faster inference than autoregressive approaches. However, prior diffusion models use a decoder-only architecture, which requires sampling algorithms that invoke the full network…
In this paper we derive an efficient algorithm to learn the parameters of structured predictors in general graphical models. This algorithm blends the learning and inference tasks, which results in a significant speedup over traditional…
Deep feature spaces have the capacity to encode complex transformations of their input data. However, understanding the relative feature-space relationship between two transformed encoded images is difficult. For instance, what is the…
In computer vision and image processing tasks, image fusion has evolved into an attractive research field. However, recent existing image fusion methods are mostly built on pixel-level operations, which may produce unacceptable artifacts…
The traditional SegNet architecture commonly encounters significant information loss during the sampling process, which detrimentally affects its accuracy in image semantic segmentation tasks. To counter this challenge, we introduce an…
Encoder-decoder networks with attention have proven to be a powerful way to solve many sequence-to-sequence tasks. In these networks, attention aligns encoder and decoder states and is often used for visualizing network behavior. However,…
We present a new versatile building block for deep point cloud processing architectures that is equally suited for diverse tasks. This building block combines the ideas of spatial transformers and multi-view convolutional networks with the…
Learning deeper models is usually a simple and effective approach to improve model performance, but deeper models have larger model parameters and are more difficult to train. To get a deeper model, simply stacking more layers of the model…
There have been significant efforts to interpret the encoder of Transformer-based encoder-decoder architectures for neural machine translation (NMT); meanwhile, the decoder remains largely unexamined despite its critical role. During…
The transformer architecture by Vaswani et al. (2017) is now ubiquitous across application domains, from natural language processing to speech processing and image understanding. We propose DenseFormer, a simple modification to the standard…
Neural style transfer has drawn broad attention in recent years. However, most existing methods aim to explicitly model the transformation between different styles, and the learned model is thus not generalizable to new styles. We here…
We present an architecture that is effective for continual learning in an especially demanding setting, where task boundaries do not exist or are unknown, and where classes have to be learned online (with each example presented only once).…
Decoder-based transformers, while revolutionizing language modeling and scaling to immense sizes, have not completely overtaken encoder-heavy architectures in natural language processing. Specifically, encoder-only models remain dominant in…
Surface crack segmentation poses a challenging computer vision task as background, shape, colour and size of cracks vary. In this work we propose optimized deep encoder-decoder methods consisting of a combination of techniques which yield…