Related papers: Semi-Structured Object Sequence Encoders

Revisiting Sequence-to-Sequence Video Object Segmentation with Multi-Task Loss and Skip-Memory

Video Object Segmentation (VOS) is an active research area of the visual domain. One of its fundamental sub-tasks is semi-supervised / one-shot learning: given only the segmentation mask for the first frame, the task is to provide…

Computer Vision and Pattern Recognition · Computer Science 2020-04-28 Fatemeh Azimi, Benjamin Bischke, Sebastian Palacio, Federico Raue, Joern Hees, Andreas Dengel

Capturing Temporal Components for Time Series Classification

Analyzing sequential data is crucial in many domains, particularly due to the abundance of data collected from the Internet of Things paradigm. Time series classification, the task of categorizing sequential data, has gained prominence,…

Machine Learning · Computer Science 2024-06-21 Venkata Ragavendra Vavilthota, Ranjith Ramanathan, Sathyanarayanan N. Aakur

Interpreting the structure of multi-object representations in vision encoders

In this work, we interpret the representations of multi-object scenes in vision encoders through the lens of structured representations. Structured representations allow modeling of individual objects distinctly and their flexible use based…

Computer Vision and Pattern Recognition · Computer Science 2025-04-08 Tarun Khajuria, Braian Olmiro Dias, Marharyta Domnich, Jaan Aru

Complex Sequential Understanding through the Awareness of Spatial and Temporal Concepts

Understanding sequential information is a fundamental task for artificial intelligence. Current neural networks attempt to learn spatial and temporal information as a whole, limiting their abilities to represent large scale spatial…

Computer Vision and Pattern Recognition · Computer Science 2020-06-02 Bo Pang, Kaiwen Zha, Hanwen Cao, Jiajun Tang, Minghui Yu, Cewu Lu

Joint Inductive and Transductive Learning for Video Object Segmentation

Semi-supervised video object segmentation is a task of segmenting the target object in a video sequence given only a mask annotation in the first frame. The limited information available makes it an extremely challenging task. Most previous…

Computer Vision and Pattern Recognition · Computer Science 2021-08-10 Yunyao Mao, Ning Wang, Wengang Zhou, Houqiang Li

Evaluating Sequence-to-Sequence Models for Handwritten Text Recognition

Encoder-decoder models have become an effective approach for sequence learning tasks like machine translation, image captioning and speech recognition, but have yet to show competitive results for handwritten text recognition. To this end,…

Computer Vision and Pattern Recognition · Computer Science 2019-07-16 Johannes Michael, Roger Labahn, Tobias Grüning, Jochen Zöllner

Modelling Sentence Pairs with Tree-structured Attentive Encoder

We describe an attentive encoder that combines tree-structured recursive neural networks and sequential recurrent neural networks for modelling sentence pairs. Since existing attentive models exert attention on the sequential structure, we…

Computation and Language · Computer Science 2016-10-11 Yao Zhou, Cong Liu, Yan Pan

In-Context Compositional Learning via Sparse Coding Transformer

Transformer architectures have achieved remarkable success across language, vision, and multimodal tasks, and there is growing demand for them to address in-context compositional learning tasks. In these tasks, models solve the target…

Machine Learning · Computer Science 2025-11-26 Wei Chen, Jingxi Yu, Zichen Miao, Qiang Qiu

Multi-space Variational Encoder-Decoders for Semi-supervised Labeled Sequence Transduction

Labeled sequence transduction is a task of transforming one sequence into another sequence that satisfies desiderata specified by a set of labels. In this paper we propose multi-space variational encoder-decoders, a new model for labeled…

Computation and Language · Computer Science 2019-10-08 Chunting Zhou, Graham Neubig

Flow-guided Semi-supervised Video Object Segmentation

We propose an optical flow-guided approach for semi-supervised video object segmentation. Optical flow is usually exploited as additional guidance information in unsupervised video object segmentation. However, its relevance in…

Computer Vision and Pattern Recognition · Computer Science 2023-01-26 Yushan Zhang, Andreas Robinson, Maria Magnusson, Michael Felsberg

Contextually Structured Token Dependency Encoding for Large Language Models

Token representation strategies within large-scale neural architectures often rely on contextually refined embeddings, yet conventional approaches seldom encode structured relationships explicitly within token interactions. Self-attention…

Computation and Language · Computer Science 2025-03-27 James Blades, Frederick Somerfield, William Langley, Susan Everingham, Maurice Witherington

Multi-scale Alignment and Contextual History for Attention Mechanism in Sequence-to-sequence Model

A sequence-to-sequence model is a neural network module for mapping two sequences of different lengths. The sequence-to-sequence model has three core modules: encoder, decoder, and attention. Attention is the bridge that connects the…

Computation and Language · Computer Science 2018-07-24 Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

Long Short-Term Memory-Networks for Machine Reading

In this paper we address the question of how to render sequence-level networks better at handling structured input. We propose a machine reading simulator which processes text incrementally from left to right and performs shallow reasoning…

Computation and Language · Computer Science 2016-09-22 Jianpeng Cheng, Li Dong, Mirella Lapata

Encoding-based Memory Modules for Recurrent Neural Networks

Learning to solve sequential tasks with recurrent models requires the ability to memorize long sequences and to extract task-relevant features from them. In this paper, we study the memorization subtask from the point of view of the design…

Machine Learning · Computer Science 2020-02-03 Antonio Carta, Alessandro Sperduti, Davide Bacciu

Self-supervised Object-Centric Learning for Videos

Unsupervised multi-object segmentation has shown impressive results on images by utilizing powerful semantics learned from self-supervised pretraining. An additional modality such as depth or motion is often used to facilitate the…

Computer Vision and Pattern Recognition · Computer Science 2023-10-12 Görkay Aydemir, Weidi Xie, Fatma Güney

Clustering and Recognition of Spatiotemporal Features through Interpretable Embedding of Sequence to Sequence Recurrent Neural Networks

Encoder-decoder recurrent neural network models (RNN Seq2Seq) have achieved great success in ubiquitous areas of computation and applications. They were shown to be successful in modeling data with both temporal and spatial dependencies for…

Machine Learning · Computer Science 2020-02-03 Kun Su, Eli Shlizerman

Object-Centric Action-Enhanced Representations for Robot Visuo-Motor Policy Learning

Learning visual representations from observing actions to benefit robot visuo-motor policy generation is a promising direction that closely resembles human cognitive function and perception. Motivated by this, and further inspired by…

Robotics · Computer Science 2025-05-28 Nikos Giannakakis, Argyris Manetas, Panagiotis P. Filntisis, Petros Maragos, George Retsinas

FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction

Sequence modeling has demonstrated state-of-the-art performance on natural language and document understanding tasks. However, it is challenging to correctly serialize tokens in form-like documents in practice due to their variety of layout…

Computation and Language · Computer Science 2022-03-25 Chen-Yu Lee, Chun-Liang Li, Timothy Dozat, Vincent Perot, Guolong Su, Nan Hua, Joshua Ainslie, Renshen Wang, Yasuhisa Fujii, Tomas Pfister

Self-supervised structured object representation learning

Self-supervised learning (SSL) has emerged as a powerful technique for learning visual representations. While recent SSL approaches achieve strong results in global image understanding, they are limited in capturing the structured…

Computer Vision and Pattern Recognition · Computer Science 2025-08-28 Oussama Hadjerci, Antoine Letienne, Mohamed Abbas Hedjazi, Adel Hafiane

Embeddings and Representation Learning for Structured Data

Performing machine learning on structured data is complicated by the fact that such data does not have vectorial form. Therefore, multiple approaches have emerged to construct vectorial representations of structured data, from kernel and…

Machine Learning · Computer Science 2019-05-16 Benjamin Paaßen, Claudio Gallicchio, Alessio Micheli, Alessandro Sperduti