Related papers: A Neural Multi-sequence Alignment TeCHnique (NeuMA…

Data-efficient Alignment of Multimodal Sequences by Aligning Gradient Updates and Internal Feature Distributions

The task of video and text sequence alignment is a prerequisite step toward joint understanding of movie videos and screenplays. However, supervised methods face the obstacle of limited realistic training data. With this paper, we attempt…

Computer Vision and Pattern Recognition · Computer Science 2020-11-17 Jianan Wang, Boyang Li, Xiangyu Fan, Jing Lin, Yanwei Fu

Video Summarization with Long Short-term Memory

We propose a novel supervised learning technique for summarizing videos by automatically selecting keyframes or key subshots. Casting the problem as a structured prediction problem on sequential data, our main idea is to use Long Short-Term…

Computer Vision and Pattern Recognition · Computer Science 2016-08-01 Ke Zhang, Wei-Lun Chao, Fei Sha, Kristen Grauman

Listen, Attend, and Walk: Neural Mapping of Navigational Instructions to Action Sequences

We propose a neural sequence-to-sequence model for direction following, a task that is essential to realizing effective autonomous agents. Our alignment-based encoder-decoder model with long short-term memory recurrent neural networks…

Computation and Language · Computer Science 2015-12-18 Hongyuan Mei, Mohit Bansal, Matthew R. Walter

Temporal Alignment Networks for Long-term Video

The objective of this paper is a temporal alignment network that ingests long term video sequences, and associated text sentences, in order to: (1) determine if a sentence is alignable with the video; and (2) if it is alignable, then…

Computer Vision and Pattern Recognition · Computer Science 2022-04-07 Tengda Han, Weidi Xie, Andrew Zisserman

Sequence to Sequence Learning with Neural Networks

Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to…

Computation and Language · Computer Science 2014-12-16 Ilya Sutskever, Oriol Vinyals, Quoc V. Le

Feedforward Sequential Memory Networks: A New Structure to Learn Long-term Dependency

In this paper, we propose a novel neural network structure, namely feedforward sequential memory networks (FSMN), to model long-term dependency in time series without using recurrent feedback. The proposed FSMN is a standard…

Neural and Evolutionary Computing · Computer Science 2016-01-06 Shiliang Zhang, Cong Liu, Hui Jiang, Si Wei, Lirong Dai, Yu Hu

Set Functions for Time Series

Despite the eminent successes of deep neural networks, many architectures are often hard to transfer to irregularly-sampled and asynchronous time series that commonly occur in real-world datasets, especially in healthcare applications. This…

Machine Learning · Computer Science 2020-09-16 Max Horn, Michael Moor, Christian Bock, Bastian Rieck, Karsten Borgwardt

Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment

Learning to localize temporal boundaries of procedure steps in instructional videos is challenging due to the limited availability of annotated large-scale training videos. Recent works focus on learning the cross-modal alignment between…

Computer Vision and Pattern Recognition · Computer Science 2024-09-25 Yuxiao Chen, Kai Li, Wentao Bao, Deep Patel, Yu Kong, Martin Renqiang Min, Dimitris N. Metaxas

Comparative Analysis of Liquid Neural Networks and LSTM for Sequential Pattern Recognition: Robustness, Efficiency, and Clinical Utility

Traditional Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) units operate on discrete time steps, often failing to capture the fluid temporal dynamics of real-world physical processes. Liquid Neural Networks (LNNs),…

Machine Learning · Computer Science 2026-05-28 Ye Kyaw Thu, Thazin Myint Oo, Thepchai Supnithi

Associative Recurrent Memory Transformer

This paper addresses the challenge of creating a neural architecture for very long sequences that requires constant time for processing new information at each time step. Our approach, Associative Recurrent Memory Transformer (ARMT), is…

Computation and Language · Computer Science 2025-02-17 Ivan Rodkin, Yuri Kuratov, Aydar Bulatov, Mikhail Burtsev

Introducing the Hidden Neural Markov Chain framework

Nowadays, neural network models achieve state-of-the-art results in many areas, such as computer vision or speech processing. For sequential data, especially for Natural Language Processing (NLP) tasks, Recurrent Neural Networks (RNNs) and their…

Computation and Language · Computer Science 2021-02-23 Elie Azeraf, Emmanuel Monfrini, Emmanuel Vignon, Wojciech Pieczynski

A Statistical Framework for Model Selection in LSTM Networks

Long Short-Term Memory (LSTM) neural network models have become the cornerstone for sequential data modeling in numerous applications, ranging from natural language processing to time series forecasting. Despite their success, the problem…

Machine Learning · Statistics 2026-05-26 Fahad Mostafa

Lattice Long Short-Term Memory for Human Action Recognition

Human actions captured in video sequences are three-dimensional signals characterizing visual appearance and motion dynamics. To learn action patterns, existing methods adopt Convolutional and/or Recurrent Neural Networks (CNNs and RNNs).…

Computer Vision and Pattern Recognition · Computer Science 2017-08-15 Lin Sun, Kui Jia, Kevin Chen, Dit Yan Yeung, Bertram E. Shi, Silvio Savarese

Parallel Long Short-Term Memory for Multi-stream Classification

Recently, machine learning methods have provided a broad spectrum of original and efficient algorithms based on Deep Neural Networks (DNN) to automatically predict an outcome with respect to a sequence of inputs. Recurrent hidden cells…

Machine Learning · Computer Science 2017-02-15 Mohamed Bouaziz, Mohamed Morchid, Richard Dufour, Georges Linarès, Renato De Mori

SkipFlow: Incorporating Neural Coherence Features for End-to-End Automatic Text Scoring

Deep learning has demonstrated tremendous potential for Automatic Text Scoring (ATS) tasks. In this paper, we describe a new neural architecture that enhances vanilla neural network models with auxiliary neural coherence features. Our new…

Artificial Intelligence · Computer Science 2017-11-15 Yi Tay, Minh C. Phan, Luu Anh Tuan, Siu Cheung Hui

Neural Sequential Phrase Grounding (SeqGROUND)

We propose an end-to-end approach for phrase grounding in images. Unlike prior methods that typically attempt to ground each phrase independently by building an image-text embedding, our architecture formulates grounding of multiple phrases…

Computer Vision and Pattern Recognition · Computer Science 2019-03-20 Pelin Dogan, Leonid Sigal, Markus Gross

Uncertainty-DTW for Sequences and Visual Tokens

Aligning structured data is a fundamental problem in computer vision and machine learning, underlying tasks such as time series analysis, human action recognition, and visual representation learning. Existing alignment methods, including…

Computer Vision and Pattern Recognition · Computer Science 2026-05-26 Lei Wang, Syuan-Hao Li, Yongsheng Gao, Piotr Koniusz

Unlocking the Power of LSTM for Long Term Time Series Forecasting

Traditional recurrent neural network architectures, such as long short-term memory neural networks (LSTM), have historically held a prominent role in time series forecasting (TSF) tasks. While the recently introduced sLSTM for Natural…

Machine Learning · Computer Science 2025-02-25 Yaxuan Kong, Zepu Wang, Yuqi Nie, Tian Zhou, Stefan Zohren, Yuxuan Liang, Peng Sun, Qingsong Wen

Benchmarking of LSTM Networks

LSTM (Long Short-Term Memory) recurrent neural networks have been highly successful in a number of application areas. This technical report describes the use of the MNIST and UW3 databases for benchmarking LSTM networks and explores the…

Neural and Evolutionary Computing · Computer Science 2016-10-31 Thomas M. Breuel

Joint Surgical Gesture and Task Classification with Multi-Task and Multimodal Learning

We propose a novel multi-modal and multi-task architecture for simultaneous low-level gesture and surgical task classification in Robot Assisted Surgery (RAS) videos. Our end-to-end architecture is based on the principles of a long…

Computer Vision and Pattern Recognition · Computer Science 2018-05-03 Duygu Sarikaya, Khurshid A. Guru, Jason J. Corso