Related papers: Learning Efficient Algorithms with Hierarchical At…

Hierarchical Attention: What Really Counts in Various NLP Tasks

Attention mechanisms in sequence to sequence models have shown great ability and wonderful performance in various natural language processing (NLP) tasks, such as sentence embedding, text generation, machine translation, machine reading…

Computation and Language · Computer Science 2018-08-14 Zehao Dou, Zhihua Zhang

Towards Better Modeling Hierarchical Structure for Self-Attention with Ordered Neurons

Recent studies have shown that a hybrid of self-attention networks (SANs) and recurrent neural networks (RNNs) outperforms both individual architectures, while not much is known about why the hybrid models work. With the belief that…

Computation and Language · Computer Science 2019-11-18 Jie Hao, Xing Wang, Shuming Shi, Jinfeng Zhang, Zhaopeng Tu

Optimum Binary Search Trees on the Hierarchical Memory Model

The Hierarchical Memory Model (HMM) of computation is similar to the standard Random Access Machine (RAM) model except that the HMM has a non-uniform memory organized in a hierarchy of levels numbered 1 through h. The cost of accessing a…

Data Structures and Algorithms · Computer Science 2008-04-08 Shripad Thite

Hybrid Associative Memories

Recurrent neural networks (RNNs) and self-attention are both widely used sequence-mixing layers that maintain an internal memory. However, this memory is constructed using two orthogonal mechanisms: RNNs compress the entire past into a…

Machine Learning · Computer Science 2026-03-30 Leon Lufkin, Tomás Figliolia, Beren Millidge, Kamesh Krishnamurthy

Hierarchical Attention Model for Improved Machine Comprehension of Spoken Content

Multimedia or spoken content presents more attractive information than plain text content, but the former is more difficult to display on a screen and be selected by a user. As a result, accessing large collections of the former is much…

Computation and Language · Computer Science 2017-01-03 Wei Fang, Jui-Yang Hsu, Hung-yi Lee, Lin-Shan Lee

Logarithmic Memory Networks (LMNs): Efficient Long-Range Sequence Modeling for Resource-Constrained Environments

Long-range sequence modeling is a crucial aspect of natural language processing and time series analysis. However, traditional models like Recurrent Neural Networks (RNNs) and Transformers suffer from computational and memory…

Artificial Intelligence · Computer Science 2025-01-15 Mohamed A. Taha

HAM: Hierarchical Adapter Merging for Scalable Continual Learning

Continual learning is an essential capability of human cognition, yet it poses significant challenges for current deep learning models. The primary issue is that new knowledge can interfere with previously learned information, causing the…

Machine Learning · Computer Science 2025-09-19 Eric Nuertey Coleman, Luigi Quarantiello, Samrat Mukherjee, Julio Hurtado, Vincenzo Lomonaco

H-Mem: A Novel Memory Mechanism for Evolving and Retrieving Agent Memory via a Hybrid Structure

Memory data are ubiquitous in Large Language Model (LLM)-based agents (e.g., OpenClaw and Manus). A few recent works have attempted to exploit agents' memory for improving their performance on the question-answering (QA) task, but they lack…

Computation and Language · Computer Science 2026-05-18 Jiawei Yu, Yixiang Fang, Xilin Liu, Yuchi Ma

Hierarchical Memory for High-Efficiency Long-Term Reasoning in LLM Agents

Long-term memory is one of the key factors influencing the reasoning capabilities of Large Language Model Agents (LLM Agents). Incorporating a memory mechanism that effectively integrates past interactions can significantly enhance…

Computation and Language · Computer Science 2025-08-01 Haoran Sun, Shaoning Zeng

A memory enhanced LSTM for modeling complex temporal dependencies

In this paper, we present Gamma-LSTM, an enhanced long short term memory (LSTM) unit, to enable learning of hierarchical representations through multiple stages of temporal abstractions. Gamma memory, a hierarchical memory unit, forms the…

Machine Learning · Computer Science 2019-10-29 Sneha Aenugu

Neural Attention Memory

We propose a novel perspective of the attention mechanism by reinventing it as a memory architecture for neural networks, namely Neural Attention Memory (NAM). NAM is a memory structure that is both readable and writable via differentiable…

Machine Learning · Computer Science 2023-10-17 Hyoungwook Nam, Seung Byum Seo

Towards mental time travel: a hierarchical memory for reinforcement learning agents

Reinforcement learning agents often forget details of the past, especially after delays or distractor tasks. Agents with common memory architectures struggle to recall and integrate across multiple timesteps of a past event, or even to…

Machine Learning · Computer Science 2021-12-09 Andrew Kyle Lampinen, Stephanie C. Y. Chan, Andrea Banino, Felix Hill

Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks

Natural language is hierarchically structured: smaller units (e.g., phrases) are nested within larger units (e.g., clauses). When a larger constituent ends, all of the smaller constituents that are nested within it must also be closed.…

Computation and Language · Computer Science 2019-05-09 Yikang Shen, Shawn Tan, Alessandro Sordoni, Aaron Courville

Hierarchical Temporal Memory Based on Spin-Neurons and Resistive Memory for Energy-Efficient Brain-Inspired Computing

Hierarchical temporal memory (HTM) tries to mimic the computing in the cerebral neocortex. It identifies spatial and temporal patterns in the input for making inferences. This may require a large number of computationally expensive tasks like,…

Emerging Technologies · Computer Science 2016-11-17 Deliang Fan, Mrigank Sharad, Abhronil Sengupta, Kaushik Roy

Hierarchical Multi-scale Attention Networks for Action Recognition

Recurrent Neural Networks (RNNs) have been widely used in natural language processing and computer vision. Among them, the Hierarchical Multi-scale RNN (HM-RNN), a kind of multi-scale hierarchical RNN proposed recently, can learn the…

Computer Vision and Pattern Recognition · Computer Science 2017-08-29 Shiyang Yan, Jeremy S. Smith, Wenjin Lu, Bailing Zhang

Long Short-Term Memory Over Tree Structures

The chain-structured long short-term memory (LSTM) has been shown to be effective in a wide range of problems such as speech recognition and machine translation. In this paper, we propose to extend it to tree structures, in which a memory cell…

Computation and Language · Computer Science 2015-03-18 Xiaodan Zhu, Parinaz Sobhani, Hongyu Guo

Attention Tree: Learning Hierarchies of Visual Features for Large-Scale Image Recognition

One of the key challenges in machine learning is to design a computationally efficient multi-class classifier while maintaining the output accuracy and performance. In this paper, we present a tree-based classifier: Attention Tree (ATree)…

Computer Vision and Pattern Recognition · Computer Science 2016-08-03 Priyadarshini Panda, Kaushik Roy

Hierarchical Long-Term Semantic Memory for LinkedIn's Hiring Agent

Large Language Model (LLM) agents are increasingly used in real-world products, where personalized and context-aware user interactions are essential. A central enabler of such capabilities is the agent's long-term semantic memory system,…

Information Retrieval · Computer Science 2026-05-27 Zhentao Xu, Shangjin Zhang, Emir Poyraz, Yvonne Li, Ye Jin, Xie Lu, Xiaoyang Gu, Karthik Ramgopal, Praveen Kumar Bodigutla, Xiaofeng Wang

HAGE: Harnessing Agentic Memory via RL-Driven Weighted Graph Evolution

Memory retrieval in agentic large language model (LLM) systems is often treated as a static lookup problem, relying on flat vector search or fixed binary relational graphs. However, fixed graph structures cannot capture the varying…

Artificial Intelligence · Computer Science 2026-05-12 Dongming Jiang, Yi Li, Guanpeng Li, Qiannan Li, Bingzhe Li

Tree-structured Attention with Hierarchical Accumulation

Incorporating hierarchical structures like constituency trees has been shown to be effective for various natural language processing (NLP) tasks. However, it is evident that state-of-the-art (SOTA) sequence-based models like the Transformer…

Machine Learning · Computer Science 2020-02-20 Xuan-Phi Nguyen, Shafiq Joty, Steven C. H. Hoi, Richard Socher