English
Related papers

Related papers: Mini-Sequence Transformer: Optimizing Intermediate…

200 papers

Spatial transformer network has been used in a layered form in conjunction with a convolutional network to enable the model to transform data spatially. In this paper, we propose a combined spatial transformer network (STN) and a Long…

Image and Video Processing · Electrical Eng. & Systems 2019-09-02 Shiyang Feng , Tianyue Chen , Hao Sun

Auto-regressive speech-text models pre-trained on interleaved text tokens and discretized speech tokens demonstrate strong speech understanding and generation, yet remain substantially less compute-efficient than text LLMs, partly due to…

Computation and Language · Computer Science 2026-03-11 Yen-Ju Lu , Yashesh Gaur , Wei Zhou , Benjamin Muller , Jesus Villalba , Najim Dehak , Luke Zettlemoyer , Gargi Ghosh , Mike Lewis , Srinivasan Iyer , Duc Le

Transformer-based large language models (LLM) have been widely used in language processing applications. However, due to the memory constraints of the devices, most of them restrict the context window. Even though recurrent models in…

Computation and Language · Computer Science 2025-02-07 Zifan He , Yingqi Cao , Zongyue Qin , Neha Prakriya , Yizhou Sun , Jason Cong

With the rapid development of Natural Language Processing (NLP) technology, the accuracy and efficiency of machine translation have become hot topics of research. This paper proposes a novel Seq2Seq model aimed at improving translation…

Computation and Language · Computer Science 2024-11-01 Yuxu Wu , Yiren Xing

Transformer-based models show their effectiveness across multiple domains and tasks. The self-attention allows to combine information from all sequence elements into context-aware representations. However, global and local information has…

Computation and Language · Computer Science 2022-12-09 Aydar Bulatov , Yuri Kuratov , Mikhail S. Burtsev

Transformer models trained on long sequences often achieve higher accuracy than short sequences. Unfortunately, conventional transformers struggle with long sequence training due to the overwhelming computation and memory requirements.…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-09 Xiao Wang , Isaac Lyngaas , Aristeidis Tsaris , Peng Chen , Sajal Dash , Mayanka Chandra Shekar , Tao Luo , Hong-Jun Yoon , Mohamed Wahib , John Gouley

Large language models (LLMs) have made significant strides in complex tasks, yet their widespread adoption is impeded by substantial computational demands. With hundreds of billion parameters, transformer-based LLMs necessitate months of…

Machine Learning · Computer Science 2024-08-22 Pihe Hu , Shaolong Li , Longbo Huang

Sequence transducers, such as the RNN-T and the Conformer-T, are one of the most promising models of end-to-end speech recognition, especially in streaming scenarios where both latency and accuracy are important. Although various methods,…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-07 Yusuke Shinohara , Shinji Watanabe

Transformers have achieved success in both language and vision domains. However, it is prohibitively expensive to scale them to long sequences such as long documents or high-resolution images, because self-attention mechanism has quadratic…

Computer Vision and Pattern Recognition · Computer Science 2021-12-08 Chen Zhu , Wei Ping , Chaowei Xiao , Mohammad Shoeybi , Tom Goldstein , Anima Anandkumar , Bryan Catanzaro

Long Short-Term Memory (LSTM) has achieved state-of-the-art performances on a wide range of tasks. Its outstanding performance is guaranteed by the long-term memory ability which matches the sequential data perfectly and the gating…

Neural and Evolutionary Computing · Computer Science 2019-01-29 Shiwei Liu , Decebal Constantin Mocanu , Mykola Pechenizkiy

Recurrent LLM architectures have emerged as a promising approach for improving reasoning, as they enable multi-step computation in the embedding space without generating intermediate tokens. Models such as Ouro perform reasoning by…

In this paper we address the question of how to render sequence-level networks better at handling structured input. We propose a machine reading simulator which processes text incrementally from left to right and performs shallow reasoning…

Computation and Language · Computer Science 2016-09-22 Jianpeng Cheng , Li Dong , Mirella Lapata

Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to…

Computation and Language · Computer Science 2014-12-16 Ilya Sutskever , Oriol Vinyals , Quoc V. Le

The Long Short-Term Memory (LSTM) layer is an important advancement in the field of neural networks and machine learning, allowing for effective training and impressive inference performance. LSTM-based neural networks have been…

Neural and Evolutionary Computing · Computer Science 2019-01-04 Daniel Kent , Fathi M. Salem

Large language models (LLMs) have emerged as a cornerstone in real-world applications with lengthy streaming inputs (e.g., LLM-driven agents). However, existing LLMs, pre-trained on sequences with a restricted maximum length, cannot process…

Computation and Language · Computer Science 2024-05-29 Chaojun Xiao , Pengle Zhang , Xu Han , Guangxuan Xiao , Yankai Lin , Zhengyan Zhang , Zhiyuan Liu , Maosong Sun

We study Latent Recurrent Transformer (LRT), a lightweight augmentation of autoregressive transformers that reuses a high-level source-layer hidden state from the previous token as recurrent memory for the next token. Because this source…

The Transformer architecture has shown significant success in many language processing and visual tasks. However, the method faces challenges in efficiently scaling to long sequences because the self-attention computation is quadratic with…

Machine Learning · Computer Science 2025-05-05 Edison Mucllari , Zachary Daniels , David Zhang , Qiang Ye

Attention is an important cognition process of humans, which helps humans concentrate on critical information during their perception and learning. However, although many machine learning models can remember information of data, they have…

Machine Learning · Computer Science 2019-09-06 Guoqiang Zhong , Xin Lin , Kang Chen , Qingyang Li , Kaizhu Huang

Simultaneous Machine Translation (SiMT) requires high-quality translations under strict real-time constraints, which traditional encoder-decoder policies with only READ/WRITE actions cannot fully address. We extend the action space of SiMT…

Computation and Language · Computer Science 2025-09-29 Qianen Zhang , Satoshi Nakamura

Transformer has been widely used for self-supervised pre-training in Natural Language Processing (NLP) and achieved great success. However, it has not been fully explored in visual self-supervised learning. Meanwhile, previous methods only…

Computer Vision and Pattern Recognition · Computer Science 2021-10-26 Zhaowen Li , Zhiyang Chen , Fan Yang , Wei Li , Yousong Zhu , Chaoyang Zhao , Rui Deng , Liwei Wu , Rui Zhao , Ming Tang , Jinqiao Wang
‹ Prev 1 2 3 10 Next ›