Related papers: Real-time Neural-based Input Method

Efficient softmax approximation for GPUs

We propose an approximate strategy to efficiently train neural network based language models over very large vocabularies. Our approach, called adaptive softmax, circumvents the linear dependency on the vocabulary size by exploiting the…

Computation and Language · Computer Science · 2017-06-20 · Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, Hervé Jégou

Neural Machine Translation via Binary Code Prediction

In this paper, we propose a new method for calculating the output layer in neural machine translation systems. The method is based on predicting a binary code for each word and can reduce computation time/memory requirements of the output…

Computation and Language · Computer Science · 2017-04-25 · Yusuke Oda, Philip Arthur, Graham Neubig, Koichiro Yoshino, Satoshi Nakamura

Long Short-Term Memory for Japanese Word Segmentation

This study presents a Long Short-Term Memory (LSTM) neural network approach to Japanese word segmentation (JWS). Previous studies on Chinese word segmentation (CWS) succeeded in using recurrent neural networks such as LSTM and gated…

Computation and Language · Computer Science · 2018-09-28 · Yoshiaki Kitagawa, Mamoru Komachi

Learning to Screen for Fast Softmax Inference on Large Vocabulary Neural Networks

Neural language models have been widely used in various NLP tasks, including machine translation, next word prediction and conversational agents. However, it is challenging to deploy these models on mobile devices due to their slow…

Machine Learning · Computer Science · 2018-10-31 · Patrick H. Chen, Si Si, Sanjiv Kumar, Yang Li, Cho-Jui Hsieh

Memory-Efficient Training of RNN-Transducer with Sampled Softmax

RNN-Transducer has been one of promising architectures for end-to-end automatic speech recognition. Although RNN-Transducer has many advantages including its strong accuracy and streaming-friendly property, its high memory consumption…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-04-01 · Jaesong Lee, Lukas Lee, Shinji Watanabe

Attention Scheme Inspired Softmax Regression

Large language models (LLMs) have made transformed changes for human society. One of the key computation in LLMs is the softmax unit. This operation is important in LLMs because it allows the model to generate a distribution over possible…

Machine Learning · Computer Science · 2023-04-27 · Yichuan Deng, Zhihang Li, Zhao Song

GroupReduce: Block-Wise Low-Rank Approximation for Neural Language Model Shrinking

Model compression is essential for serving large deep neural nets on devices with limited resources or applications that require real-time responses. As a case study, a state-of-the-art neural language model usually consists of one or more…

Computation and Language · Computer Science · 2018-06-20 · Patrick H. Chen, Si Si, Yang Li, Ciprian Chelba, Cho-jui Hsieh

SoftLMs: Efficient Adaptive Low-Rank Approximation of Language Models using Soft-Thresholding Mechanism

Extensive efforts have been made to boost the performance in the domain of language models by introducing various attention-based transformers. However, the inclusion of linear layers with large dimensions contributes to significant…

Machine Learning · Computer Science · 2024-11-19 · Priyansh Bhatnagar, Linfeng Wen, Mingu Kang

SoftmAP: Software-Hardware Co-design for Integer-Only Softmax on Associative Processors

Recent research efforts focus on reducing the computational and memory overheads of Large Language Models (LLMs) to make them feasible on resource-constrained devices. Despite advancements in compression techniques, non-linear operators…

Hardware Architecture · Computer Science · 2024-11-28 · Mariam Rakka, Jinhao Li, Guohao Dai, Ahmed Eltawil, Mohammed E. Fouda, Fadi Kurdahi

Cumulative Adaptation for BLSTM Acoustic Models

This paper addresses the robust speech recognition problem as an adaptation task. Specifically, we investigate the cumulative application of adaptation methods. A bidirectional Long Short-Term Memory (BLSTM) based neural network, capable of…

Computation and Language · Computer Science · 2019-06-17 · Markus Kitza, Pavel Golik, Ralf Schlüter, Hermann Ney

Personalized Speech recognition on mobile devices

We describe a large vocabulary speech recognition system that is accurate, has low latency, and yet has a small enough memory and computational footprint to run faster than real-time on a Nexus 5 Android smartphone. We employ a quantized…

Computation and Language · Computer Science · 2016-03-15 · Ian McGraw, Rohit Prabhavalkar, Raziel Alvarez, Montse Gonzalez Arenas, Kanishka Rao, David Rybach, Ouais Alsharif, Hasim Sak, Alexander Gruenstein, Francoise Beaufays, Carolina Parada

Efficient Softmax Approximation for Deep Neural Networks with Attention Mechanism

There has been a rapid advance of custom hardware (HW) for accelerating the inference speed of deep neural networks (DNNs). Previously, the softmax layer was not a main concern of DNN accelerating HW, because its portion is relatively small…

Machine Learning · Computer Science · 2021-11-23 · Ihor Vasyltsov, Wooseok Chang

Using Large Language Models to Accelerate Communication for Users with Severe Motor Impairments

Finding ways to accelerate text input for individuals with profound motor impairments has been a long-standing area of research. Closing the speed gap for augmentative and alternative communication (AAC) devices such as eye-tracking…

Human-Computer Interaction · Computer Science · 2023-12-05 · Shanqing Cai, Subhashini Venugopalan, Katie Seaver, Xiang Xiao, Katrin Tomanek, Sri Jalasutram, Meredith Ringel Morris, Shaun Kane, Ajit Narayanan, Robert L. MacDonald, Emily Kornman, Daniel Vance, Blair Casey, Steve M. Gleason, Philip Q. Nelson, Michael P. Brenner

An Iterative Algorithm for Rescaled Hyperbolic Functions Regression

Large language models (LLMs) have numerous real-life applications across various domains, such as natural language translation, sentiment analysis, language modeling, chatbots and conversational agents, creative writing, text…

Machine Learning · Computer Science · 2025-02-18 · Yeqi Gao, Zhao Song, Junze Yin

A Factorized Recurrent Neural Network based architecture for medium to large vocabulary Language Modelling

Statistical language models are central to many applications that use semantics. Recurrent Neural Networks (RNN) are known to produce state of the art results for language modelling, outperforming their traditional n-gram counterparts in…

Computation and Language · Computer Science · 2016-02-05 · Anantharaman Palacode Narayana Iyer

Approximate FPGA-based LSTMs under Computation Time Constraints

Recurrent Neural Networks and in particular Long Short-Term Memory (LSTM) networks have demonstrated state-of-the-art accuracy in several emerging Artificial Intelligence tasks. However, the models are becoming increasingly demanding in…

Computer Vision and Pattern Recognition · Computer Science · 2018-01-10 · Michalis Rizakis, Stylianos I. Venieris, Alexandros Kouris, Christos-Savvas Bouganis

Efficient Learning for Undirected Topic Models

Replicated Softmax model, a well-known undirected topic model, is powerful in extracting semantic representations of documents. Traditional learning strategies such as Contrastive Divergence are very inefficient. This paper provides a novel…

Machine Learning · Computer Science · 2015-06-25 · Jiatao Gu, Victor O. K. Li

Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs

The Softmax function is used in the final layer of nearly all existing sequence-to-sequence models for language generation. However, it is usually the slowest layer to compute which limits the vocabulary size to a subset of most frequent…

Computation and Language · Computer Science · 2019-03-25 · Sachin Kumar, Yulia Tsvetkov

Investigation of Large-Margin Softmax in Neural Language Modeling

To encourage intra-class compactness and inter-class separability among trainable feature vectors, large-margin softmax methods are developed and widely applied in the face recognition community. The introduction of the large-margin concept…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-04-22 · Jingjing Huo, Yingbo Gao, Weiyue Wang, Ralf Schlüter, Hermann Ney

High Performance Sequence-to-Sequence Model for Streaming Speech Recognition

Recently sequence-to-sequence models have started to achieve state-of-the-art performance on standard speech recognition tasks when processing audio data in batch mode, i.e., the complete audio data is available when starting processing.…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-07-28 · Thai-Son Nguyen, Ngoc-Quan Pham, Sebastian Stueker, Alex Waibel