Related papers: Efficient Soft-Error Detection for Low-precision D…
Resistive Random-Access Memory (ReRAM) crossbar arrays are promising candidates for in-situ matrix-vector multiplication (MVM), a frequent operation in Deep Learning algorithms. Despite their advantages, these emerging non-volatile memories…
Deep-learning-based recommendation models (DLRMs) are widely deployed to serve personalized content to users. DLRMs are large in size due to their use of large embedding tables, and are trained by distributing the model across the memory of…
The effectiveness of large language models (LLMs) is often hindered by duplicated data in their extensive pre-training datasets. Current approaches primarily focus on detecting and removing duplicates, which risks the loss of valuable…
Affine frequency division multiplexing (AFDM), an emerging multi-carrier modulation scheme, has garnered significant attention due to its resilience to Doppler shifts and capability to achieve full diversity in doubly dispersive channels.…
An important linear algebra routine, GEneral Matrix Multiplication (GEMM), is a fundamental operator in deep learning. Compilers need to translate these routines into low-level code optimized for specific hardware. Compiler-level…
Deep learning based recommendation models (DLRM) are widely used in several business critical applications. Training such recommendation models efficiently is challenging because they contain billions of embedding-based parameters, leading…
Central to the application of many multi-view geometry algorithms is the extraction of matching points between multiple viewpoints, enabling classical tasks such as camera pose estimation and 3D reconstruction. Many approaches that…
Although LLM-based conversational agents demonstrate strong fluency and coherence, they still produce undesirable behaviors (errors) that are challenging to prevent from reaching users during deployment. Recent research leverages large…
A lot of recent progress has been made in ultra low-bit quantization, promising significant improvements in latency, memory footprint and energy consumption on edge devices. Quantization methods such as Learned Step Size Quantization can…
Error correction in automatic speech recognition (ASR) aims to correct those incorrect words in sentences generated by ASR models. Since recent ASR models usually have low word error rate (WER), to avoid affecting originally correct tokens,…
Emerging deep learning workloads urgently need fast general matrix multiplication (GEMM). To meet such demand, one of the critical features of machine-learning-specific accelerators such as NVIDIA Tensor Cores, AMD Matrix Cores, and Google…
Deep neural networks (DNNs) have been increasingly explored for receiver design because they can handle complex environments without relying on explicit channel models. Nevertheless, because communication channels change rapidly, their…
We devise a performance model for GPU training of Deep Learning Recommendation Models (DLRM), whose GPU utilization is low compared to other well-optimized CV and NLP models. We show that both the device active time (the sum of kernel…
Deep learning frameworks serve as the foundation for developing and deploying deep learning applications. To enhance the quality of deep learning frameworks, researchers have proposed numerous testing methods using deep learning models as…
Soft random sampling (SRS) is a simple yet effective approach for efficient training of large-scale deep neural networks when dealing with massive data. SRS selects a subset uniformly at random with replacement from the full data set in…
Quantum error mitigation (QEM) is typically viewed as a suite of practical techniques for today's noisy intermediate-scale quantum devices, with limited relevance once fault-tolerant quantum computers become available. In this work, we…
Accurate text summarization is one of the most common and important tasks performed by Large Language Models, where the costs of human review for an entire document may be high, but the costs of errors in summarization may be even greater.…
The deployment of non-binary pulse amplitude modulation (PAM) and soft decision (SD)-forward error correction (FEC) in future intensity-modulation (IM)/direct-detection (DD) links is inevitable. However, high-speed IM/DD links suffer from…
This paper presents LM-Fix, a lightweight detection and rapid recovery framework for faults in large language models (LLMs). Existing integrity approaches are often heavy or slow for modern LLMs. LM-Fix runs a short test-vector pass and…
Distance metric learning (DML) is to learn the embeddings where examples from the same class are closer than examples from different classes. It can be cast as an optimization problem with triplet constraints. Due to the vast number of…