Related papers: Towards Audio Codec-based Speech Separation

Speech Enhancement Using Continuous Embeddings of Neural Audio Codec

Recent advancements in Neural Audio Codec (NAC) models have inspired their use in various speech processing tasks, including speech enhancement (SE). In this work, we propose a novel, efficient SE approach by leveraging the pre-quantization…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-03-18 · Haoyang Li, Jia Qi Yip, Tianyu Fan, Eng Siong Chng

Speech Separation using Neural Audio Codecs with Embedding Loss

Neural audio codecs have revolutionized audio processing by enabling speech tasks to be performed on highly compressed representations. Recent work has shown that speech separation can be achieved within these compressed domains, offering…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-11-28 · Jia Qi Yip, Chin Yuen Kwok, Bin Ma, Eng Siong Chng

UniSRCodec: Unified and Low-Bitrate Single Codebook Codec with Sub-Band Reconstruction

Neural Audio Codecs (NACs) can reduce transmission overhead by performing compact compression and reconstruction, which also aim to bridge the gap between continuous and discrete signals. Existing NACs can be divided into two categories:…

Sound · Computer Science · 2026-01-07 · Zhisheng Zhang, Xiang Li, Yixuan Zhou, Jing Peng, Shengbo Cai, Guoyang Zeng, Zhiyong Wu

Source-Aware Neural Speech Coding for Noisy Speech Compression

This paper introduces a novel neural network-based speech coding system that can process noisy speech effectively. The proposed source-aware neural audio coding (SANAC) system harmonizes a deep autoencoder-based source separation model and…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-11-11 · Haici Yang, Kai Zhen, Seungkwon Beack, Minje Kim

Exploring Disentangled Neural Speech Codecs from Self-Supervised Representations

Neural audio codecs (NACs), which use neural networks to generate compact audio representations, have garnered interest for their applicability to many downstream tasks -- especially quantized codecs due to their compatibility with large…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-08-13 · Ryo Aihara, Yoshiki Masuyama, Gordon Wichern, François G. Germain, Jonathan Le Roux

SpecTokenizer: A Lightweight Streaming Codec in the Compressed Spectrum Domain

Neural Audio Codecs (NACs) have gained growing attention in recent years as technologies for audio compression and audio representation in speech language models. While mainstream NACs typically require G-level computation and M-level…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-10-27 · Zixiang Wan, Guochang Zhang, Yifeng He, Jianqiang Wei

High Fidelity Neural Audio Compression

We introduce a state-of-the-art real-time, high-fidelity audio codec leveraging neural networks. It consists of a streaming encoder-decoder architecture with a quantized latent space trained in an end-to-end fashion. We simplify and speed up…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-10-25 · Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi

All-neural beamformer for continuous speech separation

Continuous speech separation (CSS) aims to separate overlapping voices from a continuous influx of conversational audio containing an unknown number of utterances spoken by an unknown number of speakers. A common application scenario is…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-10-14 · Zhuohuang Zhang, Takuya Yoshioka, Naoyuki Kanda, Zhuo Chen, Xiaofei Wang, Dongmei Wang, Sefik Emre Eskimez

SNAC: Multi-Scale Neural Audio Codec

Neural audio codecs have recently gained popularity because they can represent audio signals with high fidelity at very low bitrates, making it feasible to use language modeling approaches for audio generation and understanding. Residual…

Sound · Computer Science · 2024-10-21 · Hubert Siuzdak, Florian Grötschla, Luca A. Lanzendörfer

Codec2Vec: Self-Supervised Speech Representation Learning Using Neural Speech Codecs

Recent advancements in neural audio codecs have not only enabled superior audio compression but also enhanced speech synthesis techniques. Researchers are now exploring their potential as universal acoustic feature extractors for a broader…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-11-21 · Wei-Cheng Tseng, David Harwath

Noise-Aware Speech Separation with Contrastive Learning

Recently, the speech separation (SS) task has achieved remarkable progress driven by deep learning techniques. However, it is still challenging to separate target speech from a noisy mixture, as the neural model is vulnerable to assigning background…

Sound · Computer Science · 2024-01-09 · Zizheng Zhang, Chen Chen, Hsin-Hung Chen, Xiang Liu, Yuchen Hu, Eng Siong Chng

ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers

Neural speech codecs aim to compress input signals into minimal bits while maintaining content quality in a low-latency manner. However, existing neural codecs often trade model complexity for reconstruction performance. These codecs…

Sound · Computer Science · 2024-10-04 · Yuzhe Gu, Enmao Diao

CodecSep: Prompt-Driven Universal Sound Separation on Neural Audio Codec Latents

Text-guided sound separation enables flexible audio editing, assistive listening, and open-domain source extraction, but systems such as AudioSep remain too expensive for low-latency edge or codec-mediated deployment. Existing neural audio…

Sound · Computer Science · 2026-04-28 · Adhiraj Banerjee, Vipul Arora

TransMask: A Compact and Fast Speech Separation Model Based on Transformer

Speech separation is an important problem in speech processing, which aims to separate and generate clean speech from mixed audio containing speech from different speakers. Empowered by deep learning technologies over…

Sound · Computer Science · 2021-02-22 · Zining Zhang, Bingsheng He, Zhenjie Zhang

SAC: Neural Speech Codec with Semantic-Acoustic Dual-Stream Quantization

Speech codecs that convert continuous speech signals into discrete tokens have become essential for speech language models. However, existing codecs struggle to balance high-quality reconstruction with semantically rich representations,…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-12-16 · Wenxi Chen, Xinsheng Wang, Ruiqi Yan, Yushen Chen, Zhikang Niu, Ziyang Ma, Xiquan Li, Yuzhe Liang, Hanlin Wen, Shunshun Yin, Ming Tao, Xie Chen

Rethinking Speech Recognition with A Multimodal Perspective via Acoustic and Semantic Cooperative Decoding

Attention-based encoder-decoder (AED) models have shown impressive performance in ASR. However, most existing AED methods neglect to simultaneously leverage both acoustic and semantic features in the decoder, which is crucial for generating…

Computation and Language · Computer Science · 2023-05-24 · Tian-Hao Zhang, Hai-Bo Qin, Zhi-Hao Lai, Song-Lu Chen, Qi Liu, Feng Chen, Xinyuan Qian, Xu-Cheng Yin

SUNAC: Source-aware Unified Neural Audio Codec

Neural audio codecs (NACs) provide compact representations that can be leveraged in many downstream applications, in particular large language models. Yet most NACs encode mixtures of multiple sources in an entangled manner, which may…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-11-21 · Ryo Aihara, Yoshiki Masuyama, Francesco Paissan, François G. Germain, Gordon Wichern, Jonathan Le Roux

Scalable and Efficient Neural Speech Coding: A Hybrid Design

We present a scalable and efficient neural waveform coding system for speech compression. We formulate the speech coding problem as an autoencoding task, where a convolutional neural network (CNN) performs encoding and decoding as a neural…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-11-30 · Kai Zhen, Jongmo Sung, Mi Suk Lee, Seungkwon Beak, Minje Kim

Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations

Discrete speech representations have garnered recent attention for their efficacy in training transformer-based models for various speech-related tasks such as automatic speech recognition (ASR), translation, speaker verification, and joint…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-09-26 · Kunal Dhawan, Nithin Rao Koluguri, Ante Jukić, Ryan Langman, Jagadeesh Balam, Boris Ginsburg

SLM-SS: Speech Language Model for Generative Speech Separation

Speech separation (SS) has advanced significantly with neural network-based methods, showing improved performance on signal-level metrics. However, these methods often struggle to maintain speech intelligibility in the separated signals,…

Sound · Computer Science · 2026-01-28 · Tianhua Li, Chenda Li, Wei Wang, Xin Zhou, Xihui Chen, Jianqing Gao, Yanmin Qian