Related papers: Dynamic Multi-scale Convolution for Dialect Identi…

EfficientTDNN: Efficient Architecture Search for Speaker Recognition

Convolutional neural networks (CNNs), such as the time-delay neural network (TDNN), have shown a remarkable capability for learning speaker embeddings. However, they also bring a huge computational cost in storage size, processing,…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-06-22 · Rui Wang, Zhihua Wei, Haoran Duan, Shouling Ji, Yang Long, Zhen Hong

Dynamic Convolution: Attention over Convolution Kernels

Light-weight convolutional neural networks (CNNs) suffer performance degradation as their low computational budgets constrain both the depth (number of convolution layers) and the width (number of channels) of CNNs, resulting in limited…

Computer Vision and Pattern Recognition · Computer Science · 2020-04-02 · Yinpeng Chen, Xiyang Dai, Mengchen Liu, Dongdong Chen, Lu Yuan, Zicheng Liu

KernelDNA: Dynamic Kernel Sharing via Decoupled Naive Adapters

Dynamic convolution enhances model capacity by adaptively combining multiple kernels, yet faces critical trade-offs: prior works either (1) incur significant parameter overhead by scaling the number of kernels linearly, (2) compromise inference…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-18 · Haiduo Huang, Yadong Zhang, Yinghui Xu, Pengju Ren

Adaptive Dynamic Filtering Network for Image Denoising

In image denoising networks, feature scaling is widely used to enlarge the receptive field size and reduce computational costs. This practice, however, also leads to the loss of high-frequency information and fails to consider within-scale…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-04 · Hao Shen, Zhong-Qiu Zhao, Wandi Zhang

The Treasure beneath Convolutional Layers: Cross-convolutional-layer Pooling for Image Classification

A number of recent studies have shown that a Deep Convolutional Neural Network (DCNN) pretrained on a large dataset can be adopted as a universal image descriptor, which leads to astounding performance in many visual classification tasks.…

Computer Vision and Pattern Recognition · Computer Science · 2014-12-01 · Lingqiao Liu, Chunhua Shen, Anton van den Hengel

Dynamic Kernels and Channel Attention for Low Resource Speaker Verification

State-of-the-art speaker verification frameworks have typically focused on developing increasingly deeper (more layers) and wider (more channels) models to improve their verification performance. Instead, this paper…

Sound · Computer Science · 2023-02-28 · Anna Ollerenshaw, Md Asif Jalal, Thomas Hain

Untangling Local and Global Deformations in Deep Convolutional Networks for Image Classification and Sliding Window Detection

Deep Convolutional Neural Networks (DCNNs) commonly use generic 'max-pooling' (MP) layers to extract deformation-invariant features, but we argue in favor of a more refined treatment. First, we introduce epitomic convolution as a building…

Computer Vision and Pattern Recognition · Computer Science · 2014-12-02 · George Papandreou, Iasonas Kokkinos, Pierre-André Savalle

D-Net: Dynamic Large Kernel with Dynamic Feature Fusion for Volumetric Medical Image Segmentation

Hierarchical transformers have achieved significant success in medical image segmentation due to their large receptive field and their ability to effectively leverage global long-range contextual information. Convolutional neural networks…

Image and Video Processing · Electrical Eng. & Systems · 2024-10-18 · Jin Yang, Peijie Qiu, Yichi Zhang, Daniel S. Marcus, Aristeidis Sotiras

LSTM-TDNN with convolutional front-end for Dialect Identification in the 2019 Multi-Genre Broadcast Challenge

This paper presents a novel Dialect Identification (DID) system developed for the Fifth Edition of the Multi-Genre Broadcast challenge, the task of Fine-grained Arabic Dialect Identification (MGB-5 ADI Challenge). The system improves upon…

Audio and Speech Processing · Electrical Eng. & Systems · 2019-12-20 · Xiaoxiao Miao, Ian McLoughlin

Densely connected multidilated convolutional networks for dense prediction tasks

Tasks that involve high-resolution dense prediction require modeling both local and global patterns in a large input field. Although the local and global structures often depend on each other and their simultaneous modeling is…

Computer Vision and Pattern Recognition · Computer Science · 2021-06-10 · Naoya Takahashi, Yuki Mitsufuji

Convolutional Dictionary Pair Learning Network for Image Representation Learning

Dictionary Learning (DL) and Convolutional Neural Networks (CNNs) are both powerful image representation learning systems based on different mechanisms and principles; however, whether we can seamlessly integrate them to improve the…

Computer Vision and Pattern Recognition · Computer Science · 2020-01-16 · Zhao Zhang, Yulin Sun, Yang Wang, Zhengjun Zha, Shuicheng Yan, Meng Wang

Improving Convolutional Neural Networks for Fault Diagnosis by Assimilating Global Features

Deep learning techniques have become prominent in modern fault diagnosis for complex processes. In particular, convolutional neural networks (CNNs) have shown an appealing capacity to deal with multivariate time-series data by converting…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-04 · Saif S. S. Al-Wahaibi, Qiugang Lu

A Convolutional Neural Network for Modelling Sentences

The ability to accurately represent sentences is central to language understanding. We describe a convolutional architecture dubbed the Dynamic Convolutional Neural Network (DCNN) that we adopt for the semantic modelling of sentences. The…

Computation and Language · Computer Science · 2014-04-09 · Nal Kalchbrenner, Edward Grefenstette, Phil Blunsom

MGFF-TDNN: A Multi-Granularity Feature Fusion TDNN Model with Depth-Wise Separable Module for Speaker Verification

In speaker verification, traditional models often emphasize modeling long-term contextual features to capture global speaker characteristics. However, this approach can neglect fine-grained voiceprint information, which contains highly…

Sound · Computer Science · 2025-05-07 · Ya Li, Bin Zhou, Bo Hu

Diversifying and Expanding Frequency-Adaptive Convolution Kernels for Sound Event Detection

Frequency dynamic convolution (FDY conv) has shown state-of-the-art performance in sound event detection (SED) using frequency-adaptive kernels obtained by a frequency-varying combination of basis kernels. However, FDY conv lacks an…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-06-11 · Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Junhyeok Lee, Yong-Hwa Park

Fully-Convolutional Intensive Feature Flow Neural Network for Text Recognition

Deep Convolutional Neural Networks (CNNs) have achieved great success in pattern recognition, such as recognizing text in images. However, existing CNN-based frameworks still have several drawbacks: 1) the traditional pooling…

Computer Vision and Pattern Recognition · Computer Science · 2020-01-20 · Zhao Zhang, Zemin Tang, Zheng Zhang, Yang Wang, Jie Qin, Meng Wang

Frequency Dynamic Convolutions for Sound Event Detection

Recent research in deep learning-based Sound Event Detection (SED) has primarily focused on Convolutional Recurrent Neural Networks (CRNNs) and Transformer models. However, conventional 2D convolution-based models assume shift invariance…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-06-17 · Hyeonuk Nam

Environmental Sound Classification Based on Multi-temporal Resolution Convolutional Neural Network Combining with Multi-level Features

Motivated by the fact that characteristics of different sound classes are highly diverse in different temporal scales and hierarchical levels, a novel deep convolutional neural network (CNN) architecture is proposed for the environmental…

Sound · Computer Science · 2018-06-15 · Boqing Zhu, Kele Xu, Dezhi Wang, Lilun Zhang, Bo Li, Yuxing Peng

Cross-convolutional-layer Pooling for Image Recognition

Recent studies have shown that a Deep Convolutional Neural Network (DCNN) pretrained on a large image dataset can be used as a universal image descriptor, and that doing so leads to impressive performance for a variety of image…

Computer Vision and Pattern Recognition · Computer Science · 2016-12-23 · Lingqiao Liu, Chunhua Shen, Anton van den Hengel

Adaptive Convolution for CNN-based Speech Enhancement Models

Deep learning-based speech enhancement methods have significantly improved speech quality and intelligibility. Convolutional neural networks (CNNs) have been proven to be essential components of many high-performance models. In this paper,…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-11-11 · Dahan Wang, Xiaobin Rong, Shiruo Sun, Yuxiang Hu, Changbao Zhu, Jing Lu
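Several entries in this list (Dynamic Convolution, KernelDNA, and the frequency dynamic convolution papers) build on a shared idea. Instead of one fixed kernel, a layer holds a bank of basis kernels and mixes them with input-dependent weights.

The code below illustrates that general idea for a 1-D convolution over a (channels, time) feature map, such as the frame-level input to a TDNN. It is a minimal NumPy sketch under stated assumptions and is not the method of any paper listed here. The following are all hypothetical choices made for illustration:

- the attention branch (global average pooling over time, one linear layer, and a softmax with a temperature);
- the names `dynamic_conv1d`, `attn_w`, and `attn_b`;
- the tensor shapes.

```python
import numpy as np


def softmax(z, temperature=1.0):
    # Higher temperature -> flatter weights over the kernel bank.
    z = z / temperature
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def conv1d(x, w):
    """'Valid' 1-D cross-correlation (no padding, stride 1).

    x: (C_in, T), w: (C_out, C_in, K) -> output (C_out, T - K + 1).
    """
    c_out, _, k = w.shape
    t_out = x.shape[1] - k + 1
    y = np.zeros((c_out, t_out))
    for t in range(t_out):
        # Contract over input channels and kernel taps for one output frame.
        y[:, t] = np.tensordot(w, x[:, t:t + k], axes=([1, 2], [0, 1]))
    return y


def dynamic_conv1d(x, kernels, attn_w, attn_b, temperature=1.0):
    """Hypothetical dynamic 1-D convolution: attention-weighted kernel bank.

    x:       (C_in, T) input feature map
    kernels: (N, C_out, C_in, K) bank of N basis kernels
    attn_w:  (N, C_in), attn_b: (N,) illustrative attention parameters
    Returns the output (C_out, T - K + 1) and the N mixing weights.
    """
    pooled = x.mean(axis=1)                               # (C_in,) global average over time
    pi = softmax(attn_w @ pooled + attn_b, temperature)   # (N,), positive, sums to 1
    w = np.tensordot(pi, kernels, axes=1)                 # (C_out, C_in, K) aggregated kernel
    return conv1d(x, w), pi


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 20))             # 4 channels, 20 frames
kernels = rng.standard_normal((3, 8, 4, 5))  # N=3 kernels, 8 output channels, width 5
attn_w = rng.standard_normal((3, 4))
attn_b = np.zeros(3)
y, pi = dynamic_conv1d(x, kernels, attn_w, attn_b)
print(y.shape)  # (8, 16)
```

Convolution is linear in its weights. So convolving once with the aggregated kernel gives the same output as running all N convolutions and mixing their outputs with the same weights. The extra compute over a static layer is therefore mainly the small attention branch and the kernel aggregation. The parameter count, however, still grows linearly with N, which is the overhead the KernelDNA abstract points to.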