Inspired by the modular software design principles of independence, interchangeability, and clarity of interface, we introduce a method for enforcing encoder-decoder modularity in sequence-to-sequence (seq2seq) models without sacrificing overall model quality or full differentiability. We discretize the encoder output units into a predefined, interpretable vocabulary space using the Connectionist Temporal Classification (CTC) loss. Our modular systems achieve near state-of-the-art performance on the 300-hour Switchboard benchmark, with word error rates (WERs) of 8.3% and 17.6% on the Switchboard (SWB) and CallHome (CH) subsets, using seq2seq models whose encoder and decoder modules are independent and interchangeable.
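To make the CTC-based interface more concrete, here is a minimal sketch, assuming NumPy and an encoder that emits per-frame logits over a vocabulary with the CTC blank at index 0. It implements the standard CTC forward algorithm in log space, plus a greedy collapse that shows how frame posteriors map to discrete vocabulary tokens. The function names (`log_softmax`, `ctc_loss`, `greedy_ctc_tokens`) are hypothetical and for illustration only; this is a textbook CTC computation, not the authors' implementation.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise log-softmax over the vocabulary (last) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def ctc_loss(log_probs: np.ndarray, target: list, blank: int = 0) -> float:
    """Negative log-likelihood of `target` under CTC (forward algorithm).

    log_probs: (T, V) per-frame log-posteriors, blank included (assumed index 0).
    target: label ids without blanks.
    """
    T = log_probs.shape[0]
    # Extended sequence: blank, y1, blank, y2, ..., yU, blank
    ext = [blank]
    for y in target:
        ext += [y, blank]
    S = len(ext)

    alpha = np.full((T, S), -np.inf)
    alpha[0, 0] = log_probs[0, ext[0]]
    if S > 1:
        alpha[0, 1] = log_probs[0, ext[1]]

    for t in range(1, T):
        for s in range(S):
            candidates = [alpha[t - 1, s]]          # stay on the same symbol
            if s >= 1:
                candidates.append(alpha[t - 1, s - 1])  # advance by one
            # Skip a blank only between two different non-blank labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(alpha[t - 1, s - 2])
            alpha[t, s] = np.logaddexp.reduce(candidates) + log_probs[t, ext[s]]

    # Valid paths end on the last label or on the trailing blank
    end = alpha[T - 1, S - 1]
    if S > 1:
        end = np.logaddexp(end, alpha[T - 1, S - 2])
    return float(-end)


def greedy_ctc_tokens(log_probs: np.ndarray, blank: int = 0) -> list:
    """Discretize frames: per-frame argmax, merge repeats, then drop blanks."""
    tokens, prev = [], None
    for k in log_probs.argmax(axis=-1):
        k = int(k)
        if k != prev and k != blank:
            tokens.append(k)
        prev = k
    return tokens


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical encoder output: 50 frames, 30 vocabulary units (blank = 0)
    lp = log_softmax(rng.standard_normal((50, 30)))
    print("CTC loss:", ctc_loss(lp, [5, 3, 7]))
    print("greedy tokens:", greedy_ctc_tokens(lp))
```

The loss is differentiable with respect to the frame log-posteriors, whereas the greedy collapse (argmax) is not; the collapse is included only to show how the encoder's output frames correspond to tokens in the predefined vocabulary.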
@article{arxiv.1911.03782,
title = {Enforcing Encoder-Decoder Modularity in Sequence-to-Sequence Models},
author = {Siddharth Dalmia and Abdelrahman Mohamed and Mike Lewis and Florian Metze and Luke Zettlemoyer},
journal = {arXiv preprint arXiv:1911.03782},
year = {2019}
}