Authors
Juergen Luettin, Gerasimos Potamianos, Chalapathy Neti
Publication date
2001/5/7
Conference
2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No. 01CH37221)
Volume
1
Pages
169-172
Publisher
IEEE
Description
Addresses the problem of audio-visual information fusion for highly robust speech recognition. We investigate methods that make different assumptions about asynchrony and conditional dependence across streams, and we propose a technique based on composite HMMs that can account for stream asynchrony and different levels of information integration. We show how these models can be trained jointly by maximum likelihood estimation. Experiments on a speaker-independent large-vocabulary continuous speech recognition task, comparing different integration methods, show that the best performance is obtained with asynchronous stream integration. At an 8.5 dB SNR with additive "babble" speech noise, this system reduces the error rate by 27% relative over audio-only models and by 12% relative over traditional audio-visual models using concatenative feature fusion.
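To make the contrast concrete, here is a minimal sketch of the two scoring schemes compared in the abstract; the notation (stream exponents lambda_A, lambda_V and per-stream observations o^A, o^V) is assumed for illustration and is not quoted from the paper. In state-synchronous fusion both streams share a single state sequence, whereas a composite (product) HMM lets each stream occupy its own state within a unit, re-synchronizing only at unit boundaries.

    % State-synchronous multi-stream scoring: one state j emits both streams,
    % with stream exponents commonly constrained to lambda_A + lambda_V = 1
    % (a convention assumed here):
    \[
      b_j(\mathbf{o}_t) = \bigl[ b_j^{A}(\mathbf{o}_t^{A}) \bigr]^{\lambda_A}
                          \bigl[ b_j^{V}(\mathbf{o}_t^{V}) \bigr]^{\lambda_V}
    \]
    % Composite (product) HMM: states are audio-visual pairs (j, k), so the
    % two streams may sit in different states between unit boundaries:
    \[
      b_{(j,k)}(\mathbf{o}_t) = \bigl[ b_j^{A}(\mathbf{o}_t^{A}) \bigr]^{\lambda_A}
                                \bigl[ b_k^{V}(\mathbf{o}_t^{V}) \bigr]^{\lambda_V}
    \]

Concatenative feature fusion, the baseline mentioned above, instead stacks the audio and visual feature vectors into one observation scored by a single-stream HMM, which forces frame-level synchrony between the modalities.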
Total citations
[Yearly citation histogram, 2001–2024; per-year counts not recoverable]
Scholar articles
J Luettin, G Potamianos, C Neti - 2001 IEEE International Conference on Acoustics …, 2001