Storming Media: Pentagon Reports and Documents

Vocal Tract Length Normalization for Large Vocabulary Continuous Speech Recognition

Authors: Puming Zhan; Alex Waibel; CARNEGIE-MELLON UNIV PITTSBURGH PA SCHOOL OF COMPUTER SCIENCE
Abstract:
Generally speaking, the speaker dependence of a speech recognition system stems from speaker-dependent speech features. Variation in vocal tract length and/or shape is one of the major sources of inter-speaker variation. In this paper, we address several methods of vocal tract length normalization (VTLN) for large vocabulary continuous speech recognition: (1) we explore bilinear warping VTLN in the frequency domain; (2) we propose a speaker-specific Bark/Mel scale VTLN in the Bark/Mel domain; (3) we investigate adaptation of the normalization factor. Our experimental results show that the speaker-specific Bark/Mel scale VTLN is better than piecewise/bilinear warping VTLN in the frequency domain. It reduces the word error rate by up to 12% on our Spanish and English spontaneous speech scheduling task database. For adaptation of the normalization factor, our experimental results show that promising results can be obtained by using no more than three utterances from a new speaker to estimate his/her normalization factor, and that the unsupervised adaptation mode works as well as the supervised one. Therefore, the computational complexity of VTLN can be avoided by learning the normalization factor from very few utterances of a new speaker.
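
As an illustration of the bilinear warping mentioned in the abstract, the sketch below warps a normalized frequency axis with the standard first-order all-pass (bilinear) transform and picks a warp factor by grid search. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the candidate grid, and the `score` callback are hypothetical, and the abstract does not say which criterion the report uses to estimate the normalization factor.

```python
import math


def bilinear_warp(omega, alpha):
    """Warp a normalized angular frequency omega (radians, 0..pi).

    Uses the phase response of a first-order all-pass filter,
    omega' = omega + 2 * atan(alpha * sin(omega) / (1 - alpha * cos(omega))),
    with the warp factor alpha in (-1, 1). alpha = 0 leaves omega unchanged.
    """
    return omega + 2.0 * math.atan2(
        alpha * math.sin(omega), 1.0 - alpha * math.cos(omega)
    )


def warp_axis(num_bins, alpha):
    """Return the warped positions of num_bins equally spaced
    frequencies on [0, pi]."""
    step = math.pi / (num_bins - 1)
    return [bilinear_warp(k * step, alpha) for k in range(num_bins)]


def pick_warp_factor(candidates, score):
    """Grid search: return the candidate warp factor with the highest score.

    `score` is a hypothetical callback (for example, an acoustic-model
    likelihood computed on a few utterances from the new speaker).
    """
    return max(candidates, key=score)
```

A positive `alpha` stretches the low-frequency region and a negative one compresses it, while the endpoints 0 and pi stay fixed. In a per-speaker setup, `pick_warp_factor` would be called once on a handful of utterances and the chosen factor reused for the rest of that speaker's speech.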

Pages: 20
Report Date: MAY 97
Contract Number: N00014-93-1-0806
Report Number: A415333

Report Unavailable

This title is unavailable from Storming Media. We do not know when it might be available, if at all. We list the report on our site for bibliographic completeness, to help our users know what other work has been performed in this field. Please note that as with all titles on this site, we do not have contact information for any of the authors. Nor can we give any suggestions on how one might obtain this report.
Keywords relating to this report:
*SPEECH RECOGNITION
DATA BASES
ENGLISH LANGUAGE
FREQUENCY DOMAIN
SIGNAL PROCESSING
SPANISH LANGUAGE
VOCABULARY
WORD RECOGNITION
WORDS(LANGUAGE)