Since the 1980s, the underlying technology in speech recognition has been the Hidden Markov Model (HMM), a statistical approach to accurately modeling continuous human speech. A speech model is represented as a combination of probabilities from both acoustic and language models. Acoustic models estimate the probability of a postulated sequence of acoustic observations. Language models describe the probability of a postulated sequence of words and can incorporate both syntactic and semantic constraints of the language. When developing speech recognition for training systems, both acoustic and language models are crafted for the application. Because building a tuned, accurate speech recognition application is complex, it is necessary to understand how acoustic and language models affect accuracy. The Speech Technology Group (STG) at NAVAIR Orlando develops acoustic and language models specifically for the Navy Air Traffic Control (ATC) trainers, in contrast to commercial-off-the-shelf speech tools that contain generic acoustic models with limited alterability. The present study evaluates several speech model configurations, including word-pair (bi-gram) models. Under laboratory conditions, the STG measured the effects on accuracy of the following variables: vocabulary, perplexity, acoustic models, and language models. The findings of this study describe the influence of acoustic and language modeling on speech recognition accuracy. These lessons learned provide a better understanding of how speech model parameters influence model accuracy and can be used to more efficiently incorporate speech recognition within training applications, thereby enhancing the learning performance of the war-fighter.
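For reference, the way acoustic and language model probabilities combine in HMM-based recognition, the bi-gram (word-pair) language model, and perplexity can be summarized by the standard textbook formulation below; this is an illustrative sketch of the general framework, not a formulation quoted from the study itself.

\hat{W} = \arg\max_{W} \, P(O \mid W)\, P(W)

where $P(O \mid W)$ is the acoustic model score for observation sequence $O$ given word sequence $W$, and $P(W)$ is the language model score. Under a bi-gram language model,

P(W) \approx P(w_1) \prod_{i=2}^{N} P(w_i \mid w_{i-1}),

and the perplexity of the language model on an $N$-word test sequence, a common measure of recognition-task difficulty, is

\mathrm{PP} = P(w_1, \ldots, w_N)^{-1/N}.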
Analysis of Tradeoffs in Modeling Continuous Speech Recognition for Domain Specific Training Application