John I. Makhoul
Spyros Matsoukas, Jennifer G. Dy
Date of Award
Master of Science
Department or Academic Unit
College of Engineering. Department of Electrical and Computer Engineering.
Electrical engineering, Computer engineering, Speech recognition, ROVER, BAYCOM
Natural language processing (Computer science)
Automatic Speech Recognition systems (ASRs) recognize word sequences by employing statistical models such as Hidden Markov Models. Given the same speech to recognize, different ASRs may output very similar results that differ in their errors, such as insertions, substitutions, or deletions of words. Since different ASRs may be based on different algorithms, it is likely that error segments across ASRs are uncorrelated. It may therefore be possible to improve speech recognition accuracy by testing multiple hypotheses drawn from a combination of ASRs. System Combination is a technique that combines the outputs of two or more ASRs to estimate the most likely hypothesis among conflicting word pairs or differing hypotheses for the same part of an utterance. In this thesis, a conventional voting scheme called Recognizer Output Voting Error Reduction (ROVER) is studied. A weighted voting scheme known as Bayesian Combination (BAYCOM), derived from first principles of Bayesian theory, is implemented. ROVER and BAYCOM use probabilities at the system level, such as the performance of the ASR, to identify the most likely hypothesis. These algorithms arrive at the most likely word sequences by considering only a few parameters at the system level. The motivation is to develop newer System Combination algorithms that model the most likely word sequence hypothesis based on parameters related not only to the corresponding ASR but also to the word sequences themselves. In the thesis, probabilities with respect to hypotheses are termed word level probabilities, and probabilities with respect to ASRs are termed system level probabilities. Confusion Matrix Combination is a decision model based on parameters at the word level. A confusion matrix of probabilities with respect to word sequences is estimated during training.
The system combination algorithms are first trained on known speech transcripts and then validated on a different set of transcripts. The word sequences are obtained by processing speech from Arabic news broadcasts. It is found that Confusion Matrix Combination performs better than system level BAYCOM and ROVER over the training sets. ROVER still proves to be a simple and powerful system combination technique and provides the best improvements over the validation set.
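As a rough illustration of the voting idea described in the abstract, the following minimal Python sketch picks one word per position from several ASR outputs. It is not the thesis implementation: it assumes the hypotheses have already been aligned into slots of equal length (the alignment step itself is omitted here), and the names `vote`, `NULL`, and `weights` are hypothetical. The optional per-system weights only stand in for some system level measure of reliability; they are not BAYCOM's Bayesian scores.

```python
from collections import defaultdict

# Hypothetical placeholder marking "no word" in an aligned slot.
NULL = "@"


def vote(aligned, weights=None):
    """Pick the highest-scoring word in each slot of pre-aligned hypotheses.

    aligned: list of word lists, one per ASR, all of the same length.
    weights: optional per-system weights (assumed; defaults to 1.0 each).
    """
    if weights is None:
        weights = [1.0] * len(aligned)
    combined = []
    for slot in zip(*aligned):
        scores = defaultdict(float)
        for word, weight in zip(slot, weights):
            scores[word] += weight
        # Ties go to the word contributed by the earliest-listed system.
        best = max(scores, key=scores.get)
        if best != NULL:
            combined.append(best)
    return combined


# Three made-up hypotheses for the same utterance, already aligned.
hyps = [
    ["the", "cat", "sat", NULL],
    ["the", "cat", "sat", "down"],
    ["a", "cat", "sat", NULL],
]
result = vote(hyps)
weighted = vote(hyps, weights=[1.0, 3.0, 1.0])
```

With equal weights each slot is decided by simple majority; giving the second system a larger weight lets its word win slots where it disagrees with the other two.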
Harish Kashyap Krishnamurthy
Krishnamurthy, Harish Kashyap, "Study of algorithms to combine multiple automatic speech recognition (ASR) system outputs" (2009). Electrical and Computer Engineering Master's Theses. Paper 24. http://hdl.handle.net/2047/d10019273