Advisor(s)

John I. Makhoul

Contributor(s)

Spyros Matsoukas, Jennifer G. Dy

Date of Award

2009

Date Accepted

4-2009

Degree Grantor

Northeastern University

Degree Level

M.S.

Degree Name

Master of Science

Department or Academic Unit

College of Engineering. Department of Electrical and Computer Engineering.

Keywords

Electrical engineering, Computer engineering, Speech recognition, ROVER, BAYCOM

Subject Categories

Natural language processing (Computer science)

Disciplines

Computer Engineering

Abstract

Automatic Speech Recognition systems (ASRs) recognize word sequences by employing algorithms such as Hidden Markov Models. Given the same speech to recognize, the different ASRs may output very similar results but with errors such as insertion, substitution or deletion of incorrect words. Since different ASRs may be based on different algorithms, it is likely that error segments across ASRs are uncorrelated. Therefore it may be possible to improve the speech recognition accuracy by exploiting multiple hypotheses testing using a combination of ASRs. System Combination is a technique that combines the outputs of two or more ASRs to estimate the most likely hypothesis among conflicting word pairs or differing hypotheses for the same part of utterance. In this thesis, a conventional voting scheme called Recognized Output Voting Error Reduction (ROVER) is studied. A weighted voting scheme based on Bayesian theory known as Bayesian Combination (BAYCOM) is implemented. BAYCOM is derived from first principles of Bayesian theory. ROVER and BAYCOM use probabilities at the system level, such as performance of the ASR, to identify the most likely hypothesis. These algorithms arrive at the most likely word sequences by considering only a few parameters at the system level. The motivation is to develop newer System Combination algorithms that model the most likely word sequence hypothesis based on parameters that are not only related to the corresponding ASR but the word sequences themselves. Parameters, such as probabilities with respect to hypothesis and ASRs are termed word level probabilities and system level probabilities, respectively, in the thesis. Confusion Matrix Combination is a decision model based on parameters at word level. Confusion matrix consisting of probabilities with respect to word sequences are estimated during training. The system combination algorithms are initially trained with known speech transcripts followed by validation on a different set of transcripts. The word sequences are obtained by processing speech from Arabic news broadcasts. It is found that Confusion Matrix Combination performs better than system level BAYCOM and ROVER over the training sets. ROVER still proves to be a simple and powerful system combination technique and provides best improvements over the validation set.

Document Type

Master's Thesis

Rights Information

Copyright 2009

Rights Holder

Harish Kashyap Krishnamurthy



Click button above to open, or right-click to save.

Share

COinS