Jennifer G. Dy
Glenn Fung, Dana H. Brooks, Deniz Erdogmus
Date of Award
2009
Degree Name
Doctor of Philosophy
Department or Academic Unit
College of Engineering. Department of Electrical and Computer Engineering.
computer engineering, batch learning, kernel methods, machine learning, multi-class classification, semi-supervised learning, Support Vector Machines (SVM)
Support vector machines, Machine learning
The support vector machine (SVM) is a powerful supervised classification algorithm that has been successful in many real-world problems, such as text categorization, face recognition, and applications in bioinformatics and computer-aided diagnosis. Although the SVM is popular and accurate, it also has limitations. In this thesis, we focus on three major limitations of the SVM and introduce algorithms that exploit the relationships among samples to overcome them. First, a limitation of the SVM, and of supervised learning algorithms in general, is that they learn only from labeled data. In many domains, however, labeled instances are costly to obtain. This is particularly true for the medical domains that motivate our research, where labels are assigned via time-consuming manual review by physicians. We introduce a number of methods that take advantage of the relationships between labeled and unlabeled data (also known as semi-supervised learning) and incorporate the information hidden in the unlabeled data into the SVM. Second, most classification systems assume that the data used to train and test the classifier are drawn from an independent and identically distributed (i.i.d.) underlying distribution. Nevertheless, this assumption is commonly violated in many real-life problems where sub-groups of samples have a high degree of correlation among both their features and their labels. Here, we introduce approaches that relax the i.i.d. assumption in support vector machines. Finally, another limitation of the standard SVM is that it is designed for binary classification, yet many real-world applications involve more than two categories. In this thesis, we design algorithms that extend the SVM to multi-class problems with two goals: 1) efficiency in terms of training and testing times, and 2) increased accuracy by exploiting the information hidden in inter-class relationships.
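To make the third limitation concrete: a standard large-margin classifier separates only two classes, and the common baseline for multi-class problems is the one-vs-rest reduction, which trains one binary classifier per class and predicts the class whose classifier scores highest. The sketch below illustrates that baseline only; it is not one of the algorithms proposed in this thesis. A simple perceptron stands in for a binary SVM so the example stays dependency-free, and the data and function names are hypothetical.

```python
# One-vs-rest reduction of a binary linear classifier to K classes.
# A perceptron substitutes for a binary SVM purely for illustration.

def train_binary(X, y, epochs=50, lr=0.1):
    """Train a linear classifier on labels in {-1, +1} (perceptron updates)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if t * score <= 0:  # misclassified: nudge the hyperplane
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
    return w, b

def train_one_vs_rest(X, labels):
    """One binary problem per class: class k is +1, all other classes are -1."""
    models = {}
    for k in sorted(set(labels)):
        y = [1 if label == k else -1 for label in labels]
        models[k] = train_binary(X, y)
    return models

def predict(models, x):
    """Predict the class whose binary classifier gives the highest score."""
    def score(wb):
        w, b = wb
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(models, key=lambda k: score(models[k]))

# Three linearly separable clusters in 2-D (toy data).
X = [(0.0, 1.0), (0.1, 1.1), (1.0, 0.0), (1.1, 0.1), (-1.0, -1.0), (-1.1, -0.9)]
labels = ["a", "a", "b", "b", "c", "c"]
models = train_one_vs_rest(X, labels)
print(predict(models, (0.05, 1.05)))  # → a
```

Note that the K binary problems are solved independently, which is exactly the inefficiency and loss of inter-class information that the multi-class algorithms in this thesis aim to address.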
Vural, Volkan, "Improving large margin classifiers using relationships among samples" (2009). Computer Engineering Dissertations. Paper 5. http://hdl.handle.net/2047/d20000453