Abstract

We derive general bounds on the complexity of learning in the Statistical Query (SQ) model and in the PAC model with classification noise. We do so by considering the problem of boosting the accuracy of weak learning algorithms which fall within the Statistical Query model. This model was introduced by Kearns to provide a general framework for efficient PAC learning in the presence of classification noise. We first show a general scheme for boosting the accuracy of weak SQ learning algorithms, proving that weak SQ learning is equivalent to strong SQ learning. The boosting is efficient and is used to show our main result: the first general upper bounds on the complexity of strong SQ learning. Specifically, we derive simultaneous upper bounds with respect to ε on the number of queries, O(log² 1/ε), the Vapnik-Chervonenkis dimension of the query space, O(log 1/ε log log 1/ε), and the inverse of the minimum tolerance, O(1/ε log 1/ε). In addition, we show that these general upper bounds are nearly optimal by describing a class of learning problems for which we simultaneously lower bound the number of queries by Ω(log 1/ε) and the inverse of the minimum tolerance by Ω(1/ε). We further apply our boosting results in the SQ model to learning in the PAC model with classification noise. Since nearly all PAC learning algorithms can be cast in the SQ model, we can apply our boosting techniques to convert these PAC algorithms into highly efficient SQ algorithms. By simulating these efficient SQ algorithms in the PAC model with classification noise, we show that nearly all PAC algorithms can be converted into highly efficient PAC algorithms which tolerate classification noise. We give an upper bound on the sample complexity of these noise-tolerant PAC algorithms which is nearly optimal with respect to the noise rate, as well as upper bounds on space complexity and hypothesis size; the latter two measures are in fact independent of the noise rate. The running times of these noise-tolerant PAC algorithms are also efficient. This sequence of simulations further demonstrates that it is possible to boost the accuracy of nearly all PAC algorithms even in the presence of noise, providing a partial answer to an open problem of Schapire and the first theoretical evidence for an empirical result of Drucker, Schapire, and Simard.
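
To make the simulation step concrete, the following is a minimal Python sketch (with illustrative names and a toy target; it is not the paper's exact procedure) of how a single statistical query can be answered from examples whose labels are corrupted by classification noise of known rate η < 1/2. It uses the standard correction in the spirit of Kearns' simulation: since E_noisy[χ(x,b)] = (1-η)p + ηq, where p is the query's true value, q its value on flipped labels, and p + q depends only on the unlabeled distribution, p can be recovered by estimating two quantities and solving.

    import random

    # Sketch of answering one statistical query E[chi(x, label)] from
    # noisy examples. Classification noise flips each label with known
    # probability eta < 1/2, so for any query chi:
    #   E_noisy[chi(x, b)] = (1 - eta) * p + eta * q,
    # with p = E[chi(x, l)], q = E[chi(x, 1 - l)], and
    #   p + q = E_x[chi(x, 0) + chi(x, 1)]   (label-independent).
    # Hence p = (E_noisy - eta * (p + q)) / (1 - 2 * eta).

    def estimate_sq(chi, noisy_sample, eta):
        """Estimate E[chi(x, true_label)] from (x, noisy_label) pairs."""
        m = len(noisy_sample)
        e_noisy = sum(chi(x, b) for x, b in noisy_sample) / m
        # Label-independent term, estimated from the instances alone.
        s = sum(chi(x, 0) + chi(x, 1) for x, _ in noisy_sample) / m
        return (e_noisy - eta * s) / (1 - 2 * eta)

    if __name__ == "__main__":
        # Toy target: label = 1 iff x > 0.5. The query measures agreement
        # with a candidate hypothesis "x > 0.4" (names are illustrative).
        random.seed(0)
        eta = 0.2
        sample = []
        for _ in range(200000):
            x = random.random()
            label = int(x > 0.5)
            noisy = label ^ int(random.random() < eta)
            sample.append((x, noisy))

        def chi(x, b):
            return float((x > 0.4) == bool(b))

        # True agreement rate is 0.9; the estimate should be close.
        print(estimate_sq(chi, sample, eta))

The sketch ignores the sample-size accounting (how large m must be for the estimate to land within the query's tolerance, as a function of η) on which the paper's nearly optimal noise-rate bounds depend.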

Notes

Originally published in Proceedings of the 34th Annual Symposium on Foundations of Computer Science, 1993, pp. 282-291.

Keywords

general bounds, statistical query learning, PAC learning, noise, hypothesis boosting, complexity

Subject Categories

Machine learning

Disciplines

Computer Sciences

Publication Date

1993

Rights Information

Copyright 1993

Restrictions

Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.

Rights Holder

IEEE
