Comprehensive Proof of Perceptron Convergence Theorem

05-15-201705-15-2017 blog 3 minutes read (About 438 words) visits

Introduction

I was reading the perceptron convergence theorem, which is a proof for the convergence of perceptron learning algorithm, in the book “Machine Learning - An Algorithmic Perspective” 2nd Ed. I found the authors made some errors in the mathematical derivation by introducing some unstated assumptions. Obviously, the author was looking at the materials from multiple different sources but did not generalize it very well to match his proceeding writings in the book. I then tried to look up the right derivation on the internet, and I found that most of the materials include too many assumptions that did not generalize the theorem very well. I finally found one from the MIT open course and I think this is the best one I was looking for.

Mathematical Proof and Caveats

In case you forget the perceptron learning algorithm, you may find it here.

The perceptron convergence theorem basically states that the perceptron learning algorithm converges in finite number of steps, given a linearly separable dataset. More precisely, if for each data point x, \( \|\mathbf{x}\| < R \) where \( R \) is certain constant number, \( \gamma = (\theta^{\ast})^{T} \mathbf{x_c} \) where \( \mathbf{x_c} \) is the data point that is the closest to the linear separate hyperplane. It should be noted that mathematically \(\frac{\gamma}{\|\theta^{\ast}\|^2}\) is the distance \(d\) of the closest datapoint to the linear separate hyperplane (it could be negative). The number of steps is bounded by \(\frac{R^{2}\|\theta^{\ast}\|^{2}}{\gamma^2}\) or \(\frac{R^{2}}{d^2}\).

In some materials, for simplicity, someone added assumption without generality that the weight of separate hyperplane is a unit vector (\( \|\theta^{\ast}\|^{2} = 1 \)) and one could claim that the physical meaning of \( \gamma \) is the distance of the closest data point to the linear separate hyperplane. However, sometimes people ignored this assumption and claim \( \gamma \) is the distance of the closest data point to the linear separate hyperplane. That was wrong.

The comprehensive correct proof could be found here.

PS: “MathJax is against humanity.” and “Happy Mother’s Day.”

Updates

6/16/2017

I happened to find an interesting non-mathematical informal proof in Hinton’s “Neural Networks for Machine Learning” course. Unlike the previous mathematical proofs I provided, this proof discussed the problems in the weight space. The whole informal proof is very simple and easy to understand. The informal proof could be downloaded here.

Comprehensive Proof of Perceptron Convergence Theorem

https://leimao.github.io/blog/Perceptron-Convergence-Theorem/

Author

Lei Mao

Posted on

05-15-2017

Updated on

05-15-2017

Licensed under

Machine Learning

Comprehensive Proof of Perceptron Convergence Theorem

Introduction

Mathematical Proof and Caveats

Updates

Author

Posted on

Updated on

Licensed under

Like this article? Support the author with

Comments

Advertisement

Catalogue