We have already discussed the Kernel Width Generator for Support Vector Clustering:
S. Lee and K. M. Daniels, "Gaussian Kernel Width Generator for Support Vector Clustering," in Advances in Bioinformatics and Its Applications, 2005, pp. 151-162.
@inproceedings{gauskergenerator2004,
author = {Sei-Hyung Lee and Karen M. Daniels},
Booktitle = {Advances in Bioinformatics and Its Applications},
Date-Added = {2007-10-23 17:21:17 +0200},
Date-Modified = {2007-10-23 17:21:17 +0200},
Editor = {Matthew He and Giri Narasimhan and Sergei Petoukhov},
Keywords = {SVM, clustering, gaussian kernel},
Pages = {151–162},
Title = {Gaussian Kernel Width Generator for Support Vector Clustering},
Url = {http://www.cs.uml.edu/~kdaniels/papers/ICBA.pdf},
Volume = {8},
Year = {2005},
Bdsk-Url-1 = {http://www.cs.uml.edu/~kdaniels/papers/ICBA.pdf}
}
More precisely, it is called the Gaussian kernel width generator, because it is intended for exploring the width of the Gaussian kernel. That choice, however, was influenced by the exclusive use of the Gaussian kernel in the Support Vector Clustering literature.
In practice, Support Vector Clustering also works well (in some cases better) with other kernel functions, such as the Laplacian kernel or the Exponential kernel, even though the Gaussian kernel has the best average performance.
So we are interested in a more general kernel width exploration method.
Since GaussianKernel(x,x) = 1 for all x, the method relies on the fact that the entire data space is embedded onto the surface of the unit ball in feature space (the squared norm of the image of x in feature space is exactly K(x,x)). However, the equality K(x,x) = 1 holds for all normalized kernels, so the validity of the proposed kernel width exploration method can be extended to all normalized kernels.
Kernels such as the Gaussian, Laplacian and Exponential kernels are all “self-normalized”, because they all rely on the exp function with a negatively scaled distance as the only exponent. That distance is always zero when we compute K(x,x), resulting in K(x,x) = exp(0) = 1.
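The self-normalization property can be checked numerically with a minimal Python sketch. The kernel definitions below follow one common convention (squared Euclidean distance for the Gaussian kernel, Euclidean distance for the Exponential kernel, L1 distance for the Laplacian kernel); other references define them slightly differently. The function names and the width parameter q are assumptions made for this illustration only.

```python
import math


def squared_euclidean(x, y):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def gaussian_kernel(x, y, q=1.0):
    # Assumed form: exp(-q * ||x - y||^2), with q the kernel width.
    return math.exp(-q * squared_euclidean(x, y))


def exponential_kernel(x, y, q=1.0):
    # Assumed form: exp(-q * ||x - y||).
    return math.exp(-q * math.sqrt(squared_euclidean(x, y)))


def laplacian_kernel(x, y, q=1.0):
    # Assumed form: exp(-q * ||x - y||_1), i.e. the L1 distance.
    return math.exp(-q * sum(abs(a - b) for a, b in zip(x, y)))


x = (0.5, -2.0, 3.0)
y = (1.0, 0.0, 2.5)

for kernel in (gaussian_kernel, exponential_kernel, laplacian_kernel):
    for q in (0.1, 1.0, 10.0):
        # The distance from x to itself is zero, so K(x, x) = exp(0) = 1
        # regardless of the kernel width q.
        assert kernel(x, x, q) == 1.0
        # For distinct points the kernel value lies strictly between 0 and 1.
        assert 0.0 < kernel(x, y, q) < 1.0
```

Whatever the value of q, the diagonal K(x,x) stays at 1, so changing the width never moves the data off the unit ball in feature space.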
Although other kernels (such as polynomial, sigmoid, perceptron, etc.) do not work well with Support Vector Clustering, for the sake of completeness we recall that they can be normalized as follows:
NK(x,y) = K(x,y) / sqrt(K(x,x)K(y,y))
where NK is the normalized version of K.
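As an illustration of the normalization formula, here is a minimal Python sketch that wraps an arbitrary kernel function and applies it to an inhomogeneous polynomial kernel. The names normalize and polynomial_kernel, and the parameters degree and c, are hypothetical and chosen for this example only; the sketch also assumes K(x,x) > 0 for every input, which holds for this polynomial kernel when c > 0.

```python
import math


def polynomial_kernel(x, y, degree=2, c=1.0):
    # Inhomogeneous polynomial kernel: (<x, y> + c)^degree.
    return (sum(a * b for a, b in zip(x, y)) + c) ** degree


def normalize(kernel):
    """Return NK(x, y) = K(x, y) / sqrt(K(x, x) * K(y, y)).

    Assumes K(x, x) > 0 for all x, otherwise the square root
    or the division is undefined.
    """
    def normalized_kernel(x, y):
        return kernel(x, y) / math.sqrt(kernel(x, x) * kernel(y, y))
    return normalized_kernel


normalized_poly = normalize(polynomial_kernel)

x = (1.0, 2.0)
y = (3.0, -1.0)

# The raw polynomial kernel is not normalized: K(x, x) = (1 + 4 + 1)^2 = 36.
assert polynomial_kernel(x, x) == 36.0

# After normalization the diagonal is 1, as required by the method.
assert math.isclose(normalized_poly(x, x), 1.0)
assert math.isclose(normalized_poly(y, y), 1.0)

# Off-diagonal values are rescaled: 4 / sqrt(36 * 121) = 4 / 66.
assert math.isclose(normalized_poly(x, y), 4.0 / 66.0)
```

The wrapper leaves already normalized kernels unchanged, since dividing by sqrt(1 * 1) has no effect.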