
SVDD and kernel functions

Support Vector Domain Description (SVDD) is the basis of Support Vector Clustering. The nonlinear version of SVDD uses the Gaussian kernel. Apparently no other kernel has been investigated except the polynomial one, which is an example of a kernel type that works poorly with SVDD.
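
For reference, with a kernel K the SVDD description is a sphere in feature space whose centre is a weighted combination of the training points, a = sum_i alpha_i phi(x_i), so the squared distance of a point z from the centre needs only kernel evaluations. Below is a minimal Python sketch of that distance. It assumes the weights alpha are already known: here they are simply set uniform for illustration, not obtained by solving the SVDD optimisation problem. The function names, the toy data and the parameter sigma are illustrative.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel: exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def sq_distance_to_center(z, points, alpha, kernel):
    """Squared feature-space distance from z to a = sum_i alpha_i * phi(x_i).

    ||phi(z) - a||^2 = K(z, z) - 2 * sum_i alpha_i K(z, x_i)
                       + sum_i sum_j alpha_i alpha_j K(x_i, x_j)
    """
    cross = sum(a_i * kernel(z, x_i) for a_i, x_i in zip(alpha, points))
    const = sum(
        a_i * a_j * kernel(x_i, x_j)
        for a_i, x_i in zip(alpha, points)
        for a_j, x_j in zip(alpha, points)
    )
    return kernel(z, z) - 2.0 * cross + const

# Toy data; uniform weights are an assumption (a real SVDD solves for alpha).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
alpha = [1.0 / len(points)] * len(points)

near = sq_distance_to_center((0.5, 0.5), points, alpha, gaussian_kernel)
far = sq_distance_to_center((5.0, 5.0), points, alpha, gaussian_kernel)
print(near, far)
```

A point in the middle of the toy data ends up much closer to the centre than a point far away from it. In SVDD a point is accepted when this squared distance does not exceed the squared radius of the sphere.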

I wrote an e-mail to the author of SVDD, Dr. David M. J. Tax (Delft University of Technology, Netherlands). Here is the core of the message:

Even though the Gaussian kernel is the one with the best average performance, some experiments conducted on a specific application domain have given better results with a Laplacian kernel or an exponential kernel.

What does this mean theoretically from an SVDD perspective? Have you ever tried kernels other than the Gaussian and polynomial ones?

Dr. Tax replied:

To be honest, the number of kernels I used is relatively limited. For some cases I used a correlation between image patches, and it seemed to work well. Also, some people have used a modified Hausdorff distance to compare shapes. I don’t have a lot of experience with it.

The big problem is that we can only say something about generalization for a given representation. By changing the kernel, the representation changes. And what happens then is completely dependent on the data, so it is extremely hard to say something general about what kernel to use (just as with what features to use). For some applications there may be features ‘proven by experience’ (like the RBF kernel for the simple UCI datasets), but theoretically you cannot really prove it, I think.

So the conclusion is that a deeper investigation of SVDD with other kernels is needed. I already have some experimental results in this direction (although from a clustering perspective). In the future we could look into the question in more depth by analyzing the shape of the data description and the behavior of various (exponential-based) kernels at different kernel width values.
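
To make the comparison concrete, the sketch below evaluates three exponential-based kernels on the same pair of points at several widths. The parameterisations are assumptions: definitions of the Laplacian and exponential kernels vary across the literature, and the forms below are only one common convention. The function names and the sample points are illustrative.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def gaussian_kernel(x, y, sigma):
    # Assumed form: exp(-||x - y||^2 / (2 * sigma^2))
    return math.exp(-euclidean(x, y) ** 2 / (2.0 * sigma ** 2))

def exponential_kernel(x, y, sigma):
    # Assumed form: exp(-||x - y|| / (2 * sigma^2))
    return math.exp(-euclidean(x, y) / (2.0 * sigma ** 2))

def laplacian_kernel(x, y, sigma):
    # Assumed form: exp(-||x - y|| / sigma)
    return math.exp(-euclidean(x, y) / sigma)

x, y = (0.0, 0.0), (3.0, 4.0)  # Euclidean distance is 5
widths = [0.5, 1.0, 2.0, 5.0]
kernels = [
    ("gaussian", gaussian_kernel),
    ("exponential", exponential_kernel),
    ("laplacian", laplacian_kernel),
]
# One row of kernel values per kernel, one column per width.
table = {name: [k(x, y, s) for s in widths] for name, k in kernels}
for name, values in table.items():
    print(name, ["%.3g" % v for v in values])
```

The Gaussian kernel depends on the squared distance while the other two depend on the distance itself, so their values decay at different rates as points move apart. This is one way the resulting data descriptions can differ for the same width.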

Notes: differences between MLP and SVM

The main limitations of the Multi-Layer Perceptron (MLP) are:

  • the need to fix the network structure a priori, in terms of hidden layers and the number of neurons in each of them
  • the excessive looseness of the upper bounds obtained for the VC dimension of the models used in practice
  • training difficulties in the case of datasets that are not linearly separable:
    • because of the high dimensionality of the weight space
    • because the most widespread techniques, such as back-propagation, obtain the network weights by solving a non-convex, unconstrained optimization problem which, consequently, has an indeterminate number of local minima.

SVMs overcome these problems.
First of all, there is no need to explicitly construct the nonlinear function that maps the inputs into the feature space. Through the kernel trick one operates implicitly in the feature space (the equivalent of the space of the hidden layers). This removes the obligation to fix the structure of a neural network a priori. At the same time, it makes SVMs scalable with respect to high-dimensional data.
Moreover, SVMs guarantee a unique, global solution when a positive definite kernel is chosen.
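
As a small illustration of the kernel trick, the sketch below checks that a homogeneous degree-2 polynomial kernel on 2-D inputs gives the same value as an explicit inner product in a 3-D feature space. The feature map phi, the choice of kernel and the sample points are illustrative assumptions, not taken from the notes above.

```python
import math

def dot(u, v):
    """Ordinary inner product of two equal-length tuples."""
    return sum(a * b for a, b in zip(u, v))

def poly2_kernel(x, y):
    """Homogeneous polynomial kernel of degree 2: (x . y)^2."""
    return dot(x, y) ** 2

def phi(x):
    """Explicit feature map matching poly2_kernel for 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

x, y = (1.0, 2.0), (3.0, -1.0)
implicit = poly2_kernel(x, y)    # computed in the 2-D input space
explicit = dot(phi(x), phi(y))   # computed in the 3-D feature space
print(implicit, explicit)
```

The kernel never forms phi explicitly, which is what keeps the computation practical when the feature space is very high dimensional, or infinite dimensional as with the Gaussian kernel.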