October 14 2007
[OT] Star/galaxy separation via SVM/CVM classification
We have used some astrophysical star/galaxy datasets for our clustering problems, because their clusters overlap heavily.
Here we present some results of an SVM classification performed on the same datasets. Star/galaxy (S/G) separation is, in fact, usually tackled as a supervised problem.
We have used a simple SVM/CVM classifier with a linear kernel (K(x,y) = x' * y).
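As a small illustration of the kernel formula above, here is a minimal sketch in plain Python. The function names `linear_kernel` and `gram_matrix` are made up for this example and are not taken from the SVM/CVM software used for the experiments.

```python
def linear_kernel(x, y):
    """Linear kernel K(x, y) = x' * y, i.e. the dot product of two feature vectors."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    return sum(a * b for a, b in zip(x, y))


def gram_matrix(samples):
    """Pairwise kernel values between all samples (hypothetical helper)."""
    return [[linear_kernel(a, b) for b in samples] for a in samples]
```

With this kernel the decision boundary is a hyperplane in the original feature space, so no kernel width has to be chosen.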
For each dataset, we have used 5% of the items as the training set and the remaining 95% as the test set.
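The split can be sketched as follows. This is only an assumption about the procedure: the post does not say how the 5% was selected, so a seeded random shuffle is used here, and `split_train_test` and `accuracy` are hypothetical helper names.

```python
import random


def split_train_test(items, train_fraction=0.05, seed=0):
    """Shuffle a copy of the items, keep train_fraction for training, the rest for testing."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is repeatable
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = max(1, round(len(shuffled) * train_fraction))
    return shuffled[:n_train], shuffled[n_train:]


def accuracy(predicted, actual):
    """Fraction of test items whose predicted label matches the true label."""
    if len(predicted) != len(actual):
        raise ValueError("label lists must have the same length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)
```

For a dataset of 2500 items, such as Longo 01, this gives 125 training items and 2375 test items.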
Datasets:
Longo 01, 2500 items, 2000 stars, 500 galaxies
Longo 02, 9816 items, 2935 stars, 6883 galaxies
Longo 03, 10940 items, 2978 stars, 7964 galaxies
Accuracy results with SVM:
Longo 01: 95%
Longo 02: 98.0746%
Longo 03: 97.925%
Accuracy results with CVM:
Longo 01: 94.98%
Longo 02: 97.5%
Longo 03: 95.2%
Other kernels could probably lead to better results, but first we need to understand how to tune their hyperparameters, such as the kernel width and the soft margin constant.
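To make the two hyperparameters concrete, here is a minimal sketch that assumes a Gaussian (RBF) kernel, which is one common kernel with a width parameter. The post does not name a specific alternative kernel, and the candidate values below are hypothetical, not values that were actually tried.

```python
import math
from itertools import product


def gaussian_kernel(x, y, width):
    """Gaussian (RBF) kernel exp(-||x - y||^2 / (2 * width^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * width ** 2))


# Hypothetical candidate values for a grid search; each (width, C) pair
# would be scored on held-out data and the best pair kept.
widths = [0.1, 1.0, 10.0]
soft_margins = [0.1, 1.0, 10.0, 100.0]
candidates = list(product(widths, soft_margins))
```

A small width makes the kernel value drop quickly with distance, so the classifier fits very local structure, while a large width makes the kernel smoother. The soft margin constant trades training errors against margin size.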
