Archive for November, 2007

Multivariate Data Analysis Software and Resources

A collection of software for multivariate data analysis is available here.

Thesis writing - Sixth draft 14/11

Sixth draft of the thesis. The contents are as follows (new contents in bold):

  • Chapter 1: Introduction
  • Chapter 2: Machine learning essentials
  • Chapter 3: Clustering and related issues
  • Chapter 4: Previous works on clustering
  • Chapter 5: Minimum Bregman Information principle for Co-clustering
  • Chapter 6: Support Vector Clustering
  • Chapter 7: Alternative Support Vector Methods for Clustering
  • Chapter 9: Support Vector Clustering software development
  • Chapter 10: Conclusion and Future Work
  • Appendix A: One Class classification via Support Vector Machines
  • Appendix B: Thesis Web Log
  • Bibliography

Downloads

Changelog download - Thesis download

Thesis writing - Fifth draft 06/11

Fifth draft of the thesis. The contents are:

  • Chapter 1: Introduction
  • Chapter 2: Machine learning essentials
  • Chapter 3: Clustering and related issues
  • Chapter 4: Previous works on clustering
  • Chapter 6: Support Vector Clustering
  • Chapter 7: Alternative Support Vector Methods for Clustering
  • Chapter 9: Support Vector Clustering software development
  • Chapter 10: Conclusion and Future Work
  • Appendix A: One Class classification via Support Vector Machines
  • Appendix B: Thesis Web Log
  • Bibliography

Downloads

Changelog download - Thesis download

Bregman divergences, SVMs and possible implications

To find a connection between the works studied (Bregman Co-clustering and Support Vector Clustering), we did some research. An interesting result is the following paper:

  • R. Nock and F. Nielsen, "Fitting the smallest enclosing Bregman balls," in 16th European Conference on Machine Learning, 2005, pp. 649-656.
    @conference{bregmanmeb05,
      author = {Richard Nock and Frank Nielsen},
      Booktitle = {16th European Conference on Machine Learning},
      Date-Added = {2007-06-23 11:00:19 +0200},
      Date-Modified = {2007-11-14 12:55:32 +0100},
      Keywords = {bregman, MEB},
      Number = {3720},
      Pages = {649--656},
      Publisher = {Springer-Verlag},
      Series = {Lecture Notes in Computer Science},
      Title = {{Fitting the smallest enclosing Bregman balls}},
      Url = {http://www.sonycsl.co.jp/person/nielsen/BregmanBall/nn-ecml-05.pdf},
      Year = {2005},
      Bdsk-Url-1 = {http://www.sonycsl.co.jp/person/nielsen/BregmanBall/nn-ecml-05.pdf}
    }

The above paper generalizes the Minimum Enclosing Ball (MEB) problem to Bregman divergences and also provides a generalization of the Bădoiu-Clarkson (BC) approximation algorithm. This is the same algorithm exploited in practice by Core Vector Machines:

  • I. W. Tsang, J. T. Kwok, and P. Cheung, "Core vector machines: Fast SVM training on very large data sets," Journal of Machine Learning Research, vol. 6, pp. 363-392, 2005.
    @article{cvm05,
      author = {Ivor W. Tsang and James T. Kwok and Pak-Ming Cheung},
      Date-Added = {2007-05-26 12:49:30 +0200},
      Date-Modified = {2007-06-23 08:23:02 +0200},
      Journal = {Journal of Machine Learning Research},
      Keywords = {SVM, CVM, MEB, SVDD},
      Pages = {363--392},
      Title = {Core vector machines: Fast SVM training on very large data sets},
      Url = {http://www.cs.ust.hk/%7Eivor/publication/tsang05a.pdf},
      Volume = {6},
      Year = {2005},
      Bdsk-Url-1 = {http://www.cs.ust.hk/~ivor/publication/tsang05a.pdf}
    }

CVMs reformulate SVM training as an MEB problem and solve it with the BC algorithm. Since that algorithm has been generalized to Bregman divergences, research on vector machines built on Bregman divergences could have interesting implications.
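To make the MEB idea concrete, here is a minimal sketch of the simple Bădoiu-Clarkson update rule in the plain Euclidean case, which is the special case of a Bregman divergence generated by the squared norm. It is only an illustration under assumptions: the function name `bc_enclosing_ball`, the parameter `epsilon`, and the toy data are hypothetical, and this is the basic "move toward the farthest point" variant, not the core-set procedure used inside CVMs nor the Bregman version from the paper above.

```python
import math


def bc_enclosing_ball(points, epsilon=0.05):
    """Approximate the minimum enclosing ball of `points` (Euclidean case).

    Hypothetical helper: repeatedly moves the center toward the farthest
    point with a shrinking step 1 / (t + 1).
    """
    center = list(points[0])                     # start from an arbitrary point
    iterations = math.ceil(1.0 / epsilon ** 2)   # on the order of 1 / epsilon^2 steps
    for t in range(1, iterations + 1):
        # The farthest point from the current center drives the update.
        farthest = max(points, key=lambda p: math.dist(p, center))
        step = 1.0 / (t + 1)
        center = [c + step * (f - c) for c, f in zip(center, farthest)]
    # Radius of the smallest ball around the final center that covers all points.
    radius = max(math.dist(p, center) for p in points)
    return center, radius


# Toy data (an assumption for illustration): the corners of a square,
# whose exact enclosing ball has center (1, 1) and radius sqrt(2).
square = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
center, radius = bc_enclosing_ball(square, epsilon=0.05)
print(center, radius)
```

The appeal of this scheme is that each step needs only a farthest-point query, so the cost per iteration is linear in the number of points and the iteration count depends on the accuracy rather than on the dataset size.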

[OT] Star-galaxy separation via SVM/CVM classification - Part 2

This is a modification of the experiments in this post.

I quickly built a new training set, and this time I used only this set to train the SVM/CVM. Then I tested the newly trained classifier on all three datasets from the previous post.

The training set contains 500 points and was built using stars and galaxies from another portion of the sky.

New accuracy results (SVM)

Longo 01: 95.96%
Longo 02: 98.08%
Longo 03: 97.956%

New accuracy results (CVM)

Longo 01: 96.31%
Longo 02: 97.67%
Longo 03: 97.138%

Consider the Longo 02 dataset tested with the CVM. We have:

Completeness for Stars: 98.4%
Contamination for Stars: 4.7%

Completeness for Galaxies: 95.4%
Contamination for Galaxies: 1.5%
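For readers unfamiliar with these two measures, the sketch below shows one common way to compute them. It assumes the usual definitions, which the post does not spell out: completeness is the fraction of true members of a class that the classifier recovers, and contamination is the fraction of objects assigned to a class that actually belong to the other one. The function name and the toy labels are hypothetical and are not the Longo data.

```python
def completeness_and_contamination(true_labels, predicted_labels, target):
    """Return (completeness, contamination) for class `target` as fractions.

    Assumes at least one object truly belongs to `target` and at least one
    object is predicted as `target`; otherwise a division by zero occurs.
    """
    pairs = list(zip(true_labels, predicted_labels))
    true_positives = sum(1 for t, p in pairs if t == target and p == target)
    actual = sum(1 for t, _ in pairs if t == target)       # true members of the class
    predicted = sum(1 for _, p in pairs if p == target)    # objects assigned to the class
    completeness = true_positives / actual
    contamination = (predicted - true_positives) / predicted
    return completeness, contamination


# Hypothetical toy labels: 4 stars and 6 galaxies, with one error in each direction.
truth = ["star"] * 4 + ["galaxy"] * 6
guess = ["star", "star", "star", "galaxy",
         "star", "galaxy", "galaxy", "galaxy", "galaxy", "galaxy"]

star_completeness, star_contamination = completeness_and_contamination(truth, guess, "star")
galaxy_completeness, galaxy_contamination = completeness_and_contamination(truth, guess, "galaxy")
print(star_completeness, star_contamination)      # 0.75 0.25
```

Under these definitions, completeness corresponds to recall and contamination to one minus precision, so both can also be read off a standard confusion matrix.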

Thesis writing - Fourth draft 01/11

UPDATED: Chapter 10 added

Fourth draft of the thesis. The contents are:

  • Chapter 1: Introduction
  • Chapter 2: Machine learning essentials
  • Chapter 3: Clustering and related issues
  • Chapter 4: Previous works on clustering
  • Chapter 6: Support Vector Clustering
  • Chapter 7: Alternative Support Vector Methods for Clustering
  • Chapter 10: Conclusion and Future Work
  • Appendix A: One Class classification via Support Vector Machines
  • Appendix B: Thesis Web Log
  • Bibliography

Downloads

Changelog download - Thesis download