December 24 2007

Thesis writing - Eighth draft 24/12

Eighth draft of the thesis. Contents are (new contents in bold):

  • Chapter 1: Introduction
  • Chapter 2: Machine learning essentials
  • Chapter 3: Clustering and related issues
  • Chapter 4: Previous works on clustering
  • Chapter 5: Minimum Bregman Information principle for Co-clustering
  • Chapter 6: Support Vector Clustering
  • Chapter 7: Alternative Support Vector Methods for Clustering
  • Chapter 8: Support Vector Clustering software development
  • Chapter 9: Experiments (Incomplete, only two experimental stages out of 5)
  • Chapter 10: Conclusion and Future Work
  • Appendix A: One Class classification via Support Vector Machines
  • Appendix B: Resources usage of the algorithms
  • Appendix C: Thesis Web Log
  • Bibliography

Downloads

Changelog download - Thesis download

December 20 2007

Thesis writing - Seventh draft 20/12

Seventh draft of the thesis. Contents are (new contents in bold):

  • Chapter 1: Introduction
  • Chapter 2: Machine learning essentials
  • Chapter 3: Clustering and related issues
  • Chapter 4: Previous works on clustering
  • Chapter 5: Minimum Bregman Information principle for Co-clustering
  • Chapter 6: Support Vector Clustering
  • Chapter 7: Alternative Support Vector Methods for Clustering
  • Chapter 8: Support Vector Clustering software development
  • Chapter 9: Experiments (Incomplete, only two experimental stages out of 5)
  • Chapter 10: Conclusion and Future Work
  • Appendix A: One Class classification via Support Vector Machines
  • Appendix B: Time and space consume (to be completed)
  • Appendix C: Thesis Web Log
  • Bibliography

Downloads

Changelog download - Thesis download

December 03 2007

Co-clustering software

The first co-clustering software is Co-cluster, developed at the University of Texas at Austin. The package you can download here is version 1.1, which is also available from the original web page.

The package hosted here includes a patch that allows the software to compile with gcc 4.0 as well, and therefore on modern Linux and Mac OS X systems. It also contains some bash scripts (*.sh) that analyze co-clustering results and produce clustering quality measures with respect to labeled datasets.
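As an illustration of what a clustering quality measure with respect to a labeled dataset can look like, here is a minimal Python sketch of purity. This is an assumption chosen for the example: the measures actually computed by the bash scripts are not listed in this post, and the function name `purity` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, labels):
    """Fraction of points whose true label is the majority label of their cluster.

    NOTE: purity is an illustrative choice, not necessarily one of the
    measures produced by the scripts in the package.
    """
    if len(cluster_ids) != len(labels):
        raise ValueError("cluster_ids and labels must have the same length")
    # Count how often each true label occurs inside each cluster.
    by_cluster = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, labels):
        by_cluster[cluster][label] += 1
    # Sum the size of the majority label over all clusters.
    majority_total = sum(
        counts.most_common(1)[0][1] for counts in by_cluster.values()
    )
    return majority_total / len(labels)


# Hypothetical toy data: two clusters, two true classes.
clusters = [0, 0, 0, 1, 1, 1]
truth = ["a", "a", "b", "b", "b", "b"]
print(purity(clusters, truth))
```

Purity ranges from 0 to 1 and rewards clusters that contain a single class, but it also grows with the number of clusters, so it is best compared across runs that use the same number of clusters.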

The original software is released under the GPL license, and so is this package.

Download

Co-clustering code


The original version of the second co-clustering software is available here. It implements all six approximation schemes for co-clustering, for both the Euclidean distance and the I-divergence.

The package hosted here also includes the same bash scripts as the aforementioned Co-cluster package.

No license information was included in the original Bregman co-clustering package, but it appears to be a fork of the Co-cluster software v. 1.0. The latter was released under the GPL license, so the Bregman co-clustering code should be under the same license.

Download

Bregman Co-clustering code

December 03 2007

Support Vector Clustering Code

Here is the preliminary alpha source code for Support Vector Clustering. It implements Cone Cluster Labeling for the cluster assignment step:

  • S. Lee and K. M. Daniels, "Cone Cluster Labeling for Support Vector Clustering," in Proceedings of 6th SIAM Conference on Data Mining, 2006, pp. 484-488.
    @inproceedings{cone2006,
      author = {Sei-Hyung Lee and Karen M. Daniels},
      Booktitle = {Proceedings of 6th SIAM Conference on Data Mining},
      Date-Added = {2007-04-29 16:58:13 +0200},
      Date-Modified = {2007-06-19 18:52:22 +0200},
      Keywords = {SVM, clustering},
      Month = {May},
      Pages = {484–488},
      Title = {Cone Cluster Labeling for Support Vector Clustering},
      Url = {http://www.siam.org/meetings/sdm06/proceedings/046lees.pdf},
      Year = {2006},
      Bdsk-Url-1 = {http://www.siam.org/meetings/sdm06/proceedings/046lees.pdf}
    }

It also implements the Secant-like kernel width generator.

  • S. Lee and K. M. Daniels, "Gaussian Kernel Width Selection and Fast Cluster Labeling for Support Vector Clustering," Department of Computer Science, University of Massachusetts Lowell, 2005.
    @techreport{kernwidthsvc2005,
      author = {Sei-Hyung Lee and Karen M. Daniels},
      Date-Added = {2007-05-18 10:44:22 +0200},
      Date-Modified = {2007-06-20 08:28:06 +0200},
      Institution = {Department of Computer Science, University of Massachusetts Lowell},
      Keywords = {svm, clustering, kernel machines},
      Title = {Gaussian Kernel Width Selection and Fast Cluster Labeling for Support Vector Clustering},
      Url = {http://www.cs.uml.edu/~kdaniels/papers/SeiTechReport2005.pdf},
      Year = {2005},
      Bdsk-Url-1 = {http://www.cs.uml.edu/~kdaniels/papers/SeiTechReport2005.pdf}
    }

SVM training is performed by means of the LIBSVM library, whereas the graph utilities are provided by the Boost Graph Library. Both libraries allow redistribution of their source code under their license terms, so the package you download contains everything you need to compile the code: just type “make” in the source root directory.

For more information, take a look at the README directory that you will find once you have unpacked the tarball.

Download

SVC Source Code - SVC Doxygen documentation

November 18 2007

Multivariate Data Analysis Software and Resources

A collection of software for multivariate data analysis is available here.

November 14 2007

Thesis writing - Sixth draft 14/11

Sixth draft of the thesis. Contents are (new contents in bold):

  • Chapter 1: Introduction
  • Chapter 2: Machine learning essentials
  • Chapter 3: Clustering and related issues
  • Chapter 4: Previous works on clustering
  • Chapter 5: Minimum Bregman Information principle for Co-clustering
  • Chapter 6: Support Vector Clustering
  • Chapter 7: Alternative Support Vector Methods for Clustering
  • Chapter 9: Support Vector Clustering software development
  • Chapter 10: Conclusion and Future Work
  • Appendix A: One Class classification via Support Vector Machines
  • Appendix B: Thesis Web Log
  • Bibliography

Downloads

Changelog download - Thesis download

November 06 2007

Thesis writing - Fifth draft 06/11

Fifth draft of the thesis. Contents are:

  • Chapter 1: Introduction
  • Chapter 2: Machine learning essentials
  • Chapter 3: Clustering and related issues
  • Chapter 4: Previous works on clustering
  • Chapter 6: Support Vector Clustering
  • Chapter 7: Alternative Support Vector Methods for Clustering
  • Chapter 9: Support Vector Clustering software development
  • Chapter 10: Conclusion and Future Work
  • Appendix A: One Class classification via Support Vector Machines
  • Appendix B: Thesis Web Log
  • Bibliography

Downloads

Changelog download - Thesis download

November 06 2007

Bregman divergences, SVMs and possible implications

To find a connection between the two lines of work studied (Bregman Co-clustering and Support Vector Clustering), we did some research. One interesting result is the following paper:

  • R. Nock and F. Nielsen, "Fitting the smallest enclosing Bregman balls," in 16th European Conference on Machine Learning, 2005, pp. 649-656.
    @conference{bregmanmeb05,
      author = {Richard Nock and Frank Nielsen},
      Booktitle = {16th European Conference on Machine Learning},
      Date-Added = {2007-06-23 11:00:19 +0200},
      Date-Modified = {2007-11-14 12:55:32 +0100},
      Keywords = {bregman, MEB},
      Number = {3720},
      Pages = {649–656},
      Publisher = {Springer-Verlag},
      Series = {Lecture Notes in Computer Science},
      Title = {{Fitting the smallest enclosing Bregman balls}},
      Url = {http://www.sonycsl.co.jp/person/nielsen/BregmanBall/nn-ecml-05.pdf},
      Year = {2005},
      Bdsk-Url-1 = {http://www.sonycsl.co.jp/person/nielsen/BregmanBall/nn-ecml-05.pdf}
    }

The above paper generalizes the Minimum Enclosing Ball (MEB) problem to Bregman divergences and also provides a generalization of the Bădoiu-Clarkson (BC) approximation algorithm. This is the same algorithm exploited in practice by Core Vector Machines:

  • I. W. Tsang, J. T. Kwok, and P. Cheung, "Core vector machines: Fast SVM training on very large data sets," Journal of Machine Learning Research, vol. 6, pp. 363-392, 2005.
    @article{cvm05,
      author = {Ivor W. Tsang and James T. Kwok and Pak-Ming Cheung},
      Date-Added = {2007-05-26 12:49:30 +0200},
      Date-Modified = {2007-06-23 08:23:02 +0200},
      Journal = {Journal of Machine Learning Research},
      Keywords = {SVM, CVM, MEB, SVDD},
      Pages = {363–392},
      Title = {Core vector machines: Fast SVM training on very large data sets},
      Url = {http://www.cs.ust.hk/%7Eivor/publication/tsang05a.pdf},
      Volume = {6},
      Year = {2005},
      Bdsk-Url-1 = {http://www.cs.ust.hk/~ivor/publication/tsang05a.pdf}
    }

CVMs reformulate SVM training as an MEB problem. Since they use the BC algorithm, and that algorithm has been generalized to Bregman divergences, research on vector machines based on Bregman divergences could have interesting implications.
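To make the BC algorithm more concrete, here is a minimal Python sketch of the simple Bădoiu-Clarkson iteration for the plain Euclidean MEB: start from any point, then repeatedly move the center a shrinking step toward the current farthest point. It is only a sketch under stated assumptions: the function name `badoiu_clarkson_meb` and the `n_iter` parameter are made up for this example, NumPy is assumed to be available, and the code covers neither the Bregman generalization of Nock and Nielsen nor the kernelized, core-set based variant used by CVMs.

```python
import numpy as np


def badoiu_clarkson_meb(points, n_iter=1000):
    """Approximate the Euclidean minimum enclosing ball of `points`.

    Simple Badoiu-Clarkson iteration: at step t the center moves toward
    the farthest point by a fraction 1 / (t + 1) of the gap.
    Returns (center, radius); the radius always covers every input point.
    """
    points = np.asarray(points, dtype=float)
    center = points[0].copy()  # any input point is a valid start
    for t in range(1, n_iter + 1):
        # Find the point farthest from the current center.
        dists = np.linalg.norm(points - center, axis=1)
        farthest = points[np.argmax(dists)]
        # Shrinking step toward the farthest point.
        center += (farthest - center) / (t + 1)
    radius = np.linalg.norm(points - center, axis=1).max()
    return center, radius


# Hypothetical toy data: the corners of a square with side 2.
square = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
center, radius = badoiu_clarkson_meb(square)
print(center, radius)
```

For the square above the exact ball has center (1, 1) and radius sqrt(2), and the sketch approaches it as `n_iter` grows. The number of iterations needed for a given accuracy does not depend on the dimension of the data, which is what makes this scheme attractive for kernel methods.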

November 04 2007

[OT] Star/galaxy separation via SVM/CVM classification - Part 2

This is a modification of the experiments in this post.

I quickly built a new training set, and this time I used only this set to train the SVM/CVM. Then I tested the newly trained classifier on all three datasets from the previous post.

The training set contains 500 points and was built using stars and galaxies from another portion of the sky.

New accuracy results (SVM)

Longo 01: 95.96%
Longo 02: 98.08%
Longo 03: 97.956%

New accuracy results (CVM)

Longo 01: 96.31%
Longo 02: 97.67%
Longo 03: 97.138%

Consider the Longo 02 dataset tested with the CVM. We have:

Completeness for Stars: 98.4%
Contamination for Stars: 4.7%

Completeness for Galaxies: 95.4%
Contamination for Galaxies: 1.5%
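For readers unfamiliar with these two measures, here is a minimal Python sketch of how they can be computed from true and predicted labels. It assumes the usual definitions: completeness is the fraction of true members of a class that the classifier recovers, and contamination is the fraction of objects assigned to a class that actually belong to another one. The function name and the toy labels are hypothetical, and this is not the code used for the experiments above.

```python
def completeness_contamination(y_true, y_pred, target):
    """Return (completeness, contamination) for the class `target`.

    Assumed definitions:
      completeness  = TP / (TP + FN)
      contamination = FP / (TP + FP)
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    completeness = tp / (tp + fn) if tp + fn else float("nan")
    contamination = fp / (tp + fp) if tp + fp else float("nan")
    return completeness, contamination


# Hypothetical toy labels, not the Longo datasets.
truth = ["star", "star", "star", "galaxy", "galaxy"]
predicted = ["star", "star", "galaxy", "star", "galaxy"]
for cls in ("star", "galaxy"):
    print(cls, completeness_contamination(truth, predicted, cls))
```

Under these definitions completeness corresponds to recall and contamination to one minus precision, so the two numbers together describe a class better than the overall accuracy alone.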

November 02 2007

Thesis writing - Fourth draft 01/11

UPDATED: Chapter 10 added

Fourth draft of the thesis. Contents are:

  • Chapter 1: Introduction
  • Chapter 2: Machine learning essentials
  • Chapter 3: Clustering and related issues
  • Chapter 4: Previous works on clustering
  • Chapter 6: Support Vector Clustering
  • Chapter 7: Alternative Support Vector Methods for Clustering
  • Chapter 10: Conclusion and Future Work
  • Appendix A: One Class classification via Support Vector Machines
  • Appendix B: Thesis Web Log
  • Bibliography

Downloads

Changelog download - Thesis download
