July 09 2007

SVC: prestazioni del cluster assignment

With regard of this document, recently we have performed the test described at the paragraph 5 with all of two cluster assignment algorithms implemented: Complete Graph Cluster Labeling (CGCL, the classic one) and the Cone Cluster Labeling, respectively presented in

  • A. Ben-Hur, D. Horn, H. T. Siegelmann, and V. Vapnik, "Support Vector Clustering," Journal of Machine Learning Research, vol. 2, pp. 125-137, 2001.
    @article{svc,
      author = {A. Ben-Hur and D. Horn and H. T. Siegelmann and V. Vapnik},
      Date-Modified = {2007-06-19 14:44:40 +0200},
      Journal = {Journal of Machine Learning Research},
      Keywords = {clustering, SVM, gaussian kernel},
      Pages = {125-137},
      Title = {Support Vector Clustering},
      Url = {http://citeseer.ist.psu.edu/hur01support.html},
      Volume = 2, Year = 2001, Bdsk-File-1 = {YnBsaXN0MDDUAQIDBAUGBwpZJGFyY2hpdmVyWCR2ZXJzaW9uVCR0b3BYJG9iamVjdHNfEA9OU0tleWVkQXJjaGl2ZXISAAGGoNEICVRyb290gAGoCwwXGBkaHiVVJG51bGzTDQ4PEBMWWk5TLm9iamVjdHNXTlMua2V5c1YkY2xhc3OiERKABIAFohQVgAKAA4AHXHJlbGF0aXZlUGF0aFlhbGlhc0RhdGFfEDUuLi8uLi8uLi9QYXBlcnMvQmVuLUh1ci9TdXBwb3J0IFZlY3RvciBDbHVzdGVyaW5nLnBkZtIbDxwdV05TLmRhdGFPEQHqAAAAAAHqAAIAAAlEb2N1bWVudHMAAAAAAAAAAAAAAAAAAAAAAAC+zniuSCsAAAA3IEUdU3VwcG9ydCBWZWN0b3IgQ2×1c3RlcmluZy5wZGYAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACIbMsH5WY9QREYgAAAAAAADAAMAAAkAAAAAAAAAAAAAAAAAAAAAB0Jlbi1IdXIAABAACAAAvs5cjgAAABEACAAAwflLfwAAAAEAFAA3IEUANxuAAACy8gAAEsYAABKtAAIAUERvY3VtZW50czpuZW1vOkRvY3VtZW50czpVbml2ZXJzaXRhOlBhcGVyczpCZW4tSHVyOlN1cHBvcnQgVmVjdG9yIENsdXN0ZXJpbmcucGRmAA4APAAdAFMAdQBwAHAAbwByAHQAIABWAGUAYwB0AG8AcgAgAEMAbAB1AHMAdABlAHIAaQBuAGcALgBwAGQAZgAPABQACQBEAG8AYwB1AG0AZQBuAHQAcwASAEcvbmVtby9Eb2N1bWVudHMvVW5pdmVyc2l0YS9QYXBlcnMvQmVuLUh1ci9TdXBwb3J0IFZlY3RvciBDbHVzdGVyaW5nLnBkZgAAEwASL1ZvbHVtZXMvRG9jdW1lbnRzABUAAgAX//8AAIAG0h8gISJYJGNsYXNzZXNaJGNsYXNzbmFtZaMiIyRdTlNNdXRhYmxlRGF0YVZOU0RhdGFYTlNPYmplY3TSHyAmJ6InJFxOU0RpY3Rpb25hcnkACAARABsAJAApADIARABJAEwAUQBTAFwAYgBpAHQAfACDAIYAiACKAI0AjwCRAJMAoACqAOIA5wDvAt0C3wLkAu0C+AL8AwoDEQMaAx8DIgAAAAAAAAIBAAAAAAAAACgAAAAAAAAAAAAAAAAAAAMv},
      Bdsk-Url-1 = {http://citeseer.ist.psu.edu/hur01support.html}
    }

and

  • S. Lee and K. M. Daniels, "Cone Cluster Labeling for Support Vector Clustering," in Proceedings of 6th SIAM Conference on Data Mining, 2006, pp. 484-488.
    @inproceedings{cone2006,
      author = {Sei-Hyung Lee and Karen M. Daniels},
      Booktitle = {Proceedings of 6th SIAM Conference on Data Mining},
      Date-Added = {2007-04-29 16:58:13 +0200},
      Date-Modified = {2007-06-19 18:52:22 +0200},
      Keywords = {SVM, clustering},
      Month = {May},
      Pages = {484–488},
      Title = {Cone Cluster Labeling for Support Vector Clustering},
      Url = {http://www.siam.org/meetings/sdm06/proceedings/046lees.pdf},
      Year = {2006},
      Bdsk-File-1 = {YnBsaXN0MDDUAQIDBAUGBwpZJGFyY2hpdmVyWCR2ZXJzaW9uVCR0b3BYJG9iamVjdHNfEA9OU0tleWVkQXJjaGl2ZXISAAGGoNEICVRyb290gAGoCwwXGBkaHiVVJG51bGzTDQ4PEBMWWk5TLm9iamVjdHNXTlMua2V5c1YkY2xhc3OiERKABIAFohQVgAKAA4AHXHJlbGF0aXZlUGF0aFlhbGlhc0RhdGFfEEsuLi8uLi8uLi9QYXBlcnMvTGVlL0NvbmUgQ2×1c3RlciBMYWJlbGluZyBmb3IgU3VwcG9ydCBWZWN0b3IgQ2×1c3RlcmluZy5wZGbSGw8cHVdOUy5kYXRhTxECLgAAAAACLgACAAAJRG9jdW1lbnRzAAAAAAAAAAAAAAAAAAAAAAAAvs54rkgrAAAANyVBH0NvbmUgQ2×1c3RlciBMYWJlbGluIzJGMDk0My5wZGYAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAvCUPCWn72AAAAAAAAAAAAAwADAAAJAAAAAAAAAAAAAAAAAAAAAANMZWUAABAACAAAvs5cjgAAABEACAAAwlpi1gAAAAEAFAA3JUEANxuAAACy8gAAEsYAABKtAAIATkRvY3VtZW50czpuZW1vOkRvY3VtZW50czpVbml2ZXJzaXRhOlBhcGVyczpMZWU6Q29uZSBDbHVzdGVyIExhYmVsaW4jMkYwOTQzLnBkZgAOAHAANwBDAG8AbgBlACAAQwBsAHUAcwB0AGUAcgAgAEwAYQBiAGUAbABpAG4AZwAgAGYAbwByACAAUwB1AHAAcABvAHIAdAAgAFYAZQBjAHQAbwByACAAQwBsAHUAcwB0AGUAcgBpAG4AZwAuAHAAZABmAA8AFAAJAEQAbwBjAHUAbQBlAG4AdABzABIAXS9uZW1vL0RvY3VtZW50cy9Vbml2ZXJzaXRhL1BhcGVycy9MZWUvQ29uZSBDbHVzdGVyIExhYmVsaW5nIGZvciBTdXBwb3J0IFZlY3RvciBDbHVzdGVyaW5nLnBkZgAAEwASL1ZvbHVtZXMvRG9jdW1lbnRzABUAAgAX//8AAIAG0h8gISJYJGNsYXNzZXNaJGNsYXNzbmFtZaMiIyRdTlNNdXRhYmxlRGF0YVZOU0RhdGFYTlNPYmplY3TSHyAmJ6InJFxOU0RpY3Rpb25hcnkACAARABsAJAApADIARABJAEwAUQBTAFwAYgBpAHQAfACDAIYAiACKAI0AjwCRAJMAoACqAPgA/QEFAzcDOQM+A0cDUgNWA2QDawN0A3kDfAAAAAAAAAIBAAAAAAAAACgAAAAAAAAAAAAAAAAAAAOJ},
      Bdsk-Url-1 = {http://www.siam.org/meetings/sdm06/proceedings/046lees.pdf}
    }

In the first case we have a total execution time of 280.87 seconds; in the second case only 0.47 seconds was taken. In both cases, 0.25 seconds was taken in the domain description by the SVM. So, we can say that the classic algorithm was slower than CCL about of 99.92%.

The clustering results are the same and they are reported in the document mentioned above.

July 04 2007

SVC Preliminary Experiments

In the section Documents is available for download the PDF with the configurations used for tests and related results; is also available the ZIP archive containing the data-sets used for the experiments.

June 29 2007

Co-clustering Preliminary Experiments

In the section Documents is available for download the PDF with the configurations used for tests and related results; is also available the ZIP archive containing the data-sets used for the experiments.

June 19 2007

Dataset sintetici per Clustering Benchmark

Molto spesso, nell’eseguire i test di algoritmi di clustering, è molto utile avere a disposizione degli insiemi di dati campione sintetici, ovvero creati artificialmente e che non rispecchiano dei dati reali.

A tale scopo molto utile si rivela il lavoro fatto dal Center for Data Engineering, International Institute of Information Technology, Hyderabad, INDIA

  • J. R. Vennam and S. Vadapalli, "SynDECA: A Tool to Generate Synthetic Datasets for Evaluation of Clustering Algorithms," in 11th International Conference on Management of Data (COMAD 2005), Goa, India, 2005.
    @conference{syndeca2005, Address = {Goa, India},
      Author = {Jhansi Rani Vennam and Soujanya Vadapalli},
      Booktitle = {11th International Conference on Management of Data (COMAD 2005)},
      Date-Added = {2007-06-18 16:18:49 +0200},
      Date-Modified = {2007-07-03 18:34:02 +0200},
      Keywords = {clustering, tool, synthetic, dataset, generator},
      Month = {January},
      Organization = {http://cde.iiit.ac.in/syndeca},
      Title = {SynDECA: A Tool to Generate Synthetic Datasets for Evaluation of Clustering Algorithms},
      Url = {http://comad2005.persistent.co.in/COMAD2005Proc/pages027-036.pdf},
      Year = {2005},
      Bdsk-File-1 = {YnBsaXN0MDDUAQIDBAUGBwpZJGFyY2hpdmVyWCR2ZXJzaW9uVCR0b3BYJG9iamVjdHNfEA9OU0tleWVkQXJjaGl2ZXISAAGGoNEICVRyb290gAGoCwwXGBkaHiVVJG51bGzTDQ4PEBMWWk5TLm9iamVjdHNXTlMua2V5c1YkY2xhc3OiERKABIAFohQVgAKAA4AHXHJlbGF0aXZlUGF0aFlhbGlhc0RhdGFfEHAuLi8uLi8uLi9QYXBlcnMvVmVubmFtL1N5bkRFQ0EgQSBUb29sIHRvIEdlbmVyYXRlIFN5bnRoZXRpYyBEYXRhc2V0cyBmb3IgRXZhbHVhdGlvbiBvZiBDbHVzdGVyaW5nIEFsZ29yaXRobXMucGRm0hsPHB1XTlMuZGF0YU8RApwAAAAAApwAAgAACURvY3VtZW50cwAAAAAAAAAAAAAAAAAAAAAAAL7OeK5IKwAAADk5AR9TeW5ERUNBIEEgVG9vbCB0byBHZSMzOTM4RjcucGRmAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAOTj3wrBG9QAAAAAAAAAAAAMAAwAACQAAAAAAAAAAAAAAAAAAAAAGVmVubmFtABAACAAAvs5cjgAAABEACAAAwrAq1QAAAAEAFAA5OQEANxuAAACy8gAAEsYAABKtAAIAUURvY3VtZW50czpuZW1vOkRvY3VtZW50czpVbml2ZXJzaXRhOlBhcGVyczpWZW5uYW06U3luREVDQSBBIFRvb2wgdG8gR2UjMzkzOEY3LnBkZgAADgC0AFkAUwB5AG4ARABFAEMAQQAgAEEAIABUAG8AbwBsACAAdABvACAARwBlAG4AZQByAGEAdABlACAAUwB5AG4AdABoAGUAdABpAGMAIABEAGEAdABhAHMAZQB0AHMAIABmAG8AcgAgAEUAdgBhAGwAdQBhAHQAaQBvAG4AIABvAGYAIABDAGwAdQBzAHQAZQByAGkAbgBnACAAQQBsAGcAbwByAGkAdABoAG0AcwAuAHAAZABmAA8AFAAJAEQAbwBjAHUAbQBlAG4AdABzABIAgi9uZW1vL0RvY3VtZW50cy9Vbml2ZXJzaXRhL1BhcGVycy9WZW5uYW0vU3luREVDQSBBIFRvb2wgdG8gR2VuZXJhdGUgU3ludGhldGljIERhdGFzZXRzIGZvciBFdmFsdWF0aW9uIG9mIENsdXN0ZXJpbmcgQWxnb3JpdGhtcy5wZGYAEwASL1ZvbHVtZXMvRG9jdW1lbnRzABUAAgAX//8AAIAG0h8gISJYJGNsYXNzZXNaJGNsYXNzbmFtZaMiIyRdTlNNdXRhYmxlRGF0YVZOU0RhdGFYTlNPYmplY3TSHyAmJ6InJFxOU0RpY3Rpb25hcnkACAARABsAJAApADIARABJAEwAUQBTAFwAYgBpAHQAfACDAIYAiACKAI0AjwCRAJMAoACqAR0BIgEqA8oDzAPRA9oD5QPpA/cD/gQHBAwEDwAAAAAAAAIBAAAAAAAAACgAAAAAAAAAAAAAAAAAAAQc},
      Bdsk-Url-1 = {http://comad2005.persistent.co.in/COMAD2005Proc/pages027-036.pdf}
    }

Lo strumento riesce a produrre dataset sintetici molto rapidamente; in genere un insieme con spazio delle feature 2D, con un milione di punti e centinaia di cluster, viene prodotto in pochi secondi.

Per ogni insieme prodotto, viene fornito dettagli sul clustering, come:

- quali punti appartengono a quali cluster
- quanti cluster
- quanti punti per cluster
- forma dei cluster
- etc.

This blog is multi language by p.osting.it's Babel