Thesis – Final Draft

The final draft of the thesis is available for download here.

Final contents are:

  • Chapter 1: Introduction
  • Chapter 2: Machine learning essentials
  • Chapter 3: Clustering and related issues
  • Chapter 4: Previous works on clustering
  • Chapter 5: Minimum Bregman Information principle for Co-clustering
  • Chapter 6: Support Vector Clustering
  • Chapter 7: Alternative Support Vector Methods for Clustering
  • Chapter 8: Support Vector Clustering software development
  • Chapter 9: Experiments
  • Chapter 10: Conclusion and Future Work
  • Appendix A: One Class classification via Support Vector Machines
  • Appendix B: Resource usage of the algorithms
  • Appendix C: Star/Galaxy separation via Support Vector Machines
  • Appendix D: Thesis Web Log
  • Bibliography

IMPORTANT: the file name has changed, so the links in the previous posts are broken. Download the thesis from this post or from the Documents page.

Downloads

  • Changelog download
  • Thesis download

Thesis writing – RC1 draft 16/01

RC1 draft of the thesis. Contents are:

  • Chapter 1: Introduction
  • Chapter 2: Machine learning essentials
  • Chapter 3: Clustering and related issues
  • Chapter 4: Previous works on clustering
  • Chapter 5: Minimum Bregman Information principle for Co-clustering
  • Chapter 6: Support Vector Clustering
  • Chapter 7: Alternative Support Vector Methods for Clustering
  • Chapter 8: Support Vector Clustering software development
  • Chapter 9: Experiments (only overall conclusion missing)
  • Chapter 10: Conclusion and Future Work
  • Appendix A: One Class classification via Support Vector Machines
  • Appendix B: Resource usage of the algorithms
  • Appendix C: Star/Galaxy separation via Support Vector Machines
  • Appendix D: Thesis Web Log
  • Bibliography

Downloads

  • Changelog download
  • Thesis download

SVC and different kernels

Our experiments showed more than once that using kernels other than the Gaussian one can significantly improve the results in certain circumstances.

From our experiments we know that

  • The Laplacian Kernel works well on some scaled/normalized data.
  • The Exponential Kernel generally behaves the same as the Gaussian one, but in some situations it makes a difference, as happened in the experiments with the IRIS data (multivariate) and the CLASSIC3 data (text documents in the BOW model with TF-IDF encoding).

These results suggest digging deeper into the matter and exploring other kernels that may be useful for clustering with SVC.
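
To make the comparison concrete, here is a minimal sketch of the three kernels in Python. The formulas are common textbook forms and are an assumption here, not necessarily the exact definitions used in the thesis: the Gaussian kernel uses the squared Euclidean distance, the Exponential kernel the plain Euclidean distance, and the Laplacian kernel the L1 (Manhattan) distance. The width parameter name `q` and the helper `kernel_matrix` are also illustrative.

```python
import math


def gaussian_kernel(x, y, q=1.0):
    """exp(-q * ||x - y||^2), using the squared Euclidean distance."""
    squared = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-q * squared)


def exponential_kernel(x, y, q=1.0):
    """exp(-q * ||x - y||), using the Euclidean distance (assumed form)."""
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return math.exp(-q * distance)


def laplacian_kernel(x, y, q=1.0):
    """exp(-q * ||x - y||_1), using the L1 distance (assumed form)."""
    l1 = sum(abs(a - b) for a, b in zip(x, y))
    return math.exp(-q * l1)


def kernel_matrix(points, kernel, q=1.0):
    """Pairwise kernel values for a list of points (hypothetical helper)."""
    return [[kernel(p, r, q) for r in points] for p in points]


if __name__ == "__main__":
    points = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]
    for kernel in (gaussian_kernel, exponential_kernel, laplacian_kernel):
        print(kernel.__name__, kernel_matrix(points, kernel, q=0.1))
```

With these forms the three kernels differ only in the distance placed in the exponent, so swapping one for another in an SVC implementation amounts to changing the function passed to the kernel matrix computation.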

Creating Vector Models from Text Documents

MC is a C++ program that creates vector-space models from
text documents that can be used for text mining applications. MC provides
an efficient multi-threaded implementation that can process very
large document collections. For example, MC took 1,189 seconds using
only 17.5 MBytes of main memory to process a sample collection of
about 114,000 documents (the experiment was run on a Sun Ultra10
workstation). More details on MC and its use in a fast clustering
algorithm are available in
this paper.
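
As an illustration of what a vector-space model is, the sketch below builds TF-IDF vectors from a few documents in Python. It is not MC's implementation: the whitespace tokenizer, the weighting formula (raw term count times log(N / document frequency)) and the function names are assumptions chosen for brevity.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) with one TF-IDF vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({term for doc in tokenized for term in doc})
    n_docs = len(documents)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in tokenized for term in set(doc))
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append(
            [counts[term] * math.log(n_docs / doc_freq[term]) for term in vocabulary]
        )
    return vocabulary, vectors


if __name__ == "__main__":
    vocabulary, vectors = tfidf_vectors(["support vector clustering", "vector space model"])
    print(vocabulary)
    print(vectors)
```

Terms that occur in every document get weight zero under this formula, which is why very common words contribute nothing to the distance between documents.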

Download

Thesis writing – Pre-final draft 06/01

Pre-final draft of the thesis. Contents are (new and updated contents in bold):

  • Chapter 1: Introduction
  • Chapter 2: Machine learning essentials
  • Chapter 3: Clustering and related issues
  • Chapter 4: Previous works on clustering
  • Chapter 5: Minimum Bregman Information principle for Co-clustering
  • Chapter 6: Support Vector Clustering
  • Chapter 7: Alternative Support Vector Methods for Clustering
  • Chapter 8: Support Vector Clustering software development
  • Chapter 9: Experiments (Incomplete, Text Clustering results missing)
  • Chapter 10: Conclusion and Future Work
  • Appendix A: One Class classification via Support Vector Machines
  • Appendix B: Resource usage of the algorithms
  • Appendix C: Star/Galaxy separation via Support Vector Machines
  • Appendix D: Thesis Web Log
  • Bibliography

Downloads

  • Changelog download
  • Thesis download