
Thesis Neminis

Work log of Vincenzo Russo’s thesis

Abstract

Clustering is a machine learning technique that groups a set of objects into subsets, or clusters. The goal is to create clusters that are internally coherent but substantially different from each other. In plain words, objects in the same cluster should be as similar as possible, whereas objects in different clusters should be as dissimilar as possible.

Clustering is an unsupervised learning technique: it groups objects into clusters using only the information contained in the data, with no human intervention adding further information to improve the learning.

The application domains are manifold. One example is the grouping of text documents, where the goal is to build groups of related documents, i.e. documents dealing with the same topic.

The goal of this thesis is to study in depth state-of-the-art and experimental clustering techniques. We consider two techniques. The first is known as the Minimum Bregman Information principle. This principle generalizes the classic relocation scheme already adopted by K-means, allowing the use of a rich family of divergence functions called Bregman divergences. A new, more general clustering scheme was developed on top of this principle, and a co-clustering scheme was formulated as well. This leads to an important generalization, as we will see in the sequel.
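To make the idea of a K-means-style relocation scheme with a pluggable divergence more concrete, here is a minimal sketch in Python. It is not the code developed in the thesis: the function names (`bregman_hard_clustering`, `squared_euclidean`, `generalized_kl`), the `init` parameter and the toy data are all assumptions made for illustration. The sketch relies on one known property of Bregman divergences: the best representative of a cluster is its arithmetic mean, so only the assignment step changes when the divergence changes.

```python
import math
import random


def squared_euclidean(x, mu):
    """Bregman divergence generated by phi(x) = ||x||^2 (the K-means case)."""
    return sum((a - b) ** 2 for a, b in zip(x, mu))


def generalized_kl(x, mu):
    """Generalized I-divergence, generated by phi(x) = sum(x * log x).

    Assumes every coordinate of x and mu is strictly positive.
    """
    return sum(a * math.log(a / b) - a + b for a, b in zip(x, mu))


def mean(points):
    """Coordinate-wise arithmetic mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def bregman_hard_clustering(points, k, divergence, init=None, iters=100, seed=0):
    """Relocation scheme: assign by divergence, then move centroids to means.

    `init` (hypothetical parameter) lets the caller fix the starting
    centroids; otherwise k distinct points are drawn at random.
    """
    rng = random.Random(seed)
    centroids = list(init) if init is not None else rng.sample(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid under the chosen divergence.
        new_labels = [
            min(range(k), key=lambda j: divergence(x, centroids[j]))
            for x in points
        ]
        if new_labels == labels:
            break  # no point changed cluster: converged
        labels = new_labels
        # Update step: the mean is the optimal representative
        # for every Bregman divergence.
        for j in range(k):
            members = [x for x, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return labels, centroids


# Toy data (made up for illustration): two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
start = [points[0], points[3]]

labels_euc, cents_euc = bregman_hard_clustering(
    points, 2, squared_euclidean, init=start)
labels_kl, cents_kl = bregman_hard_clustering(
    points, 2, generalized_kl, init=start)
print(labels_euc, labels_kl)
```

With `squared_euclidean` the loop reduces to ordinary K-means; passing `generalized_kl` instead leaves the update step untouched and changes only how points are assigned to centroids.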

The second approach is Support Vector Clustering, a clustering process that relies on state-of-the-art learning machines: Support Vector Machines. Support Vector Clustering is currently the subject of active research, as it is still at an early stage of development. We have analyzed this clustering method thoroughly and have also provided some contributions that reduce the number of iterations and the computational complexity and improve accuracy.

The main application domains we have dealt with are text mining and astrophysics data mining. Within these domains we have verified and thoroughly analyzed the properties of both methodologies by means of dedicated experiments.

The results are given in terms of robustness with respect to missing values, dimensionality reduction, robustness with respect to noise and outliers, and the ability to describe clusters of arbitrary shape.
