#include <SVClustering.h>
Inheritance diagram for damina::SVClustering:


Public Member Functions | |
| virtual void | clusterize () |
| Run the clustering process. | |
| virtual std::vector< int > & | getClustersAssignment () |
| Returns the clustering assignments. | |
| virtual int | getKernelType () |
| Returns the current kernel type. | |
| virtual double | getKernelWidth () |
| Returns the current value of the Kernel Width. | |
| virtual struct svm_model * | getModel () |
| Returns the trained model. | |
| virtual unsigned long int | getNumberOfClusters () |
| Returns the number of clusters found. | |
| virtual unsigned long int | getNumberOfNonSingletonClusters () |
| Returns the number of clusters containing more than one point. | |
| virtual unsigned long int | getNumberOfValidClusters (unsigned long int) |
| Returns the number of clusters with a number of elements greater than or equal to a certain threshold. | |
| virtual std::vector< int > & | getOriginalClasses (int) |
| Returns the original classes assignments for the points in input. | |
| virtual struct svm_parameter * | getParameters () |
| Returns the SVM paramters. | |
| virtual std::vector< unsigned long int > & | getPointPerCluster () |
| Returns the number of points per cluster. | |
| virtual struct svm_problem * | getProblem () |
| Returns the current input data set in the libsvm format. | |
| virtual double | getSoftConstraint () |
| Returns the current value of the Soft Margin Constraint. | |
| virtual double | getSoftConstraintEstimate () |
| Return the estimate of a more appropriate Soft Margin parameter. | |
| virtual double | getSphereRadius () |
| Returns the current value of the sphere radius. | |
| virtual DataSet * | getTrainingSet () |
| Returns the current input data set. | |
| virtual double | initialKernelWidth () |
| Computes the initial gaussian kernel width, as proposed in. | |
| virtual void | learn (struct svm_problem *) |
| The learning process. | |
| virtual void | learn (DataSet *) |
| The learning process. | |
| virtual void | learn () |
| The learning process. | |
| virtual void | setEuclideanDistance (EuclideanDistance *) |
| Set a vector-distance measure valid for the Euclidean Space. | |
| virtual void | setKernelType (int) |
| Set the kernel type for the SVM. | |
| virtual void | setKernelWidth (double) |
| Set the new width for Gaussian Kernel. | |
| virtual void | setProblem (struct svm_problem *) |
| Set the input data set in libsvm format. | |
| virtual void | setSoftConstraint (double) |
| Set the new value for the Soft Margin Constant. | |
| virtual void | setTrainingSet (DataSet *) |
| Set the input data set. | |
| SVClustering (double, double, struct svm_problem *) | |
| Constructor. | |
| SVClustering (double, double, DataSet *) | |
| Constructor. | |
| SVClustering (double, struct svm_problem *) | |
| Constructor. | |
| SVClustering (double, DataSet *) | |
| Constructor. | |
| SVClustering (struct svm_problem *) | |
| Constructor. | |
| SVClustering (DataSet *) | |
| Constructor. | |
| virtual void | useClassicClusteringPolicyForBSV () |
| Use the classic clustering policy to clusterize BSVs. | |
| virtual void | useSpecialClusteringPolicyForBSV () |
| Use a special clustering policy to clusterize BSVs. | |
| virtual bool | usingSpecialClusteringPolicyForBSV () |
| Return true if the SVC is using the special policy to clusterize BSV. | |
| virtual | ~SVClustering () |
| Destructor. | |
Protected Member Functions | |
| virtual double | beta (int, bool) |
| Returns the i-th lagrangian multiplier. | |
| virtual double | beta (int) |
| Returns the i-th lagrangian multiplier. | |
| virtual double | calculateDistanceFromCenter (struct svm_node *) |
| Calculate the distance of a point from the center of the sphere. | |
| virtual void | calculateSphereRadius () |
| Calculate the sphere radius. | |
| virtual struct svm_node * | pointOnThePath (struct svm_node *, struct svm_node *, double) |
| Samples a point on the path between two points. | |
| virtual double | rho (bool) |
| Returns the rho value of the decision function computed by One Class SVM. | |
| virtual double | rho () |
| Returns the rho value of the decision function computed by One Class SVM. | |
| virtual void | separateClusters () |
| Run the clustering process. | |
Protected Attributes | |
| double | C_estimation |
| An estimation of a more appropriate C value, based on the heuristics proposed in the last paragraph of. | |
| EuclideanDistance * | eucDist |
| Euclidean Measure for compute distance between vectors. | |
| std::vector< int > | labels |
| Clustering assignments. | |
| OneClassSVM * | trainer |
| The One Class SVM for the first step of the SV Clustering process. | |
Private Member Functions | |
| void | calculateQuadraticPartOfDistanceFromCenter () |
| Calculate the quadratic part of the point distance equation. | |
| virtual void | clusterize (DataSet *) |
| virtual DataSet * | getTestSet () |
| virtual void | setTestSet (DataSet *) |
Private Attributes | |
| double | C |
| The soft constraint in the original SV Clustering formulation. | |
| bool | clusterizeBSV |
| long int | numberOfClusters |
| The number of clusters found. | |
| std::vector< int > | originalClasses |
| This could be contains the right assignments. | |
| std::vector< unsigned long int > | pointPerCluster |
| Number of points per cluster. | |
| double * | quadratic |
| The quadratic part of the distance from the sphere's center It's the same for all points. | |
| double | sphereRadius |
| The radius of the Sphere in the Feature space. | |
This class implements the Support Vector Clustering as proposed in
A. Ben-Hur, D. Horn, H. T. Siegelmann, and V. Vapnik, "Support Vector Clustering," Journal of Machine Learning Research, vol. 2, pp. 125-137, 2001.
The Support Vector Clustering is among the few attempts to use the SVMs in an unsupervised way.
It consists mainly of two steps:
1. Cluster description
=======================
Find the Minimum Enclosing Sphere in the feature space, exactly as SVDD described in
D. M. J. Tax and R. P. W. Duin, "Data Domain Description using Support Vectors," in European Symposium on Artificial Neural Network, Bruges (Belgium), 1999.
D. M. J. TAX, "One-class classification: concept learning in the absence of counter-examples," PhD Thesis , 2001.
This leads to a Quadratic Programming problem like in the classic SVM training problem
This sphere, mapped back to the data space, split in various contours, each representing a cluster.
Here, we use One-class classification by Schoelkopf et al (2001) instead of SVDD by Tax.
The two methods are showed to be equivalent under following conditions:
a. The data point must be normalized to the unit norm vectors (implicitly done using the Gaussian Kernel)
b. The kernel width must be the same
c. The C parameter in SVDD must be equal to 1/(nu*N), where nu is the parameter in One-class SVM and N is the number of input elements.
2. Cluster labeling
===================
The cluster description algorithm does not differentiate between points that belong to differ- ent clusters. To do so, we use a geometric approach involving R(x), based on the following observation: given a pair of data points that belong to different components (clusters), any path that connects them must exit from the sphere in feature space.
So, we build a graph such that two vertex are connected if their images in feature space are connected by a path which lie entirely inside the sphere. The connected components algorithm on this graph returns the clusters.
Definition at line 98 of file SVClustering.h.
| damina::SVClustering::SVClustering | ( | DataSet * | train | ) |
Constructor.
| train | The input data set |
Definition at line 12 of file SVClustering.cpp.
References clusterizeBSV, eucDist, damina::KernelType::gaussian, initialKernelWidth(), quadratic, damina::AbstractSVM::setCacheSize(), damina::AbstractSVM::setKernel(), damina::AbstractSVM::setKernelWidth(), damina::AbstractSVM::setTolerance(), damina::OneClassSVM::setTrainingSet(), and trainer.
00012 { 00013 this->eucDist = new L2Distance(); 00014 this->trainer = new OneClassSVM(); 00015 this->trainer->setKernel(KernelType::gaussian); 00016 this->trainer->setCacheSize(100); 00017 this->trainer->setTolerance(0.000001); 00018 this->trainer->setTrainingSet(train); 00019 this->trainer->setKernelWidth(initialKernelWidth()); 00020 this->quadratic = NULL; 00021 this->clusterizeBSV = true; 00022 }
Here is the call graph for this function:

| damina::SVClustering::SVClustering | ( | struct svm_problem * | p | ) |
Constructor.
| p | The input data in libsvm format |
Definition at line 29 of file SVClustering.cpp.
References clusterizeBSV, eucDist, damina::KernelType::gaussian, initialKernelWidth(), quadratic, damina::AbstractSVM::setCacheSize(), damina::AbstractSVM::setKernel(), damina::AbstractSVM::setKernelWidth(), damina::OneClassSVM::setProblem(), damina::AbstractSVM::setTolerance(), and trainer.
00029 { 00030 this->eucDist = new L2Distance(); 00031 this->trainer = new OneClassSVM(); 00032 this->trainer->setKernel(KernelType::gaussian); 00033 this->trainer->setCacheSize(100); 00034 this->trainer->setTolerance(0.000001); 00035 this->trainer->setProblem(p); 00036 this->trainer->setKernelWidth(initialKernelWidth()); 00037 this->quadratic = NULL; 00038 this->clusterizeBSV = true; 00039 }
Here is the call graph for this function:

| damina::SVClustering::SVClustering | ( | double | C, | |
| DataSet * | train | |||
| ) |
Constructor.
| C | The soft margin constant | |
| train | The input data set |
Definition at line 47 of file SVClustering.cpp.
References clusterizeBSV, eucDist, damina::KernelType::gaussian, damina::DataSet::getSize(), initialKernelWidth(), quadratic, damina::AbstractSVM::setCacheSize(), damina::AbstractSVM::setKernel(), damina::AbstractSVM::setKernelWidth(), damina::OneClassSVM::setNU(), damina::AbstractSVM::setTolerance(), damina::OneClassSVM::setTrainingSet(), and trainer.
00047 { 00048 this->eucDist = new L2Distance(); 00049 this->trainer = new OneClassSVM(); 00050 this->trainer->setKernel(KernelType::gaussian); 00051 this->trainer->setCacheSize(100); 00052 this->trainer->setTolerance(0.000001); 00053 00054 this->trainer->setTrainingSet(train); 00055 00056 this->trainer->setKernelWidth(initialKernelWidth()); 00057 00058 00059 this->trainer->setNU((1 / (train->getSize() * C))); 00060 this->C = C; 00061 this->quadratic = NULL; 00062 this->clusterizeBSV = true; 00063 }
Here is the call graph for this function:

| damina::SVClustering::SVClustering | ( | double | C, | |
| struct svm_problem * | prob | |||
| ) |
Constructor.
| C | The soft margin constant | |
| prob | The input data set in libsvm format |
Definition at line 71 of file SVClustering.cpp.
References clusterizeBSV, eucDist, damina::KernelType::gaussian, initialKernelWidth(), quadratic, damina::AbstractSVM::setCacheSize(), damina::AbstractSVM::setKernel(), damina::AbstractSVM::setKernelWidth(), damina::OneClassSVM::setNU(), damina::OneClassSVM::setProblem(), damina::AbstractSVM::setTolerance(), and trainer.
00071 { 00072 this->eucDist = new L2Distance(); 00073 this->trainer = new OneClassSVM(); 00074 this->trainer->setKernel(KernelType::gaussian); 00075 this->trainer->setCacheSize(100); 00076 this->trainer->setTolerance(0.000001); 00077 00078 this->trainer->setProblem(prob); 00079 00080 this->trainer->setKernelWidth(initialKernelWidth()); 00081 00082 this->trainer->setNU((1 / (prob->l * C))); 00083 this->C = C; 00084 this->quadratic = NULL; 00085 this->clusterizeBSV = true; 00086 }
Here is the call graph for this function:

| damina::SVClustering::SVClustering | ( | double | C, | |
| double | w, | |||
| DataSet * | train | |||
| ) |
Constructor.
| C | The soft margin constant | |
| w | The gaussian kernel width | |
| train | The input data set |
Definition at line 95 of file SVClustering.cpp.
References clusterizeBSV, eucDist, damina::KernelType::gaussian, damina::DataSet::getSize(), quadratic, damina::AbstractSVM::setCacheSize(), damina::AbstractSVM::setKernel(), damina::AbstractSVM::setKernelWidth(), damina::OneClassSVM::setNU(), damina::AbstractSVM::setTolerance(), damina::OneClassSVM::setTrainingSet(), and trainer.
00095 { 00096 this->eucDist = new L2Distance(); 00097 this->trainer = new OneClassSVM(); 00098 this->trainer->setKernel(KernelType::gaussian); 00099 this->trainer->setCacheSize(100); 00100 this->trainer->setTolerance(0.000001); 00101 00102 this->trainer->setTrainingSet(train); 00103 00104 this->trainer->setNU((1 / (train->getSize() * C))); 00105 this->C = C; 00106 00107 this->trainer->setKernelWidth(w); 00108 this->quadratic = NULL; 00109 this->clusterizeBSV = true; 00110 }
Here is the call graph for this function:

| damina::SVClustering::SVClustering | ( | double | C, | |
| double | w, | |||
| struct svm_problem * | prob | |||
| ) |
Constructor.
| C | The soft margin constant | |
| w | The gaussian kernel width | |
| prob | The input data set in libsvm format |
Definition at line 119 of file SVClustering.cpp.
References clusterizeBSV, eucDist, damina::KernelType::gaussian, quadratic, damina::AbstractSVM::setCacheSize(), damina::AbstractSVM::setKernel(), damina::AbstractSVM::setKernelWidth(), damina::OneClassSVM::setNU(), damina::OneClassSVM::setProblem(), damina::AbstractSVM::setTolerance(), and trainer.
00119 { 00120 this->eucDist = new L2Distance(); 00121 this->trainer = new OneClassSVM(); 00122 this->trainer->setKernel(KernelType::gaussian); 00123 this->trainer->setCacheSize(100); 00124 this->trainer->setTolerance(0.000001); 00125 00126 this->trainer->setProblem(prob); 00127 00128 this->trainer->setNU((1 / (prob->l * C))); 00129 this->C = C; 00130 00131 this->trainer->setKernelWidth(w); 00132 this->quadratic = NULL; 00133 this->clusterizeBSV = true; 00134 }
Here is the call graph for this function:

| damina::SVClustering::~SVClustering | ( | ) | [virtual] |
| double damina::SVClustering::beta | ( | int | i, | |
| bool | scaled | |||
| ) | [protected, virtual] |
Returns the i-th lagrangian multiplier.
The One Class SVM solves the Quadratic Programming problem and computes the lagrangians "beta". LibSVM solves a scaled version of the One Class SVM problem, so this method returns either the scaled value of a beta or the normal one, depeding on the 'scaled' param.
To obtain the normal value of a lagrangian, we multiply it by C.
The betas equal to zero (non-support vector ones) are not in the solution.
| i | The index of the lagrangian | |
| scaled | True if you want the scaled version, false otherwise |
Definition at line 590 of file SVClustering.cpp.
References damina::OneClassSVM::getModel(), getSoftConstraint(), and trainer.
00590 { 00591 if (scaled) { 00592 return this->trainer->getModel()->sv_coef[0][i]; 00593 } 00594 else { 00595 return this->trainer->getModel()->sv_coef[0][i] * getSoftConstraint(); 00596 } 00597 }
Here is the call graph for this function:

| double damina::SVClustering::beta | ( | int | i | ) | [protected, virtual] |
Returns the i-th lagrangian multiplier.
The One Class SVM solves the Quadratic Programming problem and computes the lagrangians "beta". This method returns the i-th beta, multiplied by C, because LibSVM solves a scaled version of the One Class SVM problem. The betas equal to zero (non-support vector ones) are not in the solution.
| i | The index of the lagrangian |
Definition at line 569 of file SVClustering.cpp.
References C, damina::OneClassSVM::getModel(), and trainer.
Referenced by calculateDistanceFromCenter(), calculateQuadraticPartOfDistanceFromCenter(), damina::CCLSVClustering::experimentalSeparateClusters(), and damina::CCLSVClustering::separateClusters().
Here is the call graph for this function:

Here is the caller graph for this function:

| double damina::SVClustering::calculateDistanceFromCenter | ( | struct svm_node * | x | ) | [protected, virtual] |
Calculate the distance of a point from the center of the sphere.
Reference:
A. Ben-Hur, D. Horn, H. T. Siegelmann, and V. Vapnik, "Support Vector Clustering," Journal of Machine Learning Research, vol. 2, pp. 125-137, 2001.
The Equation nr. 13 in the paper above is the distance of a point from the center of Sphere
We sum over support vectors only (including Bounded SVs), because the Betas (lagrangian multipliers) are zero for non-SVs.
Furthermore, we replace K(x,x) with 1, because with the Gaussian Kernel, K(x,x) = 1 for all x.
| x | The point |
Definition at line 695 of file SVClustering.cpp.
References beta(), calculateQuadraticPartOfDistanceFromCenter(), damina::OneClassSVM::getModel(), and trainer.
Referenced by calculateSphereRadius(), and separateClusters().
00695 { 00696 double sum = 0; 00697 00698 struct svm_node **SV = this->trainer->getModel()->SV; 00699 int nSV = this->trainer->getModel()->l; 00700 00701 00702 for (int i = 0; i < nSV; i++) { 00703 sum += beta(i) * kernelfunction(SV[i], x, this->trainer->getModel()->param); 00704 } 00705 00706 00707 calculateQuadraticPartOfDistanceFromCenter(); 00708 00709 // 1 is because (in case of RBF kernel used here) K(x,x) is equal to 1 00710 // (therotically provable and pratically tested with libsvm) 00711 //return sqrt(1 - (2 * sum) + *(this->quadratic)); 00712 return 1 - (2 * sum) + *(this->quadratic); 00713 }
Here is the call graph for this function:

Here is the caller graph for this function:

| void damina::SVClustering::calculateQuadraticPartOfDistanceFromCenter | ( | ) | [private] |
Calculate the quadratic part of the point distance equation.
Reference:
A. Ben-Hur, D. Horn, H. T. Siegelmann, and V. Vapnik, "Support Vector Clustering," Journal of Machine Learning Research, vol. 2, pp. 125-137, 2001.
The Equation nr. 13 in the paper above is the distance of a point from the center of Sphere
In that equation, the quadratic part (sum over i, j beta_i beta_j K(x_i, x_j)) is the same for all points, so we compute it once.
Furthermore, we sum over support vectors only (including Bounded SVs), because the Betas (lagrangian multipliers) are zero for non-SVs
Definition at line 652 of file SVClustering.cpp.
References beta(), damina::OneClassSVM::getModel(), quadratic, and trainer.
Referenced by calculateDistanceFromCenter().
00652 { 00653 00654 if (quadratic != NULL) { 00655 return; 00656 } 00657 00658 struct svm_node **SV = this->trainer->getModel()->SV; 00659 int nSV = this->trainer->getModel()->l; 00660 double tmp = 0; 00661 00662 00663 for (int i = 0; i < nSV; i++) { 00664 for (int j = 0; j < nSV; j++) { 00665 tmp += beta(i) * beta(j) * kernelfunction(SV[i], SV[j], this->trainer->getModel()->param); 00666 } 00667 } 00668 00669 this->quadratic = (double *)malloc(sizeof(double)); 00670 *(this->quadratic) = tmp; 00671 }
Here is the call graph for this function:

Here is the caller graph for this function:

| void damina::SVClustering::calculateSphereRadius | ( | ) | [protected, virtual] |
Calculate the sphere radius.
Reference:
A. Ben-Hur, D. Horn, H. T. Siegelmann, and V. Vapnik, "Support Vector Clustering," Journal of Machine Learning Research, vol. 2, pp. 125-137, 2001.
Equation nr. 14 in the paper above is radius of the Sphere.
As suggested in
J. Yang, V. Estivill-Castro, and S. K. Chalup, "Support vector clustering through proximity graph modelling," in Neural Information Processing, 2002. ICONIP ‘02. Proceedings of the 9th International Conference on, 2002, pp. 898-903.
we can use the average among SVs' distances from center to calculate the radius.
However, theoretically, all the SVs must have the same distance from center, so we have tested and found that using the distance of just one SV is a good approximation as much as to compute the aforementioned average.
This solution avoids to compute the distance for all SVs.
Definition at line 741 of file SVClustering.cpp.
References calculateDistanceFromCenter(), and sphereRadius.
Referenced by getSphereRadius().
00741 { 00742 00743 // first tecnique 00744 // 00745 // assuming SVs have all the same distanace from center (theoretically a correct assumption) 00746 // to avoid a check on the distance of all SVs 00747 // 00748 this->sphereRadius = this->calculateDistanceFromCenter(this->trainer->getModel()->SV[0]); 00749 00750 //if (this->sphereRadius >= 1) { 00751 // free(quadratic); 00752 // this->sphereRadius = this->calculateDistanceFromCenter(this->trainer->getModel()->SV[0]); 00753 //} 00754 00755 // second tecnique 00756 // 00757 // we can use the average among SVs' distances to calculate the radius 00758 00759 //int nSV = this->trainer->getModel()->l; 00760 //this->sphereRadius = 0; 00761 //for (int i = 0; i < nSV; i++) { 00762 // this->sphereRadius += this->calculateDistanceFromCenter(this->trainer->getModel()->SV[i]); 00763 //} 00764 //this->sphereRadius = this->sphereRadius / nSV; 00765 00766 // third tecnique 00767 // Using One Class SVM, Radius = sqrt(1 - rho/2) 00768 //sphereRadius = sqrt(1 - (rho()/2)); 00769 }
Here is the call graph for this function:

Here is the caller graph for this function:

| void damina::SVClustering::clusterize | ( | ) | [virtual] |
Run the clustering process.
In this case it runs the Step 2 of Support Vector Clustering, called "Cluster Labeling" by its authors.
The implementation reflects the first solution proposed in
A. Ben-Hur, D. Horn, H. T. Siegelmann, and V. Vapnik, "Support Vector Clustering," Journal of Machine Learning Research, vol. 2, pp. 125-137, 2001.
namely, Complete Graph Cluster Labeling.
Implements damina::ClusteringEngine.
Reimplemented in damina::CCLSVClustering.
Definition at line 376 of file SVClustering.cpp.
References separateClusters().
00376 { 00377 this->separateClusters(); 00378 }
Here is the call graph for this function:

| virtual void damina::SVClustering::clusterize | ( | DataSet * | ) | [inline, private, virtual] |
| std::vector< int > & damina::SVC |