<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.3.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>
<channel>
	<title>Comments on: Support Vector Clustering Code</title>
	<link>http://thesis.neminis.org/2007/12/03/support-vector-clustering-code/</link>
	<description>Diario di lavoro della tesi di Vincenzo Russo / Work-log of Vincenzo Russo's Thesis</description>
	<pubDate>Thu, 24 Jul 2008 10:27:05 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.3.3</generator>
		<item>
		<title>By: Domus Neminis &#8212; Announcing LIBSVM Plus</title>
		<link>http://thesis.neminis.org/2007/12/03/support-vector-clustering-code/#comment-12415</link>
		<dc:creator>Domus Neminis &#8212; Announcing LIBSVM Plus</dc:creator>
		<pubDate>Mon, 23 Jun 2008 12:19:25 +0000</pubDate>
		<guid>http://thesis.neminis.org/2007/12/03/support-vector-clustering-code/#comment-12415</guid>
		<description>[...] basic for the next generation of the Support Vector Clustering (SVC) library, which will replace the SVC software I developed for my master thesis and that will have a lot of new [...]</description>
		<content:encoded><![CDATA[<p>[&#8230;] basic for the next generation of the Support Vector Clustering (SVC) library, which will replace the SVC software I developed for my master thesis and that will have a lot of new [&#8230;]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Vincenzo Russo</title>
		<link>http://thesis.neminis.org/2007/12/03/support-vector-clustering-code/#comment-10491</link>
		<dc:creator>Vincenzo Russo</dc:creator>
		<pubDate>Tue, 29 Apr 2008 07:58:48 +0000</pubDate>
		<guid>http://thesis.neminis.org/2007/12/03/support-vector-clustering-code/#comment-10491</guid>
		<description>Dear Lawrence, 

   Sorry for the late reply. I could not get online for a week because I was away from home.

In short, this is what I would do:

1. Run K-means several times with different 'k' values and choose the best instance. "Best" here means the instance that produces the best value of some validity index (the C-index is supposed to be a good choice for K-means).

2. Run SVC several times with different parameter settings (kernel, C, q, etc.) and choose the best instance according to a validity index (the best choice is, I guess, the index specifically developed for SVC).

3. Compare the results of the "best k-means instance" and the "best SVC instance".

That's all.

And no, I am sure that the validity index developed for SVC does not suit K-means, because the index relies on specific characteristics of SVC.

I hope this helps.

Best, 

   VR.</description>
		<content:encoded><![CDATA[<p>Dear Lawrence, </p>
<p>   Sorry for the late reply. I could not get online for a week because I was away from home. </p>
<p>In short, this is what I would do: </p>
<p>1. Run K-means several times with different &#8216;k&#8217; values and choose the best instance. &#8220;Best&#8221; here means the instance that produces the best value of some validity index (the C-index is supposed to be a good choice for K-means).</p>
<p>2. Run SVC several times with different parameter settings (kernel, C, q, etc.) and choose the best instance according to a validity index (the best choice is, I guess, the index specifically developed for SVC). </p>
<p>3. Compare the results of the &#8220;best k-means instance&#8221; and the &#8220;best SVC instance&#8221;.</p>
<p>That&#8217;s all.</p>
<p>And no, I am sure that the validity index developed for SVC does not suit K-means, because the index relies on specific characteristics of SVC.</p>
<p>I hope this helps. </p>
<p>Best, </p>
<p>   VR.</p>
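<p>A minimal, dependency-free sketch of step 1 (not the code used in the thesis): a plain Lloyd-style K-means is run for several 'k' values and random seeds, and the run with the lowest C-index is kept. The function names and the tie-breaking rule (prefer the smaller 'k' when two scores agree to nine decimals) are assumptions made for illustration. The same selection loop would apply to step 2 by swapping in SVC runs and an SVC-specific index.</p>

```python
import itertools
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd-style K-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def c_index(points, labels):
    """C-index of a labelling: (S_w - S_min) / (S_max - S_min); lower is better."""
    pairs = list(itertools.combinations(range(len(points)), 2))
    dists = sorted(math.dist(points[a], points[b]) for a, b in pairs)
    within = [math.dist(points[a], points[b]) for a, b in pairs if labels[a] == labels[b]]
    n_w = len(within)
    if n_w == 0:
        return float("nan")  # undefined when no two points share a cluster
    s_min, s_max = sum(dists[:n_w]), sum(dists[-n_w:])
    if s_max == s_min:
        return float("nan")  # undefined, e.g. when all points share one cluster
    return (sum(within) - s_min) / (s_max - s_min)


def best_kmeans(points, k_values, seeds=range(5)):
    """Return (c_index, k, labels) of the best run over all k values and seeds."""
    runs = []
    for k in k_values:
        for seed in seeds:
            labels = kmeans(points, k, seed=seed)
            score = c_index(points, labels)
            if not math.isnan(score):
                runs.append((score, k, labels))
    # Assumed tie-break: on (near-)equal scores prefer the smaller k.
    return min(runs, key=lambda r: (round(r[0], 9), r[1]))
```

<p>For example, <code>best_kmeans([(0, 0), (1, 0), (0, 2), (10, 10), (11, 10), (10, 12)], [2, 3, 4])</code> should select k = 2 on these two well-separated groups. The tie-break matters because the C-index also reaches 0 when the clusters are just nearest-neighbour pairs.</p>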
]]></content:encoded>
	</item>
	<item>
		<title>By: Lawrence</title>
		<link>http://thesis.neminis.org/2007/12/03/support-vector-clustering-code/#comment-8782</link>
		<dc:creator>Lawrence</dc:creator>
		<pubDate>Sun, 20 Apr 2008 15:59:01 +0000</pubDate>
		<guid>http://thesis.neminis.org/2007/12/03/support-vector-clustering-code/#comment-8782</guid>
		<description>Dear Vincenzo Russo，

   I now have some idea of how to evaluate clustering results. As you mentioned, and as a review of the literature shows, three approaches are generally used for the quantitative evaluation of clustering results: internal criteria, external criteria, and relative criteria.

   Internal criteria are the only means to evaluate the clustering quality of a completely new domain.

   External criteria evaluate a clustering against externally pre-specified structure information about a dataset. Typically, real-life or synthetic datasets with prior class information, such as the Iris data, the Wisconsin breast cancer database, and the Spam database, are used as evaluation benchmarks.

   Relative criteria (also known as validity indices) compare the clustering results obtained with different input parameter settings of the same clustering algorithm. A number of validity indices have been proposed in the literature.

  Suppose I want to compare the performance of K-means and SVC in a completely new domain without prior class information. I see two options:

1. While running K-means and SVC, a validity index is used in each for parameter selection. Once the best clustering result of each algorithm has been found, the two results are evaluated by internal criteria.

2. There is a validity index specific to SVC which, as you mentioned, is quite useful for parameter selection (I don't know whether it can be used for K-means). If different validity indices are used for K-means and SVC, I think the results can't be compared. If the same validity index is used for both clustering algorithms, even though it may not be appropriate for SVC, can the index value serve as a criterion for comparing the performance of the two algorithms?

Maybe I have not expressed my question well. I hope you can understand. I am a little confused here.


Many thanks and Best regards

Lawrence</description>
		<content:encoded><![CDATA[<p>Dear Vincenzo Russo，</p>
<p>   I now have some idea of how to evaluate clustering results. As you mentioned, and as a review of the literature shows, three approaches are generally used for the quantitative evaluation of clustering results: internal criteria, external criteria, and relative criteria. </p>
<p>   Internal criteria are the only means to evaluate the clustering quality of a completely new domain. </p>
<p>   External criteria evaluate a clustering against externally pre-specified structure information about a dataset. Typically, real-life or synthetic datasets with prior class information, such as the Iris data, the Wisconsin breast cancer database, and the Spam database, are used as evaluation benchmarks. </p>
<p>   Relative criteria (also known as validity indices) compare the clustering results obtained with different input parameter settings of the same clustering algorithm. A number of validity indices have been proposed in the literature.</p>
<p>  Suppose I want to compare the performance of K-means and SVC in a completely new domain without prior class information. I see two options:</p>
<p>1. While running K-means and SVC, a validity index is used in each for parameter selection. Once the best clustering result of each algorithm has been found, the two results are evaluated by internal criteria.</p>
<p>2. There is a validity index specific to SVC which, as you mentioned, is quite useful for parameter selection (I don&#8217;t know whether it can be used for K-means). If different validity indices are used for K-means and SVC, I think the results can&#8217;t be compared. If the same validity index is used for both clustering algorithms, even though it may not be appropriate for SVC, can the index value serve as a criterion for comparing the performance of the two algorithms?</p>
<p>Maybe I have not expressed my question well. I hope you can understand. I am a little confused here.</p>
<p>Many thanks and Best regards</p>
<p>Lawrence</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Lawrence</title>
		<link>http://thesis.neminis.org/2007/12/03/support-vector-clustering-code/#comment-7789</link>
		<dc:creator>Lawrence</dc:creator>
		<pubDate>Thu, 17 Apr 2008 13:25:50 +0000</pubDate>
		<guid>http://thesis.neminis.org/2007/12/03/support-vector-clustering-code/#comment-7789</guid>
		<description>Dear Vincenzo Russo， 

  Thank you very much. It is clear to me now. I will first read the relevant parts of your thesis and the paper you suggested. You have given me great inspiration on how to evaluate the quality of clustering results when datasets are unlabeled. Your help is greatly appreciated.

   Best Regards

   Lawrence</description>
		<content:encoded><![CDATA[<p>Dear Vincenzo Russo， </p>
<p>  Thank you very much. It is clear to me now. I will first read the relevant parts of your thesis and the paper you suggested. You have given me great inspiration on how to evaluate the quality of clustering results when datasets are unlabeled. Your help is greatly appreciated. </p>
<p>   Best Regards</p>
<p>   Lawrence</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Vincenzo Russo</title>
		<link>http://thesis.neminis.org/2007/12/03/support-vector-clustering-code/#comment-7707</link>
		<dc:creator>Vincenzo Russo</dc:creator>
		<pubDate>Thu, 17 Apr 2008 08:14:59 +0000</pubDate>
		<guid>http://thesis.neminis.org/2007/12/03/support-vector-clustering-code/#comment-7707</guid>
		<description>Dear Lawrence, 

   welcome back. 

  Since your datasets are unlabeled, the first thing you need is a criterion for evaluating the quality of the clustering results. The most effective way to evaluate clustering results when data are unlabeled is a relative criterion (aka validity index).

In Section 3.3.3 of my thesis I present some classical validity indices, and in

&lt;a href="http://portal.acm.org/citation.cfm?id=1294369.1294523&#038;coll=GUIDE&#038;dl=GUIDE&#038;CFID=15151515&#038;CFTOKEN=6184618" rel="nofollow"&gt;J. Wang and J. Chiang, "A cluster validity measure with a hybrid parameter search method for the support vector clustering algorithm," Pattern Recognition, vol. 41, iss. 2, pp. 506-520, 2008.&lt;/a&gt;

you can find a validity index specific to SVC. I used it to develop a new stopping criterion for SVC (not available in the software you are using).

So, you have to run SVC and K-means several times, each time with different parameter settings, and then evaluate the results. The better the index value, the better the clustering. Since you probably don't know the number of clusters and K-means needs it, a classical approach is to run K-means with different numbers of clusters as input and then choose the instance that yields the best validity index value. As far as SVC is concerned, you can try different combinations of q/C/kernel and choose the instance that yields the best value of the validity index.

I hope I was clear.

Best,
  VR.</description>
		<content:encoded><![CDATA[<p>Dear Lawrence, </p>
<p>   welcome back. </p>
<p>  Since your datasets are unlabeled, the first thing you need is a criterion for evaluating the quality of the clustering results. The most effective way to evaluate clustering results when data are unlabeled is a relative criterion (aka validity index).</p>
<p>In Section 3.3.3 of my thesis I present some classical validity indices, and in </p>
<p><a href="http://portal.acm.org/citation.cfm?id=1294369.1294523&#038;coll=GUIDE&#038;dl=GUIDE&#038;CFID=15151515&#038;CFTOKEN=6184618" rel="nofollow">J. Wang and J. Chiang, &#8220;A cluster validity measure with a hybrid parameter search method for the support vector clustering algorithm,&#8221; Pattern Recognition, vol. 41, iss. 2, pp. 506-520, 2008.</a></p>
<p>you can find a validity index specific to SVC. I used it to develop a new stopping criterion for SVC (not available in the software you are using).</p>
<p>So, you have to run SVC and K-means several times, each time with different parameter settings, and then evaluate the results. The better the index value, the better the clustering. Since you probably don&#8217;t know the number of clusters and K-means needs it, a classical approach is to run K-means with different numbers of clusters as input and then choose the instance that yields the best validity index value. As far as SVC is concerned, you can try different combinations of q/C/kernel and choose the instance that yields the best value of the validity index.</p>
<p>I hope I was clear.</p>
<p>Best,<br />
  VR.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Lawrence</title>
		<link>http://thesis.neminis.org/2007/12/03/support-vector-clustering-code/#comment-7695</link>
		<dc:creator>Lawrence</dc:creator>
		<pubDate>Thu, 17 Apr 2008 07:36:46 +0000</pubDate>
		<guid>http://thesis.neminis.org/2007/12/03/support-vector-clustering-code/#comment-7695</guid>
		<description>Dear Vincenzo Russo， 

   I notice that in your thesis you compared the performance of different clustering methods using real-life benchmarks such as the Iris data, the Wisconsin breast cancer database, and the Wine Recognition Database. Camastra (2005) also compared current clustering methods, including K-means, Neural Gas, Self-Organizing Map (SOM), the spectral clustering algorithm, and SVC, on three real-life benchmarks. However, class information is already available for these datasets; that is, we already know the class to which each observation belongs. It is fine, and indeed necessary, to use such datasets to compare the results of clustering methods. If I want to compare the performance of SVC and K-means on a dataset with no class information known in advance, do you have any suggestions?


   Many thanks and Best Regards

   Lawrence</description>
		<content:encoded><![CDATA[<p>Dear Vincenzo Russo， </p>
<p>   I notice that in your thesis you compared the performance of different clustering methods using real-life benchmarks such as the Iris data, the Wisconsin breast cancer database, and the Wine Recognition Database. Camastra (2005) also compared current clustering methods, including K-means, Neural Gas, Self-Organizing Map (SOM), the spectral clustering algorithm, and SVC, on three real-life benchmarks. However, class information is already available for these datasets; that is, we already know the class to which each observation belongs. It is fine, and indeed necessary, to use such datasets to compare the results of clustering methods. If I want to compare the performance of SVC and K-means on a dataset with no class information known in advance, do you have any suggestions?</p>
<p>   Many thanks and Best Regards</p>
<p>   Lawrence</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Vincenzo Russo</title>
		<link>http://thesis.neminis.org/2007/12/03/support-vector-clustering-code/#comment-4624</link>
		<dc:creator>Vincenzo Russo</dc:creator>
		<pubDate>Mon, 07 Apr 2008 14:44:55 +0000</pubDate>
		<guid>http://thesis.neminis.org/2007/12/03/support-vector-clustering-code/#comment-4624</guid>
		<description>Dear Lawrence, 

  The -s switch is meant to always be used with the value 0.5. It is just a heuristic that allows the 'q' value sequence to be explored more smoothly. After a number of trials, 0.5 turned out to be the most suitable value.

My advice is to perform the experiments both with and without the softening strategy heuristic, even though the softening should result in more accurate clustering.

The second question: SVC is a hierarchical-like clustering, so it is a so-called non-parametric clustering algorithm. It DOES NOT use the number of clusters as an input parameter for determining the clustering. Actually, it should not use it at all. In this version of my software such a parameter is a 'dirty trick' and serves as a stopping criterion for datasets whose exact number of clusters is known.

However, SVC may not find the number of clusters you expect. This is because it auto-detects the latent structure, and it is not perfect.

In such cases you need to try different parameter combinations: different values of 'C' set manually, different kernels (with the -k switch you can use the Gaussian, the Laplace, or the Exponential kernel in my software), and different distance metrics (with the -l switch you can use either the L1 or the L2 distance).

Only the 'q' value has to be found automatically by the SVC software in each of the setups you try.


Best Regards, 

    VR</description>
		<content:encoded><![CDATA[<p>Dear Lawrence, </p>
<p>  The -s switch is meant to always be used with the value 0.5. It is just a heuristic that allows the &#8216;q&#8217; value sequence to be explored more smoothly. After a number of trials, 0.5 turned out to be the most suitable value. </p>
<p>My advice is to perform the experiments both with and without the softening strategy heuristic, even though the softening should result in more accurate clustering.</p>
<p>The second question: SVC is a hierarchical-like clustering, so it is a so-called non-parametric clustering algorithm. It DOES NOT use the number of clusters as an input parameter for determining the clustering. Actually, it should not use it at all. In this version of my software such a parameter is a &#8216;dirty trick&#8217; and serves as a stopping criterion for datasets whose exact number of clusters is known.</p>
<p>However, SVC may not find the number of clusters you expect. This is because it auto-detects the latent structure, and it is not perfect. </p>
<p>In such cases you need to try different parameter combinations: different values of &#8216;C&#8217; set manually, different kernels (with the -k switch you can use the Gaussian, the Laplace, or the Exponential kernel in my software), and different distance metrics (with the -l switch you can use either the L1 or the L2 distance).</p>
<p>Only the &#8216;q&#8217; value has to be found automatically by the SVC software in each of the setups you try.</p>
<p>Best Regards, </p>
<p>    VR</p>
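<p>As a small illustration of the advice to try both variants, the sketch below only assembles the two command lines, using nothing beyond the -c, -s and -f switches that appear in this thread. The helper name is hypothetical, and the commands are printed rather than executed, since the behaviour of the svc binary is not documented here.</p>

```python
import shlex


def svc_commands(data_path, n_clusters, softening=0.5):
    """Build two svc command lines: without and with the -s softening heuristic."""
    base = ["svc", "-c", str(n_clusters)]
    plain = base + ["-f", data_path]
    softened = base + ["-s", str(softening), "-f", data_path]
    return plain, softened


for argv in svc_commands("/path/to/iris-file", 3):
    # Printed only; to execute, pass argv to subprocess.run(argv, check=True).
    print(shlex.join(argv))
```

<p>Keeping the commands as argument lists makes it straightforward to repeat the same pair of runs for every dataset and compare the resulting clusterings side by side.</p>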
]]></content:encoded>
	</item>
	<item>
		<title>By: Lawrence</title>
		<link>http://thesis.neminis.org/2007/12/03/support-vector-clustering-code/#comment-4477</link>
		<dc:creator>Lawrence</dc:creator>
		<pubDate>Mon, 07 Apr 2008 03:29:11 +0000</pubDate>
		<guid>http://thesis.neminis.org/2007/12/03/support-vector-clustering-code/#comment-4477</guid>
		<description>Dear Vincenzo Russo，

   Why is the number of clusters usually not equal to the specified c value? Suppose I already know the number of clusters from past literature, say 6, and I want to obtain that kind of segmentation. Even when the c value is set to 6, the result usually does not have 6 clusters. What's the problem? Do you have any suggestions for adjusting the parameters q/C or s to obtain the specified number of clusters?


   Many thanks and Best Regards

   Lawrence</description>
		<content:encoded><![CDATA[<p>Dear Vincenzo Russo，</p>
<p>   Why is the number of clusters usually not equal to the specified c value? Suppose I already know the number of clusters from past literature, say 6, and I want to obtain that kind of segmentation. Even when the c value is set to 6, the result usually does not have 6 clusters. What&#8217;s the problem? Do you have any suggestions for adjusting the parameters q/C or s to obtain the specified number of clusters?</p>
<p>   Many thanks and Best Regards</p>
<p>   Lawrence</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Lawrence</title>
		<link>http://thesis.neminis.org/2007/12/03/support-vector-clustering-code/#comment-4359</link>
		<dc:creator>Lawrence</dc:creator>
		<pubDate>Sun, 06 Apr 2008 18:01:14 +0000</pubDate>
		<guid>http://thesis.neminis.org/2007/12/03/support-vector-clustering-code/#comment-4359</guid>
		<description>Dear Vincenzo Russo， 

  I quite agree with you that research activity should be open and collaborative. Although many researchers have worked on SVC and its applications, no code or software other than yours can be found on the internet. I really appreciate your work and kindness.

  One more thing. I checked this myself following your instructions. I ran svc in this way: svc -c 3 -f /path/to/iris-file, and obtained the q value 0.163274. When I ran svc in another way: svc -c 3 -s 0.5 -f /path/to/iris-file, I obtained the q/C value you suggested. The problem is how to choose the s value. I also tested other s values in this way, and the output was not as good as with s=0.5. Should I also use this s value when I apply svc to my own dataset? Or can the q value (0.0891501) be obtained with (svc -c 3 -f /path/to/iris-file)?

   
    Best Regards

    Lawrence</description>
		<content:encoded><![CDATA[<p>Dear Vincenzo Russo， </p>
<p>  I quite agree with you that research activity should be open and collaborative. Although many researchers have worked on SVC and its applications, no code or software other than yours can be found on the internet. I really appreciate your work and kindness.</p>
<p>  One more thing. I checked this myself following your instructions. I ran svc in this way: svc -c 3 -f /path/to/iris-file, and obtained the q value 0.163274. When I ran svc in another way: svc -c 3 -s 0.5 -f /path/to/iris-file, I obtained the q/C value you suggested. The problem is how to choose the s value. I also tested other s values in this way, and the output was not as good as with s=0.5. Should I also use this s value when I apply svc to my own dataset? Or can the q value (0.0891501) be obtained with (svc -c 3 -f /path/to/iris-file)?</p>
<p>    Best Regards</p>
<p>    Lawrence</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Vincenzo Russo</title>
		<link>http://thesis.neminis.org/2007/12/03/support-vector-clustering-code/#comment-4334</link>
		<dc:creator>Vincenzo Russo</dc:creator>
		<pubDate>Sun, 06 Apr 2008 16:28:07 +0000</pubDate>
		<guid>http://thesis.neminis.org/2007/12/03/support-vector-clustering-code/#comment-4334</guid>
		<description>Dear Lawrence, 

 thank you for the greetings, I'm glad to be of help. 

In my opinion, research activity should be open and collaborative. I can't understand a closed research environment, because I think it is a model that contrasts with the idea of research itself. While I was working on my master's thesis, I encountered many people with no intention of sharing anything at all. Unbelievable, from my perspective. I like to share, I need to share, I want to share.

This is why I wrote the SVC software and put it online. And this is why I will put online the future versions too.

The PhD in London will start in October. In the meantime, I am in my city (Naples, Italy) and I collaborate with my University (Federico II, Naples, Italy).

Oh, thanks also for your promise of a citation and an acknowledgement. I'm glad of this.


Well, keep in touch.
If you have any questions, feel free to write to me again.

Best regards, 

   VR</description>
		<content:encoded><![CDATA[<p>Dear Lawrence, </p>
<p> thank you for the greetings, I&#8217;m glad to be of help. </p>
<p>In my opinion, research activity should be open and collaborative. I can&#8217;t understand a closed research environment, because I think it is a model that contrasts with the idea of research itself. While I was working on my master&#8217;s thesis, I encountered many people with no intention of sharing anything at all. Unbelievable, from my perspective. I like to share, I need to share, I want to share.</p>
<p>This is why I wrote the SVC software and put it online. And this is why I will put online the future versions too.</p>
<p>The PhD in London will start in October. In the meantime, I am in my city (Naples, Italy) and I collaborate with my University (Federico II, Naples, Italy). </p>
<p>Oh, thanks also for your promise of a citation and an acknowledgement. I&#8217;m glad of this.</p>
<p>Well, keep in touch.<br />
If you have any questions, feel free to write to me again. </p>
<p>Best regards, </p>
<p>   VR</p>
]]></content:encoded>
	</item>
</channel>
</rss>
