<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Thesis Neminis &#187; Co-clustering</title>
	<atom:link href="http://thesis.neminis.org/category/co-clustering/feed/" rel="self" type="application/rss+xml" />
	<link>http://thesis.neminis.org</link>
	<description>Diario di lavoro della tesi di Vincenzo Russo / Work-log of Vincenzo Russo’s Thesis</description>
	<lastBuildDate>Mon, 04 Apr 2011 09:06:36 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Co-clustering software</title>
		<link>http://thesis.neminis.org/2007/12/03/co-clustering-softwares/</link>
		<comments>http://thesis.neminis.org/2007/12/03/co-clustering-softwares/#comments</comments>
		<pubDate>Mon, 03 Dec 2007 11:45:13 +0000</pubDate>
		<dc:creator>vincenzo russo</dc:creator>
				<category><![CDATA[Co-clustering]]></category>
		<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://thesis.neminis.org/2007/12/03/co-clustering-softwares/</guid>
		<description><![CDATA[The first co-clustering software is Co-cluster, developed at the University of Texas at Austin. The version you can download here is 1.1, which is also available at the original web page. The package hosted here includes a patch to &#8230; <a href="http://thesis.neminis.org/2007/12/03/co-clustering-softwares/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>The first co-clustering software is <a href="http://thesis.neminis.org/wp-content/plugins/downloads-manager/upload/Software_cocluster.tar.bz2">Co-cluster</a>, developed at the University of Texas at Austin. The version you can download here is 1.1, which is also available <a href="http://www.cs.utexas.edu/users/dml/Software/cocluster.html">at the original web page</a>.</p>
<p>The package hosted here includes a patch that allows the software to compile with gcc 4.0 as well, and hence on modern Linux and Mac OS X systems. It also contains some bash scripts (*.sh) to analyze co-clustering results and produce clustering quality measures with respect to labeled datasets.</p>
<p>The original software is released under the GPL license, and so is this package.</p>
<p><strong>Download</strong></p>
<p><a href="http://thesis.neminis.org/wp-content/plugins/downloads-manager/upload/Software_cocluster.tar.bz2">Co-clustering code</a></p>
<hr />
<p>The original version of the second co-clustering software is available <a href="http://www.cs.utexas.edu/~hntuyen/projects/dm/">here</a>; it implements all six approximation schemes for co-clustering, for both the Euclidean distance and the I-divergence.</p>
<p>The package hosted here also includes the same bash scripts as the aforementioned Co-cluster package.</p>
<p>No license information was included in the original Bregman co-clustering package, but it seems to be a fork of the <a href="http://www.cs.utexas.edu/users/dml/Software/coclusterOld.html">Co-cluster software v. 1.0</a>. The latter was released under the GPL license, so the Bregman co-clustering code should be under the same license.</p>
<p><strong>Download</strong></p>
<p><a href="http://thesis.neminis.org/wp-content/plugins/downloads-manager/upload/bregmanCocluster.tar.bz2">Bregman Co-clustering code</a></p>
]]></content:encoded>
			<wfw:commentRss>http://thesis.neminis.org/2007/12/03/co-clustering-softwares/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Bregman divergences, SVMs and possible implications</title>
		<link>http://thesis.neminis.org/2007/11/06/bregman-divergences-svms-and-possible-implications/</link>
		<comments>http://thesis.neminis.org/2007/11/06/bregman-divergences-svms-and-possible-implications/#comments</comments>
		<pubDate>Tue, 06 Nov 2007 00:26:03 +0000</pubDate>
		<dc:creator>vincenzo russo</dc:creator>
				<category><![CDATA[Bregman]]></category>
		<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Co-clustering]]></category>
		<category><![CDATA[SVC]]></category>
		<category><![CDATA[SVM]]></category>

		<guid isPermaLink="false">http://thesis.neminis.org/2007/11/06/bregman-divergences-svms-and-possible-implications/</guid>
		<description><![CDATA[In order to find a connection between the works studied (Bregman Co-clustering and Support Vector Clustering) we have performed some research. An interesting result is the following paper: The above paper generalizes the Minimum Enclosing Ball (MEB) problem to the &#8230; <a href="http://thesis.neminis.org/2007/11/06/bregman-divergences-svms-and-possible-implications/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>In order to find a connection between the works studied (Bregman Co-clustering and Support Vector Clustering) we have performed some research. An interesting result is the following paper:</p>
<ul>
</ul>
<p>The above paper generalizes the Minimum Enclosing Ball (MEB) problem to Bregman divergences and also provides a generalization of the Bâdoiu-Clarkson (BC) approximation algorithm. This is the same algorithm used in practice by the Core Vector Machines (CVMs).</p>
<ul>
</ul>
<p>CVMs reformulate SVMs as a MEB problem. Since they use the BC algorithm, and this algorithm has been generalized to Bregman divergences, research on vector machines could have interesting implications.</p>
]]></content:encoded>
			<wfw:commentRss>http://thesis.neminis.org/2007/11/06/bregman-divergences-svms-and-possible-implications/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>New talk on SVC and MBI Principle</title>
		<link>http://thesis.neminis.org/2007/10/14/new-talk-on-svc-and-mbi-principle/</link>
		<comments>http://thesis.neminis.org/2007/10/14/new-talk-on-svc-and-mbi-principle/#comments</comments>
		<pubDate>Sun, 14 Oct 2007 12:12:15 +0000</pubDate>
		<dc:creator>vincenzo russo</dc:creator>
				<category><![CDATA[Bregman]]></category>
		<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Co-clustering]]></category>
		<category><![CDATA[Kernel Width Estimation]]></category>
		<category><![CDATA[Missing values]]></category>
		<category><![CDATA[SVC]]></category>
		<category><![CDATA[SVM]]></category>

		<guid isPermaLink="false">http://thesis.neminis.org/2007/10/14/new-talk-on-svc-and-mbi-principle/</guid>
		<description><![CDATA[The slides entitled &#8220;Novel Clustering Techniques: Support Vector Methods and Minimum Bregman Information principle&#8221; are available in the Documents section. SVC is explained in more detail because it is still a very experimental technique.]]></description>
			<content:encoded><![CDATA[<p><a href="http://thesis.neminis.org/wp-content/plugins/downloads-manager/upload/slide-3.pdf">The slides</a> entitled &#8220;<em>Novel Clustering Techniques: Support Vector Methods and Minimum Bregman Information principle</em>&#8221; are available in the <em><a href="http://thesis.neminis.org/documenti/">Documents</a></em> section.</p>
<p>SVC is explained in more detail because it is still a very experimental technique.</p>
]]></content:encoded>
			<wfw:commentRss>http://thesis.neminis.org/2007/10/14/new-talk-on-svc-and-mbi-principle/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Euclidean Co-clustering Scheme 2 without Feature Clustering is K-means</title>
		<link>http://thesis.neminis.org/2007/10/10/euclidean-co-clustering-scheme-2-without-feature-clustering-is-k-means/</link>
		<comments>http://thesis.neminis.org/2007/10/10/euclidean-co-clustering-scheme-2-without-feature-clustering-is-k-means/#comments</comments>
		<pubDate>Wed, 10 Oct 2007 13:59:55 +0000</pubDate>
		<dc:creator>vincenzo russo</dc:creator>
				<category><![CDATA[Bregman]]></category>
		<category><![CDATA[Co-clustering]]></category>

		<guid isPermaLink="false">http://thesis.neminis.org/2007/10/10/euclidean-co-clustering-scheme-2-without-feature-clustering-is-k-means/</guid>
		<description><![CDATA[In the previous posts, we presented the results of some experiments on the robustness of SVC and Co-clustering to missing values. A note about Co-clustering is in order: Scheme 2 of Bregman Co-clustering, without feature clustering and with Euclidean &#8230; <a href="http://thesis.neminis.org/2007/10/10/euclidean-co-clustering-scheme-2-without-feature-clustering-is-k-means/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>In the previous posts, we presented the results of some experiments on the robustness of SVC and Co-clustering to missing values.</p>
<p>A note about Co-clustering is in order: <strong>Scheme 2 of Bregman Co-clustering, without feature clustering and with the Euclidean distance, is equivalent to the K-means algorithm</strong>.</p>
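<p>To make the comparison concrete, here is a minimal sketch of the K-means (Lloyd) iteration. It is an illustration only, not code from the Bregman co-clustering package; the function name <code>kmeans</code> and its parameters are hypothetical. The assumption behind the equivalence is that, when every feature forms its own cluster, the co-cluster means of a row cluster are simply the components of its centroid, so the row-assignment step under the squared Euclidean distance is the K-means assignment step.</p>

```python
def kmeans(rows, initial_centroids, iterations=10):
    """Plain Lloyd iteration with squared Euclidean distance (illustrative sketch)."""
    centroids = [list(c) for c in initial_centroids]  # do not mutate the caller's list
    labels = []
    for _ in range(iterations):
        # Assignment step: each row goes to the nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((x - c) ** 2 for x, c in zip(row, centroids[k])))
            for row in rows
        ]
        # Update step: each centroid becomes the per-feature mean of its members.
        for k in range(len(centroids)):
            members = [row for row, lab in zip(rows, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

<p>The per-feature mean in the update step plays the role of the co-cluster mean when each column is a singleton feature cluster.</p>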
]]></content:encoded>
			<wfw:commentRss>http://thesis.neminis.org/2007/10/10/euclidean-co-clustering-scheme-2-without-feature-clustering-is-k-means/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Induced Missing Values Experiments &#8211; Stage 2</title>
		<link>http://thesis.neminis.org/2007/10/10/induced-missing-values-experiments-stage-2/</link>
		<comments>http://thesis.neminis.org/2007/10/10/induced-missing-values-experiments-stage-2/#comments</comments>
		<pubDate>Wed, 10 Oct 2007 10:12:09 +0000</pubDate>
		<dc:creator>vincenzo russo</dc:creator>
				<category><![CDATA[Benchmark]]></category>
		<category><![CDATA[Bregman]]></category>
		<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Co-clustering]]></category>
		<category><![CDATA[Missing values]]></category>
		<category><![CDATA[SVC]]></category>
		<category><![CDATA[Test]]></category>

		<guid isPermaLink="false">http://thesis.neminis.org/2007/10/10/induced-missing-values-experiments-stage-2/</guid>
		<description><![CDATA[This is the continuation of the experiments started a few days ago. Two more datasets have been involved in this type of experiment. Both are astrophysics datasets, more precisely two datasets containing stars and galaxies. Star/galaxy separation is a &#8230; <a href="http://thesis.neminis.org/2007/10/10/induced-missing-values-experiments-stage-2/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>This is the continuation of the experiments <a href="http://thesis.neminis.org/2007/10/01/induced-missing-values-experiments-stage-1/">started a few days ago</a>.</p>
<p>Two more datasets have been involved in this type of experiment. Both are astrophysics datasets, more precisely two datasets containing stars and galaxies.</p>
<p>Star/galaxy separation is a problem usually tackled with supervised learning methodologies. In our work several clustering tests are conducted on this type of data.</p>
<p>These two datasets were chosen to be quite simple to separate, because we are interested in robustness with respect to missing values.</p>
<p>Starting from the original datasets, I created eight variants of each, as follows:</p>
<ul>
<li>4 variants affecting only 3 features out of 15, with 5, 10, 20, 30 percent of objects reporting missing values for all of the 3 features, respectively</li>
<li>4 variants affecting 6 features out of 15, with 5, 10, 20, 30 percent of objects reporting missing values for all of the 6 features, respectively</li>
</ul>
<p>The experiments were done with Euclidean Co-clustering (Information-theoretic Co-clustering cannot work with negative values) and SVC.</p>
<p>An archive with all the results <a href="http://thesis.neminis.org/wp-content/plugins/downloads-manager/upload/induced-missing-values-experiments.zip">is available for download</a> (it also contains the results of the <a href="http://thesis.neminis.org/2007/10/01/induced-missing-values-experiments-stage-1/">previous stage</a>).</p>
<p>In the files above:</p>
<p>- &#8220;MV&#8221; stands for &#8220;Missing Values&#8221;<br />
- &#8220;FC&#8221; stands for &#8220;Feature Clusters&#8221;<br />
- FC1 means no feature clustering<br />
- FC2 means two feature clusters requested<br />
- FC3 means three feature clusters requested<br />
- CC stands for Co-clustering</p>
]]></content:encoded>
			<wfw:commentRss>http://thesis.neminis.org/2007/10/10/induced-missing-values-experiments-stage-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Induced Missing Values Experiments &#8211; Stage 1</title>
		<link>http://thesis.neminis.org/2007/10/01/induced-missing-values-experiments-stage-1/</link>
		<comments>http://thesis.neminis.org/2007/10/01/induced-missing-values-experiments-stage-1/#comments</comments>
		<pubDate>Mon, 01 Oct 2007 19:53:38 +0000</pubDate>
		<dc:creator>vincenzo russo</dc:creator>
				<category><![CDATA[Benchmark]]></category>
		<category><![CDATA[Co-clustering]]></category>
		<category><![CDATA[Missing values]]></category>
		<category><![CDATA[SVC]]></category>
		<category><![CDATA[Test]]></category>

		<guid isPermaLink="false">http://thesis.neminis.org/2007/10/01/induced-missing-values-experiments-stage-1/</guid>
		<description><![CDATA[A few days ago I prepared a tool to induce pseudo-random missing values in datasets. This tool allows us to test the robustness of both Bregman Co-clustering and SVC with respect to missing values. The tool accepts two parameters: the &#8230; <a href="http://thesis.neminis.org/2007/10/01/induced-missing-values-experiments-stage-1/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>A few days ago I prepared a tool to induce pseudo-random missing values in datasets. This tool allows us to test the robustness of both Bregman Co-clustering and SVC with respect to missing values.</p>
<p>The tool accepts two parameters: the fraction of objects that will be affected by the process, and the list of features involved.</p>
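<p>The idea can be sketched as follows. This is not the actual tool, only a minimal illustration of the two parameters described above; the function name <code>induce_missing</code>, the <code>seed</code> and <code>missing</code> arguments, and the use of 0-based feature indices are all assumptions made for the example.</p>

```python
import random


def induce_missing(rows, fraction, features, seed=None, missing=None):
    """Return a copy of rows where a random fraction of the objects has the
    listed features (0-based indices) replaced by the missing marker."""
    rng = random.Random(seed)  # a fixed seed makes the pseudo-random choice repeatable
    n_affected = round(len(rows) * fraction)
    affected = set(rng.sample(range(len(rows)), n_affected))
    result = []
    for i, row in enumerate(rows):
        new_row = list(row)
        if i in affected:
            # Every listed feature is blanked for an affected object.
            for f in features:
                new_row[f] = missing
        result.append(new_row)
    return result
```

<p>For instance, <code>induce_missing(data, 0.10, [2, 3])</code> would blank the third and fourth features in 10% of the objects, leaving the input list untouched.</p>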
<p>As is my custom, I started this series of experiments with the IRIS data. So, I created these IRIS dataset variants:</p>
<p>- IRIS 5a: 5% of objects with missing values. One feature (#3) involved.<br />
- IRIS 5b: 5% of objects with missing values. Two features (#3, #4) involved.<br />
- IRIS 10a: 10% of objects with missing values. One feature (#3) involved.<br />
- IRIS 10b: 10% of objects with missing values. Two features (#3, #4) involved.<br />
- IRIS 20a: 20% of objects with missing values. One feature (#3) involved.<br />
- IRIS 20b: 20% of objects with missing values. Two features (#3, #4) involved.<br />
- IRIS 30a: 30% of objects with missing values. One feature (#3) involved.<br />
- IRIS 30b: 30% of objects with missing values. Two features (#3, #4) involved.</p>
<p>Recall that the IRIS data has 4 features.</p>
<p><a href="http://thesis.neminis.org/wp-content/plugins/downloads-manager/upload/missing-values-experiments-iris.zip">Here you can download the results</a>.</p>
<p>The experiments were done with Co-clustering and SVC. Information-theoretic co-clustering results are not in the files above, because they were irrelevant (very poor performance).</p>
<p>In the files above:</p>
<p>- &#8220;MV&#8221; stands for &#8220;Missing Values&#8221;<br />
- &#8220;FC&#8221; stands for &#8220;Feature Clusters&#8221;<br />
- FC1 means no feature clustering<br />
- FC2 means two feature clusters requested<br />
- CC stands for Co-clustering</p>
]]></content:encoded>
			<wfw:commentRss>http://thesis.neminis.org/2007/10/01/induced-missing-values-experiments-stage-1/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>High dimensionality: Co-clustering + SVC</title>
		<link>http://thesis.neminis.org/2007/08/14/high-dimensionality-co-clustering-svc/</link>
		<comments>http://thesis.neminis.org/2007/08/14/high-dimensionality-co-clustering-svc/#comments</comments>
		<pubDate>Tue, 14 Aug 2007 17:17:23 +0000</pubDate>
		<dc:creator>vincenzo russo</dc:creator>
				<category><![CDATA[Benchmark]]></category>
		<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Co-clustering]]></category>
		<category><![CDATA[Missing values]]></category>
		<category><![CDATA[SVC]]></category>
		<category><![CDATA[Test]]></category>

		<guid isPermaLink="false">http://thesis.neminis.org/2007/08/14/high-dimensionality-co-clustering-svc/</guid>
		<description><![CDATA[One of the goals of the thesis is to achieve good performance on high-dimensional datasets. The other is to combine the two techniques studied in a concrete way and with good results. The test described below involves both points. The &#8230; <a href="http://thesis.neminis.org/2007/08/14/high-dimensionality-co-clustering-svc/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>One of the goals of the thesis is to achieve good performance on high-dimensional datasets. The other is to combine the two techniques studied in a concrete way and with good results.</p>
<p>The test described below involves both points.</p>
<p>The dataset used is the Internet Advertisements dataset from the UCI Machine Learning Repository. I quote its description:</p>
<blockquote><p>This dataset represents a set of possible advertisements on Internet pages. The features encode the geometry of the image (if available) as well as phrases occuring in the URL, the image&#8217;s URL and alt text, the anchor text, and words occuring near the anchor text. The task is to predict whether an image is an advertisement (&#8220;ad&#8221;) or not (&#8220;nonad&#8221;).</p>
<p>Number of Instances: 3279 (2821 nonads, 458 ads)<br />
Number of Attributes: 1558</p>
<p>One or more of the first three features are missing in 28% of the instances.
</p></blockquote>
<p>We are therefore dealing with a 1558-dimensional dataset that also contains missing values.</p>
<p><strong>Co-clustering</strong></p>
<p>The test performed with Co-clustering alone highlights the advantage of this technique&#8217;s implicit ability to reduce dimensionality by performing feature clustering together with data clustering.</p>
<p>Running the test with scheme 5 and the Euclidean distance, without requesting feature clustering, yields the following:</p>
<blockquote><p>Class 1<br />
        TP: 223 FP: 98<br />
        FN: 223         TN: 2735<br />
Class 2<br />
        TP: 98  FP: 223<br />
        FN: 98  TN: 2860</p>
<p>Class 1 &#8211; Precision: 69.47% Recall: 50% F1: 58.148%<br />
Class 2 &#8211; Precision: 30.53% Recall: 50% F1: 37.912%</p>
<p>Accuracy: 9.79%</p></blockquote>
<p>The same scheme, but with feature clustering (10 feature clusters requested), yields the following:</p>
<blockquote><p>Class 1<br />
        TP: 223 FP: 98<br />
        FN: 236         TN: 2722<br />
Class 2<br />
        TP: 2722        FP: 236<br />
        FN: 98  TN: 223</p>
<p>Class 1 &#8211; Precision: 69.47% Recall: 48.584% F1: 57.18%<br />
Class 2 &#8211; Precision: 92.022% Recall: 96.525% F1: 94.22%</p>
<p>Accuracy: 89.814%</p></blockquote>
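<p>For reference, the figures in these reports follow from the confusion-matrix counts in the usual way. The sketch below uses hypothetical helper names (<code>class_metrics</code>, <code>accuracy</code>) and is not taken from the analysis scripts.</p>

```python
def class_metrics(tp, fp, fn):
    """Precision, recall and F1 for one class, from its confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def accuracy(correct, total):
    """Fraction of objects assigned to the right class."""
    return correct / total


# Class 1 of the run with 10 feature clusters: TP 223, FP 98, FN 236.
p, r, f1 = class_metrics(223, 98, 236)
# Correctly assigned objects: 223 (class 1) + 2722 (class 2) out of 3279.
acc = accuracy(223 + 2722, 3279)
```

<p>Multiplied by 100, these values reproduce the 69.47%, 48.584%, 57.18% and 89.814% reported above.</p>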
<p>The dimensionality reduction paid off (recall that the dataset also contains missing values).</p>
<p><strong>SVC</strong></p>
<p>The available computational resources, together with the still unstable state of the software developed for SVC, did not allow us to test SVC directly in 1558 dimensions. In any case, the literature reports a strong degradation of performance as the dimensionality of the vector space grows, because a large number of support vectors (SVs) becomes necessary to describe the cluster boundaries.</p>
<p>Moreover, the literature also shows that PCA loses effectiveness when a certain number of components is needed.</p>
<p><strong>Co-clustering + SVC</strong></p>
<p>We know that Co-clustering, during the clustering process, computes an approximation of the original data matrix.</p>
<p>Therefore, we used Co-clustering requesting a co-clustering of size 3279 x 3, that is, a number of row clusters equal to the number of rows (hence the number of objects) and a number of column clusters considerably smaller than the original number of columns (3 feature clusters instead of 1558 features).</p>
<p>In this way Co-clustering computes an approximation matrix in which each row still represents a single object of the original matrix but, unlike the original, this matrix lives in 3 dimensions.</p>
<p>Recall also that the approximation process produces a matrix with no missing values.</p>
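<p>A rough sketch of what such a compressed matrix looks like is given below. It assumes that the feature clusters are already known and that each reduced entry is the mean of the observed values of one object within one feature cluster; the actual approximation computed by the co-clustering software depends on the scheme and divergence used, so this is an illustration only. The name <code>reduce_columns</code>, the use of <code>None</code> as missing marker and the fallback rule are hypothetical.</p>

```python
def reduce_columns(rows, col_labels, n_col_clusters):
    """Collapse each row to one value per feature cluster, skipping missing values."""
    # Pool the observed values of each feature cluster over all rows; the pooled
    # mean is used as a fallback when a row has nothing observed in a cluster.
    pooled = [[] for _ in range(n_col_clusters)]
    for row in rows:
        for value, label in zip(row, col_labels):
            if value is not None:
                pooled[label].append(value)
    fallback = [sum(v) / len(v) if v else 0.0 for v in pooled]

    reduced = []
    for row in rows:
        groups = [[] for _ in range(n_col_clusters)]
        for value, label in zip(row, col_labels):
            if value is not None:
                groups[label].append(value)
        reduced.append([
            sum(g) / len(g) if g else fallback[k]
            for k, g in enumerate(groups)
        ])
    return reduced
```

<p>Every row of the result has exactly <code>n_col_clusters</code> entries and none of them is missing, which is what makes the matrix usable as SVC input.</p>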
<p>We then took this approximated matrix and used it as input to SVC. The result was even better (although not by much) than the best obtained with Co-clustering:</p>
<blockquote><p>Class 0<br />
        TP: 218 FP: 70<br />
        FN: 241 TN: 2750</p>
<p>Precision: 75.6944 &#8211; Recall: 47.4946 &#8211; F1: 58.3668</p>
<p>Class 1<br />
        TP: 2750        FP: 241<br />
        FN: 70  TN: 218</p>
<p>Precision: 91.9425 &#8211; Recall: 97.5177 &#8211; F1: 94.6481</p>
<p>Accuracy: 90.5154</p></blockquote>
<p><strong>Machine time</strong></p>
<p>It is important to note that SVC took only 0.06 seconds in total to complete the process, finding the right Kernel Width value at the second cycle. Co-clustering instead took 117 seconds to complete (recall that this includes its clustering part and not only the computation of the approximated matrix, because the software currently does not allow the two operations to be separated).</p>
<p>We therefore have a total of 117.06 seconds.</p>
<p>SVC on a dataset of 3000 elements in 20 dimensions took about 400 seconds to complete two cycles. The time taken grows with the number of dimensions and also as the cycles progress, because higher Kernel Width values slow down the domain description process. It is therefore easy to imagine how SVC alone would behave on a 1558-dimensional dataset.</p>
<p><strong>Conclusions</strong></p>
<p>Other similar tests had been performed on datasets too small to give significant results.<br />
This test shows how the two techniques can be combined, exploiting the ability of the former to reduce dimensionality and handle missing values, and the accuracy, outlier handling and support for arbitrarily shaped clusters of the latter.</p>
<p>Further tests will be conducted later.</p>
]]></content:encoded>
			<wfw:commentRss>http://thesis.neminis.org/2007/08/14/high-dimensionality-co-clustering-svc/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Co-clustering &#8211; Missing Values Experiments</title>
		<link>http://thesis.neminis.org/2007/08/06/co-clustering-missing-values-experiments/</link>
		<comments>http://thesis.neminis.org/2007/08/06/co-clustering-missing-values-experiments/#comments</comments>
		<pubDate>Mon, 06 Aug 2007 10:05:26 +0000</pubDate>
		<dc:creator>vincenzo russo</dc:creator>
				<category><![CDATA[Benchmark]]></category>
		<category><![CDATA[Bregman]]></category>
		<category><![CDATA[Co-clustering]]></category>
		<category><![CDATA[Missing values]]></category>
		<category><![CDATA[Test]]></category>

		<guid isPermaLink="false">http://thesis.neminis.org/2007/08/06/co-clustering-missing-values-experiments/</guid>
		<description><![CDATA[Given the complexity of running experiments with Co-clustering, and in order to run well-considered tests, I decided today to run tests on small datasets to understand which line to follow on more complicated datasets. &#8230; <a href="http://thesis.neminis.org/2007/08/06/co-clustering-missing-values-experiments/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Given the complexity of running experiments with Co-clustering, and in order to run well-considered tests, I decided today to run tests on small datasets to understand which line to follow on more complicated datasets.</p>
<p>With reference to the <a href="http://thesis.neminis.org/wp-content/plugins/downloads-manager/upload/experiments-results.pdf">preliminary experiments already performed</a>, the subject of the tests is once again the IRIS dataset.</p>
<p>The test in question was run with Squared Euclidean Co-clustering. In the preliminary tests, this instance of Co-clustering reached an accuracy ranging between 88% and 89%.</p>
<p>In today&#8217;s test, missing values were introduced into the IRIS dataset at random.</p>
<p>Given the data matrix representing the dataset (150 objects x 4 attributes), we introduced, in order, first 5, then 10 and then 20 percent of missing values.</p>
<p>In the first case the accuracy was 88.667%, practically unchanged.<br />
In the second case the accuracy was 84%.<br />
In the third case the accuracy was 81%.</p>
<p>Considering the loss of information introduced, Co-clustering still gave respectable results, with a loss of accuracy that is not linear in the number of missing values introduced.</p>
<p>Later today I will run other similar tests, then return to Prof. Longo&#8217;s Pandora Dataset and finally move on to a dataset of text documents, such as Reuters.</p>
]]></content:encoded>
			<wfw:commentRss>http://thesis.neminis.org/2007/08/06/co-clustering-missing-values-experiments/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Slides: Bregman Co-clustering and applications</title>
		<link>http://thesis.neminis.org/2007/08/04/slide-bregman-co-clustering-ed-applicazioni/</link>
		<comments>http://thesis.neminis.org/2007/08/04/slide-bregman-co-clustering-ed-applicazioni/#comments</comments>
		<pubDate>Sat, 04 Aug 2007 18:10:53 +0000</pubDate>
		<dc:creator>vincenzo russo</dc:creator>
				<category><![CDATA[Bregman]]></category>
		<category><![CDATA[Co-clustering]]></category>
		<category><![CDATA[Missing values]]></category>

		<guid isPermaLink="false">http://thesis.neminis.org/2007/08/04/slide-bregman-co-clustering-ed-applicazioni/</guid>
		<description><![CDATA[Excellent slides summarizing the characteristics and applications of the Bregman framework for Co-clustering and missing-value prediction. N.B. in the slides the term sparse matrix is used in its classical geometric sense. Download.]]></description>
			<content:encoded><![CDATA[<p><a href="http://hercules.ece.utexas.edu/~srujana/talks/bregcoclust.pdf">Excellent slides</a> summarizing the characteristics and applications of the Bregman framework for Co-clustering and missing-value prediction.</p>
<p><strong>N.B.</strong> in the slides the term <strong>sparse matrix</strong> is used in its classical geometric sense.</p>
<p><a href="http://hercules.ece.utexas.edu/~srujana/talks/bregcoclust.pdf">Download</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thesis.neminis.org/2007/08/04/slide-bregman-co-clustering-ed-applicazioni/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Pandora: new results</title>
		<link>http://thesis.neminis.org/2007/07/23/pandora-nuovi-risultati/</link>
		<comments>http://thesis.neminis.org/2007/07/23/pandora-nuovi-risultati/#comments</comments>
		<pubDate>Mon, 23 Jul 2007 09:33:04 +0000</pubDate>
		<dc:creator>vincenzo russo</dc:creator>
				<category><![CDATA[Astrophysics]]></category>
		<category><![CDATA[Benchmark]]></category>
		<category><![CDATA[Bregman]]></category>
		<category><![CDATA[Co-clustering]]></category>
		<category><![CDATA[Dataset]]></category>
		<category><![CDATA[Missing values]]></category>
		<category><![CDATA[Test]]></category>

		<guid isPermaLink="false">http://thesis.neminis.org/2007/07/23/pandora-nuovi-risultati/</guid>
		<description><![CDATA[Work commitments did not allow me to work full time on the Pandora Dataset test which, given the amount of data it produces at every run, also required writing analysis tools, albeit rather crude ones, &#8230; <a href="http://thesis.neminis.org/2007/07/23/pandora-nuovi-risultati/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Work commitments did not allow me to work full time on the Pandora Dataset test which, given the amount of data it produces at every run, also required writing analysis tools, albeit rather crude ones for the moment.</p>
<p>In any case, what can be provided at present is the number of objects per cluster, both for &#8220;cleaned Pandora&#8221; and for &#8220;original Pandora&#8221;. The number of clusters requested in both cases was 21, which should be a very good estimate of reality, following the repeated tests performed precisely to estimate this number. In any case, I repeat once again, the primary goal of these tests is to ensure a stable behavior of Co-clustering in the presence of objects with <em>missing values</em>. A refinement of the estimate of the number of clusters can be obtained, for example, by applying <em>relative criteria</em> to evaluate the results (<a href="http://thesis.neminis.org/2007/07/23/stesura-tesi-prima-bozza/">see the first thesis draft</a>).</p>
<p>The following refers to Information-Theoretic Co-clustering.</p>
<p><strong>Co-clustering of &#8220;cleaned Pandora&#8221; &#8211; 21 clusters requested</strong></p>
<p>   2724<br />
   5840<br />
   8064<br />
   8365<br />
  11825<br />
  12340<br />
  15119<br />
  15591<br />
  18449<br />
  19838<br />
  20086<br />
  22064<br />
  23863<br />
  25575<br />
  26577<br />
  26650<br />
  28016<br />
  30956<br />
  33215<br />
  40045<br />
  40746<br />
 435948 tot</p>
<p><strong>Co-clustering of &#8220;original Pandora&#8221; &#8211; 21 clusters requested</strong></p>
<p>666<br />
   5176<br />
   5999<br />
   9961<br />
  13076<br />
  13336<br />
  14091<br />
  16104<br />
  17879<br />
  18135<br />
  19523<br />
  20632<br />
  23703<br />
  25933<br />
  30699<br />
  31505<br />
  32621<br />
  33934<br />
  35085<br />
  38781<br />
  42432<br />
 449271 tot (the 13304 additional objects are those with missing values)</p>
<p>The following refers to Minimum Sum Squared Co-clustering (v. II)</p>
<p><strong>Co-clustering of &#8220;cleaned Pandora&#8221; &#8211; 21 clusters requested</strong></p>
<p> 2427<br />
   3375<br />
   8471<br />
  10682<br />
  11176<br />
  11521<br />
  11702<br />
  12682<br />
  13262<br />
  15596<br />
  16760<br />
  19127<br />
  20744<br />
  20886<br />
  24549<br />
  28716<br />
  29123<br />
  30408<br />
  38123<br />
  41477<br />
  65141<br />
 435948 tot</p>
<p><strong>Co-clustering of &#8220;original Pandora&#8221; &#8211; 21 clusters requested</strong></p>
<p>   1106<br />
   1441<br />
   2288<br />
   3350<br />
   3597<br />
   5492<br />
   8232<br />
   8329<br />
  13267<br />
  16174<br />
  19508<br />
  19696<br />
  21329<br />
  23955<br />
  24489<br />
  26848<br />
  29635<br />
  41147<br />
  55983<br />
  56935<br />
  66470<br />
 449271 tot</p>
<p>A further analysis tool is being developed for the cross-analysis of clusters.<br />
Since Co-clustering does not always produce the clusters in the same order, a more complex analysis of them is needed to compute a measure of how stable the co-clustering remained.</p>
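<p>One possible order-independent measure is the Rand index, which compares two clusterings pair by pair and therefore ignores how the clusters are numbered. The sketch below is only an illustration of that idea, not the tool under development; the function name <code>rand_index</code> is hypothetical, and the two label lists are assumed to refer to the same objects in the same order (for example, the objects common to both runs).</p>

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which two clusterings agree
    (both put the pair together, or both keep it apart)."""
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b
        total += 1
    return agree / total
```

<p>A value of 1.0 means the two clusterings are identical up to a renaming of the clusters. Enumerating all pairs is quadratic in the number of objects, so for a dataset of this size the same quantity would have to be computed from the contingency table of the two labelings instead.</p>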
<p>In addition, the tests are being rerun requesting 20 clusters because in the latest tests with 21 clusters, although there were no empty clusters, there was on average, over 20 iterations, 1 singleton cluster.</p>
]]></content:encoded>
			<wfw:commentRss>http://thesis.neminis.org/2007/07/23/pandora-nuovi-risultati/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

