July 16 2007

Pandora Dataset: first considerations

First of all, the initial operations on the Pandora dataset were those needed to prepare it for the experiment:

  1. Removal of some features (that is, some columns of the data matrix)
  2. Removal of the objects with missing values (13304 out of 449271, about 3%) in the remaining features (missing values were encoded as -9999.0)
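
The two preparation steps can be sketched as follows. This is only a minimal illustration, not the code used in the experiment: the toy matrix, the `keep_cols` indices and the function name `prepare` are assumptions made for the example; only the sentinel value -9999.0 comes from the notes.

```python
MISSING = -9999.0  # sentinel used for missing values in the notes above

def prepare(rows, keep_cols):
    """Keep only the selected feature columns, then drop every object
    (row) that still contains the missing-value sentinel."""
    reduced = [[row[c] for c in keep_cols] for row in rows]
    return [row for row in reduced if MISSING not in row]

# Toy matrix (hypothetical values): 4 objects, 3 features.
toy = [
    [1.0, 2.0, 3.0],
    [4.0, MISSING, 6.0],
    [7.0, 8.0, MISSING],
    [MISSING, 1.0, 2.0],
]
clean = prepare(toy, keep_cols=[0, 1])
print(len(toy) - len(clean), "objects removed,", len(clean), "kept")
```

Note that the order matters: an object is discarded only if it has a missing value in one of the features that are kept.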

Starting from a deliberately overestimated number of clusters, 50, we began running the co-clustering. A series of successive runs confirmed that 50 was indeed an overestimate, each run reporting how many clusters were left empty. That number was subtracted from 50 and the co-clustering was run again.

This step was repeated until the mean number of empty clusters over 20 iterations was at most 1. The estimated number of clusters thus appears to be 20-21.
Even with 24 requested clusters, the mean number of empty clusters over 20 iterations was still 1.5.
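
The refinement procedure described above can be sketched as a loop. The co-clustering run itself is abstracted as a hypothetical callable `count_empty(k)` that returns the number of empty clusters of one run with `k` requested clusters; `fake_count_empty` is a stand-in used only to make the sketch executable and has nothing to do with the real software. Rounding the mean before subtracting it is also an assumption of this sketch.

```python
import random

def estimate_k(count_empty, k_start=50, runs=20, max_mean_empty=1.0):
    """Shrink k until the mean number of empty clusters over `runs`
    executions is at most `max_mean_empty`."""
    k = k_start
    while k > 1:
        mean_empty = sum(count_empty(k) for _ in range(runs)) / runs
        if mean_empty <= max_mean_empty:
            return k, mean_empty
        # Assumption: subtract the rounded mean, but always at least 1.
        k -= max(1, round(mean_empty))
    return k, 0.0

# Stand-in for a real co-clustering run (purely illustrative): pretend
# the data holds 20 clusters, so extra clusters stay empty, give or take one.
rng = random.Random(0)

def fake_count_empty(k, true_k=20):
    return max(0, k - true_k + rng.choice([-1, 0, 1]))

k, mean_empty = estimate_k(fake_count_empty)
print(k, mean_empty)
```

Averaging over several runs matters because the initialization is random, so a single run may leave a different number of clusters empty each time.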

In any case, this first phase did not focus on the exact estimate of the number of clusters, but on comparing the behaviour of the tests run on the Pandora dataset cleaned of missing values with those run on the original one.

The same tests were therefore repeated on the original dataset, where the value -9999.0 was replaced with 0, as suggested in the literature (a series of trial runs that used the value -9999.0 directly led to the identification of only 2-3 clusters). The behaviour was practically the same: we obtained the same estimate of the number of clusters and a similar final value of the objective function.
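
The substitution of the sentinel can be sketched as below; the function name `zero_fill` and the toy values are assumptions made for illustration. A plausible explanation for the 2-3 clusters found when the sentinel is left in place (an interpretation, not something verified in these tests) is that a value as extreme as -9999.0 dominates any distance-based objective.

```python
MISSING = -9999.0

def zero_fill(rows, sentinel=MISSING):
    """Return a copy of the matrix with every sentinel replaced by 0.0,
    plus the (row, column) positions that were replaced."""
    filled, positions = [], []
    for i, row in enumerate(rows):
        new_row = []
        for j, value in enumerate(row):
            if value == sentinel:
                new_row.append(0.0)
                positions.append((i, j))
            else:
                new_row.append(value)
        filled.append(new_row)
    return filled, positions

toy = [[1.5, MISSING], [MISSING, 2.5], [3.0, 4.0]]  # hypothetical values
filled, positions = zero_fill(toy)
print(filled)     # [[1.5, 0.0], [0.0, 2.5], [3.0, 4.0]]
print(positions)  # [(0, 1), (1, 0)]
```

Keeping track of the replaced positions makes it possible to tell a genuine 0 from a filled-in one later on.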

Moreover, a quick inspection of the clusters in both tests shows that their content is very similar, which tells us that the quality of the clustering was not seriously affected by the presence of objects with missing values.

The next steps are:

  1. Run the tests again, this time increasing the number of steps of the local search algorithm, in order to reach better local minima and therefore a more reliable final estimate of the number of clusters.
  2. Analyse the co-clustering output in depth, in order to compare accurately the clusters obtained from the cleaned dataset with those obtained from the uncleaned one.
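
One possible way to carry out the comparison in step 2 is a contingency table between the two label assignments, restricted to the objects present in both datasets. The sketch below is an assumption about how this could be done, not the analysis actually performed; the labels are hypothetical. Cluster identifiers are arbitrary in each run, so every baseline cluster is matched to the cluster of the other run that shares the most objects with it.

```python
from collections import Counter

def contingency(labels_a, labels_b):
    """Count how many objects fall in cluster a of the first clustering
    and in cluster b of the second one."""
    return Counter(zip(labels_a, labels_b))

def best_match_overlap(labels_a, labels_b):
    """For each cluster of the baseline (first) clustering, find the
    cluster of the second clustering that shares the most objects and
    the fraction of the baseline cluster that it covers."""
    table = contingency(labels_a, labels_b)
    sizes = Counter(labels_a)
    result = {}
    for a in sizes:
        b, shared = max(
            ((b, n) for (a2, b), n in table.items() if a2 == a),
            key=lambda pair: pair[1],
        )
        result[a] = (b, shared / sizes[a])
    return result

# Hypothetical labels for the same 8 objects in the two experiments.
baseline = [0, 0, 0, 1, 1, 1, 2, 2]
other = [5, 5, 5, 3, 3, 5, 4, 4]
overlap = best_match_overlap(baseline, other)
print(overlap)
```

A fraction close to 1 for every baseline cluster would support the impression that the two clusterings have very similar content.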

July 13 2007

Astrophysics Dataset: Pandora

I am starting work on the Pandora dataset provided by Prof. Longo, following his guidelines.

  • A subset of the columns will be used
  • A first clustering will be performed after removing the missing values from the dataset
  • A second clustering will be performed on the uncleaned dataset
  • The two clusterings will be compared, using the first as the reference baseline.
  • More details on the dataset will be available as soon as possible.

    The clustering will be carried out with Bregman co-clustering, in order to deal with the missing-values problem.
    The medoid/centroid update method will be Local Search, which helps to escape poor local minima and, starting from an overestimated initial number of clusters, lets us discover the actual number of clusters (or, in hard cases, a good approximation of it) by successive refinements.
    In this experiment the initialization of the co-clustering will be left random.

    In later trials we will try the spectral initialization proposed in

    • H. Cho, I. Dhillon, Y. Guan, and S. Sra, "Minimum sum squared residue co-clustering of gene expression data," in Proceedings of the Fourth SIAM International Conference on Data Mining, 2004, pp. 114-125.
      @inproceedings{cho04minimum,
        author = {H. Cho and I. Dhillon and Y. Guan and S. Sra},
        Booktitle = {Proceedings of the Fourth SIAM International Conference on Data Mining},
        Date-Added = {2007-04-12 11:30:35 +0200},
        Date-Modified = {2007-06-19 15:14:55 +0200},
        Keywords = {clustering, co-clustering, bioinformatics},
        Month = {April},
        Pages = {114–125},
        Title = {Minimum sum squared residue co-clustering of gene expression data},
        Url = {http://www.cs.utexas.edu/users/inderjit/public_papers/mssrcc_siam.pdf},
        Year = {2004},
        Bdsk-Url-1 = {http://www.cs.utexas.edu/users/inderjit/public_papers/mssrcc_siam.pdf}
      }

    in order to improve the quality of the final result.

    Finally, since the matrix contains negative values, the co-clustering instance based on KL divergence and mutual information cannot be used:

    • I. S. Dhillon, S. Mallela, and D. S. Modha, "Information-Theoretic Co-Clustering," in Proceedings of The Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2003), 2003, pp. 89-98.
      @inproceedings{dhillon:mallela:modha:03,
        author = {I. S. Dhillon and S. Mallela and D. S. Modha},
        Booktitle = {Proceedings of The Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining ({KDD}-2003)},
        Date-Modified = {2007-07-14 15:32:35 +0200},
        Keywords = {clustering, co-clustering, relative entropy},
        Pages = {89–98},
        Title = {Information-Theoretic Co-Clustering},
        Url = {http://www.cs.utexas.edu/users/inderjit/public_papers/kdd_cocluster.pdf},
        Year = {2003},
        Bdsk-Url-1 = {http://www.cs.utexas.edu/users/inderjit/public_papers/kdd_cocluster.pdf}
      }
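
The restriction on negative values mentioned above can be illustrated with a short sketch. The information-theoretic instance treats the matrix as a (normalized) non-negative distribution, and the KL-type divergence involves a term of the form x*log(x/y), which is undefined for negative x. The helper names `i_divergence` and `has_negative` and the toy values are assumptions made for this example.

```python
import math

def i_divergence(x, y):
    """Generalized KL (I-)divergence term x*log(x/y) - x + y,
    defined only for x >= 0 and y > 0."""
    if x < 0 or y <= 0:
        raise ValueError("I-divergence needs x >= 0 and y > 0")
    if x == 0:
        return y  # x*log(x/y) tends to 0 as x -> 0
    return x * math.log(x / y) - x + y

def has_negative(rows):
    """True if the data matrix contains at least one negative entry."""
    return any(value < 0 for row in rows for value in row)

toy = [[0.2, 1.4], [-0.7, 3.1]]  # hypothetical values
print(has_negative(toy))         # True
print(i_divergence(2.0, 2.0))    # 0.0
```

A check of this kind before choosing the divergence avoids running an instance whose objective is not defined on the data; the squared Euclidean instances have no such restriction.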

    July 06 2007

    Support Vector Methods and MBI Principle

    The slides entitled “Data Clustering: High dimensionality, missing values and noise. Support Vector Methods and Minimum Bregman Information Principle” are available in the Documents section.

    June 29 2007

    Co-clustering Preliminary Experiments

    The PDF with the configurations used for the tests and the related results is available for download in the Documents section, together with the ZIP archive containing the datasets used in the experiments.

    June 22 2007

    Co-clustering - Synthetic Dataset Test #1

    Machine used:
    PowerPC G4, 1.5GHz, 768MB RAM, Mac OS X

    Software used:

    • H. Cho, Y. Guan, and S. Sra, Co-cluster (v 1.1), 2004.
      @misc{coclus-software,
        author = {Hyuk Cho and Yuqiang Guan and Suvrit Sra},
        Date-Added = {2007-04-29 15:15:55 +0200},
        Date-Modified = {2007-06-25 17:10:33 +0200},
        Howpublished = {Bregman co-clustering software},
        Keywords = {co-clustering, relative entropy, euclidean distance, software},
        Title = {Co-cluster (v 1.1)},
        Url = {http://www.cs.utexas.edu/users/dml/Software/cocluster.html},
        Year = {2004},
        Bdsk-Url-1 = {http://www.cs.utexas.edu/users/dml/Software/cocluster.html}
      }

    Dataset used:
    The dataset used in this test is a synthetic dataset, generated with

    • J. R. Vennam and S. Vadapalli, "SynDECA: A Tool to Generate Synthetic Datasets for Evaluation of Clustering Algorithms," in 11th International Conference on Management of Data (COMAD 2005), Goa, India, 2005.
      @conference{syndeca2005, Address = {Goa, India},
        Author = {Jhansi Rani Vennam and Soujanya Vadapalli},
        Booktitle = {11th International Conference on Management of Data (COMAD 2005)},
        Date-Added = {2007-06-18 16:18:49 +0200},
        Date-Modified = {2007-07-03 18:34:02 +0200},
        Keywords = {clustering, tool, synthetic, dataset, generator},
        Month = {January},
        Organization = {http://cde.iiit.ac.in/syndeca},
        Title = {SynDECA: A Tool to Generate Synthetic Datasets for Evaluation of Clustering Algorithms},
        Url = {http://comad2005.persistent.co.in/COMAD2005Proc/pages027-036.pdf},
        Year = {2005},
        Bdsk-Url-1 = {http://comad2005.persistent.co.in/COMAD2005Proc/pages027-036.pdf}
      }

    The dataset is composed as follows:
    Objects: 1000
    Attributes: 10
    Classes: 5, for a total of 888 points (Cluster 0: 327, Cluster 1: 134, Cluster 2: 162, Cluster 3: 132, Cluster 4: 133)
    Noise points: 112 (points that cannot be classified)

    Co-clustering algorithms used: Euclidean Distance Based, Minimum Sum Squared Residue, Information Theoretic

    Problems: In this first test, run on a noisy dataset, the co-clustering scheme does not appear to be designed to identify noise and separate it from the rest of the classification. As a result, all co-clustering instances tend to assign the noise points to one of the five requested classes, skewing the results.

    Noise removal: After removing the noise points we obtained a dataset of 888 points, and the algorithm (Euclidean Distance Based, with 5 requested co-clusters) separated the 5 classes perfectly, without any error, with the following timing:
    User = 0 second(s) 138552 ms
    System = 0 second(s) 6630 ms
    Time/Run = 0.138552 second(s)

    June 22 2007

    Co-clustering - Real World Dataset Test #2

    Machine used:
    PowerPC G4, 1.5GHz, 768MB RAM, Mac OS X

    Software used:

    • H. Cho, Y. Guan, and S. Sra, Co-cluster (v 1.1), 2004.

    Dataset used:
    Mushrooms Database
    Number of instances: 8124
    Number of Attributes: 22
    2480 missing values for attribute #12
    Original Class Distribution: edible: 4208 (51.8%), poisonous: 3916 (48.2%)
    Mushroom records drawn from The Audubon Society Field Guide to North
    American Mushrooms (1981). G. H. Lincoff (Pres.), New York: Alfred A. Knopf
    Donor: Jeff Schlimmer (Jeffrey.Schlimmer@a.gp.cs.cmu.edu)
    Date: 27 April 1987

    Co-clustering algorithm used: Minimum Sum Squared Residue

    Trial #1
    Requested: 2 row clusters and 1 column cluster. Total: 2 co-clusters

    Time taken: User = 2 second(s) 127370 ms, System = 0 second(s) 40949 ms, Time/Run = 2.12737 second(s)

    Result: 3670 elements in the “poisonous” class, 4454 elements in the “edible” class.

    Error rate (elements not classified correctly): ~3%

    Trial #2
    Requested: 2 row clusters and 2 column clusters. Total: 4 co-clusters

    Time taken: User = 2 second(s) 158490 ms, System = 0 second(s) 40654 ms, Time/Run = 2.15849 second(s)

    Result: 3915 elements in the “poisonous” class, 4209 elements in the “edible” class.

    Error rate: ~1.23 x 10^-4, i.e. about 0.012% (only 1 element was misclassified)
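
The error rate of the second trial can be reproduced with a small sketch. The 2x2 table below is not taken from the program output: it is reconstructed as the only table consistent with the reported cluster sizes (3915 and 4209), the original class distribution (3916 poisonous, 4208 edible) and a single misclassified element. The function name `error_rate` is an assumption made for this example.

```python
def error_rate(confusion):
    """Misclassification rate for a 2x2 table confusion[cluster][true_class],
    trying both cluster-to-class assignments and keeping the better one."""
    total = sum(sum(row) for row in confusion)
    straight = confusion[0][0] + confusion[1][1]
    swapped = confusion[0][1] + confusion[1][0]
    errors = total - max(straight, swapped)
    return errors / total

# Rows: clusters labelled "poisonous" and "edible";
# columns: true classes poisonous, edible (reconstructed, see above).
trial_2 = [
    [3915, 0],
    [1, 4208],
]
rate = error_rate(trial_2)
print(f"{rate:.2e}")  # 1.23e-04
```

Both assignments of clusters to classes are tried because the cluster labels returned by a clustering algorithm carry no class meaning of their own.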

    June 22 2007

    Co-clustering - Real World Dataset Test #1

    Machine used:
    PowerPC G4, 1.5GHz, 768MB RAM, Mac OS X

    Software used:

    • H. Cho, Y. Guan, and S. Sra, Co-cluster (v 1.1), 2004.

    Dataset used:
    Iris Plant Database
    From Fisher, 1936
    3 classes, 4 numeric attributes, 150 instances
    1 class is linearly separable from the other 2, but the other 2 are not linearly separable from each other

    Co-clustering algorithm used: Euclidean Distance Based.

    Trial #1:
    Requested: 3 row clusters (objects are on the rows, attributes on the columns) and only 1 column cluster. In this way no feature clustering is performed (recall that feature clustering happens simultaneously with data clustering).

    Time taken: User = 0 second(s) 9193 ms, System = 0 second(s) 2709 ms, Time/Run = 0.009193 second(s)

    Result: Co-Cluster 1: 54 row elements, Co-Cluster 2: 40 row elements, Co-Cluster 3: 56 row elements. Since only 1 column cluster was specified, all co-clusters have the same column elements.

    Conclusions: The algorithm managed to separate the overlapping clusters (classes 2 and 3 of the dataset), but made several classification errors. Cluster 2 is missing 10 elements, 4 of which ended up in the first cluster and the remaining 6 in the third.

    Trial #2:
    Requested: 3 row clusters and 2 column clusters.

    Time taken: User = 0 second(s) 8397 ms, System = 0 second(s) 3042 ms, Time/Run = 0.008397 second(s)

    Result: The three classes were perfectly separated. Specifically, 6 co-clusters were produced: if C is the number of column clusters and R the number of row clusters, we always obtain C*R co-clusters. In practice, each requested row cluster yields C co-clusters.
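
The relation between row clusters, column clusters and co-clusters can be sketched as below: every pair (row cluster, column cluster) defines one block of the matrix. The label vectors are hypothetical and the function name `co_clusters` is an assumption made for this example.

```python
from itertools import product

def co_clusters(row_labels, col_labels):
    """Group matrix cells into co-clusters: one block per
    (row cluster, column cluster) pair."""
    pairs = product(sorted(set(row_labels)), sorted(set(col_labels)))
    blocks = {pair: [] for pair in pairs}
    for i, r in enumerate(row_labels):
        for j, c in enumerate(col_labels):
            blocks[(r, c)].append((i, j))
    return blocks

# Hypothetical assignment: 6 objects in R = 3 row clusters,
# 4 attributes in C = 2 column clusters.
rows = [0, 0, 1, 1, 2, 2]
cols = [0, 1, 1, 0]
blocks = co_clusters(rows, cols)
print(len(blocks))  # 6
```

With a single column cluster the blocks coincide with the row clusters, which is the setting of the first trial.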

    Conclusions: Separating 2 clusters that are not linearly separable (classes 2 and 3 of the dataset under examination) is remarkable for an algorithm that is not kernel-based.

    May 10 2007

    Missing values, co-clustering and prediction of missing values

    The missing-values problem is apparently a pressing one, especially in Astrophysics where, according to Prof. Longo, several thousand incompletely described data points are thrown away. Co-clustering seems to offer help in tackling this tedious problem.

    As explicitly stated in

    • A. B. Tchagang and A. H. Tewfik, "Robust biclustering algorithm (ROBA) for DNA microarray data analysis," in 13th IEEE Workshop on Statistical Signal Processing, 2005, pp. 984-989.
      @conference{roba2005,
        author = {Alan B. Tchagang and Ahmed H. Tewfik},
        Booktitle = {13th IEEE Workshop on Statistical Signal Processing},
        Date-Added = {2007-05-10 13:07:21 +0200},
        Date-Modified = {2007-07-15 11:14:28 +0200},
        Keywords = {co-clustering, bioinformatics, missing values},
        Pages = {984–989},
        Title = {Robust biclustering algorithm ({ROBA}) for {DNA} microarray data analysis},
        Url = {http://ieeexplore.ieee.org/iel5/10843/34164/01628738.pdf},
        Year = {2005},
        Bdsk-Url-1 = {http://ieeexplore.ieee.org/iel5/10843/34164/01628738.pdf}
      }
    • Y. Cheng and G. M. Church, "Biclustering of Expression Data," in Intelligent Systems for Molecular Biology, 2000, pp. 93-103.
      @inproceedings{cheng-biclustering00,
        author = {Yizong Cheng and George M. Church},
        Booktitle = {Intelligent Systems for Molecular Biology},
        Date-Added = {2007-05-09 22:25:18 +0200},
        Date-Modified = {2007-06-29 08:47:17 +0200},
        Keywords = {clustering, co-clustering, bioinformatics, biclustering},
        Pages = {93–103},
        Publisher = {AAAI Press},
        Title = {Biclustering of Expression Data},
        Url = {http://citeseer.ist.psu.edu/cheng00biclustering.html},
        Year = {2000},
        Bdsk-Url-1 = {http://citeseer.ist.psu.edu/cheng00biclustering.html}
      }

    co-clustering makes it possible to group objects that are similar with respect to a subset of attributes, rather than with respect to all the attributes that describe them. Since these subsets are obtained through a feature clustering performed simultaneously with the data clustering, the process should, by construction, not be affected by the presence of missing values.

    Indeed, in

    • A. Banerjee, I. S. Dhillon, J. Ghosh, S. Merugu, and D. Modha, "A generalized Maximum Entropy approach to Bregman co-clustering and matrix approximation," UTCS TR04-24, UT, Austin, 2004.
      @techreport{banerjee04generalized, Address = {UT, Austin},
        Author = {A. Banerjee and I. S. Dhillon and J. Ghosh and S. Merugu and D. Modha},
        Date-Modified = {2007-07-15 11:15:53 +0200},
        Institution = {UTCS TR04-24},
        Keywords = {bregman, clustering, co-clustering, sparse data, missing values},
        Rating = {4},
        Title = {A generalized {Maximum Entropy} approach to {Bregman} co-clustering and matrix approximation},
        Url = {http://www.cs.utexas.edu/ftp/pub/techreports/tr04-24.ps.gz},
        Year = {2004},
        Bdsk-Url-1 = {http://www.cs.utexas.edu/ftp/pub/techreports/tr04-24.ps.gz}
      }
    • A. Banerjee, I. Dhillon, J. Ghosh, S. Merugu, and D. Modha, "A generalized Maximum Entropy approach to Bregman co-clustering and matrix approximation," in Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining(KDD), 2004, pp. 509-514.
      @inproceedings{banerjee04generalizedkdd,
        author = {A. Banerjee and I. Dhillon and J. Ghosh and S. Merugu and D. Modha},
        Booktitle = {Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining(KDD)},
        Date-Added = {2007-04-16 10:48:17 +0200},
        Date-Modified = {2007-07-15 11:15:39 +0200},
        Keywords = {clustering, co-clustering, bregman, sparse data, missing values},
        Month = {August},
        Pages = {509–514},
        Title = {A generalized {Maximum Entropy} approach to {Bregman} co-clustering and matrix approximation},
        Url = {http://citeseer.ist.psu.edu/banerjee04generalized.html},
        Year = {2004},
        Bdsk-Url-1 = {http://citeseer.ist.psu.edu/banerjee04generalized.html}
      }

    “Missing Value Prediction” is also discussed (par. 5.3 and par. 4.2, respectively): co-clustering is used to predict missing values by setting them to 0 and running the co-clustering algorithm. The algorithm proceeds regardless of the missing data; once the co-clustering has been found, the approximated matrix based on it can be used to predict the missing values with a reasonably low error.
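
The prediction step can be sketched as follows, under several assumptions: the row and column labels are taken as the given output of a co-clustering run (the run itself is not shown), the approximated matrix is the simplest one, in which every cell is replaced by the mean of its co-cluster, and that mean is computed over the observed cells only. The last point is a choice of this sketch; the cited papers may handle the missing cells differently. All names and values are hypothetical.

```python
def predict_missing(matrix, missing, row_labels, col_labels):
    """Fill each missing cell with the mean of the observed cells
    of its co-cluster (row cluster x column cluster block)."""
    sums, counts = {}, {}
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if (i, j) in missing:
                continue
            key = (row_labels[i], col_labels[j])
            sums[key] = sums.get(key, 0.0) + value
            counts[key] = counts.get(key, 0) + 1
    filled = [list(row) for row in matrix]
    for i, j in missing:
        key = (row_labels[i], col_labels[j])
        if counts.get(key):  # leave the cell as-is if the block has no data
            filled[i][j] = sums[key] / counts[key]
    return filled

# Toy example: the cell (1, 1) is missing and was zero-filled.
matrix = [
    [1.0, 1.0, 9.0],
    [1.0, 0.0, 9.0],
    [5.0, 5.0, 2.0],
]
missing = {(1, 1)}
row_labels = [0, 0, 1]  # assumed output of a co-clustering run
col_labels = [0, 0, 1]
predicted = predict_missing(matrix, missing, row_labels, col_labels)
print(predicted[1][1])  # 1.0
```

In the toy example the missing cell belongs to a block whose three observed cells are all 1.0, so the predicted value is 1.0 rather than the 0 used as a placeholder.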

    May 05 2007

    Bregman matrix approximation

    To do: study in more depth the theory behind the (multiple) approximated matrices that can be obtained from a given Bregman co-clustering.

    References

    • A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh, "Clustering with Bregman Divergences," Journal of Machine Learning Research, vol. 6, pp. 1705-1749, 2005.
      @article{clusterbregman2005, Address = {Cambridge, MA, USA},
        Author = {Arindam Banerjee and Srujana Merugu and Inderjit S. Dhillon and Joydeep Ghosh},
        Date-Modified = {2007-11-14 12:55:53 +0100},
        Issn = {1533-7928},
        Journal = {Journal of Machine Learning Research},
        Keywords = {bregman, clustering},
        Month = {October},
        Pages = {1705--1749},
        Publisher = {MIT Press},
        Title = {{Clustering with Bregman Divergences}},
        Url = {http://www.cs.utexas.edu/users/inderjit/public_papers/bregmanclustering_jmlr.pdf},
        Volume = {6},
        Year = {2005},
        Bdsk-Url-1 = {http://www.cs.utexas.edu/users/inderjit/public_papers/bregmanclustering_jmlr.pdf}
      }
    • A. Banerjee, I. Dhillon, J. Ghosh, S. Merugu, and D. Modha, "A generalized Maximum Entropy approach to Bregman co-clustering and matrix approximation," in Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining(KDD), 2004, pp. 509-514.
      @inproceedings{banerjee04generalizedkdd,
        author = {A. Banerjee and I. Dhillon and J. Ghosh and S. Merugu and D. Modha},
        Booktitle = {Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining(KDD)},
        Date-Added = {2007-04-16 10:48:17 +0200},
        Date-Modified = {2007-07-15 11:15:39 +0200},
        Keywords = {clustering, co-clustering, bregman, sparse data, missing values},
        Month = {August},
        Pages = {509–514},
        Title = {A generalized {Maximum Entropy} approach to {Bregman} co-clustering and matrix approximation},
        Url = {http://citeseer.ist.psu.edu/banerjee04generalized.html},
        Year = {2004},
        Bdsk-Url-1 = {http://citeseer.ist.psu.edu/banerjee04generalized.html}
      }
    • A. Banerjee, I. S. Dhillon, J. Ghosh, S. Merugu, and D. Modha, "A generalized Maximum Entropy approach to Bregman co-clustering and matrix approximation," UTCS TR04-24, UT, Austin, 2004.
      @techreport{banerjee04generalized,
        Address = {UT, Austin},
        Author = {A. Banerjee and I. S. Dhillon and J. Ghosh and S. Merugu and D. Modha},
        Date-Modified = {2007-07-15 11:15:53 +0200},
        Institution = {UTCS TR04-24},
        Keywords = {bregman, clustering, co-clustering, sparse data, missing values},
        Rating = {4},
        Title = {A generalized {Maximum Entropy} approach to {Bregman} co-clustering and matrix approximation},
        Url = {http://www.cs.utexas.edu/ftp/pub/techreports/tr04-24.ps.gz},
        Year = {2004},
        Bdsk-Url-1 = {http://www.cs.utexas.edu/ftp/pub/techreports/tr04-24.ps.gz}
      }
    • H. Cho, I. Dhillon, Y. Guan, and S. Sra, "Minimum sum squared residue co-clustering of gene expression data," in Proceedings of the Fourth SIAM International Conference on Data Mining, 2004, pp. 114-125.
      @inproceedings{cho04minimum,
        author = {H. Cho and I. Dhillon and Y. Guan and S. Sra},
        Booktitle = {Proceedings of the Fourth SIAM International Conference on Data Mining},
        Date-Added = {2007-04-12 11:30:35 +0200},
        Date-Modified = {2007-06-19 15:14:55 +0200},
        Keywords = {clustering, co-clustering, bioinformatics},
        Month = {April},
        Pages = {114--125},
        Title = {Minimum sum squared residue co-clustering of gene expression data},
        Url = {http://www.cs.utexas.edu/users/inderjit/public_papers/mssrcc_siam.pdf},
        Year = {2004},
        Bdsk-Url-1 = {http://www.cs.utexas.edu/users/inderjit/public_papers/mssrcc_siam.pdf}
      }
    • I. S. Dhillon, S. Mallela, and D. S. Modha, "Information-Theoretic Co-Clustering," in Proceedings of The Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2003), 2003, pp. 89-98.
      @inproceedings{dhillon:mallela:modha:03,
        author = {I. S. Dhillon and S. Mallela and D. S. Modha},
        Booktitle = {Proceedings of The Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining ({KDD}-2003)},
        Date-Modified = {2007-07-14 15:32:35 +0200},
        Keywords = {clustering, co-clustering, relative entropy},
        Pages = {89--98},
        Title = {Information-Theoretic Co-Clustering},
        Url = {http://www.cs.utexas.edu/users/inderjit/public_papers/kdd_cocluster.pdf},
        Year = {2003},
        Bdsk-Url-1 = {http://www.cs.utexas.edu/users/inderjit/public_papers/kdd_cocluster.pdf}
      }