October 01 2007

Induced Missing Values Experiments - Stage 1

A few days ago I finished a tool that induces pseudo-random missing values in datasets. This tool allows us to test the robustness of both Bregman Co-clustering and SVC with respect to missing values.

The tool accepts two parameters: the fraction of objects that will be affected by the process, and the list of features involved.
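To make the idea concrete, here is a minimal sketch of such a tool. It is not the actual tool used for these experiments. The function name `induce_missing`, the `seed` parameter, the use of `None` as the missing-value marker, and the 1-based feature numbering are all my assumptions for illustration. I also assume that every listed feature is blanked in each affected object.

```python
import random


def induce_missing(rows, fraction, features, seed=None):
    """Return a copy of `rows` with missing values in a random subset of objects.

    rows     -- list of objects, each a list of feature values
    fraction -- fraction of objects to affect, in [0, 1]
    features -- 1-based feature numbers to blank (assumed convention)
    seed     -- optional seed, so a variant can be regenerated
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")

    rng = random.Random(seed)
    n_affected = round(fraction * len(rows))
    # Pick the affected objects without replacement.
    affected = set(rng.sample(range(len(rows)), n_affected))

    result = []
    for i, row in enumerate(rows):
        new_row = list(row)  # copy, so the original data stays intact
        if i in affected:
            for f in features:
                new_row[f - 1] = None  # None marks a missing value (assumption)
        result.append(new_row)
    return result


# Toy stand-in for a 150-object, 4-feature dataset (not the real IRIS values).
data = [[i + 0.0, i + 0.1, i + 0.2, i + 0.3] for i in range(150)]

# 10% of objects affected, two features (#3, #4) involved.
variant = induce_missing(data, 0.10, [3, 4], seed=1)
n_missing = sum(1 for row in variant if row[2] is None)
```

Fixing the seed makes each variant reproducible, which helps when the same damaged dataset has to be fed to more than one algorithm.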

As is my custom, I started this series of experiments with the IRIS data, creating the following variants:

- IRIS 5a: 5% of objects with missing values. One feature (#3) involved.
- IRIS 5b: 5% of objects with missing values. Two features (#3, #4) involved.
- IRIS 10a: 10% of objects with missing values. One feature (#3) involved.
- IRIS 10b: 10% of objects with missing values. Two features (#3, #4) involved.
- IRIS 20a: 20% of objects with missing values. One feature (#3) involved.
- IRIS 20b: 20% of objects with missing values. Two features (#3, #4) involved.
- IRIS 30a: 30% of objects with missing values. One feature (#3) involved.
- IRIS 30b: 30% of objects with missing values. Two features (#3, #4) involved.

Recall that the IRIS data has 4 features.

Here you can download the results.

The experiments were done with Co-clustering and SVC. The Information-theoretic co-clustering results are not in the files above because its performance was too poor to be of interest.

In the files above:

- “MV” stands for “Missing Values”
- “FC” stands for “Feature Clusters”
- “FC1” means no feature clustering
- “FC2” means two feature clusters were requested
- “CC” stands for “Co-clustering”
