October 01 2007
Induced Missing Values Experiments - Stage 1
A few days ago I finished a tool that induces pseudo-random missing values in datasets. This tool allows us to test the robustness of both Bregman Co-clustering and SVC with respect to missing values.
The tool accepts two parameters: the fraction of objects that will be affected by the process, and the list of features involved.
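To make the two parameters concrete, here is a minimal sketch of what such a tool could look like. It is not the actual tool: the function name `induce_missing`, the use of `None` as the missing-value marker, the zero-based column indices, the `seed` argument, and the choice to blank every listed feature in each affected object are all assumptions made for illustration.

```python
import random


def induce_missing(rows, fraction, features, seed=None):
    """Return a copy of `rows` in which a random fraction of the objects
    has None in each of the given feature columns (zero-based indices).

    Assumption: every affected object loses ALL listed features.
    """
    rng = random.Random(seed)
    # Number of objects to affect, rounded to the nearest integer (assumption).
    n_affected = round(len(rows) * fraction)
    # Pick the affected objects without replacement.
    affected = set(rng.sample(range(len(rows)), n_affected))

    result = []
    for i, row in enumerate(rows):
        new_row = list(row)  # copy, so the input dataset is left untouched
        if i in affected:
            for f in features:
                new_row[f] = None
        result.append(new_row)
    return result


if __name__ == "__main__":
    # Tiny made-up dataset with 4 features per object.
    data = [[float(i), i + 0.5, i + 1.0, i + 1.5] for i in range(10)]
    for row in induce_missing(data, fraction=0.3, features=[2, 3], seed=42):
        print(row)
```

Fixing the seed makes a variant reproducible, which helps when the same damaged dataset has to be fed to more than one algorithm.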
As is my custom, I started this series of experiments with the IRIS data, creating the following dataset variants:
- IRIS 5a: 5% of objects with missing values. One feature (#3) involved.
- IRIS 5b: 5% of objects with missing values. Two features (#3, #4) involved.
- IRIS 10a: 10% of objects with missing values. One feature (#3) involved.
- IRIS 10b: 10% of objects with missing values. Two features (#3, #4) involved.
- IRIS 20a: 20% of objects with missing values. One feature (#3) involved.
- IRIS 20b: 20% of objects with missing values. Two features (#3, #4) involved.
- IRIS 30a: 30% of objects with missing values. One feature (#3) involved.
- IRIS 30b: 30% of objects with missing values. Two features (#3, #4) involved.
Recall that the IRIS data has 4 features.
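For a sense of scale, the variants above can be written down as a small table and turned into object counts. This is only a sketch: the standard IRIS data has 150 objects, but rounding the count down is an assumption, and the names `VARIANTS` and `affected_count` are hypothetical. Feature numbers are copied as labelled in the list.

```python
IRIS_SIZE = 150  # the standard IRIS data: 150 objects, 4 features

# Variant name -> (percentage of objects affected, features involved).
VARIANTS = {
    "IRIS 5a": (5, [3]),
    "IRIS 5b": (5, [3, 4]),
    "IRIS 10a": (10, [3]),
    "IRIS 10b": (10, [3, 4]),
    "IRIS 20a": (20, [3]),
    "IRIS 20b": (20, [3, 4]),
    "IRIS 30a": (30, [3]),
    "IRIS 30b": (30, [3, 4]),
}


def affected_count(n_objects, percent):
    """Number of objects with missing values, rounded down (assumption)."""
    return n_objects * percent // 100


if __name__ == "__main__":
    for name, (percent, features) in VARIANTS.items():
        count = affected_count(IRIS_SIZE, percent)
        print(f"{name}: {count} objects affected, features {features}")
```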
Here you can download the results.
The experiments were done with Co-clustering and SVC. The Information-theoretic co-clustering results are not in the files above because they were irrelevant (very poor performance).
In the files above:
- “MV” stands for “Missing Values”
- “FC” stands for “Feature Clusters”
- FC1 means no feature clustering
- FC2 means two feature clusters requested
- CC stands for Co-clustering
