Data imputation

Sometimes your data matrix may contain non-numerical values (e.g. NaN = "Not a Number", produced by illegal mathematical operations in upstream data pre-processing, or any other text which cannot be converted into a floating-point number).
SUMO converts these data cells to "Zero", which may add a BIAS to your data and to subsequent statistical analyses.
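
To see why zero replacement can bias results, consider the following minimal Python sketch. It is not SUMO code: the helper names and the example row are made up for illustration, and it only compares the mean of a row after zero-filling with the mean over the valid values.

```python
import math

def to_float_or_nan(cell):
    """Convert a raw text cell to float; unparsable text becomes NaN."""
    try:
        return float(cell)
    except (TypeError, ValueError):
        return math.nan

def mean_ignoring_nan(values):
    """Mean over the valid (non-NaN) entries only."""
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid) / len(valid) if valid else math.nan

# Hypothetical row of one gene: two of five cells are not numeric.
raw_row = ["8.0", "NAN", "10.0", "n/a", "9.0"]
parsed = [to_float_or_nan(c) for c in raw_row]

# Zero replacement drags the mean towards zero.
zero_filled = [0.0 if math.isnan(v) else v for v in parsed]
mean_zero_filled = sum(zero_filled) / len(zero_filled)

# Using only the valid cells keeps the mean in the measured range.
mean_valid_only = mean_ignoring_nan(parsed)

print(mean_zero_filled, mean_valid_only)
```

In this made-up row, the measured values lie between 8 and 10, yet the zero-filled mean falls well below that range.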

It may be better to impute those missing values in a well-defined manner, replacing them with reasonably estimated values.
SUMO offers four methods for data imputation:

SUMO can impute multiple missing values within a gene in one go. Already imputed values do not influence the imputation of other genes (non-recursive imputation).

Select Main menu | Adjust data | Data imputation | Row wise

The group selection dialog pops up:

In the example, we first impute missing values for the "MPA" hybs, then for the "G" hybs and finally for the "MPA+G" hybs.
With the "# of genes" edit field you can define how many of the most similar genes are searched for and averaged.
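
The idea of averaging the most similar genes can be sketched in Python as follows. This is an illustration under stated assumptions, not SUMO's actual implementation: the function name `impute_row_wise`, the use of a root-mean-square Euclidean distance over all columns where both genes have values, and the example matrix are all hypothetical. The `n_genes` parameter plays the role of the "# of genes" field, and neighbours are always read from the original data, so imputed values never feed into other genes (non-recursive).

```python
import math

def impute_row_wise(data, group_cols, n_genes=3):
    """Return a copy of `data` with NaNs in `group_cols` imputed.

    data       : list of rows (genes), each a list of floats (NaN = missing)
    group_cols : column indices of the hybs forming one group
    n_genes    : how many most similar genes are averaged
    """
    result = [row[:] for row in data]  # write here, read only from `data`
    for g, row in enumerate(data):
        missing = [c for c in group_cols if math.isnan(row[c])]
        if not missing:
            continue
        # Rank all other genes by similarity (assumed distance measure).
        scored = []
        for h, other in enumerate(data):
            if h == g:
                continue
            shared = [c for c in range(len(row))
                      if not math.isnan(row[c]) and not math.isnan(other[c])]
            if not shared:
                continue
            dist = math.sqrt(
                sum((row[c] - other[c]) ** 2 for c in shared) / len(shared))
            scored.append((dist, h))
        scored.sort()
        # Average the n_genes closest genes that have a value in that column.
        for c in missing:
            donors = [data[h][c] for _, h in scored
                      if not math.isnan(data[h][c])][:n_genes]
            if donors:
                result[g][c] = sum(donors) / len(donors)
    return result

nan = math.nan
data = [
    [1.0, 2.0, nan, 4.0],   # gene with one missing value
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 3.2, 4.1],
    [9.0, 9.0, 9.0, 9.0],   # dissimilar gene, ignored for n_genes=2
]
imputed = impute_row_wise(data, group_cols=[2, 3], n_genes=2)
print(imputed[0])
```

Here the missing cell of the first gene is filled with the average of the corresponding cells of its two closest genes, while the input matrix itself is left unchanged.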

!! ----- NB ----- !!
Imputations should only be performed for a small number of missing values (10% of a group or less).
Otherwise a strong BIAS is added to the data, resulting in misleading statistical analyses.

Use the filter to remove genes with too many missing values.
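
As an illustration of such a filtering rule outside of SUMO, the following Python sketch drops every gene that has more than 10% missing values in any group. The function names, the `max_fraction` parameter and the example data are hypothetical; SUMO's own filter is operated through its dialogs and may work differently.

```python
import math

def too_many_missing(row, group_cols, max_fraction=0.10):
    """True if more than `max_fraction` of the group's cells are NaN."""
    n_missing = sum(1 for c in group_cols if math.isnan(row[c]))
    return n_missing / len(group_cols) > max_fraction

def filter_genes(data, groups, max_fraction=0.10):
    """Keep only genes that pass the missing-value limit in every group."""
    return [row for row in data
            if not any(too_many_missing(row, cols, max_fraction)
                       for cols in groups)]

nan = math.nan
groups = [list(range(10))]            # one hypothetical group of 10 hybs
data = [
    [1.0] * 10,                       # no missing values: kept
    [nan] + [1.0] * 9,                # 10% missing: kept
    [nan, nan] + [1.0] * 8,           # 20% missing: removed
]
kept = filter_genes(data, groups)
print(len(kept))
```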
 



last edited 23.09.2007