# Non-parametric tests

A key question in any kind of experimental data production is:
are my measured data only statistical noise from my measurement system, or are they statistically significant?

This question is often answered by assuming that the data are distributed like a Gaussian around the "real" value. Under this assumption, tests based on the well-known Gaussian distribution may be performed (e.g. t-test or ANOVA). Two parameters, mean and standard deviation, are sufficient to describe such a distribution exactly (hence "parametric" tests).

But often this assumption does not hold.
In that case, more general non-parametric tests, which do not assume a specific distribution, should be used.

SUMO offers two variants of non-parametric tests:

### A. Tests which mainly analyze the position of the Median in the underlying data sets:

Mann-Whitney, Wilcoxon, and Kruskal-Wallis tests

### B. Tests which analyze differences in both location and shape in the underlying data sets:

Kolmogorov-Smirnov tests

## Mann-Whitney, Wilcoxon, and Kruskal-Wallis tests

• 1-class: U-/Mann-Whitney test
• 2-class: U-/Mann-Whitney test
• 2-class, paired samples: Wilcoxon test
• Multi-class: Kruskal-Wallis test


## How the tests work

Assume an experiment with 2×3 hybridisations (hybs) and the following data for one gene:

| Un-treated | Treated |
|---|---|
| 0.5 | 0.6 |
| 0.7 | 0.8 |
| 0.9 | 1.0 |

Join the data and sort (=rank) them by their ratio:

| Ratio | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1.0 |
|---|---|---|---|---|---|---|
| Rank | 1 | 2 | 3 | 4 | 5 | 6 |

Now calculate the rank sum for each group:

• Un-treated: 1+3+5 = 9 = R1
• Treated: 2+4+6 = 12 = R2

Question: how probable is it (p-value) to find a rank sum as small as 9 (or smaller) by chance?

Compute all possible rank sums of three ranks chosen from 1 to 6:

1,2,3=6
1,2,4=7
1,2,5=8
1,2,6=9
1,3,4=8
1,3,5=9
1,3,6=10
1,4,5=10
1,4,6=11
1,5,6=12
2,3,4=9
2,3,5=10
2,3,6=11
2,4,5=11
2,4,6=12
2,5,6=13
3,4,5=12
3,4,6=13
3,5,6=14
4,5,6=15

Total: 20 possible rank sums (the number of ways to choose 3 out of 6 ranks).
For groups with more than 3 members, the rank sums are calculated accordingly.

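Build the rank-sum distribution:

| Rank sum | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|
| Number | 1 | 1 | 2 | 3 | 3 | 3 | 3 | 2 | 1 | 1 |
| p (probability) | 0.05 | 0.05 | 0.1 | 0.15 | 0.15 | 0.15 | 0.15 | 0.1 | 0.05 | 0.05 |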
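To check the enumeration and the distribution table, the exact null distribution of the rank sum can be generated by brute force. This is a minimal Python sketch, not SUMO's implementation; the function name `rank_sum_distribution` is hypothetical. It lists every way to pick n1 of the ranks 1..n1+n2 and counts the resulting rank sums, assuming no ties.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def rank_sum_distribution(n1: int, n2: int) -> dict[int, Fraction]:
    """Exact null distribution of the rank sum of a group of size n1.

    Under the null hypothesis every choice of n1 ranks out of 1..n1+n2
    is equally likely, so each rank sum gets probability count / total.
    """
    ranks = range(1, n1 + n2 + 1)
    counts = Counter(sum(c) for c in combinations(ranks, n1))
    total = sum(counts.values())
    return {s: Fraction(k, total) for s, k in sorted(counts.items())}


# Print the distribution for the 2x3 example
for rank_sum, prob in rank_sum_distribution(3, 3).items():
    print(rank_sum, float(prob))
```

For n1 = n2 = 3 this reproduces the table: rank sums 6 to 15, with probability 3/20 = 0.15 for a rank sum of 9.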

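Now sum up the probabilities of all rank sums up to and including the rank sum of our smaller group (here: R1 = 9): 0.05 + 0.05 + 0.1 + 0.15 = 0.35. (Counting only the rank sums strictly below 9 gives 0.05 + 0.05 + 0.1 = 0.2.)
=> One-sided p-value P(R ≤ 9) = 0.35: a rank sum this small is quite likely to occur by chance even if the two groups do not differ.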
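The summation can be sketched the same way. The hypothetical helper `exact_rank_sum_p` below returns P(R ≤ r1), or P(R < r1) with `strict=True`, by enumerating all equally likely rank assignments; it is an illustration, not SUMO's code.

```python
from fractions import Fraction
from itertools import combinations


def exact_rank_sum_p(r1: float, n1: int, n2: int, strict: bool = False) -> Fraction:
    """One-sided exact p-value for an observed rank sum r1 of a group of size n1.

    strict=False counts rank sums <= r1, strict=True only those < r1.
    """
    sums = [sum(c) for c in combinations(range(1, n1 + n2 + 1), n1)]
    hits = sum(1 for s in sums if (s < r1 if strict else s <= r1))
    return Fraction(hits, len(sums))


print(exact_rank_sum_p(9, 3, 3))               # P(R <= 9)
print(exact_rank_sum_p(9, 3, 3, strict=True))  # P(R < 9)
```

For the example, `exact_rank_sum_p(9, 3, 3)` gives 7/20 = 0.35, and with `strict=True` it gives 1/5 = 0.2.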

For larger groups, calculating all possible rank sums becomes very time-consuming. SUMO calculates the exact rank-sum distribution up to a total group size of 32 (e.g. n1 = 16 hybs in one group and n2 = 16 hybs in the other), which takes a few seconds. For larger groups, the rank-sum distribution is approximated by a Gaussian distribution, and the p-value is derived from it with:

Normal approximation:

$$z = \frac{U - \mu_U}{\sigma_U}, \qquad \mu_U = \frac{n_1 n_2}{2}, \qquad \sigma_U = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$$

with

$$U_1 = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1, \qquad U_2 = n_1 n_2 + \frac{n_2 (n_2 + 1)}{2} - R_2, \qquad U = \min(U_1, U_2)$$
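A minimal Python sketch of the normal approximation, using the standard Mann-Whitney formulas U1 = n1·n2 + n1(n1+1)/2 − R1, U2 = n1·n2 + n2(n2+1)/2 − R2, U = min(U1, U2), μU = n1·n2/2 and σU = sqrt(n1·n2·(n1+n2+1)/12). The function name is hypothetical, and no tie or continuity correction is applied, so results may differ slightly from SUMO's output.

```python
import math


def mann_whitney_normal(r1: float, r2: float, n1: int, n2: int) -> tuple[float, float, float]:
    """Return (U, z, one-sided p) from the rank sums R1, R2 via the normal approximation."""
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    u = min(u1, u2)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    # Lower-tail probability of the standard normal: Phi(z) = 0.5 * erfc(-z / sqrt(2))
    p_one_sided = 0.5 * math.erfc(-z / math.sqrt(2))
    return u, z, p_one_sided


u, z, p = mann_whitney_normal(9, 12, 3, 3)
print(u, round(z, 3), round(p, 3))
```

For the toy example (R1 = 9, R2 = 12, n1 = n2 = 3) this gives U = 3, z ≈ −0.65 and a one-sided p ≈ 0.26, compared with 0.35 from the exact distribution; with such small groups the approximation is crude, which is why the exact distribution is preferable there.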

### Tie correction

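How should identical (repeated) measured values be handled? Assume the following data set:

| Un-treated | Treated |
|---|---|
| 0.5 | 0.7 |
| 0.7 | 0.9 |
| 0.9 | 1.1 |

Join the data and sort (=rank) them by their ratio:

| Ratio | 0.5 | 0.7 | 0.7 | 0.9 | 0.9 | 1.1 |
|---|---|---|---|---|---|---|
| Group | Un-treated | Un-treated | Treated | Un-treated | Treated | Treated |
| Rank' | 1 | 2 | 3 | 4 | 5 | 6 |
| Rank'' | 1 | 3 | 2 | 5 | 4 | 6 |

Both rankings (Rank' and Rank'') are equally valid, because tied values can be sorted in either order, but they create different rank sums (Un-treated: 1+2+4 = 7 with Rank', 1+3+5 = 9 with Rank'').
To avoid this, for each run of repeated values (= tie) the average rank of all members of the run (= mid-rank) is calculated and used as the tie-corrected rank:

| Ratio | 0.5 | 0.7 | 0.7 | 0.9 | 0.9 | 1.1 |
|---|---|---|---|---|---|---|
| Rank | 1 | 2.5 | 2.5 | 4.5 | 4.5 | 6 |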
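Mid-ranks can be computed by sorting the pooled values and giving every run of equal values the average of the ranks it occupies. A minimal sketch; the function name `mid_ranks` is hypothetical and exact equality is used to detect ties.

```python
def mid_ranks(values: list[float]) -> list[float]:
    """Rank values in ascending order; tied values get the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the last position of the run of equal values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


untreated = [0.5, 0.7, 0.9]
treated = [0.7, 0.9, 1.1]
r = mid_ranks(untreated + treated)
print(r, sum(r[:3]), sum(r[3:]))
```

For the pooled example it returns [1.0, 2.5, 4.5, 2.5, 4.5, 6.0], i.e. tie-corrected rank sums of 8 (un-treated) and 13 (treated).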

A detailed description (in German), compiled by Dr. Nagl, AG Psychologische Methodenlehre, Uni Konstanz, is available as a separate document.

### Single-class U-/Mann-Whitney-test

Here we test whether a single gene differs from a fixed value (e.g. 0, i.e. no regulation) in all selected hybridisations.
E.g. find genes that are strongly regulated in all different cancer tissues => general relevance for cancer.
A 1-class U-/Mann-Whitney test is not a "standard" statistical test, but it can be performed similarly to a 2-class test:
group 2 contains only one member, with ratio = 0.
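Because group 2 holds a single value, its rank under the null hypothesis is equally likely to be any of the N = n + 1 positions, so the p-value follows directly from that rank. A minimal sketch, assuming log ratios (0 = no regulation) and no ties with the reference value; the function `one_class_rank_p` and its choice of reporting the smaller of the two one-sided tails are illustrative assumptions, not SUMO's documented behaviour.

```python
def one_class_rank_p(values: list[float], reference: float = 0.0) -> float:
    """Hypothetical sketch of the 1-class test: `reference` forms a one-member group 2.

    Under the null hypothesis the reference is equally likely to take any of
    the N = len(values) + 1 ranks, so P(rank <= r) = r / N. Ties are ignored.
    """
    n_total = len(values) + 1
    rank = 1 + sum(1 for v in values if v < reference)  # rank of the reference value
    lower = rank / n_total                  # reference unusually low (gene up-regulated)
    upper = (n_total - rank + 1) / n_total  # reference unusually high (gene down-regulated)
    return min(lower, upper)


print(one_class_rank_p([0.5, 0.8, 1.2, 0.9]))
```

With four positive log ratios, 0 gets rank 1 of 5 and p = 0.2, which is also the smallest p-value attainable with four hybridisations.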

All genes with a low p-value are significantly regulated (i.e. either up- or down-regulated).

Click the Non-parametric tests button and select One-class U-/Mann-Whitney-Test

The parameter dialog pops up.
In the Groups tab sheet, select all required hybridisations. On the Parameter page, select the algorithm used to estimate the p-value:
• Ranksum-distribution to derive p-values from all possible rank sums which could arise from the dataset.
• Permutations and number of permutations to estimate the p-value using a permutation scheme
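The permutation option can be pictured as repeatedly reassigning the pooled ranks to the two groups at random and counting how often the rank sum of group 1 is at least as small as the observed one. A minimal sketch assuming no ties; the function name, the fixed seed and the add-one correction (hits + 1)/(n_perm + 1) are illustrative choices, not SUMO's documented behaviour.

```python
import random


def permutation_rank_sum_p(group1: list[float], group2: list[float],
                           n_perm: int = 10_000, seed: int = 0) -> float:
    """Estimate the one-sided p-value P(R1 <= observed R1) by random relabelling.

    Assumes no ties, so plain ranks 1..N are used.
    """
    pooled = group1 + group2
    order = sorted(range(len(pooled)), key=pooled.__getitem__)
    ranks = [0] * len(pooled)
    for r, idx in enumerate(order, start=1):
        ranks[idx] = r
    n1 = len(group1)
    observed = sum(ranks[:n1])
    rng = random.Random(seed)
    # Draw n1 ranks at random for group 1 and compare the rank sum with the observed one
    hits = sum(1 for _ in range(n_perm) if sum(rng.sample(ranks, n1)) <= observed)
    # Add-one correction so the estimate is never exactly 0
    return (hits + 1) / (n_perm + 1)


print(permutation_rank_sum_p([0.5, 0.7, 0.9], [0.6, 0.8, 1.0], n_perm=20_000))
```

For the 2x3 example from "How the tests work" (R1 = 9), the estimate should be close to the exact value of 0.35.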

Click the OK button to run the analysis.

A new entry shows up in the experiment tree. As with t-tests (see above), you can:

• View the Parameters for this analysis
• Use p-graph to select significant genesets
• Use the Volcano plot to select significant / regulated gene sets
• View Profiles of selected genes
• View Centroid profiles from selected genes
• View Heat maps from selected genes
• ...

Save the matching genes:
• Gene lists: gene names and numerical test values
• Gene sub-expression matrices: Expression matrix including gene-expression values

### Two-class U-/Mann-Whitney-test

Here we test whether a single gene differs between two selected sub-sets of our hybridisations (e.g. Cancer <=> Normal).
E.g. find genes that are clearly differentially regulated in the two sub-sets (up<=>down, up<=>more up, or down<=>more down).
All genes with a low p-value are significantly differentially regulated (i.e. up- or down-regulated in one sub-set relative to the other).

Click the Non-parametric tests button and select Two-class U-/Mann-Whitney-Test (unpaired)

Like above:
• Assign samples to groups
• Set parameters
• Set filters to select genes (p-graph/Volcano plot)
• View Profiles/Centroids/Heatmaps from selected genes

NB: Not surprisingly, you hardly get any genes with low p-values when analysing small groups.
With e.g. 2 out of 6 hybridisations in one group, there are only 15 different rank combinations. Thus the smallest possible (one-sided) p-value is 1/15 ≈ 0.07.
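The smallest attainable exact p-value is one over the number of equally likely rank assignments, 1 / C(n1 + n2, n1). A short sketch using Python's `math.comb` (one-sided, no ties); the function name is hypothetical.

```python
from math import comb


def smallest_exact_p(n1: int, n2: int) -> float:
    """Smallest one-sided exact p-value of the rank-sum test (no ties).

    All comb(n1 + n2, n1) assignments of ranks to group 1 are equally likely
    under the null hypothesis, and only one of them is the most extreme.
    """
    return 1 / comb(n1 + n2, n1)


for n1, n2 in [(2, 4), (3, 3), (5, 5), (16, 16)]:
    print(f"n1={n1}, n2={n2}: smallest p = {smallest_exact_p(n1, n2):.3g}")
```

For 2 vs 4 hybs this gives 1/15 ≈ 0.067, for 3 vs 3 it gives 1/20 = 0.05, and for 16 vs 16 (C(32, 16) = 601080390) it is about 1.7e-9.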

### Wilcoxon test (2-class, paired samples)

This test is used when the observations are paired, e.g. blood pressure of patients before/after medication.
It uses the information in the magnitudes of the differences between paired observations as well as in their signs:

• Take the paired observations,
• Calculate the differences, and rank them from smallest to largest by ABSOLUTE value.
• Ignore values with difference=0
• Add all the ranks associated with positive differences, giving the T+ statistic.
• Add all the ranks associated with negative differences, giving the T- statistic.
• Finally, the p-value associated with this statistic is calculated the same way as for a standard U-/Mann-Whitney test (see above).
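The steps above can be sketched in Python as follows. The function name is hypothetical; it drops zero differences, ranks the absolute differences with mid-ranks for ties, and returns T+, T- and the numbers of positive and negative differences. It does not compute the p-value itself.

```python
def signed_rank_sums(before: list[float], after: list[float]) -> tuple[float, float, int, int]:
    """Return (T+, T-, N+, N-) for paired observations (difference = before - after)."""
    diffs = [b - a for b, a in zip(before, after) if b != a]  # ignore zero differences
    abs_d = [abs(d) for d in diffs]
    order = sorted(range(len(abs_d)), key=abs_d.__getitem__)
    ranks = [0.0] * len(abs_d)
    i = 0
    while i < len(order):
        j = i
        # Find the run of equal absolute differences (ties)
        while j + 1 < len(order) and abs_d[order[j + 1]] == abs_d[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mid-rank
        i = j + 1
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    n_plus = sum(1 for d in diffs if d > 0)
    return t_plus, t_minus, n_plus, len(diffs) - n_plus


before = [8, 23, 7, 11, 5, 9, 12, 6, 10, 18, 9]
after = [4, 16, 6, 12, 6, 7, 10, 10, 10, 13, 6]
print(signed_rank_sums(before, after))
```

With the before/after values of the following example it returns (43.5, 11.5, 7, 3).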

Example: assume a parameter is measured for 11 individuals before and after treatment:

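| Individual | Before | After | Difference (Before-After) | \|Difference\| | Raw rank (one possibility, ties!) | Tie-corrected rank |
|---|---|---|---|---|---|---|
| 1 | 8 | 4 | 4 | 4 | 7 | 7.5 |
| 2 | 23 | 16 | 7 | 7 | 10 | 10 |
| 3 | 7 | 6 | 1 | 1 | 1 | 2 |
| 4 | 11 | 12 | -1 | 1 | 2 | 2 |
| 5 | 5 | 6 | -1 | 1 | 3 | 2 |
| 6 | 9 | 7 | 2 | 2 | 4 | 4.5 |
| 7 | 12 | 10 | 2 | 2 | 5 | 4.5 |
| 8 | 6 | 10 | -4 | 4 | 8 | 7.5 |
| 9 | 10 | 10 | 0 | 0 | - | - |
| 10 | 18 | 13 | 5 | 5 | 9 | 9 |
| 11 | 9 | 6 | 3 | 3 | 6 | 6 |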

• Rank sum for negative differences: R- = 2+2+7.5 = 11.5
• Rank sum for positive differences: R+ = 2+4.5+4.5+6+7.5+9+10 = 43.5

NB: Tie correction: the ranks for |Difference| = 1 (three occurrences) cannot be uniquely defined. Therefore, the average rank is used for all tied values. Here: for |Difference| = 1, (1+2+3)/3 = 2; for |Difference| = 2, (4+5)/2 = 4.5; for |Difference| = 4, (7+8)/2 = 7.5.

Use the rank sums R+ = 43.5 (N+ = 7) and R- = 11.5 (N- = 3) with the normal 2-class U-/Mann-Whitney test (see above).

### Multi-class Kruskal-Wallis test

Here we test whether a single gene differs between multiple (3 or more) selected sub-sets of our hybridisations (e.g. Kidney <=> Liver <=> Lung <=> Heart).
E.g. find genes that are clearly differently regulated in any grouping of the four sub-sets (K-LLH or L-KLH or KH-LL ...).
To do this, SUMO builds all possible unique 2-group combinations of the sub-sets and performs a t-test for each combination. The lowest p-value of all combinations is reported as the p-value for this gene.
All genes with a low p-value can significantly distinguish between the original sub-sets (in any of the ways described above).
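The enumeration of unique 2-group combinations can be sketched as follows. The helper names are hypothetical, and the two-class test is passed in as a function returning a p-value, because this sketch does not reproduce SUMO's choice of test.

```python
from itertools import combinations
from typing import Callable, Iterator

Split = tuple[list[str], list[str]]


def bipartitions(labels: list[str]) -> Iterator[Split]:
    """Yield every unique split of the group labels into two non-empty super-groups."""
    first, rest = labels[0], labels[1:]
    # Keeping `first` on the left side avoids listing A|B and B|A twice
    for k in range(len(rest)):  # k = number of other labels joining `first`
        for extra in combinations(rest, k):
            left = [first, *extra]
            right = [g for g in rest if g not in extra]
            yield left, right


def min_p_over_splits(
    groups: dict[str, list[float]],
    test: Callable[[list[float], list[float]], float],
) -> tuple[float, list[str], list[str]]:
    """Run `test` on every split and return the lowest p-value with its split."""
    results = []
    for left, right in bipartitions(list(groups)):
        a = [v for g in left for v in groups[g]]
        b = [v for g in right for v in groups[g]]
        results.append((test(a, b), left, right))
    return min(results, key=lambda r: r[0])


for left, right in bipartitions(["K", "Li", "Lu", "H"]):
    print(left, "vs", right)
```

For four sub-sets (K, Li, Lu, H) there are 7 unique splits, e.g. K vs Li+Lu+H or K+H vs Li+Lu.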

Click the Non-parametric tests button and select Kruskal-Wallis test

• Assign groups, set parameters
• Filter significant genes
• Heatmaps, ... from selected genes
• ...

## Kolmogorov-Smirnov test

Wikipedia explains:
"... the Kolmogorov-Smirnov test (K-S test or KS test) is a nonparametric test of the equality of continuous, one-dimensional probability distributions that can be used to compare a sample with a reference probability distribution (one-sample K-S test), or to compare two samples (two-sample K-S test). ...
... The two-sample K-S test is one of the most useful and general nonparametric methods for comparing two samples, as it is sensitive to differences in both location and shape of the empirical cumulative distribution functions of the two samples. ..."

### Test principle, in brief

• Build the "density function" from your data: for each set of measurements, build a histogram counting the frequencies of measurements in intensity/ratio bins.
• Convert the "density function" into the empirical distribution function (EDF) of your data set (~cumulative distribution function).
• Find the position with the largest difference $D_{n,m}$ between the two distributions.
• Derive a p-value from $D_{n,m}$, with the sample sizes (n for the first data set and m for the second) as parameters.
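A minimal sketch of the statistic D(n,m). Unlike the description above, it works directly on the raw values instead of binned histograms (an assumption made for brevity); the EDF of each sample is evaluated at every observed value, where the largest difference between two step functions must occur. The function name is hypothetical.

```python
from bisect import bisect_right


def ks_statistic(x: list[float], y: list[float]) -> float:
    """Largest absolute difference between the EDFs of two samples."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    d = 0.0
    for t in xs + ys:
        f_x = bisect_right(xs, t) / n  # fraction of x values <= t
        f_y = bisect_right(ys, t) / m  # fraction of y values <= t
        d = max(d, abs(f_x - f_y))
    return d


print(ks_statistic([0.5, 0.7, 0.9], [0.6, 0.8, 1.0]))
```

Completely separated samples give D = 1; the interleaved example above gives D = 1/3.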

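Following Wikipedia, we would reject the null hypothesis at level α if:

$$D_{n,m} > c(\alpha)\sqrt{\frac{n + m}{n\,m}} \qquad (1)$$

with

$$c(\alpha) = \sqrt{-\frac{1}{2}\ln\left(\frac{\alpha}{2}\right)} \qquad (2)$$

Inserting (2) into (1) and solving for α, we can compute α as a function of $D_{n,m}$ with parameters n and m:

$$\alpha = 2\exp\left(-2\,D_{n,m}^2\,\frac{n\,m}{n + m}\right)$$

This α is the p-value (depending on $D_{n,m}$, n, m) at which we would reject the null hypothesis.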
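Solving the rejection criterion D(n,m) > c(α)·sqrt((n+m)/(n·m)) with c(α) = sqrt(−½·ln(α/2)) for α gives α = 2·exp(−2·D(n,m)²·n·m/(n+m)). A minimal sketch of this asymptotic formula; the function name and the cap at 1 are illustrative choices, and for small samples the asymptotic value is only a rough approximation.

```python
import math


def ks_asymptotic_p(d: float, n: int, m: int) -> float:
    """Asymptotic two-sample KS p-value: alpha = 2 * exp(-2 * d^2 * n * m / (n + m)), capped at 1."""
    return min(1.0, 2.0 * math.exp(-2.0 * d * d * n * m / (n + m)))


# Round trip: the critical D for alpha = 0.05 should map back to 0.05
alpha, n, m = 0.05, 30, 40
c = math.sqrt(-0.5 * math.log(alpha / 2))
d_crit = c * math.sqrt((n + m) / (n * m))
print(d_crit, ks_asymptotic_p(d_crit, n, m))
```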

Click the Non-parametric tests button and select Kolmogorov-Smirnov-test.

• Assign groups, set parameters
• Filter significant genes
• Heatmaps, ... from selected genes
• ...

To explore the KS test in more detail, with demo data or custom data sets, see Utilities | Statistics test | Kolmogorov-Smirnov test.

### Two-class Kolmogorov-Smirnov test

As described above, this test is used to compare the distributions of two groups of data.

### Multi-class Kolmogorov-Smirnov test

The Kolmogorov-Smirnov test is defined for comparing two data groups.

However, similar to the multi-class t-test, we can perform a multi-class Kolmogorov-Smirnov test.

For all possible unique combinations of two groups (e.g. Group 1 vs Group 2, 1 vs 3, ..., 2 vs 3, ..., (n-1) vs n, but not 2 vs 1, ...), we perform a Kolmogorov-Smirnov test and store the respective p-values.

Under the null hypothesis (all groups are drawn from the same distribution), none of the pairwise tests should report a small p-value, except by chance.

If one or more group pairs have different distributions for a feature, one or more p-values will be small (p << 1).
These are the features that may help to explain the grouping.

On the Parameter tab sheet you may choose:
• MIN p-value between any group-pair
E.g. you defined 5 groups =>10 group pairs to test.
As long as at least ONE pair delivers a p-value <= Threshold, this feature is reported as significant.
A filter strategy comparable to ANOVA.
• MAX p-value between any group-pair
E.g. you defined 5 groups =>10 group pairs.
This time we require that ALL 10 pairs deliver a p-value <= Threshold to report this feature as significant.
Thus even the largest p-value of any pair must be <= Threshold.
This filter is much more restrictive than ANOVA and returns a different kind of information.
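The two filter strategies can be sketched as follows. The pairwise test is passed in as a function (e.g. a two-sample KS test returning a p-value); the helper names and the `mode` values are hypothetical, not the labels of SUMO's dialog.

```python
from itertools import combinations
from typing import Callable

PairTest = Callable[[list[float], list[float]], float]


def pairwise_p_values(groups: dict[str, list[float]], test: PairTest) -> dict[tuple[str, str], float]:
    """p-value for every unique pair of groups (1 vs 2, 1 vs 3, ..., but not 2 vs 1)."""
    return {(a, b): test(groups[a], groups[b]) for a, b in combinations(groups, 2)}


def passes_filter(p_values: dict[tuple[str, str], float], threshold: float, mode: str = "min") -> bool:
    """Apply the MIN or MAX p-value filter to the pairwise p-values of one feature."""
    if mode == "min":
        # At least ONE pair must reach the threshold
        return min(p_values.values()) <= threshold
    if mode == "max":
        # ALL pairs must reach it, i.e. even the largest p-value
        return max(p_values.values()) <= threshold
    raise ValueError(f"unknown mode: {mode!r}")


example = {("G1", "G2"): 0.01, ("G1", "G3"): 0.2, ("G2", "G3"): 0.03}
print(passes_filter(example, 0.05, "min"), passes_filter(example, 0.05, "max"))
```

With 5 groups, `pairwise_p_values` produces the 10 pairs mentioned above; in the small example, the feature passes the MIN filter but not the MAX filter.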