A basic question in any kind of experimental data production is:

*are my measured data only noise from my measurement system or
are they statistically significant*

A very basic - and commonly used - test to determine this is a
**T-Test.
**This test answers the questions:

How probable is it that my measured data distribution is part of another distribution (e.g. noise).

Case 1: How probably is it that a single **gene is regulated**,
or only part of stochastic noise in my data => **Single class T-test**

Case 2: How probably is it, that the observed regulation of a
single **gene is different in two sub-sets** of patients (healthy - ill)
=> **2-class T-Test**

Case 3: How probably is it, that the observed regulation of a
single **gene is able to distinguish between multiple sub-sets** (lung -
kidney - liver - brain) => **Multi-class T-test ** or **ANOVA**

Case 4: Show
paired samples (e.g. Control vs. Treated) same regulation trend ? => **Paired samples 2 class test**

Common to all these tests is the basic 2 class T-test:

Assume two randomly Gaussian distributed measurements (e.g. expression ratios from 10 cancer and 10 healthy patients):

Obviously, the above curves (distributions) are nicely separated, and thus we
can assume, the expression ratio is different between cancer /
healthy.

Also those distributions are nicely separated:

Whereas those are not nicely separated:

Good rules to describe nicely separated distribution might be:

as "fare away" as better

as "smaller" as better

A bit more formalised: calculate

**average**(M1 and M2 for the two distributions)**standad deviation**(s1 and s2 for the two distributions)

and calculate distance between averages (M1-M2) and divide by the weighted "widths" (s1+s2)

which gives you the **t-value**.

If you assume the distributions to be Gaussian like, you can directly transform
the t-vale into a **p-value** (probability value) based on the
**t-distribution**.

For 2-/Multi-class T-tests * SUMO* assumes un-equal group variances,
using above formula for T calculation and

The p-value from a one-sided test is reported and gives you the probability, that the two distributions are identical, i.e. they have same average and standard deviation.

As smaller the p-value, as less similar the curves are.

In all tests we search genes with low small p-value.

Single-class T-test

Here we test, whether a single gene is different to a fixed value
(e.g. 0 regulation) in all selected hybridisations.

E.g. find genes which are highly regulated in all different cancer tissues
=> general impact for cancer.

In this case we assume M2=0 and S2=0, thus the T-test formula degenerates
to

All genes with low p-value are statistically significant regulated (i.e. either up- or down-regulated).

Here we test, whether paired
samples (e.g. Treatment1 vs. Treatment2 for each single patient) show the same
regulation trend (e.g. down- regulated from T1 -> T2).

We compute for each pair the difference and use a single class t-test to
estimate, whether the differences are statistically significantly positive or
negative.

Lets assume we have measured the expression of a single gene in several patients (n patients) after treatment with compound A and compund B, respectively.

First we compute the difference for each patient:

$$

Next we compute the t-value:

With
** t** and

Here we test, whether a single gene is different to a fixed value
(e.g. 0 regulation) in all selected hybridisations.

E.g. find genes which are highly regulated in all different cancer tissues
=> general impact for cancer.

In this case we assume M2=0 and S2=0, thus the T-test formula degenerates
to

All genes with low p-value are statistically significant regulated (i.e. either up- or down-regulated).

Click the **T-Test** button

and select **One-class t-Test**

The parameter dialog-pops up

In the **Groups** tab-sheet select
all required hybridisations:

On the **Parameter** page
select:

Select the algorithm how p-value is estimated:

**T-distribution**to derive p-values from T-distribution**Permutations**and**number of permutations**to estimate the p-value using a permutation scheme

Click **OK** button, and run the analysis.

In the experiment tree a new entry shows up:

Click any of the listed items:

Set the critical p-value **alpha**using the **p-Graph**:

The Graph shows the number of matching genes (i.e. those with
corresponding p-values) in dependence of the p-value (here decadic
logarithm).

**Blue curves** show p-value
according to T-distribution, **green
curves** show p-values according to permutations.

The table show numbers of genes with p-values equal or smaller the set critical p-value.

Use the **slider to adjust** alpha
and watch the changing number of genes in the**table**.

Click the respective **table field** to use for subsequent data
views. E.g. if you want to use the 713 genes which passed the Bonferroni
corrected T-distribution test click the table cell Distribution /
p-Bonf.

View the **Parameter** for this analysis

Analyse the different views of those genes which have p-values
< **alpha**

**Gene profiles****Centroid profiles****Heat maps**

Save the matching genes

**Gene lists**: gene names and numerical T-Test values**Gene sub-expression matrices**: Expression matrix including gene-expression values

Two-class T-test with

Here we test, whether a single gene is different in two selected
sub-sets of our hybridisations (e.g. Cancer <=> Normal)

E.g. find genes which are clearly different regulated in the two sub-sets
(up<=>down or up<=>more up or down<=>more down).

Her we use the standard T-test formula:

All genes with low p-value are statistically significant regulated
(i.e. either up- or down-regulated).

Click the **T-Test** button

and select **Two-class t-Test**

The parameter dialog-pops up

In the **Groups** tab-sheet select
all required hybridisations:

On the **Parameter** page
select:

Select the algorithm how p-value is estimated:

**T-distribution**to derive p-values from T-distribution**Permutations**and**number of permutations**to estimate the p-value using a permutation scheme

Click **OK** button, and run the analysis.

In the experiment tree a new entry shows up:

Click any of the listed items:

Set the critical p-value **alpha**using the **p-Graph**:

The Graph shows the number of matching genes (i.e. those with
corresponding p-values) in dependence of the p-value (here decadic
logarithm).

**Blue curves** show p-value
according to T-distribution, **green
curves** show p-values according to permutations.

The table show numbers of genes with p-values equal or smaller the set critical p-value.

Use the **slider to adjust** alpha
and watch the changing number of genes in the**table**.

Click the respective **table field** to use for subsequent data
views. E.g. if you want to use the 713 genes which passed the Bonferroni
corrected T-distribution test click the table cell Distribution / p-Bonf.

View the **Parameter** for this analysis

Analyse the different views of those genes which have p-values
<= **alpha**

**Gene profiles****Centroid profiles****Heat maps**

Save the matching genes

**Gene lists**: gene names and numerical T-Test values**Gene sub-expression matrices**: Expression matrix including gene-expression values

Paired samples 2-class t-test

Here we test, whether paired samples (e.g. Treatment1 vs. Treatment2 for each single patient) show the same regulation trend (e.g. down- regulated from T1 -> T2).

Click the t-test
button and select **Two-class t-Test (paired samples)**

The parameter dialog-pops up:

Assign the
hybridisation pairwise to the two groups. Click a hyb in the
**Conditions** list, then click the respective cell in the
**Pairs** table.

Click the
**Clear** button to empty the table.

In the experiment tree a new
entry shows up:When done, click the **Run** button to
perform the test.

Go to p-graph and adjust critical, p-value.

As usual, show Gene list / sub-expression matrix / Gene / Centroid profiles or heatmap from the selected genes.

Here we test, whether a single gene is different between multiple
(3 or more) selected sub-sets of our hybridisations (e.g. Kidney <=>
Liver <=> Lung <=> Heart)

E.g. find genes which are clearly different regulated in any different grouping
of the four sub-sets (K-LLH or L-KLH or KH-LL ...).

To do this * SUMO* builds all possible 2-group unique combinations
of the sub set and performs for each combination a T-test. The lowest p-value
of any of the possible combinations is reported as p-value for this gene.

The number of combinations increases in the order of the Faculty
function. Therefore, the number of combinations becomes very large and
computational intensive with large numbers of sub sets. SUMO is limited to
handle a maximum of 9 groups.

A more flexible and less challenging way to analyse larger groups is ANOVA.

All genes with low p-value are statistically significantly able to
distinguish between the original sub-sets (in any way described above).

Click the **T-Test** button

and select **Multi-class t-Test**

The parameter dialog-pops up

In the **Groups** tab-sheet select
all required hybridisations. Set the number to 3 or more:

On the **Parameter** page
select:

Select the algorithm how p-value is estimated:

**T-distribution**to derive p-values from T-distribution**Permutations**and**number of permutations**to estimate the p-value using a permutation scheme**(not implemented yet)**

Click **OK** button, and run the analysis.

In the experiment tree a new entry shows up:

Click any of the listed items:

Set the critical p-value **alpha**using the **p-Graph**:

The Graph shows the number of matching genes (i.e. those with
corresponding p-values) in dependence of the p-value (here decadic
logarithm).

**Blue curves** show p-value
according to T-distribution, **green
curves** show p-values according to permutations.

The table show numbers of genes with p-values equal or smaller the set critical p-value.

Use the **slider to adjust** alpha
and watch the changing number of genes in the**table**.

Click the respective **table field** to use for subsequent data
views. E.g. if you want to use the 713 genes which passed the Bonferroni
corrected T-distribution test click the table cell Distribution / p-Bonf.

View the **Parameter** for this analysis

Analyse the different views of those genes which have p-values
< **alpha**

**Gene profiles****Centroid profiles****Heat maps**

Save the matching genes

**Gene lists**: gene names and numerical T-Test values**Gene sub-expression matrices**: Expression matrix including gene-expression values