# T-Tests

A basic question in any kind of experimental data production is:

are my measured data only noise from my measurement system or are they statistically significant

A very basic - and commonly used - test to determine this is a T-Test.

How probable is it that my measured data distribution is part of another distribution (e.g. noise).

Case 1: How probably is it that a single gene is regulated, or only part of stochastic noise in my data => Single class T-test

Case 2: How probably is it, that the observed regulation of a single gene is different in two sub-sets of patients (healthy - ill) => 2-class T-Test

Case 3: How probably is it, that the observed regulation of a single gene is able to distinguish between multiple sub-sets (lung - kidney - liver - brain) => Multi-class T-test or ANOVA

Case 4: Show paired samples (e.g. Control vs. Treated) same regulation trend ? => Paired samples 2 class test

Common to all these tests is the basic 2 class T-test:

Assume two randomly Gaussian distributed measurements (e.g. expression ratios from 10 cancer and 10 healthy patients): Obviously, the above curves (distributions) are nicely separated, and thus we can assume, the expression ratio is different between cancer / healthy.

Also those distributions are nicely separated:  Whereas those are not nicely separated:  Good rules to describe nicely separated distribution might be:

• as "fare away" as better

• as "smaller" as better

A bit more formalised: calculate

• average (M1 and M2 for the two distributions)

• standad deviation (s1 and s2 for the two distributions) and calculate distance between averages (M1-M2) and divide by the weighted "widths" (s1+s2) which gives you the t-value.
If you assume the distributions to be Gaussian like, you can directly transform the t-vale into a p-value (probability value) based on the t-distribution.
For 2-/Multi-class T-tests SUMO assumes un-equal group variances, using above formula for T calculation and Welch-Satterthwaite-Approximation for calculation of Degrees of Freedom.

The p-value from a one-sided test is reported and gives you the probability, that the two distributions are identical, i.e. they have same average and standard deviation.

As smaller the p-value, as less similar the curves are.

In all tests we search genes with low small p-value.

## Single-class T-test

Here we test, whether a single gene is different to a fixed value (e.g. 0 regulation) in all selected hybridisations.
E.g. find genes which are highly regulated in all different cancer tissues => general impact for cancer.

In this case we assume M2=0 and S2=0, thus the T-test formula degenerates to All genes with low p-value are statistically significant regulated (i.e. either up- or down-regulated).

## Paired samples 2-class t-test

Here we test, whether paired samples (e.g. Treatment1 vs. Treatment2 for each single patient) show the same regulation trend (e.g. down- regulated from T1 -> T2).
We compute for each pair the difference and use a single class t-test to estimate, whether the differences are statistically significantly positive or negative.

Lets assume we have measured the expression of a single gene in several patients (n patients) after treatment with compound A and compund B, respectively.

First we compute the difference for each patient: Next we compute the t-value: With t and n we can compute the p-value from a 1-class t-test against 0-regulation.

## Single-class T-test with SUMO

Here we test, whether a single gene is different to a fixed value (e.g. 0 regulation) in all selected hybridisations.
E.g. find genes which are highly regulated in all different cancer tissues => general impact for cancer.

In this case we assume M2=0 and S2=0, thus the T-test formula degenerates to All genes with low p-value are statistically significant regulated (i.e. either up- or down-regulated).

Click the T-Test button and select One-class t-Test

The parameter dialog-pops up

In the Groups tab-sheet select all required hybridisations: On the Parameter page select: Select the algorithm how p-value is estimated:

• T-distribution to derive p-values from T-distribution

• Permutations and number of permutations to estimate the p-value using a permutation scheme

Click OK button, and run the analysis.

In the experiment tree a new entry shows up: Click any of the listed items:

Set the critical p-value alphausing the p-Graph: The Graph shows the number of matching genes (i.e. those with corresponding p-values) in dependence of the p-value (here decadic logarithm).
Blue curves show p-value according to T-distribution, green curves show p-values according to permutations.

The table show numbers of genes with p-values equal or smaller the set critical p-value.

Use the slider to adjust alpha and watch the changing number of genes in thetable.

Click the respective table field to use for subsequent data views. E.g. if you want to use the 713 genes which passed the Bonferroni corrected T-distribution test click the table cell Distribution / p-Bonf.

View the Parameter for this analysis

Analyse the different views of those genes which have p-values < alpha

• Gene profiles

• Centroid profiles

• Heat maps

Save the matching genes

• Gene lists: gene names and numerical T-Test values

• Gene sub-expression matrices: Expression matrix including gene-expression values

## Two-class T-test with SUMO

Here we test, whether a single gene is different in two selected sub-sets of our hybridisations (e.g. Cancer <=> Normal)
E.g. find genes which are clearly different regulated in the two sub-sets (up<=>down or up<=>more up or down<=>more down).

Her we use the standard T-test formula: All genes with low p-value are statistically significant regulated (i.e. either up- or down-regulated).

Click the T-Test button and select Two-class t-Test

The parameter dialog-pops up

In the Groups tab-sheet select all required hybridisations: On the Parameter page select: Select the algorithm how p-value is estimated:

• T-distribution to derive p-values from T-distribution

• Permutations and number of permutations to estimate the p-value using a permutation scheme

Click OK button, and run the analysis.

In the experiment tree a new entry shows up: Click any of the listed items:

Set the critical p-value alphausing the p-Graph: The Graph shows the number of matching genes (i.e. those with corresponding p-values) in dependence of the p-value (here decadic logarithm).
Blue curves show p-value according to T-distribution, green curves show p-values according to permutations.

The table show numbers of genes with p-values equal or smaller the set critical p-value.

Use the slider to adjust alpha and watch the changing number of genes in thetable.

Click the respective table field to use for subsequent data views. E.g. if you want to use the 713 genes which passed the Bonferroni corrected T-distribution test click the table cell Distribution / p-Bonf.

View the Parameter for this analysis

Analyse the different views of those genes which have p-values <= alpha

• Gene profiles

• Centroid profiles

• Heat maps

Save the matching genes

• Gene lists: gene names and numerical T-Test values

• Gene sub-expression matrices: Expression matrix including gene-expression values

## Paired samples 2-class t-test

Here we test, whether paired samples (e.g. Treatment1 vs. Treatment2 for each single patient) show the same regulation trend (e.g. down- regulated from T1 -> T2).

Click the t-test button and select Two-class t-Test (paired samples) The parameter dialog-pops up: Assign the hybridisation pairwise to the two groups. Click a hyb in the Conditions list, then click the respective cell in the Pairs table.

Click the Clear button to empty the table.

In the experiment tree a new entry shows up:When done, click the Run button to perform the test. Go to p-graph and adjust critical, p-value.

As usual, show Gene list / sub-expression matrix / Gene / Centroid profiles or heatmap from the selected genes.

## Multi-class T-test with SUMO

Here we test, whether a single gene is different between multiple (3 or more) selected sub-sets of our hybridisations (e.g. Kidney <=> Liver <=> Lung <=> Heart)
E.g. find genes which are clearly different regulated in any different grouping of the four sub-sets  (K-LLH or L-KLH or KH-LL ...).
To do this SUMO builds all possible 2-group unique combinations of the sub set and performs for each combination a T-test. The lowest p-value of any of the possible combinations is reported as p-value for this gene.

The number of combinations increases in the order of the Faculty function. Therefore, the number of combinations becomes very large and computational intensive with large numbers of sub sets. SUMO is limited to handle a maximum of 9 groups.
A more flexible and less challenging way to analyse larger groups is ANOVA.

All genes with low p-value are statistically significantly able to distinguish between the original sub-sets (in any way described above).

Click the T-Test button and select Multi-class t-Test

The parameter dialog-pops up

In the Groups tab-sheet select all required hybridisations. Set the number to 3 or more: On the Parameter page select: Select the algorithm how p-value is estimated:

• T-distribution to derive p-values from T-distribution

• Permutations and number of permutations to estimate the p-value using a permutation scheme (not implemented yet)

Click OK button, and run the analysis.

In the experiment tree a new entry shows up: Click any of the listed items:

Set the critical p-value alphausing the p-Graph: The Graph shows the number of matching genes (i.e. those with corresponding p-values) in dependence of the p-value (here decadic logarithm).
Blue curves show p-value according to T-distribution, green curves show p-values according to permutations.

The table show numbers of genes with p-values equal or smaller the set critical p-value.

Use the slider to adjust alpha and watch the changing number of genes in thetable.

Click the respective table field to use for subsequent data views. E.g. if you want to use the 713 genes which passed the Bonferroni corrected T-distribution test click the table cell Distribution / p-Bonf.

View the Parameter for this analysis

Analyse the different views of those genes which have p-values  < alpha

• Gene profiles

• Centroid profiles

• Heat maps

Save the matching genes

• Gene lists: gene names and numerical T-Test values

• Gene sub-expression matrices: Expression matrix including gene-expression values