Utilities - Standalone Statistics tests

SUMO performs several different statistical tests with all data from a data matrix.

For convenience, some of these tests may also be performed interactively on single data sets.







t-test







U-test







Kolmogorov-Smirnov test

Wikipedia explains:
"... the Kolmogorov-Smirnov test (K-S test or KS test) is a nonparametric test of the equality of continuous, one-dimensional probability distributions that can be used to compare a sample with a reference probability distribution (one-sample K-S test), or to compare two samples (two-sample K-S test). ...
... The two-sample K-S test is one of the most useful and general nonparametric methods for comparing two samples, as it is sensitive to differences in both location and shape of the empirical cumulative distribution functions of the two samples. ..."

Use this utility to test the equality of two data sets.

In SUMO select Main menu | Utilities | Statistic tests | Kolmogorov-Smirnov test

The KS-test dialog opens up:



Fill in data values for group1 and group2,
or copy Tab/NewLine/space separated values into the clipboard.
Click the Paste group1/2 button to paste the clipboard content into the respective data grid row.
(SUMO tries to convert the German "," into the international "." decimal separator.)
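For illustration only - a minimal Python sketch (not SUMO's code) of the kind of whitespace splitting and comma-to-dot conversion described above:

    def parse_clipboard_values(text: str) -> list[float]:
        """Split Tab/NewLine/space separated values and accept a German
        decimal comma as well as the international decimal point."""
        return [float(token.replace(",", ".")) for token in text.split()]

    # Example: "1,5\t2.0\n3,25" is parsed as [1.5, 2.0, 3.25]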

Click the Evaluate button to perform the KS-test.

Results are shown in the Results table:



The Results table indicates:

Top:      Results from the KS-test
Middle:   For comparison - results from a t-test
Bottom:   Some descriptive statistics

Additionally, the following are shown:
   

To experiment with the KS-test, try the Demo data (a short SciPy sketch mirroring one of the demos follows the list below):

2 identical gaussians         Two different randomly sampled, Gaussian distributed data sets,
                              both with Mean=0, SDev=1.

2 gaussians, different mean   Two different randomly sampled, Gaussian distributed data sets
                              with different Mean, similar SDev.

2 gaussians, different SDev   Two different randomly sampled, Gaussian distributed data sets
                              with similar Mean, different SDev.

2 different gaussians         Two different randomly sampled, Gaussian distributed data sets
                              with different Mean and SDev.

Gaussian <=> Random           A Gaussian distributed data set versus a random data set.

Random <=> Random             Two different random data sets.
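Outside SUMO, a comparable evaluation can be reproduced with SciPy; this sketch mimics the "2 gaussians, different mean" demo (sample size and random seed are arbitrary choices, not SUMO's):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    group1 = rng.normal(loc=0.0, scale=1.0, size=100)   # Mean=0, SDev=1
    group2 = rng.normal(loc=1.0, scale=1.0, size=100)   # shifted mean, same SDev

    ks = stats.ks_2samp(group1, group2)                  # two-sample KS-test
    tt = stats.ttest_ind(group1, group2)                 # t-test, for comparison
    print(f"KS: D={ks.statistic:.3f}  p={ks.pvalue:.4g}")
    print(f"t : t={tt.statistic:.3f}  p={tt.pvalue:.4g}")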

For more details about KS-test see Non parametric tests | Kolmogorov-Smirnov test.






Fisher-Exact-test







Grubbs outlier test







2x2 Cross tables / Contingency tables

A 2x2 contingency table shows the frequencies of the two respective states of two variables.

A simple example:
Analyse the handedness of the members of a group of individuals depending on their gender:

                            Handedness
                    Left-handed   Right-handed   Sum
  Gender  Female         4             44         48
          Male           9             43         52
          Sum           13             87        100
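Outside SUMO, the association in such a table can be checked directly, for example with SciPy (using the example counts above):

    from scipy import stats

    #          left-handed  right-handed
    table = [[  4,           44],     # Female
             [  9,           43]]     # Male

    odds_ratio, p_fisher = stats.fisher_exact(table)             # Fisher's Exact test
    chi2, p_chi2, dof, expected = stats.chi2_contingency(table)  # Pearson chi-square
    print(f"Fisher exact: p={p_fisher:.3f}   chi-square: p={p_chi2:.3f}")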


Very often 2x2 tables are used to compare predicted parameters versus observed parameters.
In this case they are called a Confusion matrix.

                                   Predicted
                          Positive           Negative
  Observed  Positive   True Positive      False Negative
            Negative   False Positive     True Negative


With the values TP, FP, FN, TN we can derive various values to indicate association between the two variables (Predicted/Observed):

TP          True positives (Hits)
FP          False positives (Type I error)
FN          False negatives (Type II error)
TN          True negatives (Correct rejections)

SPC         Specificity = TN / (FP+TN) = True Negative Rate (TNR) = Selectivity
SEN         Sensitivity = TP / (TP+FN) = True Positive Rate (TPR) = Recall = Hit rate
PPV         Positive Predictive Value = TP / (TP+FP) = Precision
NPV         Negative Predictive Value = TN / (TN+FN)
PRE         Prevalence = (TP+FN) / (TP+FP+FN+TN)
ACC         Accuracy = (TP+TN) / (TP+FP+FN+TN) = Random precision
BA          Balanced Accuracy = (TPR+TNR) / 2
FPR         False Positive Rate = FP / (FP+TN) = Fall-out
FNR         False Negative Rate = FN / (FN+TP) = Miss rate
Gain        PPV / ACC
MCC         Matthews Correlation Coefficient = ((TP*TN)-(FP*FN)) / sqrt((TP+FP)*(TP+FN)*(TN+FP)*(TN+FN)) = Phi coefficient
FDR         False Discovery Rate = FP / (FP+TP)
FOR         False Omission Rate = FN / (FN+TN)
LR+         Positive Likelihood Ratio = TPR / FPR = Sensitivity / FPR
LR-         Negative Likelihood Ratio = FNR / TNR = FNR / Specificity
FS          F-Score = 2*TP / (TP+FP+TP+FN) = F1-Score or F Measure
RR          Relative Risk = (TP/(TP+FP)) / (FN/(FN+TN))
DOP         Difference of proportions = | TP/(TP+FP) - FN/(FN+TN) |
PT          Prevalence Threshold = sqrt(FPR) / (sqrt(TPR)+sqrt(FPR))
TS          Threat Score = Critical Success Index (CSI) = TP / (TP+FN+FP)
FM          Fowlkes-Mallows Index = sqrt(PPV*TPR)
BM          Bookmaker Informedness = Informedness = TPR + TNR - 1
MK          Markedness = DeltaP = PPV + NPV - 1
Odds ratio  (TP/FP) / (FN/TN)
pGOF        Significance by Pearson's Goodness-of-Fit test
pFET        Significance by Fisher's Exact test
Yule's Q    Yule coefficient of association = (TP*TN - FP*FN) / (TP*TN + FP*FN)
Yule's Y    Coefficient of colligation = (1 - sqrt(1 - YulesQ^2)) / YulesQ
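A minimal sketch (not SUMO's code) showing how a few of these association values follow from the four counts; the counts used here are made-up example numbers:

    import math

    def confusion_measures(TP, FP, FN, TN):
        """Derive a few of the association values listed above."""
        total = TP + FP + FN + TN
        sen = TP / (TP + FN)                    # Sensitivity / TPR / Recall
        spc = TN / (FP + TN)                    # Specificity / TNR
        ppv = TP / (TP + FP)                    # Positive Predictive Value / Precision
        npv = TN / (TN + FN)                    # Negative Predictive Value
        acc = (TP + TN) / total                 # Accuracy
        fs  = 2 * TP / (2 * TP + FP + FN)       # F-Score (F1)
        mcc = (TP * TN - FP * FN) / math.sqrt(  # Matthews Correlation Coefficient
            (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
        return dict(SEN=sen, SPC=spc, PPV=ppv, NPV=npv, ACC=acc, FS=fs, MCC=mcc)

    print(confusion_measures(TP=40, FP=10, FN=5, TN=45))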


Additionally, several Similarity measures - as used in clustering of binary vectors - can be computed (a sketch of a few textbook definitions follows the list below):

Simple Matching
Russel-Rao
Tanimoto
Kulczynski
Braun
Hamann
Cohen's Kappa
Bandigwala
Ochiai
Phi
Sneath
Simpson
Yule
Accuracy
F1-Score
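As a rough reference, a sketch of the textbook forms of a few of these measures for two binary vectors (a = 1/1 matches, b, c = mismatches, d = 0/0 matches); SUMO's exact definitions may differ in detail:

    import numpy as np

    def binary_similarities(u, v):
        """A few classic similarity measures for two binary vectors."""
        u, v = np.asarray(u, bool), np.asarray(v, bool)
        a = np.sum(u & v)              # both 1
        b = np.sum(u & ~v)             # 1 / 0
        c = np.sum(~u & v)             # 0 / 1
        d = np.sum(~u & ~v)            # both 0
        n = a + b + c + d
        return {
            "Simple Matching": (a + d) / n,
            "Russel-Rao":      a / n,
            "Tanimoto":        a / (a + b + c),                 # = Jaccard
            "Ochiai":          a / np.sqrt((a + b) * (a + c)),
        }

    print(binary_similarities([1, 1, 0, 1, 0], [1, 0, 0, 1, 1]))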


In SUMO select

Main menu | Utilities | Statistic tests | 2x2 cross table

A new window opens up:



Fill in the values and press the Evaluate button to compute the association values.






Neumann trend test







Correlation coefficient r => p-value

What is the probability that a given correlation value (r), measured with a given number of samples (n), is just random (p ~ 1) or very unexpected (p << 1)?

In a first step we can compute a t-value:

     t = r * sqrt(n-2) / sqrt(1 - r^2)

     r should be in the range: -1 < r < 1, n > 2

(adapted from: Miles and Banyard (2007), Understanding and Using Statistics in Psychology - A Practical Introduction)

From the t-distribution (with n-2 degrees of freedom) we can find the corresponding p-value.
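A minimal sketch of this conversion (two-sided, using SciPy's t-distribution):

    import math
    from scipy import stats

    def r_to_p(r, n):
        """Two-sided p-value for a correlation coefficient r from n samples."""
        t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)   # t-value, df = n - 2
        return 2 * stats.t.sf(abs(t), df=n - 2)

    print(r_to_p(r=0.5, n=20))   # e.g. r=0.5 measured with 20 samples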

In SUMO select Main menu | Utilities | Statistic tests | Correlation coefficient => p-value

Enter your data (r and n).



Click the OK button to compute p:



Edit r / n and recompute p, or Cancel this utility.







Probability => Correlation coefficient r

What correlation coefficient r do I need to yield a certain probability value p?

In a first step, we can get the t-value from an inverse t-distribution for the desired p-value.

Applying and inverting the above formula, we can compute r:

     r = t / sqrt(t^2 + n - 2)
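A minimal sketch of the inverse direction (two-sided), using SciPy's inverse t-distribution:

    import math
    from scipy import stats

    def p_to_r(p, n):
        """Correlation coefficient r needed to reach a two-sided p-value with n samples."""
        t = stats.t.isf(p / 2, df=n - 2)         # critical t-value for the desired p
        return t / math.sqrt(t * t + n - 2)      # inverted: t = r*sqrt(n-2)/sqrt(1-r^2)

    print(p_to_r(p=0.05, n=20))   # e.g. r needed for p=0.05 with 20 samples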



In SUMO select Main menu | Utilities | Statistic tests | p-value => Correlation coefficient

Enter your data (p and n).



Click the OK button to compute r:



Edit p / n and recompute r, or Cancel this utility.







Mantel test - Test for similarity of two matrices

Wikipedia explains:
The Mantel test, named after Nathan Mantel, is a statistical test of the correlation between two matrices. The matrices must be of the same dimension; in most applications, they are matrices of interrelations between the same vectors of objects.
Originally the Mantel test was introduced to compare distance matrices: square matrices with identical dimensions and positive data values.
But the test may also be performed with non-square matrices.

The two matrices must have identical dimensions (the same numbers of rows and columns).

Mantel's test statistic is computed with a basic cross-product formula:

     Z = Σ_i Σ_j  x_ij · y_ij

with x, y two (square) non-negative matrices with n, m numbers of rows/columns.

A normalized correlation value is computed:

     r = 1/(N-1) · Σ_i Σ_j ((x_ij - x̄) / sx) · ((y_ij - ȳ) / sy)

with x̄, ȳ = average of matrix x and y respectively,
sx, sy = standard deviation of matrix x and y respectively,
and N = the number of matrix elements.

r ranges from -1 to +1.

A significance value for the similarity is computed based on a permutation scheme where the rows and columns of (one) matrix are randomly shuffled.
For each of the np permutations an r' is computed.
The number m of permutations where r' ≥ r is counted and converted into a p-value:
p = (m+1) / np

To extract a trustworthy p-value, the number of permutations np should be adapted to the critical p-value:

     p        np
     0.05     1000
     0.01     5000
     0.001    50000
     ...      ...
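A minimal sketch of this permutation scheme (plain Mantel test for two square matrices; a simplified illustration, not SUMO's implementation):

    import numpy as np

    def mantel_test(x, y, n_perm=1000, seed=0):
        """Correlate the upper-triangle elements of two square matrices and
        estimate a p-value by shuffling rows/columns of one matrix."""
        x, y = np.asarray(x, float), np.asarray(y, float)
        iu = np.triu_indices_from(x, k=1)                 # off-diagonal elements
        r_obs = np.corrcoef(x[iu], y[iu])[0, 1]           # normalized correlation r
        rng = np.random.default_rng(seed)
        m = 0
        for _ in range(n_perm):
            p = rng.permutation(x.shape[0])               # shuffle rows and columns together
            r_perm = np.corrcoef(x[np.ix_(p, p)][iu], y[iu])[0, 1]
            if r_perm >= r_obs:                           # permutations at least as extreme
                m += 1
        return r_obs, (m + 1) / n_perm                    # p = (m+1) / np

    # Example: two small random, symmetric, distance-like matrices
    rng = np.random.default_rng(1)
    a = rng.random((6, 6))
    a = (a + a.T) / 2
    np.fill_diagonal(a, 0)
    b = a + 0.1 * rng.random((6, 6))
    b = (b + b.T) / 2
    np.fill_diagonal(b, 0)
    print(mantel_test(a, b))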

With SUMO select:

Main menu | Utilities | Statistics | Mantel test

In the parameter dialog select/specify: