Bumphunter

"Instead of looking for association between a single genomic location and a phenotype of interest, bumphunter looks for genomic regions that are differential in/between predefined sample (patient groups). "
The method may be applied to Beta, M or CNV values.

SUMO's basic bumphunter function is implemented according to the algorithm sketched in the minfi tutorial.

1. Step

bumphunter identifies clusters of features located in a local neighbourhood.
Between each pair of directly neighbouring cluster members, the distance (in base pairs) must not exceed a user-defined maximum (MaxGap, e.g. 50 bp).
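The cluster building rule can be sketched in a few lines of Python. This is only an illustration, not SUMO's actual code: the function name build_clusters is hypothetical, the positions are assumed to be sorted in increasing order, and single features are kept as clusters of size 1 (whether SUMO does so is not stated here).

```python
def build_clusters(positions, max_gap=50):
    """Group sorted base-pair positions into clusters of feature indices.

    A feature joins the current cluster as long as its distance to the
    previous feature does not exceed max_gap (assumed rule, see text).
    """
    clusters = []
    current = []
    for index, pos in enumerate(positions):
        # Start a new cluster when the gap to the previous member is too large.
        if current and pos - positions[current[-1]] > max_gap:
            clusters.append(current)
            current = []
        current.append(index)
    if current:
        clusters.append(current)
    return clusters


positions = [100, 130, 170, 400, 420, 900]
print(build_clusters(positions, max_gap=50))  # [[0, 1, 2], [3, 4], [5]]
```

The clusters are returned as lists of row indices, so the same indices can later be used to look up p-values and signals of the cluster members.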

2. Step

Compute the average significance of the cluster by averaging the p-values of all cluster members.
The p-values are taken from a previously performed test (t-test, ANOVA, Mann-Whitney or Kruskal-Wallis test).

Compute the average "signal" of the cluster by averaging the "signal" of all cluster members.
The "signal" is derived from the class means of the underlying statistical test:

absolute difference of the two class means for a 2-class t-test / Mann-Whitney test
maximum pairwise absolute difference between all class means for an ANOVA
absolute class mean for a 1-class t-test / Mann-Whitney test
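A minimal sketch of this step, under stated assumptions: the names feature_signal, summarize_cluster and P_AVERAGERS are hypothetical, p-values and signals are plain lists indexed by feature row, and the "median" and "max" options simply mirror the "p-Averaging method" parameter of the dialog. SUMO's internal implementation may differ.

```python
from itertools import combinations
from statistics import mean, median

# Hypothetical lookup of averaging methods for the cluster p-value.
P_AVERAGERS = {"mean": mean, "median": median, "max": max}


def feature_signal(class_means):
    """Signal of one feature, derived from the class means of its test."""
    if len(class_means) == 1:
        return abs(class_means[0])  # 1-class test: absolute class mean
    # 2-class and multi-class tests: largest pairwise absolute difference
    return max(abs(a - b) for a, b in combinations(class_means, 2))


def summarize_cluster(members, p_values, signals, p_method="mean"):
    """Return (average p-value, average signal) for one cluster."""
    cluster_p = P_AVERAGERS[p_method]([p_values[i] for i in members])
    cluster_signal = mean([signals[i] for i in members])
    return cluster_p, cluster_signal


p_values = [0.01, 0.03, 0.20]
signals = [feature_signal(m) for m in ([0.2, 0.5], [0.3, 0.6], [0.4, 0.5])]
print(summarize_cluster([0, 1, 2], p_values, signals))
print(summarize_cluster([0, 1, 2], p_values, signals, p_method="max"))
```

With two classes the pairwise maximum reduces to the absolute difference of the two class means, so one function covers all three cases listed above.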

3. Step

Compute a permutation p-value to estimate the stability of a cluster.

Permutation data from the previous underlying class test is used.
In the class test, define a permutation file.
During computation of the statistic, the p-values of every individual permutation are stored in this file for each feature.

bumphunter can reuse this file.
For each permutation, compute the mean p-value of the permuted features in each cluster.
The ratio (number of permutations with a better mean p-value) / (number of permutations) gives p-perm.

This p-value is only calculated if a permutation file is defined in the parameter dialog.
Otherwise p-perm is always 0.
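The permutation p-value can be illustrated as follows. This is a sketch under assumptions, not SUMO's code: the function name permutation_p is hypothetical, the permutation data is modelled as a list with one row of per-feature p-values per permutation, and "better" is taken to mean a strictly smaller mean p-value than the observed one.

```python
from statistics import mean


def permutation_p(members, observed_p, perm_p_values):
    """Fraction of permutations whose cluster mean p-value beats observed_p.

    perm_p_values: one list of per-feature p-values per permutation
    (assumed layout). Returns 0.0 if no permutation data is available.
    """
    if not perm_p_values:
        return 0.0
    better = 0
    for permutation in perm_p_values:
        permuted_mean = mean([permutation[i] for i in members])
        if permuted_mean < observed_p:  # assumption: strictly smaller is "better"
            better += 1
    return better / len(perm_p_values)


perm_p_values = [
    [0.50, 0.70],
    [0.01, 0.01],
    [0.30, 0.10],
    [0.04, 0.02],
]
print(permutation_p([0, 1], 0.02, perm_p_values))  # 0.25
print(permutation_p([0, 1], 0.02, []))             # 0.0
```

A small p-perm means that random class assignments rarely produce a cluster as significant as the observed one.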

4. Step

Filter clusters by:

Size of cluster (number of features in cluster)
Significance
Signal

The remaining clusters are the bumps we are looking for.
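As an illustration, the filter can be written as a single pass over cluster records. The record layout (dicts with "size", "p" and "signal"), the function name filter_clusters, the default thresholds and the inclusive comparisons are all assumptions made for this sketch.

```python
def filter_clusters(clusters, min_size=2, max_p=0.05, min_signal=0.2):
    """Keep clusters that are large, significant and regulated enough."""
    return [
        c for c in clusters
        if c["size"] >= min_size      # size of cluster
        and c["p"] <= max_p           # significance
        and c["signal"] >= min_signal # signal
    ]


clusters = [
    {"id": "A", "size": 5, "p": 0.01, "signal": 0.35},
    {"id": "B", "size": 1, "p": 0.001, "signal": 0.50},  # too small
    {"id": "C", "size": 8, "p": 0.20, "signal": 0.40},   # not significant
    {"id": "D", "size": 3, "p": 0.03, "signal": 0.05},   # weak signal
]
bumps = filter_clusters(clusters)
print([c["id"] for c in bumps])  # ['A']
```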

Bumphunter with SUMO

Select a previously performed statistical test from the analysis tree:

1-class t-/Mann-Whitney test
Use class mean as "signal"
2-class t-/Mann-Whitney test
Use abs(Class1-Mean - Class2-Mean) as "signal"
Multi-class ANOVA/Kruskal-Wallis test
Use maximum(abs(Class_i-Mean - Class_j-Mean)) over all classes i ≠ j as "signal"

SUMO will extract the respective data from the selected analysis and use them in the following steps.

Select Main menu | Utilities | Methylation | bumphunter:

A dialog opens, allowing you to set the parameters for bumphunter:

Chr.xxxx Position Column-ID	For cluster building, bumphunter needs the genomic position of the individual oligos. Define the feature annotation column which contains this parameter. bumphunter expects the coordinates in the format "chromosome.baseposition" (e.g. "chr1.00123456", "01.00123456" or "001.00123456-00123490"; in all cases bumphunter would extract "00123456" as the feature position; the chromosome ID can be neglected for cluster building). You might use the Main menu | Adjust | Annotation function to create such a parameter. bumphunter expects the data to be sorted in increasing order. If not yet done, sort the matrix now (Main menu | Adjust | Annotation | Sort). You might also sort the coordinates on the fly for this particular analysis: define "001.00123456:sort".
MaxGap	The maximum distance between two neighbouring oligos contained in the same cluster. A small MaxGap will generate fewer but denser, smaller clusters. A larger MaxGap will generate more, larger and widespread clusters. To see the effect of MaxGap, specify Action: "clusterscan".
p-Averaging method	Define how to compute the average p-value for a cluster. Median: a tolerant average, ignoring up to 49% of high p-value features (i.e. higher than the "average"). Max: a very strict "average"; the worst (largest) p-value is returned.
Regulation Averaging method	Define how to compute the average regulation for a cluster. For log2 values or other signed data (e.g. M-values) you should NOT use the geometric mean.
Permutation file name	A permutation data file created previously with SUMO. !!! NB: the permutation values MUST have the same (sorted) order of features as the data used for bumphunter !!!
Action	What to do. bumphunter: find significantly regulated clusters as described above, applying the above parameters. clusterscan: run cluster building with different MaxGap values and show the cluster size distribution. Define "clusterscan=Start_MaxGap:End_MaxGap:StepSize", e.g. "clusterscan=50:1000:50". This will run 20 cycles of cluster building using MaxGap=50,100,150,...,900,950,1000.
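To make the coordinate format concrete, the following sketch extracts the feature position from a "chromosome.baseposition" string and checks the sort order. The function names feature_position and is_increasing are hypothetical and the parsing rule is inferred from the examples in the table above; SUMO's own parser may accept further variants.

```python
def feature_position(coordinate):
    """Extract the base position from a 'chromosome.baseposition' string."""
    _chromosome, separator, rest = coordinate.partition(".")
    if not separator:
        raise ValueError(f"no '.' separator in {coordinate!r}")
    # A range such as '00123456-00123490' contributes only its start.
    return int(rest.split("-", 1)[0])


def is_increasing(positions):
    """True if the positions are already sorted in increasing order."""
    return all(a <= b for a, b in zip(positions, positions[1:]))


for text in ("chr1.00123456", "01.00123456", "001.00123456-00123490"):
    print(text, "->", feature_position(text))  # 123456 in all three cases

print(is_increasing([100, 130, 170]))  # True
print(is_increasing([100, 170, 130]))  # False
```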

Cluster scan

A tool to view and analyze cluster size distribution depending on MaxGap parameter.

ClusterScan runs several rounds of cluster building with varying MaxGap size and generates histograms of the cluster size distribution.

Define e.g. "clusterscan=50:1000:50"

ClusterScan will run 20 cycles of cluster building using MaxGap=50,100,150,...,900,950,1000

For each round a histogram is built.

A final graph with all 20 histograms is shown:

Convert the histogram to a Cumulative Distribution plot:
Histogram viewer | Edit | Transform data | Cumulative
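The scan and the cumulative transform can be sketched as below. Everything here is an illustrative assumption: the function names are hypothetical, the histogram is a simple count of clusters per cluster size, and the cumulative form is computed as running counts (SUMO's viewer may normalize differently).

```python
from collections import Counter
from itertools import accumulate


def parse_clusterscan(spec):
    """'clusterscan=50:1000:50' -> [50, 100, ..., 1000]"""
    _, _, values = spec.partition("=")
    start, end, step = (int(v) for v in values.split(":"))
    return list(range(start, end + 1, step))


def cluster_sizes(positions, max_gap):
    """Sizes of the clusters found in sorted positions for one MaxGap."""
    sizes, size, previous = [], 0, None
    for pos in positions:
        if previous is not None and pos - previous > max_gap:
            sizes.append(size)
            size = 0
        size += 1
        previous = pos
    if size:
        sizes.append(size)
    return sizes


def cumulative(histogram):
    """Running total of cluster counts, ordered by cluster size."""
    sizes = sorted(histogram)
    return dict(zip(sizes, accumulate(histogram[s] for s in sizes)))


positions = [100, 130, 170, 400, 420, 900]
for max_gap in parse_clusterscan("clusterscan=50:300:250"):
    histogram = Counter(cluster_sizes(positions, max_gap))
    print(max_gap, dict(histogram), cumulative(histogram))
```

Each loop iteration corresponds to one round of the scan; plotting the per-round histograms side by side shows how cluster sizes grow with MaxGap.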

Bumphunter

Find clusters of significantly regulated features.

A few diagnostic plots are displayed, as well as a filtering dialog:

Close the filter dialog (Cancel button) and review the diagnostic plots.

Volcano plot

The Volcano plot shows for each detected cluster:

x-axis: "signal" (e.g. Beta difference of the two classes from a 2-class t-test)
y-axis: p-value, i.e. significance of regulation
Color: cluster size

In the example, there are no big clusters (red markers, cluster size n>50).
Larger clusters (green, 20..49 members) show low significance (p-value >0.01) and weak regulation (<0.2).
Significant clusters are very small (black, violet, n<5).

3-D Volcano plot

In the 3-D Volcano plot all three parameters are shown simultaneously:

x-axis: significance (p-value)
y-axis: regulation
z-axis: cluster size

You may interactively scale and rotate the graph to analyze data distribution.

p-distribution scatterplot

The scatter plot compares the p-values of each cluster generated by:

y-axis: direct averaging of cluster members
x-axis: randomisation test, i.e. the probability of a cluster showing higher significance when members (patients) are randomized in the underlying class test.

Filter cluster

Based on the diagnostic plots you now may want to filter your data.

Just rerun bumphunter.

Set the filter parameters accordingly.

To let all clusters pass the filter, set:

Minimal bump size => 2
p-values => 1
Regulation => 1

Click the OK button to apply the filter.

A dialog opens, showing the filter results:

A total of 4896 bumps/clusters were found and analyzed (depending on MaxGap, here: 250 bp)
All bumps passed the size filter (here: N=2)
4418 bumps were rejected because their average p-value was >0.05
Of the remaining 478 bumps, 276 were rejected due to differential regulation <0.2 in the 2-class test
All remaining 203 bumps had a permutation p-value <0.05

In case you get too many / too few bumps, click the No button and rerun the filter.

If the filter settings are acceptable, click the OK button.

A few more diagnostic plots are generated:

More importantly, a gene list is generated in the analysis tree.

From there you may view a submatrix / heatmap of all features building the clusters.

Additionally, a new heatmap is shown:

all filtered bumps with all their underlying features
only those conditions used in the underlying statistical test
full feature/condition annotations

Additional feature annotation columns:

Cluster-ID	(Random) unique ID for each cluster
Original-Row	Row number of the respective feature in the original data matrix
p-Cluster	Mean p-value for the cluster as computed from the cluster members. Obviously, the mean p-value is identical for all members of the same cluster.
Signal	Mean "signal" (e.g. class mean difference) for the cluster as computed from the cluster members. Obviously, it is likewise identical for all members of the same cluster.
p-Perm	p-value from the permutation test. Always "0" if no permutation test was performed.