Bumphunter

"Instead of looking for association between a single genomic location and a phenotype of interest, bumphunter looks for genomic regions that are differential in/between predefined sample (patient groups). "
The method may be applied to Beta, M or CNV values.

SUMO's basic bumphunter function is implemented according to the algorithm sketched in the minfi tutorial.

1. Step

bumphunter identifies clusters of features located in a local neighbourhood.
Between each pair of directly neighbouring cluster members, the distance (in base pairs) must not exceed a user-defined distance (MaxGap, e.g. 50 bp).
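
The cluster-building rule can be illustrated with a short Python sketch. It is not SUMO's actual code: the function name build_clusters is hypothetical, positions are assumed to be sorted and to lie on a single chromosome, and isolated features are kept as clusters of size 1 (SUMO may treat them differently).

```python
from typing import List


def build_clusters(positions: List[int], max_gap: int) -> List[List[int]]:
    """Group sorted positions into clusters of feature indices.

    A new cluster starts whenever the distance to the previous
    feature exceeds max_gap (in base pairs).
    """
    clusters: List[List[int]] = []
    current: List[int] = []
    for idx, pos in enumerate(positions):
        # Compare with the last member of the cluster being built.
        if current and pos - positions[current[-1]] > max_gap:
            clusters.append(current)
            current = []
        current.append(idx)
    if current:
        clusters.append(current)
    return clusters


# Hypothetical feature positions (bp), already sorted.
positions = [100, 130, 170, 400, 420, 900]
clusters = build_clusters(positions, max_gap=50)
print(clusters)  # [[0, 1, 2], [3, 4], [5]]
```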


2. Step

Compute the average significance of the cluster by averaging the p-values of all cluster members.
The p-values are taken from a previously performed statistical test (t-test, ANOVA, Mann-Whitney or Kruskal-Wallis test).

Compute the average "signal" of the cluster by averaging the "signal" of all cluster members.
The "signal" is the absolute difference of the class means from the underlying statistical test.
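
As a minimal sketch of this step, the following Python snippet averages p-values and signals over the members of one cluster. The function name and the data are made up for illustration. It also assumes a two-class test and that the absolute difference is taken per feature before averaging, which the text does not state explicitly.

```python
from statistics import mean


def summarize_cluster(p_values, class_mean_a, class_mean_b, members):
    """Return (average p-value, average signal) for one cluster.

    members holds the feature indices belonging to the cluster.
    Assumption: signal = |mean(class A) - mean(class B)| per feature.
    """
    avg_p = mean(p_values[i] for i in members)
    avg_signal = mean(abs(class_mean_a[i] - class_mean_b[i]) for i in members)
    return avg_p, avg_signal


# Hypothetical per-feature results of a two-class test.
p_values = [0.01, 0.03, 0.02, 0.50]
class_mean_a = [0.8, 0.7, 0.9, 0.5]
class_mean_b = [0.6, 0.4, 0.5, 0.5]

avg_p, avg_signal = summarize_cluster(p_values, class_mean_a, class_mean_b, [0, 1, 2])
print(round(avg_p, 4), round(avg_signal, 4))
```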

3. Step

Compute a permutation p-value to estimate the stability of a cluster.

Permutation data from the previously performed underlying class test is used.
In the class test, define a permutation file.
During computation of the statistic, the p-values of each individual permutation are stored in this file for each feature.

bumphunter can reuse this file.
For each permutation, compute the mean p-value of the permuted features in each cluster.
The ratio (count of better permutations) / (number of permutations) gives the permutation p-value (p-perm).

This p-value is only calculated if a permutation file is defined in the parameter dialog.
Otherwise p-perm is always 0.
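
The permutation p-value might be computed along the following lines. This is a sketch under assumptions, not SUMO's implementation: the function name is hypothetical, the permutation data is assumed to be available as one list of per-feature p-values per permutation, and a permutation is counted as "better" when its mean cluster p-value is less than or equal to the observed one.

```python
from statistics import mean


def permutation_p(observed_p, perm_p, members):
    """Permutation p-value of one cluster.

    observed_p : p-values of the real test, one per feature
    perm_p     : perm_p[k][i] is the p-value of feature i in permutation k
    members    : feature indices belonging to the cluster
    """
    observed = mean(observed_p[i] for i in members)
    # Assumption: "better" means the permuted mean p-value is <= the observed one.
    better = sum(
        1 for perm in perm_p
        if mean(perm[i] for i in members) <= observed
    )
    return better / len(perm_p)


# Hypothetical data: 3 features, 4 permutations.
observed_p = [0.01, 0.02, 0.03]
perm_p = [
    [0.50, 0.60, 0.70],
    [0.01, 0.01, 0.01],
    [0.20, 0.30, 0.10],
    [0.90, 0.80, 0.70],
]
p_perm = permutation_p(observed_p, perm_p, [0, 1, 2])
print(p_perm)  # 0.25
```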





4. Step

Filter clusters by:
The remaining clusters are the bumps we are looking for.
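
A filter step of this kind could look like the sketch below. The criteria (cluster size, average p-value, average signal, p-perm), the field names and all threshold values are assumptions chosen for illustration. They are not SUMO's documented filter settings.

```python
def filter_clusters(clusters, min_size=2, max_p=0.05, min_signal=0.1, max_p_perm=1.0):
    """Keep only clusters that satisfy all (hypothetical) thresholds."""
    return [
        c for c in clusters
        if c["size"] >= min_size
        and c["p"] <= max_p
        and c["signal"] >= min_signal
        and c["p_perm"] <= max_p_perm
    ]


# Hypothetical cluster summaries.
clusters = [
    {"id": 1, "size": 4, "p": 0.001, "signal": 0.35, "p_perm": 0.00},
    {"id": 2, "size": 25, "p": 0.200, "signal": 0.05, "p_perm": 0.40},
    {"id": 3, "size": 1, "p": 0.010, "signal": 0.50, "p_perm": 0.01},
]
bumps = filter_clusters(clusters)
print([c["id"] for c in bumps])  # [1]
```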






Bumphunter with SUMO

Select a previously performed statistical test from the analysis tree:
SUMO will extract the respective data from the selected analysis and use them in the following steps.

Select Main menu | Utilities | Methylation | bumphunter:


A dialog opens, allowing you to set the parameters for bumphunter:


Chr.xxxx Position Column-ID: For cluster building, bumphunter needs the genomic position of the individual oligos.
Define the feature annotation column which contains this parameter.
bumphunter expects the coordinates in the format
"chromosome.baseposition"
(e.g. "chr1.00123456", "01.00123456" or "001.00123456-00123490";
in all cases bumphunter would extract "00123456" as the feature position,
the chromosome ID can be neglected for cluster building).
You might use the Main menu | Adjust | Annotation function to create such a parameter.
bumphunter expects the data to be sorted in increasing order.
If not yet done, sort the matrix now (Main menu | Adjust | Annotation | Sort).
You might also sort the coordinates on the fly for this particular analysis:
Define "001.00123456:sort".
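
To make the coordinate format concrete, here is a small Python sketch that extracts the feature position from such strings. The regular expression and the function name are assumptions for illustration. SUMO's own parser may accept other variants.

```python
import re

# Everything up to the first "." is the chromosome part,
# the following run of digits is the base position.
_COORD = re.compile(r"^[^.]+\.(\d+)")


def feature_position(coord: str) -> int:
    """Extract the base position from a 'chromosome.baseposition' string."""
    match = _COORD.match(coord)
    if match is None:
        raise ValueError(f"unexpected coordinate format: {coord!r}")
    return int(match.group(1))


examples = ["chr1.00123456", "01.00123456", "001.00123456-00123490"]
positions = [feature_position(c) for c in examples]
print(positions)  # [123456, 123456, 123456]
```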

MaxGap: The maximum distance between two neighbouring oligos
that are contained in the same cluster.
A small MaxGap will generate fewer but denser, smaller clusters.
A larger MaxGap will generate more, larger and more widespread clusters.
To see the effect of MaxGap, specify Action: "clusterscan".

p-Averaging method: Define how to compute the average p-value for a cluster.
  • Median: a tolerant average.
    Ignores up to 49% of features with high p-values (i.e. higher than the "average").
  • Max: a very strict "average".
    The worst (largest) p-value is returned.
Regulation Averaging method: Define how to compute the average regulation for a cluster.
For log2 values or other signed data (e.g. M-values) you should NOT use the geometric mean.
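
The difference between the averaging methods can be seen in a few lines of Python using the standard statistics module. The values are made up. The last part shows why a geometric mean is unsuitable for signed data: it is only defined for positive numbers, so Python's geometric_mean rejects negative input.

```python
from statistics import StatisticsError, geometric_mean, median

# Hypothetical p-values of one cluster: three good features, two poor ones.
p_values = [0.001, 0.002, 0.004, 0.3, 0.9]

p_median = median(p_values)  # tolerant: the two high p-values are ignored
p_max = max(p_values)        # strict: the worst p-value decides
print(p_median, p_max)       # 0.004 0.9

# Hypothetical signed regulation values (e.g. differences of M-values).
m_values = [-1.2, 0.4, 0.9]
try:
    geometric_mean(m_values)
    geometric_failed = False
except StatisticsError:
    # The geometric mean is undefined for negative numbers.
    geometric_failed = True
print(geometric_failed)      # True
```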

Permutation file name: A permutation data file created previously with SUMO.
!!! NB: the permutation values MUST have the same (sorted) order of features
as the data used for bumphunter !!!

Action: What to do:
  • bumphunter: find significantly regulated clusters as described above,
    applying the above parameters.
  • clusterscan: run cluster building with different MaxGap values and show
    the cluster size distribution.
    Define "clusterscan=Start_MaxGap:End_MaxGap:StepSize",
    e.g. "clusterscan=50:1000:50".
    This will run 20 cycles of cluster building using MaxGap=50,100,150,...,900,950,1000.
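
How a clusterscan definition expands into the list of MaxGap values can be sketched as follows. The parsing function is hypothetical and only mirrors the "Start:End:Step" syntax described above, with the end value included.

```python
def parse_clusterscan(spec: str) -> list:
    """Expand 'clusterscan=Start:End:Step' into the list of MaxGap values."""
    key, _, value = spec.partition("=")
    if key.strip().lower() != "clusterscan":
        raise ValueError(f"not a clusterscan definition: {spec!r}")
    start, end, step = (int(part) for part in value.split(":"))
    if step <= 0 or end < start:
        raise ValueError(f"invalid range in: {spec!r}")
    # The end value is included in the scan.
    return list(range(start, end + 1, step))


max_gaps = parse_clusterscan("clusterscan=50:1000:50")
print(len(max_gaps), max_gaps[:3], max_gaps[-3:])
# 20 [50, 100, 150] [900, 950, 1000]
```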





Cluster scan

A tool to view and analyze cluster size distribution depending on MaxGap parameter.

ClusterScan runs several rounds of cluster building with varying MaxGap size and generates histograms of the cluster size distribution.

Define e.g. "clusterscan=50:1000:50"

ClusterScan will run 20 cycles of cluster building using MaxGap=50,100,150,...,900,950,1000.

For each round a histogram is built.
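
The idea of the scan can be sketched in Python: for each MaxGap value the cluster sizes are counted into a histogram. The function names and positions are invented for illustration, positions are assumed to be sorted and on one chromosome, and single features are counted as clusters of size 1, which SUMO may handle differently.

```python
from collections import Counter


def cluster_sizes(positions, max_gap):
    """Sizes of the clusters obtained from sorted positions for one MaxGap."""
    sizes = []
    size = 0
    previous = None
    for pos in positions:
        # A gap larger than max_gap closes the current cluster.
        if previous is not None and pos - previous > max_gap:
            sizes.append(size)
            size = 0
        size += 1
        previous = pos
    if size:
        sizes.append(size)
    return sizes


def cluster_scan(positions, start, end, step):
    """Map each MaxGap value to a histogram {cluster size: number of clusters}."""
    return {
        gap: Counter(cluster_sizes(positions, gap))
        for gap in range(start, end + 1, step)
    }


positions = [100, 130, 170, 400, 420, 900]
scan = cluster_scan(positions, start=50, end=250, step=100)
for gap, histogram in scan.items():
    print(gap, sorted(histogram.items()))
```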

A final graph with all 20 histograms is shown:


Convert the histogram to a Cumulative Distribution plot:
Histogram viewer | Edit | Transform data | Cumulative
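
What the cumulative transformation does to a histogram can be shown with a brief sketch. The histogram values are invented, and normalizing to fractions of the total is an assumption. The viewer may display cumulative counts instead.

```python
from itertools import accumulate


def cumulative(histogram):
    """Turn {cluster size: count} into [(size, cumulative fraction), ...]."""
    sizes = sorted(histogram)
    total = sum(histogram.values())
    running = accumulate(histogram[size] for size in sizes)
    return [(size, count / total) for size, count in zip(sizes, running)]


# Hypothetical cluster size histogram for one MaxGap value.
histogram = {1: 50, 2: 30, 5: 15, 20: 5}
cdf = cumulative(histogram)
print(cdf)  # [(1, 0.5), (2, 0.8), (5, 0.95), (20, 1.0)]
```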






Bumphunter

Find clusters of significantly regulated features.

A few diagnostic plots are displayed, as well as a filtering dialog:


Close the filter dialog (Cancel button) and review the diagnostic plots.



Volcano plot

The Volcano plot shows for each detected cluster:


In the example, there are no big clusters (red marker, cluster size n>50).
Larger clusters (green, 20..49 members) show low significance (p-value > 0.01) and weak regulation (<0.2).
Significant clusters are very small (black, violet, n<5).





3-D Volcano plot

In the 3-D Volcano plot all three parameters are shown simultaneously:


You may interactively scale and rotate the graph to analyze data distribution.





p-distribution scatterplot

The scatter plot compares p-values for each cluster generated by





Filter clusters

Based on the diagnostic plots you now may want to filter your data.

Just rerun bumphunter.

Set filter parameters:


Set filter parameters accordingly.

To let all clusters pass the filter, set:

Click the OK button to apply the filter.

A dialog opens up, showing filter results:



In case you get too many or too few bumps, click the No button and rerun the filter.

If the filter settings are acceptable, click the OK button.

A few more diagnostic plots are generated:


More important: a genelist is generated in the analysis tree.


From there you may view a submatrix / heatmap of all features that make up the clusters.

Additionally, a new heatmap is shown: