Data views

To get an impression of your data and to analyze and diagnose them, it may be helpful to use different graphical representations of the data.
These graphs may help you explore whether your data contain systematic biases or distortions and thus need additional adjustment or normalisation, or whether they can be used for statistical analysis as they are.

SUMO can generate:

- Heat maps: visualise intensities / ratios of the expression matrix (the famous red-green heat maps)
- Box plots: summarize the global intensity / ratio distribution in all hybridisations
- Scatter plots: analyze pairwise signal distributions of hybridisations (see more details about the scatterplot viewer)
- Scatter plot mosaics: get an overview of signal distributions in all your sample data
- Histograms: show the signal distribution in any or all loaded samples
- Line graphs / profiles: illustrate e.g. gene / sample data profiles
- Correlation maps: illustrate inter-sample similarity
- Population maps
- Dot charts
- Deviation plots
- RLE plots

Heat maps

Heat maps are a direct graphical representation of your expression matrix.
The numerical value of each data cell is translated into a colour in a 2D image.
Traditionally, numerical values

> 0 are translated into shades of red,
< 0 are translated into shades of green,
large absolute values are translated into bright colours,
values close to zero are translated into dark colours.
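
As a rough illustration of the colour translation described above, the following Python sketch maps a signed value to an RGB triple. The function name `value_to_rgb`, the saturation parameter `max_abs` and the linear brightness ramp are assumptions made for this example; SUMO's own colour scaling is not specified here and may differ.

```python
def value_to_rgb(value, max_abs=3.0):
    """Map a signed value to a red-green heat map colour (R, G, B), 0-255.

    Assumption: brightness grows linearly with |value| and saturates at
    max_abs; zero maps to black.
    """
    if max_abs <= 0:
        raise ValueError("max_abs must be positive")
    # Clamp the magnitude to [0, max_abs] and scale it to 0..255.
    level = round(min(abs(value), max_abs) / max_abs * 255)
    if value > 0:
        return (level, 0, 0)   # positive: shades of red
    if value < 0:
        return (0, level, 0)   # negative: shades of green
    return (0, 0, 0)           # zero: black


if __name__ == "__main__":
    for v in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(v, value_to_rgb(v))
```

Values beyond `max_abs` all receive the brightest colour, so in this sketch the choice of `max_abs` decides how much contrast remains for small values.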

SUMO offers various interactive tools to zoom and scale the heat map view, change colours, genes or condition annotations, search in gene and condition annotations or sort the data.

For more details, see the heat map viewer.

Box plots

Box-whisker plots summarize the signal distributions in all hybridisations.

For each hybridisation, a single box plot shows:

The box represents all signals between the 25th and 75th percentiles of signal strength
The whisker (= the line) represents all signals between the 5th and 95th percentiles of signal strength
The blue line represents the median (50% intensity value)
The red line represents the mean intensity
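
The quantities listed above can be sketched in a few lines of Python. The helper names `percentile` and `box_plot_summary` are hypothetical, and linear interpolation between neighbouring values is an assumption; SUMO may compute its percentiles differently.

```python
from statistics import mean


def percentile(sorted_values, p):
    """Return the p-th percentile (0-100) using linear interpolation."""
    if not sorted_values:
        raise ValueError("no data")
    pos = (len(sorted_values) - 1) * p / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def box_plot_summary(signals):
    """Summarize the signals of one hybridisation for a box-whisker plot."""
    data = sorted(signals)
    return {
        "whisker_low": percentile(data, 5),    # lower end of the whisker
        "box_low": percentile(data, 25),       # lower edge of the box
        "median": percentile(data, 50),        # blue line
        "box_high": percentile(data, 75),      # upper edge of the box
        "whisker_high": percentile(data, 95),  # upper end of the whisker
        "mean": mean(data),                    # red line
    }


if __name__ == "__main__":
    print(box_plot_summary([1, 2, 3, 4, 100]))
```

For skewed data such as the example list, the mean lies well above the median, which is why showing both lines in the plot is informative.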

Use the cursor left/right keys to squeeze / expand the width of the plot, or click the toolbar buttons to scale or size the data or to switch between linear / logarithmic scaling.

Scatter-plots

Scatter plots display the intensity or ratio signals of two hybridisations pairwise.

The hybridisation selector opens.

Select the two conditions (= hybridisations) and the gene annotation to be shown in the scatter plot.

Click one of the four scatter plot buttons:

X-Y	Signals from the selected hybridisations are displayed
R / I	Ratio of the selected hybridisations on the Y-axis, product of the hybridisations on the X-axis
Q / Q	Ranked (= intensity-sorted) signals from the selected hybridisations are displayed
3D	Signals from up to six selected hybridisations may be displayed in a 3D scatterplot
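
To make the R / I and Q / Q transformations in the table above concrete, here is a minimal Python sketch. The function names `ri_points` and `qq_points` are hypothetical. The use of log2 for both axes, the skipping of non-positive signals, and sorting each hybridisation independently for the Q / Q plot are assumptions for illustration, not a description of SUMO's internals.

```python
import math


def ri_points(hyb_a, hyb_b):
    """Return (x, y) pairs: x = log2 of the product, y = log2 of the ratio.

    Assumption: signals are positive intensities; pairs containing a
    non-positive value are skipped because their logarithm is undefined.
    """
    points = []
    for a, b in zip(hyb_a, hyb_b):
        if a > 0 and b > 0:
            points.append((math.log2(a * b), math.log2(a / b)))
    return points


def qq_points(hyb_a, hyb_b):
    """Return (x, y) pairs of rank-matched signals.

    Each hybridisation is sorted independently, so the i-th smallest
    signal of one is paired with the i-th smallest signal of the other.
    """
    return list(zip(sorted(hyb_a), sorted(hyb_b)))


if __name__ == "__main__":
    a = [8.0, 100.0, 0.0, 16.0]
    b = [2.0, 100.0, 5.0, 64.0]
    print(ri_points(a, b))
    print(qq_points(a, b))
```

On a log scale the product of two intensities is twice their average log intensity, so an X-axis based on the product and one based on the average intensity differ only by a constant factor.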

X-Y Scatter plots

See more details about the Scatterplot viewer

R-I Scatter plots

See more details about the Scatterplot viewer

Quantile scatter plots

See more details about the Scatterplot viewer

3-D scatter plots

See more details about the 3D-Scatterplot viewer

Scatterplot mosaic

Scatterplot mosaics may be used to get an overview of signal distributions in all your samples.
For this purpose, all possible pairwise dot plots are generated. Small views of the individual dot plots are stitched into a mosaic image showing a square matrix of dot plots. Depending on the number of selected samples, generating the graph may take some time, and the resulting graph may become very large.
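
The growth of the mosaic with the number of samples can be estimated with a short Python sketch. The function `mosaic_layout` and the panel size `panel_px` are hypothetical and not SUMO settings; the sketch also assumes that every cell of the square matrix, including the diagonal, holds one small plot.

```python
from itertools import product


def mosaic_layout(sample_names, panel_px=100):
    """Return the list of panels and the image edge length in pixels.

    panel_px (edge length of one small plot) is a hypothetical value.
    """
    n = len(sample_names)
    # One panel per (row sample, column sample) combination: n * n panels.
    panels = list(product(sample_names, repeat=2))
    edge_px = n * panel_px
    return panels, edge_px


if __name__ == "__main__":
    for n in (4, 16, 64):
        names = ["S%d" % i for i in range(1, n + 1)]
        panels, edge = mosaic_layout(names)
        print(n, "samples:", len(panels), "panels,", edge, "x", edge, "pixels")
```

Because the number of panels grows with the square of the number of samples, doubling the selection roughly quadruples both the computing effort and the image area.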

Scatterplot mosaics may be generated as:
- X-Y scatterplots.
- RI = ratio-intensity plots. Here, log-ratios are shown on the y-axis and average intensity on the x-axis.
- Quantile plots. Here, data are ranked by intensity.

See a full resolution dot plot mosaic image.

Histograms

Histograms may be used to show the signal distribution in any or all of your loaded samples.
Data may be shown in linear or log scaling. Scaling and perspective view may be freely adjusted.
Histogram data can be extracted as tab delimited data and used in other applications.
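
The following Python sketch shows one way to bin signals in linear or log scaling and to write the counts as tab-delimited text. The names `histogram` and `to_tab_delimited`, the equal-width bins, the log2 transform and the column headers are assumptions for this example and do not describe SUMO's export format.

```python
import math


def histogram(signals, bins=10, log_scale=False):
    """Count signals per bin; return a list of (bin_start, bin_end, count).

    With log_scale=True the values are log2-transformed first, so
    non-positive signals are dropped (an assumption of this sketch).
    """
    if log_scale:
        values = [math.log2(v) for v in signals if v > 0]
    else:
        values = list(signals)
    if not values or bins < 1:
        return []
    low, high = min(values), max(values)
    width = (high - low) / bins or 1.0   # avoid zero width for constant data
    counts = [0] * bins
    for v in values:
        # The maximum value falls into the last bin instead of a new one.
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return [(low + i * width, low + (i + 1) * width, c)
            for i, c in enumerate(counts)]


def to_tab_delimited(rows):
    """Format histogram rows as tab-delimited text with a header line."""
    lines = ["bin_start\tbin_end\tcount"]
    lines += ["%g\t%g\t%d" % row for row in rows]
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_tab_delimited(histogram([1, 2, 4, 8, 16, 1024], bins=5, log_scale=True)))
```

With intensity data spanning several orders of magnitude, the log-scaled version usually spreads the counts over more bins than the linear one, where most signals end up in the first bin.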

Correlation maps

Correlation maps are a simple tool to visualize and explore global similarity between the individual hybridisations.
SUMO computes all pairwise correlations between all hybridisations and shows them in the heat map viewer.

Sorting rows and columns (clustering with Euclidean distance / average linkage) shows coarse structure in the data:
- two major sample groups (V and IBC)
- four IBC samples (8, 4, 19, 13) look like the V group

Correlation maps may be computed using different similarity metrics:

Pearson correlation
(red indicates similarity, r ~ 1; green indicates dissimilarity, r ~ -1)
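
A correlation map based on Pearson correlation can be sketched in plain Python as follows. The function names `pearson` and `correlation_map` and the list-of-lists data layout are assumptions for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0:
        return float("nan")   # undefined for constant data
    return sum(a * b for a, b in zip(dx, dy)) / norm


def correlation_map(hybridisations):
    """Return the symmetric matrix of all pairwise correlations.

    hybridisations: list of signal lists, one list per hybridisation.
    """
    return [[pearson(a, b) for b in hybridisations] for a in hybridisations]


if __name__ == "__main__":
    data = [[1, 2, 3, 4], [2, 4, 6, 9], [4, 3, 2, 1]]
    for row in correlation_map(data):
        print("\t".join("%.3f" % r for r in row))
```

The resulting matrix has one row and one column per hybridisation and a diagonal of 1, so it can be displayed like any other heat map.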

Deviation plot

The deviation plot visualizes the median as well as the 5% / 95% and 25% / 75% percentiles for each individual feature (e.g. gene) in the dataset.
Features are ranked by median signal, from lowest (left) to highest value (right).

Displaying percentiles makes asymmetric data distributions in the features visible.
To display deviation plots for conditions (samples), just transpose the matrix (Main menu | Adjust data | Transpose matrix).

Obviously, the low variability of low-intensity features (~100 cts) cannot be visualized on the same scale as the much higher variability of high-intensity features (>10000 cts).

A deviation plot relative to the median may solve this issue.
Here, the deviations are normalized feature-wise to the respective median.
For medians close to zero, deviations may be misleading.
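
The data behind both variants of the deviation plot can be sketched in Python as shown here. The names `percentile` and `deviation_rows`, the interpolation method, and the choice to normalize by dividing each percentile by the feature's median are assumptions; features with a median of exactly zero are skipped in the normalized variant because the ratio is undefined.

```python
def percentile(sorted_values, p):
    """Return the p-th percentile (0-100) using linear interpolation."""
    pos = (len(sorted_values) - 1) * p / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


LEVELS = (("p5", 5), ("p25", 25), ("median", 50), ("p75", 75), ("p95", 95))


def deviation_rows(matrix, relative_to_median=False):
    """Compute per-feature percentiles, ranked by median (lowest first).

    matrix: list of rows, one non-empty row of signals per feature.
    """
    rows = []
    for signals in matrix:
        data = sorted(signals)
        rows.append({name: percentile(data, p) for name, p in LEVELS})
    rows.sort(key=lambda r: r["median"])
    if relative_to_median:
        # Divide every percentile by the feature's own median.
        rows = [{k: v / r["median"] for k, v in r.items()}
                for r in rows if r["median"] != 0]
    return rows


if __name__ == "__main__":
    features = [[9000, 10000, 12000], [90, 100, 120]]
    print(deviation_rows(features))
    print(deviation_rows(features, relative_to_median=True))
```

In the example, the two features differ by a factor of 100 in absolute spread but give identical rows after normalization, which is the effect the median-relative plot is meant to expose.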

The example shows the intensity deviation plot from Illumina gene expression arrays (~48000 features):

The graph indicates that ~30000 features measure signals at background level.

Zoom the intensity axes to see the low intensity features in detail:

Deviation versus Median normalized:

Deviation plot	Median-normalized deviation plot

Two subsets from the above dataset (Main menu | Utilities | Demo data | Intensities)

"V"-Samples	"IBC"-Samples
