Load data as a "single flat-file database" containing gene expression data as well as sample and gene annotations.
SUMO supports several kinds of data files and import methods:
- Tab-delimited text files for loading expression matrices
NB: Data cells must NOT contain internal tabs!
- Comma-delimited (CSV) text files for loading expression matrices
NB: Data cells must NOT contain internal commas!
- Paste a tab-delimited data set from the clipboard into SUMO
(Main menu | File | Paste matrix from clipboard)
- Amplification-data files exported from ABI's RQ-Manager software
- SUMO proprietary analysis file format
To load a data file, select Main menu | File | Open Data.
With this option you can load an expression matrix as a tab- or comma-delimited text file.
In the File-open dialog box, select the corresponding file type from the file-type drop-down list.
Such files are easily generated from microarray databases or exported from spreadsheet programs
(e.g. MS EXCEL | File | Save as | Tab delimited text).
Requirements for a tab-delimited, comma-delimited or EXCEL file
(select the file type in the Open dialog's file-type drop-down list):
- Each line contains a single feature (= gene).
- Each column contains a single condition (= sample).
- All lines and columns should have the same number of data cells.
- The number of columns is derived from the FIRST line in the file.
Additional data cells in later lines are ignored.
Missing data cells in later lines are interpreted as "0.00".
- Expression data must be numeric, in decimal or scientific notation,
e.g. "12.000" or "1.2E01".
A German "decimal comma" is automatically converted into a decimal point.
Non-numeric values (e.g. "14.May" or "nan") are interpreted as "nan" (= not a number).
Non-numeric values might be imputed (Main menu | Adjust data | Data imputation) or ...
For statistical analysis, "nan" values are interpreted as 0.00.
- Individual cells cannot contain internal tabs or commas, respectively.
- The first lines may contain multiple column headers
describing the conditions (Hyb ID, slide version, treatment, time point, ...).
- The first columns may contain multiple gene annotations (gene names, database
IDs, function, ...).
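The loading rules above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not SUMO's actual import code: the function names `read_rows`, `parse_value` and `value_for_statistics` are hypothetical, and missing cells are padded with "0.00" whether they fall in the annotation or the data region.

```python
import math
from typing import List


def read_rows(lines: List[str], sep: str = "\t") -> List[List[str]]:
    """Split lines into cells; the FIRST line fixes the number of columns."""
    rows = [line.rstrip("\r\n").split(sep) for line in lines]
    width = len(rows[0])
    fixed = []
    for row in rows:
        row = row[:width]                     # additional cells are ignored
        row += ["0.00"] * (width - len(row))  # missing cells are read as "0.00"
        fixed.append(row)
    return fixed


def parse_value(cell: str) -> float:
    """Convert one expression cell to float, accepting a German decimal comma."""
    text = cell.strip().replace(",", ".")
    try:
        return float(text)                    # "12.000", "1.2E01", "12,5"
    except ValueError:
        return math.nan                       # e.g. "14.May" becomes nan


def value_for_statistics(value: float) -> float:
    """Statistical analyses treat nan values as 0.00."""
    return 0.0 if math.isnan(value) else value


rows = read_rows(["Gene\tS1\tS2", "g1\t1,5", "g2\t2\t3\t99"])
print(rows)
```

The decimal-comma conversion is only safe for tab-delimited files, which is consistent with the rule that comma-delimited files must not contain internal commas.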
Alternatively, you may drag and drop tab-delimited text files into SUMO.
A file preview window (showing the first few hundred lines of the selected file) opens:
Double-click the upper-left data cell that contains expression data.
- All data to the right of and below this cell are used for analyses.
- All lines above it are used as hybridisation annotations.
- All columns to its left are used as gene annotations.
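A minimal sketch of how the double-clicked cell partitions the table, assuming the table has already been read into rows of strings and that `(r, c)` is the 0-based row and column of the selected cell. `split_at_anchor` is a hypothetical name, and the gene IDs in the example are made up. The corner block above and to the left of the cell usually holds header labels and is not returned here.

```python
from typing import List, Tuple

Table = List[List[str]]


def split_at_anchor(rows: Table, r: int, c: int) -> Tuple[Table, Table, Table]:
    """Partition a table at the upper-left expression cell (r, c), 0-based."""
    sample_annotations = [row[c:] for row in rows[:r]]  # lines above: hybridisation annotations
    gene_annotations = [row[:c] for row in rows[r:]]    # columns left: gene annotations
    data = [row[c:] for row in rows[r:]]                # right of / below: expression data
    return sample_annotations, gene_annotations, data


rows = [
    ["Hyb ID", "", "H1", "H2"],
    ["Gene", "ID", "S1", "S2"],
    ["IL6", "id1", "0.5", "0.7"],
    ["TNF", "id2", "1.5", "2.0"],
]
samples, genes, data = split_at_anchor(rows, 2, 2)
print(samples)  # [['H1', 'H2'], ['S1', 'S2']]
print(genes)    # [['IL6', 'id1'], ['TNF', 'id2']]
print(data)     # [['0.5', '0.7'], ['1.5', '2.0']]
```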
The size of the expression matrix is mainly limited by the computer's free RAM.
The file name and dimensions of the expression matrix are shown in the analysis tree.
Click the Data table node to preview the data table.
The data file is shown in a spreadsheet. For more details,
see the information about data tables.
SUMO analysis files
Complete analyses generated with SUMO may be saved, including expression data, backup data sets and the
statistical tests that have been performed (SAM analyses are not saved).
Use Main menu | Save analysis to save an analysis and, correspondingly,
Main menu | Load analysis to load a previously saved analysis.
Amplification data files
SUMO may be used to analyze RT-PCR data.
Data generated with ABI's RQ-Manager software (exported as amplification data files) may be imported into SUMO.
Main menu | File | Import | ABI rtPCR amplification data
Select one or more files.
A file preview window opens.
Make sure the correct data column (containing the CT values) is selected, then load the data files.
SUMO extracts CT values and RN values. The RN values are used as "comments"; they help to identify genes
with low signal levels that generate arbitrary CT values.
For very weak signals, the RQ-Manager software sometimes reports the CT value as "undetermined".
SUMO recognizes such missing values.
It is recommended to replace these values with a meaningful value (e.g. "40", the highest cycle number).
The simplest way is Main menu | Adjust data | Data imputation | Row wise | Constant:
select all samples and define "40" as the replacement value.
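The effect of the constant imputation can be sketched in Python. This only approximates SUMO's behaviour: it assumes "undetermined" (in any letter case) is read as nan and that every nan is then replaced by the chosen constant. `parse_ct` and `impute_constant` are hypothetical helper names.

```python
import math
from typing import List


def parse_ct(cell: str) -> float:
    """Read one CT cell; RQ-Manager's "undetermined" becomes a missing value (nan)."""
    if cell.strip().lower() == "undetermined":
        return math.nan
    return float(cell)


def impute_constant(matrix: List[List[float]], constant: float = 40.0) -> List[List[float]]:
    """Replace every missing CT value by a constant (e.g. the last cycle, 40)."""
    return [[constant if math.isnan(v) else v for v in row] for row in matrix]


ct = [[parse_ct(c) for c in row] for row in [["25.1", "Undetermined"], ["31.7", "33.0"]]]
print(impute_constant(ct))  # [[25.1, 40.0], [31.7, 33.0]]
```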
SUMO tries to detect multiplex samples.
If any are found, SUMO requests a name for the multiplex endogenous controls.
If such controls were used, enter the unique name of the controls (or a unique part of the name).
If not, cancel the dialogue.
Replicates, i.e. entries with the same Gene-ID and Sample-ID, are automatically averaged, even across multiple amplification data files.
If multiplex controls were named, SUMO performs the following steps:
- find all corresponding multiplex controls
- compute the grand mean CT of all multiplex controls
- for each multiplex control, compute difference = individual control CT - grand mean CT
- for each multiplex gene, compute adjusted CT = gene CT - difference; this corrects for each sample's original DNA amount according to its individual endogenous control
- remove all multiplex controls
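The replicate averaging and multiplex adjustment can be sketched as follows. Several details are assumptions, since the text does not specify them: CT values arrive as `(gene_id, sample_id, ct)` tuples, each sample contains exactly one multiplex control, a control is recognised by `control_tag` occurring in its gene ID, and genes are matched to their control via the sample ID. Samples without a control are left unchanged here. All function names and the "_mx" tag are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (gene_id, sample_id)


def average_replicates(records: List[Tuple[str, str, float]]) -> Dict[Key, float]:
    """Average all replicates sharing the same gene ID and sample ID."""
    groups: Dict[Key, List[float]] = defaultdict(list)
    for gene, sample, ct in records:
        groups[(gene, sample)].append(ct)
    return {key: mean(values) for key, values in groups.items()}


def multiplex_adjust(ct: Dict[Key, float], control_tag: str) -> Dict[Key, float]:
    """Correct each gene's CT by its sample's endogenous control, then drop controls."""
    controls = {key: v for key, v in ct.items() if control_tag in key[0]}
    if not controls:
        return dict(ct)
    grand_mean = mean(controls.values())
    # difference = individual control CT - grand mean CT, per sample
    difference = {sample: v - grand_mean for (_, sample), v in controls.items()}
    adjusted = {}
    for (gene, sample), v in ct.items():
        if control_tag in gene:
            continue  # multiplex controls are removed
        adjusted[(gene, sample)] = v - difference.get(sample, 0.0)
    return adjusted


records = [
    ("GAPDH_mx", "S1", 20.0), ("GAPDH_mx", "S2", 22.0),
    ("IL6", "S1", 30.0), ("IL6", "S1", 31.0), ("IL6", "S2", 33.0),
]
adjusted = multiplex_adjust(average_replicates(records), "_mx")
print(adjusted)  # {('IL6', 'S1'): 31.5, ('IL6', 'S2'): 32.0}
```

Sample S1's control is 1 cycle below the grand mean (more template DNA), so its genes are shifted up by 1 cycle; S2 is corrected in the opposite direction.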
Additionally, SUMO computes averages and standard deviations of both delta RN and
CT values and places them into the gene annotations. These values can be used to filter out
genes with overall low abundance (i.e. high CT values, e.g. > 35) or low signal (i.e. low delta RN, e.g. << 1).
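A sketch of such a filter, using the example threshold from the text (a mean CT above 35 counts as low abundance) and an assumed cut-off of 0.1 for "low delta RN", since the text only says "<< 1". `summarize` and `keep_gene` are hypothetical names, and the per-gene lists stand in for the values SUMO stores in the gene annotations.

```python
from statistics import mean, stdev
from typing import List, Tuple


def summarize(values: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation (0.0 for a single value)."""
    return mean(values), (stdev(values) if len(values) > 1 else 0.0)


def keep_gene(ct_values: List[float], delta_rn_values: List[float],
              max_mean_ct: float = 35.0, min_mean_delta_rn: float = 0.1) -> bool:
    """Keep genes that are neither low-abundance (high CT) nor low-signal (low delta RN)."""
    mean_ct, _ = summarize(ct_values)
    mean_delta_rn, _ = summarize(delta_rn_values)
    return mean_ct <= max_mean_ct and mean_delta_rn >= min_mean_delta_rn


genes = {
    "IL6": ([30.0, 31.0, 29.5], [1.2, 1.4, 1.1]),
    "RARE": ([36.5, 38.0, 37.2], [0.05, 0.02, 0.04]),
}
kept = [name for name, (ct, rn) in genes.items() if keep_gene(ct, rn)]
print(kept)  # ['IL6']
```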
A new file containing the averaging information is created automatically
(original file name extended with "_MeanSDevN", e.g. "MyExperiment.sdm-Amplification Data_MeanSDevN.txt").
For each sample it contains three data columns:
- mean/median CT value of all technical replicates
- standard deviation of the CT values of the technical replicates
- number of replicates
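The layout of the summary file can be sketched in Python. This assumes a tab-delimited file with one row per gene, the "_MeanSDevN" suffix inserted before the file extension, and the mean (not the median) as the average. The input file name and all function names are hypothetical, and SUMO's actual column headers may differ.

```python
import math
import os
from statistics import mean, stdev
from typing import Dict, List, Tuple


def mean_sdev_n(replicates: List[float]) -> Tuple[float, float, int]:
    """Mean, sample standard deviation and number of technical replicates."""
    n = len(replicates)
    if n == 0:
        return math.nan, math.nan, 0
    return mean(replicates), (stdev(replicates) if n > 1 else 0.0), n


def summary_filename(path: str) -> str:
    """Append "_MeanSDevN" to the file name, keeping the extension."""
    root, ext = os.path.splitext(path)
    return f"{root}_MeanSDevN{ext}"


def summary_lines(table: Dict[str, Dict[str, List[float]]], samples: List[str]) -> List[str]:
    """One header line plus one line per gene with Mean, SDev and N for each sample."""
    header = ["Gene"] + [f"{s} {col}" for s in samples for col in ("Mean", "SDev", "N")]
    lines = ["\t".join(header)]
    for gene, per_sample in table.items():
        cells = [gene]
        for s in samples:
            m, sd, n = mean_sdev_n(per_sample.get(s, []))
            cells += [f"{m:.2f}", f"{sd:.2f}", str(n)]
        lines.append("\t".join(cells))
    return lines


print(summary_filename("MyExperiment.sdm-Amplification Data.txt"))
# MyExperiment.sdm-Amplification Data_MeanSDevN.txt
print(summary_lines({"IL6": {"S1": [24.0, 26.0]}}, ["S1"]))
# ['Gene\tS1 Mean\tS1 SDev\tS1 N', 'IL6\t25.00\t1.41\t2']
```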
Now you may use SUMO's functionality to analyse the PCR data.
Keep in mind, however, that CT values are approximately log2 values!
- Therefore, any normalisation should be performed as "centering", i.e. reference values
are subtracted (which corresponds to division for linear intensity data).
- Raw CT values scale inversely compared to raw gene expression intensities:
high CT => low abundance
low CT => high abundance
- Because CT values are log2 values, a CT difference (delta-delta-CT) of 2 corresponds to a ~4-fold regulation,
and a CT difference of 5 corresponds to a ~32-fold regulation.
- After normalization (subtraction) of biological controls, the values are still CT differences.
Invert the sign to convert them into log2 ratios.
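The arithmetic in the list above can be sketched in Python. The sketch assumes an ideal amplification efficiency of 100 % (the amount of template doubles in every cycle), which is why the text says "~4-fold" rather than exactly 4-fold in practice. The function names are hypothetical.

```python
from typing import List


def center(ct_values: List[float], reference: float) -> List[float]:
    """Normalise by subtracting a reference CT (log-scale equivalent of division)."""
    return [v - reference for v in ct_values]


def to_log2_ratio(delta_ct: float) -> float:
    """High CT means low abundance, so the sign is inverted."""
    return -delta_ct


def linear_ratio(delta_delta_ct: float) -> float:
    """Fold change on the linear scale, assuming perfect doubling per cycle."""
    return 2.0 ** to_log2_ratio(delta_delta_ct)


# Treated sample CT 24.0 vs. control CT 26.0 (both already normalised to endogenous controls)
ddct = center([24.0], reference=26.0)[0]
print(ddct, to_log2_ratio(ddct), linear_ratio(ddct))  # -2.0 2.0 4.0
print(linear_ratio(5.0))                               # 0.03125, i.e. 1/32
```

A CT difference of -2 therefore means 4-fold up-regulation, and a CT difference of +5 means 32-fold down-regulation.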