Statistical tests generate lists of significantly (more or less) regulated genes.
They do not tell us anything about biological (medical) importance or impact.
One way to explore this is to use existing information about interactions
between genes / proteins / ... .
A variety of experiments have been performed in the scientific community to explore which genes
/ proteins interact directly with other genes / proteins.
E.g. the Yeast-2-Hybrid system has been used
to systematically explore direct binary interactions between proteins or between proteins and DNA.
As a result, a list of binary protein interactions is generated.
Assume a (partial) interaction list (data base) like:
Assume your statistical test / cluster analysis results in a list of significant genes:
Now you can search for the members of your gene list in the interaction database:
Genes P7, P32, P1875 are not found at all in the interaction list => they cannot be used for building networks.
Genes P15, P165, P2917, P7192, P9123 are found in the interaction database => they can be used for building networks.
Direct interaction network:
In the simplest case we can try to build direct interaction networks: we only use
interaction partners which are also part of our list of significant genes.
Let's look at our genes:
P15: partners P1, P260 - no members of our list; P2917 - member of our list.
P165: partner P7612 - no member.
P2917: partner P450 - no member; P15, P7192 - members of our list.
P7192: partner P2917 - member of our list.
P9123: partner P4376 - no member.
Thus we can try to build a network with genes P15, P2917, P7192, resulting in a simple linear network: P15 - P2917 - P7192.
The other two genes (P165, P9123)
have interaction partners, but those are not members of our gene list => we don't use them.
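The walk-through above can be reproduced with a few lines of Python. This is only a minimal sketch, not SUMO's implementation: the interaction pairs are reconstructed from the partners named in the text, and the function name direct_networks is a hypothetical helper chosen for illustration.

```python
from collections import defaultdict

# Interaction pairs reconstructed from the partners named in the walk-through.
interactions = [
    ("P15", "P1"), ("P15", "P260"), ("P15", "P2917"),
    ("P165", "P7612"),
    ("P2917", "P450"), ("P2917", "P7192"),
    ("P9123", "P4376"),
]
significant = {"P7", "P15", "P32", "P165", "P1875", "P2917", "P7192", "P9123"}


def direct_networks(interactions, genes):
    """Group genes into nets, using only interactions whose two partners are both in `genes`."""
    adjacency = defaultdict(set)
    for a, b in interactions:
        if a in genes and b in genes:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, nets = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first search collects one connected net.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        nets.append(sorted(component))
    return nets


in_database = {gene for pair in interactions for gene in pair}
not_found = sorted(significant - in_database)   # genes missing from the database
nets = direct_networks(interactions, significant)
print("Not in database:", not_found)
print("Direct interaction nets:", nets)
```

Genes such as P165 and P9123 are in the database but end up in no net, because none of their partners belongs to the list of significant genes.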
One Gene Interpolation network
Assume the following genes:
Only the ACTG1 and TMSB4X genes are direct interaction partners.
But we can try to interpolate genes, i.e. we try to find genes in our interaction database which might act as
linking bridges between our selection genes.
Limitation: there must be exactly one linking gene,
e.g. Selection gene - Interpolation gene - Selection gene.
Bridges with multiple linking genes are not considered (e.g. Selection - Interpolate - Interpolate - Selection).
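The one-gene bridging rule can be sketched as follows. This is an illustration under stated assumptions, not SUMO's actual algorithm: the gene names (S1..S4, B1, X1, X2) and the function interpolate_one_gene are hypothetical. A non-selected gene counts as a bridge only if it interacts directly with at least two different selection genes.

```python
from collections import defaultdict


def interpolate_one_gene(interactions, selection):
    """Return {bridge gene: sorted selection partners} for single-gene bridges."""
    partners = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue  # ignore self-interactions
        if a in selection and b not in selection:
            partners[b].add(a)
        elif b in selection and a not in selection:
            partners[a].add(b)
    # A bridge must link at least two distinct selection genes.
    return {gene: sorted(p) for gene, p in partners.items() if len(p) >= 2}


# Hypothetical toy data: B1 bridges S1 and S2; S3 and S4 are only linked
# through two intermediate genes (X1, X2), which is not considered.
toy_interactions = [("S1", "B1"), ("B1", "S2"), ("S3", "X1"), ("X1", "X2"), ("X2", "S4")]
toy_selection = {"S1", "S2", "S3", "S4"}
bridges = interpolate_one_gene(toy_interactions, toy_selection)
print(bridges)
```

The bridge edges found this way would be added to the direct interactions before the nets are assembled.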
With this approach we could improve the above network:
The light gray genes ACTB, TGM2, ILK, ACTA1, PPP4c and IGSF21 were interpolated,
i.e. they were NOT in our selection list - we did not identify them as, e.g., differentially regulated in our experiment.
But with the help of these 6 interpolated genes we can place all the other selection genes in one network - and possibly into one
To switch between the two network modes, click the tool-bar buttons:
- Click the button to select building of Direct Interaction networks (default setting).
- Click the button to select building of One-Gene-Interpolation networks.
Evidence for a network
Often the question arises: is my network specific, or would any arbitrarily selected set of genes generate a similar network?
One way to answer this question is to test it with random gene lists.
SUMO can generate a user-defined number of random gene lists of a given size and try to build nets from each of them.
A histogram is built giving the number of nets with a certain number of nodes:
Build random nets
V1.00a, 18.05.2015; email@example.com
Number of interacting genes: 16242
Number of interactions: 345992
Size of geneset: 100
Number of permutations: 100
Total number of nets found= 340
4.399s elapsed time.
Size distribution of random Nets
Size n C-Sum p
67 1 1 0.0029411765281111
15 1 2 0.0058823530562222
13 2 4 0.0117647061124444
12 4 8 0.0235294122248888
11 1 9 0.0264705885201693
9 2 11 0.0323529429733753
8 5 16 0.0470588244497776
7 4 20 0.0588235296308994
6 4 24 0.0705882385373116
5 7 31 0.0911764726042747
4 12 43 0.126470595598221
3 61 104 0.30588236451149
2 236 340 1
The table summarizes the permutation analysis results:
|Size||Size of the net = number of nodes in the net|
|n||Number of nets with the given size|
|C-Sum||Cumulative sum: the number of nets with the given or an even larger size|
|p||Probability of finding a network of this or a larger size (C-Sum divided by the total number of nets found)|
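The C-Sum and p columns can be recomputed from the Size and n columns alone. The sketch below uses the counts from the table above; the function name size_p_values is a hypothetical helper, and p is assumed to be C-Sum divided by the total number of nets, which matches the printed values.

```python
# Size -> n, copied from the table above.
size_counts = {67: 1, 15: 1, 13: 2, 12: 4, 11: 1, 9: 2, 8: 5,
               7: 4, 6: 4, 5: 7, 4: 12, 3: 61, 2: 236}


def size_p_values(size_counts):
    """Return rows (size, n, cumulative sum, p), largest nets first."""
    total = sum(size_counts.values())
    rows, running = [], 0
    for size in sorted(size_counts, reverse=True):
        running += size_counts[size]
        rows.append((size, size_counts[size], running, running / total))
    return rows


rows = size_p_values(size_counts)
for size, n, c_sum, p in rows:
    print(f"{size:>4} {n:>4} {c_sum:>5} {p:.4f}")
```

For example, 104 of the 340 random nets have 3 or more nodes, giving p = 104 / 340.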
This procedure does not explicitly take into account the connectivity of single nodes.
E.g. a set of 5 linearly connected nodes is treated exactly like a star configuration in which any node connects to any other.
Obviously, the p-value depends on the size of the gene list as well as on the interaction data base (number of genes and interactions).
Thus, you can estimate the probability that a net of a certain size is generated by random gene lists.
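The permutation idea can be sketched in Python as follows. This is not SUMO's code: the ring-shaped toy database, the function names and the fixed random seed are all assumptions made for illustration. Random gene lists are drawn from the genes in the database, direct interaction nets are built for each list, and the net sizes are tallied.

```python
import random
from collections import Counter, defaultdict


def component_sizes(interactions, genes):
    """Sizes of all direct interaction nets that can be built from `genes`."""
    adjacency = defaultdict(set)
    for a, b in interactions:
        if a != b and a in genes and b in genes:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, sizes = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adjacency[node] - component)
        seen |= component
        sizes.append(len(component))
    return sizes


def random_net_histogram(interactions, list_size, permutations, seed=0):
    """Tally net sizes over `permutations` random gene lists of `list_size` genes."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    universe = sorted({gene for pair in interactions for gene in pair})
    histogram = Counter()
    for _ in range(permutations):
        genes = set(rng.sample(universe, list_size))
        histogram.update(component_sizes(interactions, genes))
    return histogram


# Hypothetical toy database: 200 genes connected in a ring.
toy = [(f"G{i}", f"G{(i + 1) % 200}") for i in range(200)]
histogram = random_net_histogram(toy, list_size=50, permutations=100)
for size in sorted(histogram, reverse=True):
    print(size, histogram[size])
```

With a real interaction database the resulting histogram plays the role of the "Size" and "n" columns in the table above.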
In SUMO select Netbuilder | Utilities | Random net (to test a user-defined gene list size), or
select "P-permutation" from the context menu in the analysis tree for a specific net.
In SUMO click the Gene-Net Builder button, or select NetBuilder
from the analysis menu:
An empty Net-Builder window opens:
Ensure databases are loaded.
Use the Preferences tab to set up database parameters.
Next paste a gene list from the clipboard or load a gene-list file.
Paste a genelist
Select Edit | Paste genelist from the main menu to get a list of genes from the
clipboard for network analysis.
At present, SUMO expects a list of gene symbols (e.g. RAS, MYC, HPRT1, VEGFR, ...).
Genes should be separated by spaces, tabs, commas or semicolons, or listed one gene per line.
You can also supply regulation information for each pasted gene.
In this case NetBuilder expects one gene per line followed by a
positive or negative number, separated by a tab (e.g. VEGF tab
NetBuilder automatically replaces the German decimal comma by the English decimal point.
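The accepted input formats can be illustrated with a small parser. This is a sketch of the rules described above, not NetBuilder's own parser: the function name parse_pasted_genes and the regulation values in the example are hypothetical.

```python
import re

# One gene per line, a tab, then a signed number with "." or "," as decimal separator.
_REGULATED_LINE = re.compile(r"^(\S+)\t\s*([+-]?\d+(?:[.,]\d+)?)$")


def parse_pasted_genes(text):
    """Return {gene symbol: regulation value or None} from pasted text."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    matches = [_REGULATED_LINE.match(line) for line in lines]
    if lines and all(matches):
        # Regulation mode: convert a decimal comma to a decimal point.
        return {m.group(1): float(m.group(2).replace(",", ".")) for m in matches}
    # Plain mode: symbols separated by whitespace, commas or semicolons.
    return {symbol: None for symbol in re.split(r"[\s,;]+", text.strip()) if symbol}


plain = parse_pasted_genes("RAS, MYC;HPRT1 VEGFR\nTP53")
regulated = parse_pasted_genes("VEGF\t2,5\nMYC\t-1.8")  # hypothetical values
print(list(plain))
print(regulated)
```

Falling back to plain mode when any line lacks a number is a design choice of this sketch; how NetBuilder handles mixed input is not stated here.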
Genes are compared with the loaded interactions, and a Direct-Interaction network (see above:
Basic Idea) is built.
In the analysis tree a new node is added:
You can see basic statistics:
- Number of pasted gene names (All genes 379)
- Number of genes found in the interaction data base (Matching genes 239)
- Number of genes not found in the interaction data base (Non-matching genes 140)
- Assembled direct interaction networks, including the number of genes within each single net.
Click a gene list (e.g. Non-Matching genes). The corresponding genes are listed in
Log-tabsheet and may be copied.
Click a net (e.g. CDK4 (21) ). The net is shown in a basic network viewer:
You can rearrange single or multiple nodes by selecting and freely dragging them.
Here you can customize Net-Builder.
Most important: define the interaction data bases and the gene index.
In the data bases field you can define the available interaction data-base files.
To add a new one, click the " ... " button.
Select the data base type (HPRD or NCBI). Next select a suitable file.
In the example a single data base file is defined. It contains all direct binary protein-protein interactions available from
In this case the data base identifier is: HPRD::
followed by the file specification: D:\Data\...
See the presently supported data-bases.
To load a defined interaction data base double-click the corresponding line in the list.
Check the Autoload checkbox:
The next time you open Net-Builder, the defined interaction databases are loaded automatically.
Review the Log-tabsheet to see the data-base loading success:
Presently supported data-base formats:
Human Protein Reference Database (HPRD).
From their download site you can download a list of binary protein-protein
Download the file interactions.gz and unpack the contained interaction list (
Use the file "... containing human binary protein-protein
interactions in tab delimited format."
Each line in the data file describes a binary interaction, e.g.:
05772 NP_060910.3 VAV1 01284 NP_005419.2 in vitro 10790433
At present, Net-Builder only uses the gene symbol.
Any other interaction list can easily be converted into a format compatible with the HPRD format and loaded into Net-Builder.
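Such a conversion could look like the sketch below. It rests on assumptions that should be checked against a real HPRD file: the column layout (gene symbol, HPRD id, RefSeq id for each partner, then experiment type and reference id) is inferred from the example record, the "-" placeholders for unknown fields are a guess, and both function names are hypothetical. Whether Net-Builder accepts placeholder fields is not confirmed here.

```python
import csv
import io


def pairs_to_hprd_lines(pairs):
    """Write symbol pairs as 8-column, tab-delimited lines in an HPRD-like layout."""
    # Assumed columns: symbol1, hprd_id1, refseq1, symbol2, hprd_id2, refseq2,
    # experiment type, reference id. Unknown fields are filled with "-".
    return ["\t".join([a, "-", "-", b, "-", "-", "-", "-"]) for a, b in pairs]


def read_symbol_pairs(handle):
    """Read the two gene symbols (assumed columns 1 and 4) from each line."""
    pairs = []
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) >= 4 and row[0] and row[3]:
            pairs.append((row[0], row[3]))
    return pairs


# Hypothetical interaction pairs from some other source.
custom_pairs = [("GENE_A", "GENE_B"), ("GENE_B", "GENE_C")]
hprd_text = "\n".join(pairs_to_hprd_lines(custom_pairs)) + "\n"
round_trip = read_symbol_pairs(io.StringIO(hprd_text))
print(hprd_text, end="")
print(round_trip)
```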
NCBI's interaction database.
From their ftp site you can download a list of binary protein-protein
interactions. Presently, the NCBI list contains interactions from three sources: HPRD, BIND and BIOGRID.
Download the interactions.gz file
and unpack the interaction list with a suitable unzip program (e.g. 7-Zip).
Each line in the data file describes a binary interaction, e.g.
9606 1 P04217 RecName: Full=Alpha-1B-glycoprotein - 9606 10321 GeneID P54108 RecName: Full=Cysteine-rich secretory protein 3 - - - 15461460 2009-12-17 15:32 - 00726 HPRD
The file contains interactions for all genomes. To filter interactions for a specific genome, use the
tax-id columns with e.g. EXCEL's autofilter (N.B. the file contains hundreds of thousands of lines) or a tool
like TableButler's rowfilter.
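As an alternative to a spreadsheet filter, the tax-id filtering can be scripted. The sketch below assumes a tab-delimited file in which the first and sixth columns hold the tax-ids of the two interaction partners, as the example record above suggests; the function name filter_by_taxid and the second sample row are hypothetical.

```python
import csv
import io


def filter_by_taxid(handle, tax_id):
    """Keep rows in which both partners carry the given tax-id (columns 1 and 6)."""
    kept = []
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row or row[0].startswith("#"):
            continue  # skip empty lines and a possible header line
        if len(row) > 5 and row[0] == tax_id and row[5] == tax_id:
            kept.append(row)
    return kept


human_row = ["9606", "1", "P04217", "RecName: Full=Alpha-1B-glycoprotein", "-",
             "9606", "10321", "GeneID", "P54108",
             "RecName: Full=Cysteine-rich secretory protein 3", "-", "-", "-",
             "15461460", "2009-12-17 15:32", "-", "00726", "HPRD"]
# Hypothetical row for another genome, used only to show the filter at work.
other_row = ["10090", "111", "-", "-", "-", "10090", "222", "GeneID",
             "-", "-", "-", "-", "-", "-", "-", "-", "-", "-"]
sample = "\n".join("\t".join(row) for row in (human_row, other_row)) + "\n"

human_only = filter_by_taxid(io.StringIO(sample), "9606")
print(len(human_only), human_only[0][1], human_only[0][6])
```

For a real file, pass an open file handle instead of the io.StringIO sample; the rows are read one at a time, so the unfiltered file does not have to be loaded into a spreadsheet.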
Recent interaction lists (updated every few weeks) may be downloaded from the SUMO site:
interactions_homo-sapiens.txt (~145,000 interactions)
interactions_mus-musculus.txt (~ 5,000 interactions)
interactions_rattus-norvegicus.txt (~2,000 interactions)
NCBI's interaction list uses Gene-IDs. Thus you MUST load the Gene-Index (see below) to
resolve Gene-IDs into gene symbols.
The gene index is used to combine and link different gene descriptions (e.g. Symbols, Gene-IDs, Aliases, ...).
For this task, SUMO uses NCBI's Gene-Info files. They may be downloaded freely from NCBI's ftp-site:
In the Mammalia subfolder you can find gene infos for the respective mammals.
Download the respective archive (.gz) and extract the data file with an unzip program (e.g. 7-Zip).
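How such a gene index links Gene-IDs, symbols and aliases can be sketched as follows. This is an illustration, not SUMO's implementation: it assumes the tab-delimited gene-info layout with tax-id, GeneID, Symbol, LocusTag and Synonyms (separated by "|") as the first five columns, the sample rows are abbreviated and their alias values illustrative, and build_gene_index is a hypothetical name.

```python
import csv
import io


def build_gene_index(handle):
    """Return (GeneID -> symbol, upper-cased symbol or alias -> symbol)."""
    id_to_symbol, alias_to_symbol = {}, {}
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row or row[0].startswith("#") or len(row) < 5:
            continue  # skip header, empty and incomplete lines
        gene_id, symbol, synonyms = row[1], row[2], row[4]
        id_to_symbol[gene_id] = symbol
        alias_to_symbol[symbol.upper()] = symbol
        if synonyms != "-":
            for alias in synonyms.split("|"):
                # Official symbols take precedence over aliases seen later.
                alias_to_symbol.setdefault(alias.upper(), symbol)
    return id_to_symbol, alias_to_symbol


# Abbreviated, illustrative rows (a real gene-info file has more columns).
sample_info = (
    "#tax_id\tGeneID\tSymbol\tLocusTag\tSynonyms\n"
    "9606\t1\tA1BG\t-\tA1B|ABG|GAB\n"
    "9606\t10321\tCRISP3\t-\tSGP28\n"
)
id_to_symbol, alias_to_symbol = build_gene_index(io.StringIO(sample_info))
print(id_to_symbol["1"], id_to_symbol["10321"], alias_to_symbol["ABG"])
```

With such an index, the Gene-IDs 1 and 10321 from the NCBI example record resolve to the symbols A1BG and CRISP3.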
Recent, already unpacked gene-info files (updated every few weeks) may be downloaded from the SUMO site (but it is better to
download up-to-date data from NCBI):