TableButler - Recalculate data rows

Some times it is required to re-calculate data columns in hybridzation data files :

When using wrong national settings with Excel ratio values get scrambled up
For filtering derived data shall be used (e.g. average intensity from red and green)
Invert ratio from dye swap experiments
Calc replica averages / standard deviations from combined datafiles
repalce strings
.....

Such functionality is found in the Compute tabsheet.
All operations are performed on single rows, sometimes multiple or even the whole data file are used to calculate data values which will be placed in a single row.

If you want to recompute columns goto to the Column CPU tabsheet.

TableButler shows a preview of the data file :

To set-up a single Operation do the following steps:

Select target column
Select source column(s)
Select operation
Define required parameters
Add single Operation to list of "Filters"
Click Go button

By default, TableButler save the result in a new file with extension "_cpu" (e.g. selected file is "my_table.txt", a result file with name "my_table.txt_cpu" wil be created.
To overwrite the source file check the Replace Source check box.

If you want to reuse a set-up list of operations, save the "filter" by clicking the Save filter button.

In a next session reload the filter by clicking the Load filter button.

Select Target column:

Click a colum in the preview grid, then click the Target column field to define where the result of the operation shold be placed, or leave target column empty tp append the new data as new data fields.

If you later click the Add button the respective Target Column value will show up in the Filters list.
If you do not supply a Target Column value (= leave field empty) or type new, the result of the operation will be appended to the table.
Some operations will create multiple column results or place the results in the source cells.
See more details for the respective functions.

Select source column(s)

Click a colum in the preview grid, then click the Col1, Col2, ... to select the columns for the operation:

If you later click the Add button the respective Target Column value will show up in the Filters list.

If you want / need to define a larger number of source columns you can also define a simple regular expression range expression in edit field Col1.
Either define a

List of columns (e.g. "2,4,6,8,10" )
Range of columns (e.g. "2..10",14..20" )
mix of lists and columns (e.g. "2,4,6,10..15,20,22,24,26..30,34" )

e.g. 2,3,6..10,12,13,20..24 will select columns 2,3,6,7,8,9,10,12,13,20,21,22,23,24

You can also use add multiple coluns to the Col1-edit field by Double-Clicking a column in the preview grid while holdeing Ctrl-Key pressed.

Select the operation :

Select the requested operation from the Operation list:

To get a brief description pf the selected operation, click the information button ( "???" ).
A windows message box pops-up showing brief essential information about the selected operation.

For more details about a operation see below.

At present the following groups and operations are available:

Basic arithmetics :

add constant (or add -constant=subtract constant) to data fields in selected column(s)
multiply data fields in selected column(s) with constant (or multiply 1/constant = divide by constant)
set data fields in selected column(s) to constant value (number or text)

Constant value is defined in edit field P1

Sum of columns
Product of columns
Ratio of 2 columns
Log2 of column
Inverse of a column ( 1/ Col1 )
change sign of column ( -Col1)
abolute value of a column
Gauss Klammer
Min value
Max value

Data imputation

replace non numerical cells with constant value
replace non numerical cells with row average (arithmetic mean)
replace non numerical cells with any value from row (Hot deck imputation)
replace non numerical cells with average from xx most similar gene vectors (eucledian distance)

Data adjustment

Mean centering
Median centering
Mean normnalize
Median normalize

Basic Statistics

Min
Max
Arithmetic mean of columns
Geometric mean of columns
Median mean of columns
Standard deviation of columns
Squared mean
Average and Standard deviation, excluding outliers

Divisive statistics

t-test
Single factor ANOVA

Row arithmetics

Replicate averaging

Conditional operations

numeric comparisons

Text operations

Find and replace complete string in complete row or single cell
Find and replace sub-string in complete row or single cell
Concatenate strings from multiple columns into one, using a selectable divider
Bring all lines to the same user defined length (number of data fields)
Text to columns
Copy substring
Remove tailing characters
Tagged text to columns

Coordinate transformation

GAL -> ADF : Subgrid,Column,Row -> MetaColum,MetaRow,Column,Row
ADF -> GAL : MetaColum,MetaRow,Column,Row -> Subgrid,Column,Row

Enter required parameters into P1..P6

Add filter to the list of filters and run filter.

A defined filter may contain multiple operations and may be saved for later use.

The examples shows recalculation of combined data files created with Expression matrix builder of TableButler data file :

1..3 : Calc Average Intensity from Red and Green signal2. Spot should belong to the 75% brightest spots in Green color channel
4..6 : Replace "-1000.00" expression values from empty spots with "0".

Detailed Function description:

Tagged text to columns

This function splits the contents of a cell into individual cells. Target cells are adressed by tags found in the original cell.
TableButler can handle two cases:

Example1:
assume you have a gene annotation column like:

Header line	GeneAnno
...	...
Gene25:	[Genbank]123498;[ENSEMBLE]AS34985
...	...
Gene77:	[Locuslink]234567;[GO]456789
...	...
Gene88:	[Genbank]678990];[Swissprot]788901
...	...

TableButler will generate new column with header names

[Genbank] [ENSEMBLE] [Locuslink] [GO] [Swissprot]

and sort the corresponding "values" of all genes sorted into the respective columns:

Header line	Genbank	ENSEMBLE	Locuslink	GO	Swissprot
...					...
Gene25:	123498	AS34985	--	--	--
...					...
Gene77:	--	--	234567	456789	--
...					...
Gene88:	678990	--	--	--	788901
...					...

The number of inserted columns is dependant of the found tags.
Therefore 'this function should be run INDEPENDANT and NOT in combination with other functions where Column addresses are required.

Parameters:
    Col1 edit filed = ID of column to split
    P1 edit field = character (or string) which divides single fields.
        In the example ";"
    P2 edit field = character or string wich divides "Tag" from "Value"
        In the the example "]"

Example2:
assume you have a gene annotations like:

Header line	GeneAnno
...	...
Gene25:	Genbank\|123498\|ENSEMBLE\|AS34985
...	...
Gene77:	Locuslink\|234567\|GO]456789
...	...
Gene88:	Genbank\|678990\|Swissprot\|788901
...	...

Parameters:
    Col1 = as above
    P1= the onyl field divider
        In the example "|"
    P2 edit filed MUST be empty !!!!!!!
Any value in P2 would be interpreted as divider and thus scramble the operation.

Replicate averaging

Assume a result file from micro array scanning where for unique genes there were muliple replicates on the microarray. Furthermore you have different slide layouts where the total number of features and the number of replicas per feature are different.
Trying to merge such kind of data on a per feature basis is problematic.
Here it might be better to first average replicas per single slide layout (or even per single hyb) and then join the differtn layouts using UNIQUE gene-IDs.

Replicate averaging allows to do this.

Define the column which contains the Key (e.g. gene name) used to find replicas in TARGET COLUMN edit field.
Define the columns for which replica averages shall be calculated in edit fields (COL1 ..COL12). At present only 12 columns may be averaged in one run.

A result file is created which contains

Key-Column, Averages Col1..Col12, Number of replicas:

ID	Arithmetic_Mean	Geometric_Mean	#
A_23_P101972	55.18	51.78	1
A_23_P101992	42.61	35.97	1
A_23_P102000	816.484	92.21	10
A_23_P102017	8.59	5.96	1
A_23_P102037	274.42	269	1
A_23_P102058	72.69	52	1
A_23_P102454	779.68	728.34	1
A_23_P102462	118.35	110.66	1
A_23_P102471	2315.386	2136.46	10
A_23_P102481	12.25	8.68	1
A_23_P102494	326.35	302.44	1

Last edited 22.11.2007,