SeedScan - Data files

SeedScan requires several input files to generate the resulting data matrix.

Just drag the files from Windows Explorer into the respective text field,
or click the "..." button to select a file.


Barcodes

During amplification of a sample, a pair of sample-specific PCR primers is used to introduce sample-specific "barcode" sequences.

SeedScan expects a barcode list:
To preview the barcode file, click the View button to the right of the Barcode field:

(Click image to see the file)
The same barcode list can be reused for any analysis in which the same barcodes were used.

The barcode list should not contain more barcodes than were used for the multiplex PCRs.
Unused barcodes should not disturb the counting, but a large number of them slows down the analysis.


Barcode pairs

To improve the identification of multiplexed samples, a forward and a reverse barcode may be introduced during PCR amplification.

Thus, a list of "allowed" barcode-pairs is required.
In cases where multiple seed libraries were used to generate the individual samples, additionally specify the ID of the respective seed library.
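
As a rough illustration of the idea, the allowed pairs can be pictured as a lookup table keyed by the forward and reverse barcode. The barcodes, sample names, library IDs, and the function name below are invented for this example; they do not reflect the barcode-pairs file format or how SeedScan works internally.

```python
# Hypothetical allowed barcode pairs: (forward, reverse) -> (sample, seed library ID).
# All values are invented for illustration.
ALLOWED_PAIRS = {
    ("ACGTAC", "TTGCAA"): ("sample_01", "LibA"),
    ("ACGTAC", "GGATCC"): ("sample_02", "LibA"),
    ("CATGCA", "TTGCAA"): ("sample_03", "LibB"),
}


def assign_sample(forward_barcode, reverse_barcode):
    """Return (sample, library) for an allowed pair, or None otherwise."""
    return ALLOWED_PAIRS.get((forward_barcode, reverse_barcode))


print(assign_sample("ACGTAC", "GGATCC"))  # allowed pair
print(assign_sample("CATGCA", "GGATCC"))  # both barcodes known, pair not allowed
```

In this sketch, a read whose two barcodes are each valid but do not form an allowed pair is not assigned to any sample.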

SeedScan expects a barcode-pair list:
To preview the barcode-pairs file, click the View button to the right of the Barcode-pairs field:

(Click image to see the file)
The barcode-pairs file has to be adapted to each specific analysis.
Check that:


Seeds

The sequences are compared to the list of (gene) target-specific constructs (seeds) used in the experiment.
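
The comparison can be pictured as an exact lookup of read substrings in a table of seed sequences. This is only a sketch: the seed sequences, the IDs, the fixed seed length, and the exact-match strategy are assumptions made for illustration, not a description of how SeedScan matches reads.

```python
# Hypothetical seeds: sequence -> seed ID (invented values, equal length assumed).
SEEDS = {
    "ACGTACGTACGTACGTACGT": "GENE1_sg1",
    "TTTTGGGGCCCCAAAATTTT": "GENE2_sg1",
}
SEED_LENGTH = 20


def find_seed(read):
    """Return the ID of the first seed found as an exact substring of the read."""
    for start in range(len(read) - SEED_LENGTH + 1):
        seed_id = SEEDS.get(read[start:start + SEED_LENGTH])
        if seed_id is not None:
            return seed_id
    return None


# Invented read: a seed flanked by a few extra bases on each side.
read = "GGCC" + "TTTTGGGGCCCCAAAATTTT" + "AAGT"
print(find_seed(read))
```

Reads shorter than the assumed seed length, or reads without an exact match, yield None in this sketch.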

SeedScan expects a list of these seeds:
To preview the seeds file, click the View button to the right of the Seed file field:

(Click image to see the file)

The example shows a part (the first hundred of 65383 seeds) of the Gecko A library.


Sequences

The sequences to analyze (one forward read file and, optionally, one reverse read file) are supplied as multiple sequence files in FastQ format.
FastQ files may be supplied as plain text files or as gzip-compressed (GZ) files.
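
For readers who want to inspect such files outside SeedScan, the following sketch reads the sequence lines from either a plain or a gzip-compressed FastQ file. It assumes the common four-line record layout (header, sequence, "+", qualities) with no line-wrapped sequences; the function names are chosen for this example only.

```python
import gzip


def open_fastq(path):
    """Open a FastQ file as text, whether plain or gzip-compressed."""
    with open(path, "rb") as handle:
        is_gzip = handle.read(2) == b"\x1f\x8b"  # gzip magic bytes
    return gzip.open(path, "rt") if is_gzip else open(path, "rt")


def read_sequences(path):
    """Yield the sequence line of each four-line FastQ record."""
    with open_fastq(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # second line of each record holds the bases
                yield line.strip()
```

Checking the first two bytes rather than the file extension means a compressed file is recognized even if it was renamed.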

The example shows a part of a FastQ multiple sequence file, with the sequence lines highlighted:

(Click image to see the file)
It shows the first hundred of ~350 million reads in the original file.


Result matrix

Counting results are saved as a tab-delimited count matrix. The matrix is saved at regular intervals during an analysis run. Thus, the scan may be Paused and the result matrix viewed and preliminarily analyzed.

The TotCount row gives the absolute number of sequences for the respective barcode-pair / sample.
This allows comparing the total amount of analyzed DNA and thus drawing conclusions about sample concentration/preparation.

To allow easy comparison of samples from different runs, count values for the seeds (genes) are normalized to 1 million total reads (~RPFM).
As each seed should match only once on a gene, counts are not normalized to the target (gene) length.
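
The normalization amounts to scaling each raw count by the total count of its sample. A minimal sketch, assuming the total used is the sample's TotCount value; the function name and the numbers are invented for illustration:

```python
def normalize_per_million(raw_count, total_count):
    """Scale a raw seed count to counts per 1 million total reads."""
    if total_count == 0:
        return 0.0  # avoid division by zero for samples without reads
    return raw_count * 1_000_000 / total_count


# Invented example: 250 reads for one seed in a sample with 5,000,000 reads in total.
print(normalize_per_million(250, 5_000_000))  # 50.0
```

With this scaling, the same seed in a sample sequenced twice as deeply (500 of 10,000,000 reads) gives the same normalized value.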


To preview the result matrix file, click the View button to the right of the Result matrix field:

(Click image to see the file)

The example shows a part (the first hundred of 65383 seeds, 6 of 15 samples) of a targeting screen with the Gecko A library.
Due to the experiment's selection pressure, the guides of only a few targets are preserved in the analyzed cells. Thus, most seed/sample cells have a count of 0,
especially in the tiny partial result sample.