Distribution of undetermined indices in Illumina HiSeq experiments

Posted: March 23rd, 2015 | Filed under: Sequencing

One often wants to look at the distribution of undetermined indices in Illumina HiSeq experiments to spot problems with the experiment’s sample sheet or with the library itself. These indices are stored in the FASTQ file(s) in the undetermined_indices folder. Below is a Bash script that processes these FASTQ files (gzip-compressed, uncompressed, or a mix of both) and prints the distribution of indices to stdout as comma-separated values. If the script is named “index_stats”, call it from the undetermined_indices folder as:

   index_stats > lane2_undetermined.csv

Recent versions of GNU sort support parallel sorting. The script as presented below runs GNU sort with 8 threads. It also uses parallel unpigz with 8 processors to decompress fastq.gz files if unpigz is present; otherwise, it falls back to the single-core gunzip command.
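A minimal sketch of the approach described above (not the original script) might look like the following. It assumes CASAVA 1.8+ style read headers, where the index is the last colon-separated field of every fourth line starting with the first; older header styles would need a different extraction step. The names cat_fastq, index_stats and THREADS are illustrative only.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; function and variable names are hypothetical.

THREADS=8   # assumed thread count for sort and unpigz

# Stream every FASTQ file in the given directory, gzip'ed or not.
cat_fastq() {
    local dir="${1:-.}" f
    for f in "$dir"/*.fastq "$dir"/*.fastq.gz; do
        [ -e "$f" ] || continue              # skip glob patterns that matched nothing
        case "$f" in
            *.gz)
                if command -v unpigz >/dev/null 2>&1; then
                    unpigz -p "$THREADS" -c "$f"
                else
                    gunzip -c "$f"           # single-core fallback
                fi ;;
            *)  cat "$f" ;;
        esac
    done
}

# Print "index,count" lines, most frequent index first.
index_stats() {
    local dir="${1:-.}"
    local sort_opts=()
    # --parallel is specific to GNU sort; use it only if this sort accepts it.
    if sort --parallel="$THREADS" </dev/null >/dev/null 2>&1; then
        sort_opts=(--parallel="$THREADS")
    fi
    cat_fastq "$dir" |
        awk -F: 'NR % 4 == 1 { print $NF }' |   # assumed: index is the last ":" field of the header
        LC_ALL=C sort "${sort_opts[@]}" |
        uniq -c |
        LC_ALL=C sort -k1,1nr -k2,2 |
        awk -v OFS=, '{ print $2, $1 }'
}

# Run only when executed as a script, not when sourced.
if [ "${BASH_SOURCE[0]}" = "$0" ]; then
    index_stats "$@"
fi
```

Run without arguments, this sketch reads the current directory, so the call shown earlier (with stdout redirected to a .csv file) applies unchanged. Sorting the raw indices before uniq -c is the expensive step on a full lane, which is why the thread count matters there.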