Demultiplex.py

Demultiplex.py must be run on cn45. The database it uses is currently not accessible from cn106 for security reasons.

Activating the environment

The conda environment is located at /vol/mbconda/slrinzema/SeqAd2. Make sure conda is installed, then activate the environment by running the following command:

conda activate /vol/mbconda/slrinzema/SeqAd2

Running demultiplex.py

Demultiplex.py can only be run from cn45.

Demultiplexing into a custom directory

Because the default fastq directory is write-protected (restricted to a group), you can rerun the process with a custom fastq root. The fastq directory will be created inside this root path:

python3 demultiplex.py HHLJTBGXB --fastq-root /my/custom/directory/

Troubleshooting faulty samples

Once a run is complete, check the MultiQC report for samples that look off, for example samples with far fewer reads than expected. The flowcell directory contains a file called undetermined_bin.csv, which lists all barcodes found in the undetermined reads, sorted by count. Two of its columns list the known barcodes that resemble the sequence found on the I7 or the I5 side, so you can check whether any undetermined barcode is similar to a barcode you used. Here are some examples of things that can go wrong and how to solve them:

Wrong barcode

A wrong barcode was entered into the system. Maybe you thought your sample had barcode A35, but barcode A37 was found 17 million times. That could be your barcode: change it in the system and rerun the demultiplexing.

Example: Sample 100 has been entered with barcode A10 and expects 20M reads, but 0.0M reads have been assigned. If you check the undetermined_bin.csv, the most frequently found barcode corresponds to A11 and has around 20M reads:

# indexes, amount, i7 matches, i5 matches
CTGTAGCCAT, 20724931, A11,
GGGGGGGGGG, 650639, ,
etc...
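Scanning the file by eye works, but the same check can be sketched in a few lines of Python. This is a minimal sketch, not part of demultiplex.py: the function name read_undetermined is hypothetical, and it assumes every row has exactly the four comma-separated fields shown in the example above (if the match columns can themselves contain commas, the parsing would need to change).

```python
import csv
import io


def read_undetermined(handle, top=10):
    """Return the `top` most frequent rows of an undetermined_bin.csv-like file.

    Assumption: four comma-separated fields per row, as in the example above.
    """
    rows = []
    for fields in csv.reader(handle, skipinitialspace=True):
        # Skip blank lines and the "# indexes, amount, ..." header line.
        if not fields or fields[0].startswith("#"):
            continue
        # Pad so rows with missing trailing columns do not raise IndexError.
        fields = [f.strip() for f in fields] + ["", ""]
        rows.append({
            "indexes": fields[0],
            "amount": int(fields[1]),
            "i7": fields[2],
            "i5": fields[3],
        })
    rows.sort(key=lambda r: r["amount"], reverse=True)
    return rows[:top]


# The two rows from the example above, used as stand-in input.
example = io.StringIO(
    "# indexes, amount, i7 matches, i5 matches\n"
    "CTGTAGCCAT, 20724931, A11,\n"
    "GGGGGGGGGG, 650639, ,\n"
)
top_rows = read_undetermined(example)
for row in top_rows:
    print(row["indexes"], row["amount"], row["i7"] or "-", row["i5"] or "-")
```

To use it on a real run, replace the StringIO object with an open file handle to the undetermined_bin.csv in the flowcell directory.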

Changed primer remnant

The primer remnant differs from the usual remnant. This usually happens when your sample only has a barcode on the I7 side, but the flowcell was run using both I7 and I5. Check whether the barcode you used is in the undetermined_bin.csv, and check the I5 sequence that belongs to this barcode. To solve this, copy the SampleSheet.csv to a safe location (try your home directory) and change the I5 sequence to the sequence found in the undetermined_bin.csv. Rerun the demultiplexing with the --samplesheet flag and supply the path to your new samplesheet.

Example: Sample 100 was entered with barcode A10 and placed on a flowcell where both the I7 and I5 barcodes are read. After demultiplexing, sample 100 had 0.0M reads, but the A10 barcode is found in the undetermined_bin.csv. If you compare the second sequence (I5) with the samplesheet, you'll see they differ:

SampleSheet.csv

Sample_ID,SampleName,index,index2
100_0,100_SAMPLE_A10,ACAAGCTA,AGATCTCG
etc...

undetermined_bin.csv

# indexes, amount, i7 matches, i5 matches
ACAAGCTA+CGAGATCT, 20724931, A10, ←check the sequence after the +
GGGGGGGG+GGGGGGGG, 650639, ,
etc...

Remake a samplesheet with the found primer remnant as the index2, and rerun with the --samplesheet flag:

Sample_ID,SampleName,index,index2
100_0,100_SAMPLE_A10,ACAAGCTA,CGAGATCT
etc...
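If you prefer not to edit the copy by hand, the change can be scripted. The sketch below is an illustration only: replace_index2 is a hypothetical helper, and it assumes the sheet contains a header line starting with Sample_ID and an index2 column, as in the example above. All other lines are passed through unchanged.

```python
def replace_index2(lines, sample_id, new_index2):
    """Return a copy of the samplesheet lines with index2 changed for one sample."""
    out = []
    columns = None
    for line in lines:
        fields = line.rstrip("\n").split(",")
        if fields[0] == "Sample_ID":
            # Remember the column layout from the header line.
            columns = fields
        elif columns and fields[0] == sample_id and len(fields) == len(columns):
            fields[columns.index("index2")] = new_index2
            line = ",".join(fields) + "\n"
        out.append(line)
    return out


# The samplesheet rows from the example above.
sheet = [
    "Sample_ID,SampleName,index,index2\n",
    "100_0,100_SAMPLE_A10,ACAAGCTA,AGATCTCG\n",
]
fixed = replace_index2(sheet, "100_0", "CGAGATCT")
print("".join(fixed), end="")
```

For a real sheet, read the lines from your copied SampleSheet.csv, write the result to a new file, and pass that file's path to the --samplesheet flag.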

*I have no idea why this happens; if you find out, please let me know. I suspect it has to do with sample prep in the lab.

Flipped barcodes

A rare case is flipped barcodes. For some reason the I7 barcode is in the I5 position, and the I5 barcode is in the I7 position. Both are reverse complemented in this case. Check this by reverse complementing the found sequences and comparing them to the used barcodes. If this is the case, copy the SampleSheet.csv to a safe location, change the sequences and re-run with the --samplesheet flag supplying the correct path to your new samplesheet.
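The reverse-complement check can be sketched in Python as follows. The helper names are hypothetical, and the "found" sequence in the usage lines is made up for illustration: it is what a flipped A10 sample from the earlier samplesheet example would look like, assuming the undetermined_bin.csv writes the pair as I7+I5.

```python
# Translation table for DNA bases; N is kept as N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_flipped(found, expected_i7, expected_i5):
    """Check whether a found 'I7+I5' pair is the expected pair, swapped and
    reverse complemented."""
    found_i7, found_i5 = found.split("+")
    return (
        reverse_complement(found_i7) == expected_i5
        and reverse_complement(found_i5) == expected_i7
    )


# Expected barcodes taken from the samplesheet example (index, index2).
print(reverse_complement("AGATCTCG"))  # CGAGATCT
print(is_flipped("CGAGATCT+TAGCTTGT", "ACAAGCTA", "AGATCTCG"))  # True
```

Note that the first print line also shows that the two index2 values in the primer remnant example (AGATCTCG and CGAGATCT) are reverse complements of each other.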

Contact

For assistance with demultiplexing (in case of vacation etc.), please send a message on Slack detailing which flowcell is affected and what went wrong. Please copy or type out any information instead of sending screenshots. Make sure to push the message through if 'pause notifications' is on. In case of urgency, ask for my personal number at the secretariat and also send a message on Slack with the details.