The database used on cn45 is currently not accessible from cn106 for security reasons.
The conda environment is located at /vol/mbconda/slrinzema/SeqAd2 and can be activated with:
conda activate /vol/mbconda/slrinzema/SeqAd2
Make sure conda is installed!
demultiplex.py can only be run from cn45. Go to its directory with:
cd /ceph/rimlsfnwi/data/moldevbio/heeringen/slrinzema/projects/seqad/tools/demultiplex
The first argument of demultiplex.py is either AUTO (the mode that runs periodically on cn45) or a specific flowcell name. To demultiplex a specific flowcell, run:
python demultiplex.py <FLOWCELL_NAME> <OPTIONAL FLAGS>
or, using the full path:
python /home/slrinzema/ceph/projects/seqad_new/tools/demultiplex/demultiplex.py <FLOWCELL_NAME> <OPTIONAL FLAGS>
See all optional flags with:
python demultiplex.py -h
Because the default fastq directory is write-protected (only a specific group has write access), you can rerun the process with a custom fastq root instead. The fastq directory will then be created inside that root path:
python3 demultiplex.py HHLJTBGXB --fastq-root /my/custom/directory/
Once a run is complete, check the MultiQC report for unusual samples. The flowcell directory contains a file called undetermined_bin.csv, which lists all barcodes found in the undetermined reads, sorted by read count. Two of its columns list the known barcodes that resemble the sequence found on the I7 or I5 side, respectively, so you can check whether any undetermined sequence is similar to a barcode you used. Here are some examples of things that can go wrong and how to solve them:
A wrong barcode was entered into the system. Maybe you thought your sample had barcode A35, but barcode A37 was found 17 million times in the undetermined reads. If that is likely your barcode, correct it in the system and rerun the demultiplexing.
Example: Sample 100 was entered with barcode A10 and is expected to have 20M reads, but 0.0M reads were assigned to it. If you check undetermined_bin.csv, the most frequently found barcode corresponds to A11 and has around 20M reads:
# indexes, amount, i7 matches, i5 matches
CTGTAGCCAT, 20724931, A11,
GGGGGGGGGG, 650639, ,
etc...
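Reading undetermined_bin.csv by eye works, but for a full flowcell it can help to parse it and ask how many undetermined reads resemble a given barcode. Below is a minimal Python sketch, assuming the layout shown in the excerpt above (a `#` header line, the index sequence with I7 and I5 joined by `+` on dual-index runs, then read count, I7 matches and I5 matches). How several matches are stored within one field is an assumption here (separated by `;` or spaces); `parse_undetermined` and `reads_resembling` are hypothetical helpers, not part of demultiplex.py.

```python
import csv
import io
import re
from typing import List, NamedTuple


class UndeterminedRow(NamedTuple):
    i7: str                # sequence read on the I7 side
    i5: str                # sequence after the "+" ("" on single-index runs)
    amount: int            # number of undetermined reads with this sequence
    i7_matches: List[str]  # known barcodes resembling the I7 sequence
    i5_matches: List[str]  # known barcodes resembling the I5 sequence


def _split_matches(field: str) -> List[str]:
    # Assumption: several matches in one field are separated by ';' or spaces.
    return [name for name in re.split(r"[;\s]+", field.strip()) if name]


def parse_undetermined(text: str) -> List[UndeterminedRow]:
    """Parse undetermined_bin.csv content, highest read count first."""
    rows = []
    for fields in csv.reader(io.StringIO(text), skipinitialspace=True):
        # Skip empty lines, the '#' header line and truncation markers.
        if not fields or fields[0].startswith(("#", "etc")):
            continue
        fields += [""] * (4 - len(fields))  # pad missing trailing columns
        i7, _, i5 = fields[0].partition("+")
        rows.append(UndeterminedRow(i7, i5, int(fields[1]),
                                    _split_matches(fields[2]),
                                    _split_matches(fields[3])))
    return sorted(rows, key=lambda row: row.amount, reverse=True)


def reads_resembling(rows: List[UndeterminedRow], barcode: str) -> int:
    """Total undetermined reads whose I7 or I5 side resembles `barcode`."""
    return sum(row.amount for row in rows
               if barcode in row.i7_matches or barcode in row.i5_matches)


if __name__ == "__main__":
    # Excerpt from the example above; on a real run, read the file from the
    # flowcell directory instead, e.g. open("undetermined_bin.csv").read().
    excerpt = """# indexes, amount, i7 matches, i5 matches
CTGTAGCCAT, 20724931, A11,
GGGGGGGGGG, 650639, ,
"""
    rows = parse_undetermined(excerpt)
    for row in rows[:10]:
        print(row.i7, row.i5, row.amount, row.i7_matches, row.i5_matches)
    print("reads resembling A11:", reads_resembling(rows, "A11"))
```

If a barcode you expected to see has few or no reads while a neighbouring barcode (A11 instead of A10 here) has roughly the expected amount, that points to a wrongly entered barcode.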
The primer remnant differs from the usual remnant.* This usually happens when your sample only has a barcode on the I7 side, but the flowcell was run using both I7 and I5. Check whether the barcode you used appears in undetermined_bin.csv, and look at the I5 sequence paired with it. To solve this, copy SampleSheet.csv to a safe location (e.g. your home directory) and change the I5 sequence to the sequence found in undetermined_bin.csv. Rerun the demultiplexing with the --samplesheet flag, supplying the path to your new samplesheet.
Example: Sample 100 was entered with barcode A10 and placed on a flowcell where both the I7 and I5 barcodes are read. After demultiplexing, sample 100 had 0.0M reads, but the A10 barcode is found in undetermined_bin.csv. If you compare the second sequence (I5) with the samplesheet, you will see that they differ:
SampleSheet.csv
Sample_ID,SampleName,index,index2
100_0,100_SAMPLE_A10,ACAAGCTA,AGATCTCG
etc...
undetermined_bin.csv
# indexes, amount, i7 matches, i5 matches
ACAAGCTA+CGAGATCT, 20724931, A10, ←check the sequence after the +
GGGGGGGG+GGGGGGGG, 650639, ,
etc...
Make a new samplesheet with the found primer remnant as index2, and rerun with the --samplesheet flag:
Sample_ID,SampleName,index,index2
100_0,100_SAMPLE_A10,ACAAGCTA,CGAGATCT
etc...
*I have no idea why this happens; if you find out, please let me know. I suspect it has to do with sample prep in the lab.
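In the example above, the I5 sequence found (CGAGATCT) is exactly the reverse complement of the index2 in the samplesheet (AGATCTCG), so that is a quick thing to check before editing. Below is a minimal Python sketch of the check and of the samplesheet edit, assuming plain comma-separated fields without quoting and the column names `Sample_ID` and `index2` shown above; `reverse_complement` and `set_index2` are hypothetical helpers, not part of demultiplex.py.

```python
from typing import Dict

# Complement table for DNA bases; N (unknown base) stays N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (input may be lower case)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def set_index2(samplesheet: str, new_index2: Dict[str, str]) -> str:
    """Return a copy of the samplesheet text with index2 replaced per Sample_ID.

    Lines up to and including the 'Sample_ID,...' header row (e.g. a
    [Header] section) are copied unchanged. Assumes plain comma-separated
    fields without quoting.
    """
    out = []
    index2_col = None  # position of the index2 column, once the header is seen
    for line in samplesheet.splitlines():
        fields = line.split(",")
        if index2_col is None:
            if fields[0] == "Sample_ID":
                index2_col = fields.index("index2")  # ValueError if absent
            out.append(line)
            continue
        if fields[0] in new_index2 and index2_col < len(fields):
            fields[index2_col] = new_index2[fields[0]]
            line = ",".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sheet = ("Sample_ID,SampleName,index,index2\n"
             "100_0,100_SAMPLE_A10,ACAAGCTA,AGATCTCG\n")
    found_i5 = "CGAGATCT"  # the sequence after the '+' in undetermined_bin.csv
    print(reverse_complement("AGATCTCG") == found_i5)  # True
    print(set_index2(sheet, {"100_0": found_i5}), end="")
```

On a real run, read the flowcell's SampleSheet.csv, write the result to a new file in a safe location such as your home directory (never over the original), and pass that path to --samplesheet.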
A rare case is flipped barcodes: for some reason, the I7 barcode ends up in the I5 position and the I5 barcode in the I7 position, both reverse complemented. Check for this by reverse complementing the found sequences and comparing them to the barcodes used. If this is the case, copy SampleSheet.csv to a safe location, change the sequences, and rerun with the --samplesheet flag, supplying the path to your new samplesheet.
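The flipped-barcode check can be done with a few lines of code. Below is a minimal Python sketch, assuming the found pair is taken from the first column of undetermined_bin.csv in the `I7+I5` form shown earlier; `flipped_fix` and `reverse_complement` are hypothetical helpers, not part of demultiplex.py, and the flipped entry in the example is constructed by hand from sample 100's barcodes for illustration.

```python
from typing import Optional, Tuple

# Complement table for DNA bases; N (unknown base) stays N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (input may be lower case)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def flipped_fix(found: str, expected_i7: str,
                expected_i5: str) -> Optional[Tuple[str, str]]:
    """Check an undetermined 'I7+I5' entry for flipped barcodes.

    Returns the (index, index2) pair to put in the new samplesheet if the
    found I7 is the reverse complement of the expected I5 and the found I5 is
    the reverse complement of the expected I7; otherwise returns None.
    """
    found_i7, _, found_i5 = found.upper().partition("+")
    if (reverse_complement(found_i7) == expected_i5.upper()
            and reverse_complement(found_i5) == expected_i7.upper()):
        # The new samplesheet uses the sequences as they were actually read.
        return found_i7, found_i5
    return None


if __name__ == "__main__":
    # Hand-made flipped entry for sample 100 (index ACAAGCTA, index2 AGATCTCG).
    print(flipped_fix("CGAGATCT+TAGCTTGT", "ACAAGCTA", "AGATCTCG"))
    # The unflipped pair is not reported as flipped.
    print(flipped_fix("ACAAGCTA+AGATCTCG", "ACAAGCTA", "AGATCTCG"))
```

The returned pair goes into the index and index2 columns of the copied samplesheet.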
For assistance with demultiplexing (e.g. during vacations), send a message on Slack stating which flowcell is affected and what went wrong. Please copy or type out any information instead of sending screenshots. If 'pause notifications' is on, make sure to push the message through. In urgent cases, ask the secretariat for my personal number, and also send a message on Slack with the details.