A modular tool to aggregate results from bioinformatics analyses across many samples into a single report.
/bank/experiments/2025-03-lore/qc/general_stats.tsv
/bank/experiments/2025-03-lore/qc/reads_per_step.tsv
/bank/experiments/2025-03-lore/qc/skera.tsv
/bank/experiments/2025-03-lore/qc/lima.tsv
/bank/experiments/2025-03-lore/qc/isoseq_correct_barcodes.tsv
/bank/experiments/2025-03-lore/qc/isoseq_correct.tsv
/bank/experiments/2025-03-lore/qc/isoseq_bcstats.tsv
/bank/experiments/2025-03-lore/qc/isoseq_collapse.tsv
/bank/experiments/2025-03-lore/qc/pigeon_classify_classifications_by_read.tsv
/bank/experiments/2025-03-lore/qc/pigeon_classify_classifications.tsv
/bank/experiments/2025-03-lore/qc/pigeon_classify_rt_switching.tsv
/bank/experiments/2025-03-lore/qc/pigeon_classify_genes.tsv
/bank/experiments/2025-03-lore/qc/pigeon_classify_filter_reasons.tsv
/bank/experiments/2025-03-lore/qc/pigeon_classify_classifications_by_cell.tsv
/bank/experiments/2025-03-lore/qc/pigeon_classify_classifications_by_transcript.tsv
/bank/experiments/2025-03-lore/qc/pigeon_classify_classifications_by_isoform.tsv
/bank/experiments/2025-03-lore/qc/pigeon_classify_classifications_by_mapping.tsv
/bank/experiments/2025-03-lore/qc/pigeon_classify_junctions.tsv
/bank/experiments/2025-03-lore/qc/pigeon_report.tsv
/bank/experiments/2025-03-lore/qc/segmented.knee_mqc.png
/bank/experiments/2025-03-lore/pbmm2/segmented.stats.tsv
/bank/experiments/2025-03-lore/pbmm2/segmented.coverage.tsv
/bank/experiments/2025-03-lore/pbmm2/segmented.bam.mtnucratiomtnuc.json
/bank/experiments/2025-03-lore/qc
General Statistics
| Sample Name | Reads in cells | % Reads in cells | MT genome coverage | Genome coverage | MT to Nuclear Ratio | M Genome reads | M MT genome reads | Error rate | Non-primary | Reads mapped | % Mapped | % Proper pairs | % MapQ 0 reads | Total seqs | Mean insert | Reads | Bases | Coverage | Mean depth | Mean BQ | Mean MQ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| segmented | 14.6M | 16.0% | 11896.8X | 2.9X | 4168.5 | 9.2M | 0.2M | 0.37% | 0.0M | 9.3M | 99.8% | 0.0% | 8.8% | 9.3M | 0.0bp | 9.4M | 461.2Mb | 14.9% | 2.9x | 38.8 | 55.3 |
Reads per processing step
The number of reads remaining after each processing step. The final reads are deduplicated and aligned to the genome/transcriptome.
Legend:
ccs_reads: Circular consensus sequencing (HiFi) reads (Skera)
s_reads: Segmented reads (Skera)
primed_reads: Primed reads (Lima)
fl_reads: Full Length reads (Iso-seq correct)
flnc_reads: Full Length Non-Chimeric reads (Iso-seq correct)
polya_reads: PolyA-tailed reads (Iso-seq correct)
non-missing: Barcoded reads (Iso-seq correct)
yield_reads: Estimated reads in cells (Iso-seq correct).
Skera
Deconcatenates Kinnex HiFi reads to produce S-reads that represent the original cDNA molecules.
| sample | reads | s_reads | mean_len_s_reads | percent_full_array | mean_array_size |
|---|---|---|---|---|---|
| segmented | 6104086.0 | 91323074.0 | 981.0 | 87.5 | 15.0 |
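As a rough sanity check on the Skera table, the number of S-reads divided by the number of input HiFi reads should land close to the reported mean array size. The sketch below uses the values from the table; the variable names are illustrative, and the exact way Skera computes mean_array_size is not stated in this report, so only approximate agreement is expected.

```python
# Values copied from the Skera table in this report.
hifi_reads = 6_104_086   # "reads" column (input HiFi reads)
s_reads = 91_323_074     # "s_reads" column (segmented reads)

# Average number of segments recovered per HiFi read.
segments_per_read = s_reads / hifi_reads
print(f"S-reads per HiFi read: {segments_per_read:.2f}")
```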
Lima
Removes primers and spurious false positives.
| sample | Reads input | Reads above all thresholds (A) | Reads below any threshold (B) | (B) Below min length | (B) Below min score | (B) Below min end score | (B) Below min passes | (B) Below min score lead | (B) Below min ref span | (B) Without SMRTbell adapter | (B) Wrong different pair | (B) Undesired 5p--5p pairs | (B) Undesired 3p--3p pairs | (A) With same pair | (A) With different pair |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| segmented | 91323074 | 90692203 | 630871 | 12350 | 0 | 228466 | 6 | 0 | 490179 | 0 | 0 | 192404 | 231319 | 0 | 90692203 |
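The Lima counts can be cross-checked with simple arithmetic: reads above all thresholds (A) plus reads below any threshold (B) should equal the input. The sketch below uses the values from the table, with illustrative variable names. The individual (B) reasons add up to more than B, which suggests that a single read can be counted under more than one reason.

```python
# Values copied from the Lima table in this report.
reads_input = 91_323_074
above_all_thresholds = 90_692_203   # (A)
below_any_threshold = 630_871       # (B)

# Per-reason counts for (B), in table order; zero-valued columns omitted.
b_reasons = {
    "Below min length": 12_350,
    "Below min end score": 228_466,
    "Below min passes": 6,
    "Below min ref span": 490_179,
    "Undesired 5p--5p pairs": 192_404,
    "Undesired 3p--3p pairs": 231_319,
}

accounted_for = above_all_thresholds + below_any_threshold
pass_rate = above_all_thresholds / reads_input
reasons_total = sum(b_reasons.values())

print(f"A + B equals input: {accounted_for == reads_input}")
print(f"Pass rate: {pass_rate:.2%}")
print(f"Sum of (B) reasons exceeds B: {reasons_total > below_any_threshold}")
```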
Iso-seq correct
Identifies cell barcode errors and corrects them. Additionally, Iso-seq correct estimates which reads are likely to originate from real cells rather than ambient RNA.
Stats
| sample | processed_reads | filtered_reads | processed_bases | edit_read_count | unchanged_read_count | missing_read_count | found_but_failing_read_count | failing_read_count | yield_count | yield_fraction |
|---|---|---|---|---|---|---|---|---|---|---|
| segmented | 89851332.0 | 0.0 | 76851365691.0 | 89358088.0 | 3050.0 | 490194.0 | 74756392.0 | 75246586.0 | 14604746.0 | 0.2 |
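The columns of the Stats table are related by a few identities that hold for the values shown, and checking them is a quick way to confirm how the columns fit together. The sketch below uses the table values with illustrative variable names; the relationships are inferred from the numbers in this report, not taken from the Iso-seq documentation.

```python
# Values copied from the Iso-seq correct Stats table in this report.
processed_reads = 89_851_332
edit_read_count = 89_358_088
unchanged_read_count = 3_050
missing_read_count = 490_194
found_but_failing_read_count = 74_756_392
failing_read_count = 75_246_586
yield_count = 14_604_746

# Every processed read is either edited, unchanged, or has a missing barcode.
partition_ok = (
    edit_read_count + unchanged_read_count + missing_read_count
    == processed_reads
)
# Failing reads are those with a missing barcode plus those found but failing.
failing_ok = (
    missing_read_count + found_but_failing_read_count == failing_read_count
)
# The yield is what remains after removing failing reads.
yield_ok = processed_reads - failing_read_count == yield_count

# The table rounds this to one decimal place (0.2).
yield_fraction = yield_count / processed_reads
print(partition_ok, failing_ok, yield_ok, f"{yield_fraction:.4f}")
```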
Barcode corrections
Barcode stats
| sample | number_unique_groupbarcodes | number_unique_molbarcodes | total_number_reads | mean_groupbarcode_depth | cutoff_threshold | number_of_cells | median_umis_per_cell | reads_in_cells | fraction_reads_in_cells | mean_reads_per_cell |
|---|---|---|---|---|---|---|---|---|---|---|
| segmented | 1303203.0 | 67371906.0 | 89851332.0 | 68.0 | 1318.0 | 13791.0 | 2505.0 | 57425785.0 | 0.6 | 4164.0 |
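Two of the per-cell columns can be recomputed from the others, which also recovers the precision lost by the one-decimal rounding in the table. This is a minimal sketch using the table values; the variable names are illustrative and the formulas are inferred from the numbers shown.

```python
# Values copied from the Barcode stats table in this report.
total_number_reads = 89_851_332
number_of_cells = 13_791
reads_in_cells = 57_425_785

mean_reads_per_cell = reads_in_cells / number_of_cells
fraction_reads_in_cells = reads_in_cells / total_number_reads

print(f"Mean reads per cell: {mean_reads_per_cell:.1f}")
print(f"Fraction of reads in cells: {fraction_reads_in_cells:.3f}")
```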
Knee plots
Samtools
1.21
Toolkit for interacting with BAM/CRAM files.
URL: http://www.htslib.org
DOI: 10.1093/bioinformatics/btp352
Percent mapped
Alignment metrics from samtools stats; mapped vs. unmapped reads vs. reads mapped with MQ0.
For a set of samples that have come from the same multiplexed library, similar numbers of reads for each sample are expected. Large differences in numbers might indicate issues during the library preparation process. Whilst large differences in read numbers may be controlled for in downstream processing (e.g. read count normalisation), you may wish to consider whether the read depths achieved have fallen below the levels recommended for your application.
Low alignment rates could indicate contamination of samples (e.g. adapter sequences), low sequencing quality or other artefacts. These can be further investigated in the sequence level QC (e.g. from FastQC).
Reads mapped with MQ0 often indicate that the reads are ambiguously mapped to multiple locations in the reference sequence. This can be due to repetitive regions in the genome, the presence of alternative contigs in the reference, or due to reads that are too short to be uniquely mapped. These reads are often filtered out in downstream analyses.
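The mapped, unmapped and MQ0 counts are reported in the summary-number (SN) lines of samtools stats output. The sketch below shows one way to read those lines and derive percentages. The helper name parse_sn and the example input are hypothetical (the numbers are not from this run), and using raw total sequences as the denominator is an assumption rather than a description of how this report computes its percentages.

```python
def parse_sn(text: str) -> dict[str, float]:
    """Collect tab-separated 'SN' summary lines into a name -> value dict."""
    values = {}
    for line in text.splitlines():
        if not line.startswith("SN\t"):
            continue  # skip comments and other sections
        fields = line.split("\t")
        key = fields[1].rstrip(":")
        try:
            values[key] = float(fields[2])
        except (IndexError, ValueError):
            continue  # ignore lines without a numeric value
    return values


# Illustrative input only; these numbers are NOT from the report.
example = "\n".join([
    "# This file was produced by samtools stats",
    "SN\traw total sequences:\t1000",
    "SN\treads mapped:\t990",
    "SN\treads unmapped:\t10",
    "SN\treads MQ0:\t80\t# mapped and MQ=0",
])

sn = parse_sn(example)
total = sn["raw total sequences"]
pct_mapped = 100 * sn["reads mapped"] / total
pct_mq0 = 100 * sn["reads MQ0"] / total
print(f"% mapped: {pct_mapped:.1f}, % MQ0: {pct_mq0:.1f}")
```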
Alignment stats
This module parses the output from samtools stats. All numbers in millions.
Coverage: global stats
Stats parsed from samtools coverage output, and summarized (added up or weighted-averaged) across all regions.
| Sample Name | Reads | Bases | Coverage | Mean depth | Mean BQ | Mean MQ |
|---|---|---|---|---|---|---|
| segmented | 9.4M | 461.2Mb | 14.9% | 2.9x | 38.8 | 55.3 |
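To illustrate what "added up or weighted-averaged" can mean for per-region samtools coverage rows, the sketch below sums the count columns and averages depth weighted by region length. The Region class, the summarise function and the two example regions are hypothetical, and weighting by region length is an assumption about the summary, not a statement of how this report computes it.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    length: int        # endpos - startpos + 1
    numreads: int
    covbases: int
    meandepth: float


def summarise(regions: list[Region]) -> dict[str, float]:
    """Sum count columns and length-weight the averaged columns."""
    total_len = sum(r.length for r in regions)
    covbases = sum(r.covbases for r in regions)
    return {
        "reads": sum(r.numreads for r in regions),
        "covbases": covbases,
        "coverage_pct": 100 * covbases / total_len,
        "mean_depth": sum(r.meandepth * r.length for r in regions) / total_len,
    }


# Made-up regions for illustration; not taken from the report.
regions = [
    Region("chrA", length=1000, numreads=50, covbases=400, meandepth=2.0),
    Region("chrB", length=3000, numreads=30, covbases=600, meandepth=1.0),
]
summary = summarise(regions)
print(summary)
```

Weighting by length keeps a short, deeply covered region (such as a mitochondrial contig) from dominating the genome-wide mean depth.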
Coverage: stats per region
Per-region stats parsed from samtools coverage output.
Iso-seq collapse
Collapses redundant transcripts into unique isoforms based on exonic structures.
| sample | number_mapped_unique_isoforms | number_of_mapped_unique_loci |
|---|---|---|
| segmented | 980929 | 84131 |
Pigeon Classify
Pigeon is used to classify isoforms into categories, to filter this output, and to report on gene- and isoform-level saturation.
Classifications overview
| Sample | Input | Passed | Removed | Unique genes | Unique transcripts |
|---|---|---|---|---|---|
| segmented | 980929 | None | None | 478784 | 833527 |
| segmented (filtered) | 980929 | 197700 | 783229 | 21757 | 133186 |
Classifications by read
| Sample | Full splice match | Incomplete splice match | Novel in catalog | Novel not in catalog | Antisense | Genic intron | Genic genomic | Intergenic | Other |
|---|---|---|---|---|---|---|---|---|---|
| segmented | 4043350 | 1824365 | 121922 | 267929 | 330231 | 1505 | 464837 | 1797957 | 19680 |
| segmented (filtered) | 3795951 | 571401 | 91983 | 103494 | 1409 | 0 | 2165 | 2927 | 6392 |
Classifications by isoform
| Sample | Full splice match | Incomplete splice match | Novel in catalog | Novel not in catalog | Antisense | Genic intron | Genic genomic | Intergenic | Other |
|---|---|---|---|---|---|---|---|---|---|
| segmented | 64158 | 154179 | 41888 | 125335 | 35853 | 423 | 116562 | 438211 | 4320 |
| segmented (filtered) | 50512 | 67228 | 30711 | 43780 | 912 | 0 | 887 | 1927 | 1743 |
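The isoform categories partition the isoforms: for the values in this report, the nine columns sum to the Input count (unfiltered) and to the Passed count (filtered) from the Classifications overview table. The sketch below checks this and shows how the share of full splice matches shifts after filtering; the variable names are illustrative.

```python
# Category order follows the "Classifications by isoform" table.
categories = [
    "Full splice match", "Incomplete splice match", "Novel in catalog",
    "Novel not in catalog", "Antisense", "Genic intron", "Genic genomic",
    "Intergenic", "Other",
]
unfiltered = [64158, 154179, 41888, 125335, 35853, 423, 116562, 438211, 4320]
filtered = [50512, 67228, 30711, 43780, 912, 0, 887, 1927, 1743]

total_unfiltered = sum(unfiltered)   # compare with "Input" in the overview
total_filtered = sum(filtered)       # compare with "Passed" in the overview

# Share of full splice match isoforms before and after filtering.
fsm_share_unfiltered = unfiltered[0] / total_unfiltered
fsm_share_filtered = filtered[0] / total_filtered

print(total_unfiltered, total_filtered)
print(f"FSM share: {fsm_share_unfiltered:.1%} -> {fsm_share_filtered:.1%}")
```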
Classifications by cell
| Sample | median_genes_per_cell | median_transcripts_per_cell | median_genes_per_cell_known | median_transcripts_per_cell_known |
|---|---|---|---|---|
| segmented | 4.0 | 4.0 | 3.0 | 2.0 |
| segmented (filtered) | 4.0 | 4.0 | 4.0 | 4.0 |
Classifications by transcript
| Sample | transcripts_fsm | transcripts_ism | transcripts_nic | transcripts_nnc |
|---|---|---|---|---|
| segmented | 64158.0 | 154179.0 | 41888.0 | 125335.0 |
| segmented (filtered) | 50512.0 | 67228.0 | 30711.0 | 43780.0 |
Classifications by mapping
| Sample | flnc_mapped_genome | flnc_mapped_transcriptome | flnc_mapped_transcriptome_excluding_ribomito |
|---|---|---|---|
| segmented | 8871776.0 | 6257566.0 | 4586174.0 |
| segmented (filtered) | 4575722.0 | 4562829.0 | 2965486.0 |
Junctions
| Sample | Known canonical | Known non-canonical | Novel canonical | Novel non-canonical |
|---|---|---|---|---|
| segmented | 1330326 | 1460 | 122771 | 104366 |
| segmented (filtered) | 830116 | 45 | 72129 | 0 |
Genes
| Sample | Known | Novel |
|---|---|---|
| segmented | 27198 | 451586 |
| segmented (filtered) | 19371 | 2386 |
RT Switching
| Sample | All transcripts | Unique transcripts | All junctions | Unique junctions |
|---|---|---|---|---|
| segmented | 24780 | 1283 | 25606 | 25606 |
| segmented (filtered) | None | None | None | None |
Filter reasons
| Sample | Intrapriming | Monoexonic | RT switching | Low coverage/non-canonical |
|---|---|---|---|---|
| segmented | None | None | None | None |
| segmented (filtered) | 431098 | 0 | 13045 | 339086 |
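The filter reasons account for every removed isoform in this report: the four reason counts sum to the Removed value in the Classifications overview table, and Passed plus Removed equals Input. A minimal check with the table values (variable names are illustrative):

```python
# Values copied from the "Filter reasons" and "Classifications overview" tables.
filter_reasons = {
    "Intrapriming": 431_098,
    "Monoexonic": 0,
    "RT switching": 13_045,
    "Low coverage/non-canonical": 339_086,
}
isoforms_input = 980_929
isoforms_passed = 197_700
isoforms_removed = 783_229

reasons_total = sum(filter_reasons.values())
# Largest single cause of removal.
top_reason = max(filter_reasons, key=filter_reasons.get)

print(f"Reasons sum to Removed: {reasons_total == isoforms_removed}")
print(f"Passed + Removed == Input: "
      f"{isoforms_passed + isoforms_removed == isoforms_input}")
print(f"Most common reason: {top_reason}")
```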
Saturation reports
Gene- and isoform-level saturation. The tables show the number of unique genes found in a subsampled number of reads.
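The idea behind a saturation curve can be sketched as follows: shuffle the reads, take progressively larger subsamples, and count the distinct genes seen in each. This is a toy illustration with made-up gene labels, not pigeon's implementation; the function name saturation_curve and its parameters are hypothetical.

```python
import random


def saturation_curve(read_genes, steps, seed=0):
    """Count unique genes in nested random subsamples of the reads.

    read_genes: one gene label per read.
    steps: subsample sizes (numbers of reads) to evaluate.
    """
    rng = random.Random(seed)      # fixed seed for reproducibility
    shuffled = list(read_genes)
    rng.shuffle(shuffled)
    # Prefixes of one shuffle are nested, so counts never decrease.
    return [(n, len(set(shuffled[:n]))) for n in steps]


# Made-up data: four genes with very different expression levels.
reads = ["geneA"] * 50 + ["geneB"] * 30 + ["geneC"] * 15 + ["geneD"] * 5
curve = saturation_curve(reads, steps=[10, 25, 50, 100])
for n_reads, n_genes in curve:
    print(n_reads, n_genes)
```

A curve that flattens well before the full read count suggests that additional sequencing would reveal few new genes.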
Software Versions
Software Versions lists versions of software tools extracted from file contents.
| Software | Version |
|---|---|
| Samtools | 1.21 |
| mtnucratio | 0.7.1 |