Note that additional data was saved in multiqc_GRCh38_data when this report was generated.

About MultiQC

This report was generated using MultiQC, version 1.14

You can see a YouTube video describing how to use MultiQC reports here: https://youtu.be/qPbIlO_KWN0

For more information about MultiQC, including other videos and extensive documentation, please visit http://multiqc.info

You can report bugs, suggest improvements and find the source code for MultiQC on GitHub: https://github.com/ewels/MultiQC

MultiQC is published in Bioinformatics:

MultiQC: Summarize analysis results for multiple tools and samples in a single report
Philip Ewels, Måns Magnusson, Sverker Lundin and Max Käller
Bioinformatics (2016)
doi: 10.1093/bioinformatics/btw354
PMID: 27312411

These samples were run by seq2science v0.9.8, a tool for easy preprocessing of NGS data.

Take a look at our docs for info about how to use this report to the fullest.

Workflow: chip-seq
Date: August 22, 2023
Project: Norm
Contact E-mail: yourmail@here.com

Report generated on 2023-08-22, 13:35 CEST based on data in:

/scratch/jngalang/Norm/results/qc/samtools_stats/bwa-mem2/GRCh38-43655_EZLysis_H3K18la.samtools-coordinate.samtools_stats.txt
/scratch/jngalang/Norm/results/macs2/GRCh38-43655_EZLysis_H3K18la_peaks.xls
/scratch/jngalang/Norm/results/qc/plotPCA/GRCh38.tsv
/scratch/jngalang/Norm/results/macs2/GRCh38-43654_EZLysis_H3K4me3_peaks.xls
/scratch/jngalang/Norm/results/qc/samtools_stats/final_bam/GRCh38-43655_EZLysis_H3K18la.samtools-coordinate.samtools_stats.txt
/scratch/jngalang/Norm/results/qc/plotCorrelation/GRCh38-deepTools_pearson_correlation_clustering_mqc.png
/scratch/jngalang/Norm/results/log/workflow_explanation_mqc.html
/scratch/jngalang/Norm/results/qc/plotFingerprint/GRCh38.tsv
/scratch/jngalang/Norm/results/qc/assembly_GRCh38_stats_mqc.html
/scratch/jngalang/Norm/results/qc/plotCorrelation/GRCh38-deepTools_spearman_correlation_clustering_mqc.png
/scratch/jngalang/Norm/results/qc/samtools_stats/bwa-mem2/GRCh38-43654_EZLysis_H3K4me3.samtools-coordinate.samtools_stats.txt
/scratch/jngalang/Norm/results/qc/InsertSizeMetrics/GRCh38-43655_EZLysis_H3K18la.tsv
/scratch/jngalang/Norm/results/bwa-mem2/GRCh38-43655_EZLysis_H3K18la.samtools-coordinate-unsieved.bam.mtnucratiomtnuc.json
/scratch/jngalang/Norm/results/qc/markdup/GRCh38-43654_EZLysis_H3K4me3.samtools-coordinate.metrics.txt
/scratch/jngalang/Norm/results/qc/macs2/GRCh38-43655_EZLysis_H3K18la_featureCounts.txt.summary
/scratch/jngalang/Norm/results/qc/samtools_stats/final_bam/GRCh38-43654_EZLysis_H3K4me3.samtools-coordinate.samtools_stats.txt
/scratch/jngalang/Norm/results/qc/trimming/43654_EZLysis_H3K4me3.fastp.json
/scratch/jngalang/Norm/results/qc/macs2/GRCh38-43654_EZLysis_H3K4me3_featureCounts.txt.summary
/scratch/jngalang/Norm/results/bwa-mem2/GRCh38-43654_EZLysis_H3K4me3.samtools-coordinate-unsieved.bam.mtnucratiomtnuc.json
/scratch/jngalang/Norm/results/qc/plotProfile_gene/GRCh38-macs2.tsv
/scratch/jngalang/Norm/results/qc/markdup/GRCh38-43655_EZLysis_H3K18la.samtools-coordinate.metrics.txt
/scratch/jngalang/Norm/results/qc/trimming/43655_EZLysis_H3K18la.fastp.json
/scratch/jngalang/Norm/results/qc/samplesconfig_mqc.html
/scratch/jngalang/Norm/results/qc/InsertSizeMetrics/GRCh38-43654_EZLysis_H3K4me3.tsv

General Statistics

Sample Name	Fragment Length	% Duplication	% > Q30	Mb Q30 bases	M Reads After Filtering	GC content	% PF	% Adapter	Insert Size	Mean Insert Size	% Dups	Error rate (pre-sieve)	M Non-Primary (pre-sieve)	M Reads Mapped (pre-sieve)	% Mapped (pre-sieve)	% Proper Pairs (pre-sieve)	% MapQ 0 Reads (pre-sieve)	M Total seqs (pre-sieve)	Error rate (post-sieve)	M Non-Primary (post-sieve)	M Reads Mapped (post-sieve)	% Mapped (post-sieve)	% Proper Pairs (post-sieve)	% MapQ 0 Reads (post-sieve)	M Total seqs (post-sieve)	% Assigned	M Assigned	MT genome coverage	Genome coverage	MT to Nuclear Ratio	M Genome reads	M MT genome reads	Number of Peaks	Treatment Redundancy
43654_EZLysis_H3K4me3	186	6.5%	93.0%	470.3	8.7	51.0%	98.0%	9.0%	170 bp	195 bp	8.1%	0.28%	0.0	6.5	74.6%	71.2%	5.1%	8.7	0.23%	0.0	5.5	100.0%	96.9%	0.0%	5.5	21.1%	0.6	43.6 X	0.1 X	354.5	6.5	0.0	15663	0.01
43655_EZLysis_H3K18la	177	7.5%	92.7%	484.1	9.1	48.2%	98.8%	14.2%	170 bp	187 bp	10.6%	0.26%	0.0	7.1	77.5%	71.9%	6.5%	9.1	0.22%	0.0	5.8	100.0%	96.1%	0.0%	5.8	0.0%	0.0	42.3 X	0.1 X	317.1	7.1	0.0	9	0.01

Group	Column	Description	ID	Scale
MACS2	Fragment Length	Fragment Length	`d`	None
fastp	% Duplication	Duplication rate before filtering	`pct_duplication`	None
fastp	% > Q30	Percentage of reads > Q30 after filtering	`after_filtering_q30_rate`	None
fastp	Mb Q30 bases	Bases > Q30 after filtering (millions)	`after_filtering_q30_bases`	base_count
fastp	M Reads After Filtering	Total reads after filtering (millions)	`filtering_result_passed_filter_reads`	read_count
fastp	GC content	GC content after filtering	`after_filtering_gc_content`	None
fastp	% PF	Percent reads passing filter	`pct_surviving`	None
fastp	% Adapter	Percentage adapter-trimmed reads	`pct_adapter`	None
Picard	Insert Size	Median Insert Size, all read orientations (bp)	`summed_median`	None
Picard	Mean Insert Size	Mean Insert Size, all read orientations (bp)	`summed_mean`	None
Picard	% Dups	Mark Duplicates - Percent Duplication	`PERCENT_DUPLICATION`	None
SamTools pre-sieve	Error rate	Error rate: mismatches (NM) / bases mapped (CIGAR)	`error_rate`	None
SamTools pre-sieve	M Non-Primary	Non-primary alignments (millions)	`non-primary_alignments`	read_count
SamTools pre-sieve	M Reads Mapped	Reads Mapped in the bam file (millions)	`reads_mapped`	read_count
SamTools pre-sieve	% Mapped	% Mapped Reads	`reads_mapped_percent`	None
SamTools pre-sieve	% Proper Pairs	% Properly Paired Reads	`reads_properly_paired_percent`	None
SamTools pre-sieve	% MapQ 0 Reads	% of Reads that are Ambiguously Placed (MapQ=0)	`reads_MQ0_percent`	None
SamTools pre-sieve	M Total seqs	Total sequences in the bam file (millions)	`raw_total_sequences`	read_count
SamTools post-sieve	Error rate	Error rate: mismatches (NM) / bases mapped (CIGAR)	`error_rate`	None
SamTools post-sieve	M Non-Primary	Non-primary alignments (millions)	`non-primary_alignments`	read_count
SamTools post-sieve	M Reads Mapped	Reads Mapped in the bam file (millions)	`reads_mapped`	read_count
SamTools post-sieve	% Mapped	% Mapped Reads	`reads_mapped_percent`	None
SamTools post-sieve	% Proper Pairs	% Properly Paired Reads	`reads_properly_paired_percent`	None
SamTools post-sieve	% MapQ 0 Reads	% of Reads that are Ambiguously Placed (MapQ=0)	`reads_MQ0_percent`	None
SamTools post-sieve	M Total seqs	Total sequences in the bam file (millions)	`raw_total_sequences`	read_count
macs2_frips	% Assigned	% Assigned reads	`percent_assigned`	None
macs2_frips	M Assigned	Assigned reads (millions)	`Assigned`	read_count
mtnucratio	MT genome coverage	Average coverage (X) on mitochondrial genome.	`mt_cov_avg`	None
mtnucratio	Genome coverage	Average coverage (X) on nuclear genome.	`nuc_cov_avg`	None
mtnucratio	MT to Nuclear Ratio	Mitochondrial to nuclear reads ratio (MTNUC)	`mt_nuc_ratio`	None
mtnucratio	M Genome reads	Reads on the nuclear genome (millions)	`nucreads`	read_count
mtnucratio	M MT genome reads	Reads on the mitochondrial genome (millions)	`mtreads`	read_count
MACS2	Number of Peaks	Total number of peaks	`peak_count`	None
MACS2	Treatment Redundancy	Redundant rate in treatment	`treatment_redundant_rate`	None

Workflow explanation

Preprocessing of reads was done automatically by seq2science v0.9.8 using the chip-seq workflow. Paired-end reads were trimmed with fastp v0.20.1 with default options. Genome assembly GRCh38 was downloaded with genomepy 0.13.0. The effective genome size was estimated per sample with khmer v2.0 as the number of unique k-mers in the assembly, with k set to the sample's average read length. Reads were aligned with bwa-mem2 v2.2.1 with options '-M'. Afterwards, duplicate reads were marked with Picard MarkDuplicates v2.23.8. General alignment statistics were collected with samtools stats v1.14. Peaks were called with macs2 v2.2.7 with options '--buffer-size 10000' in BAMPE mode. The fraction of reads in peaks (FRiP) score was calculated with featureCounts v1.6.4. A consensus set of summits was made with gimmemotifs.combine_peaks v0.17.2. deepTools v3.5.1 was used for the fingerprint, profile, correlation and dendrogram/heatmap plots, where the heatmap was made with options '--distanceBetweenBins 9000 --binSize 1000'. A peak feature distribution plot and a peak localization plot relative to the TSS were made with ChIPseeker. The UCSC genome browser was used to visualize and inspect alignments. All summits were extended by 100 bp on each side to get a consensus peak set. Finally, a count table from the consensus peak set was made with gimmemotifs.coverage_table. Quality control metrics were aggregated by MultiQC v1.14.
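
The effective genome size estimate described above amounts to counting distinct k-mers. The following is a toy Python sketch of that idea only: it counts exactly with a set, skips k-mers containing N, and ignores reverse complements, all of which are simplifying assumptions. The function name `count_distinct_kmers` and the example sequence are made up, and khmer's own estimator may differ in these details.

```python
def count_distinct_kmers(sequence, k):
    """Count distinct k-mers in a sequence, skipping any k-mer that contains N."""
    seq = sequence.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" not in kmer:  # assumption: ambiguous bases do not count
            kmers.add(kmer)
    return len(kmers)


# Toy sequence, not real genome data.
toy_sequence = "ACGTACGTNNACGA"
distinct_4mers = count_distinct_kmers(toy_sequence, 4)
print(distinct_4mers)
```

Exact counting with a set does not scale to a full genome, which is why an approximate counter would normally be used for this step.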

Assembly stats

Genome assembly GRCh38 consists of 194 contigs, has a GC content of 40.87%, and 4.88% of its sequence is the letter N. The N50-L50 stats are 145138636-9 and the N75-L75 stats are 114364328-14. The genome annotation contains 61267 genes.
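
For reference, N50 is commonly defined as the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly, and L50 as the number of contigs in that set (N75-L75 use 75% instead). A minimal Python sketch of that common definition follows. The function name `nx_lx` and the contig lengths are invented for illustration, and this is not necessarily how seq2science computes the values reported above.

```python
def nx_lx(contig_lengths, fraction=0.5):
    """Return (Nx, Lx) for a list of contig lengths.

    fraction=0.5 gives N50/L50, fraction=0.75 gives N75/L75.
    """
    lengths = sorted(contig_lengths, reverse=True)
    threshold = fraction * sum(lengths)
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("contig_lengths must not be empty")


# Made-up contig lengths (total 300), not the GRCh38 contigs.
toy_contigs = [100, 80, 60, 40, 20]
n50, l50 = nx_lx(toy_contigs)
n75, l75 = nx_lx(toy_contigs, fraction=0.75)
print(f"N50-L50: {n50}-{l50}, N75-L75: {n75}-{l75}")
```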

fastp

fastp is an ultra-fast all-in-one FASTQ preprocessor (QC, adapters, trimming, filtering, splitting...). DOI: 10.1093/bioinformatics/bty560.

Filtered Reads

Filtering statistics of sampled reads.

Duplication Rates

Duplication rates of sampled reads.

Insert Sizes

Insert size estimation of sampled reads.

Sequence Quality

Average sequencing quality over each base of all reads.

GC Content

Average GC content over each base of all reads.

N content

Average N content over each base of all reads.

Picard

Picard is a set of Java command line tools for manipulating high-throughput sequencing data.

Insert Size

Plot shows the number of reads at a given insert size. Reads with different orientations are summed.

Mark Duplicates

Number of reads, categorised by duplication state. Pair counts are doubled - see help text for details.

The table in the Picard metrics file contains some columns referring to read pairs and some referring to single reads.

To make the numbers in this plot sum correctly, values referring to pairs are doubled according to the scheme below:

READS_IN_DUPLICATE_PAIRS = 2 * READ_PAIR_DUPLICATES
READS_IN_UNIQUE_PAIRS = 2 * (READ_PAIRS_EXAMINED - READ_PAIR_DUPLICATES)
READS_IN_UNIQUE_UNPAIRED = UNPAIRED_READS_EXAMINED - UNPAIRED_READ_DUPLICATES
READS_IN_DUPLICATE_PAIRS_OPTICAL = 2 * READ_PAIR_OPTICAL_DUPLICATES
READS_IN_DUPLICATE_PAIRS_NONOPTICAL = READS_IN_DUPLICATE_PAIRS - READS_IN_DUPLICATE_PAIRS_OPTICAL
READS_IN_DUPLICATE_UNPAIRED = UNPAIRED_READ_DUPLICATES
READS_UNMAPPED = UNMAPPED_READS
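
The doubling scheme above can be sketched in Python as follows. The input is assumed to be a dict holding one row of the Picard MarkDuplicates metrics table, keyed by the Picard column names used above. The function name `expand_pair_counts` and the example numbers are invented for illustration, and this is not MultiQC's own code.

```python
def expand_pair_counts(m):
    """Convert one Picard MarkDuplicates metrics row to per-read counts."""
    dup_pairs = 2 * m["READ_PAIR_DUPLICATES"]
    optical = 2 * m["READ_PAIR_OPTICAL_DUPLICATES"]
    return {
        "READS_IN_DUPLICATE_PAIRS": dup_pairs,
        "READS_IN_UNIQUE_PAIRS": 2 * (m["READ_PAIRS_EXAMINED"] - m["READ_PAIR_DUPLICATES"]),
        "READS_IN_UNIQUE_UNPAIRED": m["UNPAIRED_READS_EXAMINED"] - m["UNPAIRED_READ_DUPLICATES"],
        "READS_IN_DUPLICATE_PAIRS_OPTICAL": optical,
        "READS_IN_DUPLICATE_PAIRS_NONOPTICAL": dup_pairs - optical,
        "READS_IN_DUPLICATE_UNPAIRED": m["UNPAIRED_READ_DUPLICATES"],
        "READS_UNMAPPED": m["UNMAPPED_READS"],
    }


# Made-up metrics row, not taken from this report.
example_row = {
    "READ_PAIRS_EXAMINED": 1000,
    "READ_PAIR_DUPLICATES": 80,
    "READ_PAIR_OPTICAL_DUPLICATES": 5,
    "UNPAIRED_READS_EXAMINED": 50,
    "UNPAIRED_READ_DUPLICATES": 10,
    "UNMAPPED_READS": 200,
}
per_read = expand_pair_counts(example_row)

# The non-overlapping categories add up to the total number of reads.
plotted_total = (
    per_read["READS_IN_UNIQUE_PAIRS"]
    + per_read["READS_IN_UNIQUE_UNPAIRED"]
    + per_read["READS_IN_DUPLICATE_PAIRS_OPTICAL"]
    + per_read["READS_IN_DUPLICATE_PAIRS_NONOPTICAL"]
    + per_read["READS_IN_DUPLICATE_UNPAIRED"]
    + per_read["READS_UNMAPPED"]
)
print(plotted_total)
```

With the example row, the categories sum to 2 * 1000 paired reads + 50 unpaired reads + 200 unmapped reads, which is the point of doubling the pair-based columns.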

SamTools pre-sieve

Samtools is a suite of programs for interacting with high-throughput sequencing data. DOI: 10.1093/bioinformatics/btp352.

The pre-sieve statistics are quality metrics measured before applying the (optional) minimum mapping quality filter, blacklist removal, mitochondrial read removal, read length filtering, and Tn5 shift.

Percent Mapped

Alignment metrics from samtools stats; mapped vs. unmapped reads.

For a set of samples that have come from the same multiplexed library, similar numbers of reads for each sample are expected. Large differences in numbers might indicate issues during the library preparation process. Whilst large differences in read numbers may be controlled for in downstream processing (e.g. read count normalisation), you may wish to consider whether the read depths achieved have fallen below the levels recommended for your application.

Low alignment rates could indicate contamination of samples (e.g. adapter sequences), low sequencing quality or other artefacts. These can be further investigated in the sequence level QC (e.g. from FastQC).

Alignment metrics

This module parses the output from samtools stats. All numbers in millions.

SamTools post-sieve

Samtools is a suite of programs for interacting with high-throughput sequencing data. DOI: 10.1093/bioinformatics/btp352.

The post-sieve statistics are quality metrics measured after applying the (optional) minimum mapping quality filter, blacklist removal, mitochondrial read removal, and Tn5 shift.

Percent Mapped

Alignment metrics from samtools stats; mapped vs. unmapped reads.

Alignment metrics

This module parses the output from samtools stats. All numbers in millions.

deepTools

deepTools is a suite of tools to process and analyze deep sequencing data. DOI: 10.1093/nar/gkw257.

PCA plot

PCA plot with the top two principal components, calculated from the genome-wide distribution of sequence reads.

Fingerprint plot

Signal fingerprint according to plotFingerprint.

Read Distribution Profile after Annotation

Accumulated view of the distribution of sequence reads related to the closest annotated gene. All annotated genes have been normalized to the same size.

Green: 3.0 kb upstream of gene to TSS
Yellow: TSS to TES
Pink: TES to 3.0 kb downstream of gene

macs2_frips

Subread featureCounts is a highly efficient general-purpose read summarization program that counts mapped reads for genomic features such as genes, exons, promoters, gene bodies, genomic bins and chromosomal locations. DOI: 10.1093/bioinformatics/btt656.

deepTools - Spearman correlation heatmap of reads in bins across the genome

Spearman correlation plot generated by deepTools. Spearman correlation is a non-parametric (distribution-free) method that assesses the monotonicity of the relationship.

deepTools - Pearson correlation heatmap of reads in bins across the genome

Pearson correlation plot generated by deepTools. Pearson correlation is a parametric method (it makes several assumptions, e.g. normality and homoscedasticity) that assesses the linearity of the relationship.
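
The difference between the two correlation measures can be seen on a toy example. The sketch below is plain Python written for illustration, not deepTools code. It computes Pearson correlation directly and Spearman correlation as the Pearson correlation of ranks. The helper names and the bin counts are made up, and the ranking helper assumes there are no tied values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """Rank values from 1 to n (assumption: no ties in the input)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = float(rank)
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up read counts per bin for two samples: monotonic but not linear.
bins_sample_a = [1, 2, 3, 4, 5]
bins_sample_b = [1, 4, 9, 16, 25]

pearson_r = pearson(bins_sample_a, bins_sample_b)
spearman_r = spearman(bins_sample_a, bins_sample_b)
print(round(pearson_r, 3), round(spearman_r, 3))
```

Because the second sample rises monotonically with the first, the Spearman coefficient is 1 while the Pearson coefficient stays below 1, reflecting the non-linear relationship.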

Samples & Config

The samples file used for this run:

sample	assembly	control
43654_EZLysis_H3K4me3	GRCh38	43656_EZLysis_IgG
43655_EZLysis_H3K18la	GRCh38	43656_EZLysis_IgG

The config file used for this run:

# tab-separated file of the samples
samples: samples.tsv

# pipeline file locations
result_dir: ./results  # where to store results
genome_dir: ./genomes  # where to look for or download the genomes
fastq_dir: ./results/fastq  # where to look for or download the fastqs


# contact info for multiqc report and trackhub
email: yourmail@here.com

# produce a UCSC trackhub?
create_trackhub: true

# how to handle replicates
biological_replicates: fisher  # change to "keep" to not combine them
technical_replicates: merge    # change to "keep" to not combine them

# which trimmer to use
trimmer: fastp

# which aligner to use
aligner: bwa-mem2

# filtering after alignment
remove_blacklist: true
min_mapping_quality: 30
only_primary_align: true
remove_dups: true

# peak caller
peak_caller:
  macs2:
      --buffer-size 10000
      --broad
#  genrich:
#      -y -q 0.05

# how much peak summits will be extended by (on each side) for the final count table
# (e.g. 100 means a 200 bp wide peak)
slop: 100

# whether or not to run gimme maelstrom to infer differential motifs
run_gimme_maelstrom: false

# differential peak analysis
# for explanation, see: https://vanheeringen-lab.github.io/seq2science/content/DESeq2.html
#contrasts:
#  - 'descriptive_name_all_HEL'
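
To illustrate the slop setting in the config above: each summit is widened by the slop value on both sides, so a slop of 100 gives a 200 bp wide peak. The Python sketch below is a hypothetical illustration only. The function name `extend_summit`, the 0-based half-open coordinates, and the clipping at position 0 are assumptions, not seq2science's implementation.

```python
def extend_summit(summit, slop=100):
    """Return (start, end) of a peak made by extending a summit on each side.

    Assumes 0-based, half-open coordinates and clips the start at 0.
    """
    start = max(0, summit - slop)
    end = summit + slop
    return start, end


start, end = extend_summit(1000, slop=100)
width = end - start
print(start, end, width)
```

A summit closer than the slop value to the start of a contig yields a narrower peak under this clipping assumption, for example `extend_summit(30)` gives `(0, 130)`.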
