1 Set-up of analysis

The data was loaded into R and filtered according to the following criteria:

This data still includes all treatments for comparing the following things:

  1. Batch correction and clustering without mito.genes
Assumption: I am interested in gene describing keratinocyte biology (eventhough this might also be true for mito.genes)
  1. Batch correction and clustering with regressed out mito.genes
Assumption: I am interested in gene describing keratinocyte biology (eventhough this might also be true for mito.genes)
  1. Batch correction with mito-genes, clustering without mito.genes
During the correction of (technical) batch effects, I want to include all the measured genes, because otherwise integration anchors are based on 

2 Libraries

require("devtools")
library(ggplot2)
library(gplots)
library(SingleCellExperiment)
library(scater)
library(dplyr)
library(tidyr)
library(mvoutlier)
#library(limma)
library(knitr)
library(Seurat)
#library(scran)
library(RColorBrewer)
library(plot3D)
library(Matrix)
library(tidyr)
library(dplyr)
library(platetools)
library(ggpubr)
# options(stringsAsFactors = FALSE)
source("/scratch/etanis/scripts/read_kb_data.R")

experiment_nr <- "ST-99"

3 Loading the data

Here I load the data that was prepared on 24/11/21. This object also contains all mitochondrial genes, so I will make a seperate seurat in which they are removed.

Seurat.filtered <- readRDS("/scratch/etanis/ST-99/211124_markers_mit_options/211124_filtered_seurat_object.rds")

RNA.counts <- Seurat.filtered@assays$RNA@counts
RNA.counts[1:2,1:2]
## 2 x 2 sparse Matrix of class "dgCMatrix"
##         1636_O18 1636_N24
## DDX11L1        .        .
## WASH7P         .        .
to.remove.mit <- grep(pattern= "MT-", x= rownames(RNA.counts), value= TRUE)
RNA.counts.nomit <- RNA.counts[!rownames(RNA.counts) %in% to.remove.mit,]

dim(RNA.counts)
## [1] 60683  3367
dim(RNA.counts.nomit)
## [1] 60645  3367
meta.nomit <- Seurat.filtered@meta.data[,-c(2,3,14,15)]

Seurat.filtered.nomit <- CreateSeuratObject(counts= RNA.counts.nomit, meta.data = meta.nomit)

3.0.1 QC plots filtered data


The data was filtered according to the following filtering criteria:

Seurat.filtered <- subset(Seurat.matched, subset= nCount_RNA > 300 & nCount_RNA < 7500 & nFeature_RNA >200 & nFeature_RNA <4000 & percent.mito > -Inf & percent.mito < 0.4 & nFeature_ADT > 60 & nCount_ADT <40000)

TMM filtering: <0.38

3.0.1.1 RNA QC plots

VlnPlot(object = Seurat.filtered, features = c("nFeature_RNA"), ncol = 1) + labs(title = "Number of genes per cell - RNA") + xlab("Plate number") +ylab("Number of genes per cell") + NoLegend()

VlnPlot(object = Seurat.filtered, features = c("nCount_RNA"), ncol = 1) + labs(title = "Number of UMIs per cell - RNA") + xlab("Plate number")+ylab("Number of UMIs per cell") + NoLegend()

3.0.1.2 ADT QC Plots

DefaultAssay(Seurat.filtered) <- "TMMset"
meta.tmm <- Seurat.filtered@meta.data

VlnPlot(object = Seurat.filtered, features = c("nCount_ADT"), ncol = 1) + labs(title = "Number of counts per cell - ADT") + xlab("Plate number") + NoLegend()

VlnPlot(object = Seurat.filtered, features = c("nFeature_ADT"), ncol = 1) + labs(title = "Number of antibodies per cell - ADT") + xlab("Plate number") +ylim(0,75) + NoLegend()
## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.

ggplot(meta.tmm, aes(x=nCount_TMMset)) + geom_histogram(bins = 150, fill = "white", color = "black") + xlim(0,1) + labs(title = "TMM normalised counts - batch correction based on set", y= "TMM normalised counts", x="Number of cells") + theme_light()
## Warning: Removed 2 rows containing missing values (geom_bar).


4 Batch correction with and without mito.genes

Here I will compare the different types of batch correction that I can apply. I will compare no batch correction and batch correction based on set.


4.0.1 No batch correction

Seurat.nobatch <- SCTransform(Seurat.filtered, vars.to.regress = c("percent.mito"))
## Calculating cell attributes from input UMI matrix: log_umi
## Variance stabilizing transformation of count matrix of size 22275 by 3367
## Model formula is y ~ log_umi
## Get Negative Binomial regression parameters per gene
## Using 2000 genes, 3367 cells
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |==================                                                    |  25%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |====================================================                  |  75%
  |                                                                            
  |======================================================================| 100%
## Found 73 outliers - those will be ignored in fitting/regularization step
## Second step: Get residuals using fitted parameters for 22275 genes
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |==                                                                    |   2%
  |                                                                            
  |===                                                                   |   4%
  |                                                                            
  |=====                                                                 |   7%
  |                                                                            
  |======                                                                |   9%
  |                                                                            
  |========                                                              |  11%
  |                                                                            
  |=========                                                             |  13%
  |                                                                            
  |===========                                                           |  16%
  |                                                                            
  |============                                                          |  18%
  |                                                                            
  |==============                                                        |  20%
  |                                                                            
  |================                                                      |  22%
  |                                                                            
  |=================                                                     |  24%
  |                                                                            
  |===================                                                   |  27%
  |                                                                            
  |====================                                                  |  29%
  |                                                                            
  |======================                                                |  31%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |=========================                                             |  36%
  |                                                                            
  |==========================                                            |  38%
  |                                                                            
  |============================                                          |  40%
  |                                                                            
  |==============================                                        |  42%
  |                                                                            
  |===============================                                       |  44%
  |                                                                            
  |=================================                                     |  47%
  |                                                                            
  |==================================                                    |  49%
  |                                                                            
  |====================================                                  |  51%
  |                                                                            
  |=====================================                                 |  53%
  |                                                                            
  |=======================================                               |  56%
  |                                                                            
  |========================================                              |  58%
  |                                                                            
  |==========================================                            |  60%
  |                                                                            
  |============================================                          |  62%
  |                                                                            
  |=============================================                         |  64%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |================================================                      |  69%
  |                                                                            
  |==================================================                    |  71%
  |                                                                            
  |===================================================                   |  73%
  |                                                                            
  |=====================================================                 |  76%
  |                                                                            
  |======================================================                |  78%
  |                                                                            
  |========================================================              |  80%
  |                                                                            
  |==========================================================            |  82%
  |                                                                            
  |===========================================================           |  84%
  |                                                                            
  |=============================================================         |  87%
  |                                                                            
  |==============================================================        |  89%
  |                                                                            
  |================================================================      |  91%
  |                                                                            
  |=================================================================     |  93%
  |                                                                            
  |===================================================================   |  96%
  |                                                                            
  |====================================================================  |  98%
  |                                                                            
  |======================================================================| 100%
## Computing corrected count matrix for 22275 genes
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |==                                                                    |   2%
  |                                                                            
  |===                                                                   |   4%
  |                                                                            
  |=====                                                                 |   7%
  |                                                                            
  |======                                                                |   9%
  |                                                                            
  |========                                                              |  11%
  |                                                                            
  |=========                                                             |  13%
  |                                                                            
  |===========                                                           |  16%
  |                                                                            
  |============                                                          |  18%
  |                                                                            
  |==============                                                        |  20%
  |                                                                            
  |================                                                      |  22%
  |                                                                            
  |=================                                                     |  24%
  |                                                                            
  |===================                                                   |  27%
  |                                                                            
  |====================                                                  |  29%
  |                                                                            
  |======================                                                |  31%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |=========================                                             |  36%
  |                                                                            
  |==========================                                            |  38%
  |                                                                            
  |============================                                          |  40%
  |                                                                            
  |==============================                                        |  42%
  |                                                                            
  |===============================                                       |  44%
  |                                                                            
  |=================================                                     |  47%
  |                                                                            
  |==================================                                    |  49%
  |                                                                            
  |====================================                                  |  51%
  |                                                                            
  |=====================================                                 |  53%
  |                                                                            
  |=======================================                               |  56%
  |                                                                            
  |========================================                              |  58%
  |                                                                            
  |==========================================                            |  60%
  |                                                                            
  |============================================                          |  62%
  |                                                                            
  |=============================================                         |  64%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |================================================                      |  69%
  |                                                                            
  |==================================================                    |  71%
  |                                                                            
  |===================================================                   |  73%
  |                                                                            
  |=====================================================                 |  76%
  |                                                                            
  |======================================================                |  78%
  |                                                                            
  |========================================================              |  80%
  |                                                                            
  |==========================================================            |  82%
  |                                                                            
  |===========================================================           |  84%
  |                                                                            
  |=============================================================         |  87%
  |                                                                            
  |==============================================================        |  89%
  |                                                                            
  |================================================================      |  91%
  |                                                                            
  |=================================================================     |  93%
  |                                                                            
  |===================================================================   |  96%
  |                                                                            
  |====================================================================  |  98%
  |                                                                            
  |======================================================================| 100%
## Calculating gene attributes
## Wall clock passed: Time difference of 47.7971 secs
## Determine variable features
## Set 3000 variable features
## Place corrected count matrix in counts slot
## Regressing out percent.mito
## Centering data matrix
## Set default assay to SCT
DefaultAssay(Seurat.nobatch) <- "SCT"
# These are now standard steps in the Seurat workflow for visualization and clustering
Seurat.nobatch <- RunPCA(Seurat.nobatch, verbose = FALSE)
Seurat.nobatch <- RunUMAP(Seurat.nobatch, dims = 1:15, verbose = FALSE)

Seurat.nobatch <- FindNeighbors(Seurat.nobatch, dims = 1:15, verbose = FALSE)
Seurat.nobatch <- FindClusters(Seurat.nobatch, verbose = FALSE, resolution = 0.3)

# check clustering stability
library(clustree)
## Loading required package: ggraph
#clustree(Seurat.nobatch, prefix="integrated_snn_res.")
#clustree(Seurat.nobatch, prefix="SCT_snn_res.0.8")

DimPlot(Seurat.nobatch, label = TRUE) + NoLegend()

DimPlot(Seurat.nobatch, reduction= "umap", label= TRUE)

DimPlot(Seurat.nobatch, reduction= "umap", label= TRUE, group.by= "orig.ident")

DimPlot(Seurat.nobatch, reduction= "umap", label= TRUE, group.by= "Plate", split.by= "orig.ident")

DimPlot(Seurat.nobatch, reduction= "umap", label= TRUE, group.by= "processed.set", split.by= "orig.ident")

DimPlot(Seurat.nobatch, reduction= "umap", label= TRUE, split.by= "treatments")

Idents(Seurat.nobatch) <- "processed.set"
DimPlot(Seurat.nobatch, reduction= "umap", label= TRUE)

Idents(Seurat.nobatch) <- "treatments"
DimPlot(Seurat.nobatch, reduction= "umap", label= TRUE)

filename_date <- paste(Sys.Date(), experiment_nr,"umap_not_integrated_RNA_nomit_res0.3.pdf", sep = "_")
pdf(filename_date)
DimPlot(Seurat.nobatch, label = TRUE) + NoLegend()
DimPlot(Seurat.nobatch, reduction= "umap", label= TRUE)
DimPlot(Seurat.nobatch, reduction= "umap", label= TRUE, group.by= "orig.ident")
DimPlot(Seurat.nobatch, reduction= "umap", label= TRUE, group.by= "Plate", split.by= "orig.ident")
DimPlot(Seurat.nobatch, reduction= "umap", label= TRUE, group.by= "processed.set", split.by= "orig.ident")
DimPlot(Seurat.nobatch, reduction= "umap", label= TRUE, split.by= "treatments")
Idents(Seurat.nobatch) <- "processed.set"
DimPlot(Seurat.nobatch, reduction= "umap", label= TRUE)
Idents(Seurat.nobatch) <- "treatments"
DimPlot(Seurat.nobatch, reduction= "umap", label= TRUE)

dev.off()
## png 
##   2
OBSERVATION

From the no batch correction analysis, it is clear that there is an effect of the treatments, and that the replicate plates nicely group together. This is variation we want to keep. What is interesting, is that the first plate of each treatment (belonging to a processed set) does deviate in all treatment conditions. This might mean there is a processed set effect, which we want to remove.

4.0.2 Batch correction based on processed.set

As mentioned before, I compare different ways of incorporating or removing the mito.genes to assess the impact on the

4.0.2.1 No mito.genes

# batch correction with Seurat Integration & SCTrasnform normalization
## adjust the k.filter to the lowest plate cell number 
Seurat.split.set <- SplitObject(Seurat.filtered.nomit, split.by = "processed.set")
for (i in 1:length(Seurat.split.set)) {
  Seurat.split.set[[i]] <-SCTransform(Seurat.split.set[[i]], assay= "RNA", verbose = FALSE)
}
## Calculating cell attributes from input UMI matrix: log_umi
## Variance stabilizing transformation of count matrix of size 17064 by 1003
## Model formula is y ~ log_umi
## Get Negative Binomial regression parameters per gene
## Using 2000 genes, 1003 cells
## Found 69 outliers - those will be ignored in fitting/regularization step
## Second step: Get residuals using fitted parameters for 17064 genes
## Computing corrected count matrix for 17064 genes
## Calculating gene attributes
## Wall clock passed: Time difference of 13.42989 secs
## Calculating cell attributes from input UMI matrix: log_umi
## Variance stabilizing transformation of count matrix of size 18144 by 1199
## Model formula is y ~ log_umi
## Get Negative Binomial regression parameters per gene
## Using 2000 genes, 1199 cells
## Found 95 outliers - those will be ignored in fitting/regularization step
## Second step: Get residuals using fitted parameters for 18144 genes
## Computing corrected count matrix for 18144 genes
## Calculating gene attributes
## Wall clock passed: Time difference of 14.87813 secs
## Calculating cell attributes from input UMI matrix: log_umi
## Variance stabilizing transformation of count matrix of size 18223 by 1165
## Model formula is y ~ log_umi
## Get Negative Binomial regression parameters per gene
## Using 2000 genes, 1165 cells
## Found 94 outliers - those will be ignored in fitting/regularization step
## Second step: Get residuals using fitted parameters for 18223 genes
## Computing corrected count matrix for 18223 genes
## Calculating gene attributes
## Wall clock passed: Time difference of 14.28041 secs
Seurat.split.set.features <- SelectIntegrationFeatures(object.list = Seurat.split.set, nfeatures = 3000)

options(future.globals.maxSize = 950 *1024^2) #to make sure I have enough RAM for the following 
Seurat.split.set <- PrepSCTIntegration(object.list = Seurat.split.set, anchor.features = Seurat.split.set.features, verbose = FALSE)
## Calculating residuals of type pearson for 1286 genes
## Calculating residuals of type pearson for 1365 genes
## Calculating residuals of type pearson for 1406 genes
#Because in one plate there are less than 200 cells (default), this command would not run without specifying a lower number for k 

Seurat.split.set.anchors <- FindIntegrationAnchors(object.list = Seurat.split.set, normalization.method = "SCT", 
                                                  anchor.features = Seurat.split.set.features, verbose = FALSE, k.filter = 1130)

Seurat.split.set.integrated <- IntegrateData(anchorset = Seurat.split.set.anchors, normalization.method = "SCT", 
                                            verbose = FALSE)

# PCA of batch corrected edata
Seurat.split.set.integrated <- RunPCA(Seurat.split.set.integrated, verbose = FALSE)
PCAPlot(Seurat.split.set.integrated, group.by="orig.ident")

ElbowPlot(Seurat.split.set.integrated, ndims= 50)

# Clustering (use ndims based on Elbowplot results)
Seurat.split.set.integrated <- FindNeighbors(Seurat.split.set.integrated, assay= "integrated", dims= 1:15)
## Computing nearest neighbor graph
## Computing SNN
resolution <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1)
for (i in resolution){
  Seurat.split.set.integrated <- FindClusters(Seurat.split.set.integrated, assay= "integrated", resolution= i)
}
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 3367
## Number of edges: 131137
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9130
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 3367
## Number of edges: 131137
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8723
## Number of communities: 8
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 3367
## Number of edges: 131137
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8527
## Number of communities: 9
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 3367
## Number of edges: 131137
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8337
## Number of communities: 9
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 3367
## Number of edges: 131137
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8191
## Number of communities: 10
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 3367
## Number of edges: 131137
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8051
## Number of communities: 12
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 3367
## Number of edges: 131137
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7930
## Number of communities: 12
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 3367
## Number of edges: 131137
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7812
## Number of communities: 13
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 3367
## Number of edges: 131137
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7703
## Number of communities: 13
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 3367
## Number of edges: 131137
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7608
## Number of communities: 14
## Elapsed time: 0 seconds
# check clustering stability
library(clustree)
clustree(Seurat.split.set.integrated, prefix="integrated_snn_res.")

# UMAP
# choose an appropriate resolution based on clustree results
Seurat.split.set.integrated <- FindClusters(Seurat.split.set.integrated, assay= "integrated", resolution= 0.3)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 3367
## Number of edges: 131137
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8527
## Number of communities: 9
## Elapsed time: 0 seconds
Seurat.split.set.integrated <- RunUMAP(Seurat.split.set.integrated, assay= "integrated", dims= 1:15)
## 10:36:53 UMAP embedding parameters a = 0.9922 b = 1.112
## 10:36:53 Read 3367 rows and found 15 numeric columns
## 10:36:53 Using Annoy for neighbor search, n_neighbors = 30
## 10:36:53 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 10:36:53 Writing NN index file to temp file /tmp/RtmpVDqTWF/file8305e1357054a
## 10:36:53 Searching Annoy index using 1 thread, search_k = 3000
## 10:36:54 Annoy recall = 100%
## 10:36:55 Commencing smooth kNN distance calibration using 1 thread
## 10:36:57 Initializing from normalized Laplacian + noise
## 10:36:57 Commencing optimization for 500 epochs, with 138354 positive edges
## 10:37:01 Optimization finished
Idents(Seurat.split.set.integrated) <- "seurat_clusters"
DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE)

DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE, group.by= "orig.ident")

DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE, group.by= "Plate", split.by= "orig.ident")

DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE, split.by= "treatments")

Idents(Seurat.split.set.integrated) <- "processed.set"
DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE)

Idents(Seurat.split.set.integrated) <- "treatments"
DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE)

filename_date <- paste(Sys.Date(), experiment_nr,"umap_integrated_based_on_set_RNA_res0.3_nomit.pdf", sep = "_")
pdf(filename_date)
par(mfrow=c(2,1))
clustree(Seurat.split.set.integrated, prefix="integrated_snn_res.")
Idents(Seurat.split.set.integrated) <- "seurat_clusters"
DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE)
DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE, group.by= "orig.ident")
DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE, group.by= "Plate", split.by= "orig.ident")
DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE, split.by= "treatments")
Idents(Seurat.split.set.integrated) <- "processed.set"
DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE)
Idents(Seurat.split.set.integrated) <- "processed.set"
DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE)
Idents(Seurat.split.set.integrated) <- "treatments"
DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE)
dev.off()
## png 
##   2
Seurat.split.set.integrated.nomit <- Seurat.split.set.integrated

4.0.2.2 Mitogenes regressed out during normalisation

# batch correction with Seurat Integration & SCTrasnform normalization
## adjust the k.filter to the lowest plate cell number 
Seurat.split.set <- SplitObject(Seurat.filtered, split.by = "processed.set")
for (i in 1:length(Seurat.split.set)) {
  Seurat.split.set[[i]] <-SCTransform(Seurat.split.set[[i]], assay= "RNA",vars.to.regress = c("percent.mito"), verbose = FALSE)
}
## Calculating cell attributes from input UMI matrix: log_umi
## Variance stabilizing transformation of count matrix of size 17093 by 1003
## Model formula is y ~ log_umi
## Get Negative Binomial regression parameters per gene
## Using 2000 genes, 1003 cells
## Found 77 outliers - those will be ignored in fitting/regularization step
## Second step: Get residuals using fitted parameters for 17093 genes
## Computing corrected count matrix for 17093 genes
## Calculating gene attributes
## Wall clock passed: Time difference of 13.42324 secs
## Calculating cell attributes from input UMI matrix: log_umi
## Variance stabilizing transformation of count matrix of size 18178 by 1199
## Model formula is y ~ log_umi
## Get Negative Binomial regression parameters per gene
## Using 2000 genes, 1199 cells
## Found 68 outliers - those will be ignored in fitting/regularization step
## Second step: Get residuals using fitted parameters for 18178 genes
## Computing corrected count matrix for 18178 genes
## Calculating gene attributes
## Wall clock passed: Time difference of 15.76203 secs
## Calculating cell attributes from input UMI matrix: log_umi
## Variance stabilizing transformation of count matrix of size 18252 by 1165
## Model formula is y ~ log_umi
## Get Negative Binomial regression parameters per gene
## Using 2000 genes, 1165 cells
## Found 75 outliers - those will be ignored in fitting/regularization step
## Second step: Get residuals using fitted parameters for 18252 genes
## Computing corrected count matrix for 18252 genes
## Calculating gene attributes
## Wall clock passed: Time difference of 14.77998 secs
Seurat.split.set.features <- SelectIntegrationFeatures(object.list = Seurat.split.set, nfeatures = 3000)

options(future.globals.maxSize = 950 *1024^2) #to make sure I have enough RAM for the following 
Seurat.split.set <- PrepSCTIntegration(object.list = Seurat.split.set, anchor.features = Seurat.split.set.features, 
                                      verbose = FALSE)
## Calculating residuals of type pearson for 1263 genes
## Calculating residuals of type pearson for 1351 genes
## Calculating residuals of type pearson for 1506 genes
#Because in one plate there are less than 200 cells (default), this command would not run without specifying a lower number for k 

Seurat.split.set.anchors <- FindIntegrationAnchors(object.list = Seurat.split.set, normalization.method = "SCT", 
                                                  anchor.features = Seurat.split.set.features, verbose = FALSE, k.filter = 1130)

Seurat.split.set.integrated <- IntegrateData(anchorset = Seurat.split.set.anchors, normalization.method = "SCT", 
                                            verbose = FALSE)

# PCA of batch corrected edata
Seurat.split.set.integrated <- RunPCA(Seurat.split.set.integrated, verbose = FALSE)
PCAPlot(Seurat.split.set.integrated, group.by="orig.ident")

ElbowPlot(Seurat.split.set.integrated, ndims= 50)

# Clustering (use ndims based on Elbowplot results)
Seurat.split.set.integrated <- FindNeighbors(Seurat.split.set.integrated, assay= "integrated", dims= 1:15)
## Computing nearest neighbor graph
## Computing SNN
resolution <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1)
for (i in resolution){
  Seurat.split.set.integrated <- FindClusters(Seurat.split.set.integrated, assay= "integrated", resolution= i)
}
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 3367
## Number of edges: 133295
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9139
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 3367
## Number of edges: 133295
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8805
## Number of communities: 7
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 3367
## Number of edges: 133295
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8580
## Number of communities: 9
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 3367
## Number of edges: 133295
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8388
## Number of communities: 10
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 3367
## Number of edges: 133295
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8244
## Number of communities: 10
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 3367
## Number of edges: 133295
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8112
## Number of communities: 12
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 3367
## Number of edges: 133295
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7984
## Number of communities: 12
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 3367
## Number of edges: 133295
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7853
## Number of communities: 13
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 3367
## Number of edges: 133295
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7740
## Number of communities: 14
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 3367
## Number of edges: 133295
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7634
## Number of communities: 14
## Elapsed time: 0 seconds
# check clustering stability
library(clustree)
clustree(Seurat.split.set.integrated, prefix="integrated_snn_res.")

# UMAP
# choose an appropriate resolution based on clustree results
Seurat.split.set.integrated <- FindClusters(Seurat.split.set.integrated, assay= "integrated", resolution= 0.3)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 3367
## Number of edges: 133295
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8580
## Number of communities: 9
## Elapsed time: 0 seconds
Seurat.split.set.integrated <- RunUMAP(Seurat.split.set.integrated, assay= "integrated", dims= 1:15)
## 10:38:55 UMAP embedding parameters a = 0.9922 b = 1.112
## 10:38:55 Read 3367 rows and found 15 numeric columns
## 10:38:55 Using Annoy for neighbor search, n_neighbors = 30
## 10:38:55 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 10:38:56 Writing NN index file to temp file /tmp/RtmpVDqTWF/file8305e3a931f4c
## 10:38:56 Searching Annoy index using 1 thread, search_k = 3000
## 10:38:57 Annoy recall = 100%
## 10:38:57 Commencing smooth kNN distance calibration using 1 thread
## 10:38:58 Initializing from normalized Laplacian + noise
## 10:38:58 Commencing optimization for 500 epochs, with 138828 positive edges
## 10:39:03 Optimization finished
Idents(Seurat.split.set.integrated) <- "seurat_clusters"
DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE)

DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE, group.by= "orig.ident")

DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE, group.by= "Plate", split.by= "orig.ident")

DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE, split.by= "treatments")

Idents(Seurat.split.set.integrated) <- "processed.set"
DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE)

Idents(Seurat.split.set.integrated) <- "treatments"
DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE)

filename_date <- paste(Sys.Date(), experiment_nr,"umap_integrated_based_on_set_RNA_res0.3_nomit.pdf", sep = "_")
pdf(filename_date)
par(mfrow=c(2,1))
clustree(Seurat.split.set.integrated, prefix="integrated_snn_res.")
Idents(Seurat.split.set.integrated) <- "seurat_clusters"
DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE)
DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE, group.by= "orig.ident")
DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE, group.by= "Plate", split.by= "orig.ident")
DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE, split.by= "treatments")
Idents(Seurat.split.set.integrated) <- "processed.set"
DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE)
Idents(Seurat.split.set.integrated) <- "processed.set"
DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE)
Idents(Seurat.split.set.integrated) <- "treatments"
DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE)
dev.off()
## png 
##   2
Seurat.split.set.integrated.mitreg <- Seurat.split.set.integrated

4.0.2.3 Mito.genes present during integration, removed before clustering

I will first integrate based on the whole dataset to also include the data on the mitochondrial genes (which are mostly highly expressed and could influence the sampling and thus ditribution of other genes during the library prep and sequencing)

# batch correction with Seurat Integration & SCTransform normalization
## adjust the k.filter to the lowest plate cell number 
Seurat.split.set <- SplitObject(Seurat.filtered, split.by = "processed.set")
for (i in 1:length(Seurat.split.set)) {
  Seurat.split.set[[i]] <-SCTransform(Seurat.split.set[[i]], assay= "RNA", verbose = FALSE)
}
## Calculating cell attributes from input UMI matrix: log_umi
## Variance stabilizing transformation of count matrix of size 17093 by 1003
## Model formula is y ~ log_umi
## Get Negative Binomial regression parameters per gene
## Using 2000 genes, 1003 cells
## Found 77 outliers - those will be ignored in fitting/regularization step
## Second step: Get residuals using fitted parameters for 17093 genes
## Computing corrected count matrix for 17093 genes
## Calculating gene attributes
## Wall clock passed: Time difference of 13.09656 secs
## Calculating cell attributes from input UMI matrix: log_umi
## Variance stabilizing transformation of count matrix of size 18178 by 1199
## Model formula is y ~ log_umi
## Get Negative Binomial regression parameters per gene
## Using 2000 genes, 1199 cells
## Found 68 outliers - those will be ignored in fitting/regularization step
## Second step: Get residuals using fitted parameters for 18178 genes
## Computing corrected count matrix for 18178 genes
## Calculating gene attributes
## Wall clock passed: Time difference of 16.42447 secs
## Calculating cell attributes from input UMI matrix: log_umi
## Variance stabilizing transformation of count matrix of size 18252 by 1165
## Model formula is y ~ log_umi
## Get Negative Binomial regression parameters per gene
## Using 2000 genes, 1165 cells
## Found 75 outliers - those will be ignored in fitting/regularization step
## Second step: Get residuals using fitted parameters for 18252 genes
## Computing corrected count matrix for 18252 genes
## Calculating gene attributes
## Wall clock passed: Time difference of 15.56519 secs
Seurat.split.set.features <- SelectIntegrationFeatures(object.list = Seurat.split.set, nfeatures = 3000)

options(future.globals.maxSize = 950 *1024^2) #to make sure I have enough RAM for the following 
Seurat.split.set <- PrepSCTIntegration(object.list = Seurat.split.set, anchor.features = Seurat.split.set.features, 
                                      verbose = FALSE)
## Calculating residuals of type pearson for 1263 genes
## Calculating residuals of type pearson for 1351 genes
## Calculating residuals of type pearson for 1506 genes
#Because in one plate there are less than 200 cells (default), this command would not run without specifying a lower number for k 

Seurat.split.set.anchors <- FindIntegrationAnchors(object.list = Seurat.split.set, normalization.method = "SCT", 
                                                  anchor.features = Seurat.split.set.features, verbose = FALSE, k.filter = 1130)

Seurat.split.set.integrated <- IntegrateData(anchorset = Seurat.split.set.anchors, normalization.method = "SCT", 
                                            verbose = FALSE)
#removal of mitochondrial genes from integrated data

integrated.data <- Seurat.split.set.integrated@assays$integrated@scale.data
integrated.data.nomit <- integrated.data[!rownames(integrated.data) %in% to.remove.mit,]

dim(integrated.data)
## [1] 3000 3367
dim(integrated.data.nomit)
## [1] 2982 3367

Within this set, there are 18 genes that match the to.remove.mit list. These will be removed, leaving me with less hypervariable genes for clustering. I don’t want to change the number of genes that I start the integration with, because that will have an effect on the anchors.

#removal of mitochondrial genes from integrated data

Seurat.split.set.integrated[["integratednomit"]] <- CreateAssayObject(integrated.data.nomit)
DefaultAssay(Seurat.split.set.integrated) <- "integratednomit"

newvar.features <- rownames(integrated.data.nomit)

Seurat.split.set.integrated <- SetAssayData(Seurat.split.set.integrated, assay = "integratednomit", slot = "scale.data", new.data = integrated.data.nomit) 
Seurat.split.set.integrated@assays$integratednomit@var.features <- rownames(integrated.data.nomit)
# PCA of batch corrected edata
Seurat.split.set.integrated <- RunPCA(Seurat.split.set.integrated, verbose = FALSE)
PCAPlot(Seurat.split.set.integrated, group.by="orig.ident")

ElbowPlot(Seurat.split.set.integrated, ndims= 50)

# Clustering (use ndims based on Elbowplot results)
Seurat.split.set.integrated <- FindNeighbors(Seurat.split.set.integrated, assay= "integratednomit", dims= 1:15)
## Computing nearest neighbor graph
## Computing SNN
resolution <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1)
for (i in resolution){
  Seurat.split.set.integrated <- FindClusters(Seurat.split.set.integrated, assay= "integratednomit", resolution= i)
}
## Warning: The following arguments are not used: assay

## Warning: The following arguments are not used: assay
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 3367
## Number of edges: 133386
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9161
## Number of communities: 4
## Elapsed time: 0 seconds
## Warning: The following arguments are not used: assay

## Warning: The following arguments are not used: assay
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 3367
## Number of edges: 133386
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8834
## Number of communities: 7
## Elapsed time: 0 seconds
## Warning: The following arguments are not used: assay

## Warning: The following arguments are not used: assay
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 3367
## Number of edges: 133386
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8622
## Number of communities: 9
## Elapsed time: 0 seconds
## Warning: The following arguments are not used: assay

## Warning: The following arguments are not used: assay
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 3367
## Number of edges: 133386
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8430
## Number of communities: 9
## Elapsed time: 0 seconds
## Warning: The following arguments are not used: assay

## Warning: The following arguments are not used: assay
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 3367
## Number of edges: 133386
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8299
## Number of communities: 11
## Elapsed time: 0 seconds
## Warning: The following arguments are not used: assay

## Warning: The following arguments are not used: assay
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 3367
## Number of edges: 133386
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8174
## Number of communities: 13
## Elapsed time: 0 seconds
## Warning: The following arguments are not used: assay

## Warning: The following arguments are not used: assay
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 3367
## Number of edges: 133386
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8061
## Number of communities: 13
## Elapsed time: 0 seconds
## Warning: The following arguments are not used: assay

## Warning: The following arguments are not used: assay
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 3367
## Number of edges: 133386
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7948
## Number of communities: 14
## Elapsed time: 0 seconds
## Warning: The following arguments are not used: assay

## Warning: The following arguments are not used: assay
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 3367
## Number of edges: 133386
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7843
## Number of communities: 15
## Elapsed time: 0 seconds
## Warning: The following arguments are not used: assay

## Warning: The following arguments are not used: assay
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 3367
## Number of edges: 133386
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7742
## Number of communities: 15
## Elapsed time: 0 seconds
# check clustering stability
library(clustree)
clustree(Seurat.split.set.integrated, prefix="integratednomit_snn_res.")

# UMAP
# choose an appropriate resolution based on clustree results
Seurat.split.set.integrated <- FindClusters(Seurat.split.set.integrated, assay= "integratednomit", resolution= 0.5)
## Warning: The following arguments are not used: assay

## Warning: The following arguments are not used: assay
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 3367
## Number of edges: 133386
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8299
## Number of communities: 11
## Elapsed time: 0 seconds
Seurat.split.set.integrated <- RunUMAP(Seurat.split.set.integrated, assay= "integratednomit", dims= 1:15)
## 10:40:52 UMAP embedding parameters a = 0.9922 b = 1.112
## 10:40:52 Read 3367 rows and found 15 numeric columns
## 10:40:52 Using Annoy for neighbor search, n_neighbors = 30
## 10:40:52 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 10:40:52 Writing NN index file to temp file /tmp/RtmpVDqTWF/file8305e6bd2e3b4
## 10:40:52 Searching Annoy index using 1 thread, search_k = 3000
## 10:40:53 Annoy recall = 100%
## 10:40:54 Commencing smooth kNN distance calibration using 1 thread
## 10:40:55 Initializing from normalized Laplacian + noise
## 10:40:55 Commencing optimization for 500 epochs, with 138982 positive edges
## 10:40:59 Optimization finished
Idents(Seurat.split.set.integrated) <- "seurat_clusters"
DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE)

DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE, group.by= "orig.ident")

DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE, group.by= "Plate", split.by= "orig.ident")

DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE, split.by= "treatments")

Idents(Seurat.split.set.integrated) <- "processed.set"
DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE)

Idents(Seurat.split.set.integrated) <- "treatments"
DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE)

filename_date <- paste(Sys.Date(), experiment_nr,"umap_integrated_based_on_set_RNA_res0.5_integratedwithmit_clusterednomit.pdf", sep = "_")
pdf(filename_date)
par(mfrow=c(2,1))
clustree(Seurat.split.set.integrated, prefix="integratednomit_snn_res.")
Idents(Seurat.split.set.integrated) <- "seurat_clusters"
DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE)
DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE, group.by= "orig.ident")
DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE, group.by= "Plate", split.by= "orig.ident")
DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE, split.by= "treatments")
Idents(Seurat.split.set.integrated) <- "processed.set"
DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE)
Idents(Seurat.split.set.integrated) <- "processed.set"
DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE)
Idents(Seurat.split.set.integrated) <- "treatments"
DimPlot(Seurat.split.set.integrated, reduction= "umap", label= TRUE)
dev.off()
## png 
##   2
Seurat.integrated.mitincluded.clusternomit <- Seurat.split.set.integrated

#Heatmaps with most diff expressed genes

4.0.3 Heatmaps

For the different batch corrected seurats, I will determine the top 10 significant markers and plot them in a heatmap. They will feature genes that are expressed in at least 25% of cells and the logfc.threshold is 0.5

4.0.3.1 No mito.genes

Idents(Seurat.split.set.integrated.nomit) <- "seurat_clusters"
markers.all.nomit.clust <- FindAllMarkers(Seurat.split.set.integrated.nomit, min.pct = 0.25, logfc.threshold = 0.5)

selection.markers.nomit.clust <- markers.all.nomit.clust %>%
    group_by(cluster) %>%
    slice_max(n = 10, order_by = avg_logFC)

Idents(Seurat.split.set.integrated.nomit) <- "treatments"
markers.all.nomit.treat <- FindAllMarkers(Seurat.split.set.integrated.nomit, min.pct = 0.25, logfc.threshold = 0.5)

selection.markers.nomit.treat <- markers.all.nomit.treat %>%
    group_by(cluster) %>%
    slice_max(n = 10, order_by = avg_logFC)
DoHeatmap(Seurat.split.set.integrated.nomit, features = selection.markers.nomit.clust$gene) + theme(axis.text = element_text(size=3)) + labs(title = "Differentially expressed between all clusters")

DoHeatmap(Seurat.split.set.integrated.nomit, features = selection.markers.nomit.treat$gene) + theme(axis.text = element_text(size=3)) + labs(title = "Differentially expressed between treatments")

4.0.3.2 Mitogenes regressed out during normalisation

Idents(Seurat.split.set.integrated.mitreg) <- "seurat_clusters"
markers.all.nomit.clust <- FindAllMarkers(Seurat.split.set.integrated.mitreg, min.pct = 0.25, logfc.threshold = 0.5)

selection.markers.nomit.clust <- markers.all.nomit.clust %>%
    group_by(cluster) %>%
    slice_max(n = 10, order_by = avg_logFC)

Idents(Seurat.split.set.integrated.mitreg) <- "treatments"
markers.all.nomit.treat <- FindAllMarkers(Seurat.split.set.integrated.mitreg, min.pct = 0.25, logfc.threshold = 0.5)

selection.markers.nomit.treat <- markers.all.nomit.treat %>%
    group_by(cluster) %>%
    slice_max(n = 10, order_by = avg_logFC)
DoHeatmap(Seurat.split.set.integrated.mitreg, features = selection.markers.nomit.clust$gene) + theme(axis.text = element_text(size=3)) + labs(title = "Differentially expressed between all clusters")

DoHeatmap(Seurat.split.set.integrated.mitreg, features = selection.markers.nomit.treat$gene) + theme(axis.text = element_text(size=3)) + labs(title = "Differentially expressed between treatments")

4.0.3.3 Mito.genes present during integration, removed before clustering

Because the FindAllMarkers function in Seurat is based on the RNA assay, there will be MT genes present in the list. However, the calling of clusters is performed without the mito.genes and should describe non-mitogenes. I will therefore remove the mito.genes from the markers data.frame

Idents(Seurat.integrated.mitincluded.clusternomit) <- "seurat_clusters"
markers.all.nomit.clust <- FindAllMarkers(Seurat.integrated.mitincluded.clusternomit, min.pct = 0.25, logfc.threshold = 0.5)

selection.markers.nomit.clust <- markers.all.nomit.clust %>%
  filter(!grepl("MT-", gene)) %>%
    group_by(cluster) %>%
    slice_max(n = 10, order_by = avg_logFC)

Idents(Seurat.integrated.mitincluded.clusternomit) <- "treatments"
markers.all.nomit.treat <- FindAllMarkers(Seurat.integrated.mitincluded.clusternomit, min.pct = 0.25, logfc.threshold = 0.5)

selection.markers.nomit.treat <- markers.all.nomit.treat %>%
  filter(!grepl("MT-", gene)) %>%
    group_by(cluster) %>%
    slice_max(n = 10, order_by = avg_logFC)
DoHeatmap(Seurat.integrated.mitincluded.clusternomit, features = selection.markers.nomit.clust$gene) + theme(axis.text = element_text(size=3)) + labs(title = "Differentially expressed between all clusters")

DoHeatmap(Seurat.integrated.mitincluded.clusternomit, features = selection.markers.nomit.treat$gene) + theme(axis.text = element_text(size=3)) + labs(title = "Differentially expressed between treatments")

#Go-term analysis

For this part, I need to extract the lists of differentially expressed genes called by FindAllMarkers