Please read the following: submit your answers as an R script named s123456.R, where s123456 is your student number, with the questions clearly marked with comments (using the # character). Please don’t include any unnecessary code, only the code that is used to produce the answer.

Read the following file into a tibble:
https://mbdata.science.ru.nl/share/heeringen/gbd_exam/genome_size.Oi0I.csv
This file contains information on the genome size and chromosome number of various species.
How many observations (organisms) does this data set contain?
Create a new tibble (or data.frame) with the column Organism_Name renamed to Name and the column Organism_Group renamed to Group.
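As a hint, the renaming step can be sketched on a small made-up tibble. Only the two column names come from the question; the rows are invented for illustration and are not taken from the exam file.

```r
library(dplyr)
library(tibble)

# Toy stand-in for the genome size table; the rows are made up.
genomes <- tibble(
  Organism_Name  = c("Homo sapiens", "Escherichia coli", "Zea mays"),
  Organism_Group = c("Eukaryota", "Bacteria", "Eukaryota")
)

# rename() takes new_name = old_name pairs and returns a new tibble,
# leaving the original object untouched.
genomes_renamed <- genomes %>%
  rename(Name = Organism_Name, Group = Organism_Group)

# nrow() gives the number of observations (rows).
nrow(genomes_renamed)
```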
How many eukaryotes have a known number of chromosomes (higher than 0)?
Which organism has the largest number of assemblies?
Is there a significant correlation between the number of chromosomes and the genome size?
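One possible way to approach a correlation question is `cor.test()`. The sketch below runs on invented numbers, and the column names `chromosomes` and `size_mb` are hypothetical. The choice of the rank-based Spearman method is an assumption (it may suit skewed data such as genome sizes); whether it is the right choice for the real data is for you to argue.

```r
# Invented values; the column names are hypothetical, not those of the exam file.
toy <- data.frame(
  chromosomes = c(1, 8, 16, 20, 23, 24, 39),
  size_mb     = c(5, 12, 100, 2700, 3100, 1400, 2400)
)

# Spearman's rank correlation; use method = "pearson" for a linear correlation.
res <- cor.test(toy$chromosomes, toy$size_mb, method = "spearman")

res$estimate  # correlation coefficient (rho)
res$p.value   # p-value of the test against "no correlation"
```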
Create a boxplot of the genome size per organism group. Make sure the y-axis is on a log10 scale.
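A minimal sketch of a boxplot with a log10 y-axis, assuming ggplot2 and a toy data frame. The column name `size_mb` and all values are invented for illustration.

```r
library(ggplot2)

# Invented values; size_mb is a hypothetical column name.
toy <- data.frame(
  Group   = rep(c("Bacteria", "Eukaryota"), each = 4),
  size_mb = c(2, 4, 5, 8, 100, 1200, 2700, 3100)
)

# scale_y_log10() transforms the y-axis; the data themselves are unchanged.
p <- ggplot(toy, aes(x = Group, y = size_mb)) +
  geom_boxplot() +
  scale_y_log10()

p
```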
For this question you are going to work with the curated database of single cell studies described by Svensson et al. You can use the following two commands to read in this data (you don’t need to save or download the file first). Ignore any warnings.
sc <- read_tsv('http://www.nxn.se/single-cell-studies/data.tsv')
sc <- sc %>% rename_all(~str_replace_all(., '\\s+', '_'))
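To see what the second command does, here is a small sketch on a made-up tibble (the column names and values are illustrative only). Every run of whitespace in a column name is replaced by an underscore. Note that in an R string literal the backslash of the regular expression `\s+` has to be doubled.

```r
library(dplyr)
library(stringr)
library(tibble)

# Toy table with spaces in the column names (illustrative, not the real data).
toy <- tibble(
  `Reported cells total` = c(100, 2000),
  `Cell source`          = c("HEK293", "PBMC")
)

# "\\s+" in R source is the regex \s+ : one or more whitespace characters.
cleaned <- toy %>% rename_all(~ str_replace_all(., "\\s+", "_"))

names(cleaned)
```

After this step the columns can be referred to without backticks, for example `cleaned$Cell_source`.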
How many studies had 5 or more cell types or clusters? Don’t include studies where this information is missing.
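Counting rows that satisfy a threshold while excluding missing values could look like the sketch below. The column name `n_clusters` is hypothetical and the data are invented; look up the actual column name in `sc`.

```r
library(dplyr)
library(tibble)

# Invented studies; n_clusters is a hypothetical column name.
toy <- tibble(
  study      = c("A", "B", "C", "D"),
  n_clusters = c(3, 5, NA, 12)
)

# filter() already drops rows where the condition is NA;
# the explicit !is.na() only makes that intent visible.
n_studies <- toy %>%
  filter(!is.na(n_clusters), n_clusters >= 5) %>%
  nrow()

n_studies
```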
The columns Tissue and Cell_source contain information about the cell type on which the single cell experiment had been performed. What is the most common cell source for the tissue "Culture" (cultured cells)?
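Finding the most frequent value within a subset can be sketched with `filter()` and `count()`. The rows below are invented; only the column names Tissue and Cell_source come from the question.

```r
library(dplyr)
library(tibble)

# Invented rows for illustration.
toy <- tibble(
  Tissue      = c("Culture", "Culture", "Culture", "Brain"),
  Cell_source = c("HEK293", "mESC", "mESC", "Cortex")
)

# count(..., sort = TRUE) tallies each value and puts the most frequent first.
top <- toy %>%
  filter(Tissue == "Culture") %>%
  count(Cell_source, sort = TRUE)

top
```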
Create a scatter plot of the number of cells against the date. Facet by organism, and only use the four organisms with the highest number of experiments (Human, Mouse, Drosophila and Zebrafish). Make sure the y-axis is on a log10 scale.
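A faceted scatter plot restricted to a set of organisms might be sketched as follows. The column names `Date`, `Organism` and `Reported_cells_total` are assumptions, as is the date being stored as a `Date` object; check how these are actually named and stored in `sc`. All values are invented.

```r
library(dplyr)
library(ggplot2)

# Invented experiments; column names and types are assumptions.
toy <- data.frame(
  Date = as.Date(c("2015-01-01", "2016-06-01", "2017-03-15",
                   "2018-09-30", "2019-02-01", "2019-07-01")),
  Organism = c("Human", "Mouse", "Human", "Mouse", "Zebrafish", "Rat"),
  Reported_cells_total = c(96, 1500, 20000, 150000, 40000, 3000)
)

keep <- c("Human", "Mouse", "Drosophila", "Zebrafish")

# Keep only the organisms of interest before plotting.
plot_data <- toy %>% filter(Organism %in% keep)

# One panel per organism, with the y-axis on a log10 scale.
p <- ggplot(plot_data, aes(x = Date, y = Reported_cells_total)) +
  geom_point() +
  scale_y_log10() +
  facet_wrap(~ Organism)

p
```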
We have the hypothesis that publications in Immunity will have a larger number of cells than publications in Cell Reports. Use everything that you have learned to test this hypothesis. What is your conclusion?
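A directional hypothesis like this can be tested with a one-sided two-sample test. The sketch below uses a Wilcoxon rank-sum test on invented cell counts. The choice of test is an assumption: a rank-based test may suit heavily skewed counts, while a t-test on log10-transformed values could be an alternative if its assumptions hold. Inspect the distributions before deciding.

```r
# Invented cell counts per publication, for illustration only.
immunity     <- c(5200, 12000, 8000, 30000, 15000)
cell_reports <- c(900, 4000, 2500, 7000, 1100)

# H0: no shift between the groups; H1: Immunity values tend to be larger.
res <- wilcox.test(immunity, cell_reports, alternative = "greater")

res$statistic  # W: number of (Immunity, Cell Reports) pairs with Immunity larger
res$p.value    # one-sided p-value
```

A small p-value in this toy example says nothing about the real data; the conclusion has to be based on the actual publications in `sc`.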