MAGE-TAB Version	1.1						
Public Release Date	2012-11-06						
Investigation Title	RNA-sequencing and small RNA-sequencing of 465 lymphoblastoid cell lines from the 1000 Genomes						
Experiment Description	This RNA sequencing data set of 465 human lymphoblastoid cell line samples from the CEU, FIN, GBR, TSI and YRI populations from the 1000 Genomes sample collection was created by the Geuvadis consortium (www.geuvadis.org, http://www.geuvadis.org/web/geuvadis/our-rnaseq-project). The data is under embargo until the first publication by the investigators in early 2013. This accession contains all mRNA and small RNA sequencing data, clean data that passed QC and other filters are available under accession E-GEUV-1, E-GEUV-2. 						
Person Last Name	Kurbatova	Lappalainen	Dermitzakis				
Person First Name	Natalja	Tuuli	Emmanouil				
Person Mid Initials							
Person Email	natalja@ebi.ac.uk	Tuuli.Lappalainen@unige.ch	emmanouil.dermitzakis@unige.ch				
Person Affiliation	EBI	UNIGE	UNIGE				
Person Phone							
Person Fax							
Person Address							
Person Roles	submitter	investigator	investigator				
Experimental Factor Name	population	laboratory					
Experimental Factor Type	ethnic group	laboratory					
Experimental Factor Term Source REF	EFO	EFO					
Experimental Factor Term Accession Number	EFO_0001799	EFO_0004907 					
Protocol Name	P-GEUV-1	P-GEUV-2	P-GEUV-3	P-GEUV-4	P-GEUV-5	P-GEUV-6	P-GEUV-7
Protocol Description	For information about the sample characteristics, populations, and available genotype information, see http://www.1000genomes.org and http://ccr.coriell.org . EVB transformed lymphoblastoid cell lines (LCLs) were shipped to ECACC (European COllection of Cell Cultures) as live cultures (06/2011 - 09/2011), in batches of ~ 30 samples from Coriell (GBR,FIN & TSI) and 2 x ~90 samples (CEU & YRI) from University of Geneva. In ECACC, the cell lines were cultured to approximately 1.2 x 10e8 cells (06/2011-10/2011). Snap frozen pellets of 2 x 10e7 cells were produced from proliferating cultures without additives. 	Total RNA was extracted from cell pellets in the University of Geneva using the TRIzol Reagent (Ambion) according to the manufacturer's guidelines. No DNAse treatment was done to the RNA samples. RNA quality was assessed by Agilent Bioanalyzer RNA 6000 Nano Kit, and RNA quantity was measured by Qubit 2.0 (Invitrogen) using the RNA Broad range kit according to the manufacturer's instructions. 	Sample processing for sequencing was done in a random order in each of the seven laboratories. 5 samples were sequenced in every laboratory. Library preparation was done with TruSeq Sm RNA Sample Prep. 	The sequencing platform was Illumina HiSeq. The cluster generation kit was TruSeq PE Cluster Kit v3, and the sequencing kit was TruSeq SBS Kit v3. 36 bp single-end sequencing was done to the depth of about 3M total reads.	Sample processing for sequencing was done in a random order in each of the seven laboratories. Two batches of replicates were done: 5 samples were sequenced in every laboratory (twice in the University of Geneva), and additionally, 168 samples (mostly CEU and YRI samples) that were originally sequenced in the other laboratories were prepped and sequenced at about 1/2 of the normal coverage in University of Geneva. Library preparation was done with TruSeq RNA Sample Prep Kit v2 using the high-throughput protocol. 	The sequencing platform was Illumina HiSeq. The cluster generation kit was TruSeq PE Cluster Kit v3, and the sequencing kit was TruSeq SBS Kit v3. 75 bp paired-end sequencing was done to the depth of a minimum of 20M mapped reads.	The reads were mapped to the human hg19 reference genome (autosomes+X+Y+M) with the GEM mapper v 1.349 mapping entire reads, and split mapping the unmapped reads (see http://www.geuvadis.org/web/geuvadis/our-rnaseq-project for settings). In the bam files, mapping quality is scored as follows: 1) Matches which are unique, and do not have any subdominant match: 251 >= MAPQ >= 255, XT=U ; 2) Matches which are unique, and have subdominant matches but a different score: 175 >= MAPQ >= 181, XT=U ; 3) Matches which are putatively unique (not unique, but distinguishable by score): 119 >= MAPQ >= 127, XT=U ; Matches which are a perfect tie: 78 >= MAPQ >= 90, XT=R. Number of mismatches is not reflected in MAPQ but the information of total mismatches in both mates is found in the NM flag. 
Protocol Software							GEM mapper 1.349
Protocol Hardware				Illumina HiSeq 2000		Illumina HiSeq 2000	
Protocol Contact							
Protocol Type	growth protocol	nucleic acid extraction protocol	cDNA library construction protocol	nucleic acid sequencing protocol	cDNA library construction protocol	nucleic acid sequencing protocol	high throughput sequence alignment protocol
Protocol Term Source REF	EFO	EFO	EFO	EFO	EFO	EFO	EFO
Protocol Term Accession Number	EFO_0003789	EFO_0002944	EFO_0004187	EFO_0004170	EFO_0004187	EFO_0004170	EFO_0004917
Experimental Design	population based design	operator variation design					
Experimental Design Term Source REF	EFO	EFO					
Experimental Design Term Accession Number	EFO_0001430	EFO_0001772					
Comment[AEExperimentType]	RNA-seq of coding RNA	RNA-seq of non coding RNA					
Comment[ArrayExpressAccession]	E-GEUV-3						
Comment[SecondaryAccession]	ERP001941	ERP001942					
Comment[SequenceDataURI]	http://www.ebi.ac.uk/ena/data/view/ERR187488-ERR187939,ERR188021-ERR188482,ERR204819-ERR205023,ERR204766-ERR204805						
Term Source Name	NCBI Taxonomy	EFO					
Term Source Version		2.29					
Term Source File	http://www.ncbi.nlm.nih.gov/Taxonomy/	http://www.ebi.ac.uk/efo
SDRF File	E-GEUV-3.sdrf.txt
