MAGE-TAB Version	1.1				
Public Release Date	2012-11-06				
Investigation Title	RNA-sequencing of 465 lymphoblastoid cell lines from the 1000 Genomes				
Experiment Description	This RNA sequencing data set of 465 human lymphoblastoid cell line samples from the CEU, FIN, GBR, TSI and YRI populations from the 1000 Genomes sample collection was created by the Geuvadis consortium (www.geuvadis.org, http://www.geuvadis.org/web/geuvadis/our-rnaseq-project). The data is under embargo until the first publication by the investigators in early 2013. This accession contains mRNA sequencing data, and small RNA data from the same samples are available under accession E-GEUV-2. 				
Person Last Name	Kurbatova	Lappalainen	Dermitzakis		
Person First Name	Natalja	Tuuli	Emmanouil		
Person Mid Initials					
Person Email	natalja@ebi.ac.uk	Tuuli.Lappalainen@unige.ch	emmanouil.dermitzakis@unige.ch		
Person Affiliation	EBI	UNIGE	UNIGE		
Person Phone					
Person Fax					
Person Address					
Person Roles	submitter	investigator	investigator		
Experimental Factor Name	population	laboratory			
Experimental Factor Type	ethnic group	laboratory			
Experimental Factor Term Source REF	EFO	EFO			
Experimental Factor Term Accession Number	EFO_0001799	EFO_0004907			
Protocol Name	P-GEUV-1	P-GEUV-2	P-GEUV-5	P-GEUV-6	P-GEUV-7
Protocol Description	For information about the sample characteristics, populations, and available genotype information, see http://www.1000genomes.org and http://ccr.coriell.org . EBV-transformed lymphoblastoid cell lines (LCLs) were shipped to ECACC (European Collection of Cell Cultures) as live cultures (06/2011 - 09/2011), in batches of ~30 samples from Coriell (GBR, FIN & TSI) and 2 x ~90 samples (CEU & YRI) from University of Geneva. At ECACC, the cell lines were cultured to approximately 1.2 x 10e8 cells (06/2011 - 10/2011). Snap-frozen pellets of 2 x 10e7 cells were produced from proliferating cultures without additives.	Total RNA was extracted from cell pellets at the University of Geneva using the TRIzol Reagent (Ambion) according to the manufacturer's guidelines. No DNase treatment was applied to the RNA samples. RNA quality was assessed with the Agilent Bioanalyzer RNA 6000 Nano Kit, and RNA quantity was measured with Qubit 2.0 (Invitrogen) using the RNA Broad Range kit according to the manufacturer's instructions.	Sample processing for sequencing was done in a random order in each of the seven laboratories. Two batches of replicates were done: 5 samples were sequenced in every laboratory (twice in the University of Geneva), and additionally, 168 samples (mostly CEU and YRI samples) that were originally sequenced in the other laboratories were prepped and sequenced at about 1/2 of the normal coverage in University of Geneva. Library preparation was done with the TruSeq RNA Sample Prep Kit v2 using the high-throughput protocol.	The sequencing platform was Illumina HiSeq. The cluster generation kit was TruSeq PE Cluster Kit v3, and the sequencing kit was TruSeq SBS Kit v3. Paired-end sequencing (75 bp) was done to a minimum depth of 20M mapped reads.	The reads were mapped to the human hg19 reference genome (autosomes+X+Y+M) with the GEM mapper v1.349, mapping entire reads and split-mapping the unmapped reads (see http://www.geuvadis.org/web/geuvadis/our-rnaseq-project for settings). In the BAM files, mapping quality is scored as follows: 1) Matches which are unique, and do not have any subdominant match: 251 <= MAPQ <= 255, XT=U ; 2) Matches which are unique, and have subdominant matches but a different score: 175 <= MAPQ <= 181, XT=U ; 3) Matches which are putatively unique (not unique, but distinguishable by score): 119 <= MAPQ <= 127, XT=U ; 4) Matches which are a perfect tie: 78 <= MAPQ <= 90, XT=R. The number of mismatches is not reflected in MAPQ, but the total number of mismatches in both mates is given in the NM tag.
Protocol Software					GEM mapper 1.349
Protocol Hardware				Illumina HiSeq 2000	
Protocol Contact					
Protocol Type	growth protocol	nucleic acid extraction protocol	cDNA library construction protocol	nucleic acid sequencing protocol	high throughput sequence alignment protocol
Protocol Term Source REF	EFO	EFO	EFO	EFO	EFO
Protocol Term Accession Number	EFO_0003789	EFO_0002944	EFO_0004187	EFO_0004170	EFO_0004917
Experimental Design	population based design	operator variation design			
Experimental Design Term Source REF	EFO	EFO			
Experimental Design Term Accession Number	EFO_0001430	EFO_0001772			
Comment[AEExperimentType]	transcription profiling by high throughput sequencing				
Comment[SecondaryAccession]	ERP001942
Comment[SequenceDataURI]	http://www.ebi.ac.uk/ena/data/view/ERR188021-ERR188482
Comment[ArrayExpressAccession]	E-GEUV-1				
Term Source Name	NCBI Taxonomy	EFO			
Term Source Version		2.29			
Term Source File	http://www.ncbi.nlm.nih.gov/Taxonomy/	http://www.ebi.ac.uk/efo			
SDRF File	E-GEUV-1.sdrf.txt
