TY - JOUR
T1 - RNAontheBENCH: computational and empirical resources for benchmarking RNAseq quantification and differential expression methods
AU - Germain, Pierre-Luc
AU - Vitriolo, Alessandro
AU - Adamo, Antonio
AU - Laise, Pasquale
AU - Das, Vivek
AU - Testa, Giuseppe
N1 - KAUST Repository Item: Exported on 2020-10-01
Acknowledgements: European Research Council [616441 – DISEASEAVATARS to G.T.]; Regione Lombardia (Ricerca Indipendente 2012); Italian Ministry of Health (Ricerca Corrente to G.T.); ERA-NET Neuron Program (to G.T. and P.L.G.); Italian Association for Cancer Research (to G.T.); EPIGEN Flagship Project of the Italian National Research Council (to G.T.); Jerome-Lejeune Foundation (to G.T.); Umberto Veronesi Foundation (fellowship to P.L.G.); Federation of European Biochemical Societies (to A.A.); Italian Foundation for Cancer Research (to P.L. and V.D.). Funding for open access charge: ERC Research Grant DISEASEAVATARS [616441].
PY - 2016/5/17
Y1 - 2016/5/17
AB - RNA sequencing (RNAseq) has become the method of choice for transcriptome analysis, yet no consensus exists as to the most appropriate pipeline for its analysis, and current benchmarks suffer from important limitations. Here, we address these challenges through a rich benchmarking resource harnessing (i) two RNAseq datasets including ERCC ExFold spike-ins; (ii) Nanostring measurements of a panel of 150 genes on the same samples; (iii) a set of internal, genetically determined controls; (iv) a reanalysis of the SEQC dataset; and (v) a focus on relative quantification (i.e. across samples). We use this resource to compare different approaches to each step of RNAseq analysis, from alignment to differential expression testing. We show that methods providing the best absolute quantification do not necessarily provide good relative quantification across samples, that count-based methods are superior for gene-level relative quantification, and that the new generation of pseudo-alignment-based software performs as well as established methods, at a fraction of the computing time. We also assess the impact of library type and size on quantification and differential expression analysis. Finally, we have created an R package and a web platform to enable the simple and streamlined application of this resource to the benchmarking of future methods.
UR - http://hdl.handle.net/10754/619769
UR - http://nar.oxfordjournals.org/lookup/doi/10.1093/nar/gkw448
UR - http://www.scopus.com/inward/record.url?scp=84976386635&partnerID=8YFLogxK
U2 - 10.1093/nar/gkw448
DO - 10.1093/nar/gkw448
M3 - Article
SN - 0305-1048
VL - 44
SP - 5054
EP - 5067
JO - Nucleic Acids Research
JF - Nucleic Acids Research
IS - 11
ER -