TY - JOUR
T1 - Assessing the consistency of public human tissue RNA-seq data sets
AU - Danielsson, Frida
AU - James, Tojo
AU - Gomez-Cabrero, David
AU - Huss, Mikael
N1 - Generated from Scopus record by KAUST IRTS on 2021-02-16
PY - 2015/2/6
Y1 - 2015/2/6
N2 - Sequencing-based gene expression methods like RNA-sequencing (RNA-seq) have become increasingly common, but it is often claimed that results obtained in different studies are not comparable owing to the influence of laboratory batch effects, differences in RNA extraction and sequencing library preparation methods and bioinformatics processing pipelines. It would be unfortunate if different experiments were in fact incomparable, as there is great promise in data fusion and meta-analysis applied to sequencing data sets. We therefore compared reported gene expression measurements for ostensibly similar samples (specifically, human brain, heart and kidney samples) in several different RNA-seq studies to assess their overall consistency and to examine the factors contributing most to systematic differences. The same comparisons were also performed after preprocessing all data in a consistent way, eliminating potential bias from bioinformatics pipelines. We conclude that published human tissue RNA-seq expression measurements appear relatively consistent in the sense that samples cluster by tissue rather than laboratory of origin given simple preprocessing transformations. The article is supplemented by a detailed walkthrough with embedded R code and figures.
AB - Sequencing-based gene expression methods like RNA-sequencing (RNA-seq) have become increasingly common, but it is often claimed that results obtained in different studies are not comparable owing to the influence of laboratory batch effects, differences in RNA extraction and sequencing library preparation methods and bioinformatics processing pipelines. It would be unfortunate if different experiments were in fact incomparable, as there is great promise in data fusion and meta-analysis applied to sequencing data sets. We therefore compared reported gene expression measurements for ostensibly similar samples (specifically, human brain, heart and kidney samples) in several different RNA-seq studies to assess their overall consistency and to examine the factors contributing most to systematic differences. The same comparisons were also performed after preprocessing all data in a consistent way, eliminating potential bias from bioinformatics pipelines. We conclude that published human tissue RNA-seq expression measurements appear relatively consistent in the sense that samples cluster by tissue rather than laboratory of origin given simple preprocessing transformations. The article is supplemented by a detailed walkthrough with embedded R code and figures.
UR - https://academic.oup.com/bib/article-lookup/doi/10.1093/bib/bbv017
UR - http://www.scopus.com/inward/record.url?scp=84941241201&partnerID=8YFLogxK
U2 - 10.1093/bib/bbv017
DO - 10.1093/bib/bbv017
M3 - Article
SN - 1477-4054
VL - 16
SP - 941
EP - 949
JO - Briefings in Bioinformatics
JF - Briefings in Bioinformatics
IS - 6
ER -