TY - JOUR
T1 - The Transcript-centric Mutations in Human Genomes
AU - Cui, Peng
AU - Lin, Qiang
AU - Ding, Feng
AU - Hu, Songnian
AU - Yu, Jun
N1 - Funding Information:
This work was supported by grants from the National Basic Research Program (973 Program; 2011CB944100 and 2011CB944101), National Natural Science Foundation of China (90919024) awarded to JY, and Knowledge Innovation Program of the Chinese Academy of Sciences (KSCX2-EW-R-01-04) to SH.
PY - 2012/2
Y1 - 2012/2
N2 - Since the human genome is mostly transcribed, genetic variations must exhibit sequence signatures reflecting the relationship between transcription processes and chromosomal structures, as we have observed in unicellular organisms. In this study, a set of 646 ubiquitous expression-invariable genes (EIGs) that are present in germline cells was defined and examined based on RNA-sequencing data from multiple high-throughput transcriptomic datasets. We demonstrated a relationship between gene expression level and transcript-centric mutations in the human genome based on single nucleotide polymorphism (SNP) data. A significant positive correlation was shown between gene expression and mutation: highly expressed genes accumulate more mutations than lowly expressed genes. Furthermore, we found four major types of transcript-centric mutations in human genomes (C→T, A→G, C→G, and G→T) and identified a negative gradient of sequence variations running from the 5' end to the 3' end of the transcription units (TUs). The periodic occurrence of these genetic variations across TUs is associated with nucleosome phasing. We propose that transcript-centric mutations are one of the major driving forces of gene and genome evolution, along with the creation of new genes, gene/genome duplication, and horizontal gene transfer.
AB - Since the human genome is mostly transcribed, genetic variations must exhibit sequence signatures reflecting the relationship between transcription processes and chromosomal structures, as we have observed in unicellular organisms. In this study, a set of 646 ubiquitous expression-invariable genes (EIGs) that are present in germline cells was defined and examined based on RNA-sequencing data from multiple high-throughput transcriptomic datasets. We demonstrated a relationship between gene expression level and transcript-centric mutations in the human genome based on single nucleotide polymorphism (SNP) data. A significant positive correlation was shown between gene expression and mutation: highly expressed genes accumulate more mutations than lowly expressed genes. Furthermore, we found four major types of transcript-centric mutations in human genomes (C→T, A→G, C→G, and G→T) and identified a negative gradient of sequence variations running from the 5' end to the 3' end of the transcription units (TUs). The periodic occurrence of these genetic variations across TUs is associated with nucleosome phasing. We propose that transcript-centric mutations are one of the major driving forces of gene and genome evolution, along with the creation of new genes, gene/genome duplication, and horizontal gene transfer.
KW - Genetic variations
KW - RNA-seq
KW - Sequence signatures
UR - http://www.scopus.com/inward/record.url?scp=84863381890&partnerID=8YFLogxK
U2 - 10.1016/S1672-0229(11)60029-6
DO - 10.1016/S1672-0229(11)60029-6
M3 - Article
C2 - 22449397
AN - SCOPUS:84863381890
SN - 1672-0229
VL - 10
SP - 11
EP - 22
JO - Genomics, Proteomics and Bioinformatics
JF - Genomics, Proteomics and Bioinformatics
IS - 1
ER -