TY - JOUR
T1 - Accurate transcriptome-wide identification and quantification of alternative polyadenylation from RNA-seq data with APAIQ
AU - Long, Yongkang
AU - Zhang, Bin
AU - Tian, Shuye
AU - Chan, Jia Jia
AU - Zhou, Juexiao
AU - Li, Zhongxiao
AU - Li, Yisheng
AU - An, Zheng
AU - Liao, Xingyu
AU - Wang, Yu
AU - Sun, Shiwei
AU - Xu, Ying
AU - Tay, Yvonne
AU - Chen, Wei
AU - Gao, Xin
N1 - KAUST Repository Item: Exported on 2023-05-01
Acknowledged KAUST grant number(s): FCC/1/1976-44-01, FCC/1/1976-45-01, REI/1/4940-01-01, REI/1/5202-01-01, URF/1/4098-01-01, URF/1/4352-01-01, URF/1/4379-01-01, URF/1/4663-01-01
Acknowledgements: We thank all past and present members of the Structural and Functional Bioinformatics Group for their assistance and constructive feedback on this project. We also thank KAUST-HPC for providing generous support with computational resources. This work was supported by the King Abdullah University of Science and Technology (KAUST) Office of Research Administration (ORA) under Award Nos. FCC/1/1976-44-1, FCC/1/1976-44-01, FCC/1/1976-45-01, URF/1/4098-01-01, URF/1/4352-01-01, URF/1/4379-01-01, URF/1/4663-01-01, REI/1/5202-01-01, and REI/1/4940-01-01; the National Key Research and Development Program of China (Grant No. 2021YFF1201000); the National Natural Science Foundation of China (Grant Nos. 62002388, 32100431, 31970601); the Shenzhen Science and Technology Program (Grant No. KQTD20180411143432337); and the Shenzhen–Hong Kong Institute of Brain Science–Shenzhen Fundamental Research Institutions (Grant No. 2021SHIBS0002).
PY - 2023/4/28
Y1 - 2023/4/28
N2 - Alternative polyadenylation (APA) enables a gene to generate multiple transcripts with different 3′ ends, a process that is dynamic across cell types and conditions. Many computational methods have been developed to characterize sample-specific APA using the corresponding RNA-seq data, but they suffer from high error rates in both polyadenylation site (PAS) identification and quantification of PAS usage (PAU), and from a bias toward 3′ untranslated regions. Here we developed a tool for APA identification and quantification (APAIQ) from RNA-seq data, which can accurately identify PAS and quantify PAU in a transcriptome-wide manner. Using 3′ end-seq data as the benchmark, we showed that APAIQ outperforms current methods, including DaPars2, Aptardi, mountainClimber, SANPolyA, and QAPA, on PAS identification and PAU quantification. Finally, applying APAIQ to 421 RNA-seq samples from liver cancer patients, we identified >540 tumor-associated APA events and experimentally validated two intronic polyadenylation candidates, demonstrating its capacity to unveil cancer-related APA in a large-scale RNA-seq data set.
AB - Alternative polyadenylation (APA) enables a gene to generate multiple transcripts with different 3′ ends, a process that is dynamic across cell types and conditions. Many computational methods have been developed to characterize sample-specific APA using the corresponding RNA-seq data, but they suffer from high error rates in both polyadenylation site (PAS) identification and quantification of PAS usage (PAU), and from a bias toward 3′ untranslated regions. Here we developed a tool for APA identification and quantification (APAIQ) from RNA-seq data, which can accurately identify PAS and quantify PAU in a transcriptome-wide manner. Using 3′ end-seq data as the benchmark, we showed that APAIQ outperforms current methods, including DaPars2, Aptardi, mountainClimber, SANPolyA, and QAPA, on PAS identification and PAU quantification. Finally, applying APAIQ to 421 RNA-seq samples from liver cancer patients, we identified >540 tumor-associated APA events and experimentally validated two intronic polyadenylation candidates, demonstrating its capacity to unveil cancer-related APA in a large-scale RNA-seq data set.
UR - http://hdl.handle.net/10754/691300
UR - http://genome.cshlp.org/lookup/doi/10.1101/gr.277177.122
U2 - 10.1101/gr.277177.122
DO - 10.1101/gr.277177.122
M3 - Article
C2 - 37117035
SN - 1088-9051
JO - Genome Research
JF - Genome Research
ER -