TY - JOUR
T1 - Transcript annotation in FANTOM3
T2 - Mouse gene catalog based on physical cDNAs
AU - Maeda, Norihiro
AU - Kasukawa, Takeya
AU - Oyama, Rieko
AU - Gough, Julian
AU - Frith, Martin
AU - Engström, Pär G.
AU - Lenhard, Boris
AU - Aturaliya, Rajith N.
AU - Batalov, Serge
AU - Beisel, Kirk W.
AU - Bult, Carol J.
AU - Fletcher, Colin F.
AU - Forrest, Alistair R.R.
AU - Furuno, Masaaki
AU - Hill, David
AU - Itoh, Masayoshi
AU - Kanamori-Katayama, Mutsumi
AU - Katayama, Shintaro
AU - Katoh, Masaru
AU - Kawashima, Tsugumi
AU - Quackenbush, John
AU - Ravasi, Timothy
AU - Ring, Brian Z.
AU - Shibata, Kazuhiro
AU - Sugiura, Koji
AU - Takenaka, Yoichi
AU - Teasdale, Rohan D.
AU - Wells, Christine A.
AU - Zhu, Yunxia
AU - Kai, Chikatoshi
AU - Kawai, Jun
AU - Hume, David A.
AU - Carninci, Piero
AU - Hayashizaki, Yoshihide
PY - 2006/4
Y1 - 2006/4
N2 - The international FANTOM consortium aims to produce a comprehensive picture of the mammalian transcriptome, based upon an extensive cDNA collection and functional annotation of full-length enriched cDNAs. The previous dataset, FANTOM2, comprised 60,770 full-length enriched cDNAs. Functional annotation revealed that this cDNA dataset contained only about half of the estimated number of mouse protein-coding genes, indicating that a number of cDNAs still remained to be collected and identified. To pursue the complete gene catalog that covers all predicted mouse genes, cloning and sequencing of full-length enriched cDNAs has been continued since FANTOM2. In FANTOM3, 42,031 newly isolated cDNAs were subjected to functional annotation, and the annotation of 4,347 FANTOM2 cDNAs was updated. To accomplish accurate functional annotation, we improved our automated annotation pipeline by introducing new coding sequence prediction programs and developed a Web-based annotation interface for simplifying the annotation procedures to reduce manual annotation errors. Automated coding sequence and function prediction was followed with manual curation and review by expert curators. A total of 102,801 full-length enriched mouse cDNAs were annotated. Out of 102,801 transcripts, 56,722 were functionally annotated as protein coding (including partial or truncated transcripts), providing to our knowledge the greatest current coverage of the mouse proteome by full-length cDNAs. The total number of distinct non-protein-coding transcripts increased to 34,030. The FANTOM3 annotation system, consisting of automated computational prediction, manual curation, and final expert curation, facilitated the comprehensive characterization of the mouse transcriptome, and could be applied to the transcriptomes of other species.
AB - The international FANTOM consortium aims to produce a comprehensive picture of the mammalian transcriptome, based upon an extensive cDNA collection and functional annotation of full-length enriched cDNAs. The previous dataset, FANTOM2, comprised 60,770 full-length enriched cDNAs. Functional annotation revealed that this cDNA dataset contained only about half of the estimated number of mouse protein-coding genes, indicating that a number of cDNAs still remained to be collected and identified. To pursue the complete gene catalog that covers all predicted mouse genes, cloning and sequencing of full-length enriched cDNAs has been continued since FANTOM2. In FANTOM3, 42,031 newly isolated cDNAs were subjected to functional annotation, and the annotation of 4,347 FANTOM2 cDNAs was updated. To accomplish accurate functional annotation, we improved our automated annotation pipeline by introducing new coding sequence prediction programs and developed a Web-based annotation interface for simplifying the annotation procedures to reduce manual annotation errors. Automated coding sequence and function prediction was followed with manual curation and review by expert curators. A total of 102,801 full-length enriched mouse cDNAs were annotated. Out of 102,801 transcripts, 56,722 were functionally annotated as protein coding (including partial or truncated transcripts), providing to our knowledge the greatest current coverage of the mouse proteome by full-length cDNAs. The total number of distinct non-protein-coding transcripts increased to 34,030. The FANTOM3 annotation system, consisting of automated computational prediction, manual curation, and final expert curation, facilitated the comprehensive characterization of the mouse transcriptome, and could be applied to the transcriptomes of other species.
UR - http://www.scopus.com/inward/record.url?scp=33646490270&partnerID=8YFLogxK
U2 - 10.1371/journal.pgen.0020062
DO - 10.1371/journal.pgen.0020062
M3 - Article
C2 - 16683036
AN - SCOPUS:33646490270
SN - 1553-7390
VL - 2
SP - 498
EP - 503
JO - PLOS Genetics
JF - PLOS Genetics
IS - 4
M1 - e62
ER -