TY - JOUR
T1 - Improved genome assembly and evidence-based global gene model set for the chordate Ciona intestinalis
T2 - New insight into intron and operon populations
AU - Satou, Yutaka
AU - Mineta, Katsuhiko
AU - Ogasawara, Michio
AU - Sasakura, Yasunori
AU - Shoguchi, Eiichi
AU - Ueno, Keisuke
AU - Yamada, Lixy
AU - Matsumoto, Jun
AU - Wasserscheid, Jessica
AU - Dewar, Ken
AU - Wiley, Graham B.
AU - Macmil, Simone L.
AU - Roe, Bruce A.
AU - Zeller, Robert W.
AU - Hastings, Kenneth E.M.
AU - Lemaire, Patrick
AU - Lindquist, Erika
AU - Endo, Toshinori
AU - Hotta, Kohji
AU - Inaba, Kazuo
N1 - Funding Information: The assembly and gene model optimization work was supported by BIRD of Japan Science and Technology Agency. Primary sequence data were acquired under programs funded by Grants-in-aid from MEXT, Japan (YS, No. 17687022), the National Science Foundation (RWZ), and the Canadian Institutes of Health Research (KD and KEMH, MOP-77708).
PY - 2008/10/14
Y1 - 2008/10/14
N2 - Background: The draft genome sequence of the ascidian Ciona intestinalis, along with associated gene models, has been a valuable research resource. However, recently accumulated expressed sequence tag (EST)/cDNA data have revealed numerous inconsistencies with the gene models due in part to intrinsic limitations in gene prediction programs and in part to the fragmented nature of the assembly. Results: We have prepared a less-fragmented assembly on the basis of scaffold-joining guided by paired-end EST and bacterial artificial chromosome (BAC) sequences, and BAC chromosomal in situ hybridization data. The new assembly (115.2 Mb) is similar in length to the initial assembly (116.7 Mb) but contains 1,272 (approximately 50%) fewer scaffolds. The largest scaffold in the new assembly incorporates 95 initial-assembly scaffolds. In conjunction with the new assembly, we have prepared a greatly improved global gene model set strictly correlated with the extensive currently available EST data. The total gene number (15,254) is similar to that of the initial set (15,582), but the new set includes 3,330 models at genomic sites where none were present in the initial set, and 1,779 models that represent fusions of multiple previously incomplete models. In approximately half, 5′-ends were precisely mapped using 5′-full-length ESTs, an important refinement even in otherwise unchanged models. Conclusion: Using these new resources, we identify a population of non-canonical (non-GT-AG) introns and also find that approximately 20% of Ciona genes reside in operons and that operons contain a high proportion of single-exon genes. Thus, the present dataset provides an opportunity to analyze the Ciona genome much more precisely than ever.
UR - http://www.scopus.com/inward/record.url?scp=55349149407&partnerID=8YFLogxK
U2 - 10.1186/gb-2008-9-10-r152
DO - 10.1186/gb-2008-9-10-r152
M3 - Article
C2 - 18854010
AN - SCOPUS:55349149407
SN - 1474-7596
VL - 9
JO - Genome Biology
JF - Genome Biology
IS - 10
M1 - R152
ER -