TY - JOUR
T1 - Sequencing, analysis, and annotation of expressed sequence tags for Camelus dromedarius
AU - Al-Swailem, Abdulaziz M.
AU - Shehata, Maher M.
AU - Abu-Duhier, Faisel M.
AU - Al-Yamani, Essam J.
AU - Al-Busadah, Khalid A.
AU - Al-Arawi, Mohammed S.
AU - Al-Khider, Ali Y.
AU - Al-Muhaimeed, Abdullah N.
AU - Al-Qahtani, Fahad H.
AU - Manee, Manee M.
AU - Al-Shomrani, Badr M.
AU - Al-Qhtani, Saad M.
AU - Al-Harthi, Amer S.
AU - Akdemir, Kadir C.
AU - Inan, Mehmet S.
AU - Otu, Hasan H.
PY - 2010
Y1 - 2010
AB - Despite its economical, cultural, and biological importance, there has not been a large scale sequencing project to date for Camelus dromedarius. With the goal of sequencing complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, cluster and assembly, we obtained 23,602 putative gene sequences, out of which over 4,500 potentially novel or fast evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and ∼40% hits extending to the start codons of full length cDNAs suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. Accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools with possibility to add sequences from public domain. We anticipate our results to provide a home base for genomic studies of camel and other comparative studies enabling a starting point for whole genome sequencing of the organism.
UR - http://www.scopus.com/inward/record.url?scp=77956287507&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0010720
DO - 10.1371/journal.pone.0010720
M3 - Article
C2 - 20502665
AN - SCOPUS:77956287507
SN - 1932-6203
VL - 5
JO - PLoS ONE
JF - PLoS ONE
IS - 5
M1 - e10720
ER -