NeuralTE: an accurate approach for Transposable Element superfamily classification with multi-feature fusion

Kang Hu, Minghua Xu, Xin Gao, Jianxin Wang*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Transposable Elements (TEs), which make up a significant portion of the genomes in most eukaryotic organisms, can be classified into various superfamilies based on their sequence and structural characteristics. Accurate TE classification at the superfamily level can reveal their distribution and abundance across genomes, providing deeper insights into species variation and evolution. Recent advancements in third-generation sequencing technologies have made a large number of genomes from non-model species available. However, existing TE classification methods suffer from several limitations, including the need to train multiple hierarchical classification models, the inability to classify at the superfamily level, and deficiencies in both accuracy and robustness. Therefore, there is an urgent need for an accurate TE classification method to improve genome annotation. In this study, we develop NeuralTE, a deep learning method designed to classify TEs at the superfamily level. To achieve accurate TE classification, we identify various structural features of TEs and use different combinations of k-mers for terminal repeats and internal sequences to uncover distinct patterns. Evaluation on all TEs from Repbase shows that NeuralTE outperforms existing machine learning and homology-based methods in classifying TEs. Testing on TEs from novel species also shows that NeuralTE performs better than existing methods. We also conduct TE annotation experiments on rice using different classification tools, and the results show that NeuralTE achieves annotations nearly identical to the gold standard, highlighting its robustness and accuracy in classifying TEs. NeuralTE is publicly available at https://github.com/CSU-KangHu/NeuralTE.
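The abstract mentions using different combinations of k-mers for terminal repeats and internal sequences. As a rough illustration of that general idea only, the sketch below computes normalized k-mer frequencies and concatenates them into one feature vector. It is not NeuralTE's actual implementation: the function names `kmer_frequencies` and `fuse_features`, the default k values, and the example sequences are all hypothetical choices made for this example.

```python
from itertools import product


def kmer_frequencies(seq, k):
    """Return normalized k-mer frequencies over ACGT, in lexicographic k-mer order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows with N or other ambiguous bases
            counts[kmer] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


def fuse_features(terminal, internal, terminal_ks=(1, 2), internal_ks=(3,)):
    """Concatenate k-mer frequency vectors for a terminal repeat and an internal sequence.

    The k values are illustrative assumptions, not values taken from the paper.
    """
    features = []
    for k in terminal_ks:
        features.extend(kmer_frequencies(terminal, k))
    for k in internal_ks:
        features.extend(kmer_frequencies(internal, k))
    return features


if __name__ == "__main__":
    # Hypothetical toy sequences, not real TE data.
    vector = fuse_features("ACGTACGT", "ACGTTTGACCA")
    print(len(vector))
```

With the assumed defaults, the fused vector has 4 + 16 + 64 entries, one per possible 1-mer and 2-mer of the terminal repeat and 3-mer of the internal sequence. A vector of this kind could then be passed to any classifier alongside other features.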

Original language: English (US)
Title of host publication: ACM-BCB 2024 - 15th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9798400713026
State: Published - Dec 16 2024
Event: 15th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM-BCB 2024 - Shenzhen, China
Duration: Nov 22 2024 to Nov 25 2024

Publication series

Name: ACM-BCB 2024 - 15th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics

Conference

Conference: 15th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM-BCB 2024
Country/Territory: China
City: Shenzhen
Period: 11/22/24 to 11/25/24

Keywords

  • genome annotation
  • multi-feature fusion
  • superfamily level
  • Transposable Element

ASJC Scopus subject areas

  • Computer Science Applications
  • Software
  • Biomedical Engineering
  • Health Informatics
