TY - JOUR
T1 - NLR-parser: Rapid annotation of plant NLR complements
AU - Steuernagel, Burkhard
AU - Jupe, Florian
AU - Witek, Kamil
AU - Jones, Jonathan D.G.
AU - Wulff, Brande B.H.
N1 - Generated from Scopus record by KAUST IRTS on 2023-02-20
PY - 2015/5/15
Y1 - 2015/5/15
AB - Motivation: The repetitive nature of plant disease resistance genes encoding nucleotide-binding leucine-rich repeat (NLR) proteins hampers their prediction with standard gene annotation software. Motif alignment and search tool (MAST) has previously been reported as a tool to support annotation of NLR-encoding genes. However, the decision whether a motif combination represents an NLR protein was entirely manual. Results: The NLR-parser pipeline is designed to use the MAST output from six-frame translated amino acid sequences and filters for predefined biologically curated motif compositions. Input reads can be derived from, for example, raw long-read sequencing data or contigs and scaffolds coming from plant genome projects. The output is a tab-separated file with information on the start and frame of the first NLR-specific motif, whether the identified sequence is a TNL or CNL, and whether it is potentially full or fragmented. In addition, the output of the NB-ARC domain sequence can directly be used for phylogenetic analyses. In comparison to other prediction software, the highly complex NB-ARC domain is described in detail using several individual motifs.
UR - https://academic.oup.com/bioinformatics/article/31/10/1665/177009
UR - http://www.scopus.com/inward/record.url?scp=84929620494&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btv005
DO - 10.1093/bioinformatics/btv005
M3 - Article
SN - 1460-2059
VL - 31
SP - 1665
EP - 1667
JO - Bioinformatics
JF - Bioinformatics
IS - 10
ER -