Identification and classification of promoters using the attention mechanism based on long short-term memory

Authors: Qingwen Li, Lichao Zhang, Lei Xu, Quan Zou, Jin Wu, Qingyuan Li

Abstract

A promoter is a short region of DNA that binds RNA polymerase and initiates gene transcription. It is usually located directly upstream of the transcription start site. DNA promoters have been linked to many human diseases, notably diabetes, cancer, and Huntington’s disease. Promoter classification has therefore become an active problem in bioinformatics and has attracted many researchers. Various methods have been proposed, but their performance still leaves room for improvement. In this work, we segment each DNA sequence into k-mers, train a word vector model on the resulting tokens, feed the vectors into a long short-term memory (LSTM) network, and use an attention mechanism to make the prediction. Our method achieves cross-validation accuracies of 93.45% and 90.59% in the two layers, respectively. These results are better than those of other methods evaluated on the same data set and offer some ideas for predicting promoters accurately. In addition, this work suggests that natural language processing can play a significant role in biological sequence prediction.
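
The pipeline in the abstract (k-mer segmentation, then word vectors, then an LSTM with attention) can be illustrated with a minimal sketch. It covers only the two steps that need no training: splitting a sequence into k-mers and pooling a sequence of hidden vectors with softmax attention. The abstract does not give the value of k, the stride, or the form of the attention score, so `k=3`, `stride=1`, the dot-product scoring, and the function names `kmer_tokens` and `attention_pool` are all assumptions made for illustration, not the authors' implementation. The word vector model and the LSTM are omitted; the hidden states below are hand-written placeholders.

```python
import math


def kmer_tokens(sequence, k=3, stride=1):
    """Split a DNA sequence into k-mer "words".

    k and stride are illustrative assumptions; with stride < k the
    k-mers overlap.
    """
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]


def attention_pool(hidden_states, query):
    """Pool hidden states into one context vector with softmax attention.

    Scores are plain dot products with a query vector (an assumed,
    simple scoring function). Returns (weights, context).
    """
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    # Subtract the maximum score before exponentiating for numerical stability.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context


tokens = kmer_tokens("ATGCGT", k=3)

# Placeholder hidden states, one per token (in a real model these would
# come from the LSTM run over the k-mer embeddings).
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
weights, context = attention_pool(hidden, query=[1.0, 1.0])
```

In a full model, the context vector would be passed to a classification layer, and the query would be a learned parameter rather than a fixed vector.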

Keywords: promoter, bioinformatics, natural language processing, attention mechanism

Review process:

Official paper URL: https://doi.org/10.1007/s11704-021-0548-9