TY - JOUR
T1 - Discovering trends and hotspots of biosafety and biosecurity research via machine learning
AU - Guan, Renchu
AU - Pang, Haoyu
AU - Liang, Yanchun
AU - Shao, Zhongjun
AU - Gao, Xin
AU - Xu, Dong
AU - Feng, Xiaoyue
N1 - KAUST Repository Item: Exported on 2022-05-25
Acknowledged KAUST grant number(s): FCC/1/1976-23-01, REI/1/4473-01-01, URF/1/4098-01-01
Acknowledgements: Our work is supported by the National Key Research and Development Program of China No. 2021YFF1201203 and No. 2021YFF1201205, the National Natural Science Foundation of China No. 61972174 and No. 62172187, the Science and Technology Planning Project of Guangdong Province No. 2020A0505100018, Guangdong Universities’ Innovation Team Project (2021KCXTD015) and Guangdong Key Disciplines Project (2021ZDJS138), the Tencent Rhino-Bird Research Program, and the Office of Research Administration (ORA) at King Abdullah University of Science and Technology (KAUST) under award numbers FCC/1/1976-23-01, URF/1/4098-01-01 and REI/1/4473-01-01.
PY - 2022/05/22
Y1 - 2022/05/22
AB - Coronavirus disease 2019 (COVID-19) has infected hundreds of millions of people and killed millions of them. As an RNA virus, COVID-19 is more susceptible to variation than other viruses. Many problems involved in this epidemic have made biosafety and biosecurity (hereafter collectively referred to as 'biosafety') a popular and timely topic globally. Biosafety research covers a broad and diverse range of topics, and it is important to quickly identify hotspots and trends in biosafety research through big data analysis. However, the data-driven literature on biosafety research discovery is quite scant. We developed a novel topic model based on latent Dirichlet allocation, affinity propagation clustering and the PageRank algorithm (LDAPR) to extract knowledge from biosafety research publications from 2011 to 2020. Then, we conducted hotspot and trend analysis with LDAPR and carried out further studies, including annual hot topic extraction, a 10-year keyword evolution trend analysis, topic map construction, hot region discovery and fine-grained correlation analysis of interdisciplinary research topic trends. These analyses revealed valuable information that can guide epidemic prevention work: (1) the research enthusiasm over a certain infectious disease not only is related to its epidemic characteristics but also is affected by the progress of research on other diseases, and (2) infectious diseases are not only strongly related to their corresponding microorganisms but also potentially related to other specific microorganisms.
UR - http://hdl.handle.net/10754/678139
UR - https://academic.oup.com/bib/advance-article/doi/10.1093/bib/bbac194/6590367
U2 - 10.1093/bib/bbac194
DO - 10.1093/bib/bbac194
M3 - Article
C2 - 35596953
SN - 1467-5463
JO - Briefings in Bioinformatics
JF - Briefings in Bioinformatics
ER -