TY - GEN
T1 - Temporal positive-unlabeled learning for biomedical hypothesis generation via risk estimation
AU - Akujuobi, Uchenna Thankgod
AU - Chen, Jun
AU - Elhoseiny, Mohamed
AU - Spranger, Michael
AU - Zhang, Xiangliang
N1 - KAUST Repository Item: Exported on 2021-08-19
Acknowledged KAUST grant number(s): URF/1/1976, NSFC no.61828302
Acknowledgements: The research reported in this publication was supported by funding from the Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), under award number URF/1/1976-31-01, and NSFC No. 61828302. Additional revenue related to this work: student internship at Sony Computer Science Laboratories Inc. We would like to acknowledge the great contribution of Sucheendra K. Palaniappan and The Systems Biology Institute to this work for the initial problem definition and data collection.
PY - 2020/1/1
Y1 - 2020/1/1
N2 - Understanding the relationships between biomedical terms like viruses, drugs, and symptoms is essential in the fight against diseases. Many attempts have been made to introduce the use of machine learning to the scientific process of hypothesis generation (HG), which refers to the discovery of meaningful implicit connections between biomedical terms. However, most existing methods fail to truly capture the temporal dynamics of scientific term relations and also assume unobserved connections to be irrelevant (i.e., in a positive-negative (PN) learning setting). To break these limits, we formulate this HG problem as a future connectivity prediction task on a dynamic attributed graph via positive-unlabeled (PU) learning. Then, the key is to capture the temporal evolution of node pair (term pair) relations from just the positive and unlabeled data. We propose a variational inference model to estimate the positive prior, and incorporate it in the learning of node pair embeddings, which are then used for link prediction. Experimental results on real-world biomedical term relationship datasets and case study analyses on a COVID-19 dataset validate the effectiveness of the proposed model.
UR - http://hdl.handle.net/10754/670669
UR - https://proceedings.neurips.cc/paper/2020/hash/310614fca8fb8e5491295336298c340f-Abstract.html
UR - http://www.scopus.com/inward/record.url?scp=85108410206&partnerID=8YFLogxK
M3 - Conference contribution
BT - 34th Conference on Neural Information Processing Systems, NeurIPS 2020
PB - Neural Information Processing Systems Foundation
ER -