High-throughput screening (HTS) experimentally tests a large number of chemical compounds to identify those that are active in a given assay. Alternatively, faster and cheaper large-scale virtual screening can be performed computationally using quantitative structure-activity relationship (QSAR) models. However, the vast amount of heterogeneous HTS data available and the imbalanced ratio of active to inactive compounds in an assay make this a challenging problem. Although various QSAR models have been proposed, they have certain limitations, e.g., high false positive rates, complicated user interfaces, and limited usage options. Therefore, we developed DPubChem, a novel web tool for deriving QSAR models that implements state-of-the-art machine-learning techniques to enhance the precision of the models and to enable efficient analysis of experiments from the PubChem BioAssay database. DPubChem also has a simple interface that provides users with various options. DPubChem predicted active compounds for 300 datasets with an average geometric mean and F1 score of 76.68% and 76.53%, respectively. Furthermore, DPubChem builds interaction networks that highlight novel predicted links between chemical compounds and biological assays. Using such a network, DPubChem successfully suggested a novel drug for Niemann-Pick type C disease. DPubChem is freely available at www.cbrc.kaust.edu.sa/dpubchem.
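The two reported metrics are common choices for imbalanced active/inactive data. The abstract does not give their formulas, so the sketch below assumes the standard definitions: geometric mean = sqrt(sensitivity × specificity), and F1 = harmonic mean of precision and sensitivity. The function name `gmean_and_f1` and the confusion-matrix counts in the usage note are hypothetical, not taken from DPubChem.

```python
import math


def gmean_and_f1(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    """Standard G-mean and F1 from confusion-matrix counts (assumed definitions).

    Actives are treated as the positive class. Zero denominators yield 0.0.
    """
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on actives
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on inactives
    precision = tp / (tp + fp) if tp + fp else 0.0    # fraction of predicted actives that are active
    gmean = math.sqrt(sensitivity * specificity)
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return gmean, f1


# Hypothetical imbalanced assay: 50 actives, 950 inactives.
g, f = gmean_and_f1(tp=40, fp=50, tn=900, fn=10)
print(f"G-mean = {g:.4f}, F1 = {f:.4f}")
```

With these made-up counts, the G-mean is about 0.87 because both classes are recalled well. F1 drops to 4/7 ≈ 0.57, because 50 false positives outnumber the 40 true positives. This illustrates why a high false positive rate, one of the limitations noted above, weighs heavily on F1 for imbalanced assays.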