TY - GEN
T1 - Exploring Sentiment as a Potential Indicator of Bias in Disease Ontologies
AU - Slater, Luke T.
AU - Williams, John A.
AU - Schofield, Paul N.
AU - Gkoutos, Georgios V.
N1 - KAUST Repository Item: Exported on 2022-01-19
Acknowledged KAUST grant number(s): OSR, URF/1/3790-01-01
Acknowledgements: GVG and LTS acknowledge support from the NIHR Birmingham ECMC, NIHR Birmingham SRMRC, Nanocommons H2020-EU (731032) and the NIHR Birmingham Biomedical Research Centre. GVG, LTS, and JAW acknowledge support from the MRC HDR UK (HDRUK/CFC/01), an initiative funded by UK Research and Innovation, Department of Health and Social Care (England) and the devolved administrations, and leading medical research charities. The views expressed in this publication are those of the authors and not necessarily those of the NHS, the National Institute for Health Research, the Medical Research Council or the Department of Health. GVG and PNS are supported by funding from King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award No. URF/1/3790-01-01, and both acknowledge the support of The Alan Turing Institute.
This publication acknowledges KAUST support, but has no KAUST affiliated authors.
PY - 2021/12/9
Y1 - 2021/12/9
N2 - Ontologies are fundamental tools for the organisation and analysis of biomedical data. One of their roles is as controlled domain vocabularies, providing standardised language and categorisation for relevant domain concepts. As such, ontologies frequently include a wealth of natural language metadata including labels and definitions. Since these metadata are usually created by humans, there exists the possibility that conscious and unconscious biases may be reflected in them. Moreover, humans and computers engage directly with these metadata during the course of scientific practice, and therefore any biases or idiosyncrasies may influence work involving the use of these concepts. Previous work has exposed the possibility of bias in ontological representations of disease domains; however, no methods have been developed for automatic or semiautomatic guidance towards bias in ontology metadata. In this article, we develop an approach to explore sentiment analysis as a potential indicator of bias in ontology concept definitions. We evaluate its use on pairs of disease classes from MeSH and the Human Disease Ontology (DO), comparing and contrasting sentiment scores between them. We use these pairs to identify and evaluate a number of outlying examples, relating them to existing literature. We discuss how our approach could be used to guide ontology developers towards outlying and potentially biased language, forming a tool that could be used to evaluate and improve normalisation of ontology metadata. We also discuss the applicability and appropriateness of general-purpose sentiment analysis applied to biomedical texts, and potential influences of bias on computational analysis, in the context of our results.
UR - http://hdl.handle.net/10754/675034
UR - https://ieeexplore.ieee.org/document/9669329/
U2 - 10.1109/bibm52615.2021.9669329
DO - 10.1109/bibm52615.2021.9669329
M3 - Conference contribution
BT - 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
PB - IEEE
ER -