Expert-defined Keywords Improve Interpretability of Retinal Image Captioning

Ting-Wei Wu, Jia-Hong Huang, Joseph Lin, Marcel Worring

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution



Automatic machine learning-based (ML-based) medical report generation systems for retinal images suffer from a relative lack of interpretability, and as a result such systems are still not widely accepted. The main reason is that trust is a key motivation for interpretability, and humans do not trust blindly. Because precise technical definitions of interpretability still lack consensus, it is difficult to build a human-comprehensible ML-based medical report generation system. Heat maps/saliency maps, i.e., post-hoc explanation approaches, are widely used to improve the interpretability of ML-based medical systems, but they are well known to be problematic. From an ML-based medical model’s perspective, the highlighted areas of an image are considered important for making a prediction; from a doctor’s perspective, however, even the hottest regions of a heat map contain both useful and non-useful information. Simply localizing a region therefore does not reveal exactly what it was in that area that the model considered useful, so post-hoc explanation-based methods rely on potentially biased human judgment to decide what a given heat map might mean. Interpretability boosters, in particular expert-defined keywords, are effective carriers of expert domain knowledge and are human-comprehensible. In this work, we propose to exploit such keywords together with a specialized attention-based strategy to build a more human-comprehensible medical report generation system for retinal images. Both the keywords and the proposed strategy effectively improve interpretability. The proposed method achieves state-of-the-art performance under the commonly used text evaluation metrics BLEU, ROUGE, CIDEr, and METEOR. Project website:
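As a rough illustration of the kind of keyword-conditioned attention the abstract alludes to, the sketch below scores each expert-defined keyword embedding against an image-region feature and normalizes the scores with a softmax. This is a hypothetical minimal sketch, not the paper's actual architecture; the function name, embeddings, and toy vectors are all invented for illustration.

```python
import math

def keyword_attention(region_feature, keyword_embeddings):
    """Hypothetical sketch: weight expert-defined keywords for one image
    region. Each keyword embedding is scored against the region feature
    by a dot product; a softmax turns the scores into attention weights.
    The paper's specialized attention-based strategy is more elaborate."""
    scores = [sum(r * k for r, k in zip(region_feature, kw))
              for kw in keyword_embeddings]
    # Numerically stable softmax over the keyword scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy example: one 2-d region feature and three keyword embeddings.
region = [1.0, 0.0]
keywords = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights = keyword_attention(region, keywords)
```

Because the weights name human-defined keywords rather than raw pixels, a doctor can read off which clinical concepts drove the generated sentence, which is the interpretability benefit the abstract claims over heat maps.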
Original language: English (US)
Title of host publication: 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
State: Published - Jan 2023
Externally published: Yes


