Investigating the Quality of DermaMNIST and Fitzpatrick17k Dermatological Image Datasets

Skin diseases are the most common reason for clinical consultations in studied populations [1], affecting almost a third of the global population [2, 3]. The 2013 Global Burden of Disease study found skin diseases to be the fourth leading cause of nonfatal disability globally, accounting for 41.6 million disability-adjusted life years (DALYs) and 39.0 million years lost due to disability (YLDs) [4]. In the USA alone, the healthcare cost of skin diseases was estimated at $75 billion in 2016 [5].

The Role of Deep Learning in Dermatology

Advances in deep learning (DL)-based methods for dermatological tasks have produced models that approach the diagnostic accuracy of human experts, with some even mimicking clinical approaches such as hierarchical [7, 8, 9] and differential [10] diagnosis. Because these DL methods are data-driven, large and diverse datasets are needed to train accurate, robust, and generalizable models.

Key Data and Statistical Overview

  • Global Impact: Skin diseases accounted for 41.6 million disability-adjusted life years (2013 Global Burden of Disease).
  • Economic Burden: Estimated healthcare cost of $75 billion in the USA (2016).
  • Scale of Affliction: Skin diseases affect almost a third of the global population.
  • Medical Shortage: The ratio of dermatologists to population is projected to decline.

Challenges in Data Quality and Dataset Integrity

With skin cancer incidence rates increasing over the past decades [6] and the ratio of dermatologists to population projected to decline [5], automated systems for dermatological diagnosis can be immensely valuable. However, unlike natural-image computer vision datasets, medical image datasets are relatively small, primarily because of the high costs of image acquisition and annotation and because of legal, ethical, and privacy concerns [11]; they are also more cost-prohibitive to expand [12]. This is also true for skin cancer image datasets [13, 14]. The surge in skin image analysis research over the past decade can be attributed in part to recently released public datasets, most notably the datasets and challenges of the International Skin Imaging Collaboration (ISIC) and the associated HAM10000 [15] and BCN20000 [16] datasets, as well as other clinical image datasets such as SD-198 [17], SD-260 [18], derm7pt [19], and Fitzpatrick17k [20].

Although large datasets are important for developing reliable models, the quality of the data and its correct use are equally important [21, 22, 23, 24]: low-quality data may result in inefficient training and in inaccurate models that exhibit biases, poor generalizability, and low robustness, and may negatively affect the interpretability of such models. Several factors can impact data quality, such as the presence of duplicates, data leakage across train-test partitions, mislabeled images, and the absence of a well-defined test partition.
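To make two of these factors concrete, the following is a minimal sketch of how exact duplicates within a partition, and leakage of identical images across train and test partitions, might be detected. It is an illustration only and not the analysis pipeline used in this paper: the function names (`content_hash`, `find_exact_duplicates`, `find_leakage`) and the toy byte strings standing in for image files are assumptions made for this example. Hashing raw bytes catches only byte-identical copies; near-duplicates (resized, recompressed, or re-photographed images) would need a different technique, such as perceptual hashing or embedding similarity.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict


def content_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest of the raw image bytes."""
    return hashlib.sha256(data).hexdigest()


def find_exact_duplicates(images: dict[str, bytes]) -> list[list[str]]:
    """Group image IDs whose raw bytes are identical (hypothetical helper)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for image_id, data in images.items():
        groups[content_hash(data)].append(image_id)
    # Keep only groups that contain more than one image.
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


def find_leakage(
    train: dict[str, bytes], test: dict[str, bytes]
) -> list[tuple[str, str]]:
    """Return (train_id, test_id) pairs whose content is byte-identical."""
    train_by_hash: dict[str, list[str]] = defaultdict(list)
    for image_id, data in train.items():
        train_by_hash[content_hash(data)].append(image_id)

    leaks = []
    for test_id, data in test.items():
        for train_id in train_by_hash.get(content_hash(data), []):
            leaks.append((train_id, test_id))
    return sorted(leaks)


# Toy data: short byte strings stand in for image file contents.
train_images = {"a": b"lesion-1", "b": b"lesion-2", "c": b"lesion-1"}
test_images = {"t1": b"lesion-2", "t2": b"lesion-3"}

print(find_exact_duplicates(train_images))      # [['a', 'c']]
print(find_leakage(train_images, test_images))  # [('b', 't1')]
```

In practice, the dictionaries would be built by reading each image file in binary mode. A byte-level check like this is a cheap first pass; it says nothing about mislabeled images or about distinct photographs of the same lesion that end up in different partitions.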

Analysis and Methodology

In this paper, we conduct meticulous analyses of three popular dermatological image datasets (DermaMNIST, its source HAM10000, and Fitzpatrick17k), uncover these data quality issues, measure their effects on the benchmark results, and propose corrections to the datasets. By making our analysis pipeline and the accompanying code publicly available, we ensure the reproducibility of our analysis and aim to encourage similar explorations and to facilitate the identification and correction of potential data quality issues in other large datasets.