AI outperformed every dermatologist in dermoscopic melanoma diagnosis, using an optimized deep-CNN architecture with custom mini-batch logic and loss function

Abstract Melanoma, one of the most dangerous types of skin cancer, results in a very high mortality rate. Early detection and resection are two key points for a successful cure. Recent researches have used artificial intelligence to classify melanoma and nevus and to compare the assessment of these...

Full description

Bibliographic Details
Main Authors: Tri-Cong Pham, Chi-Mai Luong, Van-Dung Hoang, Antoine Doucet
Format: Article
Language:English
Published: Nature Publishing Group 2021-09-01
Series:Scientific Reports
Online Access:https://doi.org/10.1038/s41598-021-96707-8
Description
Summary:Abstract Melanoma, one of the most dangerous types of skin cancer, results in a very high mortality rate. Early detection and resection are two key points for a successful cure. Recent researches have used artificial intelligence to classify melanoma and nevus and to compare the assessment of these algorithms to that of dermatologists. However, training neural networks on an imbalanced dataset leads to imbalanced performance, the specificity is very high but the sensitivity is very low. This study proposes a method for improving melanoma prediction on an imbalanced dataset by reconstructed appropriate CNN architecture and optimized algorithms. The contributions involve three key features as custom loss function, custom mini-batch logic, and reformed fully connected layers. In the experiment, the training dataset is kept up to date including 17,302 images of melanoma and nevus which is the largest dataset by far. The model performance is compared to that of 157 dermatologists from 12 university hospitals in Germany based on the same dataset. The experimental results prove that our proposed approach outperforms all 157 dermatologists and achieves higher performance than the state-of-the-art approach with area under the curve of 94.4%, sensitivity of 85.0%, and specificity of 95.0%. Moreover, using the best threshold shows the most balanced measure compare to other researches, and is promisingly application to medical diagnosis, with sensitivity of 90.0% and specificity of 93.8%. To foster further research and allow for replicability, we made the source code and data splits of all our experiments publicly available.
ISSN:2045-2322