DOI: 10.1002/acm2.14248 ISSN: 1526-9914

Accurate and robust auto‐segmentation of head and neck organ‐at‐risks based on a novel CNN fine‐tuning workflow

Shunyao Luan, Kun Wu, Yuan Wu, Benpeng Zhu, Wei Wei, Xudong Xue

Abstract

Purpose

Auto‐segmentations produced by different commercial AI software packages show obvious inconsistencies. In this study, we developed a novel convolutional neural network (CNN) fine‐tuning workflow to achieve precise and robust localized segmentation.

Methods

The datasets included the Hubei Cancer Hospital dataset, the Cetuximab Head and Neck Public Dataset, and the Québec Public Dataset. Seven organs‐at‐risk (OARs) were selected: brain stem, left parotid gland, esophagus, left optic nerve, optic chiasm, mandible, and pharyngeal constrictor. The auto‐segmentation results from four commercial AI software packages were first compared with manual delineations. A new multi‐scale lightweight residual CNN model with an attention module (named HN‐Net) was then trained on 40 samples and tested on 10 samples from Hubei Cancer Hospital. To enhance the network's accuracy and generalization ability, the fine‐tuning workflow used an uncertainty estimation method to automatically select worthwhile candidate samples from the Cetuximab Head and Neck Public Dataset for further training. Segmentation performance was evaluated on the Hubei Cancer Hospital dataset and/or the entire Québec Public Dataset.
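The abstract does not specify which uncertainty estimation method drives candidate selection. As a hedged illustration only, the sketch below assumes Monte Carlo dropout style inputs: several stochastic foreground‐probability maps per candidate case, flattened to voxel lists. It scores each case by its mean voxel‐wise predictive entropy and keeps the k most uncertain cases, a common active‐learning heuristic. The function names, the input format, the case IDs, and the "most uncertain first" rule are all hypothetical and are not presented as the authors' method.

```python
import math
from typing import Dict, List, Sequence


def binary_entropy(p: float, eps: float = 1e-12) -> float:
    """Entropy (nats) of a Bernoulli variable with foreground probability p."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def sample_uncertainty(prob_maps: Sequence[Sequence[float]]) -> float:
    """Mean voxel-wise predictive entropy over T stochastic forward passes.

    prob_maps[t][v] is the foreground probability of voxel v in pass t
    (hypothetical input format, e.g. from Monte Carlo dropout).
    """
    n_passes = len(prob_maps)
    n_voxels = len(prob_maps[0])
    total = 0.0
    for v in range(n_voxels):
        # Average the passes first, then take the entropy of that mean.
        mean_p = sum(prob_maps[t][v] for t in range(n_passes)) / n_passes
        total += binary_entropy(mean_p)
    return total / n_voxels


def select_candidates(scores: Dict[str, float], k: int) -> List[str]:
    """Return the IDs of the k highest-uncertainty samples (hypothetical rule)."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy example: two stochastic passes over three voxels per hypothetical case.
candidates = {
    "case_a": [[0.90, 0.10, 0.95], [0.92, 0.05, 0.97]],  # confident, consistent
    "case_b": [[0.60, 0.40, 0.50], [0.30, 0.70, 0.55]],  # ambiguous
    "case_c": [[0.80, 0.20, 0.70], [0.75, 0.30, 0.60]],
}
scores = {cid: sample_uncertainty(maps) for cid, maps in candidates.items()}
print(select_candidates(scores, k=2))  # ['case_b', 'case_c']
```

Averaging the passes before taking the entropy makes a case score high when its individual predictions hover near 0.5 or disagree across passes. A selection rule based on confident cases (for example, for pseudo‐labeling) would simply reverse the sort.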

Results

Across the four AI software packages, the average Dice value and Hausdorff distance for the seven OARs differed by up to 0.13 and 0.7 mm, respectively. The proposed HN‐Net achieved an average Dice value 0.14 higher than that of the AI software and also outperformed other popular CNN models (average Dice value, HN‐Net: 0.79, U‐Net: 0.78, U‐Net++: 0.78, U‐Net‐Multi‐scale: 0.77, AI software: 0.65). Additionally, the HN‐Net fine‐tuning workflow using local datasets and external public datasets further improved automatic segmentation, raising the average Dice value by 0.02.
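Dice value and Hausdorff distance are the two metrics reported above. A minimal pure‐Python sketch of both follows, assuming each binary mask is given as a set of (i, j, k) voxel indices and that the voxel spacing in mm is known; the helper names and toy masks are hypothetical. The Hausdorff distance here is taken over all foreground voxels; many evaluation tools instead use surface voxels or a 95th‐percentile variant, and the abstract does not state which was used.

```python
import math
from itertools import product
from typing import Sequence, Set, Tuple

Voxel = Tuple[int, int, int]  # (i, j, k) voxel index


def dice(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Dice similarity coefficient: 2 * |A intersect B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # convention: two empty masks agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))


def hausdorff_mm(a: Set[Voxel], b: Set[Voxel],
                 spacing: Sequence[float] = (1.0, 1.0, 1.0)) -> float:
    """Symmetric Hausdorff distance in mm between two non-empty voxel sets.

    Brute force over all foreground voxels; spacing gives mm per axis.
    """
    pa = [tuple(i * s for i, s in zip(v, spacing)) for v in a]
    pb = [tuple(i * s for i, s in zip(v, spacing)) for v in b]

    def directed(src, dst):
        # Largest distance from a point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return max(directed(pa, pb), directed(pb, pa))


# Toy example: a 4x4x4 "manual" cube and an "auto" cube shifted one slice
# along axis 0, with hypothetical 2.5 mm slice spacing.
manual = set(product(range(4), repeat=3))
auto = {(i + 1, j, k) for (i, j, k) in manual}
print(dice(manual, auto))                                   # 0.75
print(hausdorff_mm(manual, auto, spacing=(2.5, 1.0, 1.0)))  # 2.5
```

The brute‐force nearest‐neighbor search costs O(|A|·|B|) distance evaluations, which is fine for toy masks; full CT volumes would call for a distance transform or a spatial index instead.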

Conclusion

Delineations from commercial AI software need careful review, and further localized training is necessary for clinical practice. The proposed fine‐tuning workflow could be adopted as a feasible way to build an accurate and robust auto‐segmentation model from local datasets and external public datasets.
