Abstract: In radiation oncology, manually delineating targets in the head and neck is difficult and time-consuming. An automatic medical image segmentation method is therefore needed, both to save time and effort and to avoid subjective variation among physicians. In this work, we segment head and neck tumors from positron emission tomography (PET) and computed tomography (CT) images, exploiting the complementary information between the two modalities to achieve more accurate segmentation. The network is based on the U-Net architecture, with an inception module added to the encoder and dense modules and spatial attention added to the decoder to improve performance. Experimental results show that our method outperforms the other U-Net networks compared. Quantitatively, the Dice similarity coefficient, recall, and Jaccard similarity coefficient are 0.782, 0.846, and 0.675, respectively. Compared with the original U-Net, these results correspond to improvements of 6.8%, 13.4%, and 9.8%, respectively. The 95% Hausdorff distance is 5.661, which is 1.616 smaller than that of the original U-Net. In conclusion, this study demonstrates that the inception spatial-attention dense U-Net model can effectively improve segmentation accuracy on head and neck tumor PET-CT images.
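
The overlap metrics reported above can be illustrated with a minimal sketch, assuming the predicted and reference segmentations are binary masks stored as NumPy arrays. The function name `overlap_metrics` and the `eps` smoothing term are hypothetical choices for illustration, not the authors' evaluation code, and the 95% Hausdorff distance is not covered here.

```python
import numpy as np


def overlap_metrics(pred, target, eps=1e-8):
    """Compute Dice, recall, and Jaccard for two binary masks.

    `pred` and `target` are arrays of the same shape; nonzero means tumor.
    `eps` (an illustrative assumption) avoids division by zero on empty masks.
    """
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)

    tp = np.logical_and(pred, target).sum()    # voxels correctly marked as tumor
    fp = np.logical_and(pred, ~target).sum()   # voxels wrongly marked as tumor
    fn = np.logical_and(~pred, target).sum()   # tumor voxels that were missed

    dice = 2 * tp / (2 * tp + fp + fn + eps)
    recall = tp / (tp + fn + eps)
    jaccard = tp / (tp + fp + fn + eps)
    return {"dice": float(dice), "recall": float(recall), "jaccard": float(jaccard)}


# Toy 2D example: 2 true positives, 1 false positive, 1 false negative.
pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
target = np.array([[1, 0, 0],
                   [0, 1, 1]])
metrics = overlap_metrics(pred, target)
print(metrics)
```

In this toy case the Dice coefficient is 4/6, recall is 2/3, and the Jaccard coefficient is 2/4. Studies typically compute such metrics per patient volume and then average them; whether the figures above were obtained that way is not stated in the abstract.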