Abstract: Cancer is a genetic disease with multiple subtypes that differ markedly in genetics, phenotype, and treatment response. Accurate subtype classification is critical for personalized treatment because it helps improve therapeutic outcomes. However, classification methods based on patient gene expression data often fail to distinguish rare subtypes when the number of samples per subtype is imbalanced. To address this issue, this paper proposes MFP-VAE (Meta-learning Few-shot Prototype learning VAE), a cancer subtype classification method designed for datasets with imbalanced samples. The method modifies the sampling strategy so that every subtype is given balanced consideration in the meta-learning tasks. The model uses a variational autoencoder for feature extraction and classifies each sample by computing its distance to the subtype prototypes. Experimental results on two public cancer datasets show that MFP-VAE outperforms existing methods and significantly improves classification accuracy, especially under imbalanced sample conditions. Furthermore, survival analysis shows that the identified subtypes differ significantly in clinical characteristics, providing meaningful clinical insights.
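
The two mechanisms named in the abstract, class-balanced task sampling and classification by distance to subtype prototypes, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names are hypothetical, the embeddings are assumed to come from an already trained encoder (for example, the latent mean of a VAE), squared Euclidean distance is an assumption because the abstract does not name the metric, and drawing rare subtypes with replacement is one possible balancing choice among several.

```python
import numpy as np


def sample_balanced_episode(labels, n_support, n_query, rng):
    """Draw the same number of support and query indices for every class.

    Assumption: a class with fewer than n_support + n_query samples is
    drawn with replacement, so its support and query sets may overlap.
    """
    support, query = [], []
    need = n_support + n_query
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        chosen = rng.choice(idx, size=need, replace=len(idx) < need)
        support.extend(chosen[:n_support])
        query.extend(chosen[n_support:])
    return np.array(support), np.array(query)


def class_prototypes(embeddings, labels):
    """Return the class ids and the mean embedding (prototype) of each class."""
    classes = np.unique(labels)
    protos = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, protos


def nearest_prototype(embeddings, classes, protos):
    """Assign each embedding to the class of its closest prototype."""
    # Squared Euclidean distances, shape (n_samples, n_classes).
    diff = embeddings[:, None, :] - protos[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    return classes[dist.argmin(axis=1)]
```

In an episodic training loop, one would call `sample_balanced_episode` once per task, build prototypes from the support embeddings with `class_prototypes`, and score the query embeddings with `nearest_prototype`. Because each subtype contributes the same number of samples to every task, a rare subtype influences the prototypes as much as a common one.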