DOI: 10.1002/mp.70502 ISSN: 0094-2405

Medical image local augmentation via text‐ and mask‐guided diffusion model

Pei Cao, Donghao Li, Xinlu Li, Yongheng Yan, Yilong Li

Abstract

Background

Medical images serve as the core basis for precision diagnosis and treatment, yet their scarcity severely hampers the advancement of intelligent medical image analysis. Data augmentation for medical images is a key pathway to overcoming this data bottleneck. However, existing methods primarily focus on global image transformations and exhibit limited control over local regional details.

Purpose

To enhance image diversity, this paper proposes a text‐ and mask‐guided local augmentation method for medical images.

Methods

To address the limited diversity of synthesized medical images, this paper designs a text‐ and mask‐guided local augmentation method for medical images (MILA‐TMGDiff). The method first employs a pre‐trained MedSAM model to segment target regions within input medical images, yielding precise masks. Next, semantically relevant, task‐specific text prompts are designed for each type of medical imaging data. Finally, the mask and text prompts are jointly fed into a diffusion generative model as local guidance conditions. Applying controlled perturbations to the local noise distribution gives fine‐grained generation control over specific anatomical regions, ultimately producing synthetic medical images with high visual realism and diversity.
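As an illustration of how a mask can confine generation to a local region, the following is a minimal sketch of inpainting‐style latent blending, a common technique in mask‐conditioned diffusion sampling. It is not confirmed to be the mechanism used in MILA‐TMGDiff; the function names `forward_noise` and `blend_local`, the toy 8×8 arrays, and the `alpha_bar` value are all assumptions made for this example, and a random array stands in for the output of a text‐conditioned denoiser.

```python
import numpy as np

def forward_noise(x0, alpha_bar, rng):
    """Noise a clean image to a given cumulative alpha (standard DDPM forward step)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

def blend_local(x_generated, x_known_noisy, mask):
    """Keep generated content inside the mask and the original content outside it."""
    mask = mask.astype(x_generated.dtype)
    return mask * x_generated + (1.0 - mask) * x_known_noisy

rng = np.random.default_rng(0)
image = rng.random((8, 8))                 # toy stand-in for a medical image
mask = np.zeros((8, 8), dtype=bool)        # toy stand-in for a segmentation mask
mask[2:6, 2:6] = True

# Original image noised to the current timestep (hypothetical alpha_bar value).
x_known_noisy = forward_noise(image, alpha_bar=0.5, rng=rng)
# Placeholder for the denoiser's proposal at the same timestep.
x_generated = rng.standard_normal((8, 8))

# Only the masked region takes the generated content at this step.
x_next = blend_local(x_generated, x_known_noisy, mask)
```

In a full sampler this blending would be repeated at every denoising step, so that pixels outside the mask stay tied to the input image while the masked region is free to vary.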

Results

The method was tested in local augmentation experiments on x‐ray, MRI, and CT images. Quantitative analysis shows that the local structural similarity of the generated images changes significantly within the mask region, with reductions of 97.9%, 103.3%, and 42.2% on chest x‐ray, pelvic CT, and brain CT data, respectively. This confirms that the proposed local feature enhancement mechanism can effectively modulate the distribution of structural features in the mask region while maintaining global texture consistency.
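The abstract does not state how the local structural similarity or its percentage reduction is computed. As a rough sketch under stated assumptions, the example below evaluates a single‐window SSIM restricted to the masked pixels and reports a relative reduction against a baseline value. The names `masked_ssim` and `percent_reduction`, the choice of baseline, and the toy data are hypothetical.

```python
import numpy as np

def masked_ssim(a, b, mask, data_range=1.0):
    """Single-window SSIM computed only over pixels where mask is True."""
    x = a[mask].astype(np.float64)
    y = b[mask].astype(np.float64)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

def percent_reduction(baseline, value):
    """Relative reduction of value with respect to baseline, in percent."""
    return (baseline - value) / baseline * 100.0

rng = np.random.default_rng(0)
original = rng.random((16, 16))
mask = np.zeros((16, 16), dtype=bool)
mask[4:12, 4:12] = True

# Toy "augmented" image: unchanged outside the mask, resampled inside it.
augmented = original.copy()
augmented[mask] = rng.random(int(mask.sum()))

ssim_same = masked_ssim(original, original, mask)      # identical regions
ssim_local = masked_ssim(original, augmented, mask)    # modified region
reduction = percent_reduction(ssim_same, ssim_local)
```

Under this assumed definition, a reduction above 100% corresponds to a negative SSIM value inside the mask, which is possible because SSIM ranges from -1 to 1.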

Conclusions

The proposed method provides a new technical pathway for controlled data augmentation in medical imaging, helping to advance intelligent medical image analysis and laying the foundation for future research on fine‐grained medical image generation.
