Gamma Knife treatment planning using knowledge‐based reinforcement learning
Christopher Huynh, Björn Ahlgren, Beibei Zhang, Dominik Fay, Håkan Nordström, Mark Ruschin
Abstract
Background
Inverse planning is often used for Gamma Knife radiosurgery, allowing clinicians to mathematically specify desired clinical objectives and dose limits. The objectives are controlled by weights that are manually tuned to find the desired trade‐off, which varies from case to case. Automation of this process can reduce clinical workload and improve consistency in plan quality.
Purpose
To train a deep reinforcement learning agent using a reward function that incorporates the clinical metrics of past plans into its scoring criteria. The metric trade‐off of the clinical plan is scored higher than any other trade‐off, which guides the agent toward plans with similar trade‐offs.
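The abstract does not give the exact form of the reward function. As a minimal sketch, assuming each plan quality metric is compared with the corresponding clinical‐plan value through a bounded similarity term, the property described above (the clinical trade‐off scores highest) can be expressed as follows. The exponential similarity, the per‐metric `scales`, and the names `plan_score` and `reward` are hypothetical illustrations, not the authors' formulation.

```python
import math
from typing import Sequence


def plan_score(metrics: Sequence[float],
               clinical: Sequence[float],
               scales: Sequence[float]) -> float:
    """Hypothetical plan score: each metric contributes at most 1.0,
    reached only when it equals the clinical plan's value, so the
    clinical trade-off receives the maximum possible score."""
    if not (len(metrics) == len(clinical) == len(scales)):
        raise ValueError("metrics, clinical and scales must have equal length")
    return sum(math.exp(-abs(m - c) / s)
               for m, c, s in zip(metrics, clinical, scales))


def reward(prev_metrics: Sequence[float],
           new_metrics: Sequence[float],
           clinical: Sequence[float],
           scales: Sequence[float]) -> float:
    """Hypothetical per-step reward: the change in plan score caused by
    the agent's latest slider adjustment."""
    return (plan_score(new_metrics, clinical, scales)
            - plan_score(prev_metrics, clinical, scales))


# Illustrative (made-up) values for four plan quality metrics.
clinical = [0.95, 0.80, 1.40, 2.5]
scales = [0.05, 0.10, 0.20, 1.0]
print(plan_score(clinical, clinical, scales))  # 4.0
```

With four metrics, a plan that reproduces the clinical metrics exactly scores 4.0 under this sketch, and any deviation lowers the score; rewarding the change in score means a step is rewarded only if it moves the plan closer to the clinical trade‐off.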
Methods
An agent was trained to adjust the two priority weights (i.e., digital slider bars) in the clinical inverse planner. The agent consists of a neural network that receives as inputs the metrics and dose distribution of the current plan, together with the target and organ‐at‐risk masks. The methods were demonstrated on a dataset of 204 single‐target metastases and a dataset of 71 acoustic neuroma cases. The cases were split into training, validation, and testing sets of size 123/41/40 for the metastases and 42/14/15 for the acoustic neuromas.
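The abstract states only that the agent adjusts the two priority weights. One plausible encoding of that action space, sketched here purely as an assumption, is a discrete set of nine actions that nudge each slider down, leave it unchanged, or nudge it up by a fixed step, with both weights kept in [0, 1]. The step size, the [0, 1] range, and the names `SliderState` and `apply_action` are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical per-slider nudges: decrease, keep, increase.
DELTAS = (-0.1, 0.0, 0.1)


@dataclass(frozen=True)
class SliderState:
    w1: float  # first priority weight, assumed normalised to [0, 1]
    w2: float  # second priority weight, assumed normalised to [0, 1]


def clip01(x: float) -> float:
    """Keep a slider value inside the assumed [0, 1] range."""
    return min(1.0, max(0.0, x))


def apply_action(state: SliderState, action: int) -> SliderState:
    """Map an action index in range(9) to one nudge per slider.

    action // 3 selects the nudge for w1 and action % 3 the nudge for w2,
    so action 4 leaves both sliders unchanged."""
    n = len(DELTAS)
    if not 0 <= action < n * n:
        raise ValueError(f"action must be in range({n * n})")
    d1, d2 = DELTAS[action // n], DELTAS[action % n]
    return SliderState(clip01(state.w1 + d1), clip01(state.w2 + d2))


state = SliderState(0.5, 1.0)
state = apply_action(state, 8)  # raise both sliders; w2 is clipped at 1.0
print(state)
```

A discrete action set like this keeps the policy output small (nine logits) and lets the planner re‐optimize after each nudge; a continuous action space that outputs the two weights directly would be an equally reasonable alternative.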
Results
On the metastases test dataset, the agent achieved a significantly higher (p = 0.0136) average plan score (3.925 ± 0.130) than the default slider plans (3.874 ± 0.147). On the acoustic neuroma test dataset, the agent also achieved a higher average plan score (4.035 ± 0.177 vs. 3.995 ± 0.365), although the difference was not statistically significant (p = 0.4493). The higher plan scores are reflected in the four plan quality metrics: on both test datasets, the agent's plans had, on average, metrics closer to those of the clinical plans than the default slider plans did.
Conclusions
The proposed reward function enabled the agent to learn to find plans that aligned with historical planning decisions. Future work will investigate providing the agent with additional inputs that can explain the variability in planning decisions, which would further improve its performance.