DIFR3CT: Latent Diffusion for Probabilistic 3D CT Reconstruction from Few Planar X-Rays
In Submission

Abstract


Computed Tomography (CT) scans are the standard of care for visualizing and diagnosing many clinical ailments, and are needed for treatment planning in external beam radiotherapy. Unfortunately, the availability of CT scanners in low- and mid-resource settings is highly variable. Planar x-ray radiography units, in comparison, are far more prevalent, but can only provide limited 2D observations of the 3D anatomy. In this work we propose DIFR3CT, a 3D latent diffusion model that can generate a distribution of plausible CT volumes from one or a few (<10) planar x-ray observations. DIFR3CT works by fusing 2D features from each x-ray into a joint 3D space, and performing diffusion conditioned on these fused features in a low-dimensional latent space. We conduct extensive experiments demonstrating that DIFR3CT outperforms recent sparse CT reconstruction baselines in terms of standard pixel-level metrics (PSNR, SSIM) on both the public LIDC dataset and an in-house post-mastectomy CT dataset. We also show that DIFR3CT supports uncertainty quantification via Monte Carlo sampling, which provides an opportunity to measure reconstruction reliability. Finally, we perform a preliminary pilot study evaluating DIFR3CT for automated breast radiotherapy contouring and planning, and demonstrate promising feasibility.

DIFR3CT


DIFR3CT consists of two parts:

(a) Feature fusion of multi-view x-rays: We extract a feature image Wk from each input planar x-ray Xk with a 2D U-Net. We then re-project each Wk back into 3D space using the known x-ray acquisition geometry, and average all re-projected feature volumes into a single feature volume Favg.
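As an illustrative sketch of step (a) in PyTorch: the helper names here (fuse_xray_features, unet2d, projection_grids) are hypothetical, and the per-view sampling grids are assumed to be precomputed from the known acquisition geometry rather than taken from the actual implementation.

import torch
import torch.nn.functional as F

def fuse_xray_features(xrays, unet2d, projection_grids):
    """Fuse K planar x-rays into a single 3D feature volume F_avg (sketch).

    xrays:            (K, 1, H, W) input x-ray images
    unet2d:           2D U-Net mapping each x-ray to a feature image W_k
    projection_grids: (K, D, H, W, 2) normalized (u, v) detector coordinates
                      that each voxel projects to in view k, precomputed from
                      the known acquisition geometry (hypothetical)
    """
    D, Hv, Wv = projection_grids.shape[1:4]
    volumes = []
    for k in range(xrays.shape[0]):
        feat2d = unet2d(xrays[k:k + 1])                  # (1, F, H, W)
        # Re-project W_k into 3D: every voxel reads the 2D feature at the
        # detector pixel its ray passes through in view k.
        grid = projection_grids[k].reshape(1, D * Hv, Wv, 2)
        feat3d = F.grid_sample(feat2d, grid, align_corners=False)
        volumes.append(feat3d.view(1, -1, D, Hv, Wv))    # (1, F, D, H, W)
    return torch.stack(volumes).mean(dim=0)              # F_avg

Averaging the K re-projected volumes keeps the conditioning signal the same shape regardless of how many input views are available.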

(b) 3D conditional latent diffusion model: During training, each CT volume is encoded into a latent code Z0 using a pretrained encoder. We train a time-conditioned 3D denoising U-Net that takes a noisy latent code Zt and the conditioning signal Favg, and outputs a partially denoised code Zt-1. After T denoising steps, the predicted code Ẑ0 is decoded into a CT volume using a pretrained decoder.
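A minimal sketch of step (b) and of the Monte Carlo sampling used for uncertainty quantification, assuming a standard DDPM-style noise-prediction objective (the actual code builds on the Video Diffusion Models codebase linked below); encoder, decoder, denoiser3d, and reverse_diffusion are hypothetical names.

import torch
import torch.nn.functional as F

def diffusion_training_step(ct, f_avg, encoder, denoiser3d, alphas_cumprod):
    """One training step of the conditional latent diffusion model (sketch).

    alphas_cumprod: 1-D tensor of cumulative noise-schedule products."""
    with torch.no_grad():
        z0 = encoder(ct)                                   # latent code Z_0
    t = torch.randint(0, len(alphas_cumprod), (z0.shape[0],), device=z0.device)
    noise = torch.randn_like(z0)
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1, 1)
    zt = a_bar.sqrt() * z0 + (1.0 - a_bar).sqrt() * noise  # forward process
    # Time-conditioned 3D U-Net predicts the injected noise given F_avg.
    return F.mse_loss(denoiser3d(zt, t, cond=f_avg), noise)

@torch.no_grad()
def reconstruct_with_uncertainty(f_avg, reverse_diffusion, decoder, n=8):
    """Monte Carlo sampling: n reconstructions from the same x-rays.

    reverse_diffusion runs all T denoising steps from pure noise down to a
    predicted latent code Z_hat_0 (hypothetical helper)."""
    cts = torch.stack([decoder(reverse_diffusion(f_avg)) for _ in range(n)])
    return cts.mean(dim=0), cts.std(dim=0)  # voxelwise mean + uncertainty map

Because each reverse diffusion run starts from fresh noise, repeated sampling yields a distribution of plausible CT volumes, and the voxelwise spread can serve as an uncertainty map.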

Quantitative Results

Comparison of DIFR3CT with baselines on the Thoracic Dataset, given biplanar x-ray inputs

(From left to right: Ground Truth, NAF [1], 3D Diffusion [2], X2CT-GAN [3], INRR3CT [4], DIFR3CT)


Related links

DIFR3CT was implemented on top of the Video VQGAN and Video Diffusion Models codebases.

Our previous work, INRR3CT: CT Reconstruction from Few Planar X-Rays with Application Towards Low-Resource Radiotherapy, can be found here.

Citation

If you find this work useful in your research, please cite the paper:

Acknowledgements

The website template was borrowed from Michaël Gharbi.