DDPM

Denoising Diffusion Probabilistic Models (DDPM) implementation

This module provides a complete implementation of DDPM, as described in Ho et al. (2020, “Denoising Diffusion Probabilistic Models”). It includes components for the forward and reverse diffusion processes, hyperparameter management, training, and image sampling, and it supports both unconditional generation and conditional generation from text prompts.

Components

  • ForwardDDPM: Forward diffusion process to add noise.

  • ReverseDDPM: Reverse diffusion process to denoise.

  • SchedulerDDPM: Noise schedule management.

  • TrainDDPM: Training loop with mixed precision and scheduling.

  • SampleDDPM: Image generation from trained models.

References

  • Ho, J., Jain, A., & Abbeel, P. (2020). Denoising Diffusion Probabilistic Models.

  • Salimans, T., Karpathy, A., Chen, X., & Kingma, D. P. (2017). PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications. arXiv preprint arXiv:1701.05517.


class torchdiff.ddpm.ForwardDDPM(*args: Any, **kwargs: Any)

Bases: Module

Forward diffusion process for DDPM.

Implements sampling from the forward noising distribution:

q(x_t | x_0) = N(√ᾱ_t x_0, (1 - ᾱ_t) I)

Also computes the appropriate training target for the chosen prediction parameterization (noise, x0, or v).

Initialize the forward diffusion process.

Parameters:
  • scheduler – Noise scheduler providing diffusion coefficients.

  • pred_type – Prediction parameterization. One of {“noise”, “x0”, “v”}.

forward(x0: torch.Tensor, t: torch.Tensor, noise: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor]

Sample a noised version of the input and compute the training target.

Parameters:
  • x0 – Clean input data of shape (batch, …).

  • t – Discrete timesteps of shape (batch,), with values in [0, T-1].

  • noise – Standard Gaussian noise of the same shape as x0.

Returns:

  • xt – Noised data sampled from q(x_t | x_0).

  • target – Training target corresponding to the selected prediction type (noise, x0, or v).
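The forward sampling step and the three training targets can be sketched in a few lines of PyTorch. This is a minimal illustration, not the torchdiff implementation: the helper name `q_sample_and_target`, the linear schedule values, and the v-target definition v = √ᾱ_t ε − √(1 − ᾱ_t) x_0 (the usual convention from Salimans & Ho, 2022) are assumptions.

```python
import torch


def q_sample_and_target(x0, t, noise, alpha_bar, pred_type="noise"):
    """Sample x_t ~ q(x_t | x_0) and build the matching regression target.

    Hypothetical helper for illustration; not part of torchdiff.
    """
    # Gather per-sample coefficients and reshape to (batch, 1, ..., 1).
    shape = (x0.shape[0],) + (1,) * (x0.dim() - 1)
    sqrt_ab = alpha_bar[t].sqrt().view(shape)
    sqrt_one_minus_ab = (1.0 - alpha_bar[t]).sqrt().view(shape)

    # x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
    xt = sqrt_ab * x0 + sqrt_one_minus_ab * noise

    if pred_type == "noise":
        target = noise
    elif pred_type == "x0":
        target = x0
    elif pred_type == "v":
        # Assumed v-parameterization: v = sqrt(ab) * eps - sqrt(1 - ab) * x0
        target = sqrt_ab * noise - sqrt_one_minus_ab * x0
    else:
        raise ValueError(f"unknown pred_type: {pred_type!r}")
    return xt, target


torch.manual_seed(0)
betas = torch.linspace(1e-4, 0.02, 1000)          # assumed linear schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)
x0 = torch.randn(4, 3, 8, 8)
t = torch.tensor([0, 250, 500, 999])
noise = torch.randn_like(x0)
xt, v_target = q_sample_and_target(x0, t, noise, alpha_bar, pred_type="v")
```

At t = 0 the sample stays close to x0, while at t = T-1 it is almost pure noise, which is the behaviour the formula above implies.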

class torchdiff.ddpm.ReverseDDPM(*args: Any, **kwargs: Any)

Bases: Module

Reverse diffusion process for DDPM.

Implements a single reverse denoising step:

p_θ(x_{t-1} | x_t) = N(μ_θ(x_t, t), Σ_t)

Supports different prediction parameterizations (noise, x0, v) and multiple variance types (fixed or learned).

Initialize the reverse diffusion process.

Parameters:
  • scheduler – Noise scheduler providing diffusion coefficients.

  • pred_type – Model prediction parameterization. One of {“noise”, “x0”, “v”}.

  • var_type – Variance type used in the reverse process. One of {“fixed_small”, “fixed_large”, “learned”}.

  • clip_out – Whether to clip predicted x0 to a fixed range.

predict_x0(xt: torch.Tensor, t: torch.Tensor, pred: torch.Tensor) → torch.Tensor

Convert the model output into a prediction of the original data x0.

Parameters:
  • xt – Current noised data x_t.

  • t – Discrete timesteps of shape (batch,).

  • pred – Model output corresponding to the selected prediction type.

Returns:

Predicted clean data x0.
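As a rough sketch of this conversion, each parameterization can be inverted algebraically from x_t = √ᾱ_t x_0 + √(1 − ᾱ_t) ε. The function name `predict_x0_sketch`, the schedule, and the v convention (v = √ᾱ_t ε − √(1 − ᾱ_t) x_0) are assumptions for illustration and may differ from what torchdiff does internally.

```python
import torch


def predict_x0_sketch(xt, t, pred, alpha_bar, pred_type="noise"):
    """Recover an x0 estimate from a model output (hypothetical helper)."""
    shape = (xt.shape[0],) + (1,) * (xt.dim() - 1)
    sqrt_ab = alpha_bar[t].sqrt().view(shape)
    sqrt_one_minus_ab = (1.0 - alpha_bar[t]).sqrt().view(shape)

    if pred_type == "noise":
        # x0 = (x_t - sqrt(1 - ab) * eps) / sqrt(ab)
        return (xt - sqrt_one_minus_ab * pred) / sqrt_ab
    if pred_type == "x0":
        return pred
    if pred_type == "v":
        # x0 = sqrt(ab) * x_t - sqrt(1 - ab) * v
        return sqrt_ab * xt - sqrt_one_minus_ab * pred
    raise ValueError(f"unknown pred_type: {pred_type!r}")


# Round trip: feed the exact targets back in and recover x0.
torch.manual_seed(0)
alpha_bar = torch.cumprod(1.0 - torch.linspace(1e-4, 0.02, 1000), dim=0)
x0 = torch.randn(4, 3, 8, 8)
t = torch.tensor([0, 250, 500, 999])
eps = torch.randn_like(x0)
ab = alpha_bar[t].view(-1, 1, 1, 1)
xt = ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps
v = ab.sqrt() * eps - (1.0 - ab).sqrt() * x0

x0_from_eps = predict_x0_sketch(xt, t, eps, alpha_bar, "noise")
x0_from_v = predict_x0_sketch(xt, t, v, alpha_bar, "v")
```

With a perfect prediction, both paths return the original x0 up to floating-point error; the noise path is the least stable at large t because it divides by a small √ᾱ_t.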

get_variance(t: torch.Tensor, pred_var: torch.Tensor | None = None) → torch.Tensor

Compute the variance used in the reverse diffusion step.

Parameters:
  • t – Discrete timesteps of shape (batch,).

  • pred_var – Optional model-predicted variance (required when var_type=”learned”).

Returns:

Variance tensor for the reverse transition.
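For the two fixed variance types, a plausible reading (following Ho et al., 2020) is that "fixed_large" uses β_t and "fixed_small" uses the posterior variance β̃_t = β_t (1 − ᾱ_{t−1}) / (1 − ᾱ_t). The sketch below assumes that mapping; the function name is hypothetical and the "learned" case is left out because its exact form is not described here.

```python
import torch


def fixed_variance(betas, var_type="fixed_small"):
    """Per-timestep reverse variance for the fixed cases (illustrative only)."""
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)
    # alpha_bar_{t-1}, with alpha_bar_{-1} defined as 1.
    alpha_bar_prev = torch.cat([torch.ones(1), alpha_bar[:-1]])

    if var_type == "fixed_large":
        return betas
    if var_type == "fixed_small":
        return betas * (1.0 - alpha_bar_prev) / (1.0 - alpha_bar)
    raise ValueError(f"unsupported var_type in this sketch: {var_type!r}")


betas = torch.linspace(1e-4, 0.02, 1000)   # assumed linear schedule
var_small = fixed_variance(betas, "fixed_small")
var_large = fixed_variance(betas, "fixed_large")
```

Under this definition the small variance is zero at t = 0 and never exceeds the large one.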

forward(xt: torch.Tensor, pred: torch.Tensor, t: torch.Tensor, pred_var: torch.Tensor | None = None) → Tuple[torch.Tensor, torch.Tensor | None]

Perform a single reverse diffusion step from x_t to x_{t-1}.

Parameters:
  • xt – Current state x_t of shape (batch, …).

  • pred – Model prediction at timestep t.

  • t – Discrete timesteps of shape (batch,).

  • pred_var – Optional predicted variance for learned variance models.

Returns:

  • x_prev – Sampled previous state x_{t-1}.

  • pred_x0 – Predicted clean data x0.
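One way to picture a single reverse step is through the DDPM posterior q(x_{t-1} | x_t, x_0), evaluated at the predicted x0. The sketch below assumes the "fixed_small" variance, clipping of x0 to [-1, 1], and an x0 estimate that has already been computed; the function name is hypothetical.

```python
import torch


def reverse_step_sketch(xt, x0_hat, t, betas, clip_out=True):
    """One ancestral sampling step x_t -> x_{t-1} (illustrative only)."""
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    alpha_bar_prev = torch.cat([torch.ones(1), alpha_bar[:-1]])
    shape = (xt.shape[0],) + (1,) * (xt.dim() - 1)

    def at(values):
        # Pick the coefficient for each sample and make it broadcastable.
        return values[t].view(shape)

    if clip_out:
        x0_hat = x0_hat.clamp(-1.0, 1.0)

    # Posterior mean: coef_x0 * x0_hat + coef_xt * x_t
    coef_x0 = at(betas * alpha_bar_prev.sqrt() / (1.0 - alpha_bar))
    coef_xt = at((1.0 - alpha_bar_prev) * alphas.sqrt() / (1.0 - alpha_bar))
    mean = coef_x0 * x0_hat + coef_xt * xt

    # Posterior ("fixed_small") variance; no noise is added at t == 0.
    var = at(betas * (1.0 - alpha_bar_prev) / (1.0 - alpha_bar))
    nonzero = (t > 0).to(xt.dtype).view(shape)
    x_prev = mean + nonzero * var.sqrt() * torch.randn_like(xt)
    return x_prev, x0_hat


torch.manual_seed(0)
betas = torch.linspace(1e-4, 0.02, 1000)
xt = torch.randn(2, 3, 8, 8)
x0_guess = torch.randn(2, 3, 8, 8)
x_prev, pred_x0 = reverse_step_sketch(xt, x0_guess, torch.tensor([500, 0]), betas)
```

For the sample at t = 0 the step is deterministic and returns (up to rounding) the clipped x0 estimate.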

class torchdiff.ddpm.SchedulerDDPM(*args: Any, **kwargs: Any)

Bases: Module

Noise scheduler for DDPM-style diffusion models.

This class defines the discrete diffusion timeline and precomputes all noise schedule coefficients required for forward diffusion and reverse sampling, including betas, alphas, cumulative products, and posterior coefficients.

Supported schedules include linear, cosine, quadratic, and sigmoid.

The scheduler acts as the single source of truth for the diffusion horizon T and all time-dependent constants.

Initialize the DDPM noise scheduler.

Parameters:
  • schedule_type – Type of beta schedule to use. One of {“linear”, “cosine”, “quadratic”, “sigmoid”}.

  • time_steps – Number of discrete diffusion steps (T).

  • beta_min – Minimum beta value for applicable schedules.

  • beta_max – Maximum beta value for applicable schedules.

  • cosine_s – Small offset used in the cosine schedule.

  • clip_min – Minimum value for clipping betas (cosine schedule).

  • clip_max – Maximum value for clipping betas (cosine schedule).
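To make the parameters above concrete, here is a minimal sketch of how linear and cosine beta schedules are commonly built. The default values (T = 1000, betas from 1e-4 to 0.02, s = 0.008, clipping to [1e-4, 0.999]) are conventional choices from the literature and are assumptions here, not confirmed torchdiff defaults; the quadratic and sigmoid variants are omitted.

```python
import math

import torch


def make_betas(schedule_type="linear", time_steps=1000, beta_min=1e-4,
               beta_max=0.02, cosine_s=0.008, clip_min=1e-4, clip_max=0.999):
    """Build a beta schedule (hypothetical helper with assumed defaults)."""
    if schedule_type == "linear":
        return torch.linspace(beta_min, beta_max, time_steps)
    if schedule_type == "cosine":
        # alpha_bar(t) = f(t) / f(0), f(t) = cos^2(((t / T + s) / (1 + s)) * pi / 2)
        steps = torch.arange(time_steps + 1, dtype=torch.float64)
        f = torch.cos((steps / time_steps + cosine_s) / (1 + cosine_s) * math.pi / 2) ** 2
        alpha_bar = f / f[0]
        betas = 1.0 - alpha_bar[1:] / alpha_bar[:-1]
        return betas.clamp(clip_min, clip_max).float()
    raise ValueError(f"unsupported schedule_type in this sketch: {schedule_type!r}")


linear_betas = make_betas("linear")
cosine_betas = make_betas("cosine")

# Derived coefficients that the forward and reverse processes rely on.
alphas = 1.0 - linear_betas
alpha_bar = torch.cumprod(alphas, dim=0)
```

The cumulative product ᾱ_t decreases monotonically from just under 1 toward 0, which is what lets q(x_t | x_0) interpolate from data to noise.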

get_index(t: torch.Tensor, x_shape: torch.Size) → torch.Tensor

Reshape a timestep-dependent tensor for broadcasting over data tensors.

Parameters:
  • t – Tensor of shape (batch,) containing timestep-indexed values.

  • x_shape – Shape of the target tensor to broadcast over.

Returns:

Tensor reshaped to (batch, 1, …, 1) for broadcasting.
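The reshaping described here is a small broadcasting trick. A minimal sketch, using a hypothetical helper name rather than the library method:

```python
import torch


def broadcast_over(values, x_shape):
    """Reshape (batch,) values to (batch, 1, ..., 1) to match x_shape's rank."""
    batch = values.shape[0]
    return values.reshape(batch, *([1] * (len(x_shape) - 1)))


x = torch.ones(4, 3, 8, 8)
coeff = torch.tensor([0.1, 0.2, 0.3, 0.4])     # one coefficient per sample
coeff_b = broadcast_over(coeff, x.shape)
scaled = coeff_b * x                            # broadcasts over C, H, W
```

Each sample in the batch is multiplied by its own scalar, without expanding the coefficient tensor in memory.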

class torchdiff.ddpm.TrainDDPM(*args: Any, **kwargs: Any)

Bases: Module

Trainer for Denoising Diffusion Probabilistic Models (DDPM) with Multi-GPU Support.

Manages the training process for DDPM, optimizing a noise predictor model to learn the noise added by the forward diffusion process. Supports conditional training with text prompts, mixed precision training, learning rate scheduling, early stopping, checkpointing, and distributed data parallel (DDP) training across multiple GPUs.

Parameters:
  • diff_net (nn.Module) – Model to predict noise/v added during the forward diffusion process.

  • fwd_ddpm (nn.Module) – Forward DDPM diffusion module for adding noise.

  • rwd_ddpm (nn.Module) – Reverse DDPM diffusion module for denoising.

  • train_loader (torch.utils.data.DataLoader) – DataLoader for training data. Should be wrapped with DistributedSampler for DDP.

  • optim (torch.optim.Optimizer) – Optimizer for training the noise predictor and conditional model (if applicable).

  • loss_fn (callable) – Loss function to compute the difference between predicted and actual noise.

  • val_loader (torch.utils.data.DataLoader, optional) – DataLoader for validation data, default None.

  • max_epochs (int, optional) – Maximum number of training epochs (default: 100).

  • device (str) – Device for computation (default: CUDA).

  • cond_net (nn.Module, optional) – Model for conditional generation (e.g., text embeddings), default None.

  • metrics (object, optional) – Metrics object for computing MSE, PSNR, SSIM, FID, and LPIPS (default: None).

  • tokenizer (BertTokenizer, optional) – Tokenizer for processing text prompts, default None (loads “bert-base-uncased”).

  • max_token_length (int, optional) – Maximum length for tokenized prompts (default: 77).

  • store_path (str, optional) – Path to save model checkpoints (default: “ddpm_train”).

  • patience (int, optional) – Number of epochs to wait for improvement before early stopping (default: 20).

  • warmup_steps (int, optional) – Number of steps for learning rate warmup (default: 1000).

  • val_freq (int, optional) – Frequency (in epochs) for validation (default: 10).

  • norm_range (tuple, optional) – Range for clamping generated images (default: (-1, 1)).

  • norm_output (bool, optional) – Whether to normalize generated images to [0, 1] for metrics (default: True).

  • use_ddp (bool, optional) – Whether to use Distributed Data Parallel training (default: False).

  • grad_acc (int, optional) – Number of gradient accumulation steps before optimizer update (default: 1).

  • log_freq (int, optional) – Number of epochs before printing loss.

  • use_comp (bool, optional) – Whether the model is internally compiled using torch.compile (default: False).

  • use_amp (bool, optional) – Whether to use automatic mixed precision (AMP) for training (default: False). Enable only on GPUs with good fp16 support (e.g., Ampere or newer).

load_checkpoint(checkpoint_path: str) → Tuple[int, float]

Loads a training checkpoint to resume training.

Restores the state of the noise predictor, conditional model (if applicable), and optimizer from a saved checkpoint. Handles DDP model state dict loading.

Parameters:

checkpoint_path (str) – Path to the checkpoint file.

Returns:

  • epoch (int) – The epoch at which the checkpoint was saved.

  • loss (float) – The loss at the checkpoint.
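The sketch below illustrates the general idea of restoring a checkpoint that may have been saved from a DDP-wrapped model, whose parameter names carry a "module." prefix. The checkpoint keys ("epoch", "loss", "model_state_dict", "optimizer_state_dict") and both helper names are hypothetical; the actual torchdiff checkpoint layout is not documented here.

```python
import io

import torch
from torch import nn


def strip_ddp_prefix(state_dict):
    """Drop the 'module.' prefix that DistributedDataParallel adds to keys."""
    prefix = "module."
    return {(k[len(prefix):] if k.startswith(prefix) else k): v
            for k, v in state_dict.items()}


def load_checkpoint_sketch(file, model, optimizer):
    """Restore model and optimizer state; return (epoch, loss)."""
    state = torch.load(file, map_location="cpu")
    model.load_state_dict(strip_ddp_prefix(state["model_state_dict"]))
    optimizer.load_state_dict(state["optimizer_state_dict"])
    return state["epoch"], state["loss"]


# Simulate a checkpoint written from a DDP-wrapped model (in memory).
torch.manual_seed(0)
net = nn.Linear(4, 4)
optim = torch.optim.SGD(net.parameters(), lr=0.1)
checkpoint = {
    "epoch": 7,
    "loss": 0.123,
    "model_state_dict": {"module." + k: v for k, v in net.state_dict().items()},
    "optimizer_state_dict": optim.state_dict(),
}
buffer = io.BytesIO()
torch.save(checkpoint, buffer)
buffer.seek(0)

# Load into a freshly initialized, non-DDP model.
fresh_net = nn.Linear(4, 4)
fresh_optim = torch.optim.SGD(fresh_net.parameters(), lr=0.1)
epoch, loss = load_checkpoint_sketch(buffer, fresh_net, fresh_optim)
```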

static warmup_scheduler(optimizer: torch.optim.Optimizer, warmup_steps: int) → torch.optim.lr_scheduler.LambdaLR

Creates a learning rate scheduler for warmup.

Generates a scheduler that linearly increases the learning rate from 0 to the optimizer’s initial value over the specified number of warmup steps, then holds it constant.

Parameters:
  • optimizer (torch.optim.Optimizer) – Optimizer to apply the scheduler to.

  • warmup_steps (int) – Number of steps for the warmup phase.

Returns:

Learning rate scheduler for warmup.

Return type:

torch.optim.lr_scheduler.LambdaLR
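A linear warmup of this kind can be expressed with `torch.optim.lr_scheduler.LambdaLR` and a multiplier that ramps from 0 to 1. This is a minimal sketch under that assumption; the function name `linear_warmup` is hypothetical and the library's exact ramp may differ.

```python
import torch


def linear_warmup(optimizer, warmup_steps):
    """LambdaLR that scales the base LR by min(1, step / warmup_steps)."""
    def lr_lambda(step):
        if warmup_steps <= 0:
            return 1.0
        return min(1.0, step / warmup_steps)

    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)


model = torch.nn.Linear(2, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = linear_warmup(optimizer, warmup_steps=4)

lrs = []
for _ in range(6):
    lrs.append(scheduler.get_last_lr()[0])
    optimizer.step()     # no gradients here; SGD skips parameters without grads
    scheduler.step()
```

The recorded learning rates rise in equal increments during the first four steps and then stay at the optimizer's base value.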

forward() → Dict

Trains the DDPM model to predict noise added by the forward diffusion process.

Executes the training loop with support for distributed training, gradient accumulation, mixed precision, gradient clipping, and learning rate scheduling. Includes validation, early stopping, and checkpointing functionality.

Returns:

A dictionary containing the training and validation losses.

Return type:

dict
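The core of such a training loop, reduced to noise prediction with gradient accumulation and gradient clipping, might look like the sketch below. It runs on CPU without AMP or DDP; `TinyEps` is a hypothetical stand-in for diff_net, the list of random batches stands in for train_loader, and the clipping norm of 1.0 is an assumed value.

```python
import torch
from torch import nn


class TinyEps(nn.Module):
    """Hypothetical stand-in for diff_net: predicts noise from (x_t, t)."""

    def __init__(self, channels=3, time_steps=1000):
        super().__init__()
        self.time_steps = time_steps
        self.conv = nn.Conv2d(channels + 1, channels, kernel_size=3, padding=1)

    def forward(self, xt, t):
        # Feed the normalized timestep as an extra constant channel.
        t_map = (t.float() / self.time_steps).view(-1, 1, 1, 1)
        t_map = t_map.expand(-1, 1, xt.shape[2], xt.shape[3])
        return self.conv(torch.cat([xt, t_map], dim=1))


torch.manual_seed(0)
T = 1000
alpha_bar = torch.cumprod(1.0 - torch.linspace(1e-4, 0.02, T), dim=0)
net = TinyEps(time_steps=T)
optim = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
grad_acc = 2
batches = [torch.randn(4, 3, 8, 8) for _ in range(4)]   # stand-in for train_loader

losses = []
optim.zero_grad()
for step, x0 in enumerate(batches, start=1):
    # Forward diffusion: random timestep and noise per sample.
    t = torch.randint(0, T, (x0.shape[0],))
    noise = torch.randn_like(x0)
    ab = alpha_bar[t].view(-1, 1, 1, 1)
    xt = ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise

    loss = loss_fn(net(xt, t), noise)
    (loss / grad_acc).backward()          # scale so accumulated grads average

    if step % grad_acc == 0:
        nn.utils.clip_grad_norm_(net.parameters(), max_norm=1.0)
        optim.step()
        optim.zero_grad()
    losses.append(loss.item())
```

Dividing the loss by `grad_acc` keeps the effective gradient equal to the mean over the accumulated micro-batches, so the learning rate does not need retuning when the accumulation count changes.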

validate() → Tuple[float, float, float, float, float, float]

Validates the noise predictor and computes evaluation metrics.

Computes validation loss (MSE between predicted and ground truth noise) and generates samples using the reverse diffusion model. Evaluates image quality metrics if available.

Returns:

(val_loss, fid, mse, psnr, ssim, lpips_score) where metrics may be None if not computed.

Return type:

tuple

class torchdiff.ddpm.SampleDDPM(*args: Any, **kwargs: Any)

Bases: Module

Image generation using a trained Denoising Diffusion Probabilistic Model (DDPM).

Implements the sampling process for DDPM, generating images by iteratively denoising random noise using a trained noise predictor and reverse diffusion process. Supports conditional generation with text prompts via a conditional model, as inspired by Ho et al. (2020).

Parameters:
  • rwd_ddpm (nn.Module) – Reverse diffusion module (e.g., ReverseDDPM) for the reverse process.

  • diff_net (nn.Module) – Trained model to predict noise at each time step.

  • img_size (tuple) – Tuple of (height, width) specifying the generated image dimensions.

  • cond_model (nn.Module, optional) – Model for conditional generation (e.g., text embeddings), default None.

  • tokenizer (str, optional) – Pretrained tokenizer name from Hugging Face (default: “bert-base-uncased”).

  • max_token_length (int, optional) – Maximum length for tokenized prompts (default: 77).

  • batch_size (int, optional) – Number of images to generate per batch (default: 1).

  • in_channels (int, optional) – Number of input channels for generated images (default: 3).

  • device (str or torch.device) – Device for computation (default: CUDA).

  • norm_range (tuple, optional) – Tuple of (min, max) for clamping generated images (default: (-1, 1)).

tokenize(prompts: List | str) → Tuple[torch.Tensor, torch.Tensor]

Tokenizes text prompts for conditional generation.

Converts input prompts into tokenized input IDs and attention masks using the specified tokenizer, suitable for use with the conditional model.

Parameters:

prompts (str or list) – A single text prompt or a list of text prompts.

Returns:

  • input_ids (torch.Tensor) – Tokenized input IDs, shape (batch_size, max_length).

  • attention_mask (torch.Tensor) – Attention mask, shape (batch_size, max_length).

forward(conds: str | List | None = None, norm_output: bool = True, save_imgs: bool = True, save_path: str = 'ddpm_samples') → torch.Tensor

Generates images using the DDPM sampling process.

Iteratively denoises random noise to generate images using the reverse diffusion process and noise predictor. Supports conditional generation with text prompts. Optionally saves generated images to a specified directory.

Parameters:
  • conds (str or list, optional) – Text prompt(s) for conditional generation, default None.

  • norm_output (bool, optional) – If True, normalizes output images to [0, 1] (default: True).

  • save_imgs (bool, optional) – If True, saves generated images to save_path (default: True).

  • save_path (str, optional) – Directory to save generated images (default: “ddpm_samples”).

Returns:

  • samps (torch.Tensor) – Generated images, shape (batch_size, in_channels, height, width). If norm_output is True, images are normalized to [0, 1]; otherwise, they are clamped to norm_range.
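The unconditional sampling loop can be sketched as plain ancestral sampling in the style of Algorithm 2 of Ho et al. (2020), followed by clamping and optional rescaling to [0, 1]. The function name, the noise-prediction interface `eps_model(x, t)`, and the use of the posterior ("fixed_small") variance are assumptions; text conditioning and image saving are left out.

```python
import torch


@torch.no_grad()
def sample_sketch(eps_model, shape, betas, norm_range=(-1.0, 1.0), norm_output=True):
    """Generate samples by iterating the reverse process from pure noise."""
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    alpha_bar_prev = torch.cat([torch.ones(1), alpha_bar[:-1]])
    post_var = betas * (1.0 - alpha_bar_prev) / (1.0 - alpha_bar)

    x = torch.randn(shape)                       # x_T ~ N(0, I)
    for i in reversed(range(len(betas))):
        t = torch.full((shape[0],), i, dtype=torch.long)
        eps = eps_model(x, t)
        # Mean of p(x_{t-1} | x_t) for a noise-predicting model.
        mean = (x - betas[i] / (1.0 - alpha_bar[i]).sqrt() * eps) / alphas[i].sqrt()
        # No noise is added on the final step (t == 0).
        noise = torch.randn_like(x) if i > 0 else torch.zeros_like(x)
        x = mean + post_var[i].sqrt() * noise

    low, high = norm_range
    x = x.clamp(low, high)
    if norm_output:
        x = (x - low) / (high - low)             # map norm_range to [0, 1]
    return x


torch.manual_seed(0)
dummy_eps = lambda x, t: torch.zeros_like(x)     # placeholder for a trained network
imgs = sample_sketch(dummy_eps, (2, 3, 8, 8), torch.linspace(1e-4, 0.02, 50))
```

With a trained noise predictor in place of `dummy_eps`, the same loop yields images; the placeholder only demonstrates shapes and value ranges.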

to(device: torch.device) → Self

Moves the module and its components to the specified device.

Updates the device attribute and moves the reverse diffusion, noise predictor, and conditional model (if present) to the specified device.

Parameters:

device (torch.device) – Target device for the module and its components.

Returns:

The SampleDDPM instance, moved to the specified device.

Return type:

SampleDDPM