Steering Away from Memorization: Reachability-Constrained Reinforcement Learning for Text-to-Image Diffusion

Sathwik Karnik1*, Juyeop Kim2*, Sanmi Koyejo1, Jong-Seok Lee2, Somil Bansal1
1Stanford University 2Yonsei University

Abstract

Text-to-image diffusion models often memorize training data, revealing a fundamental failure to generalize beyond the training set. Current mitigation strategies typically sacrifice image quality or prompt alignment to reduce memorization. To address this, we propose Reachability-Aware Diffusion Steering (RADS), an inference-time framework that prevents memorization while preserving generation fidelity. RADS models the diffusion denoising process as a dynamical system and applies concepts from reachability analysis to approximate the "backward reachable tube": the set of intermediate states that inevitably evolve into memorized samples. We then formulate mitigation as a constrained reinforcement learning (RL) problem, in which a policy learns to steer the trajectory away from memorization via minimal perturbations in the caption embedding space. Empirical evaluations show that RADS achieves a superior Pareto frontier among generation diversity (SSCD), quality (FID), and alignment (CLIP) compared to state-of-the-art baselines. Crucially, RADS provides robust mitigation without modifying the diffusion backbone, offering a plug-and-play solution for safe generation.

Key Findings

Treating the diffusion denoising process as a dynamical system, RADS leverages reachability analysis from control theory to provide key capabilities for mitigating memorization in text-to-image generation:

  1. RADS prevents the reproduction of memorized training data while operating entirely at inference time.
  2. By applying minimal perturbations in the continuous caption embedding space, it serves as a plug-and-play solution that requires no destructive modifications or fine-tuning of the pre-trained diffusion weights.
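
To make the reachability idea above concrete, the following is a minimal toy sketch, not the RADS implementation. It assumes a 1-D stand-in for the denoising dynamics, a scalar in place of the caption embedding, and a brute-force rollout plus grid search in place of the learned reachability approximation and RL policy. Every name and constant in it (`denoise_step`, `in_backward_reachable_tube`, `minimal_steering`, `RATE`, `EPS`) is hypothetical and chosen only for illustration.

```python
# Toy illustration only: a 1-D stand-in for a denoising trajectory.
# This is NOT the RADS method; all names and constants are hypothetical.

T = 10           # number of toy "denoising" steps
RATE = 0.5       # how strongly each step pulls the state toward the conditioning
MEMORIZED = 1.0  # location of the toy "memorized" sample
EPS = 0.1        # radius of the unsafe set around the memorized sample


def denoise_step(x, c):
    """One step of the toy dynamics: move x part of the way toward c."""
    return x + RATE * (c - x)


def rollout(x, c, t_start=0):
    """Run the toy dynamics from step t_start up to step T; return the final state."""
    for _ in range(t_start, T):
        x = denoise_step(x, c)
    return x


def in_backward_reachable_tube(x, c, t):
    """True if state x at step t ends inside the unsafe set under conditioning c."""
    return abs(rollout(x, c, t) - MEMORIZED) <= EPS


def minimal_steering(x, c, t, step=0.01, max_delta=1.0):
    """Grid-search the smallest-magnitude perturbation of c that exits the tube."""
    n = 0
    while n * step <= max_delta:
        for delta in (n * step, -n * step):
            if not in_backward_reachable_tube(x, c + delta, t):
                return delta
        n += 1
    return None  # no perturbation within the budget avoids the unsafe set


if __name__ == "__main__":
    x0, c0 = 0.0, 1.0
    print("in tube before steering:", in_backward_reachable_tube(x0, c0, 0))
    delta = minimal_steering(x0, c0, 0)
    print("perturbation found:", delta)
    print("in tube after steering:", in_backward_reachable_tube(x0, c0 + delta, 0))
```

In this toy setting the tube can be checked by simply rolling the dynamics forward. For a real diffusion model that is not practical at every step, which is why the abstract describes approximating the tube and learning a steering policy instead.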

Examples

RADS achieves mitigation with the highest generation diversity.

Prompt: "Michael Fassbender to Star In Assassin's Creed Movie"

[Image comparison; panels: None (memorized), Wen et al. (2024), Ren et al. (2024), Hintersdorf et al. (2024), Jain et al. (2025), RADS (ours)]

RADS achieves mitigation on challenging prompts.

Prompt: "Bloodborne Video: Sony Explains the Game's Procedurally Generated Dungeons"

[Image comparison; panels: None (memorized), Wen et al. (2024), Ren et al. (2024), Hintersdorf et al. (2024), Jain et al. (2025), RADS (ours)]