Reward-Guided Iterative Refinement in Diffusion Models at Test-Time with Applications to Protein and DNA Design
Masatoshi Uehara (*co-first), Xingyu Su (*co-first), Yulai Zhao, Xiner Li,
Aviv Regev, Shuiwang Ji, Sergey Levine, Tommaso Biancalani
Genentech, Texas A&M University, Princeton University, UC Berkeley
Abstract
To fully leverage the capabilities of diffusion models, we are often interested in optimizing downstream reward functions during inference. Given the practical significance of reward-guided generation, numerous algorithms have recently been proposed; however, current approaches predominantly focus on single-shot generation, transitioning from fully noised to fully denoised states. We propose a novel framework for inference-time reward optimization with diffusion models, inspired by evolutionary algorithms. Our approach employs an iterative refinement process consisting of two steps in each iteration: noising and reward-guided denoising. This sequential refinement allows for the gradual correction of errors introduced during reward optimization. In addition, we provide a theoretical guarantee for our framework. Finally, we demonstrate its superior empirical performance in protein and cell-type-specific regulatory DNA design.
Figure: Overview of the algorithm. The two GIFs illustrate example trajectories for protein design optimization, where the objective is either to minimize the cRMSD (left) or to maximize secondary-structure matching (right) with respect to the target structure.
Proposal: Reward-Guided Evolutionary Refinement in Diffusion Models
Our algorithm optimizes the reward function while preserving sequence naturalness, as characterized by pre-trained diffusion models. Unlike existing single-shot guided approaches (e.g., classifier guidance, SMC-based methods), our method employs an iterative refinement strategy inspired by evolutionary algorithms. Specifically, it alternates between derivative-free reward-guided denoising and noising, enabling the optimization of complex reward functions that single-shot generation struggles with. The algorithm is outlined in Figure 1. Our approach can be seen as a variant of directed evolution-type algorithms, where candidate generation is driven by reward-guided denoising within diffusion models.
Figure 1: We instantiate the algorithm within masked diffusion models, although it can be applied to any diffusion model. It alternates between reward-guided denoising and noising. A practical example of reward-guided denoising is described in Figure 2.
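The following is a minimal sketch of this refinement loop in Python. The helpers `noise` (the forward kernel that partially noises a clean design back to an intermediate time), `guided_denoise` (a derivative-free reward-guided denoiser, e.g., the value-guided step sketched after Figure 2), and `reward` (the downstream reward model) are hypothetical placeholders rather than a released API, and the greedy acceptance rule is one illustrative choice.

```python
# Minimal sketch of the iterative refinement loop (placeholder functions, not a library API).
def refine(x0, noise, guided_denoise, reward, num_iters=50, t_mid=0.5):
    """Alternate partial noising and reward-guided denoising, keeping the best design.

    noise(x, t)          -- forward kernel: partially noise a clean design back to time t
    guided_denoise(x, t) -- derivative-free reward-guided denoising from time t back to time 0
    reward(x)            -- downstream reward model evaluated on a fully denoised design
    """
    best, best_r = x0, reward(x0)
    for _ in range(num_iters):
        x_t = noise(best, t=t_mid)              # step 1: partial noising
        x0_new = guided_denoise(x_t, t=t_mid)   # step 2: reward-guided denoising
        r_new = reward(x0_new)
        # Greedy acceptance (an illustrative choice): keep the refinement only when it
        # improves the reward, so errors from guided denoising can be corrected later.
        if r_new > best_r:
            best, best_r = x0_new, r_new
    return best, best_r
```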
Figure 2: Examples of the reward-guided denoising step in Figure 1. Value functions are look-ahead functions that predict the reward at time 0 from intermediate states; see the SVDD paper for details.
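To make the denoising step concrete, below is a rough sketch of a derivative-free, value-guided reverse step in the spirit of SVDD. Here `denoise_step` (one reverse-step proposal from the pre-trained diffusion model) and `value_fn` (the look-ahead value function estimating the reward at time 0 from an intermediate state) are hypothetical placeholders.

```python
import numpy as np

def guided_denoise_step(x_t, t, denoise_step, value_fn, num_candidates=8):
    """One derivative-free, value-guided reverse step (SVDD-style sketch).

    denoise_step(x_t, t) -- samples a candidate next state from the pre-trained reverse kernel
    value_fn(x, t)       -- look-ahead value: predicted reward at time 0 given state x at time t
    """
    # Propose several candidate next states from the pre-trained model (no gradients needed).
    candidates = [denoise_step(x_t, t) for _ in range(num_candidates)]
    values = np.array([value_fn(c, t) for c in candidates])
    # Greedy selection shown here; resampling with weights proportional to exp(value / alpha)
    # is another common variant.
    return candidates[int(np.argmax(values))]
```

Applying this step at every reverse time yields the reward-guided denoising routine used inside the refinement loop sketched above.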
Applications in Protein Design
We integrate our methods into discrete diffusion models for protein sequences (e.g., EvoDiff) by leveraging reward models (i.e., sequence → target property) at test time for computational protein design. We present results on optimizing several fundamental structural rewards, including symmetry, globularity, match_ss, and crmsd. We can further optimize ptm, plddt, tm, lddt, hydrophobic, and surface_expose. All rewards are defined based on the outputs of a sequence-to-structure model, ESMFold. Below, we visualize examples of the generated sequences, with structures predicted by ESMFold.
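As a concrete illustration, a crmsd-type reward can be computed by folding the candidate sequence and superposing the predicted C-alpha trace onto the target structure. In the sketch below, `fold_to_ca_coords` stands in for a sequence-to-structure predictor such as ESMFold and is a hypothetical placeholder; only the standard Kabsch superposition is spelled out.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)          # covariance H = P^T Q = U S V^T
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # correct for a possible reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T    # optimal rotation applied to P
    return float(np.sqrt(np.mean(np.sum((P @ R.T - Q) ** 2, axis=1))))

def crmsd_reward(sequence, target_ca, fold_to_ca_coords):
    """Negative C-alpha RMSD to the target structure (higher is better).

    fold_to_ca_coords(sequence) -- hypothetical placeholder for a sequence-to-structure model
    such as ESMFold, returning an (N, 3) array of predicted C-alpha coordinates.
    """
    pred_ca = fold_to_ca_coords(sequence)
    return -kabsch_rmsd(np.asarray(pred_ca), np.asarray(target_ca))
```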
The following is an example of a trajectory during the refinement process. Our algorithm continuously refines the outputs by gradually correcting errors introduced during reward-guided denoising, improving the design over successive iterations. For instance, when optimizing the similarity (RMSD) of a protein to a target structure, refinement progressively reduces the RMSD, improving the design from an initial fit (obtained by an existing single-shot generation method) to a better final fit, as shown on the right.