GaudiStableDiffusionPipeline
The `GaudiStableDiffusionPipeline` class enables performing text-to-image generation on HPUs.
It inherits from the `GaudiDiffusionPipeline` class, which is the parent of all Gaudi diffuser pipelines.
To get the most out of it, it should be paired with a scheduler optimized for HPUs, such as `GaudiDDIMScheduler`.
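For example, a typical instantiation pairs the pipeline with `GaudiDDIMScheduler`; this is a sketch in which the checkpoint name is illustrative, while `Habana/stable-diffusion` refers to a Gaudi configuration hosted on the Hub:

```python
from optimum.habana.diffusers import GaudiDDIMScheduler, GaudiStableDiffusionPipeline

model_name = "runwayml/stable-diffusion-v1-5"  # example checkpoint

# Load the HPU-optimized scheduler from the same checkpoint
scheduler = GaudiDDIMScheduler.from_pretrained(model_name, subfolder="scheduler")

pipeline = GaudiStableDiffusionPipeline.from_pretrained(
    model_name,
    scheduler=scheduler,
    use_habana=True,                         # run on HPU instead of CPU
    use_hpu_graphs=True,                     # capture HPU graphs to reduce host overhead
    gaudi_config="Habana/stable-diffusion",  # Gaudi configuration downloaded from the Hub
)
```

A full generation call is shown with `__call__` below.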
GaudiStableDiffusionPipeline
class optimum.habana.diffusers.GaudiStableDiffusionPipeline
< source >( vae: AutoencoderKL text_encoder: CLIPTextModel tokenizer: CLIPTokenizer unet: UNet2DConditionModel scheduler: KarrasDiffusionSchedulers safety_checker: StableDiffusionSafetyChecker feature_extractor: CLIPImageProcessor requires_safety_checker: bool = True use_habana: bool = False use_hpu_graphs: bool = False gaudi_config: typing.Union[str, optimum.habana.transformers.gaudi_configuration.GaudiConfig] = None bf16_full_eval: bool = False )
Parameters
- vae (`AutoencoderKL`) — Variational Auto-Encoder (VAE) model to encode and decode images to and from latent representations.
- text_encoder (`CLIPTextModel`) — Frozen text-encoder (clip-vit-large-patch14).
- tokenizer (`~transformers.CLIPTokenizer`) — A `CLIPTokenizer` to tokenize text.
- unet (`UNet2DConditionModel`) — A `UNet2DConditionModel` to denoise the encoded image latents.
- scheduler (`SchedulerMixin`) — A scheduler to be used in combination with `unet` to denoise the encoded image latents. Can be one of `DDIMScheduler`, `LMSDiscreteScheduler`, or `PNDMScheduler`.
- safety_checker (`StableDiffusionSafetyChecker`) — Classification module that estimates whether generated images could be considered offensive or harmful. Please refer to the model card for more details about a model’s potential harms.
- feature_extractor (`CLIPImageProcessor`) — A `CLIPImageProcessor` to extract features from generated images; used as inputs to the `safety_checker`.
- use_habana (`bool`, defaults to `False`) — Whether to use Gaudi (`True`) or CPU (`False`).
- use_hpu_graphs (`bool`, defaults to `False`) — Whether to use HPU graphs or not.
- gaudi_config (`Union[str, GaudiConfig]`, defaults to `None`) — Gaudi configuration to use. Can be a string to download it from the Hub, or a previously initialized config can be passed.
- bf16_full_eval (`bool`, defaults to `False`) — Whether to use full bfloat16 evaluation instead of 32-bit. This will be faster and save memory compared to fp32/mixed precision but can harm generated images.
Extends the `StableDiffusionPipeline` class:
- Generation is performed by batches.
- Two `mark_step()` calls were added to support lazy mode.
- Added support for HPU graphs.
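To make the first point concrete, the number of requested images is `len(prompt) * num_images_per_prompt`, and generation proceeds in chunks of `batch_size` (the pipeline may pad the last, smaller chunk so that shapes stay static for HPU graphs). A small arithmetic sketch with made-up numbers:

```python
import math

# Hypothetical request: 2 prompts, 4 images each, processed 3 at a time.
num_prompts = 2
num_images_per_prompt = 4
batch_size = 3

total_images = num_prompts * num_images_per_prompt          # 8 images requested
num_batches = math.ceil(total_images / batch_size)          # 3 batches
last_batch = total_images - (num_batches - 1) * batch_size  # 2 images in the last batch
print(total_images, num_batches, last_batch)                # 8 3 2
```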
__call__
< source >(
prompt: typing.Union[str, typing.List[str]] = None
height: typing.Optional[int] = None
width: typing.Optional[int] = None
num_inference_steps: int = 50
guidance_scale: float = 7.5
negative_prompt: typing.Union[typing.List[str], str, NoneType] = None
num_images_per_prompt: typing.Optional[int] = 1
batch_size: int = 1
eta: float = 0.0
generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None
latents: typing.Optional[torch.FloatTensor] = None
prompt_embeds: typing.Optional[torch.FloatTensor] = None
negative_prompt_embeds: typing.Optional[torch.FloatTensor] = None
output_type: typing.Optional[str] = 'pil'
return_dict: bool = True
callback: typing.Union[typing.Callable[[int, int, torch.FloatTensor], NoneType], NoneType] = None
callback_steps: int = 1
cross_attention_kwargs: typing.Union[typing.Dict[str, typing.Any], NoneType] = None
guidance_rescale: float = 0.0
)
→
GaudiStableDiffusionPipelineOutput
or tuple
Parameters
- prompt (`str` or `List[str]`, optional) — The prompt or prompts to guide image generation. If not defined, you need to pass `prompt_embeds`.
- height (`int`, optional, defaults to `self.unet.config.sample_size * self.vae_scale_factor`) — The height in pixels of the generated images.
- width (`int`, optional, defaults to `self.unet.config.sample_size * self.vae_scale_factor`) — The width in pixels of the generated images.
- num_inference_steps (`int`, optional, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher-quality image at the expense of slower inference.
- guidance_scale (`float`, optional, defaults to 7.5) — A higher guidance scale value encourages the model to generate images closely linked to the text `prompt` at the expense of lower image quality. Guidance scale is enabled when `guidance_scale > 1`.
- negative_prompt (`str` or `List[str]`, optional) — The prompt or prompts to guide what to not include in image generation. If not defined, you need to pass `negative_prompt_embeds` instead. Ignored when not using guidance (`guidance_scale < 1`).
- num_images_per_prompt (`int`, optional, defaults to 1) — The number of images to generate per prompt.
- batch_size (`int`, optional, defaults to 1) — The number of images in a batch.
- eta (`float`, optional, defaults to 0.0) — Corresponds to parameter eta (η) from the DDIM paper. Only applies to the `~schedulers.DDIMScheduler`, and is ignored in other schedulers.
- generator (`torch.Generator` or `List[torch.Generator]`, optional) — A `torch.Generator` to make generation deterministic.
- latents (`torch.FloatTensor`, optional) — Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor is generated by sampling using the supplied random `generator`.
- prompt_embeds (`torch.FloatTensor`, optional) — Pre-generated text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided, text embeddings are generated from the `prompt` input argument.
- negative_prompt_embeds (`torch.FloatTensor`, optional) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided, `negative_prompt_embeds` are generated from the `negative_prompt` input argument.
- output_type (`str`, optional, defaults to `"pil"`) — The output format of the generated image. Choose between `PIL.Image` or `np.array`.
- return_dict (`bool`, optional, defaults to `True`) — Whether or not to return a `GaudiStableDiffusionPipelineOutput` instead of a plain tuple.
- callback (`Callable`, optional) — A function that is called every `callback_steps` steps during inference. The function is called with the following arguments: `callback(step: int, timestep: int, latents: torch.FloatTensor)`.
- callback_steps (`int`, optional, defaults to 1) — The frequency at which the `callback` function is called. If not specified, the callback is called at every step.
- cross_attention_kwargs (`dict`, optional) — A kwargs dictionary that, if specified, is passed along to the `AttentionProcessor` as defined in `self.processor`.
- guidance_rescale (`float`, optional, defaults to 0.0) — Guidance rescale factor from Common Diffusion Noise Schedules and Sample Steps are Flawed. Guidance rescale should fix overexposure when using zero terminal SNR.
Returns

`GaudiStableDiffusionPipelineOutput` or `tuple`

If `return_dict` is `True`, a `GaudiStableDiffusionPipelineOutput` is returned, otherwise a `tuple` is returned where the first element is a list with the generated images and the second element is a list of `bool`s indicating whether the corresponding generated image contains “not-safe-for-work” (nsfw) content.
The call function to the pipeline for generation.
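As a sketch of how the batching arguments above interact (the prompt is a placeholder and `pipeline` comes from the instantiation example earlier):

```python
outputs = pipeline(
    prompt="high-quality photo of an astronaut riding a horse in space",
    num_images_per_prompt=8,  # 8 images for this prompt
    batch_size=4,             # generated as 2 batches of 4
    num_inference_steps=50,
    guidance_scale=7.5,
)
images = outputs.images  # list of PIL images, since output_type defaults to "pil"
```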
GaudiDiffusionPipeline
class optimum.habana.diffusers.GaudiDiffusionPipeline
< source >( use_habana: bool = False use_hpu_graphs: bool = False gaudi_config: typing.Union[str, optimum.habana.transformers.gaudi_configuration.GaudiConfig] = None bf16_full_eval: bool = False )
Parameters
- use_habana (`bool`, defaults to `False`) — Whether to use Gaudi (`True`) or CPU (`False`).
- use_hpu_graphs (`bool`, defaults to `False`) — Whether to use HPU graphs or not.
- gaudi_config (`Union[str, GaudiConfig]`, defaults to `None`) — Gaudi configuration to use. Can be a string to download it from the Hub, or a previously initialized config can be passed.
- bf16_full_eval (`bool`, defaults to `False`) — Whether to use full bfloat16 evaluation instead of 32-bit. This will be faster and save memory compared to fp32/mixed precision but can harm generated images.
Extends the `DiffusionPipeline` class:
- The pipeline is initialized on Gaudi if `use_habana=True`.
- The pipeline’s Gaudi configuration is saved and pushed to the Hub.
from_pretrained
< source >( pretrained_model_name_or_path: typing.Union[str, os.PathLike, NoneType] **kwargs )
More information here.
save_pretrained
< source >( save_directory: typing.Union[str, os.PathLike] safe_serialization: bool = True variant: typing.Optional[str] = None push_to_hub: bool = False **kwargs )
Parameters
- save_directory (`str` or `os.PathLike`) — Directory to which to save. Will be created if it doesn’t exist.
- safe_serialization (`bool`, optional, defaults to `True`) — Whether to save the model using `safetensors` or the traditional PyTorch way (that uses `pickle`).
- variant (`str`, optional) — If specified, weights are saved in the format pytorch_model.<variant>.bin.
- push_to_hub (`bool`, optional, defaults to `False`) — Whether or not to push your model to the Hugging Face model hub after saving it. You can specify the repository you want to push to with `repo_id` (will default to the name of `save_directory` in your namespace).
- kwargs (`Dict[str, Any]`, optional) — Additional keyword arguments passed along to the `~utils.PushToHubMixin.push_to_hub` method.
Save the pipeline and Gaudi configurations. More information here.
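A minimal sketch of saving a pipeline; the directory and repository names are hypothetical:

```python
# Saves model weights (safetensors by default) together with the
# Gaudi configuration used by the pipeline.
pipeline.save_pretrained("stable-diffusion-gaudi", safe_serialization=True)

# Save and push to the Hub in one call (repo name is hypothetical):
# pipeline.save_pretrained(
#     "stable-diffusion-gaudi", push_to_hub=True, repo_id="my-user/stable-diffusion-gaudi"
# )
```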
GaudiDDIMScheduler
class optimum.habana.diffusers.GaudiDDIMScheduler
< source >( num_train_timesteps: int = 1000 beta_start: float = 0.0001 beta_end: float = 0.02 beta_schedule: str = 'linear' trained_betas: typing.Union[numpy.ndarray, typing.List[float], NoneType] = None clip_sample: bool = True set_alpha_to_one: bool = True steps_offset: int = 0 prediction_type: str = 'epsilon' thresholding: bool = False dynamic_thresholding_ratio: float = 0.995 clip_sample_range: float = 1.0 sample_max_value: float = 1.0 timestep_spacing: str = 'leading' rescale_betas_zero_snr: bool = False )
Parameters
- num_train_timesteps (`int`, defaults to 1000) — The number of diffusion steps to train the model.
- beta_start (`float`, defaults to 0.0001) — The starting `beta` value of inference.
- beta_end (`float`, defaults to 0.02) — The final `beta` value.
- beta_schedule (`str`, defaults to `"linear"`) — The beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from `linear`, `scaled_linear`, or `squaredcos_cap_v2`.
- trained_betas (`np.ndarray`, optional) — Pass an array of betas directly to the constructor to bypass `beta_start` and `beta_end`.
- clip_sample (`bool`, defaults to `True`) — Clip the predicted sample for numerical stability.
- clip_sample_range (`float`, defaults to 1.0) — The maximum magnitude for sample clipping. Valid only when `clip_sample=True`.
- set_alpha_to_one (`bool`, defaults to `True`) — Each diffusion step uses the alphas product value at that step and at the previous one. For the final step there is no previous alpha. When this option is `True` the previous alpha product is fixed to `1`, otherwise it uses the alpha value at step 0.
- steps_offset (`int`, defaults to 0) — An offset added to the inference steps. You can use a combination of `offset=1` and `set_alpha_to_one=False` to make the last step use step 0 for the previous alpha product, like in Stable Diffusion.
- prediction_type (`str`, defaults to `epsilon`, optional) — Prediction type of the scheduler function; can be `epsilon` (predicts the noise of the diffusion process), `sample` (directly predicts the noisy sample) or `v_prediction` (see section 2.4 of the Imagen Video paper).
- thresholding (`bool`, defaults to `False`) — Whether to use the “dynamic thresholding” method. This is unsuitable for latent-space diffusion models such as Stable Diffusion.
- dynamic_thresholding_ratio (`float`, defaults to 0.995) — The ratio for the dynamic thresholding method. Valid only when `thresholding=True`.
- sample_max_value (`float`, defaults to 1.0) — The threshold value for dynamic thresholding. Valid only when `thresholding=True`.
- timestep_spacing (`str`, defaults to `"leading"`) — The way the timesteps should be scaled. Refer to Table 2 of Common Diffusion Noise Schedules and Sample Steps are Flawed for more information.
- rescale_betas_zero_snr (`bool`, defaults to `False`) — Whether to rescale the betas to have zero terminal SNR. This enables the model to generate very bright and dark samples instead of limiting it to samples with medium brightness. Loosely related to `--offset_noise`.
Extends Diffusers’ `DDIMScheduler` to run optimally on Gaudi:
- All time-dependent parameters are generated at the beginning.
- At each time step, tensors are rolled to update the values of the time-dependent parameters.
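The rolling trick can be illustrated in isolation: per-timestep values are stacked once up front, and after each step the tensor is shifted so that index 0 always holds the current step’s parameters. This is a standalone sketch with made-up values, not the scheduler’s actual code:

```python
import torch

# Pretend these are per-timestep parameters (e.g. alpha products) for 4 steps.
alpha_prod = torch.tensor([0.9, 0.7, 0.4, 0.1])

params = alpha_prod.clone()
for step in range(len(alpha_prod)):
    current = params[0]                             # parameters for the current step
    # ... the denoising update would use `current` here ...
    params = torch.roll(params, shifts=-1, dims=0)  # advance to the next step
```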
step
< source >(
model_output: FloatTensor
sample: FloatTensor
eta: float = 0.0
use_clipped_model_output: bool = False
generator = None
variance_noise: typing.Optional[torch.FloatTensor] = None
return_dict: bool = True
)
→
diffusers.schedulers.scheduling_utils.DDIMSchedulerOutput
or tuple
Parameters
- model_output (`torch.FloatTensor`) — The direct output from the learned diffusion model.
- timestep (`float`) — The current discrete timestep in the diffusion chain.
- sample (`torch.FloatTensor`) — A current instance of a sample created by the diffusion process.
- eta (`float`) — The weight of noise for added noise in the diffusion step.
- use_clipped_model_output (`bool`, defaults to `False`) — If `True`, computes a “corrected” `model_output` from the clipped predicted original sample. Necessary because the predicted original sample is clipped to [-1, 1] when `self.config.clip_sample` is `True`. If no clipping has happened, the “corrected” `model_output` coincides with the one provided as input and `use_clipped_model_output` has no effect.
- generator (`torch.Generator`, optional) — A random number generator.
- variance_noise (`torch.FloatTensor`) — Alternative to generating noise with `generator` by directly providing the noise for the variance itself. Useful for methods such as `CycleDiffusion`.
- return_dict (`bool`, optional, defaults to `True`) — Whether or not to return a `DDIMSchedulerOutput` or `tuple`.
Returns

`diffusers.schedulers.scheduling_utils.DDIMSchedulerOutput` or `tuple`

If `return_dict` is `True`, a `DDIMSchedulerOutput` is returned, otherwise a tuple is returned where the first element is the sample tensor.
Predict the sample from the previous timestep by reversing the SDE. This function propagates the diffusion process from the learned model outputs (most often the predicted noise).
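A bare-bones denoising loop using `step()` might look as follows. This is a sketch under the assumption that `scheduler`, `unet`, `text_embeddings`, and initial `latents` are already prepared; note that, unlike Diffusers’ `DDIMScheduler.step`, the signature documented above takes no explicit `timestep` argument, since time-dependent parameters are tracked internally via the pre-computed, rolled tensors:

```python
# Assumes `scheduler`, `unet`, `text_embeddings`, and `latents` already exist.
scheduler.set_timesteps(50)  # inherited from Diffusers' DDIMScheduler

for t in scheduler.timesteps:
    # Predict the noise residual for the current timestep
    noise_pred = unet(latents, t, encoder_hidden_states=text_embeddings).sample
    # Step backwards through the diffusion process
    latents = scheduler.step(noise_pred, latents).prev_sample
```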