
🤗 Optimum AMD provides a RyzenAI Quantizer that enables you to apply quantization to many models hosted on the Hugging Face Hub, using the AMD Vitis AI Quantizer.

The RyzenAI Quantizer provides an easy-to-use Post Training Quantization (PTQ) flow for pre-trained models saved in the ONNX format. It generates a quantized ONNX model ready to be deployed with Ryzen AI.

The Quantizer supports various configurations and functions to quantize models for deployment on IPU_CNN, IPU_Transformer, and CPU targets.
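
The recommended configuration for a given target is loaded through AutoQuantizationConfig. A minimal sketch: ipu_cnn_config() is the method used later in this guide, while the other two class methods are assumptions about analogous helpers and should be checked against your installed optimum-amd version:

    from optimum.amd.ryzenai import AutoQuantizationConfig

    # Recommended settings for CNN models deployed on the IPU
    # (used in the example below)
    cnn_config = AutoQuantizationConfig.ipu_cnn_config()

    # Assumed analogous helpers for the other targets; verify the exact
    # method names in your optimum-amd version before relying on them
    # transformer_config = AutoQuantizationConfig.ipu_transformer_config()
    # cpu_config = AutoQuantizationConfig.cpu_config()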

Creating a RyzenAIOnnxQuantizer

Initialize the RyzenAIOnnxQuantizer using the from_pretrained() method:

    from optimum.amd.ryzenai import RyzenAIOnnxQuantizer

    quantizer = RyzenAIOnnxQuantizer.from_pretrained("path/to/model")
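
If the directory contains more than one ONNX file, the target file can be selected explicitly. The file_name argument below is an assumption, mirroring the analogous Optimum ORTQuantizer API; verify it against your installed optimum-amd version:

    # file_name is assumed here (as in Optimum's ORTQuantizer); check your
    # optimum-amd version for the exact signature
    quantizer = RyzenAIOnnxQuantizer.from_pretrained("path/to/model", file_name="model.onnx")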

Quantization example

Below is a simple end-to-end example showing how to quantize a VGG model from the Timm library.

  • To begin, export the VGG model to ONNX using Optimum Exporters. Ensure static shapes are specified for inference.
  • Create a preprocessing function to handle specific image format conversions and apply necessary transformations to prepare the input for the model.
  • Initialize the RyzenAI quantizer (RyzenAIOnnxQuantizer) and configure the quantization settings using AutoQuantizationConfig. The recommended quantization configuration for CNN models to be deployed on the IPU is loaded using ipu_cnn_config.
  • Obtain a calibration dataset using the quantizer’s get_calibration_dataset method. This dataset is crucial for computing quantization parameters during the quantization process.
  • Run the quantizer with the specified quantization configuration and calibration data. The quantization parameters computed during this process are embedded as constants in the quantized model.
  • The resulting quantized model is saved in the specified quantization directory.

    from functools import partial
    import timm

    from optimum.amd.ryzenai import AutoQuantizationConfig, RyzenAIOnnxQuantizer
    from optimum.exporters.onnx import main_export
    from transformers import PretrainedConfig

    # Define paths for exporting ONNX model and saving quantized model
    export_dir = "/path/to/vgg_onnx"
    quantization_dir = "/path/to/vgg_onnx_quantized"

    # Specify the model ID from Timm
    model_id = "timm/vgg11.tv_in1k"

    # Step 1: Export the model to ONNX format using Optimum Exporters
    main_export(
        model_name_or_path=model_id,
        output=export_dir,
        task="image-classification",
        opset=13,
        batch_size=1,
        no_dynamic_axes=True,
    )

    # Step 2: Preprocess configuration and data transformations
    config = PretrainedConfig.from_pretrained(export_dir)
    data_config = timm.data.resolve_data_config(pretrained_cfg=config.pretrained_cfg)
    transforms = timm.data.create_transform(**data_config, is_training=False)

    def preprocess_fn(ex, transforms):
        image = ex["image"]
        if image.mode == "L":
            # Convert greyscale to RGB if needed
            print("WARNING: converting greyscale to RGB")
            image = image.convert("RGB")
        pixel_values = transforms(image)
        return {"pixel_values": pixel_values}

    # Step 3: Initialize the RyzenAIOnnxQuantizer with the exported model
    quantizer = RyzenAIOnnxQuantizer.from_pretrained(export_dir)

    # Step 4: Load recommended quantization config for model
    quantization_config = AutoQuantizationConfig.ipu_cnn_config()

    # Step 5: Obtain a calibration dataset for computing quantization parameters
    train_calibration_dataset = quantizer.get_calibration_dataset(
        "imagenet-1k",
        preprocess_function=partial(preprocess_fn, transforms=transforms),
        num_samples=100,
        dataset_split="train",
        preprocess_batch=False,
        streaming=True,
    )

    # Step 6: Run the quantizer with the specified configuration and calibration data
    quantizer.quantize(
        quantization_config=quantization_config,
        dataset=train_calibration_dataset,
        save_dir=quantization_dir,
    )
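
As a quick sanity check, the quantized model can be loaded and run with plain ONNX Runtime. The sketch below assumes the quantizer writes a model_quantized.onnx file (adjust the name to whatever appears in quantization_dir), that the exported graph takes a pixel_values input, and that the model expects 224x224 images:

    import numpy as np
    import onnxruntime as ort

    # File name is an assumption; use the .onnx file actually written
    # to quantization_dir
    session = ort.InferenceSession(
        f"{quantization_dir}/model_quantized.onnx",
        providers=["CPUExecutionProvider"],
    )

    # Random input matching the static shape the model was exported with
    dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)
    outputs = session.run(None, {"pixel_values": dummy_input})
    print(outputs[0].shape)  # e.g. (1, 1000) logits over the ImageNet classes

For actual deployment on the IPU, the session would instead be created with the Vitis AI execution provider shipped with the Ryzen AI software.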