
Code Structure of CUDA operators

This folder contains all non-Python code for MMCV custom ops. Please follow the same architecture when adding new ops.

Directory Tree

.
├── common
│   ├── box_iou_rotated_utils.hpp
│   ├── parrots_cpp_helper.hpp
│   ├── parrots_cuda_helper.hpp
│   ├── pytorch_cpp_helper.hpp
│   ├── pytorch_cuda_helper.hpp
│   ├── pytorch_device_registry.hpp
│   ├── cuda
│   │   ├── common_cuda_helper.hpp
│   │   ├── parrots_cudawarpfunction.cuh
│   │   ├── ...
│   │   └── ops_cuda_kernel.cuh
│   ├── mps
│   │   ├── MPSLibrary.h
│   │   ├── ...
│   │   └── MPSUtils.h
│   ├── mlu
│   │   └── ...
│   └── utils
│       └── ...
├── parrots
│   ├── ...
│   ├── ops.cpp
│   ├── ops_parrots.cpp
│   └── ops_pytorch.h
└── pytorch
    ├── info.cpp
    ├── pybind.cpp
    ├── ...
    ├── ops.cpp
    ├── cuda
    │   ├── ...
    │   └── ops_cuda.cu
    ├── cpu
    │   ├── ...
    │   └── ops.cpp
    ├── mps
    │   ├── ...
    │   └── op_mps.mm
    └── mlu
        ├── ...
        └── op_mlu.cpp

Components

  • common: This directory contains shared tools and helper code.
    • cuda: CUDA kernels that can be shared by all backends. HIP kernels also live here, since HIP and CUDA have similar syntax.
    • mps: Tools used to support MPS ops. NOTE that MPS support is experimental.
    • mlu: MLU kernels used to support Cambricon devices.
    • utils: The kernels and utils of spconv.
  • parrots: Parrots is a deep learning framework for model training and inference. Parrots custom ops are placed in this directory.
  • pytorch: PyTorch custom ops are supported by binding C++ to Python with pybind11. The op implementations and binding code are placed in this directory.
    • cuda: This directory contains the CUDA kernel launchers, which feed tensor memory pointers to the CUDA kernels in common/cuda. The launchers provide the C++ interface to the CUDA implementation of each custom op.
    • cpu: This directory contains the CPU implementations of the corresponding custom ops.
    • mlu: This directory contains the launchers for the MLU kernels.
    • mps: MPS op implementations and launchers.
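The registry in common/pytorch_device_registry.hpp is what ties these directories together: each backend registers its own implementation of an op, and a device-agnostic entry point dispatches to the right one at runtime. A simplified sketch of that idea, written in Python purely for illustration (the real mechanism is the C++ macros REGISTER_DEVICE_IMPL and DISPATCH_DEVICE_IMPL; all names and the toy ops below are stand-ins):

```python
# Toy model of the device registry: (op name, device type) -> implementation.
_registry = {}

def register_device_impl(op, device, impl):
    # Corresponds roughly to REGISTER_DEVICE_IMPL: record one backend's
    # implementation of the op under its device key.
    _registry[(op, device)] = impl

def dispatch_device_impl(op, device, *args):
    # Corresponds roughly to DISPATCH_DEVICE_IMPL: look up and call the
    # implementation matching the device type of the input.
    return _registry[(op, device)](*args)

# Each backend registers its own implementation of the same op
# (the arithmetic here is arbitrary, just to tell the backends apart).
register_device_impl('new_ops_forward', 'cpu', lambda x: x + 1)
register_device_impl('new_ops_forward', 'cuda', lambda x: x * 2)
```

In MMCV the dispatch key is derived from the device type of the input tensors, so a single device-agnostic call transparently runs the CPU, CUDA, or MLU implementation.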

How to add new PyTorch ops?

  1. (Optional) Add a shared kernel in common to support a specific hardware platform.

    // src/common/cuda/new_ops_cuda_kernel.cuh
    
    template <typename T>
    __global__ void new_ops_forward_cuda_kernel(const T* input, T* output, ...) {
        // forward here
    }
    

    Then add a CUDA kernel launcher in pytorch/cuda.

    // src/pytorch/cuda
    #include <new_ops_cuda_kernel.cuh>
    
    void NewOpsForwardCUDAKernelLauncher(Tensor input, Tensor output, ...){
        // initialize
        at::cuda::CUDAGuard device_guard(input.device());
        cudaStream_t stream = at::cuda::getCurrentCUDAStream();
        ...
        AT_DISPATCH_FLOATING_TYPES_AND_HALF(
            input.scalar_type(), "new_ops_forward_cuda_kernel", ([&] {
                new_ops_forward_cuda_kernel<scalar_t>
                    <<<GET_BLOCKS(output_size), THREADS_PER_BLOCK, 0, stream>>>(
                        input.data_ptr<scalar_t>(), output.data_ptr<scalar_t>(),...);
            }));
        AT_CUDA_CHECK(cudaGetLastError());
    }
    
  2. Register the implementation for different devices.

    // src/pytorch/cuda/cudabind.cpp
    ...
    
    Tensor new_ops_forward_cuda(Tensor input, Tensor output, ...){
        // implement cuda forward here
        // use `NewOpsForwardCUDAKernelLauncher` here
    }
    // declare interface here.
    Tensor new_ops_forward_impl(Tensor input, Tensor output, ...);
    // register the implementation for given device (CUDA here).
    REGISTER_DEVICE_IMPL(new_ops_forward_impl, CUDA, new_ops_forward_cuda);
    
  3. Add the op's implementation in the pytorch directory, selecting among the registered implementations according to device type.

    // src/pytorch/new_ops.cpp
    Tensor new_ops_forward_impl(Tensor input, Tensor output, ...){
        // dispatch to the registered implementation according to the device
        // type of the input tensor, and return its result.
        return DISPATCH_DEVICE_IMPL(new_ops_forward_impl, input, output, ...);
    }
    ...
    
    Tensor new_ops_forward(Tensor input, Tensor output, ...){
        return new_ops_forward_impl(input, output, ...);
    }
    
  4. Bind the implementation in pytorch/pybind.cpp.

    // src/pytorch/pybind.cpp
    
    ...
    
    Tensor new_ops_forward(Tensor input, Tensor output, ...);
    
    ...
    
    // bind with pybind11
    m.def("new_ops_forward", &new_ops_forward, "new_ops_forward",
            py::arg("input"), py::arg("output"), ...);
    
    ...
    
  5. Rebuild MMCV. The new op can then be used from Python:

    from ..utils import ext_loader
    ext_module = ext_loader.load_ext('_ext', ['new_ops_forward'])
    
    ...
    
    ext_module.new_ops_forward(input, output, ...)
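In practice the raw ext_module call is usually hidden behind a small Python wrapper that validates its arguments first. A minimal sketch of that pattern, using a stand-in object in place of the compiled _ext module (everything below is illustrative, not MMCV's actual wrapper code):

```python
class _FakeExtModule:
    """Stand-in for the compiled `_ext` extension module (illustration only)."""

    @staticmethod
    def new_ops_forward(input, output):
        # Pretend the op writes 2 * input into output, elementwise.
        output[:] = [x * 2 for x in input]


ext_module = _FakeExtModule()


def new_ops_forward(input, output):
    # Validate arguments before handing raw buffers to the C++ side,
    # then return the output for convenient chaining.
    if len(input) != len(output):
        raise ValueError('input and output must have the same length')
    ext_module.new_ops_forward(input, output)
    return output
```

Keeping the validation on the Python side gives users readable error messages instead of a crash inside the extension.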