# Code Structure of CUDA operators
This folder contains all non-Python code for MMCV custom ops. Please follow the same architecture if you want to add new ops.
## Directory Tree
```folder
.
├── common
│   ├── box_iou_rotated_utils.hpp
│   ├── parrots_cpp_helper.hpp
│   ├── parrots_cuda_helper.hpp
│   ├── pytorch_cpp_helper.hpp
│   ├── pytorch_cuda_helper.hpp
│   ├── pytorch_device_registry.hpp
│   ├── cuda
│   │   ├── common_cuda_helper.hpp
│   │   ├── parrots_cudawarpfunction.cuh
│   │   ├── ...
│   │   └── ops_cuda_kernel.cuh
│   ├── mps
│   │   ├── MPSLibrary.h
│   │   ├── ...
│   │   └── MPSUtils.h
│   ├── mlu
│   │   └── ...
│   └── utils
│       └── ...
├── parrots
│   ├── ...
│   ├── ops.cpp
│   ├── ops_parrots.cpp
│   └── ops_pytorch.h
└── pytorch
    ├── info.cpp
    ├── pybind.cpp
    ├── ...
    ├── ops.cpp
    ├── cuda
    │   ├── ...
    │   └── ops_cuda.cu
    ├── cpu
    │   ├── ...
    │   └── ops.cpp
    ├── mps
    │   ├── ...
    │   └── op_mps.mm
    └── mlu
        ├── ...
        └── op_mlu.cpp
```
## Components
- `common`: This directory contains all tools and shared code.
  - `cuda`: The CUDA kernels that can be shared by all backends. **HIP** kernels also live here, since the two share similar syntax.
  - `mps`: The tools used to support MPS ops. **NOTE** that MPS support is **experimental**.
  - `mlu`: The MLU kernels used to support [Cambricon](https://www.cambricon.com/) devices.
  - `utils`: The kernels and utilities of spconv.
- `parrots`: **Parrots** is a deep learning framework for model training and inference. Parrots custom ops are placed in this directory.
- `pytorch`: **PyTorch** custom ops are supported by binding C++ to Python with **pybind11**. The op implementations and binding code are placed in this directory. The dispatch pattern that ties these subdirectories together is sketched after this list.
  - `cuda`: This directory contains the CUDA kernel launchers, which feed the memory pointers of the tensors to the CUDA kernels in `common/cuda`. The launchers provide the C++ interface to the CUDA implementation of each custom op.
  - `cpu`: This directory contains the CPU implementations of the corresponding custom ops.
  - `mlu`: This directory contains the launchers of the MLU kernels.
  - `mps`: The MPS op implementations and launchers.
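The glue between `pytorch` and its backend subdirectories is the device registry defined in `common/pytorch_device_registry.hpp`: a device-agnostic `*_impl` function dispatches on the device type of its tensor arguments, and each backend registers its own implementation for that function. A minimal sketch of the pattern, using the real macros from this folder but an invented op name:

```c++
// Sketch only; `some_op` is illustrative, not an existing MMCV op.
Tensor some_op_impl(Tensor input, Tensor output) {
  // forwards to whichever implementation was registered for input's device
  return DISPATCH_DEVICE_IMPL(some_op_impl, input, output);
}
// e.g. in pytorch/cuda: register the CUDA implementation
REGISTER_DEVICE_IMPL(some_op_impl, CUDA, some_op_cuda);
```

The step-by-step walkthrough below shows the same pattern for a new op.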
## How to add new PyTorch ops?
1. (Optional) Add a shared kernel in `common` to support special hardware platforms.
```c++
// src/common/cuda/new_ops_cuda_kernel.cuh
template <typename T>
__global__ void new_ops_forward_cuda_kernel(const T* input, T* output, ...) {
  // forward here
}
```
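To make the pattern concrete, here is a minimal hypothetical elementwise kernel in the same style. The op, its `alpha` parameter, and all names are invented for illustration; `CUDA_1D_KERNEL_LOOP` is the grid-stride loop helper from `common/cuda/common_cuda_helper.hpp`.

```c++
// Hypothetical example, not an existing MMCV op: scale every element by
// alpha. CUDA_1D_KERNEL_LOOP iterates `index` over [0, nthreads) with a
// grid-stride loop, so any grid size covers all elements.
template <typename T>
__global__ void scale_forward_cuda_kernel(const int nthreads, const T* input,
                                          T* output, const T alpha) {
  CUDA_1D_KERNEL_LOOP(index, nthreads) {
    output[index] = input[index] * alpha;
  }
}
```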
Then add the CUDA kernel launcher in `pytorch/cuda`.
```c++
// src/pytorch/cuda
#include <new_ops_cuda_kernel.cuh>

void NewOpsForwardCUDAKernelLauncher(Tensor input, Tensor output, ...) {
  // initialize
  at::cuda::CUDAGuard device_guard(input.device());
  cudaStream_t stream = at::cuda::getCurrentCUDAStream();
  ...
  AT_DISPATCH_FLOATING_TYPES_AND_HALF(
      input.scalar_type(), "new_ops_forward_cuda_kernel", ([&] {
        new_ops_forward_cuda_kernel<scalar_t>
            <<<GET_BLOCKS(output_size), THREADS_PER_BLOCK, 0, stream>>>(
                input.data_ptr<scalar_t>(), output.data_ptr<scalar_t>(), ...);
      }));
  AT_CUDA_CHECK(cudaGetLastError());
}
```
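Here `GET_BLOCKS` and `THREADS_PER_BLOCK` come from `common/cuda/common_cuda_helper.hpp` and compute a 1-D launch configuration from the number of output elements, while `AT_DISPATCH_FLOATING_TYPES_AND_HALF` is the PyTorch macro that instantiates the templated kernel for each floating-point dtype, binding `scalar_t` inside the lambda.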
2. Register the implementation for different devices.
```c++
// src/pytorch/cuda/cudabind.cpp
...
Tensor new_ops_forward_cuda(Tensor input, Tensor output, ...) {
  // implement cuda forward here
  // use `NewOpsForwardCUDAKernelLauncher` here
}
...
// declare the interface here.
Tensor new_ops_forward_impl(Tensor input, Tensor output, ...);
// register the implementation for the given device (CUDA here).
REGISTER_DEVICE_IMPL(new_ops_forward_impl, CUDA, new_ops_forward_cuda);
```
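Other backends register against the same interface. For instance, a CPU registration would follow the same shape; this is a hedged sketch, with the file path and function body hypothetical:

```c++
// src/pytorch/cpu/new_ops.cpp (hypothetical path)
Tensor new_ops_forward_cpu(Tensor input, Tensor output, ...) {
  // implement cpu forward here
  return output;
}

Tensor new_ops_forward_impl(Tensor input, Tensor output, ...);
// register the CPU implementation for the same interface
REGISTER_DEVICE_IMPL(new_ops_forward_impl, CPU, new_ops_forward_cpu);
```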
3. Add the op implementation in the `pytorch` directory. The dispatcher selects the registered implementation according to the device type.
```c++
// src/pytorch/new_ops.cpp
Tensor new_ops_forward_impl(Tensor input, Tensor output, ...) {
  // dispatch the implementation according to the device type of input.
  return DISPATCH_DEVICE_IMPL(new_ops_forward_impl, input, output, ...);
}
...
Tensor new_ops_forward(Tensor input, Tensor output, ...) {
  return new_ops_forward_impl(input, output, ...);
}
```
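Device-independent work such as argument checking and output allocation typically lives in `new_ops_forward`, so each registered backend implementation only performs the actual computation.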
4. Bind the implementation in `pytorch/pybind.cpp`.
```c++
// src/pytorch/pybind.cpp
...
Tensor new_ops_forward(Tensor input, Tensor output, ...);
...
// bind with pybind11
m.def("new_ops_forward", &new_ops_forward, "new_ops_forward",
      py::arg("input"), py::arg("output"), ...);
...
```
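The `py::arg(...)` annotations expose the arguments as keyword arguments on the Python side. Note that the forward declaration here must match the definition in `src/pytorch/new_ops.cpp` exactly, because `pybind.cpp` only sees this declaration at compile time.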
5. Build MMCV again. Enjoy the new op in Python.
```python
from ..utils import ext_loader
ext_module = ext_loader.load_ext('_ext', ['new_ops_forward'])
...
ext_module.new_ops_forward(input, output, ...)
```
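In practice the raw extension call is usually wrapped in a user-facing Python function. A minimal hypothetical sketch, assuming the op writes its result into a pre-allocated `output` tensor:

```python
import torch

from ..utils import ext_loader

ext_module = ext_loader.load_ext('_ext', ['new_ops_forward'])


def new_ops_forward(input: torch.Tensor) -> torch.Tensor:
    # Allocate the output with the same shape, dtype and device as the
    # input; the compiled op fills it in.
    output = torch.empty_like(input)
    ext_module.new_ops_forward(input, output)
    return output
```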