---
title: RTMO Checkpoint Tester
emoji: 👀
colorFrom: pink
colorTo: green
sdk: gradio
sdk_version: 5.27.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: RTMO PyTorch Checkpoint Tester
---

# RTMO PyTorch Checkpoint Tester

This HuggingFace Space provides a real-time 2D multi-person pose estimation demo using the RTMO model from OpenMMLab, accelerated with ZeroGPU. It supports both image and video inputs.

## Features

- **Remote Checkpoint Selection**: Choose from multiple pre-trained variants (COCO, BODY7, CrowdPose, retrainable RTMO-s) via a dropdown.
- **Custom Checkpoint Upload**: Upload your own `.pth` file; the application auto-detects the RTMO-t/s/m/l variant.
- **Image Input**: Upload images for single-frame pose estimation.
- **Video Input**: Upload video files (e.g., `.mp4`, `.mov`, `.avi`, `.mkv`, `.webm`) to run pose estimation on video sequences and view annotated outputs.
- **Threshold Adjustment**: Fine-tune the Bounding Box Threshold and NMS Threshold sliders to refine detections.
- **Example Images**: Three license-free images with people are included for quick testing via the Examples panel.
- **ZeroGPU Acceleration**: Uses the `@spaces.GPU()` decorator for on-demand GPU inference on HuggingFace Spaces.
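The ZeroGPU integration boils down to decorating the inference entry point. A minimal sketch follows; the import fallback is an assumption for running outside HuggingFace Spaces, and `run_pose_estimation` is a hypothetical stand-in for the Space's actual predict function:

```python
try:
    # On HuggingFace Spaces, spaces.GPU() requests a GPU slice per call.
    import spaces
except ImportError:
    # Fallback stub (assumption) so the code also runs off-Spaces.
    class spaces:
        @staticmethod
        def GPU(func=None, **kwargs):
            if func is None:
                return lambda f: f  # used as @spaces.GPU()
            return func             # used as @spaces.GPU

@spaces.GPU()
def run_pose_estimation(image):
    # Hypothetical stand-in for the Space's real predict() function.
    return f"annotated:{image}"
```

The decorator only marks the function for GPU scheduling; the function body itself is ordinary Python.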

## Usage

1. **Upload Image**: Drag and drop or select an image in the Upload Image component (or choose one from the Examples panel).
2. **Upload Video**: Drag and drop or select a video file in the Upload Video component.
3. **Select Remote Checkpoint**: Pick a preloaded variant from the dropdown menu.
4. **(Optional) Upload Your Own Checkpoint**: Provide a `.pth` file to override the remote selection; the model variant is detected automatically.
5. **Adjust Thresholds**: Set the Bounding Box Threshold (`bbox_thr`) and NMS Threshold (`nms_thr`) to control confidence filtering and suppression behavior.
6. **Run Inference**: Click the Run Inference button.
7. **View Results**:
   - For images, the annotated image appears in the Annotated Image panel.
   - For videos, the annotated video appears in the Annotated Video panel. The active checkpoint name is shown below the output.
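The two sliders in step 5 map onto a standard detection pipeline: `bbox_thr` discards boxes below a confidence score, and `nms_thr` controls how much overlap is tolerated before a box is suppressed. A pure-Python sketch of that filtering (for illustration only; the Space delegates this to MMPose internally):

```python
def iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def filter_detections(dets, bbox_thr=0.3, nms_thr=0.5):
    # dets: list of (box, score). Drop low-confidence boxes,
    # then greedy non-maximum suppression in score order.
    dets = sorted((d for d in dets if d[1] >= bbox_thr),
                  key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in dets:
        if all(iou(box, k[0]) < nms_thr for k in kept):
            kept.append((box, score))
    return kept
```

Raising `bbox_thr` removes weak detections; lowering `nms_thr` suppresses overlapping boxes more aggressively, which helps with duplicate detections on the same person.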

## Remote Checkpoints

The following variants are available out of the box:

- `rtmo-s_8xb32-600e_coco`
- `rtmo-m_16xb16-600e_coco`
- `rtmo-l_16xb16-600e_coco`
- `rtmo-t_8xb32-600e_body7`
- `rtmo-s_8xb32-600e_body7`
- `rtmo-m_16xb16-600e_body7`
- `rtmo-l_16xb16-600e_body7`
- `rtmo-s_8xb32-700e_crowdpose`
- `rtmo-m_16xb16-700e_crowdpose`
- `rtmo-l_16xb16-700e_crowdpose`
- `rtmo-s_coco_retrainable` (from Hugging Face)
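Remote checkpoints are fetched lazily and cached under `/tmp/{key}.pth`. A sketch of that download-on-demand logic, under the assumption of a key-to-URL mapping (the actual URLs used by the Space are not reproduced here); the `fetch` parameter is added purely to make the sketch testable:

```python
import os
import urllib.request

def get_checkpoint(key, url_map, cache_dir="/tmp", fetch=None):
    """Return a local path for `key`, downloading only on first use.

    `url_map` maps checkpoint keys (e.g. "rtmo-s_8xb32-600e_coco")
    to download URLs. `fetch` is injectable for testing; by default
    it downloads via urllib.
    """
    path = os.path.join(cache_dir, f"{key}.pth")
    if not os.path.exists(path):
        fetch = fetch or (lambda url, dst: urllib.request.urlretrieve(url, dst))
        fetch(url_map[key], path)
    return path
```

On a second call with the same key the cached file is reused, so switching back and forth between checkpoints does not re-download them.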

## Implementation Details

- **GPU Decorator**: `@spaces.GPU()` marks the predict function for GPU execution under ZeroGPU.
- **Inference API**: Uses `MMPoseInferencer` from MMPose with the `pose2d` and `pose2d_weights` arguments, restricting detection to category index 0 (person).
- **Monkey-Patch**: Applies a regex patch to bypass mmdet's MMCV version assertion for compatibility.
- **Variant Detection**: Inspects the channel count of `backbone.stem.conv.conv.weight` in the checkpoint to select the correct RTMO variant.
- **Checkpoint Management**: Remote files are downloaded to `/tmp/{key}.pth` on demand; uploaded checkpoints are used from their local path.
- **Image & Video Output**: The predict function handles both image and video inputs automatically, saving annotated frames or videos to `/tmp/vis` and displaying them in the corresponding UI panels.
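The variant-detection idea above can be sketched as follows. The source only states that the stem convolution's channels are inspected; the specific channel-to-variant mapping below is a hypothetical placeholder, not the Space's actual table:

```python
# Hypothetical mapping from stem out-channels to RTMO variant;
# the real channel counts used by the app are not reproduced here.
STEM_CHANNELS_TO_VARIANT = {16: "rtmo-t", 32: "rtmo-s", 48: "rtmo-m", 64: "rtmo-l"}

def detect_variant(state_dict):
    # state_dict values are tensor-like objects exposing `.shape`,
    # as in a PyTorch checkpoint's state dict.
    weight = state_dict["backbone.stem.conv.conv.weight"]
    out_channels = weight.shape[0]
    try:
        return STEM_CHANNELS_TO_VARIANT[out_channels]
    except KeyError:
        raise ValueError(f"unrecognized stem width: {out_channels}")
```

Because the stem width differs per variant, a single weight tensor's shape is enough to pick the matching model config without any user input.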

## Files

- `app.py`: Main Gradio application script.
- `requirements.txt`: Python dependencies, including MMCV and MMPose.
- `README.md`: This documentation file.