---
title: RTMO Checkpoint Tester
emoji: 👀
colorFrom: pink
colorTo: green
sdk: gradio
sdk_version: 5.27.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: RTMO PyTorch Checkpoint Tester
---

# RTMO PyTorch Checkpoint Tester

This HuggingFace Space provides a real-time 2D multi-person pose estimation demo using the RTMO model from OpenMMLab, accelerated with ZeroGPU. It supports both image and video inputs.

## Features

- **Remote Checkpoint Selection**: Choose from multiple pre-trained variants (COCO, BODY7, CrowdPose, retrainable RTMO-s) via a dropdown.
- **Custom Checkpoint Upload**: Upload your own `.pth` file; the application auto-detects the RTMO-t/s/m/l variant.
- **Image Input**: Upload images for single-frame pose estimation.
- **Video Input**: Upload video files (e.g., `.mp4`, `.mov`, `.avi`, `.mkv`, `.webm`) to run pose estimation on video sequences and view annotated outputs.
- **Threshold Adjustment**: Fine-tune the Bounding Box Threshold and NMS Threshold sliders to refine detections.
- **Example Images**: Three license-free images with people are included for quick testing via the Examples panel.
- **ZeroGPU Acceleration**: Uses the `@spaces.GPU()` decorator for on-demand GPU inference on HuggingFace Spaces.
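The ZeroGPU integration boils down to decorating the inference entry point. A minimal sketch follows; the import fallback is an assumption for running outside HuggingFace Spaces, and `run_pose_estimation` is a hypothetical stand-in for the Space's actual predict function:

```python
try:
    # On HuggingFace Spaces, spaces.GPU() requests a GPU slice per call.
    import spaces
except ImportError:
    # Fallback stub (assumption) so the code also runs off-Spaces.
    class spaces:
        @staticmethod
        def GPU(func=None, **kwargs):
            if func is None:
                return lambda f: f  # used as @spaces.GPU()
            return func             # used as @spaces.GPU

@spaces.GPU()
def run_pose_estimation(image):
    # Hypothetical stand-in for the Space's real predict() function.
    return f"annotated:{image}"
```

The decorator only marks the function for GPU scheduling; the function body itself is ordinary Python.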

## Usage

1. **Upload Image**: Drag and drop or select an image in the Upload Image component (or choose one from the Examples panel).
2. **Upload Video**: Drag and drop or select a video file in the Upload Video component.
3. **Select Remote Checkpoint**: Pick a preloaded variant from the dropdown menu.
4. **(Optional) Upload Your Own Checkpoint**: Provide a `.pth` file to override the remote selection; the model variant is detected automatically.
5. **Adjust Thresholds**: Set the Bounding Box Threshold (`bbox_thr`) and NMS Threshold (`nms_thr`) to control confidence filtering and suppression behavior.
6. **Run Inference**: Click the Run Inference button.
7. **View Results**:
   - For images, the annotated image appears in the Annotated Image panel.
   - For videos, the annotated video appears in the Annotated Video panel. The active checkpoint name is shown below the output.
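The two sliders in step 5 map onto a standard detection pipeline: `bbox_thr` discards boxes below a confidence score, and `nms_thr` controls how much overlap is tolerated before a box is suppressed. A pure-Python sketch of that filtering (for illustration only; the Space delegates this to MMPose internally):

```python
def iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def filter_detections(dets, bbox_thr=0.3, nms_thr=0.5):
    # dets: list of (box, score). Drop low-confidence boxes,
    # then greedy non-maximum suppression in score order.
    dets = sorted((d for d in dets if d[1] >= bbox_thr),
                  key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in dets:
        if all(iou(box, k[0]) < nms_thr for k in kept):
            kept.append((box, score))
    return kept
```

Raising `bbox_thr` removes weak detections; lowering `nms_thr` suppresses overlapping boxes more aggressively, which helps with duplicate detections on the same person.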

## Remote Checkpoints

The following variants are available out of the box:

- `rtmo-s_8xb32-600e_coco`
- `rtmo-m_16xb16-600e_coco`
- `rtmo-l_16xb16-600e_coco`
- `rtmo-t_8xb32-600e_body7`
- `rtmo-s_8xb32-600e_body7`
- `rtmo-m_16xb16-600e_body7`
- `rtmo-l_16xb16-600e_body7`
- `rtmo-s_8xb32-700e_crowdpose`
- `rtmo-m_16xb16-700e_crowdpose`
- `rtmo-l_16xb16-700e_crowdpose`
- `rtmo-s_coco_retrainable` (from Hugging Face)
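Remote checkpoints are fetched lazily and cached under `/tmp/{key}.pth`. A sketch of that download-on-demand logic, under the assumption of a key-to-URL mapping (the actual URLs used by the Space are not reproduced here); the `fetch` parameter is added purely to make the sketch testable:

```python
import os
import urllib.request

def get_checkpoint(key, url_map, cache_dir="/tmp", fetch=None):
    """Return a local path for `key`, downloading only on first use.

    `url_map` maps checkpoint keys (e.g. "rtmo-s_8xb32-600e_coco")
    to download URLs. `fetch` is injectable for testing; by default
    it downloads via urllib.
    """
    path = os.path.join(cache_dir, f"{key}.pth")
    if not os.path.exists(path):
        fetch = fetch or (lambda url, dst: urllib.request.urlretrieve(url, dst))
        fetch(url_map[key], path)
    return path
```

On a second call with the same key the cached file is reused, so switching back and forth between checkpoints does not re-download them.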

## Implementation Details

- **GPU Decorator**: `@spaces.GPU()` marks the predict function for GPU execution under ZeroGPU.
- **Inference API**: Uses `MMPoseInferencer` from MMPose with the `pose2d` and `pose2d_weights` arguments, restricting detection to category index 0 (person).
- **Monkey-Patch**: Applies a regex patch to bypass mmdet's MMCV version assertion for compatibility.
- **Variant Detection**: Inspects the channel count of `backbone.stem.conv.conv.weight` in the checkpoint to select the correct RTMO variant.
- **Checkpoint Management**: Remote files are downloaded to `/tmp/{key}.pth` on demand; uploaded checkpoints are used from their local path.
- **Image & Video Output**: The predict function handles both image and video inputs automatically, saving annotated frames or videos to `/tmp/vis` and displaying them in the corresponding UI panels.
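The variant-detection idea above can be sketched as follows. The source only states that the stem convolution's channels are inspected; the specific channel-to-variant mapping below is a hypothetical placeholder, not the Space's actual table:

```python
# Hypothetical mapping from stem out-channels to RTMO variant;
# the real channel counts used by the app are not reproduced here.
STEM_CHANNELS_TO_VARIANT = {16: "rtmo-t", 32: "rtmo-s", 48: "rtmo-m", 64: "rtmo-l"}

def detect_variant(state_dict):
    # state_dict values are tensor-like objects exposing `.shape`,
    # as in a PyTorch checkpoint's state dict.
    weight = state_dict["backbone.stem.conv.conv.weight"]
    out_channels = weight.shape[0]
    try:
        return STEM_CHANNELS_TO_VARIANT[out_channels]
    except KeyError:
        raise ValueError(f"unrecognized stem width: {out_channels}")
```

Because the stem width differs per variant, a single weight tensor's shape is enough to pick the matching model config without any user input.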

## Files

- `app.py`: Main Gradio application script.
- `requirements.txt`: Python dependencies, including MMCV and MMPose.
- `README.md`: This documentation file.