Luigi committed on
Commit c5ee215 · 1 Parent(s): f994a03

initial commit

Files changed (3)
  1. README.md +42 -4
  2. app.py +43 -4
  3. requirements.txt +7 -0
README.md CHANGED
@@ -1,8 +1,8 @@
  ---
  title: Video Human Fall Detector
- emoji: 🐢
- colorFrom: pink
- colorTo: pink
+ emoji: 🐠
+ colorFrom: purple
+ colorTo: red
  sdk: gradio
  sdk_version: 5.25.0
  app_file: app.py
@@ -11,4 +11,42 @@ license: apache-2.0
  short_description: Fall Detection Demo using LightCLIP
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Fall Detection Demo using LightCLIP on Hugging Face Spaces
+
+ This project demonstrates a lightweight, transformer-based approach to detecting human falls in video clips using a vision–language model (VLM). The demo is designed for complex scenes, including multiple people, obstacles, and varying lighting conditions. It uses a sliding-window technique to check multiple frames for robust detection and aggregates predictions over time to reduce false alarms.
+
+ ## Overview
+
+ The demo uses a pre-trained LightCLIP (or CLIP) model to compute image–text similarity scores between video frames and natural-language prompts. Two prompts are used:
+ - **Fall Prompt:** "A person falling on the ground."
+ - **Non-Fall Prompt:** "A person standing or walking."
+
+ For each frame in a window extracted from the video, the model computes a similarity score against each prompt. The scores are averaged over the sliding window, and if the average score for the "fall" prompt exceeds a defined threshold, a fall event is registered along with an approximate timestamp.
+
+ ## Project Files
+
+ - **app.py:** The main application file containing the Gradio demo.
+ - **requirements.txt:** Lists all the required Python libraries.
+ - **README.md:** This file.
+
+ ## How to Run
+
+ 1. **Clone or download the repository** into your Hugging Face Space.
+ 2. Ensure the Space is set to use the **GPU plan**.
+ 3. Spaces will automatically install the required libraries from `requirements.txt`.
+ 4. Launch the demo by running `app.py` (Gradio will start the web interface).
+
+ ## Code Overview
+
+ - **Frame Extraction:** The video is processed with OpenCV to extract frames (resized to 224×224).
+ - **LightCLIP Inference:** The demo uses the Hugging Face Transformers library to load a CLIP model (acting as LightCLIP). It computes image embeddings for each frame and compares them to text embeddings of the fall and non-fall descriptions.
+ - **Temporal Aggregation:** A sliding window (e.g. 16 frames with a stride of 8) is used to calculate average "fall" scores. Windows exceeding a threshold (e.g. 0.8) are flagged as fall events.
+ - **User Interface:** A simple Gradio UI lets users upload a video clip and displays the detection result along with a representative frame and a list of detected fall times.
+
+ ## Customization
+
+ - **Model:** Replace `"openai/clip-vit-base-patch32"` in `app.py` with your own LightCLIP model checkpoint if available.
+ - **Threshold & Window Size:** Adjust parameters such as the detection threshold, window size, and stride for better results on your dataset.
+ - **Deployment:** This demo is configured to run on a GPU-backed Hugging Face Space for real-time inference.
+
+ Enjoy experimenting with fall detection!
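
The README above describes a full video pipeline, while the `app.py` added in this commit (next diff) only ships a single image–text similarity demo. As a reading aid, here is a minimal sketch of the sliding-window scoring the README describes, assuming OpenCV frame extraction and the `openai/clip-vit-base-patch32` checkpoint named in the diff; the helper names (`extract_frames`, `fall_probabilities`, `detect_fall_events`) are illustrative and not part of the commit:

```python
# Editorial sketch only -- none of this exists in the commit. Window size, stride,
# and threshold follow the README's examples (16 frames, stride 8, threshold 0.8).
import cv2
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

MODEL_NAME = "openai/clip-vit-base-patch32"
PROMPTS = ["A person falling on the ground.", "A person standing or walking."]

model = CLIPModel.from_pretrained(MODEL_NAME).eval()
processor = CLIPProcessor.from_pretrained(MODEL_NAME)

def extract_frames(video_path, size=(224, 224)):
    """Read frames with OpenCV, resized to 224x224 as the README describes."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if FPS metadata is missing
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.cvtColor(cv2.resize(frame, size), cv2.COLOR_BGR2RGB)
        frames.append(Image.fromarray(frame))
    cap.release()
    return frames, fps

@torch.no_grad()
def fall_probabilities(frames):
    """Per-frame probability that the fall prompt matches better than the non-fall one."""
    # A real implementation would batch long videos instead of one big forward pass.
    inputs = processor(text=PROMPTS, images=frames, return_tensors="pt", padding=True)
    logits = model(**inputs).logits_per_image        # shape: (num_frames, 2)
    return logits.softmax(dim=-1)[:, 0].tolist()     # index 0 = fall prompt

def detect_fall_events(frames, fps, window=16, stride=8, threshold=0.8):
    """Average fall probability over sliding windows; return approximate event times in seconds."""
    if not frames:
        return []
    probs = fall_probabilities(frames)
    events = []
    for start in range(0, max(len(probs) - window + 1, 1), stride):
        chunk = probs[start:start + window]
        if sum(chunk) / len(chunk) >= threshold:
            events.append(start / fps)  # approximate start time of the flagged window
    return events
```

Wiring `detect_fall_events` behind a `gr.Interface` with a `gr.Video` input would give roughly the upload-and-report UI described under "User Interface" above.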
app.py CHANGED
@@ -1,7 +1,46 @@
+ import torch
+ import spaces  # Import early to avoid potential issues
  import gradio as gr
+ from transformers import CLIPProcessor, CLIPModel

- def greet(name):
-     return "Hello " + name + "!!"

- demo = gr.Interface(fn=greet, inputs="text", outputs="text")
- demo.launch()
+ # Load the CLIP model and processor on the CPU initially
+ model_name = "openai/clip-vit-base-patch32"
+ model = CLIPModel.from_pretrained(model_name)
+ processor = CLIPProcessor.from_pretrained(model_name)

+ @spaces.GPU
+ def clip_similarity(image, text):
+     """
+     Computes a similarity score between an input image and text using the CLIP model.
+     This function is decorated with @spaces.GPU so that the model is moved to GPU only when needed.
+     """
+     # Create a torch device for cuda
+     device = torch.device("cuda")
+
+     # Move the model to GPU within the function
+     model.to(device)
+
+     # Preprocess the inputs and move tensors to GPU
+     inputs = processor(text=[text], images=image, return_tensors="pt", padding=True)
+     inputs = {key: val.to(device) for key, val in inputs.items()}
+
+     # Run inference
+     outputs = model(**inputs)
+
+     # Extract similarity score (logits_per_image): higher value indicates better matching
+     similarity_score = outputs.logits_per_image.detach().cpu().numpy()[0]
+     return float(similarity_score)
+
+ # Set up the Gradio interface
+ iface = gr.Interface(
+     fn=clip_similarity,
+     inputs=[
+         gr.Image(type="pil", label="Upload Image"),
+         gr.Text(label="Input Text")
+     ],
+     outputs=gr.Number(label="Similarity Score"),
+     title="CLIP Similarity Demo with ZeroGPU"
+ )
+
+ if __name__ == "__main__":
+     iface.launch()
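
As committed, `clip_similarity` returns the raw `logits_per_image` value, an unbounded logit rather than a 0–1 probability, so the README's 0.8 threshold would not apply to it directly; it also hard-codes `torch.device("cuda")`, so it assumes a GPU (or the ZeroGPU allocation provided by `@spaces.GPU`). A small, hypothetical usage check, not part of the commit, comparing the two README prompts on one frame (`frame.jpg` is a placeholder path):

```python
from PIL import Image

# Hypothetical sanity check using the clip_similarity function defined in app.py above;
# assumes a CUDA device is available because the function hard-codes torch.device("cuda").
frame = Image.open("frame.jpg")  # placeholder path to a single extracted video frame
fall_score = clip_similarity(frame, "A person falling on the ground.")
walk_score = clip_similarity(frame, "A person standing or walking.")
print("fall suspected" if fall_score > walk_score else "no fall", fall_score, walk_score)
```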
requirements.txt ADDED
@@ -0,0 +1,7 @@
+ gradio
+ torch>=2.4.0
+ transformers>=4.20.0
+ opencv-python
+ Pillow
+ accelerate
+ yt_dlp