AtzePengg committed
Commit 6e67586 · 0 Parent(s)

Completely new branch with sample video feature

Files changed (8):
  1. .env.example +2 -0
  2. .gitattributes +35 -0
  3. .gitignore +6 -0
  4. CLAUDE.md +27 -0
  5. README.md +62 -0
  6. app.py +289 -0
  7. packages.txt +1 -0
  8. requirements.txt +7 -0
.env.example ADDED
@@ -0,0 +1,2 @@
+ # OpenAI API Key
+ OPENAI_API_KEY=your_openai_api_key_here
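
For local runs, app.py reads this variable with python-dotenv and falls back to an empty string when it is unset; a minimal sketch of that pattern (mirroring the code in app.py below):

```
import os
from dotenv import load_dotenv

load_dotenv()  # reads a .env file in the working directory, if present
DEFAULT_API_KEY = os.getenv("OPENAI_API_KEY", "")  # "" when the key is not set
```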
.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,6 @@
+ venv/
+ __pycache__/
+ *.pyc
+ .env
+ outputs/
+ .DS_Store
CLAUDE.md ADDED
@@ -0,0 +1,27 @@
+ # CLAUDE.md - Guidelines for Ghibli Project
+
+ ## Commands
+ - Build/Run: `python app.py`
+ - Tests: `pytest tests/`
+ - Single test: `pytest tests/path_to_test.py::test_function_name -v`
+ - Lint: `flake8 . && black . --check`
+ - Type check: `mypy .`
+
+ ## Code Style Guidelines
+ - **Formatting**: Use Black for Python code formatting
+ - **Imports**: Sort imports with isort; standard library first, then third-party, then local
+ - **Types**: Use type hints for all function signatures
+ - **Naming**:
+   - snake_case for variables and functions
+   - PascalCase for classes
+   - UPPER_CASE for constants
+ - **Error Handling**: Use try/except with specific exceptions, avoid bare except
+ - **Documentation**: Use docstrings for all public functions and classes
+ - **Testing**: Write unit tests for all new features
+ - **Commits**: Descriptive commit messages with present tense verbs
+
+ ## Project Structure
+ - `/app.py` - Main Gradio application
+ - `/models/` - ML model implementations
+ - `/utils/` - Utility functions
+ - `/tests/` - Test files
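
Not part of the commit: a short sketch of a helper written to the guidelines above (type hints, docstring, a specific exception instead of a bare except). The function name is illustrative only.

```
# Illustrative only: a small helper following the CLAUDE.md style guidelines
def read_frame_bytes(frame_path: str) -> bytes:
    """Return the raw bytes of a frame image, raising a clear error if it is missing."""
    try:
        with open(frame_path, "rb") as img_file:
            return img_file.read()
    except FileNotFoundError as exc:  # specific exception, never a bare except
        raise FileNotFoundError(f"Frame not found: {frame_path}") from exc
```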
README.md ADDED
@@ -0,0 +1,62 @@
+ ---
+ title: Video-to-Ghibli Style Converter
+ emoji: 🎬
+ colorFrom: indigo
+ colorTo: pink
+ sdk: gradio
+ sdk_version: 5.23.1
+ app_file: app.py
+ pinned: false
+ ---
+
+ # Video-to-Ghibli Style Converter
+
+ A Gradio web application that transforms videos into Studio Ghibli-style animations using OpenAI's GPT-4o and DALL-E 3.
+
+ ## Features
+
+ - Upload short videos for stylization
+ - Apply custom style prompts
+ - Secure API key handling (user provides their own key)
+ - Real-time conversion status updates
+
+ ## How It Works
+
+ 1. Upload a short video (a few seconds is best)
+ 2. Enter your OpenAI API key
+ 3. Customize the style prompt if desired
+ 4. Click "Stylize Video" and wait for processing
+
+ The application:
+ - Extracts frames from the video
+ - Uses GPT-4o to analyze each frame and DALL-E 3 to render it in Ghibli style
+ - Reassembles the stylized frames into a new video
+
+ ## Local Setup
+
+ 1. Create a virtual environment:
+ ```
+ python -m venv venv
+ source venv/bin/activate  # On Windows: venv\Scripts\activate
+ ```
+
+ 2. Install dependencies:
+ ```
+ pip install -r requirements.txt
+ ```
+
+ 3. Make sure ffmpeg is installed on your system:
+ - macOS: `brew install ffmpeg`
+ - Ubuntu: `sudo apt-get install ffmpeg`
+ - Windows: Download from [ffmpeg.org](https://ffmpeg.org/download.html)
+
+ 4. Run the application:
+ ```
+ python app.py
+ ```
+
+ ## Notes
+
+ - You need your own OpenAI API key with access to GPT-4o and DALL-E 3
+ - Processing time depends on video length and frame rate
+ - For best results, use videos that are a few seconds long
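
The extract/reassemble steps described in "How It Works" correspond to two ffmpeg-python calls in app.py below; a condensed sketch of just those steps (the paths are illustrative, and the per-frame OpenAI calls are elided):

```
import ffmpeg

# Extract one frame per second from the input video (as app.py does)
ffmpeg.input("input.mp4").output("frames/%04d.png", vf="fps=1").run(quiet=True)

# ...each extracted frame is stylized in place by the GPT-4o + DALL-E 3 calls...

# Reassemble the stylized frames into an H.264 video at 1 fps
ffmpeg.input("frames/%04d.png", framerate=1) \
    .output("stylized.mp4", vcodec="libx264", pix_fmt="yuv420p", crf=18) \
    .run(quiet=True)
```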
app.py ADDED
@@ -0,0 +1,289 @@
+ import gradio as gr
+ import openai
+ import ffmpeg
+ import os
+ import uuid
+ import base64
+ import requests
+ import tempfile
+ import shutil
+ import re
+ import time
+ import concurrent.futures
+ from pathlib import Path
+ from dotenv import load_dotenv
+ from huggingface_hub import SpaceStage
+ from huggingface_hub.utils import HfHubHTTPError
+
+ # Add GPU decorator for Hugging Face Spaces
+ try:
+     from spaces import GPU
+     use_gpu = True
+     @GPU
+     def get_gpu():
+         return True
+     # Call the function to trigger GPU allocation
+     get_gpu()
+ except ImportError:
+     use_gpu = False
+     print("Running without GPU acceleration")
+
+ # Load environment variables from .env file if it exists
+ load_dotenv()
+
+ # Get default API key from environment (will be '' if not set)
+ DEFAULT_API_KEY = os.getenv("OPENAI_API_KEY", "")
+
+ def download_video_from_url(url):
+     try:
+         # Create a temporary file
+         temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".mp4")
+         temp_filename = temp_file.name
+         temp_file.close()
+
+         # Download the video
+         response = requests.get(url, stream=True)
+         if response.status_code == 200:
+             with open(temp_filename, 'wb') as f:
+                 for chunk in response.iter_content(chunk_size=8192):
+                     f.write(chunk)
+             return temp_filename
+         else:
+             print(f"Failed to download video: {response.status_code}")
+             return None
+     except Exception as e:
+         print(f"Error downloading video: {e}")
+         return None
+
+ def process_frame(frame_path, style_prompt, api_key):
+     """Process a single frame with GPT-4o analysis and DALL-E 3 generation"""
+     try:
+         # Read the image and encode to base64
+         with open(frame_path, "rb") as img_file:
+             img_bytes = img_file.read()
+
+         # First use GPT-4o to analyze the image
+         analysis_messages = [
+             {"role": "system", "content": "You are an expert at analyzing images and describing them for AI image generation. For each image, provide a detailed description focusing on its visual content, composition, and elements that would help generate a Studio Ghibli style version."},
+             {"role": "user", "content": [
+                 {"type": "text", "text": "Analyze this image and provide a detailed description that could be used to recreate it in Studio Ghibli animation style. Focus on the essential visual elements that should be preserved and how they should be adapted to the Ghibli aesthetic."},
+                 {"type": "image_url", "image_url": {
+                     "url": f"data:image/png;base64,{base64.b64encode(img_bytes).decode('utf-8')}"
+                 }}
+             ]}
+         ]
+
+         openai.api_key = api_key
+         analysis_response = openai.chat.completions.create(
+             model="gpt-4o",
+             messages=analysis_messages,
+             max_tokens=800
+         )
+
+         # Get the image description
+         image_description = analysis_response.choices[0].message.content
+         print(f"GPT-4o analysis for frame {os.path.basename(frame_path)}: {image_description[:150]}...")
+
+         # Now use DALL-E 3 to generate a stylized version based on the description
+         dall_e_prompt = f"Create a Studio Ghibli style animation frame that shows: {image_description}. {style_prompt}. Hand-drawn animation style, soft colors, attention to detail, Miyazaki aesthetic."
+
+         # Ensure the prompt isn't too long
+         if len(dall_e_prompt) > 4000:
+             dall_e_prompt = dall_e_prompt[:3997] + "..."
+
+         dalle_response = openai.images.generate(
+             model="dall-e-3",
+             prompt=dall_e_prompt,
+             n=1,
+             size="1024x1024",
+             quality="standard"
+         )
+
+         # Get the generated image URL
+         img_url = dalle_response.data[0].url
+         print(f"Generated DALL-E image for frame {os.path.basename(frame_path)}")
+
+         # Download the image
+         img_response = requests.get(img_url, timeout=30)
+         if img_response.status_code == 200:
+             with open(frame_path, "wb") as out_img:
+                 out_img.write(img_response.content)
+             print(f"Successfully saved stylized frame: {os.path.basename(frame_path)}")
+             return True
+         else:
+             print(f"Failed to download image: HTTP {img_response.status_code}")
+             return False
+
+     except Exception as e:
+         import traceback
+         print(f"Error processing frame {os.path.basename(frame_path)}: {str(e)}")
+         print(traceback.format_exc())
+         return False
+
+ def stylize_video(video_path, style_prompt, api_key):
+     # Use the provided API key, or fall back to the default one
+     actual_api_key = api_key if api_key else DEFAULT_API_KEY
+
+     if not actual_api_key:
+         return None, "Please provide your OpenAI API key"
+
+     try:
+         # Create temp directories
+         temp_dir = tempfile.mkdtemp()
+         input_filename = os.path.join(temp_dir, "input.mp4")
+         frames_dir = os.path.join(temp_dir, "frames")
+         os.makedirs(frames_dir, exist_ok=True)
+
+         # Save the input video to a temporary file
+         if isinstance(video_path, str):
+             if os.path.exists(video_path):
+                 # It's a file path, copy it
+                 shutil.copy(video_path, input_filename)
+             else:
+                 return None, f"Video file not found: {video_path}"
+         else:
+             # Assume it's binary data
+             with open(input_filename, "wb") as f:
+                 f.write(video_path)
+
+         # Make sure the video file exists
+         if not os.path.exists(input_filename):
+             return None, "Failed to save input video"
+
+         # Extract frames - using lower fps for longer videos (1 frame per second)
+         ffmpeg.input(input_filename).output(f"{frames_dir}/%04d.png", vf="fps=1").run(quiet=True)
+
+         # Check if frames were extracted
+         frames = sorted([os.path.join(frames_dir, f) for f in os.listdir(frames_dir) if f.endswith('.png')])
+         if not frames:
+             return None, "No frames were extracted from the video"
+
+         # Limit to a maximum of 15 frames for reasonable processing times (15 seconds at 1 fps)
+         if len(frames) > 15:
+             # Keep the first 15 frames
+             frames = frames[:15]
+
+         print(f"Processing {len(frames)} frames")
+
+         # Process frames in parallel with a small worker pool to avoid rate limits
+         num_workers = 3 if use_gpu else 2  # More workers if GPU is available
+         with concurrent.futures.ThreadPoolExecutor(max_workers=num_workers) as executor:
+             futures = {executor.submit(process_frame, frame, style_prompt, actual_api_key): frame for frame in frames}
+
+             # Collect results
+             processed_frames = []
+             for future in concurrent.futures.as_completed(futures):
+                 frame = futures[future]
+                 if future.result():
+                     processed_frames.append(frame)
+                     print(f"Completed frame {os.path.basename(frame)} ({len(processed_frames)}/{len(frames)})")
+
+         if not processed_frames:
+             return None, "Failed to process any frames. Please make sure your OpenAI API key has access to both GPT-4o and DALL-E 3."
+
+         # Even if not all frames were processed, try to create a video with what we have
+         print(f"Successfully processed {len(processed_frames)}/{len(frames)} frames")
+
+         # Ensure frames are in the correct order (important for video continuity)
+         processed_frames.sort()
+
+         # Reassemble frames into video
+         output_filename = os.path.join(temp_dir, "stylized.mp4")
+
+         # Use a higher bitrate and better codec for higher quality
+         # Note: We're using the original frames directory because the processed frame filenames match
+         ffmpeg.input(f"{frames_dir}/%04d.png", framerate=1) \
+             .output(output_filename, vcodec='libx264', pix_fmt='yuv420p', crf=18) \
+             .run(quiet=True)
+
+         # Check if the output file exists and has content
+         if not os.path.exists(output_filename) or os.path.getsize(output_filename) == 0:
+             return None, "Failed to create output video"
+
+         # Copy to a persistent location for Gradio to serve
+         os.makedirs("outputs", exist_ok=True)
+         persistent_output = os.path.join("outputs", f"stylized_{uuid.uuid4()}.mp4")
+         shutil.copy(output_filename, persistent_output)
+
+         # Return the relative path (Gradio can handle this)
+         print(f"Output video created at: {persistent_output}")
+
+         # Clean up temp files
+         shutil.rmtree(temp_dir)
+
+         return persistent_output, f"Video stylized successfully with {len(processed_frames)} frames!"
+
+     except Exception as e:
+         import traceback
+         traceback_str = traceback.format_exc()
+         print(f"Error: {str(e)}\n{traceback_str}")
+         return None, f"Error: {str(e)}"
+
+ def use_sample_bear_video():
+     """Function to download and use the sample bear video"""
+     # URL to a sample bear video (from a public source)
+     bear_url = "https://storage.googleapis.com/tfjs-models/assets/posenet/camera_cnn_1080p30s.mp4"
+
+     # Download the video to a temporary file
+     temp_video_path = download_video_from_url(bear_url)
+     if temp_video_path:
+         return temp_video_path, "Studio Ghibli animation with Hayao Miyazaki's distinctive hand-drawn art style"
+     else:
+         return None, "Failed to download sample video"
+
+ with gr.Blocks(title="Video-to-Ghibli Style Converter") as iface:
+     gr.Markdown("# Video-to-Ghibli Style Converter")
+     gr.Markdown("Upload a video and convert it to Studio Ghibli animation style using GPT-4o and DALL-E 3.")
+
+     with gr.Row():
+         with gr.Column():
+             video_input = gr.Video(label="Upload Video (up to 15 seconds)")
+             api_key = gr.Textbox(
+                 label="OpenAI API Key (requires GPT-4o and DALL-E 3 access)",
+                 type="password",
+                 placeholder="Enter your OpenAI API key"
+             )
+             style_prompt = gr.Textbox(
+                 label="Style Prompt",
+                 value="Studio Ghibli animation with Hayao Miyazaki's distinctive hand-drawn art style"
+             )
+
+             with gr.Row():
+                 submit_btn = gr.Button("Stylize Video", variant="primary")
+                 example_btn = gr.Button("Use Sample Bear Video")
+
+         with gr.Column():
+             video_output = gr.Video(label="Stylized Video")
+             status_output = gr.Textbox(label="Status")
+
+     submit_btn.click(
+         fn=stylize_video,
+         inputs=[video_input, style_prompt, api_key],
+         outputs=[video_output, status_output]
+     )
+
+     example_btn.click(
+         fn=use_sample_bear_video,
+         inputs=None,
+         outputs=[video_input, style_prompt]
+     )
+
+     gr.Markdown("""
+ ## Instructions
+ 1. Upload a video up to 15 seconds long or use the sample bear video
+ 2. Enter your OpenAI API key with GPT-4o and DALL-E 3 access
+ 3. Customize the style prompt if desired
+ 4. Click "Stylize Video" and wait for processing
+
+ ## Example Style Prompts
+ - "Studio Ghibli animation with Hayao Miyazaki's distinctive hand-drawn art style"
+ - "Studio Ghibli style with magical and dreamy atmosphere"
+ - "Nostalgic Studio Ghibli animation style with watercolor backgrounds and clean linework"
+ - "Ghibli-inspired animation with vibrant colors and fantasy elements"
+
+ Note: Each frame is analyzed by GPT-4o and then transformed by DALL-E 3.
+ Videos are processed at 1 frame per second to keep processing time reasonable.
+     """)
+
+ if __name__ == "__main__":
+     iface.launch()
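
For quick testing outside the Gradio UI, `stylize_video` can also be called directly; a hypothetical snippet where the video path and API key are placeholders:

```
# Hypothetical direct call, bypassing the Gradio interface
from app import stylize_video

output_path, status = stylize_video(
    "my_clip.mp4",  # placeholder: path to a short local video
    "Studio Ghibli animation with Hayao Miyazaki's distinctive hand-drawn art style",
    "sk-...",       # placeholder: your OpenAI API key
)
print(status, output_path)
```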
packages.txt ADDED
@@ -0,0 +1 @@
+ ffmpeg
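
packages.txt installs the ffmpeg system binary on the Hugging Face Space; when running locally you can check it is on PATH before launching the app, for example:

```
import shutil

# Verify the ffmpeg binary is available before starting the app
assert shutil.which("ffmpeg"), "ffmpeg binary not found on PATH"
```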
requirements.txt ADDED
@@ -0,0 +1,7 @@
+ ffmpeg-python>=0.2.0
+ openai>=1.0.0
+ gradio>=5.0.0
+ requests>=2.0.0
+ pillow>=9.0.0
+ python-dotenv>=1.0.0
+ huggingface-hub>=0.20.0