AtzePengg committed
Commit 6e67586 · 0 Parent(s)

Completely new branch with sample video feature

Files changed (8):
  1. .env.example +2 -0
  2. .gitattributes +35 -0
  3. .gitignore +6 -0
  4. CLAUDE.md +27 -0
  5. README.md +62 -0
  6. app.py +289 -0
  7. packages.txt +1 -0
  8. requirements.txt +7 -0
.env.example ADDED
@@ -0,0 +1,2 @@
+ # OpenAI API Key
+ OPENAI_API_KEY=your_openai_api_key_here
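
For local runs, app.py reads this variable with python-dotenv and falls back to an empty string when it is unset; a minimal sketch of that pattern (mirroring the code in app.py below):

```
import os
from dotenv import load_dotenv

load_dotenv()  # reads a .env file in the working directory, if present
DEFAULT_API_KEY = os.getenv("OPENAI_API_KEY", "")  # "" when the key is not set
```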
.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,6 @@
+ venv/
+ __pycache__/
+ *.pyc
+ .env
+ outputs/
+ .DS_Store
CLAUDE.md ADDED
@@ -0,0 +1,27 @@
+ # CLAUDE.md - Guidelines for Ghibli Project
+
+ ## Commands
+ - Build/Run: `python app.py`
+ - Tests: `pytest tests/`
+ - Single test: `pytest tests/path_to_test.py::test_function_name -v`
+ - Lint: `flake8 . && black . --check`
+ - Type check: `mypy .`
+
+ ## Code Style Guidelines
+ - **Formatting**: Use Black for Python code formatting
+ - **Imports**: Sort imports with isort; standard library first, then third-party, then local
+ - **Types**: Use type hints for all function signatures
+ - **Naming**:
+   - snake_case for variables and functions
+   - PascalCase for classes
+   - UPPER_CASE for constants
+ - **Error Handling**: Use try/except with specific exceptions, avoid bare except
+ - **Documentation**: Use docstrings for all public functions and classes
+ - **Testing**: Write unit tests for all new features
+ - **Commits**: Descriptive commit messages with present tense verbs
+
+ ## Project Structure
+ - `/app.py` - Main Gradio application
+ - `/models/` - ML model implementations
+ - `/utils/` - Utility functions
+ - `/tests/` - Test files
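
Not part of the commit: a short sketch of a helper written to the guidelines above (type hints, docstring, a specific exception instead of a bare except). The function name is illustrative only.

```
# Illustrative only: a small helper following the CLAUDE.md style guidelines
def read_frame_bytes(frame_path: str) -> bytes:
    """Return the raw bytes of a frame image, raising a clear error if it is missing."""
    try:
        with open(frame_path, "rb") as img_file:
            return img_file.read()
    except FileNotFoundError as exc:  # specific exception, never a bare except
        raise FileNotFoundError(f"Frame not found: {frame_path}") from exc
```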
README.md ADDED
@@ -0,0 +1,62 @@
+ ---
+ title: Video-to-Ghibli Style Converter
+ emoji: 🎬
+ colorFrom: indigo
+ colorTo: pink
+ sdk: gradio
+ sdk_version: 5.23.1
+ app_file: app.py
+ pinned: false
+ ---
+
+ # Video-to-Ghibli Style Converter
+
+ A Gradio web application that transforms videos into Studio Ghibli-style animations using OpenAI's GPT-4o and DALL-E 3.
+
+ ## Features
+
+ - Upload short videos for stylization
+ - Apply custom style prompts
+ - Secure API key handling (user provides their own key)
+ - Real-time conversion status updates
+
+ ## How It Works
+
+ 1. Upload a short video (a few seconds is best)
+ 2. Enter your OpenAI API key
+ 3. Customize the style prompt if desired
+ 4. Click "Stylize Video" and wait for processing
+
+ The application:
+ - Extracts frames from the video
+ - Uses GPT-4o to analyze each frame and DALL-E 3 to render it in Ghibli style
+ - Reassembles the stylized frames into a new video
+
+ ## Local Setup
+
+ 1. Create a virtual environment:
+ ```
+ python -m venv venv
+ source venv/bin/activate  # On Windows: venv\Scripts\activate
+ ```
+
+ 2. Install dependencies:
+ ```
+ pip install -r requirements.txt
+ ```
+
+ 3. Make sure ffmpeg is installed on your system:
+ - macOS: `brew install ffmpeg`
+ - Ubuntu: `sudo apt-get install ffmpeg`
+ - Windows: Download from [ffmpeg.org](https://ffmpeg.org/download.html)
+
+ 4. Run the application:
+ ```
+ python app.py
+ ```
+
+ ## Notes
+
+ - You need your own OpenAI API key with access to GPT-4o and DALL-E 3
+ - Processing time depends on video length and frame rate
+ - For best results, use videos that are a few seconds long
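
The extract/reassemble steps described in "How It Works" correspond to two ffmpeg-python calls in app.py below; a condensed sketch of just those steps (the paths are illustrative, and the per-frame OpenAI calls are elided):

```
import ffmpeg

# Extract one frame per second from the input video (as app.py does)
ffmpeg.input("input.mp4").output("frames/%04d.png", vf="fps=1").run(quiet=True)

# ...each extracted frame is stylized in place by the GPT-4o + DALL-E 3 calls...

# Reassemble the stylized frames into an H.264 video at 1 fps
ffmpeg.input("frames/%04d.png", framerate=1) \
    .output("stylized.mp4", vcodec="libx264", pix_fmt="yuv420p", crf=18) \
    .run(quiet=True)
```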
app.py ADDED
@@ -0,0 +1,289 @@
+ import gradio as gr
+ import openai
+ import ffmpeg
+ import os
+ import uuid
+ import base64
+ import requests
+ import tempfile
+ import shutil
+ import re
+ import time
+ import concurrent.futures
+ from pathlib import Path
+ from dotenv import load_dotenv
+ from huggingface_hub import SpaceStage
+ from huggingface_hub.utils import HfHubHTTPError
+
+ # Add GPU decorator for Hugging Face Spaces
+ try:
+     from spaces import GPU
+     use_gpu = True
+     @GPU
+     def get_gpu():
+         return True
+     # Call the function to trigger GPU allocation
+     get_gpu()
+ except ImportError:
+     use_gpu = False
+     print("Running without GPU acceleration")
+
+ # Load environment variables from .env file if it exists
+ load_dotenv()
+
+ # Get default API key from environment (will be '' if not set)
+ DEFAULT_API_KEY = os.getenv("OPENAI_API_KEY", "")
+
+ def download_video_from_url(url):
+     try:
+         # Create a temporary file
+         temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".mp4")
+         temp_filename = temp_file.name
+         temp_file.close()
+
+         # Download the video
+         response = requests.get(url, stream=True)
+         if response.status_code == 200:
+             with open(temp_filename, 'wb') as f:
+                 for chunk in response.iter_content(chunk_size=8192):
+                     f.write(chunk)
+             return temp_filename
+         else:
+             print(f"Failed to download video: {response.status_code}")
+             return None
+     except Exception as e:
+         print(f"Error downloading video: {e}")
+         return None
+
+ def process_frame(frame_path, style_prompt, api_key):
+     """Process a single frame with GPT-4o analysis and DALL-E 3 generation"""
+     try:
+         # Read the image and encode to base64
+         with open(frame_path, "rb") as img_file:
+             img_bytes = img_file.read()
+
+         # First use GPT-4o to analyze the image
+         analysis_messages = [
+             {"role": "system", "content": "You are an expert at analyzing images and describing them for AI image generation. For each image, provide a detailed description focusing on its visual content, composition, and elements that would help generate a Studio Ghibli style version."},
+             {"role": "user", "content": [
+                 {"type": "text", "text": "Analyze this image and provide a detailed description that could be used to recreate it in Studio Ghibli animation style. Focus on the essential visual elements that should be preserved and how they should be adapted to the Ghibli aesthetic."},
+                 {"type": "image_url", "image_url": {
+                     "url": f"data:image/png;base64,{base64.b64encode(img_bytes).decode('utf-8')}"
+                 }}
+             ]}
+         ]
+
+         openai.api_key = api_key
+         analysis_response = openai.chat.completions.create(
+             model="gpt-4o",
+             messages=analysis_messages,
+             max_tokens=800
+         )
+
+         # Get the image description
+         image_description = analysis_response.choices[0].message.content
+         print(f"GPT-4o analysis for frame {os.path.basename(frame_path)}: {image_description[:150]}...")
+
+         # Now use DALL-E 3 to generate a stylized version based on the description
+         dall_e_prompt = f"Create a Studio Ghibli style animation frame that shows: {image_description}. {style_prompt}. Hand-drawn animation style, soft colors, attention to detail, Miyazaki aesthetic."
+
+         # Ensure the prompt isn't too long
+         if len(dall_e_prompt) > 4000:
+             dall_e_prompt = dall_e_prompt[:3997] + "..."
+
+         dalle_response = openai.images.generate(
+             model="dall-e-3",
+             prompt=dall_e_prompt,
+             n=1,
+             size="1024x1024",
+             quality="standard"
+         )
+
+         # Get the generated image URL
+         img_url = dalle_response.data[0].url
+         print(f"Generated DALL-E image for frame {os.path.basename(frame_path)}")
+
+         # Download the image
+         img_response = requests.get(img_url, timeout=30)
+         if img_response.status_code == 200:
+             with open(frame_path, "wb") as out_img:
+                 out_img.write(img_response.content)
+             print(f"Successfully saved stylized frame: {os.path.basename(frame_path)}")
+             return True
+         else:
+             print(f"Failed to download image: HTTP {img_response.status_code}")
+             return False
+
+     except Exception as e:
+         import traceback
+         print(f"Error processing frame {os.path.basename(frame_path)}: {str(e)}")
+         print(traceback.format_exc())
+         return False
+
+ def stylize_video(video_path, style_prompt, api_key):
+     # Use the provided API key, or fall back to the default one
+     actual_api_key = api_key if api_key else DEFAULT_API_KEY
+
+     if not actual_api_key:
+         return None, "Please provide your OpenAI API key"
+
+     try:
+         # Create temp directories
+         temp_dir = tempfile.mkdtemp()
+         input_filename = os.path.join(temp_dir, "input.mp4")
+         frames_dir = os.path.join(temp_dir, "frames")
+         os.makedirs(frames_dir, exist_ok=True)
+
+         # Save the input video to a temporary file
+         if isinstance(video_path, str):
+             if os.path.exists(video_path):
+                 # It's a file path, copy it
+                 shutil.copy(video_path, input_filename)
+             else:
+                 return None, f"Video file not found: {video_path}"
+         else:
+             # Assume it's binary data
+             with open(input_filename, "wb") as f:
+                 f.write(video_path)
+
+         # Make sure the video file exists
+         if not os.path.exists(input_filename):
+             return None, "Failed to save input video"
+
+         # Extract frames - using lower fps for longer videos (1 frame per second)
+         ffmpeg.input(input_filename).output(f"{frames_dir}/%04d.png", vf="fps=1").run(quiet=True)
+
+         # Check if frames were extracted
+         frames = sorted([os.path.join(frames_dir, f) for f in os.listdir(frames_dir) if f.endswith('.png')])
+         if not frames:
+             return None, "No frames were extracted from the video"
+
+         # Limit to a maximum of 15 frames for reasonable processing times (15 seconds at 1 fps)
+         if len(frames) > 15:
+             # Keep the first 15 frames
+             frames = frames[:15]
+
+         print(f"Processing {len(frames)} frames")
+
+         # Process frames in parallel with a small worker pool to avoid rate limits
+         num_workers = 3 if use_gpu else 2  # More workers if GPU is available
+         with concurrent.futures.ThreadPoolExecutor(max_workers=num_workers) as executor:
+             futures = {executor.submit(process_frame, frame, style_prompt, actual_api_key): frame for frame in frames}
+
+             # Collect results
+             processed_frames = []
+             for future in concurrent.futures.as_completed(futures):
+                 frame = futures[future]
+                 if future.result():
+                     processed_frames.append(frame)
+                     print(f"Completed frame {os.path.basename(frame)} ({len(processed_frames)}/{len(frames)})")
+
+         if not processed_frames:
+             return None, "Failed to process any frames. Please make sure your OpenAI API key has access to both GPT-4o and DALL-E 3."
+
+         # Even if not all frames were processed, try to create a video with what we have
+         print(f"Successfully processed {len(processed_frames)}/{len(frames)} frames")
+
+         # Ensure frames are in the correct order (important for video continuity)
+         processed_frames.sort()
+
+         # Reassemble frames into video
+         output_filename = os.path.join(temp_dir, "stylized.mp4")
+
+         # Use a higher bitrate and better codec for higher quality
+         # Note: We're using the original frames directory because the processed frame filenames match
+         ffmpeg.input(f"{frames_dir}/%04d.png", framerate=1) \
+             .output(output_filename, vcodec='libx264', pix_fmt='yuv420p', crf=18) \
+             .run(quiet=True)
+
+         # Check if the output file exists and has content
+         if not os.path.exists(output_filename) or os.path.getsize(output_filename) == 0:
+             return None, "Failed to create output video"
+
+         # Copy to a persistent location for Gradio to serve
+         os.makedirs("outputs", exist_ok=True)
+         persistent_output = os.path.join("outputs", f"stylized_{uuid.uuid4()}.mp4")
+         shutil.copy(output_filename, persistent_output)
+
+         # Return the relative path (Gradio can handle this)
+         print(f"Output video created at: {persistent_output}")
+
+         # Clean up temp files
+         shutil.rmtree(temp_dir)
+
+         return persistent_output, f"Video stylized successfully with {len(processed_frames)} frames!"
+
+     except Exception as e:
+         import traceback
+         traceback_str = traceback.format_exc()
+         print(f"Error: {str(e)}\n{traceback_str}")
+         return None, f"Error: {str(e)}"
+
+ def use_sample_bear_video():
+     """Function to download and use the sample bear video"""
+     # URL to a sample bear video (from a public source)
+     bear_url = "https://storage.googleapis.com/tfjs-models/assets/posenet/camera_cnn_1080p30s.mp4"
+
+     # Download the video to a temporary file
+     temp_video_path = download_video_from_url(bear_url)
+     if temp_video_path:
+         return temp_video_path, "Studio Ghibli animation with Hayao Miyazaki's distinctive hand-drawn art style"
+     else:
+         return None, "Failed to download sample video"
+
+ with gr.Blocks(title="Video-to-Ghibli Style Converter") as iface:
+     gr.Markdown("# Video-to-Ghibli Style Converter")
+     gr.Markdown("Upload a video and convert it to Studio Ghibli animation style using GPT-4o and DALL-E 3.")
+
+     with gr.Row():
+         with gr.Column():
+             video_input = gr.Video(label="Upload Video (up to 15 seconds)")
+             api_key = gr.Textbox(
+                 label="OpenAI API Key (requires GPT-4o and DALL-E 3 access)",
+                 type="password",
+                 placeholder="Enter your OpenAI API key"
+             )
+             style_prompt = gr.Textbox(
+                 label="Style Prompt",
+                 value="Studio Ghibli animation with Hayao Miyazaki's distinctive hand-drawn art style"
+             )
+
+             with gr.Row():
+                 submit_btn = gr.Button("Stylize Video", variant="primary")
+                 example_btn = gr.Button("Use Sample Bear Video")
+
+         with gr.Column():
+             video_output = gr.Video(label="Stylized Video")
+             status_output = gr.Textbox(label="Status")
+
+     submit_btn.click(
+         fn=stylize_video,
+         inputs=[video_input, style_prompt, api_key],
+         outputs=[video_output, status_output]
+     )
+
+     example_btn.click(
+         fn=use_sample_bear_video,
+         inputs=None,
+         outputs=[video_input, style_prompt]
+     )
+
+     gr.Markdown("""
+ ## Instructions
+ 1. Upload a video up to 15 seconds long or use the sample bear video
+ 2. Enter your OpenAI API key with GPT-4o and DALL-E 3 access
+ 3. Customize the style prompt if desired
+ 4. Click "Stylize Video" and wait for processing
+
+ ## Example Style Prompts
+ - "Studio Ghibli animation with Hayao Miyazaki's distinctive hand-drawn art style"
+ - "Studio Ghibli style with magical and dreamy atmosphere"
+ - "Nostalgic Studio Ghibli animation style with watercolor backgrounds and clean linework"
+ - "Ghibli-inspired animation with vibrant colors and fantasy elements"
+
+ Note: Each frame is analyzed by GPT-4o and then transformed by DALL-E 3.
+ Videos are processed at 1 frame per second to keep processing time reasonable.
+     """)
+
+ if __name__ == "__main__":
+     iface.launch()
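
For quick testing outside the Gradio UI, `stylize_video` can also be called directly; a hypothetical snippet where the video path and API key are placeholders:

```
# Hypothetical direct call, bypassing the Gradio interface
from app import stylize_video

output_path, status = stylize_video(
    "my_clip.mp4",  # placeholder: path to a short local video
    "Studio Ghibli animation with Hayao Miyazaki's distinctive hand-drawn art style",
    "sk-...",       # placeholder: your OpenAI API key
)
print(status, output_path)
```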
packages.txt ADDED
@@ -0,0 +1 @@
+ ffmpeg
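
packages.txt installs the ffmpeg system binary on the Hugging Face Space; when running locally you can check it is on PATH before launching the app, for example:

```
import shutil

# Verify the ffmpeg binary is available before starting the app
assert shutil.which("ffmpeg"), "ffmpeg binary not found on PATH"
```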
requirements.txt ADDED
@@ -0,0 +1,7 @@
+ ffmpeg-python>=0.2.0
+ openai>=1.0.0
+ gradio>=5.0.0
+ requests>=2.0.0
+ pillow>=9.0.0
+ python-dotenv>=1.0.0
+ huggingface-hub>=0.20.0