root committed on
Commit 9e21eef · 1 Parent(s): a51853d


Files changed (7)
  1. DEPLOYMENT.md +42 -0
  2. README.md +42 -8
  3. app.py +383 -0
  4. emotionanalysis.py +471 -0
  5. example.py +49 -0
  6. requirements.txt +14 -0
  7. utils.py +105 -0
DEPLOYMENT.md ADDED
@@ -0,0 +1,42 @@
1
+ # Deploying to Hugging Face Spaces
2
+
3
+ This guide explains how to deploy the Music Genre Classifier & Lyrics Generator to Hugging Face Spaces.
4
+
5
+ ## Prerequisites
6
+
7
+ 1. A Hugging Face account
8
+ 2. Access to the Llama 3.1 8B Instruct model (requires acceptance of the model license)
9
+ 3. A Hugging Face API token
10
+
11
+ ## Deployment Steps
12
+
13
+ ### 1. Create a New Space
14
+
15
+ 1. Go to the Hugging Face website and log in
16
+ 2. Navigate to "Spaces" in the top navigation
17
+ 3. Click "Create new Space"
18
+ 4. Choose "Gradio" as the SDK
19
+ 5. Give your Space a name and description
20
+ 6. Select "T4 GPU" as the hardware
21
+
22
+ ### 2. Set up Environment Variables
23
+
24
+ Set up your Hugging Face access token as a Space secret; it is exposed to the app as an environment variable and read at startup (see the sketch below):
25
+
26
+ 1. Go to your profile settings in Hugging Face
27
+ 2. Navigate to "Access Tokens" and create a new token with "write" access
28
+ 3. In your Space settings, under "Repository secrets", add a new secret:
29
+ - Name: `HF_TOKEN`
30
+ - Value: Your Hugging Face access token
31
+
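+ `app.py` logs in to the Hub with this secret at startup, which is required to download the gated Llama model. A minimal sketch of that logic, mirroring the check in `app.py`:
+
+ ```python
+ import os
+ from huggingface_hub import login
+
+ # Log in only when the HF_TOKEN secret is present
+ if "HF_TOKEN" in os.environ:
+     login(token=os.environ["HF_TOKEN"])
+ ```
+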
32
+ ### 3. Upload the Files
33
+
34
+ Upload all the files from this repository to your Space.
35
+
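+ You can drag-and-drop the files in the Space's "Files" tab, or push them programmatically. Below is a hedged sketch using the `huggingface_hub` Python API; `your-username/your-space` is a placeholder for your own Space ID:
+
+ ```python
+ from huggingface_hub import HfApi
+
+ api = HfApi()  # picks up HF_TOKEN from the environment if set
+ api.upload_folder(
+     folder_path=".",                     # local copy of this repository
+     repo_id="your-username/your-space",  # placeholder Space ID
+     repo_type="space",
+ )
+ ```
+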
36
+ ### 4. Wait for Deployment
37
+
38
+ Hugging Face will automatically build and deploy your Space. This may take a few minutes, especially since it needs to download the models.
39
+
40
+ ### 5. Access Your Application
41
+
42
+ Once deployed, you can access the application at your Hugging Face Space URL.
README.md CHANGED
@@ -1,13 +1,47 @@
1
  ---
2
- title: Syllables Matching Experiment
3
- emoji: 🏢
4
- colorFrom: gray
5
- colorTo: yellow
6
- sdk: streamlit
7
- sdk_version: 1.45.0
8
  app_file: app.py
9
  pinned: false
10
- short_description: this project is for trying syllables matching techniques
 
11
  ---
12
 
13
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
  ---
2
+ title: Music Genre Classifier & Lyrics Generator
3
+ emoji: 🎵
4
+ colorFrom: indigo
5
+ colorTo: purple
6
+ sdk: gradio
7
+ sdk_version: 5.22.0
8
  app_file: app.py
9
  pinned: false
10
+ license: mit
11
+ short_description: AI music genre detection and lyrics generation
12
  ---
13
 
14
+ # Music Genre Classifier & Lyrics Generator
15
+
16
+ This Hugging Face Space application provides two AI-powered features:
17
+
18
+ 1. **Music Genre Classification**: Upload a music file and get an analysis of its genre using the [dima806/music_genres_classification](https://huggingface.co/dima806/music_genres_classification) model.
19
+
20
+ 2. **Lyrics Generation**: Based on the detected genre, the app generates original lyrics using [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) that match both the style of the genre and the approximate length of the song.
21
+
22
+ ## Features
23
+
24
+ - Upload any music file for instant genre classification
25
+ - Receive genre predictions with confidence scores
26
+ - Get AI-generated lyrics tailored to the detected music genre
27
+ - Lyrics length is automatically adjusted based on the song duration
28
+ - Simple and intuitive user interface
29
+
30
+ ## Usage
31
+
32
+ 1. Visit the live application on Hugging Face Spaces
33
+ 2. Upload your music file using the provided interface
34
+ 3. Click "Analyze & Generate" to process the audio
35
+ 4. View the detected genre and generated lyrics in the output panels
36
+
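+ The same pipeline can also be driven from Python without the web UI, mirroring the bundled `example.py` (the audio path below is a placeholder):
+
+ ```python
+ from app import process_audio, music_analyzer
+
+ genre_results, lyrics = process_audio("path/to/song.mp3")  # placeholder path
+ summary = music_analyzer.analyze_music("path/to/song.mp3")["summary"]
+ print(genre_results)
+ print(f"{summary['key']} {summary['mode']}, {summary['tempo']:.1f} BPM")
+ print(lyrics)
+ ```
+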
37
+ ## Technical Details
38
+
39
+ - Uses MFCC feature extraction from audio for genre classification
40
+ - Leverages 4-bit quantization for efficient LLM inference on a T4 GPU (see the sketch below)
41
+ - Implements a specialized prompt engineering approach to generate genre-specific lyrics
42
+ - Automatically scales lyrics length based on audio duration
43
+
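+ For reference, the 4-bit setup mentioned above is roughly how `app.py` loads the LLM (NF4 quantization with float16 compute via bitsandbytes):
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+
+ # Quantize the 8B model to 4-bit NF4 so it fits on a single T4 GPU
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch.float16,
+ )
+
+ tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
+ model = AutoModelForCausalLM.from_pretrained(
+     "meta-llama/Llama-3.1-8B-Instruct",
+     device_map="auto",
+     quantization_config=bnb_config,
+ )
+ ```
+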
44
+ ## Links
45
+
46
+ - [Music Genre Classification Model](https://huggingface.co/dima806/music_genres_classification)
47
+ - [Llama 3.1 8B Instruct Model](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
app.py ADDED
@@ -0,0 +1,383 @@
1
+ import os
2
+ import io
3
+ import gradio as gr
4
+ import torch
5
+ import numpy as np
6
+ from transformers import (
7
+ AutoModelForAudioClassification,
8
+ AutoFeatureExtractor,
9
+ AutoTokenizer,
10
+ pipeline,
11
+ AutoModelForCausalLM,
12
+ BitsAndBytesConfig
13
+ )
14
+ from huggingface_hub import login
15
+ from utils import (
16
+ load_audio,
17
+ extract_audio_duration,
18
+ extract_mfcc_features,
19
+ calculate_lyrics_length,
20
+ format_genre_results,
21
+ ensure_cuda_availability,
22
+ preprocess_audio_for_model
23
+ )
24
+ from emotionanalysis import MusicAnalyzer
25
+
26
+ # Login to Hugging Face Hub if token is provided
27
+ if "HF_TOKEN" in os.environ:
28
+ login(token=os.environ["HF_TOKEN"])
29
+
30
+ # Constants
31
+ GENRE_MODEL_NAME = "dima806/music_genres_classification"
32
+ MUSIC_DETECTION_MODEL = "MIT/ast-finetuned-audioset-10-10-0.4593"
33
+ LLM_MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"
34
+ SAMPLE_RATE = 22050 # Standard sample rate for audio processing
35
+
36
+ # Check CUDA availability (for informational purposes)
37
+ CUDA_AVAILABLE = ensure_cuda_availability()
38
+
39
+ # Create music detection pipeline
40
+ print(f"Loading music detection model: {MUSIC_DETECTION_MODEL}")
41
+ try:
42
+ music_detector = pipeline(
43
+ "audio-classification",
44
+ model=MUSIC_DETECTION_MODEL,
45
+ device=0 if CUDA_AVAILABLE else -1
46
+ )
47
+ print("Successfully loaded music detection pipeline")
48
+ except Exception as e:
49
+ print(f"Error creating music detection pipeline: {str(e)}")
50
+ # Fallback to manual loading
51
+ try:
52
+ music_processor = AutoFeatureExtractor.from_pretrained(MUSIC_DETECTION_MODEL)
53
+ music_model = AutoModelForAudioClassification.from_pretrained(MUSIC_DETECTION_MODEL)
54
+ print("Successfully loaded music detection model and feature extractor")
55
+ except Exception as e2:
56
+ print(f"Error loading music detection model components: {str(e2)}")
57
+ raise RuntimeError(f"Could not load music detection model: {str(e2)}")
58
+
59
+ # Create genre classification pipeline
60
+ print(f"Loading audio classification model: {GENRE_MODEL_NAME}")
61
+ try:
62
+ genre_classifier = pipeline(
63
+ "audio-classification",
64
+ model=GENRE_MODEL_NAME,
65
+ device=0 if CUDA_AVAILABLE else -1
66
+ )
67
+ print("Successfully loaded audio classification pipeline")
68
+ except Exception as e:
69
+ print(f"Error creating pipeline: {str(e)}")
70
+ # Fallback to manual loading
71
+ try:
72
+ genre_processor = AutoFeatureExtractor.from_pretrained(GENRE_MODEL_NAME)
73
+ genre_model = AutoModelForAudioClassification.from_pretrained(GENRE_MODEL_NAME)
74
+ print("Successfully loaded audio classification model and feature extractor")
75
+ except Exception as e2:
76
+ print(f"Error loading model components: {str(e2)}")
77
+ raise RuntimeError(f"Could not load genre classification model: {str(e2)}")
78
+
79
+ # Load LLM with appropriate quantization for T4 GPU
80
+ bnb_config = BitsAndBytesConfig(
81
+ load_in_4bit=True,
82
+ bnb_4bit_quant_type="nf4",
83
+ bnb_4bit_compute_dtype=torch.float16,
84
+ )
85
+
86
+ llm_tokenizer = AutoTokenizer.from_pretrained(LLM_MODEL_NAME)
87
+ llm_model = AutoModelForCausalLM.from_pretrained(
88
+ LLM_MODEL_NAME,
89
+ device_map="auto",
90
+ quantization_config=bnb_config,
91
+ torch_dtype=torch.float16,
92
+ )
93
+
94
+ # Create LLM pipeline
95
+ llm_pipeline = pipeline(
96
+ "text-generation",
97
+ model=llm_model,
98
+ tokenizer=llm_tokenizer,
99
+ max_new_tokens=512,
100
+ )
101
+
102
+ # Initialize music emotion analyzer
103
+ music_analyzer = MusicAnalyzer()
104
+
105
+ def extract_audio_features(audio_file):
106
+ """Extract audio features from an audio file."""
107
+ # Load the audio file using utility function
108
+ y, sr = load_audio(audio_file, SAMPLE_RATE)
109
+
110
+ # Get audio duration in seconds
111
+ duration = extract_audio_duration(y, sr)
112
+
113
+ # Extract MFCCs for genre classification (may not be needed with the pipeline)
114
+ mfccs_mean = extract_mfcc_features(y, sr, n_mfcc=20)
115
+
116
+ return {
117
+ "features": mfccs_mean,
118
+ "duration": duration,
119
+ "waveform": y,
120
+ "sample_rate": sr,
121
+ "path": audio_file # Keep path for the pipeline
122
+ }
123
+
124
+ def classify_genre(audio_data):
125
+ """Classify the genre of the audio using the loaded model."""
126
+ try:
127
+ # First attempt: Try using the pipeline if available
128
+ if 'genre_classifier' in globals():
129
+ results = genre_classifier(audio_data["path"])
130
+ # Transform pipeline results to our expected format
131
+ top_genres = [(result["label"], result["score"]) for result in results[:3]]
132
+ return top_genres
133
+
134
+ # Second attempt: Use manually loaded model components
135
+ elif 'genre_processor' in globals() and 'genre_model' in globals():
136
+ # Process audio input with feature extractor
137
+ inputs = genre_processor(
138
+ audio_data["waveform"],
139
+ sampling_rate=audio_data["sample_rate"],
140
+ return_tensors="pt"
141
+ )
142
+
143
+ with torch.no_grad():
144
+ outputs = genre_model(**inputs)
145
+ predictions = outputs.logits.softmax(dim=-1)
146
+
147
+ # Get the top 3 genres
148
+ values, indices = torch.topk(predictions, 3)
149
+
150
+ # Map indices to genre labels
151
+ genre_labels = genre_model.config.id2label
152
+
153
+ top_genres = []
154
+ for i, (value, index) in enumerate(zip(values[0], indices[0])):
155
+ genre = genre_labels[index.item()]
156
+ confidence = value.item()
157
+ top_genres.append((genre, confidence))
158
+
159
+ return top_genres
160
+
161
+ else:
162
+ raise ValueError("No genre classification model available")
163
+
164
+ except Exception as e:
165
+ print(f"Error in genre classification: {str(e)}")
166
+ # Fallback: return a default genre if everything fails
167
+ return [("rock", 1.0)]
168
+
169
+ def generate_lyrics(genre, duration, emotion_results):
170
+ """Generate lyrics based on the genre and with appropriate length."""
171
+ # Calculate appropriate lyrics length based on audio duration
172
+ lines_count = calculate_lyrics_length(duration)
173
+
174
+ # Calculate approximate number of verses and chorus
175
+ if lines_count <= 6:
176
+ # Very short song - one verse and chorus
177
+ verse_lines = 2
178
+ chorus_lines = 2
179
+ elif lines_count <= 10:
180
+ # Medium song - two verses and chorus
181
+ verse_lines = 3
182
+ chorus_lines = 2
183
+ else:
184
+ # Longer song - two verses, chorus, and bridge
185
+ verse_lines = 3
186
+ chorus_lines = 2
187
+
188
+ # Extract emotion and theme data from analysis results
189
+ primary_emotion = emotion_results["emotion_analysis"]["primary_emotion"]
190
+ primary_theme = emotion_results["theme_analysis"]["primary_theme"]
191
+ tempo = emotion_results["rhythm_analysis"]["tempo"]
192
+ key = emotion_results["tonal_analysis"]["key"]
193
+ mode = emotion_results["tonal_analysis"]["mode"]
194
+
195
+ # Create prompt for the LLM
196
+ prompt = f"""
197
+ You are a talented songwriter who specializes in {genre} music.
198
+ Write original {genre} song lyrics for a song that is {duration:.1f} seconds long.
199
+
200
+ Music analysis has detected the following qualities in the music:
201
+ - Tempo: {tempo:.1f} BPM
202
+ - Key: {key} {mode}
203
+ - Primary emotion: {primary_emotion}
204
+ - Primary theme: {primary_theme}
205
+
206
+ The lyrics should:
207
+ - Perfectly capture the essence and style of {genre} music
208
+ - Express the {primary_emotion} emotion and {primary_theme} theme
209
+ - Be approximately {lines_count} lines long
210
+ - Have a coherent theme and flow
211
+ - Follow this structure:
212
+ * Verse: {verse_lines} lines
213
+ * Chorus: {chorus_lines} lines
214
+ {'* Bridge: 2 lines' if lines_count > 10 else ''}
215
+ - Be completely original
216
+ - Match the song duration of {duration:.1f} seconds
217
+ - Keep each line concise and impactful
218
+
219
+ Your lyrics:
220
+ """
221
+
222
+ # Generate lyrics using the LLM
223
+ response = llm_pipeline(
224
+ prompt,
225
+ do_sample=True,
226
+ temperature=0.7,
227
+ top_p=0.9,
228
+ repetition_penalty=1.1,
229
+ return_full_text=False
230
+ )
231
+
232
+ # Extract and clean generated lyrics
233
+ lyrics = response[0]["generated_text"].strip()
234
+
235
+ # Add section labels if they're not present
236
+ if "Verse" not in lyrics and "Chorus" not in lyrics:
237
+ lines = lyrics.split('\n')
238
+ formatted_lyrics = []
239
+ current_section = "Verse"
240
+ for i, line in enumerate(lines):
241
+ if i == 0:
242
+ formatted_lyrics.append("[Verse]")
243
+ elif i == verse_lines:
244
+ formatted_lyrics.append("\n[Chorus]")
245
+ elif i == verse_lines + chorus_lines and lines_count > 10:
246
+ formatted_lyrics.append("\n[Bridge]")
247
+ formatted_lyrics.append(line)
248
+ lyrics = '\n'.join(formatted_lyrics)
249
+
250
+ return lyrics
251
+
252
+ def detect_music(audio_data):
253
+ """Detect if the audio is music using the MIT AST model."""
254
+ try:
255
+ # First attempt: Try using the pipeline if available
256
+ if 'music_detector' in globals():
257
+ results = music_detector(audio_data["path"])
258
+ # Look for music-related classes in the results
259
+ music_confidence = 0.0
260
+ for result in results:
261
+ label = result["label"].lower()
262
+ if any(music_term in label for music_term in ["music", "song", "singing", "instrument"]):
263
+ music_confidence = max(music_confidence, result["score"])
264
+ return music_confidence >= 0.5
265
+
266
+ # Second attempt: Use manually loaded model components
267
+ elif 'music_processor' in globals() and 'music_model' in globals():
268
+ # Process audio input with feature extractor
269
+ inputs = music_processor(
270
+ audio_data["waveform"],
271
+ sampling_rate=audio_data["sample_rate"],
272
+ return_tensors="pt"
273
+ )
274
+
275
+ with torch.no_grad():
276
+ outputs = music_model(**inputs)
277
+ predictions = outputs.logits.softmax(dim=-1)
278
+
279
+ # Get the top predictions
280
+ values, indices = torch.topk(predictions, 5)
281
+
282
+ # Map indices to labels
283
+ labels = music_model.config.id2label
284
+
285
+ # Check for music-related classes
286
+ music_confidence = 0.0
287
+ for i, (value, index) in enumerate(zip(values[0], indices[0])):
288
+ label = labels[index.item()].lower()
289
+ if any(music_term in label for music_term in ["music", "song", "singing", "instrument"]):
290
+ music_confidence = max(music_confidence, value.item())
291
+
292
+ return music_confidence >= 0.5
293
+
294
+ else:
295
+ raise ValueError("No music detection model available")
296
+
297
+ except Exception as e:
298
+ print(f"Error in music detection: {str(e)}")
299
+ return False
300
+
301
+ def process_audio(audio_file):
302
+ """Main function to process audio file, classify genre, and generate lyrics."""
303
+ if audio_file is None:
304
+ return "Please upload an audio file.", None
305
+
306
+ try:
307
+ # Extract audio features
308
+ audio_data = extract_audio_features(audio_file)
309
+
310
+ # First check if it's music
311
+ is_music = detect_music(audio_data)
312
+ if not is_music:
313
+ return "The uploaded audio does not appear to be music. Please upload a music file.", None
314
+
315
+ # Classify genre
316
+ top_genres = classify_genre(audio_data)
317
+
318
+ # Format genre results using utility function
319
+ genre_results = format_genre_results(top_genres)
320
+
321
+ # Analyze music emotions and themes
322
+ emotion_results = music_analyzer.analyze_music(audio_file)
323
+
324
+ # Generate lyrics based on top genre and emotion analysis
325
+ primary_genre, _ = top_genres[0]
326
+ lyrics = generate_lyrics(primary_genre, audio_data["duration"], emotion_results)
327
+
328
+ return genre_results, lyrics
329
+
330
+ except Exception as e:
331
+ return f"Error processing audio: {str(e)}", None
332
+
333
+ # Create Gradio interface
334
+ with gr.Blocks(title="Music Genre Classifier & Lyrics Generator") as demo:
335
+ gr.Markdown("# Music Genre Classifier & Lyrics Generator")
336
+ gr.Markdown("Upload a music file to classify its genre, analyze its emotions, and generate matching lyrics.")
337
+
338
+ with gr.Row():
339
+ with gr.Column():
340
+ audio_input = gr.Audio(label="Upload Music", type="filepath")
341
+ submit_btn = gr.Button("Analyze & Generate")
342
+
343
+ with gr.Column():
344
+ genre_output = gr.Textbox(label="Detected Genres", lines=5)
345
+ emotion_output = gr.Textbox(label="Emotion Analysis", lines=5)
346
+ lyrics_output = gr.Textbox(label="Generated Lyrics", lines=15)
347
+
348
+ def display_results(audio_file):
349
+ if audio_file is None:
350
+ return "Please upload an audio file.", "No emotion analysis available.", None
351
+
352
+ try:
353
+ # Process audio and get genre and lyrics
354
+ genre_results, lyrics = process_audio(audio_file)
355
+
356
+ # Format emotion analysis results
357
+ emotion_results = music_analyzer.analyze_music(audio_file)
358
+ emotion_text = f"Tempo: {emotion_results['summary']['tempo']:.1f} BPM\n"
359
+ emotion_text += f"Key: {emotion_results['summary']['key']} {emotion_results['summary']['mode']}\n"
360
+ emotion_text += f"Primary Emotion: {emotion_results['summary']['primary_emotion']}\n"
361
+ emotion_text += f"Primary Theme: {emotion_results['summary']['primary_theme']}"
362
+
363
+ return genre_results, emotion_text, lyrics
364
+ except Exception as e:
365
+ return f"Error: {str(e)}", "Error in emotion analysis", None
366
+
367
+ submit_btn.click(
368
+ fn=display_results,
369
+ inputs=[audio_input],
370
+ outputs=[genre_output, emotion_output, lyrics_output]
371
+ )
372
+
373
+ gr.Markdown("### How it works")
374
+ gr.Markdown("""
375
+ 1. Upload an audio file of your choice
376
+ 2. The system will classify the genre using the dima806/music_genres_classification model
377
+ 3. The system will analyze the musical emotion and theme using advanced audio processing
378
+ 4. Based on the detected genre and emotion, it will generate appropriate lyrics using Llama-3.1-8B-Instruct
379
+ 5. The lyrics length is automatically adjusted based on your audio duration
380
+ """)
381
+
382
+ # Launch the app
383
+ demo.launch()
emotionanalysis.py ADDED
@@ -0,0 +1,471 @@
1
+ import librosa
2
+ import numpy as np
3
+ try:
4
+ import matplotlib.pyplot as plt
5
+ except ImportError:
6
+ plt = None
7
+ from scipy.stats import mode
8
+ import warnings
9
+ warnings.filterwarnings('ignore') # Suppress librosa warnings
10
+ class MusicAnalyzer:
11
+ def __init__(self):
12
+ # Emotion feature mappings - these define characteristics of different emotions
13
+ self.emotion_profiles = {
14
+ 'happy': {'tempo': (100, 180), 'energy': (0.6, 1.0), 'major_mode': True, 'brightness': (0.6, 1.0)},
15
+ 'sad': {'tempo': (40, 90), 'energy': (0, 0.5), 'major_mode': False, 'brightness': (0, 0.5)},
16
+ 'calm': {'tempo': (50, 90), 'energy': (0, 0.4), 'major_mode': True, 'brightness': (0.3, 0.6)},
17
+ 'energetic': {'tempo': (110, 200), 'energy': (0.7, 1.0), 'major_mode': True, 'brightness': (0.5, 0.9)},
18
+ 'tense': {'tempo': (70, 140), 'energy': (0.5, 0.9), 'major_mode': False, 'brightness': (0.3, 0.7)},
19
+ 'nostalgic': {'tempo': (60, 100), 'energy': (0.3, 0.7), 'major_mode': None, 'brightness': (0.4, 0.7)}
20
+ }
21
+
22
+ # Theme mappings based on musical features
23
+ self.theme_profiles = {
24
+ 'love': {'emotion': ['happy', 'nostalgic', 'sad'], 'harmony_complexity': (0.3, 0.7)},
25
+ 'triumph': {'emotion': ['energetic', 'happy'], 'harmony_complexity': (0.4, 0.8)},
26
+ 'loss': {'emotion': ['sad', 'nostalgic'], 'harmony_complexity': (0.3, 0.7)},
27
+ 'adventure': {'emotion': ['energetic', 'tense'], 'harmony_complexity': (0.5, 0.9)},
28
+ 'reflection': {'emotion': ['calm', 'nostalgic'], 'harmony_complexity': (0.4, 0.8)},
29
+ 'conflict': {'emotion': ['tense', 'energetic'], 'harmony_complexity': (0.6, 1.0)}
30
+ }
31
+
32
+ # Musical key mapping
33
+ self.key_names = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
34
+
35
+ def load_audio(self, file_path, sr=22050, duration=None):
36
+ """Load audio file and return time series and sample rate"""
37
+ try:
38
+ y, sr = librosa.load(file_path, sr=sr, duration=duration)
39
+ return y, sr
40
+ except Exception as e:
41
+ print(f"Error loading audio file: {e}")
42
+ return None, None
43
+
44
+ def analyze_rhythm(self, y, sr):
45
+ """Analyze rhythm-related features: tempo, beats, time signature"""
46
+ # Tempo and beat detection
47
+ onset_env = librosa.onset.onset_strength(y=y, sr=sr)
48
+ tempo, beat_frames = librosa.beat.beat_track(onset_envelope=onset_env, sr=sr)
49
+ beat_times = librosa.frames_to_time(beat_frames, sr=sr)
50
+
51
+ # Beat intervals and regularity
52
+ beat_intervals = np.diff(beat_times) if len(beat_times) > 1 else np.array([0])
53
+ beat_regularity = 1.0 / np.std(beat_intervals) if len(beat_intervals) > 0 and np.std(beat_intervals) > 0 else 0
54
+
55
+ # Rhythm pattern analysis through autocorrelation
56
+ ac = librosa.autocorrelate(onset_env, max_size=sr // 2)
57
+ ac = librosa.util.normalize(ac, norm=np.inf)
58
+
59
+ # Time signature estimation - a challenging task with many limitations
60
+ estimated_signature = self._estimate_time_signature(y, sr, beat_times, onset_env)
61
+
62
+ # Compute onset strength to get a measure of rhythm intensity
63
+ rhythm_intensity = np.mean(onset_env) / np.max(onset_env) if np.max(onset_env) > 0 else 0
64
+
65
+ # Rhythm complexity based on variation in onset strength
66
+ rhythm_complexity = np.std(onset_env) / np.mean(onset_env) if np.mean(onset_env) > 0 else 0
67
+
68
+ return {
69
+ "tempo": float(tempo),
70
+ "beat_times": beat_times.tolist(),
71
+ "beat_intervals": beat_intervals.tolist(),
72
+ "beat_regularity": float(beat_regularity),
73
+ "rhythm_intensity": float(rhythm_intensity),
74
+ "rhythm_complexity": float(rhythm_complexity),
75
+ "estimated_time_signature": estimated_signature
76
+ }
77
+
78
+ def _estimate_time_signature(self, y, sr, beat_times, onset_env):
79
+ """Estimate the time signature based on beat patterns"""
80
+ # This is a simplified approach - accurate time signature detection is complex
81
+ if len(beat_times) < 4:
82
+ return "Unknown"
83
+
84
+ # Analyze beat emphasis patterns to detect meter
85
+ beat_intervals = np.diff(beat_times)
86
+
87
+ # Look for periodicity in the onset envelope
88
+ ac = librosa.autocorrelate(onset_env, max_size=sr)
89
+
90
+ # Find peaks in autocorrelation after the first one (which is at lag 0)
91
+ peaks = librosa.util.peak_pick(ac, pre_max=20, post_max=20, pre_avg=20, post_avg=20, delta=0.1, wait=1)
92
+ peaks = peaks[peaks > 0] # Remove the first peak which is at lag 0
93
+
94
+ if len(peaks) == 0:
95
+ return "4/4" # Default to most common
96
+
97
+ # Convert first significant peak to beats
98
+ first_peak_time = librosa.frames_to_time(peaks[0], sr=sr)  # peak lag is in onset-envelope frames, not samples
99
+ beats_per_bar = round(first_peak_time / np.median(beat_intervals))
100
+
101
+ # Map to common time signatures
102
+ if beats_per_bar == 4 or beats_per_bar == 8:
103
+ return "4/4"
104
+ elif beats_per_bar == 3 or beats_per_bar == 6:
105
+ return "3/4"
106
+ elif beats_per_bar == 2:
107
+ return "2/4"
108
+ else:
109
+ return f"{beats_per_bar}/4" # Default assumption
110
+
111
+ def analyze_tonality(self, y, sr):
112
+ """Analyze tonal features: key, mode, harmonic features"""
113
+ # Compute chromagram
114
+ chroma = librosa.feature.chroma_cqt(y=y, sr=sr)
115
+
116
+ # Krumhansl-Schmuckler key-finding algorithm (simplified)
117
+ # Major and minor profiles from music theory research
118
+ major_profile = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
119
+ minor_profile = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17])
120
+
121
+ # Calculate the correlation of the chroma with each key profile
122
+ chroma_avg = np.mean(chroma, axis=1)
123
+ major_corr = np.zeros(12)
124
+ minor_corr = np.zeros(12)
125
+
126
+ for i in range(12):
127
+ major_corr[i] = np.corrcoef(np.roll(chroma_avg, i), major_profile)[0, 1]
128
+ minor_corr[i] = np.corrcoef(np.roll(chroma_avg, i), minor_profile)[0, 1]
129
+
130
+ # Find the key with the highest correlation
131
+ max_major_idx = np.argmax(major_corr)
132
+ max_minor_idx = np.argmax(minor_corr)
133
+
134
+ # Determine if the piece is in a major or minor key
135
+ if major_corr[max_major_idx] > minor_corr[max_minor_idx]:
136
+ mode = "major"
137
+ key = self.key_names[max_major_idx]
138
+ else:
139
+ mode = "minor"
140
+ key = self.key_names[max_minor_idx]
141
+
142
+ # Calculate harmony complexity (variability in harmonic content)
143
+ harmony_complexity = np.std(chroma) / np.mean(chroma) if np.mean(chroma) > 0 else 0
144
+
145
+ # Calculate tonal stability (consistency of tonal center)
146
+ tonal_stability = 1.0 / (np.std(chroma_avg) + 0.001) # Add small value to avoid division by zero
147
+
148
+ # Calculate spectral brightness (center of mass of the spectrum)
149
+ spectral_centroid = librosa.feature.spectral_centroid(y=y, sr=sr)[0]
150
+ brightness = np.mean(spectral_centroid) / (sr/2) # Normalize by Nyquist frequency
151
+
152
+ # Calculate dissonance using spectral contrast
153
+ spectral_contrast = librosa.feature.spectral_contrast(y=y, sr=sr)
154
+ dissonance = np.mean(spectral_contrast[0]) # Higher values may indicate more dissonance
155
+
156
+ return {
157
+ "key": key,
158
+ "mode": mode,
159
+ "is_major": mode == "major",
160
+ "harmony_complexity": float(harmony_complexity),
161
+ "tonal_stability": float(tonal_stability),
162
+ "brightness": float(brightness),
163
+ "dissonance": float(dissonance)
164
+ }
165
+
166
+ def analyze_energy(self, y, sr):
167
+ """Analyze energy characteristics of the audio"""
168
+ # RMS Energy (overall loudness)
169
+ rms = librosa.feature.rms(y=y)[0]
170
+
171
+ # Energy metrics
172
+ mean_energy = np.mean(rms)
173
+ energy_std = np.std(rms)
174
+ energy_dynamic_range = np.max(rms) - np.min(rms) if len(rms) > 0 else 0
175
+
176
+ # Energy distribution across frequency ranges
177
+ spec = np.abs(librosa.stft(y))
178
+
179
+ # Divide the spectrum into low, mid, and high ranges
180
+ freq_bins = spec.shape[0]
181
+ low_freq_energy = np.mean(spec[:int(freq_bins*0.2), :])
182
+ mid_freq_energy = np.mean(spec[int(freq_bins*0.2):int(freq_bins*0.8), :])
183
+ high_freq_energy = np.mean(spec[int(freq_bins*0.8):, :])
184
+
185
+ # Normalize to create a distribution
186
+ total_energy = low_freq_energy + mid_freq_energy + high_freq_energy
187
+ if total_energy > 0:
188
+ low_freq_ratio = low_freq_energy / total_energy
189
+ mid_freq_ratio = mid_freq_energy / total_energy
190
+ high_freq_ratio = high_freq_energy / total_energy
191
+ else:
192
+ low_freq_ratio = mid_freq_ratio = high_freq_ratio = 1/3
193
+
194
+ return {
195
+ "mean_energy": float(mean_energy),
196
+ "energy_std": float(energy_std),
197
+ "energy_dynamic_range": float(energy_dynamic_range),
198
+ "frequency_distribution": {
199
+ "low_freq": float(low_freq_ratio),
200
+ "mid_freq": float(mid_freq_ratio),
201
+ "high_freq": float(high_freq_ratio)
202
+ }
203
+ }
204
+
205
+ def analyze_emotion(self, rhythm_data, tonal_data, energy_data):
206
+ """Classify the emotion based on musical features"""
207
+ # Extract key features for emotion detection
208
+ tempo = rhythm_data["tempo"]
209
+ is_major = tonal_data["is_major"]
210
+ energy = energy_data["mean_energy"]
211
+ brightness = tonal_data["brightness"]
212
+
213
+ # Calculate scores for each emotion
214
+ emotion_scores = {}
215
+ for emotion, profile in self.emotion_profiles.items():
216
+ score = 0.0
217
+
218
+ # Tempo contribution (0-1 score)
219
+ tempo_range = profile["tempo"]
220
+ if tempo_range[0] <= tempo <= tempo_range[1]:
221
+ score += 1.0
222
+ else:
223
+ # Partial score based on distance
224
+ distance = min(abs(tempo - tempo_range[0]), abs(tempo - tempo_range[1]))
225
+ max_distance = 40 # Maximum distance to consider
226
+ score += max(0, 1 - (distance / max_distance))
227
+
228
+ # Energy contribution (0-1 score)
229
+ energy_range = profile["energy"]
230
+ if energy_range[0] <= energy <= energy_range[1]:
231
+ score += 1.0
232
+ else:
233
+ # Partial score based on distance
234
+ distance = min(abs(energy - energy_range[0]), abs(energy - energy_range[1]))
235
+ max_distance = 0.5 # Maximum distance to consider
236
+ score += max(0, 1 - (distance / max_distance))
237
+
238
+ # Mode contribution (0-1 score)
239
+ if profile["major_mode"] is not None: # Some emotions don't have strong mode preference
240
+ score += 1.0 if profile["major_mode"] == is_major else 0.0
241
+ else:
242
+ score += 0.5 # Neutral contribution
243
+
244
+ # Brightness contribution (0-1 score)
245
+ brightness_range = profile["brightness"]
246
+ if brightness_range[0] <= brightness <= brightness_range[1]:
247
+ score += 1.0
248
+ else:
249
+ # Partial score based on distance
250
+ distance = min(abs(brightness - brightness_range[0]), abs(brightness - brightness_range[1]))
251
+ max_distance = 0.5 # Maximum distance to consider
252
+ score += max(0, 1 - (distance / max_distance))
253
+
254
+ # Normalize score (0-1 range)
255
+ emotion_scores[emotion] = score / 4.0
256
+
257
+ # Find primary emotion
258
+ primary_emotion = max(emotion_scores.items(), key=lambda x: x[1])
259
+
260
+ # Calculate valence and arousal (dimensional emotion model)
261
+ # Mapping different emotions to valence-arousal space
262
+ valence_map = {
263
+ 'happy': 0.8, 'sad': 0.2, 'calm': 0.6,
264
+ 'energetic': 0.7, 'tense': 0.3, 'nostalgic': 0.5
265
+ }
266
+
267
+ arousal_map = {
268
+ 'happy': 0.7, 'sad': 0.3, 'calm': 0.2,
269
+ 'energetic': 0.9, 'tense': 0.8, 'nostalgic': 0.4
270
+ }
271
+
272
+ # Calculate weighted valence and arousal
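+ # Illustrative example (assuming only two nonzero scores): with 'happy' = 0.8 and
+ # 'sad' = 0.2, valence = (0.8*0.8 + 0.2*0.2) / 1.0 = 0.68 and
+ # arousal = (0.8*0.7 + 0.2*0.3) / 1.0 = 0.62.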
273
+ total_weight = sum(emotion_scores.values())
274
+ if total_weight > 0:
275
+ valence = sum(score * valence_map[emotion] for emotion, score in emotion_scores.items()) / total_weight
276
+ arousal = sum(score * arousal_map[emotion] for emotion, score in emotion_scores.items()) / total_weight
277
+ else:
278
+ valence = 0.5
279
+ arousal = 0.5
280
+
281
+ return {
282
+ "primary_emotion": primary_emotion[0],
283
+ "confidence": primary_emotion[1],
284
+ "emotion_scores": emotion_scores,
285
+ "valence": float(valence), # Pleasure dimension (0-1)
286
+ "arousal": float(arousal) # Activity dimension (0-1)
287
+ }
288
+
289
+ def analyze_theme(self, rhythm_data, tonal_data, emotion_data):
290
+ """Infer potential themes based on musical features and emotion"""
291
+ # Extract relevant features
292
+ primary_emotion = emotion_data["primary_emotion"]
293
+ harmony_complexity = tonal_data["harmony_complexity"]
294
+
295
+ # Calculate theme scores
296
+ theme_scores = {}
297
+ for theme, profile in self.theme_profiles.items():
298
+ score = 0.0
299
+
300
+ # Emotion contribution
301
+ if primary_emotion in profile["emotion"]:
302
+ # Emotions listed earlier have stronger connection to the theme
303
+ position_weight = 1.0 / (profile["emotion"].index(primary_emotion) + 1)
304
+ score += position_weight
305
+
306
+ # Secondary emotions contribution
307
+ secondary_emotions = [e for e, s in emotion_data["emotion_scores"].items()
308
+ if s > 0.5 and e != primary_emotion]
309
+ for emotion in secondary_emotions:
310
+ if emotion in profile["emotion"]:
311
+ score += 0.3 # Less weight than primary emotion
312
+
313
+ # Harmony complexity contribution
314
+ complexity_range = profile["harmony_complexity"]
315
+ if complexity_range[0] <= harmony_complexity <= complexity_range[1]:
316
+ score += 1.0
317
+ else:
318
+ # Partial score based on distance
319
+ distance = min(abs(harmony_complexity - complexity_range[0]),
320
+ abs(harmony_complexity - complexity_range[1]))
321
+ max_distance = 0.5 # Maximum distance to consider
322
+ score += max(0, 1 - (distance / max_distance))
323
+
324
+ # Normalize score
325
+ theme_scores[theme] = min(1.0, score / 2.5)
326
+
327
+ # Find primary theme
328
+ primary_theme = max(theme_scores.items(), key=lambda x: x[1])
329
+
330
+ # Find secondary themes (scores > 0.5)
331
+ secondary_themes = [(theme, score) for theme, score in theme_scores.items()
332
+ if score > 0.5 and theme != primary_theme[0]]
333
+ secondary_themes.sort(key=lambda x: x[1], reverse=True)
334
+
335
+ return {
336
+ "primary_theme": primary_theme[0],
337
+ "confidence": primary_theme[1],
338
+ "secondary_themes": [t[0] for t in secondary_themes[:2]], # Top 2 secondary themes
339
+ "theme_scores": theme_scores
340
+ }
341
+
342
+ def analyze_music(self, file_path):
343
+ """Main function to perform comprehensive music analysis"""
344
+ # Load the audio file
345
+ y, sr = self.load_audio(file_path)
346
+ if y is None:
347
+ return {"error": "Failed to load audio file"}
348
+
349
+ # Run all analyses
350
+ rhythm_data = self.analyze_rhythm(y, sr)
351
+ tonal_data = self.analyze_tonality(y, sr)
352
+ energy_data = self.analyze_energy(y, sr)
353
+
354
+ # Higher-level analyses that depend on the basic features
355
+ emotion_data = self.analyze_emotion(rhythm_data, tonal_data, energy_data)
356
+ theme_data = self.analyze_theme(rhythm_data, tonal_data, emotion_data)
357
+
358
+ # Combine all results
359
+ return {
360
+ "file": file_path,
361
+ "rhythm_analysis": rhythm_data,
362
+ "tonal_analysis": tonal_data,
363
+ "energy_analysis": energy_data,
364
+ "emotion_analysis": emotion_data,
365
+ "theme_analysis": theme_data,
366
+ "summary": {
367
+ "tempo": rhythm_data["tempo"],
368
+ "time_signature": rhythm_data["estimated_time_signature"],
369
+ "key": tonal_data["key"],
370
+ "mode": tonal_data["mode"],
371
+ "primary_emotion": emotion_data["primary_emotion"],
372
+ "primary_theme": theme_data["primary_theme"]
373
+ }
374
+ }
375
+
376
+ # def visualize_analysis(self, file_path):
377
+ # """Create visualizations for the music analysis results"""
378
+ # # Check if matplotlib is available
379
+ # if plt is None:
380
+ # print("Error: matplotlib is not installed. Visualization is not available.")
381
+ # return
382
+ #
383
+ # # Load audio and run analysis
384
+ # y, sr = self.load_audio(file_path)
385
+ # if y is None:
386
+ # print("Error: Failed to load audio file")
387
+ # return
388
+ #
389
+ # results = self.analyze_music(file_path)
390
+ #
391
+ # # Create visualization
392
+ # plt.figure(figsize=(15, 12))
393
+
394
+ # # Waveform
395
+ # plt.subplot(3, 2, 1)
396
+ # librosa.display.waveshow(y, sr=sr, alpha=0.6)
397
+ # plt.title(f'Waveform (Tempo: {results["rhythm_analysis"]["tempo"]:.1f} BPM)')
398
+
399
+ # # Spectrogram
400
+ # plt.subplot(3, 2, 2)
401
+ # D = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)
402
+ # librosa.display.specshow(D, sr=sr, x_axis='time', y_axis='log')
403
+ # plt.colorbar(format='%+2.0f dB')
404
+ # plt.title(f'Spectrogram (Key: {results["tonal_analysis"]["key"]} {results["tonal_analysis"]["mode"]})')
405
+
406
+ # # Chromagram
407
+ # plt.subplot(3, 2, 3)
408
+ # chroma = librosa.feature.chroma_cqt(y=y, sr=sr)
409
+ # librosa.display.specshow(chroma, y_axis='chroma', x_axis='time')
410
+ # plt.colorbar()
411
+ # plt.title('Chromagram')
412
+
413
+ # # Onset strength and beats
414
+ # plt.subplot(3, 2, 4)
415
+ # onset_env = librosa.onset.onset_strength(y=y, sr=sr)
416
+ # times = librosa.times_like(onset_env, sr=sr)
417
+ # plt.plot(times, librosa.util.normalize(onset_env), label='Onset strength')
418
+ # plt.vlines(results["rhythm_analysis"]["beat_times"], 0, 1, alpha=0.5, color='r',
419
+ # linestyle='--', label='Beats')
420
+ # plt.legend()
421
+ # plt.title('Rhythm Analysis')
422
+
423
+ # # Emotion scores
424
+ # plt.subplot(3, 2, 5)
425
+ # emotions = list(results["emotion_analysis"]["emotion_scores"].keys())
426
+ # scores = list(results["emotion_analysis"]["emotion_scores"].values())
427
+ # plt.bar(emotions, scores, color='skyblue')
428
+ # plt.ylim(0, 1)
429
+ # plt.title(f'Emotion Analysis (Primary: {results["emotion_analysis"]["primary_emotion"]})')
430
+ # plt.xticks(rotation=45)
431
+
432
+ # # Theme scores
433
+ # plt.subplot(3, 2, 6)
434
+ # themes = list(results["theme_analysis"]["theme_scores"].keys())
435
+ # scores = list(results["theme_analysis"]["theme_scores"].values())
436
+ # plt.bar(themes, scores, color='lightgreen')
437
+ # plt.ylim(0, 1)
438
+ # plt.title(f'Theme Analysis (Primary: {results["theme_analysis"]["primary_theme"]})')
439
+ # plt.xticks(rotation=45)
440
+
441
+ # plt.tight_layout()
442
+ # plt.show()
443
+
444
+
445
+ # Create an instance of the analyzer
446
+ analyzer = MusicAnalyzer()
447
+
448
+ # The following code is for demonstration purposes only
449
+ # and will only run if executed directly (not when imported)
450
+ if __name__ == "__main__":
451
+ # Replace this with a real audio file path when running as a script
452
+ demo_file = "path/to/your/audio/file.mp3"
453
+
454
+ # Analyze the uploaded audio file
455
+ results = analyzer.analyze_music(demo_file)
456
+
457
+ # Print analysis summary
458
+ print("\n=== MUSIC ANALYSIS SUMMARY ===")
459
+ print(f"Tempo: {results['summary']['tempo']:.1f} BPM")
460
+ print(f"Time Signature: {results['summary']['time_signature']}")
461
+ print(f"Key: {results['summary']['key']} {results['summary']['mode']}")
462
+ print(f"Primary Emotion: {results['summary']['primary_emotion']}")
463
+ print(f"Primary Theme: {results['summary']['primary_theme']}")
464
+
465
+ # Show detailed results (optional)
466
+ import json
467
+ print("\n=== DETAILED ANALYSIS ===")
468
+ print(json.dumps(results, indent=2))
469
+
470
+ # Visualize the analysis
471
+ # analyzer.visualize_analysis(demo_file)
example.py ADDED
@@ -0,0 +1,49 @@
1
+ import os
2
+ import sys
3
+ from app import process_audio, music_analyzer
4
+
5
+ def main():
6
+ """
7
+ Example function to demonstrate the application with a sample audio file.
8
+
9
+ Usage:
10
+ python example.py <path_to_audio_file>
11
+ """
12
+ if len(sys.argv) != 2:
13
+ print("Usage: python example.py <path_to_audio_file>")
14
+ return
15
+
16
+ audio_file = sys.argv[1]
17
+ if not os.path.exists(audio_file):
18
+ print(f"Error: File {audio_file} does not exist.")
19
+ return
20
+
21
+ print(f"Processing audio file: {audio_file}")
22
+
23
+ # Call the main processing function
24
+ genre_results, lyrics = process_audio(audio_file)
25
+
26
+ # Get emotion analysis results
27
+ emotion_results = music_analyzer.analyze_music(audio_file)
28
+
29
+ # Print results
30
+ print("\n" + "="*50)
31
+ print("GENRE CLASSIFICATION RESULTS:")
32
+ print("="*50)
33
+ print(genre_results)
34
+
35
+ print("\n" + "="*50)
36
+ print("EMOTION ANALYSIS RESULTS:")
37
+ print("="*50)
38
+ print(f"Tempo: {emotion_results['summary']['tempo']:.1f} BPM")
39
+ print(f"Key: {emotion_results['summary']['key']} {emotion_results['summary']['mode']}")
40
+ print(f"Primary Emotion: {emotion_results['summary']['primary_emotion']}")
41
+ print(f"Primary Theme: {emotion_results['summary']['primary_theme']}")
42
+
43
+ print("\n" + "="*50)
44
+ print("GENERATED LYRICS:")
45
+ print("="*50)
46
+ print(lyrics)
47
+
48
+ if __name__ == "__main__":
49
+ main()
requirements.txt ADDED
@@ -0,0 +1,14 @@
1
+ gradio>=5.22.0
2
+ transformers>=4.36.2
3
+ torch>=2.1.2
4
+ torchaudio>=2.1.2
5
+ numpy>=1.26.2
6
+ accelerate>=0.25.0
7
+ librosa>=0.10.1
8
+ huggingface-hub>=0.20.3
9
+ bitsandbytes>=0.41.1
10
+ sentencepiece>=0.1.99
11
+ safetensors>=0.4.1
12
+ scipy>=1.12.0
13
+ soundfile>=0.12.1
14
+ matplotlib>=3.7.0
utils.py ADDED
@@ -0,0 +1,105 @@
1
+ import torch
2
+ import numpy as np
3
+ import librosa
4
+
5
+ def load_audio(audio_file, sr=22050):
6
+ """Load an audio file and convert to mono if needed."""
7
+ try:
8
+ # Try to load audio with librosa
9
+ y, sr = librosa.load(audio_file, sr=sr, mono=True)
10
+ return y, sr
11
+ except Exception as e:
12
+ print(f"Error loading audio with librosa: {str(e)}")
13
+ # Fallback to basic loading if necessary
14
+ import soundfile as sf
15
+ try:
16
+ y, sr = sf.read(audio_file)
17
+ # Convert to mono if stereo
18
+ if len(y.shape) > 1:
19
+ y = y.mean(axis=1)
20
+ return y, sr
21
+ except Exception as e2:
22
+ print(f"Error loading audio with soundfile: {str(e2)}")
23
+ raise ValueError(f"Could not load audio file: {audio_file}")
24
+
25
+ def extract_audio_duration(y, sr):
26
+ """Get the duration of audio in seconds."""
27
+ return len(y) / sr
28
+
29
+ def extract_mfcc_features(y, sr, n_mfcc=20):
30
+ """Extract MFCC features from audio."""
31
+ try:
32
+ mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
33
+ mfccs_mean = np.mean(mfccs.T, axis=0)
34
+ return mfccs_mean
35
+ except Exception as e:
36
+ print(f"Error extracting MFCCs: {str(e)}")
37
+ # Return a fallback feature vector if extraction fails
38
+ return np.zeros(n_mfcc)
39
+
40
+ def calculate_lyrics_length(duration):
41
+ """
42
+ Calculate appropriate lyrics length based on audio duration.
43
+ Uses a more conservative calculation that generates shorter lyrics:
44
+ - Average words per line (8-10 words)
45
+ - Reduced words per minute (90 words instead of 135)
46
+ - Simplified song structure
47
+ """
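+ # Worked example: a 180-second track is 3 minutes, so 3 * 90 = 270 words and
+ # 270 // 9 = 30 candidate lines, which the structure rules below cap at 15.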
48
+ # Convert duration to minutes
49
+ duration_minutes = duration / 60
50
+
51
+ # Calculate total words based on duration
52
+ # Using 90 words per minute (reduced from 135)
53
+ total_words = int(duration_minutes * 90)
54
+
55
+ # Calculate number of lines
56
+ # Assuming 8-10 words per line
57
+ words_per_line = 9 # average
58
+ total_lines = total_words // words_per_line
59
+
60
+ # Adjust for song structure with shorter lengths
61
+ if total_lines < 6:
62
+ # Very short song - keep it simple
63
+ return max(2, total_lines)
64
+ elif total_lines < 10:
65
+ # Short song - one verse and chorus
66
+ return min(6, total_lines)
67
+ elif total_lines < 15:
68
+ # Medium song - two verses and chorus
69
+ return min(10, total_lines)
70
+ else:
71
+ # Longer song - two verses, chorus, and bridge
72
+ return min(15, total_lines)
73
+
74
+ def format_genre_results(top_genres):
75
+ """Format genre classification results for display."""
76
+ result = "Top Detected Genres:\n"
77
+ for genre, confidence in top_genres:
78
+ result += f"- {genre}: {confidence*100:.2f}%\n"
79
+ return result
80
+
81
+ def ensure_cuda_availability():
82
+ """Check and report CUDA availability for informational purposes."""
83
+ cuda_available = torch.cuda.is_available()
84
+ if cuda_available:
85
+ device_count = torch.cuda.device_count()
86
+ device_name = torch.cuda.get_device_name(0) if device_count > 0 else "Unknown"
87
+ print(f"CUDA is available with {device_count} device(s). Using: {device_name}")
88
+ else:
89
+ print("CUDA is not available. Using CPU for inference.")
90
+ return cuda_available
91
+
92
+ def preprocess_audio_for_model(waveform, sample_rate, target_sample_rate=16000, max_length=16000):
93
+ """Preprocess audio for model input (resample, pad/trim)."""
94
+ # Resample if needed
95
+ if sample_rate != target_sample_rate:
96
+ waveform = librosa.resample(waveform, orig_sr=sample_rate, target_sr=target_sample_rate)
97
+
98
+ # Trim or pad to expected length
99
+ if len(waveform) > max_length:
100
+ waveform = waveform[:max_length]
101
+ elif len(waveform) < max_length:
102
+ padding = max_length - len(waveform)
103
+ waveform = np.pad(waveform, (0, padding), 'constant')
104
+
105
+ return waveform