Commit 9e21eef by root
Parent(s): a51853d

Files changed:
- DEPLOYMENT.md +42 -0
- README.md +42 -8
- app.py +383 -0
- emotionanalysis.py +471 -0
- example.py +49 -0
- requirements.txt +14 -0
- utils.py +105 -0
DEPLOYMENT.md
ADDED
@@ -0,0 +1,42 @@
# Deploying to Hugging Face Spaces

This guide explains how to deploy the Music Genre Classifier & Lyrics Generator to Hugging Face Spaces.

## Prerequisites

1. A Hugging Face account
2. Access to the Llama 3.1 8B Instruct model (requires acceptance of the model license)
3. A Hugging Face API token

## Deployment Steps

### 1. Create a New Space

1. Go to the Hugging Face website and log in
2. Navigate to "Spaces" in the top navigation
3. Click "Create new Space"
4. Choose "Gradio" as the SDK
5. Give your Space a name and description
6. Select "T4 GPU" as the hardware
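If you prefer to script this step, the same Space can be created with the `huggingface_hub` client. This is only a sketch: the Space name is a placeholder, and `"t4-small"` is assumed to be the hardware identifier for the T4 GPU tier.

```python
# Optional: create the Space from Python instead of the web UI (sketch).
from huggingface_hub import HfApi

api = HfApi(token="hf_...")  # a token with write access

repo_id = "your-username/music-genre-classifier"  # placeholder Space name
api.create_repo(repo_id=repo_id, repo_type="space", space_sdk="gradio")

# "t4-small" is assumed to be the identifier for the T4 GPU tier.
api.request_space_hardware(repo_id=repo_id, hardware="t4-small")
```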
### 2. Set up Environment Variables

Set up your Hugging Face access token as an environment variable:

1. Go to your profile settings in Hugging Face
2. Navigate to "Access Tokens" and create a new token with "write" access
3. In your Space settings, under "Repository secrets", add a new secret:
   - Name: `HF_TOKEN`
   - Value: Your Hugging Face access token
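Before saving the secret, it can help to confirm that the token is valid; the secret can also be registered programmatically. A minimal sketch with `huggingface_hub` (the Space name is a placeholder):

```python
# Sketch: verify the token, then register it as a Space secret
# (equivalent to adding it under "Repository secrets" in the web UI).
from huggingface_hub import HfApi

token = "hf_..."  # the token created above
api = HfApi(token=token)
print(api.whoami()["name"])  # should print your account name if the token works

api.add_space_secret(
    repo_id="your-username/music-genre-classifier",  # placeholder Space name
    key="HF_TOKEN",
    value=token,
)
```

At startup, `app.py` reads this secret from the environment and calls `login(token=os.environ["HF_TOKEN"])`.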
### 3. Upload the Files

Upload all the files from this repository to your Space.
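If you would rather push from a script than use the web uploader, `huggingface_hub` can upload the whole project in one call. A sketch (the Space name is again a placeholder):

```python
# Sketch: push the project files to the Space without the web uploader.
from huggingface_hub import HfApi

api = HfApi()  # assumes you are already authenticated (e.g. via `huggingface-cli login`)
api.upload_folder(
    folder_path=".",                                 # local project directory
    repo_id="your-username/music-genre-classifier",  # placeholder Space name
    repo_type="space",
)
```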
### 4. Wait for Deployment

Hugging Face will automatically build and deploy your Space. This may take a few minutes, especially since it needs to download the models.

### 5. Access Your Application

Once deployed, you can access your application at your Hugging Face Space URL.
README.md
CHANGED
@@ -1,13 +1,47 @@
Previously, all of the frontmatter fields (title, emoji, colorFrom, colorTo, sdk, sdk_version) were empty placeholders; the updated file reads:

---
title: Music Genre Classifier & Lyrics Generator
emoji: 🎵
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.22.0
app_file: app.py
pinned: false
license: mit
short_description: AI music genre detection and lyrics generation
---

# Music Genre Classifier & Lyrics Generator

This Hugging Face Space application provides two AI-powered features:

1. **Music Genre Classification**: Upload a music file and get an analysis of its genre using the [dima806/music_genres_classification](https://huggingface.co/dima806/music_genres_classification) model.

2. **Lyrics Generation**: Based on the detected genre, the app generates original lyrics using [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) that match both the style of the genre and the approximate length of the song.

## Features

- Upload any music file for instant genre classification
- Receive genre predictions with confidence scores
- Get AI-generated lyrics tailored to the detected music genre
- Lyrics length is automatically adjusted based on the song duration
- Simple and intuitive user interface

## Usage

1. Visit the live application on Hugging Face Spaces
2. Upload your music file using the provided interface
3. Click "Analyze & Generate" to process the audio
4. View the detected genre and generated lyrics in the output panels

## Technical Details

- Uses MFCC feature extraction from audio for genre classification
- Leverages 4-bit quantization for efficient LLM inference on a T4 GPU
- Implements a specialized prompt-engineering approach to generate genre-specific lyrics
- Automatically scales lyrics length based on audio duration
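As a rough worked example of the last point: with the heuristic in `utils.py` (about 90 words per minute and roughly 9 words per line), a three-minute track maps to about 30 raw lines, which the structure rules then cap at 15 lines (two verses, a chorus, and a bridge).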
## Links

- [Music Genre Classification Model](https://huggingface.co/dima806/music_genres_classification)
- [Llama 3.1 8B Instruct Model](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
app.py
ADDED
@@ -0,0 +1,383 @@
import os
import io
import gradio as gr
import torch
import numpy as np
from transformers import (
    AutoModelForAudioClassification,
    AutoFeatureExtractor,
    AutoTokenizer,
    pipeline,
    AutoModelForCausalLM,
    BitsAndBytesConfig
)
from huggingface_hub import login
from utils import (
    load_audio,
    extract_audio_duration,
    extract_mfcc_features,
    calculate_lyrics_length,
    format_genre_results,
    ensure_cuda_availability,
    preprocess_audio_for_model
)
from emotionanalysis import MusicAnalyzer

# Login to Hugging Face Hub if token is provided
if "HF_TOKEN" in os.environ:
    login(token=os.environ["HF_TOKEN"])

# Constants
GENRE_MODEL_NAME = "dima806/music_genres_classification"
MUSIC_DETECTION_MODEL = "MIT/ast-finetuned-audioset-10-10-0.4593"
LLM_MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"
SAMPLE_RATE = 22050  # Standard sample rate for audio processing

# Check CUDA availability (for informational purposes)
CUDA_AVAILABLE = ensure_cuda_availability()

# Create music detection pipeline
print(f"Loading music detection model: {MUSIC_DETECTION_MODEL}")
try:
    music_detector = pipeline(
        "audio-classification",
        model=MUSIC_DETECTION_MODEL,
        device=0 if CUDA_AVAILABLE else -1
    )
    print("Successfully loaded music detection pipeline")
except Exception as e:
    print(f"Error creating music detection pipeline: {str(e)}")
    # Fallback to manual loading
    try:
        music_processor = AutoFeatureExtractor.from_pretrained(MUSIC_DETECTION_MODEL)
        music_model = AutoModelForAudioClassification.from_pretrained(MUSIC_DETECTION_MODEL)
        print("Successfully loaded music detection model and feature extractor")
    except Exception as e2:
        print(f"Error loading music detection model components: {str(e2)}")
        raise RuntimeError(f"Could not load music detection model: {str(e2)}")

# Create genre classification pipeline
print(f"Loading audio classification model: {GENRE_MODEL_NAME}")
try:
    genre_classifier = pipeline(
        "audio-classification",
        model=GENRE_MODEL_NAME,
        device=0 if CUDA_AVAILABLE else -1
    )
    print("Successfully loaded audio classification pipeline")
except Exception as e:
    print(f"Error creating pipeline: {str(e)}")
    # Fallback to manual loading
    try:
        genre_processor = AutoFeatureExtractor.from_pretrained(GENRE_MODEL_NAME)
        genre_model = AutoModelForAudioClassification.from_pretrained(GENRE_MODEL_NAME)
        print("Successfully loaded audio classification model and feature extractor")
    except Exception as e2:
        print(f"Error loading model components: {str(e2)}")
        raise RuntimeError(f"Could not load genre classification model: {str(e2)}")

# Load LLM with appropriate quantization for T4 GPU
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

llm_tokenizer = AutoTokenizer.from_pretrained(LLM_MODEL_NAME)
llm_model = AutoModelForCausalLM.from_pretrained(
    LLM_MODEL_NAME,
    device_map="auto",
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
)

# Create LLM pipeline
llm_pipeline = pipeline(
    "text-generation",
    model=llm_model,
    tokenizer=llm_tokenizer,
    max_new_tokens=512,
)

# Initialize music emotion analyzer
music_analyzer = MusicAnalyzer()

def extract_audio_features(audio_file):
    """Extract audio features from an audio file."""
    # Load the audio file using utility function
    y, sr = load_audio(audio_file, SAMPLE_RATE)

    # Get audio duration in seconds
    duration = extract_audio_duration(y, sr)

    # Extract MFCCs for genre classification (may not be needed with the pipeline)
    mfccs_mean = extract_mfcc_features(y, sr, n_mfcc=20)

    return {
        "features": mfccs_mean,
        "duration": duration,
        "waveform": y,
        "sample_rate": sr,
        "path": audio_file  # Keep path for the pipeline
    }

def classify_genre(audio_data):
    """Classify the genre of the audio using the loaded model."""
    try:
        # First attempt: Try using the pipeline if available
        if 'genre_classifier' in globals():
            results = genre_classifier(audio_data["path"])
            # Transform pipeline results to our expected format
            top_genres = [(result["label"], result["score"]) for result in results[:3]]
            return top_genres

        # Second attempt: Use manually loaded model components
        elif 'genre_processor' in globals() and 'genre_model' in globals():
            # Process audio input with feature extractor
            inputs = genre_processor(
                audio_data["waveform"],
                sampling_rate=audio_data["sample_rate"],
                return_tensors="pt"
            )

            with torch.no_grad():
                outputs = genre_model(**inputs)
                predictions = outputs.logits.softmax(dim=-1)

            # Get the top 3 genres
            values, indices = torch.topk(predictions, 3)

            # Map indices to genre labels
            genre_labels = genre_model.config.id2label

            top_genres = []
            for i, (value, index) in enumerate(zip(values[0], indices[0])):
                genre = genre_labels[index.item()]
                confidence = value.item()
                top_genres.append((genre, confidence))

            return top_genres

        else:
            raise ValueError("No genre classification model available")

    except Exception as e:
        print(f"Error in genre classification: {str(e)}")
        # Fallback: return a default genre if everything fails
        return [("rock", 1.0)]

def generate_lyrics(genre, duration, emotion_results):
    """Generate lyrics based on the genre and with appropriate length."""
    # Calculate appropriate lyrics length based on audio duration
    lines_count = calculate_lyrics_length(duration)

    # Calculate approximate number of verses and chorus
    if lines_count <= 6:
        # Very short song - one verse and chorus
        verse_lines = 2
        chorus_lines = 2
    elif lines_count <= 10:
        # Medium song - two verses and chorus
        verse_lines = 3
        chorus_lines = 2
    else:
        # Longer song - two verses, chorus, and bridge
        verse_lines = 3
        chorus_lines = 2

    # Extract emotion and theme data from analysis results
    primary_emotion = emotion_results["emotion_analysis"]["primary_emotion"]
    primary_theme = emotion_results["theme_analysis"]["primary_theme"]
    tempo = emotion_results["rhythm_analysis"]["tempo"]
    key = emotion_results["tonal_analysis"]["key"]
    mode = emotion_results["tonal_analysis"]["mode"]

    # Create prompt for the LLM
    prompt = f"""
You are a talented songwriter who specializes in {genre} music.
Write original {genre} song lyrics for a song that is {duration:.1f} seconds long.

Music analysis has detected the following qualities in the music:
- Tempo: {tempo:.1f} BPM
- Key: {key} {mode}
- Primary emotion: {primary_emotion}
- Primary theme: {primary_theme}

The lyrics should:
- Perfectly capture the essence and style of {genre} music
- Express the {primary_emotion} emotion and {primary_theme} theme
- Be approximately {lines_count} lines long
- Have a coherent theme and flow
- Follow this structure:
  * Verse: {verse_lines} lines
  * Chorus: {chorus_lines} lines
  * {f'Bridge: 2 lines' if lines_count > 10 else ''}
- Be completely original
- Match the song duration of {duration:.1f} seconds
- Keep each line concise and impactful

Your lyrics:
"""

    # Generate lyrics using the LLM
    response = llm_pipeline(
        prompt,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.1,
        return_full_text=False
    )

    # Extract and clean generated lyrics
    lyrics = response[0]["generated_text"].strip()

    # Add section labels if they're not present
    if "Verse" not in lyrics and "Chorus" not in lyrics:
        lines = lyrics.split('\n')
        formatted_lyrics = []
        current_section = "Verse"
        for i, line in enumerate(lines):
            if i == 0:
                formatted_lyrics.append("[Verse]")
            elif i == verse_lines:
                formatted_lyrics.append("\n[Chorus]")
            elif i == verse_lines + chorus_lines and lines_count > 10:
                formatted_lyrics.append("\n[Bridge]")
            formatted_lyrics.append(line)
        lyrics = '\n'.join(formatted_lyrics)

    return lyrics

def detect_music(audio_data):
    """Detect if the audio is music using the MIT AST model."""
    try:
        # First attempt: Try using the pipeline if available
        if 'music_detector' in globals():
            results = music_detector(audio_data["path"])
            # Look for music-related classes in the results
            music_confidence = 0.0
            for result in results:
                label = result["label"].lower()
                if any(music_term in label for music_term in ["music", "song", "singing", "instrument"]):
                    music_confidence = max(music_confidence, result["score"])
            return music_confidence >= 0.5

        # Second attempt: Use manually loaded model components
        elif 'music_processor' in globals() and 'music_model' in globals():
            # Process audio input with feature extractor
            inputs = music_processor(
                audio_data["waveform"],
                sampling_rate=audio_data["sample_rate"],
                return_tensors="pt"
            )

            with torch.no_grad():
                outputs = music_model(**inputs)
                predictions = outputs.logits.softmax(dim=-1)

            # Get the top predictions
            values, indices = torch.topk(predictions, 5)

            # Map indices to labels
            labels = music_model.config.id2label

            # Check for music-related classes
            music_confidence = 0.0
            for i, (value, index) in enumerate(zip(values[0], indices[0])):
                label = labels[index.item()].lower()
                if any(music_term in label for music_term in ["music", "song", "singing", "instrument"]):
                    music_confidence = max(music_confidence, value.item())

            return music_confidence >= 0.5

        else:
            raise ValueError("No music detection model available")

    except Exception as e:
        print(f"Error in music detection: {str(e)}")
        return False

def process_audio(audio_file):
    """Main function to process audio file, classify genre, and generate lyrics."""
    if audio_file is None:
        return "Please upload an audio file.", None

    try:
        # Extract audio features
        audio_data = extract_audio_features(audio_file)

        # First check if it's music
        is_music = detect_music(audio_data)
        if not is_music:
            return "The uploaded audio does not appear to be music. Please upload a music file.", None

        # Classify genre
        top_genres = classify_genre(audio_data)

        # Format genre results using utility function
        genre_results = format_genre_results(top_genres)

        # Analyze music emotions and themes
        emotion_results = music_analyzer.analyze_music(audio_file)

        # Generate lyrics based on top genre and emotion analysis
        primary_genre, _ = top_genres[0]
        lyrics = generate_lyrics(primary_genre, audio_data["duration"], emotion_results)

        return genre_results, lyrics

    except Exception as e:
        return f"Error processing audio: {str(e)}", None

# Create Gradio interface
with gr.Blocks(title="Music Genre Classifier & Lyrics Generator") as demo:
    gr.Markdown("# Music Genre Classifier & Lyrics Generator")
    gr.Markdown("Upload a music file to classify its genre, analyze its emotions, and generate matching lyrics.")

    with gr.Row():
        with gr.Column():
            audio_input = gr.Audio(label="Upload Music", type="filepath")
            submit_btn = gr.Button("Analyze & Generate")

        with gr.Column():
            genre_output = gr.Textbox(label="Detected Genres", lines=5)
            emotion_output = gr.Textbox(label="Emotion Analysis", lines=5)
            lyrics_output = gr.Textbox(label="Generated Lyrics", lines=15)

    def display_results(audio_file):
        if audio_file is None:
            return "Please upload an audio file.", "No emotion analysis available.", None

        try:
            # Process audio and get genre and lyrics
            genre_results, lyrics = process_audio(audio_file)

            # Format emotion analysis results
            emotion_results = music_analyzer.analyze_music(audio_file)
            emotion_text = f"Tempo: {emotion_results['summary']['tempo']:.1f} BPM\n"
            emotion_text += f"Key: {emotion_results['summary']['key']} {emotion_results['summary']['mode']}\n"
            emotion_text += f"Primary Emotion: {emotion_results['summary']['primary_emotion']}\n"
            emotion_text += f"Primary Theme: {emotion_results['summary']['primary_theme']}"

            return genre_results, emotion_text, lyrics
        except Exception as e:
            return f"Error: {str(e)}", "Error in emotion analysis", None

    submit_btn.click(
        fn=display_results,
        inputs=[audio_input],
        outputs=[genre_output, emotion_output, lyrics_output]
    )

    gr.Markdown("### How it works")
    gr.Markdown("""
    1. Upload an audio file of your choice
    2. The system will classify the genre using the dima806/music_genres_classification model
    3. The system will analyze the musical emotion and theme using advanced audio processing
    4. Based on the detected genre and emotion, it will generate appropriate lyrics using Llama-3.1-8B-Instruct
    5. The lyrics length is automatically adjusted based on your audio duration
    """)

# Launch the app
demo.launch()
emotionanalysis.py
ADDED
@@ -0,0 +1,471 @@
import librosa
import numpy as np
try:
    import matplotlib.pyplot as plt
except ImportError:
    plt = None
from scipy.stats import mode
import warnings
warnings.filterwarnings('ignore')  # Suppress librosa warnings

class MusicAnalyzer:
    def __init__(self):
        # Emotion feature mappings - these define characteristics of different emotions
        self.emotion_profiles = {
            'happy': {'tempo': (100, 180), 'energy': (0.6, 1.0), 'major_mode': True, 'brightness': (0.6, 1.0)},
            'sad': {'tempo': (40, 90), 'energy': (0, 0.5), 'major_mode': False, 'brightness': (0, 0.5)},
            'calm': {'tempo': (50, 90), 'energy': (0, 0.4), 'major_mode': True, 'brightness': (0.3, 0.6)},
            'energetic': {'tempo': (110, 200), 'energy': (0.7, 1.0), 'major_mode': True, 'brightness': (0.5, 0.9)},
            'tense': {'tempo': (70, 140), 'energy': (0.5, 0.9), 'major_mode': False, 'brightness': (0.3, 0.7)},
            'nostalgic': {'tempo': (60, 100), 'energy': (0.3, 0.7), 'major_mode': None, 'brightness': (0.4, 0.7)}
        }

        # Theme mappings based on musical features
        self.theme_profiles = {
            'love': {'emotion': ['happy', 'nostalgic', 'sad'], 'harmony_complexity': (0.3, 0.7)},
            'triumph': {'emotion': ['energetic', 'happy'], 'harmony_complexity': (0.4, 0.8)},
            'loss': {'emotion': ['sad', 'nostalgic'], 'harmony_complexity': (0.3, 0.7)},
            'adventure': {'emotion': ['energetic', 'tense'], 'harmony_complexity': (0.5, 0.9)},
            'reflection': {'emotion': ['calm', 'nostalgic'], 'harmony_complexity': (0.4, 0.8)},
            'conflict': {'emotion': ['tense', 'energetic'], 'harmony_complexity': (0.6, 1.0)}
        }

        # Musical key mapping
        self.key_names = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

    def load_audio(self, file_path, sr=22050, duration=None):
        """Load audio file and return time series and sample rate"""
        try:
            y, sr = librosa.load(file_path, sr=sr, duration=duration)
            return y, sr
        except Exception as e:
            print(f"Error loading audio file: {e}")
            return None, None

    def analyze_rhythm(self, y, sr):
        """Analyze rhythm-related features: tempo, beats, time signature"""
        # Tempo and beat detection
        onset_env = librosa.onset.onset_strength(y=y, sr=sr)
        tempo, beat_frames = librosa.beat.beat_track(onset_envelope=onset_env, sr=sr)
        beat_times = librosa.frames_to_time(beat_frames, sr=sr)

        # Beat intervals and regularity
        beat_intervals = np.diff(beat_times) if len(beat_times) > 1 else np.array([0])
        beat_regularity = 1.0 / np.std(beat_intervals) if len(beat_intervals) > 0 and np.std(beat_intervals) > 0 else 0

        # Rhythm pattern analysis through autocorrelation
        ac = librosa.autocorrelate(onset_env, max_size=sr // 2)
        ac = librosa.util.normalize(ac, norm=np.inf)

        # Time signature estimation - a challenging task with many limitations
        estimated_signature = self._estimate_time_signature(y, sr, beat_times, onset_env)

        # Compute onset strength to get a measure of rhythm intensity
        rhythm_intensity = np.mean(onset_env) / np.max(onset_env) if np.max(onset_env) > 0 else 0

        # Rhythm complexity based on variation in onset strength
        rhythm_complexity = np.std(onset_env) / np.mean(onset_env) if np.mean(onset_env) > 0 else 0

        return {
            "tempo": float(tempo),
            "beat_times": beat_times.tolist(),
            "beat_intervals": beat_intervals.tolist(),
            "beat_regularity": float(beat_regularity),
            "rhythm_intensity": float(rhythm_intensity),
            "rhythm_complexity": float(rhythm_complexity),
            "estimated_time_signature": estimated_signature
        }

    def _estimate_time_signature(self, y, sr, beat_times, onset_env):
        """Estimate the time signature based on beat patterns"""
        # This is a simplified approach - accurate time signature detection is complex
        if len(beat_times) < 4:
            return "Unknown"

        # Analyze beat emphasis patterns to detect meter
        beat_intervals = np.diff(beat_times)

        # Look for periodicity in the onset envelope
        ac = librosa.autocorrelate(onset_env, max_size=sr)

        # Find peaks in autocorrelation after the first one (which is at lag 0)
        peaks = librosa.util.peak_pick(ac, pre_max=20, post_max=20, pre_avg=20, post_avg=20, delta=0.1, wait=1)
        peaks = peaks[peaks > 0]  # Remove the first peak which is at lag 0

        if len(peaks) == 0:
            return "4/4"  # Default to most common

        # Convert first significant peak to beats
        first_peak_time = peaks[0] / sr
        beats_per_bar = round(first_peak_time / np.median(beat_intervals))

        # Map to common time signatures
        if beats_per_bar == 4 or beats_per_bar == 8:
            return "4/4"
        elif beats_per_bar == 3 or beats_per_bar == 6:
            return "3/4"
        elif beats_per_bar == 2:
            return "2/4"
        else:
            return f"{beats_per_bar}/4"  # Default assumption

    def analyze_tonality(self, y, sr):
        """Analyze tonal features: key, mode, harmonic features"""
        # Compute chromagram
        chroma = librosa.feature.chroma_cqt(y=y, sr=sr)

        # Krumhansl-Schmuckler key-finding algorithm (simplified)
        # Major and minor profiles from music theory research
        major_profile = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
        minor_profile = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17])

        # Calculate the correlation of the chroma with each key profile
        chroma_avg = np.mean(chroma, axis=1)
        major_corr = np.zeros(12)
        minor_corr = np.zeros(12)

        for i in range(12):
            major_corr[i] = np.corrcoef(np.roll(chroma_avg, i), major_profile)[0, 1]
            minor_corr[i] = np.corrcoef(np.roll(chroma_avg, i), minor_profile)[0, 1]

        # Find the key with the highest correlation
        max_major_idx = np.argmax(major_corr)
        max_minor_idx = np.argmax(minor_corr)

        # Determine if the piece is in a major or minor key
        if major_corr[max_major_idx] > minor_corr[max_minor_idx]:
            mode = "major"
            key = self.key_names[max_major_idx]
        else:
            mode = "minor"
            key = self.key_names[max_minor_idx]

        # Calculate harmony complexity (variability in harmonic content)
        harmony_complexity = np.std(chroma) / np.mean(chroma) if np.mean(chroma) > 0 else 0

        # Calculate tonal stability (consistency of tonal center)
        tonal_stability = 1.0 / (np.std(chroma_avg) + 0.001)  # Add small value to avoid division by zero

        # Calculate spectral brightness (center of mass of the spectrum)
        spectral_centroid = librosa.feature.spectral_centroid(y=y, sr=sr)[0]
        brightness = np.mean(spectral_centroid) / (sr / 2)  # Normalize by Nyquist frequency

        # Calculate dissonance using spectral contrast
        spectral_contrast = librosa.feature.spectral_contrast(y=y, sr=sr)
        dissonance = np.mean(spectral_contrast[0])  # Higher values may indicate more dissonance

        return {
            "key": key,
            "mode": mode,
            "is_major": mode == "major",
            "harmony_complexity": float(harmony_complexity),
            "tonal_stability": float(tonal_stability),
            "brightness": float(brightness),
            "dissonance": float(dissonance)
        }

    def analyze_energy(self, y, sr):
        """Analyze energy characteristics of the audio"""
        # RMS Energy (overall loudness)
        rms = librosa.feature.rms(y=y)[0]

        # Energy metrics
        mean_energy = np.mean(rms)
        energy_std = np.std(rms)
        energy_dynamic_range = np.max(rms) - np.min(rms) if len(rms) > 0 else 0

        # Energy distribution across frequency ranges
        spec = np.abs(librosa.stft(y))

        # Divide the spectrum into low, mid, and high ranges
        freq_bins = spec.shape[0]
        low_freq_energy = np.mean(spec[:int(freq_bins * 0.2), :])
        mid_freq_energy = np.mean(spec[int(freq_bins * 0.2):int(freq_bins * 0.8), :])
        high_freq_energy = np.mean(spec[int(freq_bins * 0.8):, :])

        # Normalize to create a distribution
        total_energy = low_freq_energy + mid_freq_energy + high_freq_energy
        if total_energy > 0:
            low_freq_ratio = low_freq_energy / total_energy
            mid_freq_ratio = mid_freq_energy / total_energy
            high_freq_ratio = high_freq_energy / total_energy
        else:
            low_freq_ratio = mid_freq_ratio = high_freq_ratio = 1 / 3

        return {
            "mean_energy": float(mean_energy),
            "energy_std": float(energy_std),
            "energy_dynamic_range": float(energy_dynamic_range),
            "frequency_distribution": {
                "low_freq": float(low_freq_ratio),
                "mid_freq": float(mid_freq_ratio),
                "high_freq": float(high_freq_ratio)
            }
        }

    def analyze_emotion(self, rhythm_data, tonal_data, energy_data):
        """Classify the emotion based on musical features"""
        # Extract key features for emotion detection
        tempo = rhythm_data["tempo"]
        is_major = tonal_data["is_major"]
        energy = energy_data["mean_energy"]
        brightness = tonal_data["brightness"]

        # Calculate scores for each emotion
        emotion_scores = {}
        for emotion, profile in self.emotion_profiles.items():
            score = 0.0

            # Tempo contribution (0-1 score)
            tempo_range = profile["tempo"]
            if tempo_range[0] <= tempo <= tempo_range[1]:
                score += 1.0
            else:
                # Partial score based on distance
                distance = min(abs(tempo - tempo_range[0]), abs(tempo - tempo_range[1]))
                max_distance = 40  # Maximum distance to consider
                score += max(0, 1 - (distance / max_distance))

            # Energy contribution (0-1 score)
            energy_range = profile["energy"]
            if energy_range[0] <= energy <= energy_range[1]:
                score += 1.0
            else:
                # Partial score based on distance
                distance = min(abs(energy - energy_range[0]), abs(energy - energy_range[1]))
                max_distance = 0.5  # Maximum distance to consider
                score += max(0, 1 - (distance / max_distance))

            # Mode contribution (0-1 score)
            if profile["major_mode"] is not None:  # Some emotions don't have strong mode preference
                score += 1.0 if profile["major_mode"] == is_major else 0.0
            else:
                score += 0.5  # Neutral contribution

            # Brightness contribution (0-1 score)
            brightness_range = profile["brightness"]
            if brightness_range[0] <= brightness <= brightness_range[1]:
                score += 1.0
            else:
                # Partial score based on distance
                distance = min(abs(brightness - brightness_range[0]), abs(brightness - brightness_range[1]))
                max_distance = 0.5  # Maximum distance to consider
                score += max(0, 1 - (distance / max_distance))

            # Normalize score (0-1 range)
            emotion_scores[emotion] = score / 4.0

        # Find primary emotion
        primary_emotion = max(emotion_scores.items(), key=lambda x: x[1])

        # Calculate valence and arousal (dimensional emotion model)
        # Mapping different emotions to valence-arousal space
        valence_map = {
            'happy': 0.8, 'sad': 0.2, 'calm': 0.6,
            'energetic': 0.7, 'tense': 0.3, 'nostalgic': 0.5
        }

        arousal_map = {
            'happy': 0.7, 'sad': 0.3, 'calm': 0.2,
            'energetic': 0.9, 'tense': 0.8, 'nostalgic': 0.4
        }

        # Calculate weighted valence and arousal
        total_weight = sum(emotion_scores.values())
        if total_weight > 0:
            valence = sum(score * valence_map[emotion] for emotion, score in emotion_scores.items()) / total_weight
            arousal = sum(score * arousal_map[emotion] for emotion, score in emotion_scores.items()) / total_weight
        else:
            valence = 0.5
            arousal = 0.5

        return {
            "primary_emotion": primary_emotion[0],
            "confidence": primary_emotion[1],
            "emotion_scores": emotion_scores,
            "valence": float(valence),  # Pleasure dimension (0-1)
            "arousal": float(arousal)   # Activity dimension (0-1)
        }

    def analyze_theme(self, rhythm_data, tonal_data, emotion_data):
        """Infer potential themes based on musical features and emotion"""
        # Extract relevant features
        primary_emotion = emotion_data["primary_emotion"]
        harmony_complexity = tonal_data["harmony_complexity"]

        # Calculate theme scores
        theme_scores = {}
        for theme, profile in self.theme_profiles.items():
            score = 0.0

            # Emotion contribution
            if primary_emotion in profile["emotion"]:
                # Emotions listed earlier have stronger connection to the theme
                position_weight = 1.0 / (profile["emotion"].index(primary_emotion) + 1)
                score += position_weight

            # Secondary emotions contribution
            secondary_emotions = [e for e, s in emotion_data["emotion_scores"].items()
                                  if s > 0.5 and e != primary_emotion]
            for emotion in secondary_emotions:
                if emotion in profile["emotion"]:
                    score += 0.3  # Less weight than primary emotion

            # Harmony complexity contribution
            complexity_range = profile["harmony_complexity"]
            if complexity_range[0] <= harmony_complexity <= complexity_range[1]:
                score += 1.0
            else:
                # Partial score based on distance
                distance = min(abs(harmony_complexity - complexity_range[0]),
                               abs(harmony_complexity - complexity_range[1]))
                max_distance = 0.5  # Maximum distance to consider
                score += max(0, 1 - (distance / max_distance))

            # Normalize score
            theme_scores[theme] = min(1.0, score / 2.5)

        # Find primary theme
        primary_theme = max(theme_scores.items(), key=lambda x: x[1])

        # Find secondary themes (scores > 0.5)
        secondary_themes = [(theme, score) for theme, score in theme_scores.items()
                            if score > 0.5 and theme != primary_theme[0]]
        secondary_themes.sort(key=lambda x: x[1], reverse=True)

        return {
            "primary_theme": primary_theme[0],
            "confidence": primary_theme[1],
            "secondary_themes": [t[0] for t in secondary_themes[:2]],  # Top 2 secondary themes
            "theme_scores": theme_scores
        }

    def analyze_music(self, file_path):
        """Main function to perform comprehensive music analysis"""
        # Load the audio file
        y, sr = self.load_audio(file_path)
        if y is None:
            return {"error": "Failed to load audio file"}

        # Run all analyses
        rhythm_data = self.analyze_rhythm(y, sr)
        tonal_data = self.analyze_tonality(y, sr)
        energy_data = self.analyze_energy(y, sr)

        # Higher-level analyses that depend on the basic features
        emotion_data = self.analyze_emotion(rhythm_data, tonal_data, energy_data)
        theme_data = self.analyze_theme(rhythm_data, tonal_data, emotion_data)

        # Combine all results
        return {
            "file": file_path,
            "rhythm_analysis": rhythm_data,
            "tonal_analysis": tonal_data,
            "energy_analysis": energy_data,
            "emotion_analysis": emotion_data,
            "theme_analysis": theme_data,
            "summary": {
                "tempo": rhythm_data["tempo"],
                "time_signature": rhythm_data["estimated_time_signature"],
                "key": tonal_data["key"],
                "mode": tonal_data["mode"],
                "primary_emotion": emotion_data["primary_emotion"],
                "primary_theme": theme_data["primary_theme"]
            }
        }

    # def visualize_analysis(self, file_path):
    #     """Create visualizations for the music analysis results"""
    #     # Check if matplotlib is available
    #     if plt is None:
    #         print("Error: matplotlib is not installed. Visualization is not available.")
    #         return
    #
    #     # Load audio and run analysis
    #     y, sr = self.load_audio(file_path)
    #     if y is None:
    #         print("Error: Failed to load audio file")
    #         return
    #
    #     results = self.analyze_music(file_path)
    #
    #     # Create visualization
    #     plt.figure(figsize=(15, 12))
    #
    #     # Waveform
    #     plt.subplot(3, 2, 1)
    #     librosa.display.waveshow(y, sr=sr, alpha=0.6)
    #     plt.title(f'Waveform (Tempo: {results["rhythm_analysis"]["tempo"]:.1f} BPM)')
    #
    #     # Spectrogram
    #     plt.subplot(3, 2, 2)
    #     D = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)
    #     librosa.display.specshow(D, sr=sr, x_axis='time', y_axis='log')
    #     plt.colorbar(format='%+2.0f dB')
    #     plt.title(f'Spectrogram (Key: {results["tonal_analysis"]["key"]} {results["tonal_analysis"]["mode"]})')
    #
    #     # Chromagram
    #     plt.subplot(3, 2, 3)
    #     chroma = librosa.feature.chroma_cqt(y=y, sr=sr)
    #     librosa.display.specshow(chroma, y_axis='chroma', x_axis='time')
    #     plt.colorbar()
    #     plt.title('Chromagram')
    #
    #     # Onset strength and beats
    #     plt.subplot(3, 2, 4)
    #     onset_env = librosa.onset.onset_strength(y=y, sr=sr)
    #     times = librosa.times_like(onset_env, sr=sr)
    #     plt.plot(times, librosa.util.normalize(onset_env), label='Onset strength')
    #     plt.vlines(results["rhythm_analysis"]["beat_times"], 0, 1, alpha=0.5, color='r',
    #                linestyle='--', label='Beats')
    #     plt.legend()
    #     plt.title('Rhythm Analysis')
    #
    #     # Emotion scores
    #     plt.subplot(3, 2, 5)
    #     emotions = list(results["emotion_analysis"]["emotion_scores"].keys())
    #     scores = list(results["emotion_analysis"]["emotion_scores"].values())
    #     plt.bar(emotions, scores, color='skyblue')
    #     plt.ylim(0, 1)
    #     plt.title(f'Emotion Analysis (Primary: {results["emotion_analysis"]["primary_emotion"]})')
    #     plt.xticks(rotation=45)
    #
    #     # Theme scores
    #     plt.subplot(3, 2, 6)
    #     themes = list(results["theme_analysis"]["theme_scores"].keys())
    #     scores = list(results["theme_analysis"]["theme_scores"].values())
    #     plt.bar(themes, scores, color='lightgreen')
    #     plt.ylim(0, 1)
    #     plt.title(f'Theme Analysis (Primary: {results["theme_analysis"]["primary_theme"]})')
    #     plt.xticks(rotation=45)
    #
    #     plt.tight_layout()
    #     plt.show()


# Create an instance of the analyzer
analyzer = MusicAnalyzer()

# The following code is for demonstration purposes only
# and will only run if executed directly (not when imported)
if __name__ == "__main__":
    # Replace this with a real audio file path when running as a script
    demo_file = "path/to/your/audio/file.mp3"

    # Analyze the uploaded audio file
    results = analyzer.analyze_music(demo_file)

    # Print analysis summary
    print("\n=== MUSIC ANALYSIS SUMMARY ===")
    print(f"Tempo: {results['summary']['tempo']:.1f} BPM")
    print(f"Time Signature: {results['summary']['time_signature']}")
    print(f"Key: {results['summary']['key']} {results['summary']['mode']}")
    print(f"Primary Emotion: {results['summary']['primary_emotion']}")
    print(f"Primary Theme: {results['summary']['primary_theme']}")

    # Show detailed results (optional)
    import json
    print("\n=== DETAILED ANALYSIS ===")
    print(json.dumps(results, indent=2))

    # Visualize the analysis
    # analyzer.visualize_analysis(demo_file)
example.py
ADDED
@@ -0,0 +1,49 @@
import os
import sys
from app import process_audio, music_analyzer

def main():
    """
    Example function to demonstrate the application with a sample audio file.

    Usage:
        python example.py <path_to_audio_file>
    """
    if len(sys.argv) != 2:
        print("Usage: python example.py <path_to_audio_file>")
        return

    audio_file = sys.argv[1]
    if not os.path.exists(audio_file):
        print(f"Error: File {audio_file} does not exist.")
        return

    print(f"Processing audio file: {audio_file}")

    # Call the main processing function
    genre_results, lyrics = process_audio(audio_file)

    # Get emotion analysis results
    emotion_results = music_analyzer.analyze_music(audio_file)

    # Print results
    print("\n" + "="*50)
    print("GENRE CLASSIFICATION RESULTS:")
    print("="*50)
    print(genre_results)

    print("\n" + "="*50)
    print("EMOTION ANALYSIS RESULTS:")
    print("="*50)
    print(f"Tempo: {emotion_results['summary']['tempo']:.1f} BPM")
    print(f"Key: {emotion_results['summary']['key']} {emotion_results['summary']['mode']}")
    print(f"Primary Emotion: {emotion_results['summary']['primary_emotion']}")
    print(f"Primary Theme: {emotion_results['summary']['primary_theme']}")

    print("\n" + "="*50)
    print("GENERATED LYRICS:")
    print("="*50)
    print(lyrics)

if __name__ == "__main__":
    main()
requirements.txt
ADDED
@@ -0,0 +1,14 @@
gradio>=5.22.0
transformers>=4.36.2
torch>=2.1.2
torchaudio>=2.1.2
numpy>=1.26.2
accelerate>=0.25.0
librosa>=0.10.1
huggingface-hub>=0.20.3
bitsandbytes>=0.41.1
sentencepiece>=0.1.99
safetensors>=0.4.1
scipy>=1.12.0
soundfile>=0.12.1
matplotlib>=3.7.0
utils.py
ADDED
@@ -0,0 +1,105 @@
import torch
import numpy as np
import librosa

def load_audio(audio_file, sr=22050):
    """Load an audio file and convert to mono if needed."""
    try:
        # Try to load audio with librosa
        y, sr = librosa.load(audio_file, sr=sr, mono=True)
        return y, sr
    except Exception as e:
        print(f"Error loading audio with librosa: {str(e)}")
        # Fallback to basic loading if necessary
        import soundfile as sf
        try:
            y, sr = sf.read(audio_file)
            # Convert to mono if stereo
            if len(y.shape) > 1:
                y = y.mean(axis=1)
            return y, sr
        except Exception as e2:
            print(f"Error loading audio with soundfile: {str(e2)}")
            raise ValueError(f"Could not load audio file: {audio_file}")

def extract_audio_duration(y, sr):
    """Get the duration of audio in seconds."""
    return len(y) / sr

def extract_mfcc_features(y, sr, n_mfcc=20):
    """Extract MFCC features from audio."""
    try:
        mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
        mfccs_mean = np.mean(mfccs.T, axis=0)
        return mfccs_mean
    except Exception as e:
        print(f"Error extracting MFCCs: {str(e)}")
        # Return a fallback feature vector if extraction fails
        return np.zeros(n_mfcc)

def calculate_lyrics_length(duration):
    """
    Calculate appropriate lyrics length based on audio duration.
    Uses a more conservative calculation that generates shorter lyrics:
    - Average words per line (8-10 words)
    - Reduced words per minute (90 words instead of 135)
    - Simplified song structure
    """
    # Convert duration to minutes
    duration_minutes = duration / 60

    # Calculate total words based on duration
    # Using 90 words per minute (reduced from 135)
    total_words = int(duration_minutes * 90)

    # Calculate number of lines
    # Assuming 8-10 words per line
    words_per_line = 9  # average
    total_lines = total_words // words_per_line

    # Adjust for song structure with shorter lengths
    if total_lines < 6:
        # Very short song - keep it simple
        return max(2, total_lines)
    elif total_lines < 10:
        # Short song - one verse and chorus
        return min(6, total_lines)
    elif total_lines < 15:
        # Medium song - two verses and chorus
        return min(10, total_lines)
    else:
        # Longer song - two verses, chorus, and bridge
        return min(15, total_lines)

def format_genre_results(top_genres):
    """Format genre classification results for display."""
    result = "Top Detected Genres:\n"
    for genre, confidence in top_genres:
        result += f"- {genre}: {confidence*100:.2f}%\n"
    return result

def ensure_cuda_availability():
    """Check and report CUDA availability for informational purposes."""
    cuda_available = torch.cuda.is_available()
    if cuda_available:
        device_count = torch.cuda.device_count()
        device_name = torch.cuda.get_device_name(0) if device_count > 0 else "Unknown"
        print(f"CUDA is available with {device_count} device(s). Using: {device_name}")
    else:
        print("CUDA is not available. Using CPU for inference.")
    return cuda_available

def preprocess_audio_for_model(waveform, sample_rate, target_sample_rate=16000, max_length=16000):
    """Preprocess audio for model input (resample, pad/trim)."""
    # Resample if needed
    if sample_rate != target_sample_rate:
        waveform = librosa.resample(waveform, orig_sr=sample_rate, target_sr=target_sample_rate)

    # Trim or pad to expected length
    if len(waveform) > max_length:
        waveform = waveform[:max_length]
    elif len(waveform) < max_length:
        padding = max_length - len(waveform)
        waveform = np.pad(waveform, (0, padding), 'constant')

    return waveform