Commit 9e21eef by root
Parent(s): a51853d

Files changed:
- DEPLOYMENT.md +42 -0
- README.md +42 -8
- app.py +383 -0
- emotionanalysis.py +471 -0
- example.py +49 -0
- requirements.txt +14 -0
- utils.py +105 -0
DEPLOYMENT.md
ADDED
@@ -0,0 +1,42 @@
# Deploying to Hugging Face Spaces

This guide explains how to deploy the Music Genre Classifier & Lyrics Generator to Hugging Face Spaces.

## Prerequisites

1. A Hugging Face account
2. Access to the Llama 3.1 8B Instruct model (requires acceptance of the model license)
3. A Hugging Face API token

## Deployment Steps

### 1. Create a New Space

1. Go to the Hugging Face website and log in
2. Navigate to "Spaces" in the top navigation
3. Click "Create new Space"
4. Choose "Gradio" as the SDK
5. Give your Space a name and description
6. Select "T4 GPU" as the hardware
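If you prefer to script this step, the same Space can be created with the `huggingface_hub` client. This is only a sketch: the Space name is a placeholder, and `"t4-small"` is assumed to be the hardware identifier for the T4 GPU tier.

```python
# Optional: create the Space from Python instead of the web UI (sketch).
from huggingface_hub import HfApi

api = HfApi(token="hf_...")  # a token with write access

repo_id = "your-username/music-genre-classifier"  # placeholder Space name
api.create_repo(repo_id=repo_id, repo_type="space", space_sdk="gradio")

# "t4-small" is assumed to be the identifier for the T4 GPU tier.
api.request_space_hardware(repo_id=repo_id, hardware="t4-small")
```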
### 2. Set up Environment Variables

Set up your Hugging Face access token as an environment variable:

1. Go to your profile settings in Hugging Face
2. Navigate to "Access Tokens" and create a new token with "write" access
3. In your Space settings, under "Repository secrets", add a new secret:
   - Name: `HF_TOKEN`
   - Value: Your Hugging Face access token
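Before saving the secret, it can help to confirm that the token is valid; the secret can also be registered programmatically. A minimal sketch with `huggingface_hub` (the Space name is a placeholder):

```python
# Sketch: verify the token, then register it as a Space secret
# (equivalent to adding it under "Repository secrets" in the web UI).
from huggingface_hub import HfApi

token = "hf_..."  # the token created above
api = HfApi(token=token)
print(api.whoami()["name"])  # should print your account name if the token works

api.add_space_secret(
    repo_id="your-username/music-genre-classifier",  # placeholder Space name
    key="HF_TOKEN",
    value=token,
)
```

At startup, `app.py` reads this secret from the environment and calls `login(token=os.environ["HF_TOKEN"])`.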
### 3. Upload the Files

Upload all the files from this repository to your Space.
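If you would rather push from a script than use the web uploader, `huggingface_hub` can upload the whole project in one call. A sketch (the Space name is again a placeholder):

```python
# Sketch: push the project files to the Space without the web uploader.
from huggingface_hub import HfApi

api = HfApi()  # assumes you are already authenticated (e.g. via `huggingface-cli login`)
api.upload_folder(
    folder_path=".",                                 # local project directory
    repo_id="your-username/music-genre-classifier",  # placeholder Space name
    repo_type="space",
)
```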
### 4. Wait for Deployment

Hugging Face will automatically build and deploy your Space. This may take a few minutes, especially since it needs to download the models.

### 5. Access Your Application

Once deployed, you can access your application at your Hugging Face Space URL.
README.md
CHANGED
@@ -1,13 +1,47 @@
Previously, all of the frontmatter fields (title, emoji, colorFrom, colorTo, sdk, sdk_version) were empty placeholders; the updated file reads:

---
title: Music Genre Classifier & Lyrics Generator
emoji: 🎵
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.22.0
app_file: app.py
pinned: false
license: mit
short_description: AI music genre detection and lyrics generation
---

# Music Genre Classifier & Lyrics Generator

This Hugging Face Space application provides two AI-powered features:

1. **Music Genre Classification**: Upload a music file and get an analysis of its genre using the [dima806/music_genres_classification](https://huggingface.co/dima806/music_genres_classification) model.

2. **Lyrics Generation**: Based on the detected genre, the app generates original lyrics using [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) that match both the style of the genre and the approximate length of the song.

## Features

- Upload any music file for instant genre classification
- Receive genre predictions with confidence scores
- Get AI-generated lyrics tailored to the detected music genre
- Lyrics length is automatically adjusted based on the song duration
- Simple and intuitive user interface

## Usage

1. Visit the live application on Hugging Face Spaces
2. Upload your music file using the provided interface
3. Click "Analyze & Generate" to process the audio
4. View the detected genre and generated lyrics in the output panels

## Technical Details

- Uses MFCC feature extraction from audio for genre classification
- Leverages 4-bit quantization for efficient LLM inference on a T4 GPU
- Implements a specialized prompt-engineering approach to generate genre-specific lyrics
- Automatically scales lyrics length based on audio duration
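As a rough worked example of the last point: with the heuristic in `utils.py` (about 90 words per minute and roughly 9 words per line), a three-minute track maps to about 30 raw lines, which the structure rules then cap at 15 lines (two verses, a chorus, and a bridge).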
## Links

- [Music Genre Classification Model](https://huggingface.co/dima806/music_genres_classification)
- [Llama 3.1 8B Instruct Model](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
app.py
ADDED
@@ -0,0 +1,383 @@
import os
import io
import gradio as gr
import torch
import numpy as np
from transformers import (
    AutoModelForAudioClassification,
    AutoFeatureExtractor,
    AutoTokenizer,
    pipeline,
    AutoModelForCausalLM,
    BitsAndBytesConfig
)
from huggingface_hub import login
from utils import (
    load_audio,
    extract_audio_duration,
    extract_mfcc_features,
    calculate_lyrics_length,
    format_genre_results,
    ensure_cuda_availability,
    preprocess_audio_for_model
)
from emotionanalysis import MusicAnalyzer

# Login to Hugging Face Hub if token is provided
if "HF_TOKEN" in os.environ:
    login(token=os.environ["HF_TOKEN"])

# Constants
GENRE_MODEL_NAME = "dima806/music_genres_classification"
MUSIC_DETECTION_MODEL = "MIT/ast-finetuned-audioset-10-10-0.4593"
LLM_MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"
SAMPLE_RATE = 22050  # Standard sample rate for audio processing

# Check CUDA availability (for informational purposes)
CUDA_AVAILABLE = ensure_cuda_availability()

# Create music detection pipeline
print(f"Loading music detection model: {MUSIC_DETECTION_MODEL}")
try:
    music_detector = pipeline(
        "audio-classification",
        model=MUSIC_DETECTION_MODEL,
        device=0 if CUDA_AVAILABLE else -1
    )
    print("Successfully loaded music detection pipeline")
except Exception as e:
    print(f"Error creating music detection pipeline: {str(e)}")
    # Fallback to manual loading
    try:
        music_processor = AutoFeatureExtractor.from_pretrained(MUSIC_DETECTION_MODEL)
        music_model = AutoModelForAudioClassification.from_pretrained(MUSIC_DETECTION_MODEL)
        print("Successfully loaded music detection model and feature extractor")
    except Exception as e2:
        print(f"Error loading music detection model components: {str(e2)}")
        raise RuntimeError(f"Could not load music detection model: {str(e2)}")

# Create genre classification pipeline
print(f"Loading audio classification model: {GENRE_MODEL_NAME}")
try:
    genre_classifier = pipeline(
        "audio-classification",
        model=GENRE_MODEL_NAME,
        device=0 if CUDA_AVAILABLE else -1
    )
    print("Successfully loaded audio classification pipeline")
except Exception as e:
    print(f"Error creating pipeline: {str(e)}")
    # Fallback to manual loading
    try:
        genre_processor = AutoFeatureExtractor.from_pretrained(GENRE_MODEL_NAME)
        genre_model = AutoModelForAudioClassification.from_pretrained(GENRE_MODEL_NAME)
        print("Successfully loaded audio classification model and feature extractor")
    except Exception as e2:
        print(f"Error loading model components: {str(e2)}")
        raise RuntimeError(f"Could not load genre classification model: {str(e2)}")

# Load LLM with appropriate quantization for T4 GPU
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

llm_tokenizer = AutoTokenizer.from_pretrained(LLM_MODEL_NAME)
llm_model = AutoModelForCausalLM.from_pretrained(
    LLM_MODEL_NAME,
    device_map="auto",
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
)

# Create LLM pipeline
llm_pipeline = pipeline(
    "text-generation",
    model=llm_model,
    tokenizer=llm_tokenizer,
    max_new_tokens=512,
)

# Initialize music emotion analyzer
music_analyzer = MusicAnalyzer()

def extract_audio_features(audio_file):
    """Extract audio features from an audio file."""
    # Load the audio file using utility function
    y, sr = load_audio(audio_file, SAMPLE_RATE)

    # Get audio duration in seconds
    duration = extract_audio_duration(y, sr)

    # Extract MFCCs for genre classification (may not be needed with the pipeline)
    mfccs_mean = extract_mfcc_features(y, sr, n_mfcc=20)

    return {
        "features": mfccs_mean,
        "duration": duration,
        "waveform": y,
        "sample_rate": sr,
        "path": audio_file  # Keep path for the pipeline
    }

def classify_genre(audio_data):
    """Classify the genre of the audio using the loaded model."""
    try:
        # First attempt: Try using the pipeline if available
        if 'genre_classifier' in globals():
            results = genre_classifier(audio_data["path"])
            # Transform pipeline results to our expected format
            top_genres = [(result["label"], result["score"]) for result in results[:3]]
            return top_genres

        # Second attempt: Use manually loaded model components
        elif 'genre_processor' in globals() and 'genre_model' in globals():
            # Process audio input with feature extractor
            inputs = genre_processor(
                audio_data["waveform"],
                sampling_rate=audio_data["sample_rate"],
                return_tensors="pt"
            )

            with torch.no_grad():
                outputs = genre_model(**inputs)
                predictions = outputs.logits.softmax(dim=-1)

            # Get the top 3 genres
            values, indices = torch.topk(predictions, 3)

            # Map indices to genre labels
            genre_labels = genre_model.config.id2label

            top_genres = []
            for i, (value, index) in enumerate(zip(values[0], indices[0])):
                genre = genre_labels[index.item()]
                confidence = value.item()
                top_genres.append((genre, confidence))

            return top_genres

        else:
            raise ValueError("No genre classification model available")

    except Exception as e:
        print(f"Error in genre classification: {str(e)}")
        # Fallback: return a default genre if everything fails
        return [("rock", 1.0)]

def generate_lyrics(genre, duration, emotion_results):
    """Generate lyrics based on the genre and with appropriate length."""
    # Calculate appropriate lyrics length based on audio duration
    lines_count = calculate_lyrics_length(duration)

    # Calculate approximate number of verses and chorus
    if lines_count <= 6:
        # Very short song - one verse and chorus
        verse_lines = 2
        chorus_lines = 2
    elif lines_count <= 10:
        # Medium song - two verses and chorus
        verse_lines = 3
        chorus_lines = 2
    else:
        # Longer song - two verses, chorus, and bridge
        verse_lines = 3
        chorus_lines = 2

    # Extract emotion and theme data from analysis results
    primary_emotion = emotion_results["emotion_analysis"]["primary_emotion"]
    primary_theme = emotion_results["theme_analysis"]["primary_theme"]
    tempo = emotion_results["rhythm_analysis"]["tempo"]
    key = emotion_results["tonal_analysis"]["key"]
    mode = emotion_results["tonal_analysis"]["mode"]

    # Create prompt for the LLM
    prompt = f"""
You are a talented songwriter who specializes in {genre} music.
Write original {genre} song lyrics for a song that is {duration:.1f} seconds long.

Music analysis has detected the following qualities in the music:
- Tempo: {tempo:.1f} BPM
- Key: {key} {mode}
- Primary emotion: {primary_emotion}
- Primary theme: {primary_theme}

The lyrics should:
- Perfectly capture the essence and style of {genre} music
- Express the {primary_emotion} emotion and {primary_theme} theme
- Be approximately {lines_count} lines long
- Have a coherent theme and flow
- Follow this structure:
  * Verse: {verse_lines} lines
  * Chorus: {chorus_lines} lines
  * {f'Bridge: 2 lines' if lines_count > 10 else ''}
- Be completely original
- Match the song duration of {duration:.1f} seconds
- Keep each line concise and impactful

Your lyrics:
"""

    # Generate lyrics using the LLM
    response = llm_pipeline(
        prompt,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.1,
        return_full_text=False
    )

    # Extract and clean generated lyrics
    lyrics = response[0]["generated_text"].strip()

    # Add section labels if they're not present
    if "Verse" not in lyrics and "Chorus" not in lyrics:
        lines = lyrics.split('\n')
        formatted_lyrics = []
        current_section = "Verse"
        for i, line in enumerate(lines):
            if i == 0:
                formatted_lyrics.append("[Verse]")
            elif i == verse_lines:
                formatted_lyrics.append("\n[Chorus]")
            elif i == verse_lines + chorus_lines and lines_count > 10:
                formatted_lyrics.append("\n[Bridge]")
            formatted_lyrics.append(line)
        lyrics = '\n'.join(formatted_lyrics)

    return lyrics

def detect_music(audio_data):
    """Detect if the audio is music using the MIT AST model."""
    try:
        # First attempt: Try using the pipeline if available
        if 'music_detector' in globals():
            results = music_detector(audio_data["path"])
            # Look for music-related classes in the results
            music_confidence = 0.0
            for result in results:
                label = result["label"].lower()
                if any(music_term in label for music_term in ["music", "song", "singing", "instrument"]):
                    music_confidence = max(music_confidence, result["score"])
            return music_confidence >= 0.5

        # Second attempt: Use manually loaded model components
        elif 'music_processor' in globals() and 'music_model' in globals():
            # Process audio input with feature extractor
            inputs = music_processor(
                audio_data["waveform"],
                sampling_rate=audio_data["sample_rate"],
                return_tensors="pt"
            )

            with torch.no_grad():
                outputs = music_model(**inputs)
                predictions = outputs.logits.softmax(dim=-1)

            # Get the top predictions
            values, indices = torch.topk(predictions, 5)

            # Map indices to labels
            labels = music_model.config.id2label

            # Check for music-related classes
            music_confidence = 0.0
            for i, (value, index) in enumerate(zip(values[0], indices[0])):
                label = labels[index.item()].lower()
                if any(music_term in label for music_term in ["music", "song", "singing", "instrument"]):
                    music_confidence = max(music_confidence, value.item())

            return music_confidence >= 0.5

        else:
            raise ValueError("No music detection model available")

    except Exception as e:
        print(f"Error in music detection: {str(e)}")
        return False

def process_audio(audio_file):
    """Main function to process audio file, classify genre, and generate lyrics."""
    if audio_file is None:
        return "Please upload an audio file.", None

    try:
        # Extract audio features
        audio_data = extract_audio_features(audio_file)

        # First check if it's music
        is_music = detect_music(audio_data)
        if not is_music:
            return "The uploaded audio does not appear to be music. Please upload a music file.", None

        # Classify genre
        top_genres = classify_genre(audio_data)

        # Format genre results using utility function
        genre_results = format_genre_results(top_genres)

        # Analyze music emotions and themes
        emotion_results = music_analyzer.analyze_music(audio_file)

        # Generate lyrics based on top genre and emotion analysis
        primary_genre, _ = top_genres[0]
        lyrics = generate_lyrics(primary_genre, audio_data["duration"], emotion_results)

        return genre_results, lyrics

    except Exception as e:
        return f"Error processing audio: {str(e)}", None

# Create Gradio interface
with gr.Blocks(title="Music Genre Classifier & Lyrics Generator") as demo:
    gr.Markdown("# Music Genre Classifier & Lyrics Generator")
    gr.Markdown("Upload a music file to classify its genre, analyze its emotions, and generate matching lyrics.")

    with gr.Row():
        with gr.Column():
            audio_input = gr.Audio(label="Upload Music", type="filepath")
            submit_btn = gr.Button("Analyze & Generate")

        with gr.Column():
            genre_output = gr.Textbox(label="Detected Genres", lines=5)
            emotion_output = gr.Textbox(label="Emotion Analysis", lines=5)
            lyrics_output = gr.Textbox(label="Generated Lyrics", lines=15)

    def display_results(audio_file):
        if audio_file is None:
            return "Please upload an audio file.", "No emotion analysis available.", None

        try:
            # Process audio and get genre and lyrics
            genre_results, lyrics = process_audio(audio_file)

            # Format emotion analysis results
            emotion_results = music_analyzer.analyze_music(audio_file)
            emotion_text = f"Tempo: {emotion_results['summary']['tempo']:.1f} BPM\n"
            emotion_text += f"Key: {emotion_results['summary']['key']} {emotion_results['summary']['mode']}\n"
            emotion_text += f"Primary Emotion: {emotion_results['summary']['primary_emotion']}\n"
            emotion_text += f"Primary Theme: {emotion_results['summary']['primary_theme']}"

            return genre_results, emotion_text, lyrics
        except Exception as e:
            return f"Error: {str(e)}", "Error in emotion analysis", None

    submit_btn.click(
        fn=display_results,
        inputs=[audio_input],
        outputs=[genre_output, emotion_output, lyrics_output]
    )

    gr.Markdown("### How it works")
    gr.Markdown("""
    1. Upload an audio file of your choice
    2. The system will classify the genre using the dima806/music_genres_classification model
    3. The system will analyze the musical emotion and theme using advanced audio processing
    4. Based on the detected genre and emotion, it will generate appropriate lyrics using Llama-3.1-8B-Instruct
    5. The lyrics length is automatically adjusted based on your audio duration
    """)

# Launch the app
demo.launch()
emotionanalysis.py
ADDED
@@ -0,0 +1,471 @@
import librosa
import numpy as np
try:
    import matplotlib.pyplot as plt
except ImportError:
    plt = None
from scipy.stats import mode
import warnings
warnings.filterwarnings('ignore')  # Suppress librosa warnings

class MusicAnalyzer:
    def __init__(self):
        # Emotion feature mappings - these define characteristics of different emotions
        self.emotion_profiles = {
            'happy': {'tempo': (100, 180), 'energy': (0.6, 1.0), 'major_mode': True, 'brightness': (0.6, 1.0)},
            'sad': {'tempo': (40, 90), 'energy': (0, 0.5), 'major_mode': False, 'brightness': (0, 0.5)},
            'calm': {'tempo': (50, 90), 'energy': (0, 0.4), 'major_mode': True, 'brightness': (0.3, 0.6)},
            'energetic': {'tempo': (110, 200), 'energy': (0.7, 1.0), 'major_mode': True, 'brightness': (0.5, 0.9)},
            'tense': {'tempo': (70, 140), 'energy': (0.5, 0.9), 'major_mode': False, 'brightness': (0.3, 0.7)},
            'nostalgic': {'tempo': (60, 100), 'energy': (0.3, 0.7), 'major_mode': None, 'brightness': (0.4, 0.7)}
        }

        # Theme mappings based on musical features
        self.theme_profiles = {
            'love': {'emotion': ['happy', 'nostalgic', 'sad'], 'harmony_complexity': (0.3, 0.7)},
            'triumph': {'emotion': ['energetic', 'happy'], 'harmony_complexity': (0.4, 0.8)},
            'loss': {'emotion': ['sad', 'nostalgic'], 'harmony_complexity': (0.3, 0.7)},
            'adventure': {'emotion': ['energetic', 'tense'], 'harmony_complexity': (0.5, 0.9)},
            'reflection': {'emotion': ['calm', 'nostalgic'], 'harmony_complexity': (0.4, 0.8)},
            'conflict': {'emotion': ['tense', 'energetic'], 'harmony_complexity': (0.6, 1.0)}
        }

        # Musical key mapping
        self.key_names = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

    def load_audio(self, file_path, sr=22050, duration=None):
        """Load audio file and return time series and sample rate"""
        try:
            y, sr = librosa.load(file_path, sr=sr, duration=duration)
            return y, sr
        except Exception as e:
            print(f"Error loading audio file: {e}")
            return None, None

    def analyze_rhythm(self, y, sr):
        """Analyze rhythm-related features: tempo, beats, time signature"""
        # Tempo and beat detection
        onset_env = librosa.onset.onset_strength(y=y, sr=sr)
        tempo, beat_frames = librosa.beat.beat_track(onset_envelope=onset_env, sr=sr)
        beat_times = librosa.frames_to_time(beat_frames, sr=sr)

        # Beat intervals and regularity
        beat_intervals = np.diff(beat_times) if len(beat_times) > 1 else np.array([0])
        beat_regularity = 1.0 / np.std(beat_intervals) if len(beat_intervals) > 0 and np.std(beat_intervals) > 0 else 0

        # Rhythm pattern analysis through autocorrelation
        ac = librosa.autocorrelate(onset_env, max_size=sr // 2)
        ac = librosa.util.normalize(ac, norm=np.inf)

        # Time signature estimation - a challenging task with many limitations
        estimated_signature = self._estimate_time_signature(y, sr, beat_times, onset_env)

        # Compute onset strength to get a measure of rhythm intensity
        rhythm_intensity = np.mean(onset_env) / np.max(onset_env) if np.max(onset_env) > 0 else 0

        # Rhythm complexity based on variation in onset strength
        rhythm_complexity = np.std(onset_env) / np.mean(onset_env) if np.mean(onset_env) > 0 else 0

        return {
            "tempo": float(tempo),
            "beat_times": beat_times.tolist(),
            "beat_intervals": beat_intervals.tolist(),
            "beat_regularity": float(beat_regularity),
            "rhythm_intensity": float(rhythm_intensity),
            "rhythm_complexity": float(rhythm_complexity),
            "estimated_time_signature": estimated_signature
        }

    def _estimate_time_signature(self, y, sr, beat_times, onset_env):
        """Estimate the time signature based on beat patterns"""
        # This is a simplified approach - accurate time signature detection is complex
        if len(beat_times) < 4:
            return "Unknown"

        # Analyze beat emphasis patterns to detect meter
        beat_intervals = np.diff(beat_times)

        # Look for periodicity in the onset envelope
        ac = librosa.autocorrelate(onset_env, max_size=sr)

        # Find peaks in autocorrelation after the first one (which is at lag 0)
        peaks = librosa.util.peak_pick(ac, pre_max=20, post_max=20, pre_avg=20, post_avg=20, delta=0.1, wait=1)
        peaks = peaks[peaks > 0]  # Remove the first peak which is at lag 0

        if len(peaks) == 0:
            return "4/4"  # Default to most common

        # Convert first significant peak to beats
        first_peak_time = peaks[0] / sr
        beats_per_bar = round(first_peak_time / np.median(beat_intervals))

        # Map to common time signatures
        if beats_per_bar == 4 or beats_per_bar == 8:
            return "4/4"
        elif beats_per_bar == 3 or beats_per_bar == 6:
            return "3/4"
        elif beats_per_bar == 2:
            return "2/4"
        else:
            return f"{beats_per_bar}/4"  # Default assumption

    def analyze_tonality(self, y, sr):
        """Analyze tonal features: key, mode, harmonic features"""
        # Compute chromagram
        chroma = librosa.feature.chroma_cqt(y=y, sr=sr)

        # Krumhansl-Schmuckler key-finding algorithm (simplified)
        # Major and minor profiles from music theory research
        major_profile = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
        minor_profile = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17])

        # Calculate the correlation of the chroma with each key profile
        chroma_avg = np.mean(chroma, axis=1)
        major_corr = np.zeros(12)
        minor_corr = np.zeros(12)

        for i in range(12):
            major_corr[i] = np.corrcoef(np.roll(chroma_avg, i), major_profile)[0, 1]
            minor_corr[i] = np.corrcoef(np.roll(chroma_avg, i), minor_profile)[0, 1]

        # Find the key with the highest correlation
        max_major_idx = np.argmax(major_corr)
        max_minor_idx = np.argmax(minor_corr)

        # Determine if the piece is in a major or minor key
        if major_corr[max_major_idx] > minor_corr[max_minor_idx]:
            mode = "major"
            key = self.key_names[max_major_idx]
        else:
            mode = "minor"
            key = self.key_names[max_minor_idx]

        # Calculate harmony complexity (variability in harmonic content)
        harmony_complexity = np.std(chroma) / np.mean(chroma) if np.mean(chroma) > 0 else 0

        # Calculate tonal stability (consistency of tonal center)
        tonal_stability = 1.0 / (np.std(chroma_avg) + 0.001)  # Add small value to avoid division by zero

        # Calculate spectral brightness (center of mass of the spectrum)
        spectral_centroid = librosa.feature.spectral_centroid(y=y, sr=sr)[0]
        brightness = np.mean(spectral_centroid) / (sr / 2)  # Normalize by Nyquist frequency

        # Calculate dissonance using spectral contrast
        spectral_contrast = librosa.feature.spectral_contrast(y=y, sr=sr)
        dissonance = np.mean(spectral_contrast[0])  # Higher values may indicate more dissonance

        return {
            "key": key,
            "mode": mode,
            "is_major": mode == "major",
            "harmony_complexity": float(harmony_complexity),
            "tonal_stability": float(tonal_stability),
            "brightness": float(brightness),
            "dissonance": float(dissonance)
        }

    def analyze_energy(self, y, sr):
        """Analyze energy characteristics of the audio"""
        # RMS Energy (overall loudness)
        rms = librosa.feature.rms(y=y)[0]

        # Energy metrics
        mean_energy = np.mean(rms)
        energy_std = np.std(rms)
        energy_dynamic_range = np.max(rms) - np.min(rms) if len(rms) > 0 else 0

        # Energy distribution across frequency ranges
        spec = np.abs(librosa.stft(y))

        # Divide the spectrum into low, mid, and high ranges
        freq_bins = spec.shape[0]
        low_freq_energy = np.mean(spec[:int(freq_bins * 0.2), :])
        mid_freq_energy = np.mean(spec[int(freq_bins * 0.2):int(freq_bins * 0.8), :])
        high_freq_energy = np.mean(spec[int(freq_bins * 0.8):, :])

        # Normalize to create a distribution
        total_energy = low_freq_energy + mid_freq_energy + high_freq_energy
        if total_energy > 0:
            low_freq_ratio = low_freq_energy / total_energy
            mid_freq_ratio = mid_freq_energy / total_energy
            high_freq_ratio = high_freq_energy / total_energy
        else:
            low_freq_ratio = mid_freq_ratio = high_freq_ratio = 1 / 3

        return {
            "mean_energy": float(mean_energy),
            "energy_std": float(energy_std),
            "energy_dynamic_range": float(energy_dynamic_range),
            "frequency_distribution": {
                "low_freq": float(low_freq_ratio),
                "mid_freq": float(mid_freq_ratio),
                "high_freq": float(high_freq_ratio)
            }
        }

    def analyze_emotion(self, rhythm_data, tonal_data, energy_data):
        """Classify the emotion based on musical features"""
        # Extract key features for emotion detection
        tempo = rhythm_data["tempo"]
        is_major = tonal_data["is_major"]
        energy = energy_data["mean_energy"]
        brightness = tonal_data["brightness"]

        # Calculate scores for each emotion
        emotion_scores = {}
        for emotion, profile in self.emotion_profiles.items():
            score = 0.0

            # Tempo contribution (0-1 score)
            tempo_range = profile["tempo"]
            if tempo_range[0] <= tempo <= tempo_range[1]:
                score += 1.0
            else:
                # Partial score based on distance
                distance = min(abs(tempo - tempo_range[0]), abs(tempo - tempo_range[1]))
                max_distance = 40  # Maximum distance to consider
                score += max(0, 1 - (distance / max_distance))

            # Energy contribution (0-1 score)
            energy_range = profile["energy"]
            if energy_range[0] <= energy <= energy_range[1]:
                score += 1.0
            else:
                # Partial score based on distance
                distance = min(abs(energy - energy_range[0]), abs(energy - energy_range[1]))
                max_distance = 0.5  # Maximum distance to consider
                score += max(0, 1 - (distance / max_distance))

            # Mode contribution (0-1 score)
            if profile["major_mode"] is not None:  # Some emotions don't have strong mode preference
                score += 1.0 if profile["major_mode"] == is_major else 0.0
            else:
                score += 0.5  # Neutral contribution

            # Brightness contribution (0-1 score)
            brightness_range = profile["brightness"]
            if brightness_range[0] <= brightness <= brightness_range[1]:
                score += 1.0
            else:
                # Partial score based on distance
                distance = min(abs(brightness - brightness_range[0]), abs(brightness - brightness_range[1]))
                max_distance = 0.5  # Maximum distance to consider
                score += max(0, 1 - (distance / max_distance))

            # Normalize score (0-1 range)
            emotion_scores[emotion] = score / 4.0

        # Find primary emotion
        primary_emotion = max(emotion_scores.items(), key=lambda x: x[1])

        # Calculate valence and arousal (dimensional emotion model)
        # Mapping different emotions to valence-arousal space
        valence_map = {
            'happy': 0.8, 'sad': 0.2, 'calm': 0.6,
            'energetic': 0.7, 'tense': 0.3, 'nostalgic': 0.5
        }

        arousal_map = {
            'happy': 0.7, 'sad': 0.3, 'calm': 0.2,
            'energetic': 0.9, 'tense': 0.8, 'nostalgic': 0.4
        }

        # Calculate weighted valence and arousal
        total_weight = sum(emotion_scores.values())
        if total_weight > 0:
            valence = sum(score * valence_map[emotion] for emotion, score in emotion_scores.items()) / total_weight
            arousal = sum(score * arousal_map[emotion] for emotion, score in emotion_scores.items()) / total_weight
        else:
            valence = 0.5
            arousal = 0.5

        return {
            "primary_emotion": primary_emotion[0],
            "confidence": primary_emotion[1],
            "emotion_scores": emotion_scores,
            "valence": float(valence),  # Pleasure dimension (0-1)
            "arousal": float(arousal)   # Activity dimension (0-1)
        }

    def analyze_theme(self, rhythm_data, tonal_data, emotion_data):
        """Infer potential themes based on musical features and emotion"""
        # Extract relevant features
        primary_emotion = emotion_data["primary_emotion"]
        harmony_complexity = tonal_data["harmony_complexity"]

        # Calculate theme scores
        theme_scores = {}
        for theme, profile in self.theme_profiles.items():
            score = 0.0

            # Emotion contribution
            if primary_emotion in profile["emotion"]:
                # Emotions listed earlier have stronger connection to the theme
                position_weight = 1.0 / (profile["emotion"].index(primary_emotion) + 1)
                score += position_weight

            # Secondary emotions contribution
            secondary_emotions = [e for e, s in emotion_data["emotion_scores"].items()
                                  if s > 0.5 and e != primary_emotion]
            for emotion in secondary_emotions:
                if emotion in profile["emotion"]:
                    score += 0.3  # Less weight than primary emotion

            # Harmony complexity contribution
            complexity_range = profile["harmony_complexity"]
            if complexity_range[0] <= harmony_complexity <= complexity_range[1]:
                score += 1.0
            else:
                # Partial score based on distance
                distance = min(abs(harmony_complexity - complexity_range[0]),
                               abs(harmony_complexity - complexity_range[1]))
                max_distance = 0.5  # Maximum distance to consider
                score += max(0, 1 - (distance / max_distance))

            # Normalize score
            theme_scores[theme] = min(1.0, score / 2.5)

        # Find primary theme
        primary_theme = max(theme_scores.items(), key=lambda x: x[1])

        # Find secondary themes (scores > 0.5)
        secondary_themes = [(theme, score) for theme, score in theme_scores.items()
                            if score > 0.5 and theme != primary_theme[0]]
        secondary_themes.sort(key=lambda x: x[1], reverse=True)

        return {
            "primary_theme": primary_theme[0],
            "confidence": primary_theme[1],
            "secondary_themes": [t[0] for t in secondary_themes[:2]],  # Top 2 secondary themes
            "theme_scores": theme_scores
        }

    def analyze_music(self, file_path):
        """Main function to perform comprehensive music analysis"""
        # Load the audio file
        y, sr = self.load_audio(file_path)
        if y is None:
            return {"error": "Failed to load audio file"}

        # Run all analyses
        rhythm_data = self.analyze_rhythm(y, sr)
        tonal_data = self.analyze_tonality(y, sr)
        energy_data = self.analyze_energy(y, sr)

        # Higher-level analyses that depend on the basic features
        emotion_data = self.analyze_emotion(rhythm_data, tonal_data, energy_data)
        theme_data = self.analyze_theme(rhythm_data, tonal_data, emotion_data)

        # Combine all results
        return {
            "file": file_path,
            "rhythm_analysis": rhythm_data,
            "tonal_analysis": tonal_data,
            "energy_analysis": energy_data,
            "emotion_analysis": emotion_data,
            "theme_analysis": theme_data,
            "summary": {
                "tempo": rhythm_data["tempo"],
                "time_signature": rhythm_data["estimated_time_signature"],
                "key": tonal_data["key"],
                "mode": tonal_data["mode"],
                "primary_emotion": emotion_data["primary_emotion"],
                "primary_theme": theme_data["primary_theme"]
            }
        }

    # def visualize_analysis(self, file_path):
    #     """Create visualizations for the music analysis results"""
    #     # Check if matplotlib is available
    #     if plt is None:
    #         print("Error: matplotlib is not installed. Visualization is not available.")
    #         return
    #
    #     # Load audio and run analysis
    #     y, sr = self.load_audio(file_path)
    #     if y is None:
    #         print("Error: Failed to load audio file")
    #         return
    #
    #     results = self.analyze_music(file_path)
    #
    #     # Create visualization
    #     plt.figure(figsize=(15, 12))
    #
    #     # Waveform
    #     plt.subplot(3, 2, 1)
    #     librosa.display.waveshow(y, sr=sr, alpha=0.6)
    #     plt.title(f'Waveform (Tempo: {results["rhythm_analysis"]["tempo"]:.1f} BPM)')
    #
    #     # Spectrogram
    #     plt.subplot(3, 2, 2)
    #     D = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)
    #     librosa.display.specshow(D, sr=sr, x_axis='time', y_axis='log')
    #     plt.colorbar(format='%+2.0f dB')
    #     plt.title(f'Spectrogram (Key: {results["tonal_analysis"]["key"]} {results["tonal_analysis"]["mode"]})')
    #
    #     # Chromagram
    #     plt.subplot(3, 2, 3)
    #     chroma = librosa.feature.chroma_cqt(y=y, sr=sr)
    #     librosa.display.specshow(chroma, y_axis='chroma', x_axis='time')
    #     plt.colorbar()
    #     plt.title('Chromagram')
    #
    #     # Onset strength and beats
    #     plt.subplot(3, 2, 4)
    #     onset_env = librosa.onset.onset_strength(y=y, sr=sr)
    #     times = librosa.times_like(onset_env, sr=sr)
    #     plt.plot(times, librosa.util.normalize(onset_env), label='Onset strength')
    #     plt.vlines(results["rhythm_analysis"]["beat_times"], 0, 1, alpha=0.5, color='r',
    #                linestyle='--', label='Beats')
    #     plt.legend()
    #     plt.title('Rhythm Analysis')
    #
    #     # Emotion scores
    #     plt.subplot(3, 2, 5)
    #     emotions = list(results["emotion_analysis"]["emotion_scores"].keys())
    #     scores = list(results["emotion_analysis"]["emotion_scores"].values())
    #     plt.bar(emotions, scores, color='skyblue')
    #     plt.ylim(0, 1)
    #     plt.title(f'Emotion Analysis (Primary: {results["emotion_analysis"]["primary_emotion"]})')
    #     plt.xticks(rotation=45)
    #
    #     # Theme scores
    #     plt.subplot(3, 2, 6)
    #     themes = list(results["theme_analysis"]["theme_scores"].keys())
    #     scores = list(results["theme_analysis"]["theme_scores"].values())
    #     plt.bar(themes, scores, color='lightgreen')
    #     plt.ylim(0, 1)
    #     plt.title(f'Theme Analysis (Primary: {results["theme_analysis"]["primary_theme"]})')
    #     plt.xticks(rotation=45)
    #
    #     plt.tight_layout()
    #     plt.show()


# Create an instance of the analyzer
analyzer = MusicAnalyzer()

# The following code is for demonstration purposes only
# and will only run if executed directly (not when imported)
if __name__ == "__main__":
    # Replace this with a real audio file path when running as a script
    demo_file = "path/to/your/audio/file.mp3"

    # Analyze the uploaded audio file
    results = analyzer.analyze_music(demo_file)

    # Print analysis summary
    print("\n=== MUSIC ANALYSIS SUMMARY ===")
    print(f"Tempo: {results['summary']['tempo']:.1f} BPM")
    print(f"Time Signature: {results['summary']['time_signature']}")
    print(f"Key: {results['summary']['key']} {results['summary']['mode']}")
    print(f"Primary Emotion: {results['summary']['primary_emotion']}")
    print(f"Primary Theme: {results['summary']['primary_theme']}")

    # Show detailed results (optional)
    import json
    print("\n=== DETAILED ANALYSIS ===")
    print(json.dumps(results, indent=2))

    # Visualize the analysis
    # analyzer.visualize_analysis(demo_file)
example.py
ADDED
@@ -0,0 +1,49 @@
import os
import sys
from app import process_audio, music_analyzer

def main():
    """
    Example function to demonstrate the application with a sample audio file.

    Usage:
        python example.py <path_to_audio_file>
    """
    if len(sys.argv) != 2:
        print("Usage: python example.py <path_to_audio_file>")
        return

    audio_file = sys.argv[1]
    if not os.path.exists(audio_file):
        print(f"Error: File {audio_file} does not exist.")
        return

    print(f"Processing audio file: {audio_file}")

    # Call the main processing function
    genre_results, lyrics = process_audio(audio_file)

    # Get emotion analysis results
    emotion_results = music_analyzer.analyze_music(audio_file)

    # Print results
    print("\n" + "="*50)
    print("GENRE CLASSIFICATION RESULTS:")
    print("="*50)
    print(genre_results)

    print("\n" + "="*50)
    print("EMOTION ANALYSIS RESULTS:")
    print("="*50)
    print(f"Tempo: {emotion_results['summary']['tempo']:.1f} BPM")
    print(f"Key: {emotion_results['summary']['key']} {emotion_results['summary']['mode']}")
    print(f"Primary Emotion: {emotion_results['summary']['primary_emotion']}")
    print(f"Primary Theme: {emotion_results['summary']['primary_theme']}")

    print("\n" + "="*50)
    print("GENERATED LYRICS:")
    print("="*50)
    print(lyrics)

if __name__ == "__main__":
    main()
requirements.txt
ADDED
@@ -0,0 +1,14 @@
gradio>=5.22.0
transformers>=4.36.2
torch>=2.1.2
torchaudio>=2.1.2
numpy>=1.26.2
accelerate>=0.25.0
librosa>=0.10.1
huggingface-hub>=0.20.3
bitsandbytes>=0.41.1
sentencepiece>=0.1.99
safetensors>=0.4.1
scipy>=1.12.0
soundfile>=0.12.1
matplotlib>=3.7.0
utils.py
ADDED
@@ -0,0 +1,105 @@
import torch
import numpy as np
import librosa

def load_audio(audio_file, sr=22050):
    """Load an audio file and convert to mono if needed."""
    try:
        # Try to load audio with librosa
        y, sr = librosa.load(audio_file, sr=sr, mono=True)
        return y, sr
    except Exception as e:
        print(f"Error loading audio with librosa: {str(e)}")
        # Fallback to basic loading if necessary
        import soundfile as sf
        try:
            y, sr = sf.read(audio_file)
            # Convert to mono if stereo
            if len(y.shape) > 1:
                y = y.mean(axis=1)
            return y, sr
        except Exception as e2:
            print(f"Error loading audio with soundfile: {str(e2)}")
            raise ValueError(f"Could not load audio file: {audio_file}")

def extract_audio_duration(y, sr):
    """Get the duration of audio in seconds."""
    return len(y) / sr

def extract_mfcc_features(y, sr, n_mfcc=20):
    """Extract MFCC features from audio."""
    try:
        mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
        mfccs_mean = np.mean(mfccs.T, axis=0)
        return mfccs_mean
    except Exception as e:
        print(f"Error extracting MFCCs: {str(e)}")
        # Return a fallback feature vector if extraction fails
        return np.zeros(n_mfcc)

def calculate_lyrics_length(duration):
    """
    Calculate appropriate lyrics length based on audio duration.
    Uses a more conservative calculation that generates shorter lyrics:
    - Average words per line (8-10 words)
    - Reduced words per minute (90 words instead of 135)
    - Simplified song structure
    """
    # Convert duration to minutes
    duration_minutes = duration / 60

    # Calculate total words based on duration
    # Using 90 words per minute (reduced from 135)
    total_words = int(duration_minutes * 90)

    # Calculate number of lines
    # Assuming 8-10 words per line
    words_per_line = 9  # average
    total_lines = total_words // words_per_line

    # Adjust for song structure with shorter lengths
    if total_lines < 6:
        # Very short song - keep it simple
        return max(2, total_lines)
    elif total_lines < 10:
        # Short song - one verse and chorus
        return min(6, total_lines)
    elif total_lines < 15:
        # Medium song - two verses and chorus
        return min(10, total_lines)
    else:
        # Longer song - two verses, chorus, and bridge
        return min(15, total_lines)

def format_genre_results(top_genres):
    """Format genre classification results for display."""
    result = "Top Detected Genres:\n"
    for genre, confidence in top_genres:
        result += f"- {genre}: {confidence*100:.2f}%\n"
    return result

def ensure_cuda_availability():
    """Check and report CUDA availability for informational purposes."""
    cuda_available = torch.cuda.is_available()
    if cuda_available:
        device_count = torch.cuda.device_count()
        device_name = torch.cuda.get_device_name(0) if device_count > 0 else "Unknown"
        print(f"CUDA is available with {device_count} device(s). Using: {device_name}")
    else:
        print("CUDA is not available. Using CPU for inference.")
    return cuda_available

def preprocess_audio_for_model(waveform, sample_rate, target_sample_rate=16000, max_length=16000):
    """Preprocess audio for model input (resample, pad/trim)."""
    # Resample if needed
    if sample_rate != target_sample_rate:
        waveform = librosa.resample(waveform, orig_sr=sample_rate, target_sr=target_sample_rate)

    # Trim or pad to expected length
    if len(waveform) > max_length:
        waveform = waveform[:max_length]
    elif len(waveform) < max_length:
        padding = max_length - len(waveform)
        waveform = np.pad(waveform, (0, padding), 'constant')

    return waveform