# Feelings to Emoji: Technical Reference
This document provides technical details about the implementation of the Feelings to Emoji application.
## Project Structure
The application is organized into several Python modules:
- `app.py` - Main application file with Gradio interface
- `emoji_processor.py` - Core processing logic for emoji matching
- `config.py` - Configuration settings
- `utils.py` - Utility functions
- `generate_embeddings.py` - Standalone tool to pre-generate embeddings
## Embedding Models
The system uses the following sentence embedding models from the Sentence Transformers library:
| Model Key | Model ID | Size | Description |
|-----------|----------|------|-------------|
| mpnet | all-mpnet-base-v2 | 110M | Balanced, great general-purpose model |
| gte | thenlper/gte-large | 335M | Context-rich, good for emotion & nuance |
| bge | BAAI/bge-large-en-v1.5 | 350M | Tuned for ranking & high-precision similarity |
## Emoji Matching Algorithm
The application uses cosine similarity between sentence embeddings to match text with emojis:
1. For each emoji category (emotion and event):
- Embed descriptions using the selected model
- Calculate cosine similarity between the input text embedding and each emoji description embedding
- Return the emoji with the highest similarity score
2. The embeddings are pre-computed and cached to improve performance:
- Stored as pickle files in the `embeddings/` directory
- Generated using `generate_embeddings.py`
- Loaded at startup to minimize processing time
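The matching step above can be sketched as follows. The embeddings here are toy three-dimensional stand-ins (real sentence embeddings have hundreds of dimensions), and `find_best_emoji` is a hypothetical helper illustrating the approach, not the application's actual code:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def find_best_emoji(text_embedding: np.ndarray,
                    emoji_embeddings: dict) -> str:
    """Return the emoji whose description embedding is closest to the input."""
    return max(emoji_embeddings,
               key=lambda e: cosine_similarity(text_embedding, emoji_embeddings[e]))

# Toy "embeddings" for illustration only
emoji_embeddings = {
    "😀": np.array([1.0, 0.1, 0.0]),
    "😢": np.array([0.0, 1.0, 0.2]),
}
text_embedding = np.array([0.9, 0.2, 0.0])
print(find_best_emoji(text_embedding, emoji_embeddings))  # → 😀
```

Because the scores are cosine similarities, only the direction of each vector matters, not its magnitude.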
## Module Reference
### `config.py`
Contains configuration settings including:
- `CONFIG`: Dictionary with basic application settings (model name, file paths, etc.)
- `EMBEDDING_MODELS`: Dictionary defining the available embedding models
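A plausible shape for these dictionaries, reconstructed from the model table above (the exact field names are assumptions, not the actual contents of `config.py`):

```python
# Hypothetical structure for config.py -- field names are assumptions
CONFIG = {
    "default_model": "mpnet",
    "emotion_file": "google-emoji-kitchen-emotion.txt",
    "item_file": "google-emoji-kitchen-item.txt",
}

EMBEDDING_MODELS = {
    "mpnet": {"id": "all-mpnet-base-v2",
              "description": "Balanced, great general-purpose model"},
    "gte":   {"id": "thenlper/gte-large",
              "description": "Context-rich, good for emotion & nuance"},
    "bge":   {"id": "BAAI/bge-large-en-v1.5",
              "description": "Tuned for ranking & high-precision similarity"},
}
```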
### `utils.py`
Utility functions including:
- `setup_logging()`: Configures application logging
- `kitchen_txt_to_dict(filepath)`: Parses emoji dictionary files
- `save_embeddings_to_pickle(embeddings, filepath)`: Saves embeddings to pickle files
- `load_embeddings_from_pickle(filepath)`: Loads embeddings from pickle files
- `get_embeddings_pickle_path(model_id, emoji_type)`: Generates consistent paths for embedding files
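A minimal sketch of how these helpers might be implemented. The function names match the list above, but the bodies (and the assumption that slashes in Hub model IDs are flattened into the filename) are illustrative, not the project's actual code:

```python
import pickle
from pathlib import Path

def get_embeddings_pickle_path(model_id: str, emoji_type: str) -> Path:
    """Build a cache path like embeddings/<model_id>_<emoji_type>.pkl.
    Slashes in Hub model IDs (e.g. "thenlper/gte-large") are replaced so
    the result is a valid filename -- an assumed naming scheme."""
    safe_id = model_id.replace("/", "_")
    return Path("embeddings") / f"{safe_id}_{emoji_type}.pkl"

def save_embeddings_to_pickle(embeddings, filepath) -> None:
    """Serialize an {emoji: vector} mapping to disk."""
    Path(filepath).parent.mkdir(parents=True, exist_ok=True)
    with open(filepath, "wb") as f:
        pickle.dump(embeddings, f)

def load_embeddings_from_pickle(filepath):
    """Load a previously cached {emoji: vector} mapping."""
    with open(filepath, "rb") as f:
        return pickle.load(f)
```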
### `emoji_processor.py`
Core processing logic:
- `EmojiProcessor`: Main class for emoji matching and processing
- `__init__(model_name=None, model_key=None, use_cached_embeddings=True)`: Initializes the processor with a specific model
- `load_emoji_dictionaries(emotion_file, item_file)`: Loads emoji dictionaries from text files
- `switch_model(model_key)`: Switches to a different embedding model
  - `sentence_to_emojis(sentence)`: Processes text to find matching emojis and generate a mashup
- `find_top_emojis(embedding, emoji_embeddings, top_n=1)`: Finds top matching emojis using cosine similarity
### `app.py`
Gradio interface:
- `EmojiMashupApp`: Main application class
- `create_interface()`: Creates the Gradio interface
- `process_with_model(model_selection, text, use_cached_embeddings)`: Processes text with selected model
- `get_random_example()`: Gets a random example sentence for demonstration
### `generate_embeddings.py`
Standalone utility to pre-generate embeddings:
- `generate_embeddings_for_model(model_key, model_info)`: Generates embeddings for a specific model
- `main()`: Main function that processes all models and saves embeddings
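The pre-generation loop might look roughly like this. To keep the sketch self-contained, a stub stands in for the real Sentence Transformers encoder, and the function signature is only loosely modeled on the names above:

```python
import pickle
from pathlib import Path

# Stub standing in for SentenceTransformer.encode -- the real script would
# load each model from the Sentence Transformers library and embed with it.
def embed_descriptions(model_id, descriptions):
    return [[float(len(d))] for d in descriptions]  # placeholder vectors

def generate_embeddings_for_model(model_key, model_info, emoji_dicts,
                                  out_dir="embeddings"):
    """Embed every emoji description for one model and cache the results.

    emoji_dicts maps an emoji type ("emotion" / "event") to an
    {emoji: description} dictionary parsed from the data files.
    """
    for emoji_type, descriptions in emoji_dicts.items():
        vectors = embed_descriptions(model_info["id"], list(descriptions.values()))
        cache = dict(zip(descriptions.keys(), vectors))
        path = Path(out_dir) / f"{model_info['id'].replace('/', '_')}_{emoji_type}.pkl"
        path.parent.mkdir(parents=True, exist_ok=True)
        with open(path, "wb") as f:
            pickle.dump(cache, f)
        print(f"Cached {len(cache)} {emoji_type} embeddings at {path}")
```

Running this once per model in `main()` is what makes the embeddings available at startup without any on-the-fly encoding.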
## Emoji Data Files
- `google-emoji-kitchen-emotion.txt`: Emotion emojis with descriptions
- `google-emoji-kitchen-item.txt`: Event/object emojis with descriptions
- `google-emoji-kitchen-compatible.txt`: Compatibility information for emoji combinations
## Embedding Cache Structure
The `embeddings/` directory contains pre-generated embeddings in pickle format:
- `[model_id]_emotion.pkl`: Embeddings for emotion emojis
- `[model_id]_event.pkl`: Embeddings for event/object emojis
## API Usage Examples
### Using the EmojiProcessor Directly
```python
from emoji_processor import EmojiProcessor
# Initialize with default model (mpnet)
processor = EmojiProcessor()
processor.load_emoji_dictionaries()
# Process a sentence
emotion, event, image = processor.sentence_to_emojis("I'm feeling happy today!")
print(f"Emotion emoji: {emotion}")
print(f"Event emoji: {event}")
# image contains the PIL Image object of the mashup
```
### Switching Models
```python
# Switch to a different model
processor.switch_model("gte")
# Process with the new model
emotion, event, image = processor.sentence_to_emojis("I'm feeling anxious about tomorrow.")
```
## Performance Considerations
- Embedding generation is computationally intensive but only happens once per model
- Using cached embeddings significantly improves response time
- Larger models (GTE, BGE) may provide better accuracy but require more resources
- The MPNet model offers a good balance of performance and accuracy for most use cases