# Feelings to Emoji: Technical Reference

This document provides technical details about the implementation of the Feelings to Emoji application.

## Project Structure

The application is organized into several Python modules:

- `app.py` - Main application file with the Gradio interface
- `emoji_processor.py` - Core processing logic for emoji matching
- `config.py` - Configuration settings
- `utils.py` - Utility functions
- `generate_embeddings.py` - Standalone tool to pre-generate embeddings

## Embedding Models

The system uses the following sentence embedding models from the Sentence Transformers library:
| Model Key | Model ID | Size | Description |
|-----------|----------|------|-------------|
| mpnet | all-mpnet-base-v2 | 110M | Balanced, great general-purpose model |
| gte | thenlper/gte-large | 335M | Context-rich, good for emotion & nuance |
| bge | BAAI/bge-large-en-v1.5 | 350M | Tuned for ranking & high-precision similarity |
## Emoji Matching Algorithm

The application uses cosine similarity between sentence embeddings to match text with emojis:

1. For each emoji category (emotion and event):
   - Embed the descriptions using the selected model
   - Calculate the cosine similarity between the input text embedding and each emoji description embedding
   - Return the emoji with the highest similarity score
2. The embeddings are pre-computed and cached to improve performance:
   - Stored as pickle files in the `embeddings/` directory
   - Generated using `generate_embeddings.py`
   - Loaded at startup to minimize processing time
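The matching step can be sketched in plain Python. The function names here are illustrative, not the actual `emoji_processor` internals:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def find_best_emoji(text_embedding, emoji_embeddings):
    """Return the emoji whose description embedding is most similar
    to the input text embedding. emoji_embeddings maps emoji -> vector."""
    return max(
        emoji_embeddings,
        key=lambda e: cosine_similarity(text_embedding, emoji_embeddings[e]),
    )

# Toy 2-dimensional vectors; real sentence embeddings have hundreds of dimensions.
embeddings = {"😀": [1.0, 0.1], "😢": [-1.0, 0.2]}
print(find_best_emoji([0.9, 0.0], embeddings))  # 😀
```

In the real application this comparison runs twice per sentence, once against the emotion embeddings and once against the event embeddings.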
## Module Reference

### `config.py`

Contains configuration settings, including:

- `CONFIG`: Dictionary with basic application settings (model name, file paths, etc.)
- `EMBEDDING_MODELS`: Dictionary defining the available embedding models
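A rough sketch of what these two dictionaries might look like; the keys and values below are illustrative assumptions, so check `config.py` for the actual settings:

```python
# Illustrative only -- the real config.py may use different keys and values.
CONFIG = {
    "default_model": "mpnet",
    "embeddings_dir": "embeddings",
    "emotion_file": "google-emoji-kitchen-emotion.txt",
    "item_file": "google-emoji-kitchen-item.txt",
}

EMBEDDING_MODELS = {
    "mpnet": {"id": "all-mpnet-base-v2"},
    "gte": {"id": "thenlper/gte-large"},
    "bge": {"id": "BAAI/bge-large-en-v1.5"},
}
```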
### `utils.py`

Utility functions, including:

- `setup_logging()`: Configures application logging
- `kitchen_txt_to_dict(filepath)`: Parses emoji dictionary files
- `save_embeddings_to_pickle(embeddings, filepath)`: Saves embeddings to pickle files
- `load_embeddings_from_pickle(filepath)`: Loads embeddings from pickle files
- `get_embeddings_pickle_path(model_id, emoji_type)`: Generates consistent paths for embedding files
### `emoji_processor.py`

Core processing logic:

- `EmojiProcessor`: Main class for emoji matching and processing
  - `__init__(model_name=None, model_key=None, use_cached_embeddings=True)`: Initializes the processor with a specific model
  - `load_emoji_dictionaries(emotion_file, item_file)`: Loads emoji dictionaries from text files
  - `switch_model(model_key)`: Switches to a different embedding model
  - `sentence_to_emojis(sentence)`: Processes text to find matching emojis and generate the mashup
  - `find_top_emojis(embedding, emoji_embeddings, top_n=1)`: Finds the top matching emojis using cosine similarity
### `app.py`

Gradio interface:

- `EmojiMashupApp`: Main application class
  - `create_interface()`: Creates the Gradio interface
  - `process_with_model(model_selection, text, use_cached_embeddings)`: Processes text with the selected model
  - `get_random_example()`: Gets a random example sentence for demonstration
### `generate_embeddings.py`

Standalone utility to pre-generate embeddings:

- `generate_embeddings_for_model(model_key, model_info)`: Generates embeddings for a specific model
- `main()`: Main function that processes all models and saves embeddings
## Emoji Data Files

- `google-emoji-kitchen-emotion.txt`: Emotion emojis with descriptions
- `google-emoji-kitchen-item.txt`: Event/object emojis with descriptions
- `google-emoji-kitchen-compatible.txt`: Compatibility information for emoji combinations

## Embedding Cache Structure

The `embeddings/` directory contains pre-generated embeddings in pickle format:

- `[model_id]_emotion.pkl`: Embeddings for emotion emojis
- `[model_id]_event.pkl`: Embeddings for event/object emojis
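The `get_embeddings_pickle_path` helper in `utils.py` presumably implements this naming convention. A sketch of one plausible version; the slash handling for Hub model IDs like `thenlper/gte-large` is an assumption, not confirmed behavior:

```python
import os

def get_embeddings_pickle_path(model_id, emoji_type, embeddings_dir="embeddings"):
    """Build the cache path [model_id]_[emoji_type].pkl.

    Slashes in Hub model IDs are replaced with underscores so the whole ID
    fits in a single filename -- an assumption about the real helper."""
    safe_id = model_id.replace("/", "_")
    return os.path.join(embeddings_dir, f"{safe_id}_{emoji_type}.pkl")
```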
## API Usage Examples

### Using the EmojiProcessor Directly

```python
from emoji_processor import EmojiProcessor

# Initialize with the default model (mpnet)
processor = EmojiProcessor()
processor.load_emoji_dictionaries()

# Process a sentence
emotion, event, image = processor.sentence_to_emojis("I'm feeling happy today!")
print(f"Emotion emoji: {emotion}")
print(f"Event emoji: {event}")
# image contains the PIL Image object of the mashup
```
### Switching Models

```python
# Switch to a different model
processor.switch_model("gte")

# Process with the new model
emotion, event, image = processor.sentence_to_emojis("I'm feeling anxious about tomorrow.")
```
## Performance Considerations

- Embedding generation is computationally intensive but happens only once per model
- Using cached embeddings significantly improves response time
- The larger models (GTE, BGE) may provide better accuracy but require more memory and compute
- The MPNet model offers a good balance of speed and accuracy for most use cases