# Feelings to Emoji: Technical Reference

This document provides technical details about the implementation of the Feelings to Emoji application.

## Project Structure

The application is organized into several Python modules:

- `app.py` - Main application file with Gradio interface
- `emoji_processor.py` - Core processing logic for emoji matching
- `config.py` - Configuration settings
- `utils.py` - Utility functions
- `generate_embeddings.py` - Standalone tool to pre-generate embeddings

## Embedding Models

The system uses the following sentence embedding models from the Sentence Transformers library:

| Model Key | Model ID | Size | Description |
|-----------|----------|------|-------------|
| mpnet | all-mpnet-base-v2 | 110M | Balanced, great general-purpose model |
| gte | thenlper/gte-large | 335M | Context-rich, good for emotion & nuance |
| bge | BAAI/bge-large-en-v1.5 | 350M | Tuned for ranking & high-precision similarity |

## Emoji Matching Algorithm

The application uses cosine similarity between sentence embeddings to match text with emojis:

1. For each emoji category (emotion and event):
   - Embed descriptions using the selected model
   - Calculate cosine similarity between the input text embedding and each emoji description embedding
   - Return the emoji with the highest similarity score
2. The embeddings are pre-computed and cached to improve performance:
   - Stored as pickle files in the `embeddings/` directory
   - Generated using `generate_embeddings.py`
   - Loaded at startup to minimize processing time

## Module Reference

### `config.py`

Contains configuration settings including:

- `CONFIG`: Dictionary with basic application settings (model name, file paths, etc.)
- `EMBEDDING_MODELS`: Dictionary defining the available embedding models

### `utils.py`

Utility functions including:

- `setup_logging()`: Configures application logging
- `kitchen_txt_to_dict(filepath)`: Parses emoji dictionary files
- `save_embeddings_to_pickle(embeddings, filepath)`: Saves embeddings to pickle files
- `load_embeddings_from_pickle(filepath)`: Loads embeddings from pickle files
- `get_embeddings_pickle_path(model_id, emoji_type)`: Generates consistent paths for embedding files

### `emoji_processor.py`

Core processing logic:

- `EmojiProcessor`: Main class for emoji matching and processing
  - `__init__(model_name=None, model_key=None, use_cached_embeddings=True)`: Initializes the processor with a specific model
  - `load_emoji_dictionaries(emotion_file, item_file)`: Loads emoji dictionaries from text files
  - `switch_model(model_key)`: Switches to a different embedding model
  - `sentence_to_emojis(sentence)`: Processes text to find matching emojis and generate a mashup
  - `find_top_emojis(embedding, emoji_embeddings, top_n=1)`: Finds top matching emojis using cosine similarity

### `app.py`

Gradio interface:

- `EmojiMashupApp`: Main application class
  - `create_interface()`: Creates the Gradio interface
  - `process_with_model(model_selection, text, use_cached_embeddings)`: Processes text with the selected model
  - `get_random_example()`: Gets a random example sentence for demonstration

### `generate_embeddings.py`

Standalone utility to pre-generate embeddings:

- `generate_embeddings_for_model(model_key, model_info)`: Generates embeddings for a specific model
- `main()`: Main function that processes all models and saves embeddings

## Emoji Data Files

- `google-emoji-kitchen-emotion.txt`: Emotion emojis with descriptions
- `google-emoji-kitchen-item.txt`: Event/object emojis with descriptions
- `google-emoji-kitchen-compatible.txt`: Compatibility information for emoji combinations

## Embedding Cache Structure

The `embeddings/` directory contains pre-generated
embeddings in pickle format:

- `[model_id]_emotion.pkl`: Embeddings for emotion emojis
- `[model_id]_event.pkl`: Embeddings for event/object emojis

## API Usage Examples

### Using the EmojiProcessor Directly

```python
from emoji_processor import EmojiProcessor

# Initialize with the default model (mpnet)
processor = EmojiProcessor()
processor.load_emoji_dictionaries()

# Process a sentence
emotion, event, image = processor.sentence_to_emojis("I'm feeling happy today!")
print(f"Emotion emoji: {emotion}")
print(f"Event emoji: {event}")
# image contains the PIL Image object of the mashup
```

### Switching Models

```python
# Switch to a different model
processor.switch_model("gte")

# Process with the new model
emotion, event, image = processor.sentence_to_emojis("I'm feeling anxious about tomorrow.")
```

## Performance Considerations

- Embedding generation is computationally intensive but only happens once per model
- Using cached embeddings significantly improves response time
- Larger models (GTE, BGE) may provide better accuracy but require more resources
- The MPNet model offers a good balance of performance and accuracy for most use cases
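The cosine-similarity ranking behind `find_top_emojis` can be sketched in plain Python. This is a minimal illustration of the algorithm described above, not the actual implementation: the toy 3-dimensional vectors and the dictionary shape stand in for real model embeddings, which are high-dimensional arrays produced by Sentence Transformers.

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector norms
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def find_top_emojis(embedding, emoji_embeddings, top_n=1):
    # Score every candidate emoji against the input embedding,
    # then return the top_n emojis by descending similarity
    scored = [
        (emoji, cosine_similarity(embedding, emb))
        for emoji, emb in emoji_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [emoji for emoji, _ in scored[:top_n]]

# Toy stand-ins for embedded emoji descriptions
emoji_embeddings = {
    "smile": [0.9, 0.1, 0.0],
    "cry":   [-0.8, 0.2, 0.1],
    "party": [0.5, 0.8, 0.2],
}
# Toy stand-in for the embedded input sentence
text_embedding = [0.85, 0.2, 0.05]

print(find_top_emojis(text_embedding, emoji_embeddings))  # → ['smile']
```

The same nearest-neighbour lookup runs twice per request, once against the emotion embeddings and once against the event embeddings, which is why pre-computing and pickling both sets pays off.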