# **English-to-Japanese Translation Project**
## **Overview**
This project builds a system for English-to-Japanese translation using state-of-the-art multilingual models. Two models were used: **mT5** as the primary model and **mBART** as the secondary model. Together they balance multilingual versatility with translation-focused accuracy.
---
## **Models Used**
### **1. mT5 (Primary Model)**
- **Reason for Selection**:
- mT5 is highly versatile: it is pretrained on the multilingual mC4 corpus covering 101 languages, making it suitable for translation as well as other tasks such as summarization and question answering.
- It performs well without extensive fine-tuning, saving computational resources.
- **Strengths**:
- Handles translation naturally with minimal training.
- Can perform additional tasks beyond translation.
- **Limitations**:
- Sometimes lacks precision in detailed translations.
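A minimal inference sketch for mT5, assuming the `transformers` library and a fine-tuned checkpoint; the `google/mt5-small` name and the `translate English to Japanese:` prefix are illustrative, since the base checkpoint is not trained on translation prompts out of the box:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/mt5-small"  # placeholder: replace with your fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Task prefix assumed to match the prompts used during fine-tuning.
text = "translate English to Japanese: The weather is nice today."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```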
---
### **2. mBART (Secondary Model)**
- **Reason for Selection**:
- mBART specializes in multilingual translation tasks and provides highly accurate translations when fine-tuned.
- **Strengths**:
- Optimized for translation accuracy, especially for long sentences and contextual consistency.
- Produces grammatically correct, contextually coherent output.
- **Limitations**:
- Less flexible for tasks like summarization or question answering compared to mT5.
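A comparable sketch for mBART; the checkpoint name is a placeholder for your fine-tuned model, and the `en_XX`/`ja_XX` language codes follow the mBART-50 convention of setting the source language on the tokenizer and forcing the target language token at decoding time:

```python
from transformers import MBart50TokenizerFast, MBartForConditionalGeneration

model_name = "facebook/mbart-large-50"  # placeholder: replace with your fine-tuned checkpoint
tokenizer = MBart50TokenizerFast.from_pretrained(model_name, src_lang="en_XX")
model = MBartForConditionalGeneration.from_pretrained(model_name)

inputs = tokenizer("The weather is nice today.", return_tensors="pt")
outputs = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("ja_XX"),  # force Japanese output
    max_new_tokens=64,
    num_beams=4,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```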
---
## **Evaluation Strategy**
To evaluate model performance, the following metrics were used (a computation sketch follows the list):
1. **BLEU Score**:
- Measures n-gram overlap between the model's output and reference translations.
- Chosen because it is the standard automatic metric for translation quality.
2. **Training Loss**:
- Tracks how well the model fits the training data during fine-tuning.
- A steadily decreasing loss indicates the model is learning the task.
3. **Perplexity**:
- The exponential of the cross-entropy loss; it reflects how confidently the model predicts the reference text.
- Lower perplexity generally corresponds to more fluent, accurate translations.
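A rough sketch of how BLEU and perplexity can be computed with the `evaluate` and `torch` packages (assumed installed); the example sentences and the `google/mt5-small` checkpoint are placeholders, not the project's actual evaluation data:

```python
import math
import torch
import evaluate
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# BLEU: compare generated translations against one or more references per prediction.
bleu = evaluate.load("sacrebleu")
predictions = ["今日は天気がいいです。"]    # model outputs (placeholder)
references = [["今日は天気が良いです。"]]   # reference translations (placeholder)
print("BLEU:", bleu.compute(predictions=predictions, references=references)["score"])

# Perplexity: exponential of the cross-entropy loss on a held-out pair.
model_name = "google/mt5-small"  # placeholder: replace with your fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
inputs = tokenizer("The weather is nice today.", return_tensors="pt")
labels = tokenizer(text_target="今日は天気がいいです。", return_tensors="pt").input_ids
with torch.no_grad():
    loss = model(**inputs, labels=labels).loss
print("Perplexity:", math.exp(loss.item()))
```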
---
## **Steps Taken**
1. Fine-tuned both models using a dataset of English-Japanese text pairs to improve translation accuracy.
2. Tested the models on unseen data to measure their real-world performance.
3. Applied optimizations such as **4-bit quantization** to reduce memory usage and speed up evaluation (see the loading sketch below).
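One way to realize 4-bit quantization is `bitsandbytes` via `BitsAndBytesConfig`. This is a sketch, assuming a CUDA GPU and the `bitsandbytes` and `accelerate` packages; the exact settings are illustrative rather than the project's final configuration:

```python
import torch
from transformers import AutoModelForSeq2SeqLM, BitsAndBytesConfig

# Load weights in 4-bit NF4 precision to cut memory use during evaluation.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForSeq2SeqLM.from_pretrained(
    "facebook/mbart-large-50",   # or your fine-tuned checkpoint
    quantization_config=bnb_config,
    device_map="auto",           # requires `accelerate`
)
```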
---
## **Results**
- **mT5**:
- Handled translation well and generalized to additional tasks such as summarization and question answering.
- Showed strong versatility but sometimes lacked fine-grained accuracy in translations.
- **mBART**:
- Delivered precise and contextually accurate translations, especially for longer sentences.
- Required fine-tuning but outperformed mT5 in translation-focused tasks.
- **Overall Conclusion**:
mT5 is a flexible model for multilingual tasks, while mBART delivers higher-quality translations. Together they balance versatility and accuracy, making the pair well suited to English-to-Japanese translation.
---
## **How to Use**
1. Load the models from Hugging Face:
- [mT5 Model on Hugging Face](https://huggingface.co/google/mt5-small)
- [mBART Model on Hugging Face](https://huggingface.co/facebook/mbart-large-50)
2. Fine-tune the models on your own English-Japanese text pairs (a condensed training sketch follows this list).
3. Evaluate performance using BLEU score, training loss, and perplexity.
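A condensed fine-tuning sketch using the `Seq2SeqTrainer` API; the dataset name and the `en`/`ja` column names are placeholders for your own parallel corpus, and the hyperparameters are illustrative:

```python
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "google/mt5-small"  # or "facebook/mbart-large-50"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

dataset = load_dataset("your-username/en-ja-pairs")  # placeholder dataset name

def preprocess(batch):
    # "en" and "ja" are placeholder column names for source/target text.
    model_inputs = tokenizer(batch["en"], max_length=128, truncation=True)
    labels = tokenizer(text_target=batch["ja"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True)

args = Seq2SeqTrainingArguments(
    output_dir="en-ja-translation",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    num_train_epochs=3,
    predict_with_generate=True,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```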
---
## **Future Work**
- Expand the dataset for better fine-tuning.
- Explore task-specific fine-tuning for mT5 to improve its translation accuracy.
- Optimize the models further for deployment in resource-constrained environments.
---
## **References**
- [mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer](https://arxiv.org/abs/2010.11934)
- [mBART: Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210)
---