ayushsinha committed on
Commit c5a90ea · verified · 1 Parent(s): d5da1c2

Create README.md

Files changed (1)
  1. README.md +95 -0
README.md ADDED
@@ -0,0 +1,95 @@
+ # Paraphrase Generation with Text-to-Text Transfer Transformer
+
+ ## 📌 Overview
+
+ This repository hosts a quantized version of the T5 model fine-tuned for paraphrase generation. The model was trained on the chatgpt-paraphrases dataset from Hugging Face to rewrite input text in different words while preserving its meaning. It is quantized to Float16 (FP16) to reduce model size and speed up inference, at a small expected cost in accuracy.
+
+ ## 🏗 Model Details
+
+ - **Model Architecture:** t5-small
+ - **Task:** Paraphrase Generation
+ - **Dataset:** Hugging Face's `chatgpt-paraphrases`
+ - **Quantization:** Float16 (FP16) for optimized inference
+ - **Fine-tuning Framework:** Hugging Face Transformers
+
+ ## 🚀 Usage
+
+ ### Installation
+
+ ```bash
+ # sentencepiece is required by T5Tokenizer
+ pip install transformers torch sentencepiece
+ ```
+
+ ### Loading the Model
+
+ ```python
+ from transformers import T5Tokenizer, T5ForConditionalGeneration, pipeline
+ import torch
+
+ # Use a GPU when one is available, otherwise fall back to the CPU
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+
+ # Load the fine-tuned model and its tokenizer from the Hugging Face Hub
+ model_name = "AventIQ-AI/t5-paraphrase-generation"
+ model = T5ForConditionalGeneration.from_pretrained(model_name).to(device)
+ tokenizer = T5Tokenizer.from_pretrained(model_name)
+ ```
+
+ ### Paraphrase Generation Inference
+
+ ```python
+ # Reuse the `model` and `tokenizer` loaded in the previous step
+ paraphrase_pipeline = pipeline("text2text-generation", model=model, tokenizer=tokenizer)
+ test_text = "The quick brown fox jumps over the lazy dog"
+
+ # Generate paraphrases; sampling lets each returned sequence differ
+ results = paraphrase_pipeline(
+     test_text,
+     max_length=256,
+     truncation=True,
+     num_return_sequences=5,
+     do_sample=True,
+     top_k=50,
+     temperature=0.7
+ )
+
+ print("Original Text:", test_text)
+ print("\nParaphrased Outputs:")
+
+ for i, output in enumerate(results):
+     generated_text = output["generated_text"] if isinstance(output, dict) else str(output)
+     print(f"{i+1}. {generated_text.strip()}")
+ ```
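The `do_sample`, `top_k`, and `temperature` arguments above control how each next token is drawn. The following is a minimal, standard-library sketch of top-k sampling with temperature scaling over a toy list of scores. The function name `sample_top_k` and the values in `toy_logits` are hypothetical and chosen only for illustration; this is not the Transformers implementation.

```python
import math
import random


def sample_top_k(logits, top_k=50, temperature=0.7, rng=random):
    """Sample one index from `logits` using top-k filtering and temperature scaling."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Keep only the indices of the top_k highest scores.
    k = min(top_k, len(logits))
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Divide by the temperature, then apply a numerically stable softmax.
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    # Draw one of the surviving indices in proportion to its weight.
    return rng.choices(top, weights=weights, k=1)[0]


toy_logits = [2.0, 1.5, 0.3, -1.0, -3.0]  # hypothetical scores for a 5-token vocabulary
rng = random.Random(0)  # fixed seed so repeated runs give the same draws
draws = [sample_top_k(toy_logits, top_k=3, temperature=0.7, rng=rng) for _ in range(1000)]
print({index: draws.count(index) for index in sorted(set(draws))})
```

A temperature below 1 sharpens the distribution toward the highest-scoring tokens, while a smaller `top_k` removes unlikely tokens from consideration entirely.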
+
+ ## 📊 ROUGE Evaluation Results
+
+ After fine-tuning the **T5-Small** model for paraphrase generation, we obtained the following **ROUGE** scores:
+
+ | **Metric** | **Score** | **Meaning** |
+ |-------------|-----------|-------------|
+ | **ROUGE-1** | **0.7777** (~78%) | Overlap of **unigrams (single words)** between the reference and the generated paraphrase. |
+ | **ROUGE-2** | **0.5** (~50%) | Overlap of **bigrams (two-word sequences)**, which reflects how well local word order is preserved. |
+ | **ROUGE-L** | **0.7777** (~78%) | Based on the **longest common subsequence** of words, reflecting how much sentence structure is preserved. |
+ | **ROUGE-Lsum** | **0.7777** (~78%) | ROUGE-L computed sentence by sentence over newline-separated text; mainly used for summarization tasks. |
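To make the metrics in the table concrete, here is a simplified sketch of ROUGE-N and ROUGE-L as F1 scores over whitespace tokens. It assumes lowercasing and plain `split()` tokenization with no stemming, so it will not exactly match a standard ROUGE package, and it is not the script used to produce the scores above. The helper names and the example sentences are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, ref_total, cand_total):
    """Combine precision and recall into an F1 score."""
    if overlap == 0 or ref_total == 0 or cand_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(reference, candidate, n):
    """ROUGE-N F1: clipped n-gram overlap between reference and candidate."""
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    overlap = sum((ref & cand).values())
    return f1(overlap, sum(ref.values()), sum(cand.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, via dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    """ROUGE-L F1 based on the longest common subsequence of tokens."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    return f1(lcs_length(ref, cand), len(ref), len(cand))


reference = "the cat sat on the mat"  # hypothetical reference paraphrase
candidate = "the cat lay on the mat"  # hypothetical model output
print(rouge_n(reference, candidate, 1), rouge_n(reference, candidate, 2), rouge_l(reference, candidate))
```

Because a good paraphrase deliberately changes wording, n-gram overlap with a single reference is only a rough proxy for quality.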
+
+ ## ⚡ Quantization Details
+
+ Post-training quantization was applied using PyTorch's built-in quantization framework. The model weights were converted to Float16 (FP16), which reduces model size and improves inference efficiency at a small cost in accuracy.
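The size and accuracy trade-off of FP16 can be illustrated with Python's standard `struct` module, whose `e` format code packs IEEE 754 half-precision values. This is only a sketch of the number format: the weight values are hypothetical, and it is not the procedure used to quantize this model. In PyTorch, a cast to FP16 is commonly done with `model.half()`; whether that call was used here is not stated.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


weights = [0.1, -1.2345678, 3.14159265, 1e-3]  # hypothetical weight values

# Storage cost: 4 bytes per value in FP32 versus 2 bytes per value in FP16.
fp32_bytes = len(struct.pack(f"<{len(weights)}f", *weights))
fp16_bytes = len(struct.pack(f"<{len(weights)}e", *weights))

# Rounding cost: worst relative error introduced by the FP16 round trip.
max_rel_error = max(abs(to_fp16(w) - w) / abs(w) for w in weights)

print(f"FP32: {fp32_bytes} bytes, FP16: {fp16_bytes} bytes")
print(f"Largest relative rounding error: {max_rel_error:.2e}")
```

FP16 keeps 11 significant bits, so values in its normal range are rounded with a relative error of at most 2^-11, which is the source of the slight accuracy loss noted under Limitations.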
+
+ ## 📂 Repository Structure
+
+ ```
+ .
+ ├── model/               # Contains the quantized model files
+ ├── tokenizer_config/    # Tokenizer configuration and vocabulary files
+ ├── model.safetensors    # Quantized model weights
+ └── README.md            # Model documentation
+ ```
+
+ ## ⚠️ Limitations
+
+ - The model may struggle with highly ambiguous sentences.
+ - Quantization may lead to slight degradation in accuracy compared to full-precision models.
+ - Performance may vary across different writing styles and sentence structures.
+
+ ## 🤝 Contributing
+
+ Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions or improvements.