---
license: apache-2.0
inference:
  parameters:
    num_beams: 3
    num_beam_groups: 3
    num_return_sequences: 1
    repetition_penalty: 3
    diversity_penalty: 3.01
    no_repeat_ngram_size: 2
    temperature: 0.8
    max_length: 64
widget:
- text: >-
    paraphraser: Learn to build generative AI applications with an expert AWS
    instructor with the 2-day Developing Generative AI Applications on AWS
    course.
  example_title: AWS course
- text: >-
    paraphraser: In healthcare, Generative AI can help generate synthetic
    medical data to train machine learning models, develop new drug candidates,
    and design clinical trials.
  example_title: Generative AI
- text: >-
    paraphraser: By leveraging prior model training through transfer learning,
    fine-tuning can reduce the amount of expensive computing power and labeled
    data needed to obtain large models tailored to niche use cases and business
    needs.
  example_title: Fine Tuning
---


# Text Rewriter Paraphraser

This repository contains a text-rewriting (paraphrasing) model fine-tuned from T5-Base (223M parameters).

## Key Features:

* **Fine-tuned on t5-base:** Builds on a pre-trained text-to-text transfer transformer (T5) for paraphrasing.
* **Large Dataset (430k examples):** Trained on 430k examples combined from three open-source datasets and cleaned with several preprocessing techniques.
* **High-Quality Paraphrases:** Generates paraphrases that substantially change sentence structure while preserving the meaning and facts of the input.
* **Non-AI Detectable:** Aims to produce paraphrases that read naturally and are indistinguishable from human-written text.

**Model Performance:**

* Train Loss: 1.0645
* Validation Loss: 0.8761
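Assuming these losses are mean per-token cross-entropy values in nats (what PyTorch's cross-entropy loss, used by Transformers seq2seq models, reports), they can be read as perplexities by exponentiating. This is a small illustrative calculation, not a metric reported by the author; the `perplexity` helper is a hypothetical name.

```python
import math

# Losses reported on this card; assumed to be mean token-level cross-entropy (natural log).
losses = {"train": 1.0645, "validation": 0.8761}


def perplexity(mean_ce_loss: float) -> float:
    """Perplexity is the exponential of the mean per-token cross-entropy."""
    return math.exp(mean_ce_loss)


for split, loss in losses.items():
    print(f"{split}: loss={loss:.4f}, perplexity={perplexity(loss):.2f}")
```

Under that assumption, the script prints a perplexity of about 2.90 for training and 2.40 for validation.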

## Getting Started:

T5 models expect a task-specific prefix. Since this is a paraphrasing task, prepend the prefix `paraphraser: ` to the input text.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained("Ateeqq/Text-Rewriter-Paraphraser")
model = AutoModelForSeq2SeqLM.from_pretrained("Ateeqq/Text-Rewriter-Paraphraser").to(device)

def paraphrase(text):
    # Prepend the task prefix the model was fine-tuned with
    inputs = tokenizer(
        f"paraphraser: {text}",
        return_tensors="pt",
        padding="longest",
        truncation=True,
        max_length=64,
    ).to(device)
    # Diverse (group) beam search: 4 beams split into 4 groups, returning all 4 candidates.
    # temperature is omitted because it only affects sampling (do_sample=True), not beam search.
    outputs = model.generate(
        **inputs,  # passes both input_ids and attention_mask
        num_beams=4,
        num_beam_groups=4,
        num_return_sequences=4,
        repetition_penalty=10.0,
        diversity_penalty=3.0,
        no_repeat_ngram_size=2,
        max_length=64,
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)

text = 'By leveraging prior model training through transfer learning, fine-tuning can reduce the amount of expensive computing power and labeled data needed to obtain large models tailored to niche use cases and business needs.'
print(paraphrase(text))
```
### Output:
```
 ['The fine-tuning can reduce the amount of expensive computing power and labeled data required to obtain large models adapted for niche use cases and business needs by using prior model training through transfer learning.',
 'fine-tuning, by utilizing prior model training through transfer learning, can reduce the amount of expensive computing power and labeled data required to obtain large models tailored for niche use cases and business needs.',
 'Fine-tunering by using prior model training through transfer learning can reduce the amount of expensive computing power and labeled data required to obtain large models adapted for niche use cases and business needs.',
 'Using transfer learning to use prior model training, fine-tuning can reduce the amount of expensive computing power and labeled data required for large models that are suitable in niche usage cases or businesses.']
```
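The `inference.parameters` block in the front matter configures the hosted widget with a lighter setting than the snippet above: 3 beams in 3 groups and a single returned sequence. A minimal sketch for paraphrasing several inputs in one padded batch with those settings might look like the following. It assumes the `model`, `tokenizer` and `device` objects from the Getting Started snippet. The names `add_prefix`, `paraphrase_batch` and `WIDGET_GENERATION_KWARGS` are hypothetical, and `temperature: 0.8` from the front matter is left out on the assumption that it has no effect without sampling.

```python
from typing import Any, Dict, List

PREFIX = "paraphraser: "

# Hypothetical mirror of the front-matter inference parameters
# (temperature omitted: it only applies when do_sample=True).
WIDGET_GENERATION_KWARGS: Dict[str, Any] = {
    "num_beams": 3,
    "num_beam_groups": 3,
    "num_return_sequences": 1,
    "repetition_penalty": 3.0,
    "diversity_penalty": 3.01,
    "no_repeat_ngram_size": 2,
    "max_length": 64,
}


def add_prefix(texts: List[str]) -> List[str]:
    """Prepend the task prefix the model was fine-tuned with to every input."""
    return [PREFIX + text for text in texts]


def paraphrase_batch(texts: List[str], model, tokenizer, device) -> List[str]:
    """Paraphrase several inputs in one padded batch (hypothetical helper)."""
    batch = tokenizer(
        add_prefix(texts),
        return_tensors="pt",
        padding=True,  # pad to the longest input in this batch
        truncation=True,
        max_length=64,
    ).to(device)
    # The attention mask in `batch` tells the encoder to ignore padded positions.
    outputs = model.generate(**batch, **WIDGET_GENERATION_KWARGS)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

For example, `paraphrase_batch([text, "Another sentence to rewrite."], model, tokenizer, device)` should return one paraphrase per input, since `num_return_sequences` is 1. Right padding is fine here because T5 is an encoder-decoder model.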

**Further Development:**

Suggestions for further development or areas for improvement are welcome in the repository's Discussions.