|
--- |
|
language: en |
|
datasets: |
|
- c4 |
|
- wikipedia |
|
metrics: |
|
- f1 |
|
--- |
|
|
|
# T5-V1.1-large-rss |
|
This model is [T5-v1.1-large](https://huggingface.co/google/t5-v1_1-large) finetuned on Recurring Span Selection (RSS) data. The model was finetuned as part of ["How Optimal is Greedy Decoding for Extractive Question Answering?"](https://arxiv.org/abs/2108.05857), while the RSS pretraining method was introduced in [this paper](https://arxiv.org/pdf/2101.00438.pdf).
|
|
|
## Model description |
|
The original [T5-v1.1-large](https://huggingface.co/google/t5-v1_1-large) was pre-trained only on C4, without any supervised training. Our version is further trained with the Recurring Span Selection (RSS) scheme, using a sample from the dataset used to pretrain [Splinter](https://huggingface.co/tau/splinter-large):
|
* contexts with a span occurring more than once are detected |
|
* a single instance of the recurring span is masked
|
* the model is trained (teacher forcing) to predict the masked span |
|
This training scheme naturally matches the extractive question answering task. |
|
|
|
During training, the masked span is replaced with `<extra_id_0>` and the labels are formatted as `<extra_id_0>span<extra_id_1>`. Unlike [Splinter](https://huggingface.co/tau/splinter-large), only one span is masked at a time.
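
For intuition, a single RSS training pair could look roughly like the following (the context, span, and string manipulation below are purely illustrative and are not the actual preprocessing code used for this checkpoint):

```python
# Illustrative sketch of an RSS training pair (hypothetical example,
# not the actual preprocessing pipeline used for this checkpoint).
context = "Newton was born in 1643. In 1687, Newton published the Principia."
recurring_span = "Newton"  # a span that occurs more than once in the context

# Mask a single occurrence of the recurring span with the first sentinel token.
source = context.replace(recurring_span, "<extra_id_0>", 1)
# -> "<extra_id_0> was born in 1643. In 1687, Newton published the Principia."

# The label wraps the masked span between the first two sentinel tokens.
target = f"<extra_id_0>{recurring_span}<extra_id_1>"
```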
|
|
|
## Intended uses & limitations |
|
This model naturally fits tasks where a span from a context is intended to be copied, like extractive question answering. |
|
This checkpoint is primarily intended for use in a zero-shot setting: further finetuning it on an annotated dataset yields results equal to those of the original T5-v1.1-large.
|
|
|
### How to use |
|
You can use this model directly, but it is recommended to format the input to match the training scheme, as a text-question prompt:
|
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the RSS-finetuned checkpoint and its tokenizer
model = AutoModelForSeq2SeqLM.from_pretrained('tau/t5-v1_1-large-rss')
tokenizer = AutoTokenizer.from_pretrained('tau/t5-v1_1-large-rss')

passage = 'Barack Hussein Obama II is an American politician and attorney who served as the 44th president of the United States from 2009 to 2017. '
question = 'When was Obama inaugurated?'

# Format the prompt as Text / Question / Answer, ending with the first sentinel token (<extra_id_0>)
text = f'Text: {passage}.\nQuestion: {question}\nAnswer:{tokenizer.additional_special_tokens[0]}.'
encoded_input = tokenizer(text, return_tensors='pt')

# Greedy decoding (num_beams=1); generation stops at the second sentinel token (<extra_id_1>)
output_ids = model.generate(input_ids=encoded_input.input_ids, attention_mask=encoded_input.attention_mask,
                            eos_token_id=tokenizer.additional_special_tokens_ids[1], num_beams=1, max_length=512, min_length=3)
tokenizer.decode(output_ids[0])
```
|
The generated answer is then `"<pad><extra_id_0> 2009<extra_id_1>"`, while the one generated by the original [T5-v1.1-large](https://huggingface.co/google/t5-v1_1-large) is `"<pad><extra_id_0> On January 20, 2009<extra_id_1>"` - a correct yet non-extractive answer. |
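
To recover the plain answer string, one convenient option (assuming the output follows the pattern above) is to decode while skipping the special tokens:

```python
# Strip the pad and sentinel tokens, keeping only the predicted span.
answer = tokenizer.decode(output_ids[0], skip_special_tokens=True).strip()
print(answer)  # '2009'
```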
|
|
|
### Limitations and bias |
|
Although using the model with greedy decoding tends to produce extractive outputs, it may sometimes produce non-extractive ones, whether differing only in casing or being an entirely different string (or substring) that may carry a different semantic meaning.
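
If extractiveness matters for your application, a simple (though somewhat naive) sanity check is to verify that the decoded answer appears verbatim in the passage, for example:

```python
# Naive extractiveness check: is the decoded answer a verbatim substring of the passage?
# Casing differences or paraphrases (the failure modes mentioned above) fail this test.
def is_extractive(answer: str, passage: str) -> bool:
    return answer.strip() in passage

is_extractive('2009', passage)  # True: '2009' appears verbatim in the passage
```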
|
|
|
### Pretraining |
|
The model was finetuned on 100,000 RSS examples for 3 epochs, using the Adafactor optimizer with a constant learning rate of 5e-5.
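
For reference, the corresponding optimizer setup in `transformers` would look roughly like this (the exact training script is not included in this card, so the flags below are an assumption rather than the original configuration):

```python
from transformers.optimization import Adafactor

# Adafactor with a constant learning rate of 5e-5, as described above.
# Disabling relative_step/scale_parameter is assumed here so that the fixed lr is actually used.
optimizer = Adafactor(
    model.parameters(),
    lr=5e-5,
    relative_step=False,
    scale_parameter=False,
    warmup_init=False,
)
```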
|
|
|
## Evaluation results |
|
Evaluated on the few-shot QA benchmark in a zero-shot setting (no finetuning on annotated examples); all numbers are F1 scores:
|
|
|
|Model \ Dataset| SQuAD |TriviaQA | NaturalQs | NewsQA | SearchQA | HotpotQA | BioASQ | TextbookQA|
|:-------------:|:-----:|:-------:|:---------:|:------:|:--------:|:--------:|:------:|:---------:|
|T5             | 50.4  | 61.7    | 42.1      | 19.2   | 24.0     | 43.3     | 55.5   | 17.8      |
|T5-rss         | 71.4  | 69.3    | 57.2      | 43.2   | 29.7     | 59.0     | 65.5   | 39.0      |
|
|
|
The gap between the two models diminishes as more training examples are introduced; for additional results, see the [paper](https://arxiv.org/abs/2108.05857).
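
A simplified, SQuAD-style token-level F1 for a single prediction/gold pair can be computed as follows (the official evaluation additionally normalizes answers and takes the maximum over multiple gold references):

```python
from collections import Counter

def token_f1(prediction: str, gold: str) -> float:
    """Simplified SQuAD-style token-level F1 between a predicted and a gold answer."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

token_f1("On January 20, 2009", "2009")  # partial credit for overlapping tokens
```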
|
|
|
### BibTeX entry and citation info |
|
```bibtex
@inproceedings{ram-etal-2021-shot,
    title = "Few-Shot Question Answering by Pretraining Span Selection",
    author = "Ram, Ori  and
      Kirstain, Yuval  and
      Berant, Jonathan  and
      Globerson, Amir  and
      Levy, Omer",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.acl-long.239",
    doi = "10.18653/v1/2021.acl-long.239",
    pages = "3066--3079",
}

@misc{castel2021optimal,
    title={How Optimal is Greedy Decoding for Extractive Question Answering?},
    author={Or Castel and Ori Ram and Avia Efrat and Omer Levy},
    year={2021},
    eprint={2108.05857},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
|
|