|
--- |
|
language: en |
|
datasets: |
|
- c4 |
|
- wikipedia |
|
metrics: |
|
- f1 |
|
--- |
|
|
|
# T5-V1.1-large-rss |
|
This model is [T5-v1.1-large](https://huggingface.co/google/t5-v1_1-large) finetuned on Recurring Span Selection (RSS) data. The model was finetuned as part of ["How Optimal is Greedy Decoding for Extractive Question Answering?"](https://arxiv.org/abs/2108.05857), while the RSS pretraining method was introduced in [this paper](https://arxiv.org/pdf/2101.00438.pdf).
|
|
|
## Model description |
|
The original [T5-v1.1-large](https://huggingface.co/google/t5-v1_1-large) was pre-trained only on C4, without any supervised training. Our version is further trained with the Recurring Span Selection (RSS) scheme, using a sample from the dataset used to pretrain [Splinter](https://huggingface.co/tau/splinter-large):
|
* contexts with a span occurring more than once are detected |
|
* a single instance of the recurring span is masked
|
* the model is trained (teacher forcing) to predict the masked span |
|
This training scheme naturally matches the extractive question answering task. |
|
|
|
During training, the masked span is replaced with `<extra_id_0>` and the labels are formatted as `<extra_id_0>span<extra_id_1>`. Unlike [Splinter](https://huggingface.co/tau/splinter-large), only one span is masked at a time.
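
For intuition, a single RSS training pair could look roughly like the following (the context, span, and string manipulation below are purely illustrative and are not the actual preprocessing code used for this checkpoint):

```python
# Illustrative sketch of an RSS training pair (hypothetical example,
# not the actual preprocessing pipeline used for this checkpoint).
context = "Newton was born in 1643. In 1687, Newton published the Principia."
recurring_span = "Newton"  # a span that occurs more than once in the context

# Mask a single occurrence of the recurring span with the first sentinel token.
source = context.replace(recurring_span, "<extra_id_0>", 1)
# -> "<extra_id_0> was born in 1643. In 1687, Newton published the Principia."

# The label wraps the masked span between the first two sentinel tokens.
target = f"<extra_id_0>{recurring_span}<extra_id_1>"
```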
|
|
|
## Intended uses & limitations |
|
This model naturally fits tasks where a span from a context is intended to be copied, like extractive question answering. |
|
This checkpoint is primarily intended for use in a zero-shot setting: further finetuning it on an annotated dataset yields results equal to those of the original T5-v1.1-large.
|
|
|
### How to use |
|
You can use this model directly, but it is recommended to format the input to match the training scheme, as a text-question prompt:
|
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the RSS-finetuned checkpoint and its tokenizer
model = AutoModelForSeq2SeqLM.from_pretrained('tau/t5-v1_1-large-rss')
tokenizer = AutoTokenizer.from_pretrained('tau/t5-v1_1-large-rss')

passage = 'Barack Hussein Obama II is an American politician and attorney who served as the 44th president of the United States from 2009 to 2017. '
question = 'When was Obama inaugurated?'

# Format the prompt as Text / Question / Answer, ending with the first sentinel token (<extra_id_0>)
text = f'Text: {passage}.\nQuestion: {question}\nAnswer:{tokenizer.additional_special_tokens[0]}.'
encoded_input = tokenizer(text, return_tensors='pt')

# Greedy decoding (num_beams=1); generation stops at the second sentinel token (<extra_id_1>)
output_ids = model.generate(input_ids=encoded_input.input_ids, attention_mask=encoded_input.attention_mask,
                            eos_token_id=tokenizer.additional_special_tokens_ids[1], num_beams=1, max_length=512, min_length=3)
tokenizer.decode(output_ids[0])
```
|
The generated answer is then `"<pad><extra_id_0> 2009<extra_id_1>"`, while the one generated by the original [T5-v1.1-large](https://huggingface.co/google/t5-v1_1-large) is `"<pad><extra_id_0> On January 20, 2009<extra_id_1>"` - a correct yet non-extractive answer. |
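
To recover the plain answer string, one convenient option (assuming the output follows the pattern above) is to decode while skipping the special tokens:

```python
# Strip the pad and sentinel tokens, keeping only the predicted span.
answer = tokenizer.decode(output_ids[0], skip_special_tokens=True).strip()
print(answer)  # '2009'
```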
|
|
|
### Limitations and bias |
|
Although using the model with greedy decoding tends to produce extractive outputs, it may sometimes produce non-extractive ones, whether differing only in casing or being an entirely different string (or substring) that may carry a different semantic meaning.
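
If extractiveness matters for your application, a simple (though somewhat naive) sanity check is to verify that the decoded answer appears verbatim in the passage, for example:

```python
# Naive extractiveness check: is the decoded answer a verbatim substring of the passage?
# Casing differences or paraphrases (the failure modes mentioned above) fail this test.
def is_extractive(answer: str, passage: str) -> bool:
    return answer.strip() in passage

is_extractive('2009', passage)  # True: '2009' appears verbatim in the passage
```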
|
|
|
### Pretraining |
|
The model was finetuned on 100,000 RSS examples for 3 epochs, using the Adafactor optimizer with a constant learning rate of 5e-5.
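
For reference, the corresponding optimizer setup in `transformers` would look roughly like this (the exact training script is not included in this card, so the flags below are an assumption rather than the original configuration):

```python
from transformers.optimization import Adafactor

# Adafactor with a constant learning rate of 5e-5, as described above.
# Disabling relative_step/scale_parameter is assumed here so that the fixed lr is actually used.
optimizer = Adafactor(
    model.parameters(),
    lr=5e-5,
    relative_step=False,
    scale_parameter=False,
    warmup_init=False,
)
```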
|
|
|
## Evaluation results |
|
Evaluated on the few-shot QA benchmark in a zero-shot setting (no finetuning on annotated examples); all numbers are F1 scores:
|
|
|
|Model \ Dataset| SQuAD |TriviaQA | NaturalQs | NewsQA | SearchQA | HotpotQA | BioASQ | TextbookQA|
|:-------------:|:-----:|:-------:|:---------:|:------:|:--------:|:--------:|:------:|:---------:|
|T5             | 50.4  | 61.7    | 42.1      | 19.2   | 24.0     | 43.3     | 55.5   | 17.8      |
|T5-rss         | 71.4  | 69.3    | 57.2      | 43.2   | 29.7     | 59.0     | 65.5   | 39.0      |
|
|
|
The gap between the two models diminishes as more training examples are introduced; for additional results, see the [paper](https://arxiv.org/abs/2108.05857).
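
A simplified, SQuAD-style token-level F1 for a single prediction/gold pair can be computed as follows (the official evaluation additionally normalizes answers and takes the maximum over multiple gold references):

```python
from collections import Counter

def token_f1(prediction: str, gold: str) -> float:
    """Simplified SQuAD-style token-level F1 between a predicted and a gold answer."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

token_f1("On January 20, 2009", "2009")  # partial credit for overlapping tokens
```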
|
|
|
### BibTeX entry and citation info |
|
```bibtex
@inproceedings{ram-etal-2021-shot,
    title = "Few-Shot Question Answering by Pretraining Span Selection",
    author = "Ram, Ori  and
      Kirstain, Yuval  and
      Berant, Jonathan  and
      Globerson, Amir  and
      Levy, Omer",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.acl-long.239",
    doi = "10.18653/v1/2021.acl-long.239",
    pages = "3066--3079",
}

@misc{castel2021optimal,
    title={How Optimal is Greedy Decoding for Extractive Question Answering?},
    author={Or Castel and Ori Ram and Avia Efrat and Omer Levy},
    year={2021},
    eprint={2108.05857},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
|
|