Results on Wikitext-2 with GPT2 don't match paper
#1 by brucardoso2 - opened
Hey, I tested the example code and compared it against the results reported at https://huggingface.co/docs/transformers/perplexity, using `gpt2` and `wikitext-2-raw-v1`.
- The values reported in that post range from 16.44 to 19.64 (depending on the stride size)
- The value achieved using this library is 546.62

The difference is quite large. Am I missing something?
Code:

```python
import datasets
import evaluate

# Load the raw Wikitext-2 test split and drop empty lines.
input_texts = datasets.load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"]
input_texts = [s for s in input_texts if s != '']

perplexity = evaluate.load("perplexity", module_type="measurement")
results = perplexity.compute(model_id='gpt2', data=input_texts)
print(results['mean_perplexity'])