This is a [Unigram tokenizer](https://huggingface.co/course/chapter6/7?fw=pt) trained on the [Wikitext dataset](https://huggingface.co/datasets/wikitext). See the `train_unigram.py` script in this repository for how it was trained.
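
For readers new to Unigram tokenization, the following is a minimal sketch of how such a tokenizer picks a segmentation at inference time: among all ways to split a word into vocabulary tokens, it keeps the one with the highest product of token probabilities, found here with a Viterbi search. The vocabulary `TOY_VOCAB`, its probabilities, and the function name `unigram_segment` are hypothetical and chosen for illustration only; they are not taken from this tokenizer or from `train_unigram.py`.

```python
import math

# Hypothetical toy vocabulary (token -> probability). These values are made up
# for illustration and are NOT the vocabulary of the tokenizer in this repository.
TOY_VOCAB = {
    "h": 0.05, "u": 0.05, "g": 0.05, "s": 0.05,
    "hu": 0.10, "ug": 0.20, "gs": 0.05,
    "hug": 0.30, "hugs": 0.15,
}


def unigram_segment(word, vocab):
    """Return (tokens, log_prob) for the most probable segmentation of `word`."""
    n = len(word)
    # best_score[i]: best log-probability of any segmentation of word[:i]
    # best_start[i]: start index of the last token in that segmentation
    best_score = [-math.inf] * (n + 1)
    best_start = [None] * (n + 1)
    best_score[0] = 0.0

    for end in range(1, n + 1):
        for start in range(end):
            piece = word[start:end]
            if piece in vocab and best_score[start] > -math.inf:
                score = best_score[start] + math.log(vocab[piece])
                if score > best_score[end]:
                    best_score[end] = score
                    best_start[end] = start

    if best_score[n] == -math.inf:
        raise ValueError(f"cannot segment {word!r} with the given vocabulary")

    # Walk the back-pointers from the end of the word to recover the tokens.
    tokens = []
    end = n
    while end > 0:
        start = best_start[end]
        tokens.append(word[start:end])
        end = start
    tokens.reverse()
    return tokens, best_score[n]


if __name__ == "__main__":
    print(unigram_segment("hugs", TOY_VOCAB)[0])  # ['hugs']
    print(unigram_segment("ugh", TOY_VOCAB)[0])   # ['ug', 'h']
```

A real Unigram tokenizer typically also handles unknown characters and whitespace markers, which this sketch leaves out; here an unsegmentable word simply raises `ValueError`.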