Training code for Tokenizer

#1 · opened by amazingvince

Awesome paper, and I love the idea of SuperBPE. I would like to try training my own SuperBPE tokenizer and was wondering whether you are planning to share the tokenizer training code. The site https://superbpe.github.io/ says it is coming soon. Any idea when that might be?

University of Washington org

Thanks for following up! The tokenizer training code has been released here: https://github.com/PythonNut/superbpe
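
For anyone landing here before digging into that repo: as a point of comparison, a conventional BPE tokenizer (with whitespace pretokenization, i.e. without SuperBPE's superword tokens that span multiple words) can be trained with the Hugging Face `tokenizers` library roughly as sketched below. This is only a minimal baseline sketch, not the SuperBPE procedure itself; the corpus path, vocabulary size, and special tokens are placeholders, and the actual SuperBPE training code is in the repo linked above.

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Plain BPE baseline: whitespace pretokenization means no merge can
# cross a word boundary, which is the restriction SuperBPE lifts.
tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

trainer = trainers.BpeTrainer(
    vocab_size=32_000,                        # placeholder vocab size
    special_tokens=["[UNK]", "<|endoftext|>"],  # placeholder specials
)

# "corpus.txt" is a placeholder path to your training text.
tokenizer.train(files=["corpus.txt"], trainer=trainer)
tokenizer.save("bpe_baseline.json")
```
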

alisawuffles changed discussion status to closed
