Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs
Paper • 2402.12030 • Published
The ULD loss, based on optimal transport, enables distillation across different LLM families without requiring shared tokenizers.
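To make the optimal-transport idea concrete, here is a minimal sketch of the kind of per-position distance such a loss can use. It assumes the distance is computed between the student's and teacher's next-token probabilities after sorting each in descending order and zero-padding the shorter vector, so no token-to-token mapping between vocabularies is needed. The function names (`softmax`, `sorted_prob_distance`, `uld_style_term`) are illustrative, not taken from the paper's code. Sequence alignment between the two tokenizations, and any cross-entropy term combined with this distance, are left out.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sorted_prob_distance(student_logits, teacher_logits):
    """L1 distance between descending-sorted, zero-padded probability vectors.

    Sorting removes any dependence on token identity, so the two vocabularies
    may differ in both size and content (assumption: this sorted comparison
    stands in for the transport distance between the two distributions).
    """
    p = sorted(softmax(student_logits), reverse=True)
    q = sorted(softmax(teacher_logits), reverse=True)
    size = max(len(p), len(q))
    p += [0.0] * (size - len(p))  # pad the smaller vocabulary with zeros
    q += [0.0] * (size - len(q))
    return sum(abs(a - b) for a, b in zip(p, q))


def uld_style_term(student_steps, teacher_steps):
    """Average the per-position distance over the shared number of positions.

    Truncating to the shorter sequence is a simplification made for this
    sketch; the two tokenizers generally produce different sequence lengths.
    """
    n = min(len(student_steps), len(teacher_steps))
    if n == 0:
        return 0.0
    pairs = zip(student_steps[:n], teacher_steps[:n])
    return sum(sorted_prob_distance(s, t) for s, t in pairs) / n


if __name__ == "__main__":
    # Student vocabulary of size 3, teacher vocabulary of size 5.
    student = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
    teacher = [[1.5, 1.0, 0.0, -0.5, -2.0], [0.0, 0.0, 0.0, 0.0, 0.0]]
    print(uld_style_term(student, teacher))
```

Because the vectors are sorted before comparison, the distance is zero whenever the two distributions have the same shape, regardless of which tokens carry the mass. That is what lets it be computed across tokenizers, at the cost of ignoring token identity.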