Nicolas-BZRD's Collections

LLMs Distillation

The Universal Logit Distillation (ULD) loss, based on optimal transport, enables distillation between LLMs from different families without requiring a shared tokenizer.
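As a rough illustration of why an optimal-transport style comparison does not need a shared vocabulary, the sketch below compares a student and a teacher distribution at a single token position. It assumes a simple closed form in which both probability vectors are sorted in descending order, the shorter one is zero-padded, and the absolute differences are summed. The function names are hypothetical, and this is not the reference implementation of the loss.

```python
import math


def softmax(logits):
    """Turn raw logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sorted_probability_distance(student_logits, teacher_logits):
    """Distance between two next-token distributions with different vocabularies.

    Assumption: sorting both distributions removes any dependence on token
    identity, so the two vocabularies never have to be aligned.
    """
    student = sorted(softmax(student_logits), reverse=True)
    teacher = sorted(softmax(teacher_logits), reverse=True)

    # Zero-pad the smaller vocabulary so both vectors have the same length.
    size = max(len(student), len(teacher))
    student += [0.0] * (size - len(student))
    teacher += [0.0] * (size - len(teacher))

    # Sum of absolute differences between rank-matched probabilities.
    return sum(abs(s - t) for s, t in zip(student, teacher))


# Student vocabulary of 3 tokens, teacher vocabulary of 5 tokens.
distance = sorted_probability_distance([2.0, 0.5, -1.0], [1.5, 1.0, 0.0, -0.5, -2.0])
print(distance)
```

Because only the sorted probability values are compared, the result is unchanged if either vocabulary is reordered, which is what allows the two models to use unrelated tokenizers in this sketch.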