Pablo Montalvo‑Leroux · ML Engineer @ Hugging Face
Static graphs and the existence of transformers are inversely correlated.
torch.compile ≈ sweet spot: author in dynamic, ship in static.
Research cadence ≈ hours; any friction kills momentum.
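A minimal sketch of that sweet spot, assuming nothing beyond stock PyTorch (function and shapes are illustrative): author the code eagerly, compile it once, and later calls reuse the captured graph.

import torch

def f(x):
    # ordinary dynamic Python, debuggable line by line
    return torch.nn.functional.gelu(x) * x

compiled_f = torch.compile(f)           # traced and specialized on first call
out = compiled_f(torch.randn(8, 512))   # subsequent calls hit the static graph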
class BertEmbeddings(nn.Module):
    ...

class BertModel(BertPreTrainedModel):
    ...
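Even with the whole architecture readable in one file, loading stays a single call (checkpoint name illustrative):

from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")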
Atomic PRs → faster reviews → community velocity.
Compose new blocks via subclass & override.
class LlamaRotaryLoRA(LlamaAttention):
    def __init__(self, ...):
        super().__init__(...)
        self.q_proj = LoRA(self.q_proj)  # swap in a low-rank-adapted projection
        self.apply_rotary()              # keep rotary embeddings from the parent
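LoRA is left undefined on the slide; a minimal sketch of what such a wrapper could look like (rank and scaling values are illustrative):

import torch
import torch.nn as nn

class LoRA(nn.Module):
    """Wrap a frozen linear layer with a trainable low-rank update."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)              # freeze the original weights
        self.A = nn.Linear(base.in_features, r, bias=False)
        self.B = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.B.weight)                       # start as a no-op
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.B(self.A(x)) * self.scaling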
tp_plan keeps module code intact · 0‑copy weight partitioning · 15 % RAM cut on A100
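Opting in is a single argument, per the transformers tensor-parallel docs (checkpoint name and world size here are illustrative):

import torch
from transformers import AutoModelForCausalLM

# launch with: torchrun --nproc-per-node 4 run.py
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    torch_dtype=torch.bfloat16,
    tp_plan="auto",   # shard weights across ranks; the modeling code is untouched
)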
processor = AutoProcessor.from_pretrained("Qwen/Qwen3-8B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B")
Same API across text · vision · audio.
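The same two calls cover a vision-language checkpoint (model id used here only as an illustration):

from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")
model = AutoModelForImageTextToText.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")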
Mitigations for kernel friction: Triton, compiled custom ops, compile‑time fallbacks, and callable kernels!
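For flavor, a minimal Triton elementwise kernel; names and block size are illustrative, not from the talk:

import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n                                  # guard the ragged tail
    x = tl.load(x_ptr + offs, mask=mask)
    y = tl.load(y_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, x + y, mask=mask)

def add(x, y):
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)                   # one program per block of 1024
    add_kernel[grid](x, y, out, n, BLOCK=1024)
    return out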
New initiative
https://huggingface.co/kernels-community
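Consuming a hub kernel is meant to be a couple of lines; this sketch follows the kernels package README, using the kernels-community/activation repo and its gelu_fast op:

import torch
from kernels import get_kernel

# fetch a pre-compiled kernel straight from the Hub, no local build step
activation = get_kernel("kernels-community/activation")

x = torch.randn((16, 16), dtype=torch.float16, device="cuda")
y = torch.empty_like(x)
activation.gelu_fast(y, x)   # call it like any Python function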
We want to facilitate adoption. How does a radio work? Would you know how to tune it?
How does a computer work? Should you need to know how it works to navigate the web?