<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>PyTorch × Transformers Journey</title>
<!-- Google Fonts -->
<link href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;600;800&family=Fira+Code:wght@400;600&display=swap" rel="stylesheet" />
<!-- Reveal.js core & dark theme base -->
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/reveal.js@5/dist/reset.css" />
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/reveal.js@5/dist/reveal.css" />
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/reveal.js@5/dist/theme/black.css" id="theme" />
<!-- Highlight.js -->
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/styles/github-dark.min.css" />
<!-- Animations -->
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/animate.css@4/animate.min.css" />
<style>
:root {
--accent-primary: #ee4c2c; /* PyTorch orange‑red */
--accent-secondary: #ffb347; /* lighter highlight */
--bg-gradient-start: #1b1b1b;
--bg-gradient-end: #242424;
}
html, body { font-family: 'Inter', sans-serif; }
.reveal .slides {
background: linear-gradient(135deg, var(--bg-gradient-start), var(--bg-gradient-end));
}
.reveal h1, .reveal h2, .reveal h3 { color: var(--accent-primary); font-weight: 800; letter-spacing: -0.5px; }
.reveal pre code { font-family: 'Fira Code', monospace; font-size: 0.75em; }
.reveal section img, .reveal section svg { border-radius: 1rem; box-shadow: 0 8px 22px rgba(0,0,0,0.4); }
.fragment.highlight-current-blue.visible { color: var(--accent-secondary) !important; }
/* slide-density patch */
.reveal h1 { font-size: 2.6rem; line-height: 1.1; }
.reveal h2 { font-size: 1.9rem; line-height: 1.15; }
.reveal h3 { font-size: 1.4rem; line-height: 1.2; }
.reveal p, .reveal li { font-size: 0.9rem; line-height: 1.35; }
.reveal pre code { font-size: 0.67em; }
@media (max-width: 1024px) { .reveal h1{font-size:2.2rem;} .reveal h2{font-size:1.6rem;} }
.reveal table td, .reveal table th { font-size: 0.85rem; padding: 4px 8px; }
</style>
</head>
<body>
<div class="reveal">
<div class="slides">
<!-- 1 · Opening -->
<section data-auto-animate>
<h1 class="animate__animated animate__fadeInDown">PyTorch × Transformers Journey</h1>
<h3 class="animate__animated animate__fadeInDown animate__delay-1s">Pythonicity, Autodiff &amp; Modularity in Modern AI</h3>
<p class="animate__animated animate__fadeInUp animate__delay-2s">Pablo Montalvo‑Leroux · ML Engineer @ Hugging Face</p>
</section>
<!-- 2 · 2016: Backprop & Birth Pangs -->
<section>
<h2>2016‑2018: Backprop &amp; Birth Pangs</h2>
<ul>
<li>Gradients were hand‑derived via the chain rule; frameworks such as Theano and CNTK appeared, then vanished.</li>
<li>MLPs → RNNs → LSTMs — until <strong>BERT</strong> detonated the field in 2018.</li>
<li class="fragment">Reproducibility was painful ✗ — until Transformers met PyTorch ✓.</li>
</ul>
</section>
<!-- 3 · Static vs Dynamic Graphs -->
<section>
<h2>Static vs Dynamic Graphs</h2>
<p class="fragment">Static graphs require you to compile, wait, and cross fingers the bug reproduces.</p>
<p class="fragment">Dynamic graphs mean you can drop <code>pdb.set_trace()</code> anywhere and continue iterating.</p>
<p class="fragment"><code>torch.compile</code> gives the best of both worlds: write dynamically, ship something ahead‑of‑time optimised.</p>
</section>
<!-- 4 · Dynamic Graphs Enabled Contribution -->
<section>
<h2>Dynamic Graphs Enabled Contribution</h2>
<ul>
<li>Developers debug at line‑rate — no cold‑start recompiles.</li>
<li>Pull requests could be reproduced overnight, which built trust quickly.</li>
<li>Static‑graph alternatives stalled and the community consolidated around PyTorch.</li>
</ul>
</section>
<!-- 5 · Paper Tonight → Tweak Tomorrow -->
<section>
<h2>Clone the Paper Tonight → Tweak Tomorrow</h2>
<p>Research cadence is measured in <strong>hours</strong>; any friction kills momentum.</p>
<ul>
<li class="fragment">2018: BERT fine‑tuning required printing tensors live rather than recompiling graphs.</li>
<li class="fragment">Community PRs merged overnight — credibility snowballed for both PyTorch and Transformers.</li>
</ul>
</section>
<!-- 6 · One Model · One File -->
<section>
<h2>“One Model · One File” — Why it Matters</h2>
<pre><code class="language-python" data-trim data-noescape>
# modeling_bert.py — single source of truth 🗄️
class BertConfig(PretrainedConfig):
    ...

class BertSelfAttention(nn.Module):
    ...

class BertLayer(nn.Module):
    ...

class BertModel(PreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.embeddings = BertEmbeddings(config)
        self.encoder = nn.ModuleList(
            [BertLayer(config) for _ in range(config.num_hidden_layers)]
        )
        self.init_weights()
</code></pre>
<ul>
<li>All layers, forward pass, and <code>from_pretrained()</code> logic live together.</li>
<li>No cross‑file inheritance maze — copy to Colab, hack, and run.</li>
<li>Reviewers diff one file; merge time dropped from days to hours.</li>
</ul>
</section>
<!-- 7 · Transformers Grew With Python -->
<section>
<h2>Transformers Grew with Python</h2>
<ul>
<li>The library prioritises hackability, which in turn accelerates adoption.</li>
<li>Python is slow by default, so we lean on compiled CUDA kernels and Triton for raw speed.</li>
<li>The new <strong>Kernel Hub</strong> means Transformers automatically uses a faster op the moment it is published — no application changes required.</li>
</ul>
</section>
<!-- 8 · Back to Python: Mary Shelley Mode -->
<section>
<h2>Back to Python: Modular “Mary Shelley” Mode</h2>
<p>Compose new blocks via subclassing and selective override.</p>
<pre><code class="language-python" data-trim data-noescape>
class LlamaRotaryLoRA(LlamaAttention):
    def __init__(...):
        super().__init__(...)
        self.q_proj = LoRA(self.q_proj)  # swap in LoRA
        self.apply_rotary()              # keep RoPE
</code></pre>
</section>
<!-- 9 · Logit Debugger -->
<section>
<h2>Logit Debugger: Trust but Verify</h2>
<ul>
<li>Attach a hook to every <code>nn.Module</code>; dump logits layer‑by‑layer.</li>
<li>Spot ε‑level drifts — LayerNorm precision, FP16 underflow, etc.</li>
<li>JSON traces are diffable in CI, so regressions are caught automatically.</li>
</ul>
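<p class="fragment">A minimal sketch of the idea with forward hooks (<code>model</code> and <code>inputs</code> are assumed to be an already‑loaded module and batch):</p>
<pre><code class="language-python" data-trim data-noescape>
import json
import torch

traces = {}

def make_hook(name):
    def hook(module, args, output):
        # Record a cheap per-layer fingerprint that CI can diff.
        if isinstance(output, torch.Tensor):
            traces[name] = {"mean": output.float().mean().item(),
                            "std": output.float().std().item()}
    return hook

for name, module in model.named_modules():
    module.register_forward_hook(make_hook(name))

model(**inputs)                      # one forward pass fills `traces`
print(json.dumps(traces, indent=2))  # layer-by-layer, diffable dump
</code></pre>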
</section>
<!-- 10 · DTensor & TP API -->
<section>
<h2>DTensor & Tensor‑Parallel API</h2>
<ul>
<li>Logical tensor views unlock device‑mesh sharding.</li>
<li>The <code>tp_plan</code> JSON keeps model code pristine and declarative.</li>
<li>We regularly validate 100‑billion‑parameter checkpoints inside HF test infra.</li>
</ul>
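<p class="fragment">A minimal DTensor sketch (assumes 8 GPUs launched via <code>torchrun</code>; uses the public DTensor API of recent PyTorch releases):</p>
<pre><code class="language-python" data-trim data-noescape>
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import distribute_tensor, Shard

# One logical weight, physically sharded column-wise across an 8-GPU mesh.
mesh = init_device_mesh("cuda", (8,))
weight = torch.randn(4096, 4096)
sharded = distribute_tensor(weight, mesh, placements=[Shard(1)])
print(sharded.shape)  # logical shape stays (4096, 4096); each rank holds a slice
</code></pre>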
<img data-src="assets/mesh.svg" alt="Device mesh" />
</section>
<!-- 11 · Zero‑Config Parallelism -->
<section>
<h2>Zero‑Config Parallelism</h2>
<pre><code class="language-json" data-trim data-noescape>{
"layer.*.self_attn.q_proj": "colwise",
"layer.*.self_attn.k_proj": "colwise",
"layer.*.self_attn.v_proj": "colwise",
"layer.*.self_attn.o_proj": "rowwise"
}</code></pre>
<pre><code class="language-python" data-trim data-noescape>
from torch.distributed.tensor.parallel import ColwiseParallel, RowwiseParallel

def translate_to_torch_parallel_style(style: str):
    if style == "colwise":
        return ColwiseParallel()
    elif style == "rowwise":
        return RowwiseParallel()
</code></pre>
<p class="fragment">One JSON file loads a 17‑billion‑parameter Llama‑4 on 8 GPUs; tweak the plan, not the network.</p>
</section>
<!-- 12 · Cache Allocator -->
<section>
<h2>Load Faster &amp; Stronger: Cache Allocator</h2>
<p>Zero‑copy weight sharding shaves <strong>15 %</strong> VRAM on A100 while cutting load time below 60 s for a 100‑B model.</p>
<img data-src="assets/memory_bars.svg" alt="Memory bars" />
</section>
<!-- 13 · Modular Transformers: GLM Example -->
<section>
<h2>Modular Transformers: GLM by Example</h2>
<pre><code class="language-python" data-trim>
class GlmMLP(Phi3MLP):
    pass

class GlmAttention(LlamaAttention):
    def __init__(self, config, layer_idx=None):
        super().__init__(config, layer_idx)
        self.o_proj = nn.Linear(
            config.num_attention_heads * self.head_dim,
            config.hidden_size,
            bias=False,
        )

def apply_rotary_pos_emb(q, k, cos, sin, position_ids=None, unsqueeze_dim=1):
    # Slightly different RoPE
    ...

class GlmForCausalLM(LlamaForCausalLM):
    pass
</code></pre>
<p>AST magic expands this 40‑line prototype into a full modelling file, ready for training.</p>
</section>
<!-- 14 · Rise of Multimodality -->
<section>
<h2>Rise of Multimodality</h2>
<pre><code class="language-python" data-trim data-noescape>
processor = AutoProcessor.from_pretrained("Qwen/Qwen3-8B")
model = AutoModelForConditionalGeneration.from_pretrained("Qwen/Qwen3-8B")
</code></pre>
<p class="fragment">Same API across text, vision, and audio: learn once, apply everywhere.</p>
</section>
<!-- 15 · Why Python wins -->
<section>
<h2>Why Python Wins</h2>
<ul>
<li>Low entry barrier attracts newcomers and domain specialists alike.</li>
<li>High‑level semantics concisely express low‑level intent.</li>
<li>The C++/Rust back‑end remains accessible for critical paths.</li>
</ul>
</section>
<!-- 16 · Where Python can bite -->
<section>
<h2>Where Python can bite 🐍</h2>
<ul>
<li class="fragment">Interpreter overhead hurts microkernels (token‑by‑token decoding).</li>
<li class="fragment">The GIL throttles concurrent host‑side work.</li>
<li class="fragment">Fresh research code is easy to leave unoptimised.</li>
</ul>
<p class="fragment">Mitigations: Triton, compiled custom ops, compile‑time fallbacks, and callable kernels.</p>
</section>
<!-- 17 · Kernel Hub -->
<section>
<h2>Kernel Hub: Optimised Ops from the Community</h2>
<p>Kernel Hub lets any Python program <em>download and hot‑load</em> compiled CUDA/C++ kernels directly from the Hugging Face Hub at runtime.</p>
<ul>
<li><strong>Portable</strong> – kernels work from arbitrary paths outside <code>PYTHONPATH</code>.</li>
<li><strong>Unique</strong> – load multiple versions of the same op side‑by‑side in one process.</li>
<li><strong>Compatible</strong> – every kernel targets all recent PyTorch wheels (CUDA, ROCm, CPU) and C‑library ABIs.</li>
</ul>
<p class="fragment">🚀 <strong>Quick start</strong> (requires <code>torch >= 2.5</code>):</p>
<pre><code class="language-bash" data-trim>pip install kernels</code></pre>
<pre><code class="language-python" data-trim data-noescape>
import torch
from kernels import get_kernel
# Download optimised kernels from the Hugging Face Hub
activation = get_kernel("kernels-community/activation")
x = torch.randn(10, 10, dtype=torch.float16, device="cuda")
y = torch.empty_like(x)
activation.gelu_fast(y, x)
print(y)
</code></pre>
<p class="fragment">Same Transformer code — now with a <strong>3× faster</strong> GELU on A100s.</p>
</section>
<!-- 18 · API design lessons -->
<section>
<h2>API Design Lessons</h2>
<ul>
<li>Make easy things obvious, and hard things merely possible.</li>
<li>Keep the paper‑to‑repository delta minimal for new models.</li>
<li>Hide sharding mechanics; expose developer intent.</li>
</ul>
<p class="fragment">We tune radios without learning RF theory — ML frameworks should feel as frictionless.</p>
</section>
<!-- 19 · Model Growth by Modality -->
<section>
<h2>Model Growth by Modality</h2>
<iframe src="model_growth.html" width="100%" height="600" style="border:none;"></iframe>
</section>
<!-- 20 · Takeaways -->
<section>
<h2>Takeaways &amp; The Future</h2>
<ul>
<li>PyTorch and <code>transformers</code> have grown symbiotically for eight years—expect the spiral to continue.</li>
<li>Pythonicity plus pragmatism keeps the barrier to innovation low.</li>
<li>Open‑source models are shipping faster, larger, and more multimodal than ever.</li>
</ul>
<p><a href="https://huggingface.co/transformers/contribute" target="_blank">hf.co/transformers/contribute</a></p>
</section>
</div>
</div>
<!-- Reveal.js core -->
<script src="https://cdn.jsdelivr.net/npm/reveal.js@5/dist/reveal.js"></script>
<script src="https://cdn.jsdelivr.net/npm/reveal.js@5/plugin/highlight/highlight.js"></script>
<script src="https://cdn.jsdelivr.net/npm/reveal.js@5/plugin/notes/notes.js"></script>
<!-- Plotly for interactive charts -->
<script src="https://cdn.plot.ly/plotly-2.31.1.min.js"></script>
<script>
Reveal.initialize({
hash: true,
slideNumber: true,
transition: 'slide',
backgroundTransition: 'convex',
plugins: [ RevealHighlight, RevealNotes ]
});
</script>
</body>
</html>