<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>PyTorch × Transformers Journey</title>
<!-- Google Fonts -->
<link href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;600;800&family=Fira+Code:wght@400;600&display=swap" rel="stylesheet" />
<!-- Reveal.js core & dark theme base -->
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/reveal.js@5/dist/reset.css" />
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/reveal.js@5/dist/reveal.css" />
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/reveal.js@5/dist/theme/black.css" id="theme" />
<!-- Highlight.js -->
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/highlight.js@11/styles/github-dark.min.css" />
<!-- Animations -->
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/animate.css@4/animate.min.css" />
<style>
:root {
  --accent-primary: #ee4c2c;   /* PyTorch orange‑red */
  --accent-secondary: #ffb347; /* lighter highlight */
  --bg-gradient-start: #1b1b1b;
  --bg-gradient-end: #242424;
}
html, body { font-family: 'Inter', sans-serif; }
.reveal .slides {
  background: linear-gradient(135deg, var(--bg-gradient-start), var(--bg-gradient-end));
}
.reveal h1, .reveal h2, .reveal h3 { color: var(--accent-primary); font-weight: 800; letter-spacing: -0.5px; }
.reveal pre code { font-family: 'Fira Code', monospace; font-size: 0.75em; }
.reveal section img, .reveal section svg { border-radius: 1rem; box-shadow: 0 8px 22px rgba(0,0,0,0.4); }
.fragment.highlight-current-blue.visible { color: var(--accent-secondary); }
/* slide-density patch */
.reveal h1 { font-size: 2.6rem; line-height: 1.1; }
.reveal h2 { font-size: 1.9rem; line-height: 1.15; }
.reveal h3 { font-size: 1.4rem; line-height: 1.2; }
.reveal p, .reveal li { font-size: 0.9rem; line-height: 1.35; }
.reveal pre code { font-size: 0.67em; }
@media (max-width: 1024px) { .reveal h1{font-size:2.2rem;} .reveal h2{font-size:1.6rem;} }
.reveal table td, .reveal table th { font-size: 0.85rem; padding: 4px 8px; }
</style>
</head>
<body>
<div class="reveal">
<div class="slides">
<!-- 1 · Opening -->
<section data-auto-animate>
<h1 class="animate__animated animate__fadeInDown">PyTorch × Transformers Journey</h1>
<h3 class="animate__animated animate__fadeInDown animate__delay-1s">Pythonicity, Autodiff & Modularity in Modern AI</h3>
<p class="animate__animated animate__fadeInUp animate__delay-2s">Pablo Montalvo‑Leroux · ML Engineer @ Hugging Face</p>
</section>
<!-- 2 · 2016-2018: Backprop & Birth Pangs -->
<section>
<h2>2016‑2018: Backprop & Birth Pangs</h2>
<ul>
<li>Hand‑crafted chain‑rule gradients; frameworks such as Theano and CNTK appeared and then vanished.</li>
<li>MLPs → RNNs → LSTMs — until <strong>BERT</strong> detonated the field in 2018.</li>
<li class="fragment">Reproducibility was painful ✗ — until Transformers met PyTorch ✓.</li>
</ul>
</section>
<!-- 3 · Static vs Dynamic Graphs -->
<section>
<h2>Static vs Dynamic Graphs</h2>
<p class="fragment">Static graphs require you to compile, wait, and cross your fingers that the bug reproduces.</p>
<p class="fragment">Dynamic graphs mean you can drop <code>pdb.set_trace()</code> anywhere and continue iterating.</p>
<p class="fragment"><code>torch.compile</code> gives the best of both worlds: write dynamically, ship something ahead‑of‑time optimised.</p>
</section>
<!-- 4 · Dynamic Graphs Enabled Contribution -->
<section>
<h2>Dynamic Graphs Enabled Contribution</h2>
<ul>
<li>Developers debug at line‑rate — no cold‑start recompiles.</li>
<li>Pull‑requests remained reproducible overnight, which accelerated trust.</li>
<li>Static‑graph alternatives stalled and the community consolidated around PyTorch.</li>
</ul>
</section>
<!-- 5 · Paper Tonight → Tweak Tomorrow -->
<section>
<h2>Clone the Paper Tonight → Tweak Tomorrow</h2>
<p>Research cadence is measured in <strong>hours</strong>; any friction kills momentum.</p>
<ul>
<li class="fragment">2018: fine‑tuning BERT meant printing tensors live instead of recompiling a static graph.</li>
<li class="fragment">Community PRs merged overnight — credibility snowballed for both PyTorch and Transformers.</li>
</ul>
</section>
<!-- 6 · One Model · One File -->
<section>
<h2>“One Model · One File” — Why it Matters</h2>
<pre><code class="language-python" data-trim data-noescape>
# modeling_bert.py — single source of truth 🗄️
class BertConfig(PretrainedConfig):
    ...

class BertSelfAttention(nn.Module):
    ...

class BertLayer(nn.Module):
    ...

class BertModel(PreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.embeddings = BertEmbeddings(config)
        self.encoder = nn.ModuleList(
            [BertLayer(config) for _ in range(config.num_hidden_layers)]
        )
        self.init_weights()
</code></pre>
<ul>
<li>All layers, the forward pass, and the <code>from_pretrained()</code> logic live together (usage sketch below).</li>
<li>No cross‑file inheritance maze — copy to Colab, hack, and run.</li>
<li>Reviewers diff one file; merge time dropped from days to hours.</li>
</ul>
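<p class="fragment">A quick usage sketch of that file's public surface (a standard public checkpoint assumed here):</p>
<pre><code class="language-python" data-trim data-noescape>
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("One model, one file.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, seq_len, hidden_size)
</code></pre>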
</section>
<!-- 7 · Transformers Grew With Python -->
<section>
<h2>Transformers Grew with Python</h2>
<ul>
<li>The library prioritises hackability, which in turn accelerates adoption.</li>
<li>Python is slow by default, so we lean on compiled CUDA kernels and Triton for raw speed.</li>
<li>The new <strong>Kernel Hub</strong> means Transformers automatically uses a faster op the moment it is published — no application changes required.</li>
</ul>
</section>
<!-- 8 · Back to Python: Mary Shelley Mode -->
<section>
<h2>Back to Python: Modular “Mary Shelley” Mode</h2>
<p>Compose new blocks via subclassing and selective override.</p>
<pre><code class="language-python" data-trim data-noescape>
class LlamaRotaryLoRA(LlamaAttention):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.q_proj = LoRA(self.q_proj)  # swap in LoRA
        self.apply_rotary()              # keep RoPE
</code></pre>
</section>
<!-- 9 · Logit Debugger -->
<section>
<h2>Logit Debugger: Trust but Verify</h2>
<ul>
<li>Attach a hook to every <code>nn.Module</code>; dump logits layer‑by‑layer (sketch below).</li>
<li>Spot ε‑level drifts — LayerNorm precision, FP16 underflow, etc.</li>
<li>JSON traces are diffable in CI, so regressions are caught automatically.</li>
</ul>
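<p class="fragment">A minimal sketch of the idea with plain PyTorch forward hooks (helper name is illustrative, not the actual debugger API):</p>
<pre><code class="language-python" data-trim data-noescape>
import json

import torch
import torch.nn as nn

def attach_trace_hooks(model: nn.Module, trace: dict):
    """Record a summary statistic of every leaf module's output."""
    for name, module in model.named_modules():
        if not list(module.children()):  # leaf modules only
            def hook(mod, args, output, name=name):
                if isinstance(output, torch.Tensor):
                    trace[name] = output.detach().float().abs().mean().item()
            module.register_forward_hook(hook)

model = nn.Sequential(nn.Linear(16, 16), nn.LayerNorm(16), nn.Linear(16, 4))
trace = {}
attach_trace_hooks(model, trace)
model(torch.randn(2, 16))
print(json.dumps(trace, indent=2))  # diff this against a reference trace in CI
</code></pre>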
</section>
<!-- 10 · DTensor & TP API -->
<section>
<h2>DTensor & Tensor‑Parallel API</h2>
<ul>
<li>Logical tensor views unlock device‑mesh sharding.</li>
<li>The <code>tp_plan</code> JSON keeps model code pristine and declarative.</li>
<li>We regularly validate 100‑billion‑parameter checkpoints inside HF test infra.</li>
</ul>
<img data-src="assets/mesh.svg" alt="Device mesh" />
</section>
<!-- 11 · Zero‑Config Parallelism -->
<section>
<h2>Zero‑Config Parallelism</h2>
<pre><code class="language-json" data-trim data-noescape>{
  "layer.*.self_attn.q_proj": "colwise",
  "layer.*.self_attn.k_proj": "colwise",
  "layer.*.self_attn.v_proj": "colwise",
  "layer.*.self_attn.o_proj": "rowwise"
}</code></pre>
<pre><code class="language-python" data-trim data-noescape>
from torch.distributed.tensor.parallel import ColwiseParallel, RowwiseParallel

def translate_to_torch_parallel_style(style: str):
    if style == "colwise":
        return ColwiseParallel()
    elif style == "rowwise":
        return RowwiseParallel()
    raise ValueError(f"Unsupported tensor-parallel style: {style}")
</code></pre>
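<p class="fragment">On the user side, the plan is applied at load time. A sketch (the checkpoint id is only a stand-in; <code>tp_plan="auto"</code> picks the plan shipped with the model):</p>
<pre><code class="language-python" data-trim data-noescape>
# launched as: torchrun --nproc-per-node 8 tp_demo.py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # illustrative checkpoint
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, tp_plan="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("Tensor parallelism, one argument:", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
</code></pre>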
<p class="fragment">One JSON file loads a 17‑billion‑parameter Llama‑4 on 8 GPUs; tweak the plan, not the network.</p> | |
</section> | |
<!-- 12 · Cache Allocator --> | |
<section> | |
<h2>Load Faster & Stronger: Cache Allocator</h2> | |
<p>Zero‑copy weight sharding shaves <strong>15 %</strong> VRAM on A100 while cutting load time below 60 s for a 100‑B model.</p> | |
<img data-src="assets/memory_bars.svg" alt="Memory bars" /> | |
</section> | |
<!-- 13 · Modular Transformers: GLM Example -->
<section>
<h2>Modular Transformers: GLM by Example</h2>
<pre><code class="language-python" data-trim>
class GlmMLP(Phi3MLP):
    pass

class GlmAttention(LlamaAttention):
    def __init__(self, config, layer_idx=None):
        super().__init__(config, layer_idx)
        self.o_proj = nn.Linear(
            config.num_attention_heads * self.head_dim,
            config.hidden_size,
            bias=False,
        )

def apply_rotary_pos_emb(q, k, cos, sin, position_ids=None, unsqueeze_dim=1):
    # Slightly different RoPE
    ...

class GlmForCausalLM(LlamaForCausalLM):
    pass
</code></pre>
<p>AST magic expands this 40‑line prototype into a full modelling file, ready for training.</p>
</section>
<!-- 14 · Rise of Multimodality -->
<section>
<h2>Rise of Multimodality</h2>
<pre><code class="language-python" data-trim data-noescape>
processor = AutoProcessor.from_pretrained("Qwen/Qwen3-8B")
model = AutoModelForConditionalGeneration.from_pretrained("Qwen/Qwen3-8B")
</code></pre>
<p class="fragment">Same API across text, vision, and audio: learn once, apply everywhere.</p>
</section>
<!-- 15 · Why Python wins -->
<section>
<h2>Why Python Wins</h2>
<ul>
<li>Low entry barrier attracts newcomers and domain specialists alike.</li>
<li>High‑level semantics concisely express low‑level intent.</li>
<li>The C++/Rust back‑end remains accessible for critical paths.</li>
</ul>
</section>
<!-- 16 · Where Python can bite -->
<section>
<h2>Where Python can bite 🐍</h2>
<ul>
<li class="fragment">Interpreter overhead hurts microkernels (token‑by‑token decoding).</li>
<li class="fragment">The GIL throttles concurrent host‑side work.</li>
<li class="fragment">Fresh research code is easy to leave unoptimised.</li>
</ul>
<p class="fragment">Mitigations: Triton, compiled custom ops, compile‑time fallbacks, and callable kernels.</p>
</section>
<!-- 17 · Kernel Hub -->
<section>
<h2>Kernel Hub: Optimised Ops from the Community</h2>
<p>Kernel Hub lets any Python program <em>download and hot‑load</em> compiled CUDA/C++ kernels directly from the Hugging Face Hub at runtime.</p>
<ul>
<li><strong>Portable</strong> – kernels work from arbitrary paths outside <code>PYTHONPATH</code>.</li>
<li><strong>Unique</strong> – load multiple versions of the same op side‑by‑side in one process.</li>
<li><strong>Compatible</strong> – every kernel targets all recent PyTorch wheels (CUDA, ROCm, CPU) and C‑library ABIs.</li>
</ul>
<p class="fragment">🚀 <strong>Quick start</strong> (requires <code>torch >= 2.5</code>):</p>
<pre><code class="language-bash" data-trim>pip install kernels</code></pre>
<pre><code class="language-python" data-trim data-noescape>
import torch
from kernels import get_kernel

# Download optimised kernels from the Hugging Face Hub
activation = get_kernel("kernels-community/activation")

x = torch.randn(10, 10, dtype=torch.float16, device="cuda")
y = torch.empty_like(x)
activation.gelu_fast(y, x)
print(y)
</code></pre>
<p class="fragment">Same Transformer code — now with a <strong>3× faster</strong> GELU on A100s.</p>
</section>
<!-- 18 · API design lessons -->
<section>
<h2>API Design Lessons</h2>
<ul>
<li>Make easy things obvious, and hard things merely possible.</li>
<li>Keep the paper‑to‑repository delta minimal for new models.</li>
<li>Hide sharding mechanics; expose developer intent.</li>
</ul>
<p class="fragment">We tune radios without learning RF theory — ML frameworks should feel just as frictionless.</p>
</section>
<!-- 19 · Model Growth by Modality -->
<section>
<h2>Model Growth by Modality</h2>
<iframe src="model_growth.html" width="100%" height="600" style="border:none;"></iframe>
</section>
<!-- 20 · Takeaways -->
<section>
<h2>Takeaways & The Future</h2>
<ul>
<li>PyTorch and <code>transformers</code> have grown symbiotically for eight years — expect the spiral to continue.</li>
<li>Pythonicity plus pragmatism keeps the barrier to innovation low.</li>
<li>Open‑source models are shipping faster, larger, and more multimodal than ever.</li>
</ul>
<p><a href="https://huggingface.co/transformers/contribute" target="_blank">hf.co/transformers/contribute</a></p>
</section>
</div>
</div>
<!-- Reveal.js core -->
<script src="https://cdn.jsdelivr.net/npm/reveal.js@5/dist/reveal.js"></script>
<script src="https://cdn.jsdelivr.net/npm/reveal.js@5/plugin/highlight/highlight.js"></script>
<script src="https://cdn.jsdelivr.net/npm/reveal.js@5/plugin/notes/notes.js"></script>
<!-- Plotly for interactive charts -->
<script src="https://cdn.plot.ly/plotly-2.31.1.min.js"></script>
<script>
Reveal.initialize({
  hash: true,
  slideNumber: true,
  transition: 'slide',
  backgroundTransition: 'convex',
  plugins: [ RevealHighlight, RevealNotes ]
});
</script>
</body>
</html> | |