SAELens
Commit 50eec2f · verified · 1 Parent(s): cd73048 · committed by ArthurConmyGDM

Clarify input and output site

Files changed (1): README.md (+16 −2)

README.md CHANGED
@@ -13,9 +13,23 @@ See our [landing page](https://huggingface.co/google/gemma-scope) for details on

 - `gemma-scope-`: See 1.
 - `2b-pt-`: These SAEs were trained on Gemma v2 2B base model.
- - `transcoders`: These SAEs are transcoders: they were trained to reconstruct the output of MLP sublayers from the input to the MLP sublayers: see https://arxiv.org/abs/2406.11944
+ - `transcoders`: These SAEs are transcoders: they were trained to reconstruct the output of MLP sublayers from the input to the MLP sublayers. For more details, see https://arxiv.org/abs/2406.11944 and the clarification below.
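For readers skimming past the paper: unlike a plain SAE, which reconstructs its own input, a transcoder maps one activation site to another. A minimal sketch of the forward pass, assuming the JumpReLU parametrization and the `W_enc`/`b_enc`/`W_dec`/`b_dec`/`threshold` parameter names used in the Gemma Scope releases (verify against the actual checkpoint files):

```python
import torch

def transcoder_forward(x, W_enc, b_enc, W_dec, b_dec, threshold):
    # x: MLP input taken just after the pre-MLP RMSNorm, shape (..., d_model)
    pre_acts = x @ W_enc + b_enc              # (..., d_sae)
    acts = pre_acts * (pre_acts > threshold)  # JumpReLU: keep only above-threshold units
    return acts @ W_dec + b_dec               # (..., d_model): predicted mlp_output

# smoke test with random weights (d_model=2304 matches Gemma 2 2B; 16384 = a 16k-width SAE)
d_model, d_sae = 2304, 16384
out = transcoder_forward(
    torch.randn(4, d_model),
    torch.randn(d_model, d_sae), torch.zeros(d_sae),
    torch.randn(d_sae, d_model), torch.zeros(d_model),
    torch.zeros(d_sae),
)
assert out.shape == (4, d_model)
```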
 
- # 3. Point of Contact
+ # 3. Transcoder Input/Output Clarification
+
+ There has been some discussion regarding the precise input and output points for these transcoders relative to the normalization layers in Gemma 2.
+
+ As detailed in the Gemma Scope paper ([arXiv:2408.05147v2, page 18, "Language model technical details"](https://arxiv.org/pdf/2408.05147v2#page=18)):
+
+ > "We fold the pre-MLP RMS norm gain parameters ([Zhang and Sennrich (2019)](https://arxiv.org/abs/1910.07467), Section 3) into the MLP input matrices, as described in ([Gurnee et al. (2024)](https://arxiv.org/abs/2401.12181), Appendix A.1) and then train the transcoder on input activations **just after the pre-MLP RMSNorm**, to reconstruct the **MLP sublayer's output** as the target activations."
+
+ To further clarify "MLP sublayer's output":
+
+ If the state after the attention layer is `post_att_resid`, and the transformer block update is written as `post_mlp_resid = post_att_resid + mlp_output`, then this transcoder aims to reconstruct this `mlp_output` value, i.e. `mlp_output` here is taken after the post-MLP RMSNorm in Gemma 2.
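In TransformerLens terms, both sites can be read off a cached forward pass. A sketch, assuming Gemma 2 2B is loaded with the weight folding from Figure 12 (the hook names are standard TransformerLens ones, and the layer index is illustrative):

```python
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gemma-2-2b")  # see Figure 12 for the exact flags
layer = 12
_, cache = model.run_with_cache("The quick brown fox")

# Transcoder input: activations just after the pre-MLP RMSNorm
# (with the gain folded into W_in, per the quote above).
transcoder_input = cache[f"blocks.{layer}.ln2.hook_normalized"]

# Transcoder target, matching the definition above:
# post_mlp_resid = post_att_resid + mlp_output
post_att_resid = cache[f"blocks.{layer}.hook_resid_mid"]
post_mlp_resid = cache[f"blocks.{layer}.hook_resid_post"]
mlp_output = post_mlp_resid - post_att_resid
```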
+
+ Figure 12 in the paper provides TransformerLens code demonstrating how to load Gemma 2 2B with the necessary weight folding for using these transcoders.
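Putting the pieces together, a rough end-to-end check of a single transcoder, reusing `transcoder_forward`, `transcoder_input`, and `mlp_output` from the sketches above (the filename below is hypothetical; browse this repository's file listing for the real layout, and treat Figure 12 as the authoritative loading recipe):

```python
import numpy as np
import torch
from huggingface_hub import hf_hub_download

# Hypothetical path; check this repo's file listing for the actual one.
path = hf_hub_download(
    repo_id="google/gemma-scope-2b-pt-transcoders",
    filename="layer_12/width_16k/average_l0_76/params.npz",
)
params = {k: torch.tensor(v) for k, v in np.load(path).items()}

pred = transcoder_forward(
    transcoder_input, params["W_enc"], params["b_enc"],
    params["W_dec"], params["b_dec"], params["threshold"],
)
mse = (pred - mlp_output).pow(2).mean()  # how well the target site is reconstructed
```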
+
+ # 4. Point of Contact

 Point of contact: Arthur Conmy