Clarify input and output site
README.md
- `gemma-scope-`: See 1.
- `2b-pt-`: These SAEs were trained on the Gemma v2 2B base model.
- `transcoders`: These SAEs are transcoders: they were trained to reconstruct the output of MLP sublayers from the input to the MLP sublayers. For more details, see https://arxiv.org/abs/2406.11944 and the clarification below.
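In case the input/output relationship is unclear, here is a minimal numerical sketch of what a transcoder computes. It uses toy random weights and toy dimensions (the real checkpoints in this repo are far wider), and assumes the JumpReLU activation described in the Gemma Scope paper; all names here are illustrative, not this repo's API.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 8, 32  # toy sizes; real SAEs are much wider

# Hypothetical toy parameters; real values come from this repo's checkpoints.
W_enc = rng.normal(size=(d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(size=(d_sae, d_model))
b_dec = np.zeros(d_model)
threshold = np.full(d_sae, 0.5)  # per-latent JumpReLU threshold

def transcoder(mlp_input):
    """Map an MLP-sublayer input to a prediction of the MLP-sublayer output."""
    pre_acts = mlp_input @ W_enc + b_enc
    # JumpReLU: pass the pre-activation through only where it exceeds the threshold.
    acts = np.where(pre_acts > threshold, pre_acts, 0.0)
    return acts @ W_dec + b_dec

x = rng.normal(size=d_model)          # activation just after the pre-MLP RMSNorm
predicted_mlp_output = transcoder(x)  # training target: the MLP sublayer's output
```

Unlike a standard SAE, which reconstructs its own input, the training target here is a *different* activation (the MLP output), which is what makes this a transcoder.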

# 3. Transcoder Input/Output Clarification

There has been some discussion regarding the precise input and output points for these transcoders relative to the normalization layers in Gemma 2.

As detailed in the Gemma Scope paper ([arXiv:2408.05147v2, page 18, "Language model technical details"](https://arxiv.org/pdf/2408.05147v2#page=18)):

> "We fold the pre-MLP RMS norm gain parameters ([Zhang and Sennrich (2019)](https://arxiv.org/abs/1910.07467), Section 3) into the MLP input matrices, as described in ([Gurnee et al. (2024)](https://arxiv.org/abs/2401.12181), Appendix A.1) and then train the transcoder on input activations **just after the pre-MLP RMSNorm**, to reconstruct the **MLP sublayer’s output** as the target activations."
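The gain-folding mentioned in the quote can be checked numerically. The sketch below (toy sizes, random weights; it ignores RMSNorm's epsilon placement and Gemma's exact gain parameterization) shows why folding the gain into the MLP input matrix is exact: the model computes the same thing, but the transcoder's input becomes the gain-free normalized activation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_mlp = 6, 12  # toy sizes for illustration

x = rng.normal(size=d_model)          # pre-MLP residual-stream activation
gamma = rng.normal(size=d_model)      # RMSNorm gain parameters
W_in = rng.normal(size=(d_model, d_mlp))

rms = np.sqrt(np.mean(x**2) + 1e-6)

# Unfolded: RMSNorm (rescale, then gain), then the MLP input matrix.
h_unfolded = (x / rms * gamma) @ W_in

# Folded: absorb the gain into W_in row-wise. The norm now only rescales,
# so the transcoder input is "just after the pre-MLP RMSNorm", gain included
# in the weights rather than the activation.
W_in_folded = gamma[:, None] * W_in
transcoder_input = x / rms
h_folded = transcoder_input @ W_in_folded

assert np.allclose(h_unfolded, h_folded)
```

Because the two paths agree exactly, folding changes nothing about the model's computation; it only moves the gain out of the activation the transcoder sees.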

To further clarify "MLP sublayer's output":

If the state after the attention layer is `post_att_resid`, and the transformer block update is written as `post_mlp_resid = post_att_resid + mlp_output`, then this transcoder aims to reconstruct the `mlp_output` value; that is, `mlp_output` is taken **after** the post-MLP RMSNorm in Gemma 2.
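The wiring above can be sketched as follows. This is a toy stand-in for a Gemma 2 block's MLP path (random weights, a `tanh` in place of the real gated MLP; only the placement of the two norms and the residual add is the point), marking where the transcoder's input and target are taken.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 6  # toy size

def rms_norm(x, gamma):
    return x / np.sqrt(np.mean(x**2) + 1e-6) * gamma

# Hypothetical stand-in for Gemma's MLP; the surrounding wiring is what matters.
W = rng.normal(size=(d_model, d_model))
def mlp(x):
    return np.tanh(x @ W)

gamma_pre = np.ones(d_model)   # pre-MLP RMSNorm gain
gamma_post = np.ones(d_model)  # post-MLP RMSNorm gain

post_att_resid = rng.normal(size=d_model)  # state after the attention layer

# Transcoder INPUT: activation just after the pre-MLP RMSNorm.
transcoder_input = rms_norm(post_att_resid, gamma_pre)

# Transcoder TARGET: the MLP sublayer's output, i.e. AFTER the post-MLP RMSNorm.
mlp_output = rms_norm(mlp(transcoder_input), gamma_post)

# Residual-stream update of the block.
post_mlp_resid = post_att_resid + mlp_output
```

In other words, everything between `transcoder_input` and `mlp_output`, including the post-MLP norm, is what the transcoder is trained to approximate.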

Figure 12 in the paper provides TransformerLens code demonstrating how to load Gemma 2 2B with the necessary weight folding for using these transcoders.

# 4. Point of Contact

Point of contact: Arthur Conmy