SAELens
Commit 50eec2f · verified · 1 Parent(s): cd73048 · committed by ArthurConmyGDM

Clarify input and output site

Files changed (1): README.md (+16 −2)

README.md CHANGED
@@ -13,9 +13,23 @@ See our [landing page](https://huggingface.co/google/gemma-scope) for details on

 - `gemma-scope-`: See 1.
 - `2b-pt-`: These SAEs were trained on Gemma v2 2B base model.
- - `transcoders`: These SAEs are transcoders: they were trained to reconstruct the output of MLP sublayers from the input to the MLP sublayers: see https://arxiv.org/abs/2406.11944
+ - `transcoders`: These SAEs are transcoders: they were trained to reconstruct the output of MLP sublayers from the input to the MLP sublayers. For more details, see https://arxiv.org/abs/2406.11944 and the clarification below.
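For readers skimming past the paper: unlike a plain SAE, which reconstructs its own input, a transcoder maps one activation site to another. A minimal sketch of the forward pass, assuming the JumpReLU parametrization and the `W_enc`/`b_enc`/`W_dec`/`b_dec`/`threshold` parameter names used in the Gemma Scope releases (verify against the actual checkpoint files):

```python
import torch

def transcoder_forward(x, W_enc, b_enc, W_dec, b_dec, threshold):
    # x: MLP input taken just after the pre-MLP RMSNorm, shape (..., d_model)
    pre_acts = x @ W_enc + b_enc              # (..., d_sae)
    acts = pre_acts * (pre_acts > threshold)  # JumpReLU: keep only above-threshold units
    return acts @ W_dec + b_dec               # (..., d_model): predicted mlp_output

# smoke test with random weights (d_model=2304 matches Gemma 2 2B; 16384 = a 16k-width SAE)
d_model, d_sae = 2304, 16384
out = transcoder_forward(
    torch.randn(4, d_model),
    torch.randn(d_model, d_sae), torch.zeros(d_sae),
    torch.randn(d_sae, d_model), torch.zeros(d_model),
    torch.zeros(d_sae),
)
assert out.shape == (4, d_model)
```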
 
- # 3. Point of Contact
+ # 3. Transcoder Input/Output Clarification
+
+ There has been some discussion regarding the precise input and output points for these transcoders relative to the normalization layers in Gemma 2.
+
+ As detailed in the Gemma Scope paper ([arXiv:2408.05147v2, page 18, "Language model technical details"](https://arxiv.org/pdf/2408.05147v2#page=18)):
+
+ > "We fold the pre-MLP RMS norm gain parameters ([Zhang and Sennrich (2019)](https://arxiv.org/abs/1910.07467), Section 3) into the MLP input matrices, as described in ([Gurnee et al. (2024)](https://arxiv.org/abs/2401.12181), Appendix A.1) and then train the transcoder on input activations **just after the pre-MLP RMSNorm**, to reconstruct the **MLP sublayer's output** as the target activations."
+
+ To further clarify "MLP sublayer's output":
+
+ If the state after the attention layer is `post_att_resid`, and the transformer block update is written as `post_mlp_resid = post_att_resid + mlp_output`, then this transcoder aims to reconstruct this `mlp_output` value, i.e. `mlp_output` here is taken after the post-MLP RMSNorm in Gemma 2.
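In TransformerLens terms, both sites can be read off a cached forward pass. A sketch, assuming Gemma 2 2B is loaded with the weight folding from Figure 12 (the hook names are standard TransformerLens ones, and the layer index is illustrative):

```python
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gemma-2-2b")  # see Figure 12 for the exact flags
layer = 12
_, cache = model.run_with_cache("The quick brown fox")

# Transcoder input: activations just after the pre-MLP RMSNorm
# (with the gain folded into W_in, per the quote above).
transcoder_input = cache[f"blocks.{layer}.ln2.hook_normalized"]

# Transcoder target, matching the definition above:
# post_mlp_resid = post_att_resid + mlp_output
post_att_resid = cache[f"blocks.{layer}.hook_resid_mid"]
post_mlp_resid = cache[f"blocks.{layer}.hook_resid_post"]
mlp_output = post_mlp_resid - post_att_resid
```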
+
+ Figure 12 in the paper provides TransformerLens code demonstrating how to load Gemma 2 2B with the necessary weight folding for using these transcoders.
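Putting the pieces together, a rough end-to-end check of a single transcoder, reusing `transcoder_forward`, `transcoder_input`, and `mlp_output` from the sketches above (the filename below is hypothetical; browse this repository's file listing for the real layout, and treat Figure 12 as the authoritative loading recipe):

```python
import numpy as np
import torch
from huggingface_hub import hf_hub_download

# Hypothetical path; check this repo's file listing for the actual one.
path = hf_hub_download(
    repo_id="google/gemma-scope-2b-pt-transcoders",
    filename="layer_12/width_16k/average_l0_76/params.npz",
)
params = {k: torch.tensor(v) for k, v in np.load(path).items()}

pred = transcoder_forward(
    transcoder_input, params["W_enc"], params["b_enc"],
    params["W_dec"], params["b_dec"], params["threshold"],
)
mse = (pred - mlp_output).pow(2).mean()  # how well the target site is reconstructed
```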
+
+ # 4. Point of Contact

 Point of contact: Arthur Conmy