|
--- |
|
license: cc-by-4.0 |
|
library_name: saelens |
|
--- |
|
|
|
# 1. Gemma Scope |
|
|
|
Gemma Scope is a comprehensive, open suite of sparse autoencoders for Gemma 2 9B and 2B. Sparse Autoencoders are a "microscope" of sorts that can help us break down a model’s internal activations into the underlying concepts, just as biologists use microscopes to study the individual cells of plants and animals. |
|
|
|
See our [landing page](https://huggingface.co/google/gemma-scope) for details on the whole suite. This is a specific set of SAEs: |
|
|
|
# 2. What Is `gemma-scope-2b-pt-transcoders`? |
|
|
|
- `gemma-scope-`: See 1. |
|
- `2b-pt-`: These SAEs were trained on the Gemma v2 2B base model.

- `transcoders`: These SAEs are transcoders: they were trained to reconstruct the output of the MLP sublayer from the input to the MLP sublayer. For more details, see https://arxiv.org/abs/2406.11944 and the clarification below; a schematic forward pass is sketched after this list.
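
For orientation, here is a minimal sketch of a JumpReLU-style transcoder forward pass. The parameter names (`W_enc`, `b_enc`, `threshold`, `W_dec`, `b_dec`) are assumptions based on the other Gemma Scope releases and may differ from the files in this repository; treat this as an illustration, not loading code.

```python
import torch


class JumpReLUTranscoder(torch.nn.Module):
    """Minimal JumpReLU transcoder sketch (parameter names are assumptions).

    Maps activations taken just after the pre-MLP RMSNorm to a reconstruction
    of the MLP sublayer's output (see Section 3 below).
    """

    def __init__(self, d_model: int, d_sae: int):
        super().__init__()
        self.W_enc = torch.nn.Parameter(torch.zeros(d_model, d_sae))
        self.b_enc = torch.nn.Parameter(torch.zeros(d_sae))
        self.threshold = torch.nn.Parameter(torch.zeros(d_sae))
        self.W_dec = torch.nn.Parameter(torch.zeros(d_sae, d_model))
        self.b_dec = torch.nn.Parameter(torch.zeros(d_model))

    def encode(self, mlp_input: torch.Tensor) -> torch.Tensor:
        # JumpReLU: zero out pre-activations at or below a learned per-latent threshold.
        pre_acts = mlp_input @ self.W_enc + self.b_enc
        mask = pre_acts > self.threshold
        return mask * torch.nn.functional.relu(pre_acts)

    def decode(self, acts: torch.Tensor) -> torch.Tensor:
        return acts @ self.W_dec + self.b_dec

    def forward(self, mlp_input: torch.Tensor) -> torch.Tensor:
        return self.decode(self.encode(mlp_input))
```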
|
|
|
# 3. Transcoder Input/Output Clarification |
|
|
|
There has been some discussion regarding the precise input and output points for these transcoders relative to the normalization layers in Gemma 2. |
|
|
|
As detailed in the Gemma Scope paper ([arXiv:2408.05147v2, page 18, "Language model technical details"](https://arxiv.org/pdf/2408.05147v2#page=18)): |
|
|
|
> "We fold the pre-MLP RMS norm gain parameters ([Zhang and Sennrich (2019)](https://arxiv.org/abs/1910.07467), Section 3) into the MLP input matrices, as described in ([Gurnee et al. (2024)](https://arxiv.org/abs/2401.12181), Appendix A.1) and then train the transcoder on input activations **just after the pre-MLP RMSNorm**, to reconstruct the **MLP sublayer’s output** as the target activations." |
|
|
|
To further clarify "MLP sublayer's output": |
|
|
|
If the state after the attention layer is `post_att_resid`, and the transformer block update is written as `post_mlp_resid = post_att_resid + mlp_output`, then the transcoder aims to reconstruct the `mlp_output` term, i.e. `mlp_output` is taken after the post-MLP RMSNorm in Gemma 2.
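
In code form, a schematic of one Gemma 2 block makes the two relevant activations explicit. The attribute names on `block` below are illustrative placeholders, not the names used in any particular implementation:

```python
def gemma2_block_schematic(resid_pre, block):
    """Illustrative decomposition of one Gemma 2 block (placeholder names)."""
    attn_out = block.post_attn_rmsnorm(block.attention(block.pre_attn_rmsnorm(resid_pre)))
    post_att_resid = resid_pre + attn_out

    # The transcoder reads this activation, taken just after the pre-MLP RMSNorm ...
    transcoder_input = block.pre_mlp_rmsnorm(post_att_resid)
    # ... and is trained to reconstruct this value: the MLP sublayer's output,
    # i.e. the term added to the residual stream, after the post-MLP RMSNorm.
    mlp_output = block.post_mlp_rmsnorm(block.mlp(transcoder_input))

    post_mlp_resid = post_att_resid + mlp_output
    return post_mlp_resid
```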
|
|
|
Figure 12 in the paper provides TransformerLens code demonstrating how to load Gemma 2 2B with the necessary weight folding for using these transcoders. |
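
As a rough sketch of the same setup using standard TransformerLens options: the `from_pretrained` arguments and hook names below are assumptions and should be checked against Figure 12 and your TransformerLens version.

```python
from transformer_lens import HookedTransformer

# Sketch only; Figure 12 of the paper is the authoritative loading code.
# fold_ln=True folds the RMSNorm gain parameters into adjacent weight
# matrices, matching the weight folding described above.
model = HookedTransformer.from_pretrained(
    "gemma-2-2b",
    fold_ln=True,
    center_writing_weights=False,
    center_unembed=False,
)

layer = 12  # example layer; use the layer matching the transcoder you load
tokens = model.to_tokens("The quick brown fox")
_, cache = model.run_with_cache(tokens)

# Assumed hook names (verify, in particular whether hook_mlp_out includes
# the post-MLP RMSNorm in your TransformerLens version):
transcoder_input = cache[f"blocks.{layer}.ln2.hook_normalized"]  # just after pre-MLP RMSNorm
transcoder_target = cache[f"blocks.{layer}.hook_mlp_out"]        # MLP sublayer output
```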
|
|
|
# 4. Point of Contact |
|
|
|
Point of contact: Arthur Conmy |
|
|
|
Contact by email: |
|
|
|
```python
# Reverse the obfuscated string to recover the email address
''.join(list('moc.elgoog@ymnoc')[::-1])
```
|
|
|
HuggingFace account: |
|
https://huggingface.co/ArthurConmyGDM |
|
|