|
--- |
|
license: cc-by-4.0 |
|
library_name: saelens |
|
--- |
|
|
|
# 1. Gemma Scope |
|
|
|
Gemma Scope is a comprehensive, open suite of sparse autoencoders for Gemma 2 9B and 2B. Sparse Autoencoders are a "microscope" of sorts that can help us break down a model’s internal activations into the underlying concepts, just as biologists use microscopes to study the individual cells of plants and animals. |
|
|
|
See our [landing page](https://huggingface.co/google/gemma-scope) for details on the whole suite. This is a specific set of SAEs: |
|
|
|
# 2. What Is `gemma-scope-2b-pt-transcoders`? |
|
|
|
- `gemma-scope-`: See 1. |
|
- `2b-pt-`: These SAEs were trained on the Gemma v2 2B base model.

- `transcoders`: These SAEs are transcoders: they were trained to reconstruct the output of the MLP sublayer from the input to the MLP sublayer. For more details, see https://arxiv.org/abs/2406.11944 and the clarification below; a schematic forward pass is sketched after this list.
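
For orientation, here is a minimal sketch of a JumpReLU-style transcoder forward pass. The parameter names (`W_enc`, `b_enc`, `threshold`, `W_dec`, `b_dec`) are assumptions based on the other Gemma Scope releases and may differ from the files in this repository; treat this as an illustration, not loading code.

```python
import torch


class JumpReLUTranscoder(torch.nn.Module):
    """Minimal JumpReLU transcoder sketch (parameter names are assumptions).

    Maps activations taken just after the pre-MLP RMSNorm to a reconstruction
    of the MLP sublayer's output (see Section 3 below).
    """

    def __init__(self, d_model: int, d_sae: int):
        super().__init__()
        self.W_enc = torch.nn.Parameter(torch.zeros(d_model, d_sae))
        self.b_enc = torch.nn.Parameter(torch.zeros(d_sae))
        self.threshold = torch.nn.Parameter(torch.zeros(d_sae))
        self.W_dec = torch.nn.Parameter(torch.zeros(d_sae, d_model))
        self.b_dec = torch.nn.Parameter(torch.zeros(d_model))

    def encode(self, mlp_input: torch.Tensor) -> torch.Tensor:
        # JumpReLU: zero out pre-activations at or below a learned per-latent threshold.
        pre_acts = mlp_input @ self.W_enc + self.b_enc
        mask = pre_acts > self.threshold
        return mask * torch.nn.functional.relu(pre_acts)

    def decode(self, acts: torch.Tensor) -> torch.Tensor:
        return acts @ self.W_dec + self.b_dec

    def forward(self, mlp_input: torch.Tensor) -> torch.Tensor:
        return self.decode(self.encode(mlp_input))
```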
|
|
|
# 3. Transcoder Input/Output Clarification |
|
|
|
There has been some discussion regarding the precise input and output points for these transcoders relative to the normalization layers in Gemma 2. |
|
|
|
As detailed in the Gemma Scope paper ([arXiv:2408.05147v2, page 18, "Language model technical details"](https://arxiv.org/pdf/2408.05147v2#page=18)): |
|
|
|
> "We fold the pre-MLP RMS norm gain parameters ([Zhang and Sennrich (2019)](https://arxiv.org/abs/1910.07467), Section 3) into the MLP input matrices, as described in ([Gurnee et al. (2024)](https://arxiv.org/abs/2401.12181), Appendix A.1) and then train the transcoder on input activations **just after the pre-MLP RMSNorm**, to reconstruct the **MLP sublayer’s output** as the target activations." |
|
|
|
To further clarify "MLP sublayer's output": |
|
|
|
If the state after the attention layer is `post_att_resid`, and the transformer block update is written as `post_mlp_resid = post_att_resid + mlp_output`, then the transcoder aims to reconstruct the `mlp_output` term, i.e. `mlp_output` is taken after the post-MLP RMSNorm in Gemma 2.
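
In code form, a schematic of one Gemma 2 block makes the two relevant activations explicit. The attribute names on `block` below are illustrative placeholders, not the names used in any particular implementation:

```python
def gemma2_block_schematic(resid_pre, block):
    """Illustrative decomposition of one Gemma 2 block (placeholder names)."""
    attn_out = block.post_attn_rmsnorm(block.attention(block.pre_attn_rmsnorm(resid_pre)))
    post_att_resid = resid_pre + attn_out

    # The transcoder reads this activation, taken just after the pre-MLP RMSNorm ...
    transcoder_input = block.pre_mlp_rmsnorm(post_att_resid)
    # ... and is trained to reconstruct this value: the MLP sublayer's output,
    # i.e. the term added to the residual stream, after the post-MLP RMSNorm.
    mlp_output = block.post_mlp_rmsnorm(block.mlp(transcoder_input))

    post_mlp_resid = post_att_resid + mlp_output
    return post_mlp_resid
```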
|
|
|
Figure 12 in the paper provides TransformerLens code demonstrating how to load Gemma 2 2B with the necessary weight folding for using these transcoders. |
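
As a rough sketch of the same setup using standard TransformerLens options: the `from_pretrained` arguments and hook names below are assumptions and should be checked against Figure 12 and your TransformerLens version.

```python
from transformer_lens import HookedTransformer

# Sketch only; Figure 12 of the paper is the authoritative loading code.
# fold_ln=True folds the RMSNorm gain parameters into adjacent weight
# matrices, matching the weight folding described above.
model = HookedTransformer.from_pretrained(
    "gemma-2-2b",
    fold_ln=True,
    center_writing_weights=False,
    center_unembed=False,
)

layer = 12  # example layer; use the layer matching the transcoder you load
tokens = model.to_tokens("The quick brown fox")
_, cache = model.run_with_cache(tokens)

# Assumed hook names (verify, in particular whether hook_mlp_out includes
# the post-MLP RMSNorm in your TransformerLens version):
transcoder_input = cache[f"blocks.{layer}.ln2.hook_normalized"]  # just after pre-MLP RMSNorm
transcoder_target = cache[f"blocks.{layer}.hook_mlp_out"]        # MLP sublayer output
```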
|
|
|
# 4. Point of Contact |
|
|
|
Point of contact: Arthur Conmy |
|
|
|
Contact by email: |
|
|
|
```python
# Reverse the obfuscated string to recover the email address
''.join(list('moc.elgoog@ymnoc')[::-1])
```
|
|
|
HuggingFace account: |
|
https://huggingface.co/ArthurConmyGDM |
|
|