Custom Layers and Utilities

You are viewing the main version, which requires installation from source. If you'd like a regular pip install, check out the latest stable version (v4.51.3).

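This page lists all the custom layers used by the library, as well as the utility functions it provides for modeling.

Most of these are only useful if you are studying the code of the models in the library.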

PyTorch custom modules

class transformers.Conv1D

( nf nx )

Parameters

  • nf (int) — The number of output features.
  • nx (int) — The number of input features.

1D-convolutional layer as defined by Radford et al. for OpenAI GPT (and also used in GPT-2).

Works like a linear layer, except that the weights are transposed: the weight matrix has shape (nx, nf) instead of the (nf, nx) layout used by torch.nn.Linear.
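
The transposed-weight relationship can be checked numerically. The NumPy sketch below is not the library code; it assumes that Conv1D computes x @ W + b with W of shape (nx, nf), while a linear layer computes x @ W.T + b with W of shape (nf, nx). The helper names conv1d_forward and linear_forward are hypothetical.

```python
import numpy as np


def conv1d_forward(x, weight, bias):
    """Conv1D-style forward: weight has shape (nx, nf), so no transpose is needed."""
    return x @ weight + bias


def linear_forward(x, weight, bias):
    """nn.Linear-style forward: weight has shape (nf, nx) and is transposed."""
    return x @ weight.T + bias


rng = np.random.default_rng(0)
nx, nf = 4, 3                      # input and output features
x = rng.normal(size=(2, 5, nx))    # (batch, seq_len, nx)
w_conv = rng.normal(size=(nx, nf))
b = rng.normal(size=nf)

y_conv = conv1d_forward(x, w_conv, b)
y_linear = linear_forward(x, w_conv.T, b)  # same weights, stored transposed

print(y_conv.shape)                   # (2, 5, 3)
print(np.allclose(y_conv, y_linear))  # True
```

Both layouts produce the same output; only the storage order of the weight differs, which matters when loading GPT/GPT-2 checkpoints into one layout or the other.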

PyTorch Helper Functions

transformers.apply_chunking_to_forward

( forward_fn: Callable[..., torch.Tensor] chunk_size: int chunk_dim: int *input_tensors ) torch.Tensor

Parameters

  • forward_fn (Callable[..., torch.Tensor]) — The forward function of the model.
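  • chunk_size (int) — The size of each chunk along chunk_dim: num_chunks = input_tensors[0].shape[chunk_dim] // chunk_size, so that dimension must be divisible by chunk_size. A chunk_size of 0 disables chunking.
  • chunk_dim (int) — The dimension over which the input_tensors should be chunked.
  • input_tensors (Tuple[torch.Tensor]) — The input tensors of forward_fn, which will be chunked. All of them must have the same size along chunk_dim.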

Returns

torch.Tensor

A tensor with the same shape that forward_fn would have produced if applied directly to input_tensors.

This function chunks the input_tensors into smaller input tensor parts of size chunk_size over the dimension chunk_dim. It then applies the layer forward_fn to each chunk independently and concatenates the outputs along chunk_dim, which saves memory.

If forward_fn is independent across chunk_dim, this function yields the same result as applying forward_fn directly to input_tensors.

Examples:

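from torch import nn
from transformers.pytorch_utils import apply_chunking_to_forward


class LMHead(nn.Module):  # hypothetical module, for illustration
    def __init__(self, hidden_size, vocab_size, chunk_size_lm_head, seq_len_dim=1):
        super().__init__()
        self.decoder = nn.Linear(hidden_size, vocab_size)
        self.chunk_size_lm_head = chunk_size_lm_head  # 0 disables chunking
        self.seq_len_dim = seq_len_dim

    # rename the usual forward() fn to forward_chunk()
    def forward_chunk(self, hidden_states):
        return self.decoder(hidden_states)

    # implement a chunked forward function
    def forward(self, hidden_states):
        return apply_chunking_to_forward(self.forward_chunk, self.chunk_size_lm_head, self.seq_len_dim, hidden_states)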
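
To see why chunking gives the same result for position-wise functions, here is a NumPy sketch of the same idea; it is not the library implementation. It assumes, as apply_chunking_to_forward requires, that the size of chunk_dim is divisible by chunk_size. chunked_apply and position_wise_ffn are hypothetical names, the latter standing in for forward_fn.

```python
import numpy as np


def chunked_apply(forward_fn, chunk_size, chunk_dim, *input_tensors):
    """Split inputs along chunk_dim, apply forward_fn per chunk, concatenate results."""
    if chunk_size <= 0:
        # No chunking requested: apply the function directly.
        return forward_fn(*input_tensors)
    dim_size = input_tensors[0].shape[chunk_dim]
    if dim_size % chunk_size != 0:
        raise ValueError(f"dimension size {dim_size} is not a multiple of chunk_size {chunk_size}")
    num_chunks = dim_size // chunk_size
    # Split every input tensor into num_chunks equal parts along chunk_dim.
    split_inputs = [np.split(t, num_chunks, axis=chunk_dim) for t in input_tensors]
    # zip pairs the i-th chunk of every input tensor together.
    outputs = [forward_fn(*chunk) for chunk in zip(*split_inputs)]
    return np.concatenate(outputs, axis=chunk_dim)


rng = np.random.default_rng(0)
w = rng.normal(size=(8, 16))


def position_wise_ffn(hidden_states):
    # Acts on each sequence position independently, so chunking over seq_len is safe.
    return np.maximum(hidden_states @ w, 0.0)


hidden = rng.normal(size=(2, 12, 8))  # (batch, seq_len, hidden)
full = position_wise_ffn(hidden)
chunked = chunked_apply(position_wise_ffn, 4, 1, hidden)
print(np.allclose(full, chunked))     # True
```

Only one chunk of intermediate activations exists at a time, which is where the memory saving comes from; a function that mixes positions (such as attention over the sequence) would not give the same result when chunked over seq_len.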

transformers.pytorch_utils.find_pruneable_heads_and_indices

( heads: list[int] n_heads: int head_size: int already_pruned_heads: set[int] ) Tuple[Set[int], torch.LongTensor]

Parameters

  • heads (List[int]) — List of the indices of heads to prune.
  • n_heads (int) — The number of heads in the model.
  • head_size (int) — The size of each head.
  • already_pruned_heads (Set[int]) — A set of already pruned heads.

Returns

Tuple[Set[int], torch.LongTensor]

A tuple with the indices of heads to prune taking already_pruned_heads into account and the indices of rows/columns to keep in the layer weight.

Finds the heads and their indices taking already_pruned_heads into account.
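
The index computation can be sketched in NumPy. This is an illustrative re-implementation with a hypothetical name, not the library function. It assumes n_heads is the number of heads currently left in the layer, builds an n_heads × head_size mask, zeroes out the heads to prune after shifting their indices past already-pruned heads, and returns the flat indices to keep.

```python
import numpy as np


def pruneable_heads_and_indices(heads, n_heads, head_size, already_pruned_heads):
    """Return (heads to prune, flat row/column indices to keep in the layer weight)."""
    # One mask row per head currently in the layer (assumption: n_heads excludes pruned heads).
    mask = np.ones((n_heads, head_size), dtype=bool)
    heads = set(heads) - already_pruned_heads  # skip heads that are already gone
    for head in heads:
        # Shift the original head index left by the number of earlier heads already removed.
        shifted = head - sum(1 for h in already_pruned_heads if h < head)
        mask[shifted] = False
    index = np.arange(n_heads * head_size)[mask.reshape(-1)]
    return heads, index


# The layer started with 4 heads of size 2; head 0 was pruned earlier, so 3 heads remain.
heads, index = pruneable_heads_and_indices([0, 2], n_heads=3, head_size=2, already_pruned_heads={0})
print(heads)  # {2}
print(index)  # [0 1 4 5]
```

Original head 2 now sits at position 1 in the layer, so features 2 and 3 are dropped and the remaining indices can be passed to a pruning function.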

transformers.prune_layer

( layer: nn.Linear | Conv1D index: torch.LongTensor dim: int | None = None ) torch.nn.Linear or Conv1D

Parameters

  • layer (Union[torch.nn.Linear, Conv1D]) — The layer to prune.
  • index (torch.LongTensor) — The indices to keep in the layer.
  • dim (int, optional) — The dimension on which to keep the indices. If None, defaults to 0 for torch.nn.Linear and to 1 for Conv1D.

Returns

torch.nn.Linear or Conv1D

The pruned layer as a new layer with requires_grad=True.

Prune a Conv1D or linear layer to keep only entries in index.

Used to remove heads.

transformers.pytorch_utils.prune_conv1d_layer

( layer: Conv1D index: torch.LongTensor dim: int = 1 ) Conv1D

Parameters

  • layer (Conv1D) — The layer to prune.
  • index (torch.LongTensor) — The indices to keep in the layer.
  • dim (int, optional, defaults to 1) — The dimension on which to keep the indices.

Returns

Conv1D

The pruned layer as a new layer with requires_grad=True.

Prune a Conv1D layer to keep only entries in index. A Conv1D layer works like a Linear layer (such as those used in BERT), but its weights are transposed.

Used to remove heads.
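
For a Conv1D layer, whose weight has shape (nx, nf), pruning selects entries of the weight along dim and, when pruning output features (dim=1), the matching bias entries. The NumPy sketch below illustrates this on plain arrays; prune_conv1d_arrays is a hypothetical name, and it returns arrays rather than a new Conv1D layer.

```python
import numpy as np


def prune_conv1d_arrays(weight, bias, index, dim=1):
    """Keep only `index` along `dim` of a Conv1D-style (nx, nf) weight."""
    new_weight = np.take(weight, index, axis=dim)
    # The bias has one entry per output feature (nf), so it only shrinks when pruning dim=1.
    new_bias = bias.copy() if dim == 0 else bias[index]
    return new_weight, new_bias


weight = np.arange(12, dtype=float).reshape(3, 4)  # nx=3 inputs, nf=4 outputs
bias = np.array([10.0, 20.0, 30.0, 40.0])
w, b = prune_conv1d_arrays(weight, bias, np.array([0, 2]), dim=1)
print(w.shape, b)  # (3, 2) [10. 30.]
```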

transformers.pytorch_utils.prune_linear_layer

( layer: nn.Linear index: torch.LongTensor dim: int = 0 ) torch.nn.Linear

Parameters

  • layer (torch.nn.Linear) — The layer to prune.
  • index (torch.LongTensor) — The indices to keep in the layer.
  • dim (int, optional, defaults to 0) — The dimension on which to keep the indices.

Returns

torch.nn.Linear

The pruned layer as a new layer with requires_grad=True.

Prune a linear layer to keep only entries in index.

Used to remove heads.
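
In BERT-style attention, for example, removing heads means pruning the query, key and value projections along their output dimension (dim=0) and the output projection along its input dimension (dim=1), using the indices returned by find_pruneable_heads_and_indices. The NumPy sketch below shows the shapes involved for a query and output projection; prune_linear_arrays and the variable names are hypothetical, and the helper operates on arrays rather than returning new torch.nn.Linear modules.

```python
import numpy as np


def prune_linear_arrays(weight, bias, index, dim=0):
    """Keep only `index` along `dim` of an nn.Linear-style (out_features, in_features) weight."""
    new_weight = np.take(weight, index, axis=dim)
    # The bias has one entry per output feature, so it only shrinks when pruning dim=0.
    new_bias = bias[index] if dim == 0 else bias.copy()
    return new_weight, new_bias


hidden = 6  # 3 heads of size 2
rng = np.random.default_rng(0)
q_w, q_b = rng.normal(size=(hidden, hidden)), rng.normal(size=hidden)  # query projection
o_w, o_b = rng.normal(size=(hidden, hidden)), rng.normal(size=hidden)  # output projection

# Flat indices that keep heads 0 and 2 (each head owns 2 consecutive features).
index = np.array([0, 1, 4, 5])
q_w, q_b = prune_linear_arrays(q_w, q_b, index, dim=0)  # drop output rows of the query
o_w, o_b = prune_linear_arrays(o_w, o_b, index, dim=1)  # drop input columns of the output
print(q_w.shape, q_b.shape, o_w.shape, o_b.shape)  # (4, 6) (4,) (6, 4) (6,)
```

The model's hidden size stays 6 at the block's input and output; only the per-head inner dimension shrinks from 6 to 4.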

TensorFlow custom layers

class transformers.modeling_tf_utils.TFConv1D

( nf nx initializer_range = 0.02 **kwargs )

Parameters

  • nf (int) — The number of output features.
  • nx (int) — The number of input features.
  • initializer_range (float, optional, defaults to 0.02) — The standard deviation to use to initialize the weights.
  • kwargs (Dict[str, Any], optional) — Additional keyword arguments passed along to the __init__ of keras.layers.Layer.

1D-convolutional layer as defined by Radford et al. for OpenAI GPT (and also used in GPT-2).

Works like a linear layer, except that the weights are transposed: the weight matrix has shape (nx, nf).

class transformers.TFSequenceSummary

( config: PretrainedConfig initializer_range: float = 0.02 **kwargs )

Parameters

  • config (PretrainedConfig) — The config used by the model. Relevant arguments in the config class of the model are (refer to the actual config class of your model for the default values it uses):

    • summary_type (str) — The method to use to make this summary. Accepted values are:

      • "last" — Take the hidden state of the last token (like XLNet)
      • "first" — Take the hidden state of the first token (like BERT)
      • "mean" — Take the mean of the hidden states of all tokens
      • "cls_index" — Supply a tensor with the position of the classification token (GPT/GPT-2)
      • "attn" — Not implemented yet (intended to use multi-head attention)
    • summary_use_proj (bool) — Add a projection after the vector extraction.

    • summary_proj_to_labels (bool) — If True, the projection outputs to config.num_labels classes (otherwise to config.hidden_size).

    • summary_activation (Optional[str]) — Set to "tanh" to add a tanh activation to the output, another string or None will add no activation.

    • summary_first_dropout (float) — Optional dropout probability before the projection and activation.

    • summary_last_dropout (float) — Optional dropout probability after the projection and activation.

  • initializer_range (float, optional, defaults to 0.02) — The standard deviation to use to initialize the weights.
  • kwargs (Dict[str, Any], optional) — Additional keyword arguments passed along to the __init__ of keras.layers.Layer.

Compute a single vector summary of the hidden states of a sequence.
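
The four implemented summary_type options each reduce a (batch, seq_len, hidden) tensor to one vector per sequence. The NumPy sketch below shows only that reduction step; the optional projection, activation and dropouts are omitted, and summarize is a hypothetical helper, not part of the library.

```python
import numpy as np


def summarize(hidden_states, summary_type="last", cls_index=None):
    """Reduce (batch, seq_len, hidden) hidden states to one (batch, hidden) vector per sequence."""
    if summary_type == "last":
        return hidden_states[:, -1]
    if summary_type == "first":
        return hidden_states[:, 0]
    if summary_type == "mean":
        return hidden_states.mean(axis=1)
    if summary_type == "cls_index":
        # cls_index holds one token position per sequence in the batch.
        batch = np.arange(hidden_states.shape[0])
        return hidden_states[batch, cls_index]
    raise NotImplementedError(summary_type)


hidden = np.arange(12, dtype=float).reshape(2, 3, 2)  # batch=2, seq_len=3, hidden=2
print(summarize(hidden, "first").tolist())  # [[0.0, 1.0], [6.0, 7.0]]
print(summarize(hidden, "mean").tolist())   # [[2.0, 3.0], [8.0, 9.0]]
print(summarize(hidden, "cls_index", cls_index=np.array([1, 0])).tolist())  # [[2.0, 3.0], [6.0, 7.0]]
```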

TensorFlow loss functions

class transformers.modeling_tf_utils.TFCausalLanguageModelingLoss

( )

Loss function suitable for causal language modeling (CLM), that is, the task of guessing the next token.

Any label of -100 will be ignored (along with the corresponding logits) in the loss computation.
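
Ignoring labels of -100 amounts to masking those positions out before averaging; the same rule is stated for TFMaskedLanguageModelingLoss and TFTokenClassificationLoss. The NumPy sketch below computes a mean sparse cross-entropy over the non-ignored positions only. It illustrates the masking idea and is not the library's Keras implementation; masked_sparse_cross_entropy is a hypothetical name.

```python
import numpy as np


def masked_sparse_cross_entropy(labels, logits, ignore_index=-100):
    """Mean sparse cross-entropy over the positions whose label is not ignore_index."""
    labels = labels.reshape(-1)
    logits = logits.reshape(-1, logits.shape[-1])
    rows = np.nonzero(labels != ignore_index)[0]
    # Numerically stable log-softmax over the last (vocabulary) axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[rows, labels[rows]].mean()


# One sequence of three tokens over a three-word vocabulary.
probs = np.array([[[0.5, 0.25, 0.25], [0.1, 0.8, 0.1], [0.2, 0.3, 0.5]]])
labels = np.array([[0, 1, -100]])  # the last position is ignored
loss = masked_sparse_cross_entropy(labels, np.log(probs))
print(round(float(loss), 4))  # 0.4581
```

Because the ignored position never enters the average, its logits can take any value without changing the loss.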

class transformers.modeling_tf_utils.TFMaskedLanguageModelingLoss

( )

Loss function suitable for masked language modeling (MLM), that is, the task of guessing the masked tokens.

Any label of -100 will be ignored (along with the corresponding logits) in the loss computation.

class transformers.modeling_tf_utils.TFMultipleChoiceLoss

( )

Loss function suitable for multiple choice tasks.

class transformers.modeling_tf_utils.TFQuestionAnsweringLoss

( )

Loss function suitable for question answering.
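
Extractive question answering is commonly trained with two cross-entropy terms, one over start positions and one over end positions, averaged together. The NumPy sketch below shows that formulation under this assumption; it is illustrative only, and span_loss, cross_entropy and the argument layout are hypothetical rather than the class's actual call signature.

```python
import numpy as np


def cross_entropy(labels, logits):
    """Mean sparse cross-entropy; labels: (batch,), logits: (batch, seq_len)."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def span_loss(start_positions, end_positions, start_logits, end_logits):
    """Average of the start-position and end-position losses."""
    start_loss = cross_entropy(start_positions, start_logits)
    end_loss = cross_entropy(end_positions, end_logits)
    return (start_loss + end_loss) / 2.0


start_logits = np.array([[2.0, 0.5, 0.1, 0.1]])  # batch of one 4-token context
end_logits = np.array([[0.1, 0.3, 2.5, 0.2]])
loss = span_loss(np.array([0]), np.array([2]), start_logits, end_logits)
print(round(float(loss), 4))
```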

class transformers.modeling_tf_utils.TFSequenceClassificationLoss

( )

Loss function suitable for sequence classification.

class transformers.modeling_tf_utils.TFTokenClassificationLoss

( )

Loss function suitable for token classification.

Any label of -100 will be ignored (along with the corresponding logits) in the loss computation.

TensorFlow Helper Functions

transformers.modeling_tf_utils.get_initializer

( initializer_range: float = 0.02 ) keras.initializers.TruncatedNormal

Parameters

  • initializer_range (float, defaults to 0.02) — Standard deviation of the initializer range.

Returns

keras.initializers.TruncatedNormal

The truncated normal initializer.

Creates a keras.initializers.TruncatedNormal with the given range.
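
get_initializer returns a Keras initializer object; the sampling rule behind it can be sketched in NumPy. This assumes Keras's documented TruncatedNormal behavior with mean 0 (values more than two standard deviations from the mean are discarded and re-drawn) and does not reflect how Keras implements it internally; truncated_normal is a hypothetical name.

```python
import numpy as np


def truncated_normal(shape, stddev=0.02, seed=0):
    """Sample N(0, stddev^2), re-drawing any value more than 2 * stddev from 0."""
    rng = np.random.default_rng(seed)
    values = rng.normal(0.0, stddev, size=shape)
    out_of_range = np.abs(values) > 2 * stddev
    while out_of_range.any():
        # Re-draw only the rejected entries until every value is inside the bounds.
        values[out_of_range] = rng.normal(0.0, stddev, size=out_of_range.sum())
        out_of_range = np.abs(values) > 2 * stddev
    return values


w = truncated_normal((256, 1024))
print(np.abs(w).max() <= 0.04)  # True
```

With the default initializer_range of 0.02, no initial weight can exceed 0.04 in magnitude.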

transformers.modeling_tf_utils.keras_serializable

( )

Parameters

  • cls (a keras.layers.Layer subclass) — Typically one of the library's TF*MainLayer classes (e.g. TFBertMainLayer); in general, it must accept a config argument in its initializer.

Decorate a Keras Layer class to support Keras serialization.

This is done by:

  1. Adding a transformers_config dict to the Keras config dictionary in get_config (called by Keras at serialization time).
  2. Wrapping __init__ to accept that transformers_config dict (passed by Keras at deserialization time) and convert it to a config object for the actual layer initializer.
  3. Registering the class as a custom object in Keras (if the TensorFlow version supports this), so that it does not need to be supplied in custom_objects in the call to keras.models.load_model.
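
The first two steps can be illustrated without Keras. The pure-Python sketch below uses hypothetical ToyConfig, toy_serializable and ToyMainLayer names; it is a simplified model of the idea (round-tripping a layer through a plain config dict), not the library's decorator, and it omits the Keras registration of step 3.

```python
import functools


class ToyConfig:
    """Hypothetical stand-in for a PretrainedConfig."""

    def __init__(self, hidden_size=8):
        self.hidden_size = hidden_size

    def to_dict(self):
        return {"hidden_size": self.hidden_size}

    @classmethod
    def from_dict(cls, d):
        return cls(**d)


def toy_serializable(cls):
    """Simplified sketch of steps 1 and 2; Keras registration (step 3) is omitted."""
    original_init = cls.__init__

    @functools.wraps(original_init)
    def wrapped_init(self, config, **kwargs):
        # Step 2: accept either a config object or the dict produced at serialization time.
        if isinstance(config, dict):
            config = cls.config_class.from_dict(config)
        original_init(self, config, **kwargs)
        self._config = config

    def get_config(self):
        # Step 1: store the config as a plain dict so it can be written out.
        return {"transformers_config": self._config.to_dict()}

    cls.__init__ = wrapped_init
    cls.get_config = get_config
    return cls


@toy_serializable
class ToyMainLayer:
    config_class = ToyConfig

    def __init__(self, config):
        self.hidden_size = config.hidden_size


layer = ToyMainLayer(ToyConfig(hidden_size=16))
saved = layer.get_config()                             # {'transformers_config': {'hidden_size': 16}}
restored = ToyMainLayer(saved["transformers_config"])  # rebuilt from the plain dict
print(restored.hidden_size)                            # 16
```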

transformers.shape_list

( tensor: typing.Union[tensorflow.python.framework.tensor.Tensor, numpy.ndarray] ) List[int]

Parameters

  • tensor (tf.Tensor or np.ndarray) — The tensor we want the shape of.

Returns

List[int]

The shape of the tensor as a list.

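Returns the shape of a tensor as a list, dealing cleanly with dynamic shapes in TensorFlow: dimensions known statically are returned as Python ints, while dimensions only known at run time are returned as scalar tensors.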
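
The core idea can be sketched in plain Python: keep each dimension that is known statically and fall back to the run-time value only where the static size is None. In shape_list itself, the static shape would come from tensor.shape.as_list() and the run-time values from tf.shape(tensor); here both are plain lists so the sketch runs without TensorFlow, and merge_shapes is a hypothetical name.

```python
def merge_shapes(static_shape, dynamic_shape):
    """Prefer static (known) sizes; fall back to the dynamic value where static is None."""
    return [dynamic_shape[i] if size is None else size for i, size in enumerate(static_shape)]


# A batch dimension unknown when the graph was built, resolved to 32 at run time.
print(merge_shapes([None, 128, 768], [32, 128, 768]))  # [32, 128, 768]
```

Keeping static sizes as Python ints lets downstream code (for example, reshapes) use them as constants, while only the genuinely unknown dimensions stay symbolic.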
