Sentence Transformers 🤗
SentenceTransformers 🤗 is a Python framework for state-of-the-art sentence, text and image embeddings. It can be used to compute embeddings using Sentence Transformer models or to calculate similarity scores using Cross-Encoder (a.k.a. reranker) models. This unlocks a wide range of applications, including semantic search, semantic textual similarity, and paraphrase mining. Optimum Neuron offers APIs to ease the use of SentenceTransformers on AWS Neuron devices.
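For context, here is how embeddings are typically computed with plain Sentence Transformers on CPU/GPU. This is a minimal sketch, using the same BAAI/bge-large-en-v1.5 checkpoint as the examples below; the rest of this page shows how to compile and run such models on Neuron devices.
from sentence_transformers import SentenceTransformer

# Plain Sentence Transformers inference (no Neuron compilation)
model = SentenceTransformer("BAAI/bge-large-en-v1.5")
embeddings = model.encode(["An example sentence", "Each sentence is converted to a vector"])
print(embeddings.shape)  # (2, 1024) for this checkpoint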
Export to Neuron
Option 1: CLI
- Example - Text embeddings
optimum-cli export neuron -m BAAI/bge-large-en-v1.5 --sequence_length 384 --batch_size 1 --task feature-extraction bge_emb_neuron/
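The exported artifacts in bge_emb_neuron/ can then be loaded back with the Python API. A minimal sketch, assuming the output directory produced by the command above:
from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForSentenceTransformers

# Load the compiled model and its tokenizer from the export directory
tokenizer = AutoTokenizer.from_pretrained("bge_emb_neuron/")
model = NeuronModelForSentenceTransformers.from_pretrained("bge_emb_neuron/")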
- Example - Image Search
optimum-cli export neuron -m sentence-transformers/clip-ViT-B-32 --sequence_length 64 --text_batch_size 3 --image_batch_size 1 --num_channels 3 --height 224 --width 224 --task feature-extraction --subfolder 0_CLIPModel clip_emb_neuron/
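The CLIP export can be reloaded the same way. Note that Neuron graphs are compiled with static shapes, so inference inputs must match the sizes passed at export (here 3 texts, 1 image of 3x224x224, and sequence length 64). A minimal sketch, assuming the output directory above:
from transformers import AutoProcessor
from optimum.neuron import NeuronModelForSentenceTransformers

# Reload the compiled CLIP model; inference inputs must match the static shapes used at export
processor = AutoProcessor.from_pretrained("clip_emb_neuron/")
model = NeuronModelForSentenceTransformers.from_pretrained("clip_emb_neuron/")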
Option 2: Python API
- Example - Text embeddings
from optimum.neuron import NeuronModelForSentenceTransformers

# configs for compiling the model
input_shapes = {
    "batch_size": 1,
    "sequence_length": 384,
}
compiler_args = {"auto_cast": "matmul", "auto_cast_type": "bf16"}

neuron_model = NeuronModelForSentenceTransformers.from_pretrained(
    "BAAI/bge-large-en-v1.5",
    export=True,
    **input_shapes,
    **compiler_args,
)

# Save locally
neuron_model.save_pretrained("bge_emb_neuron/")

# Upload to the HuggingFace Hub
neuron_model.push_to_hub(
    "bge_emb_neuron/",
    repository_id="optimum/bge-base-en-v1.5-neuronx",  # Replace with your HF Hub repo id
)
- Example - Image Search
from optimum.neuron import NeuronModelForSentenceTransformers

# configs for compiling the model
input_shapes = {
    "num_channels": 3,
    "height": 224,
    "width": 224,
    "text_batch_size": 3,
    "image_batch_size": 1,
    "sequence_length": 64,
}
compiler_args = {"auto_cast": "matmul", "auto_cast_type": "bf16"}

neuron_model = NeuronModelForSentenceTransformers.from_pretrained(
    "sentence-transformers/clip-ViT-B-32",
    subfolder="0_CLIPModel",
    export=True,
    dynamic_batch_size=False,
    **input_shapes,
    **compiler_args,
)

# Save locally
neuron_model.save_pretrained("clip_emb_neuron/")

# Upload to the HuggingFace Hub
neuron_model.push_to_hub(
    "clip_emb_neuron/",
    repository_id="optimum/clip_vit_emb_neuronx",  # Replace with your HF Hub repo id
)
NeuronModelForSentenceTransformers
class optimum.neuron.NeuronModelForSentenceTransformers
( model: ScriptModule, config: PretrainedConfig, model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None, model_file_name: typing.Optional[str] = None, preprocessors: typing.Optional[typing.List] = None, neuron_config: typing.Optional[ForwardRef('NeuronDefaultConfig')] = None, **kwargs )
Parameters
- config (transformers.PretrainedConfig) — PretrainedConfig is the model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the optimum.neuron.modeling.NeuronTracedModel.from_pretrained method to load the model weights.
- model (torch.jit._script.ScriptModule) — torch.jit._script.ScriptModule is the TorchScript module with the embedded NEFF (Neuron Executable File Format), compiled by the neuron(x) compiler.
Neuron Model for Sentence Transformers on Neuron devices.
This model inherits from ~neuron.modeling.NeuronTracedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).
forward
( input_ids: Tensor, attention_mask: Tensor, pixel_values: typing.Optional[torch.Tensor] = None, token_type_ids: typing.Optional[torch.Tensor] = None, **kwargs )
Parameters
- input_ids (torch.Tensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode and PreTrainedTokenizer.__call__ for details. What are input IDs?
- attention_mask (Union[torch.Tensor, None] of shape (batch_size, sequence_length), defaults to None) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
  - 1 for tokens that are not masked,
  - 0 for tokens that are masked. What are attention masks?
- token_type_ids (Union[torch.Tensor, None] of shape (batch_size, sequence_length), defaults to None) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
  - 1 for tokens that are sentence A,
  - 0 for tokens that are sentence B. What are token type IDs?
The NeuronModelForSentenceTransformers forward method overrides the __call__ special method. It accepts only the inputs traced during the compilation step; any additional inputs provided at inference will be ignored. To include extra inputs, recompile the model with those inputs specified.
Text Example:
>>> from transformers import AutoTokenizer
>>> from optimum.neuron import NeuronModelForSentenceTransformers
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/bge-base-en-v1.5-neuronx")
>>> model = NeuronModelForSentenceTransformers.from_pretrained("optimum/bge-base-en-v1.5-neuronx")
>>> inputs = tokenizer("In the smouldering promise of the fall of Troy, a mythical world of gods and mortals rises from the ashes.", return_tensors="pt")
>>> outputs = model(**inputs)
>>> token_embeddings = outputs.token_embeddings
>>> sentence_embedding = outputs.sentence_embedding
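The pooled sentence_embedding can then be compared across inputs, for example with cosine similarity. A small follow-up sketch (the query sentence is illustrative; the model above was compiled with batch_size=1, so each sentence is encoded in a separate call):
>>> from sentence_transformers import util
>>> query = tokenizer("A story of gods and mortals after the fall of Troy.", return_tensors="pt")
>>> query_embedding = model(**query).sentence_embedding
>>> print(util.cos_sim(sentence_embedding, query_embedding))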
Image Example:
>>> from PIL import Image
>>> from transformers import AutoProcessor
>>> from sentence_transformers import util
>>> from optimum.neuron import NeuronModelForSentenceTransformers
>>> processor = AutoProcessor.from_pretrained("optimum/clip_vit_emb_neuronx")
>>> model = NeuronModelForSentenceTransformers.from_pretrained("optimum/clip_vit_emb_neuronx")
>>> util.http_get("https://github.com/UKPLab/sentence-transformers/raw/master/examples/sentence_transformer/applications/image-search/two_dogs_in_snow.jpg", "two_dogs_in_snow.jpg")
>>> inputs = processor(
...     text=["Two dogs in the snow", "A cat on a table", "A picture of London at night"], images=Image.open("two_dogs_in_snow.jpg"), return_tensors="pt", padding=True
... )
>>> outputs = model(**inputs)
>>> cos_scores = util.cos_sim(outputs.image_embeds, outputs.text_embeds) # Compute cosine similarities
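cos_scores has one row per image and one column per text, so the best-matching caption can be read off with argmax. A small follow-up sketch:
>>> texts = ["Two dogs in the snow", "A cat on a table", "A picture of London at night"]
>>> best = int(cos_scores.argmax())
>>> print(f"Best caption: {texts[best]} (score: {float(cos_scores[0, best]):.3f})")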