modularStarEncoder
/

ModularStarEncoder-finetuned


andreagurioli1995 committed on Feb 21

Commit ae18bf0 · verified · 1 Parent(s): 4d2b397

Update README.md

Files changed (1)

README.md +43 -1

README.md CHANGED

@@ -1,6 +1,8 @@
 ---
 library_name: transformers
-tags: []
+datasets:
+- bigcode/the-stack-v2
+license: bigcode-openrail-m
 ---
 # Model Card for Model ID
@@ -11,6 +13,46 @@ tags: []
 ## Model Details
+### How to use
+```python
+from transformers import AutoModel
+from transformers import AutoTokenizer
+# Load the model
+model = AutoModel.from_pretrained("andreagurioli1995/ModularStarEncoder-finetuned", trust_remote_code=True)
+# Load the tokenizer
+tokenizer = AutoTokenizer.from_pretrained("andreagurioli1995/ModularStarEncoder-finetuned")
+language = "yourlanguagelowercased"
+# Instruction to use when embedding a code snippet
+instruction_code = f"Represent this {language} code snippet for retrieval:"
+# Instruction to use when embedding a natural-language (English) query
+instruction_natural_language = "Represent this code description for retrieving supporting snippets of code:"
+code_snippet = "your code to embed here"
+# Follow this pattern to embed a code snippet or a natural-language query
+sentence = f"{tokenizer.sep_token}{instruction_code}{tokenizer.sep_token}{code_snippet}{tokenizer.cls_token}"
+# Tokenize the sentence
+tokenized_sentence = tokenizer(sentence, return_tensors="pt", truncation=True, max_length=2048)
+# Embed the tokenized sentence
+embedded_sentence = model(**tokenized_sentence)
+```
+The output contains three elements:
+- projected_pooled_normalized: a list of the projected, pooled, and normalized embeddings from the five exit points;
+- raw_hidden_states: the raw representations from all hidden states of the model, without pooling, normalization, or projection;
+- attentions: the attention scores from the encoder.
 ### Model Description
 <!-- Provide a longer summary of what this model is. -->
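
The "How to use" pattern added in this commit wraps an instruction and the text to embed between the tokenizer's separator and CLS tokens. The sketch below illustrates that pattern and one possible way to compare the resulting embeddings for retrieval. It is not part of the commit: `build_prompt`, `cosine_similarity`, and `rank_snippets` are hypothetical helper names, the `<sep>` and `<cls>` strings are placeholders for `tokenizer.sep_token` and `tokenizer.cls_token`, and the two-dimensional vectors are toy stand-ins for one entry of `projected_pooled_normalized`. Ranking by cosine similarity is an assumption; the README does not prescribe a scoring function.

```python
from math import sqrt


def build_prompt(instruction, text, sep_token, cls_token):
    """Assemble the input string following the pattern shown in the README."""
    return f"{sep_token}{instruction}{sep_token}{text}{cls_token}"


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors given as lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_snippets(query_vec, snippet_vecs):
    """Return snippet indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, vec) for vec in snippet_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Placeholder special tokens (assumption): in real use, read them from
# tokenizer.sep_token and tokenizer.cls_token.
SEP, CLS = "<sep>", "<cls>"

query_prompt = build_prompt(
    "Represent this code description for retrieving supporting snippets of code:",
    "sum a list of numbers",
    SEP,
    CLS,
)
code_prompt = build_prompt(
    "Represent this python code snippet for retrieval:",
    "def total(xs): return sum(xs)",
    SEP,
    CLS,
)

# Toy vectors (assumption): stand-ins for embeddings taken from one exit point.
query_vec = [1.0, 0.0]
snippet_vecs = [[0.0, 1.0], [0.6, 0.8], [1.0, 0.0]]
ranking = rank_snippets(query_vec, snippet_vecs)
```

With real embeddings, each prompt would be tokenized and passed through the model as in the README snippet, and the vectors would come from the model output rather than from hand-written lists.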