Feature Extraction
Transformers
Safetensors
ModularStarEncoder
custom_code
andreagurioli1995 committed on
Commit ae18bf0 · verified · 1 Parent(s): 4d2b397

Update README.md

Files changed (1)
  1. README.md +43 -1
README.md CHANGED
@@ -1,6 +1,8 @@
 ---
 library_name: transformers
-tags: []
+datasets:
+- bigcode/the-stack-v2
+license: bigcode-openrail-m
 ---
 
 # Model Card for Model ID
@@ -11,6 +13,46 @@ tags: []
 
 ## Model Details
 
+
+### How to use
+```python
+from transformers import AutoModel
+from transformers import AutoTokenizer
+
+# import the model
+model = AutoModel.from_pretrained("andreagurioli1995/ModularStarEncoder-finetuned", trust_remote_code=True)
+
+# import the tokenizer
+tokenizer = AutoTokenizer.from_pretrained("andreagurioli1995/ModularStarEncoder-finetuned")
+
+language = "yourlanguagelowercased"
+
+# instruction for embedding a code snippet in a given programming language
+instruction_code = f"Represent this {language} code snippet for retrieval:"
+
+# instruction for embedding a natural-language (English) code description
+instruction_natural_language = "Represent this code description for retrieving supporting snippets of code:"
+
+code_snippet = "your code to embed here"
+
+# Follow this pattern to embed a code snippet or a natural-language query
+sentence = f"{tokenizer.sep_token}{instruction_code}{tokenizer.sep_token}{code_snippet}{tokenizer.cls_token}"
+
+# Tokenize the sentence
+tokenized_sentence = tokenizer(sentence, return_tensors="pt", truncation=True, max_length=2048)
+
+# Embed the tokenized sentence
+embedded_sentence = model(**tokenized_sentence)
+```
+
+The output contains three elements:
+
+- projected_pooled_normalized: a list of the projected, pooled, and normalized embeddings from the five exit points;
+- raw_hidden_states: raw representations from all hidden states of the model, without pooling, normalization, or projection;
+- attentions: attention scores from the encoder.
+
 ### Model Description
 
 <!-- Provide a longer summary of what this model is. -->
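
The prompt layout prescribed by the README above (`sep_token`, instruction, `sep_token`, text, `cls_token`) can be sketched without downloading the model. The helper below is illustrative only: the `build_query` name and the placeholder token strings are assumptions, and in practice the separators must be read from the actual tokenizer as `tokenizer.sep_token` and `tokenizer.cls_token`.

```python
# Placeholder special tokens -- assumptions for illustration; use the
# real values from tokenizer.sep_token / tokenizer.cls_token in practice.
SEP = "<sep>"
CLS = "<cls>"

def build_query(instruction: str, text: str, sep: str = SEP, cls: str = CLS) -> str:
    """Assemble the {sep}{instruction}{sep}{text}{cls} pattern from the README."""
    return f"{sep}{instruction}{sep}{text}{cls}"

# Example: format a Python snippet for retrieval, as in the README
language = "python"
instruction_code = f"Represent this {language} code snippet for retrieval:"
query = build_query(instruction_code, "def add(a, b): return a + b")
print(query)
```

The same helper works for natural-language queries by swapping in the code-description instruction string shown above.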