Upload 1308 files
This view is limited to 50 files because it contains too many changes.
- .gitattributes +1 -0
- data/index.faiss +3 -0
- data/index_to_id.pkl +3 -0
- data/index_to_metadata.pkl +3 -0
- data/model_data_json/AdamCodd_vit-base-nsfw-detector.json +20 -0
- data/model_data_json/Alibaba-NLP_gte-Qwen2-1.5B-instruct.json +0 -0
- data/model_data_json/Alibaba-NLP_gte-Qwen2-7B-instruct.json +0 -0
- data/model_data_json/Alibaba-NLP_gte-base-en-v1.5.json +27 -0
- data/model_data_json/Alibaba-NLP_gte-large-en-v1.5.json +28 -0
- data/model_data_json/Alibaba-NLP_gte-multilingual-base.json +104 -0
- data/model_data_json/Alibaba-NLP_gte-multilingual-reranker-base.json +94 -0
- data/model_data_json/AutonLab_MOMENT-1-large.json +25 -0
- data/model_data_json/BAAI_bge-base-en-v1.5.json +29 -0
- data/model_data_json/BAAI_bge-base-en.json +23 -0
- data/model_data_json/BAAI_bge-large-en-v1.5.json +29 -0
- data/model_data_json/BAAI_bge-large-en.json +23 -0
- data/model_data_json/BAAI_bge-large-zh-v1.5.json +25 -0
- data/model_data_json/BAAI_bge-m3.json +24 -0
- data/model_data_json/BAAI_bge-multilingual-gemma2.json +0 -0
- data/model_data_json/BAAI_bge-reranker-base.json +26 -0
- data/model_data_json/BAAI_bge-reranker-large.json +29 -0
- data/model_data_json/BAAI_bge-reranker-v2-m3.json +19 -0
- data/model_data_json/BAAI_bge-small-en-v1.5.json +29 -0
- data/model_data_json/BAAI_bge-small-en.json +24 -0
- data/model_data_json/BAAI_llm-embedder.json +18 -0
- data/model_data_json/Babelscape_t5-base-summarization-claim-extractor.json +19 -0
- data/model_data_json/Babelscape_wikineural-multilingual-ner.json +31 -0
- data/model_data_json/Bingsu_adetailer.json +15 -0
- data/model_data_json/Bingsu_yolo-world-mirror.json +11 -0
- data/model_data_json/ByteDance_AnimateDiff-Lightning.json +15 -0
- data/model_data_json/ByteDance_Hyper-SD.json +17 -0
- data/model_data_json/CAMeL-Lab_bert-base-arabic-camelbert-mix-sentiment.json +19 -0
- data/model_data_json/CIDAS_clipseg-rd64-refined.json +17 -0
- data/model_data_json/CompVis_stable-diffusion-safety-checker.json +15 -0
- data/model_data_json/CompVis_stable-diffusion-v1-4.json +23 -0
- data/model_data_json/Danswer_intent-model.json +14 -0
- data/model_data_json/DavidAU_L3-Dark-Planet-8B-GGUF.json +42 -0
- data/model_data_json/Davlan_bert-base-multilingual-cased-ner-hrl.json +19 -0
- data/model_data_json/Davlan_distilbert-base-multilingual-cased-ner-hrl.json +18 -0
- data/model_data_json/DeepPavlov_rubert-base-cased-conversational.json +16 -0
- data/model_data_json/DeepPavlov_rubert-base-cased.json +17 -0
- data/model_data_json/DiTy_cross-encoder-russian-msmarco.json +24 -0
- data/model_data_json/Diginsa_Plant-Disease-Detection-Project.json +19 -0
- data/model_data_json/DunnBC22_ibert-roberta-base-Abusive_Or_Threatening_Speech.json +17 -0
- data/model_data_json/Efficient-Large-Model_NVILA-15B.json +18 -0
- data/model_data_json/ElKulako_cryptobert.json +29 -0
- data/model_data_json/EleutherAI_gpt-j-6b.json +23 -0
- data/model_data_json/EleutherAI_gpt-neo-1.3B.json +24 -0
- data/model_data_json/EleutherAI_gpt-neo-125m.json +24 -0
- data/model_data_json/EleutherAI_gpt-neox-20b.json +25 -0
.gitattributes
CHANGED
@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 index.faiss filter=lfs diff=lfs merge=lfs -text
+data/index.faiss filter=lfs diff=lfs merge=lfs -text
data/index.faiss
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:78a068ac98a5de614955c9c1e307b40f7b403bd46d315cf3b583f22466bf5e7a
+size 3545133
data/index_to_id.pkl
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:eacb7d3fdf34ce9ff0a5c89cc1799b981f7da6a6d4880269bed2af19eda91da7
+size 55018
data/index_to_metadata.pkl
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fe076a7a46265654075c23b846b259d7beaef163f5a5c72d14246a0e3d73579f
+size 530243
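The three added files above are Git LFS pointer files: each holds a spec version, an `oid` (hash algorithm plus digest), and the byte size of the real blob. As a minimal sketch, assuming only that three-line layout, a pointer can be parsed and a downloaded blob checked against it as follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of this commit or of any Git LFS tooling.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is a key, one space, then the value.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form '<algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(blob_path: Path, pointer: dict) -> bool:
    """Check a downloaded blob against the pointer's size and SHA-256 digest.

    Hypothetical helper: assumes the pointer uses sha256, as the three
    pointers in this commit do.
    """
    data = blob_path.read_bytes()
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


# Pointer contents copied from the data/index.faiss hunk above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:78a068ac98a5de614955c9c1e307b40f7b403bd46d315cf3b583f22466bf5e7a
size 3545133
"""

pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algo"], pointer["size"])
```

Calling `matches_pointer(Path("data/index.faiss"), pointer)` after an LFS checkout would then confirm that the local file is the real blob rather than the pointer itself.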
data/model_data_json/AdamCodd_vit-base-nsfw-detector.json
ADDED
@@ -0,0 +1,20 @@
+{
+  "model_id": "AdamCodd/vit-base-nsfw-detector",
+  "downloads": 1035027,
+  "tags": [
+    "transformers.js",
+    "onnx",
+    "safetensors",
+    "vit",
+    "image-classification",
+    "transformers",
+    "nlp",
+    "base_model:google/vit-base-patch16-384",
+    "base_model:quantized:google/vit-base-patch16-384",
+    "license:apache-2.0",
+    "model-index",
+    "region:us"
+  ],
+  "description": "--- metrics: - accuracy pipeline_tag: image-classification base_model: google/vit-base-patch16-384 model-index: - name: AdamCodd/vit-base-nsfw-detector results: - task: type: image-classification name: Image Classification metrics: - type: accuracy value: 0.9654 name: Accuracy - type: AUC value: 0.9948 - type: loss value: 0.0937 name: Loss license: apache-2.0 tags: - transformers.js - transformers - nlp --- # vit-base-nsfw-detector This model is a fine-tuned version of vit-base-patch16-384 on around 25_000 images (drawings, photos...). It achieves the following results on the evaluation set: - Loss: 0.0937 - Accuracy: 0.9654 **<u>New [07/30]</u>**: I created a new ViT model specifically to detect NSFW/SFW images for stable diffusion usage (read the disclaimer below for the reason): **AdamCodd/vit-nsfw-stable-diffusion**. **Disclaimer**: This model wasn't made with generative images in mind! There is no generated image in the dataset used here, and it performs significantly worse on generative images, which will require another ViT model specifically trained on generative images. Here are the model's actual scores for generative images to give you an idea: - Loss: 0.3682 (↑ 292.95%) - Accuracy: 0.8600 (↓ 10.91%) - F1: 0.8654 - AUC: 0.9376 (↓ 5.75%) - Precision: 0.8350 - Recall: 0.8980 ## Model description The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a supervised fashion, namely ImageNet-21k, at a resolution of 224x224 pixels. Next, the model was fine-tuned on ImageNet (also referred to as ILSVRC2012), a dataset comprising 1 million images and 1,000 classes, at a higher resolution of 384x384. ## Intended uses & limitations There are two classes: SFW and NSFW. The model has been trained to be restrictive and therefore classify \"sexy\" images as NSFW. That is, if the image shows cleavage or too much skin, it will be classified as NSFW. This is normal. Usage for a local image: Usage for a distant image: Usage with Transformers.js (Vanilla JS): The model has been trained on a variety of images (realistic, 3D, drawings), yet it is not perfect and some images may be wrongly classified as NSFW when they are not. Additionally, please note that using the quantized ONNX model within the transformers.js pipeline will slightly reduce the model's accuracy. You can find a toy implementation of this model with Transformers.js here. ## Training and evaluation data More information needed ## Training procedure ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 3e-05 - train_batch_size: 32 - eval_batch_size: 32 - seed: 42 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 - num_epochs: 1 ### Training results - Validation Loss: 0.0937 - Accuracy: 0.9654, - AUC: 0.9948 Confusion matrix (eval): [1076 37] [ 60 1627] ### Framework versions - Transformers 4.36.2 - Evaluate 0.4.1 If you want to support me, you can here.",
+  "model_explanation_gemini": "Classifies images as SFW or NSFW with high accuracy, primarily trained on non-generative images like drawings and photos."
+}
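The model card embedded in the `description` field above reports an evaluation accuracy of 0.9654 alongside the confusion matrix `[1076 37] [60 1627]`. As a quick sanity check, the accuracy can be recomputed from the matrix as the diagonal sum over the total. This is a sketch under one assumption: the card does not say which row or column is SFW and which is NSFW, so only label-independent quantities are computed.

```python
# Confusion matrix (eval) as printed in the model card: [1076 37] [60 1627].
# The card does not state the class order, so no per-class metrics
# (precision, recall) are derived here.
confusion = [[1076, 37], [60, 1627]]

# Total number of evaluated images is the sum of all four cells.
total = sum(sum(row) for row in confusion)

# Correct predictions sit on the main diagonal.
correct = sum(confusion[i][i] for i in range(len(confusion)))

accuracy = correct / total
print(f"{correct}/{total} = {accuracy:.4f}")  # prints "2703/2800 = 0.9654"
```

The recomputed value agrees with the 0.9654 accuracy quoted in the card to four decimal places.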
data/model_data_json/Alibaba-NLP_gte-Qwen2-1.5B-instruct.json
ADDED
The diff for this file is too large to render.
data/model_data_json/Alibaba-NLP_gte-Qwen2-7B-instruct.json
ADDED
The diff for this file is too large to render.
data/model_data_json/Alibaba-NLP_gte-base-en-v1.5.json
ADDED
@@ -0,0 +1,27 @@
+{
+  "model_id": "Alibaba-NLP/gte-base-en-v1.5",
+  "downloads": 1472071,
+  "tags": [
+    "transformers",
+    "onnx",
+    "safetensors",
+    "new",
+    "feature-extraction",
+    "sentence-transformers",
+    "gte",
+    "mteb",
+    "transformers.js",
+    "sentence-similarity",
+    "custom_code",
+    "en",
+    "arxiv:2407.19669",
+    "arxiv:2308.03281",
+    "license:apache-2.0",
+    "model-index",
+    "text-embeddings-inference",
+    "endpoints_compatible",
+    "region:us"
+  ],
"description": "--- library_name: transformers tags: - sentence-transformers - gte - mteb - transformers.js - sentence-similarity license: apache-2.0 language: - en model-index: - name: gte-base-en-v1.5 results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 74.7910447761194 - type: ap value: 37.053785713650626 - type: f1 value: 68.51101510998551 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 93.016875 - type: ap value: 89.17750268426342 - type: f1 value: 92.9970977240524 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 53.312000000000005 - type: f1 value: 52.98175784163017 - task: type: Retrieval dataset: type: mteb/arguana name: MTEB ArguAna config: default split: test revision: c22ab2a51041ffd869aaddef7af8d8215647e41a metrics: - type: map_at_1 value: 38.193 - type: map_at_10 value: 54.848 - type: map_at_100 value: 55.388000000000005 - type: map_at_1000 value: 55.388999999999996 - type: map_at_3 value: 50.427 - type: map_at_5 value: 53.105000000000004 - type: mrr_at_1 value: 39.047 - type: mrr_at_10 value: 55.153 - type: mrr_at_100 value: 55.686 - type: mrr_at_1000 value: 55.688 - type: mrr_at_3 value: 50.676 - type: mrr_at_5 value: 53.417 - type: ndcg_at_1 value: 38.193 - type: ndcg_at_10 value: 63.486 - type: ndcg_at_100 value: 65.58 - type: ndcg_at_1000 value: 65.61 - type: ndcg_at_3 value: 54.494 - type: ndcg_at_5 value: 59.339 - type: precision_at_1 value: 38.193 - type: precision_at_10 value: 9.075 - type: precision_at_100 value: 
0.9939999999999999 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 22.096 - type: precision_at_5 value: 15.619 - type: recall_at_1 value: 38.193 - type: recall_at_10 value: 90.754 - type: recall_at_100 value: 99.431 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 66.28699999999999 - type: recall_at_5 value: 78.094 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 47.508221208908964 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 42.04668382560096 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 61.828759903716815 - type: mrr value: 74.37343358395991 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 85.03673698773017 - type: cos_sim_spearman value: 83.6470866785058 - type: euclidean_pearson value: 82.64048673096565 - type: euclidean_spearman value: 83.63142367101115 - type: manhattan_pearson value: 82.71493099760228 - type: manhattan_spearman value: 83.60491704294326 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 86.73376623376623 - type: f1 value: 86.70294049278262 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: 
v_measure value: 40.31923804167062 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 37.552547125348454 - task: type: Retrieval dataset: type: mteb/cqadupstack-android name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: f46a197baaae43b4f621051089b82a364682dfeb metrics: - type: map_at_1 value: 30.567 - type: map_at_10 value: 41.269 - type: map_at_100 value: 42.689 - type: map_at_1000 value: 42.84 - type: map_at_3 value: 37.567 - type: map_at_5 value: 39.706 - type: mrr_at_1 value: 37.053000000000004 - type: mrr_at_10 value: 46.900999999999996 - type: mrr_at_100 value: 47.662 - type: mrr_at_1000 value: 47.713 - type: mrr_at_3 value: 43.801 - type: mrr_at_5 value: 45.689 - type: ndcg_at_1 value: 37.053000000000004 - type: ndcg_at_10 value: 47.73 - type: ndcg_at_100 value: 53.128 - type: ndcg_at_1000 value: 55.300000000000004 - type: ndcg_at_3 value: 42.046 - type: ndcg_at_5 value: 44.782 - type: precision_at_1 value: 37.053000000000004 - type: precision_at_10 value: 9.142 - type: precision_at_100 value: 1.485 - type: precision_at_1000 value: 0.197 - type: precision_at_3 value: 20.076 - type: precision_at_5 value: 14.535 - type: recall_at_1 value: 30.567 - type: recall_at_10 value: 60.602999999999994 - type: recall_at_100 value: 83.22800000000001 - type: recall_at_1000 value: 96.696 - type: recall_at_3 value: 44.336999999999996 - type: recall_at_5 value: 51.949 - task: type: Retrieval dataset: type: mteb/cqadupstack-english name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: ad9991cb51e31e31e430383c75ffb2885547b5f0 metrics: - type: map_at_1 value: 28.538000000000004 - type: map_at_10 value: 38.757999999999996 - type: map_at_100 value: 40.129 - type: map_at_1000 value: 40.262 - type: map_at_3 value: 35.866 - type: map_at_5 value: 37.417 - type: mrr_at_1 value: 
36.051 - type: mrr_at_10 value: 44.868 - type: mrr_at_100 value: 45.568999999999996 - type: mrr_at_1000 value: 45.615 - type: mrr_at_3 value: 42.558 - type: mrr_at_5 value: 43.883 - type: ndcg_at_1 value: 36.051 - type: ndcg_at_10 value: 44.584 - type: ndcg_at_100 value: 49.356 - type: ndcg_at_1000 value: 51.39 - type: ndcg_at_3 value: 40.389 - type: ndcg_at_5 value: 42.14 - type: precision_at_1 value: 36.051 - type: precision_at_10 value: 8.446 - type: precision_at_100 value: 1.411 - type: precision_at_1000 value: 0.19 - type: precision_at_3 value: 19.639 - type: precision_at_5 value: 13.796 - type: recall_at_1 value: 28.538000000000004 - type: recall_at_10 value: 54.99000000000001 - type: recall_at_100 value: 75.098 - type: recall_at_1000 value: 87.848 - type: recall_at_3 value: 42.236000000000004 - type: recall_at_5 value: 47.377 - task: type: Retrieval dataset: type: mteb/cqadupstack-gaming name: MTEB CQADupstackGamingRetrieval config: default split: test revision: 4885aa143210c98657558c04aaf3dc47cfb54340 metrics: - type: map_at_1 value: 37.188 - type: map_at_10 value: 50.861000000000004 - type: map_at_100 value: 51.917 - type: map_at_1000 value: 51.964999999999996 - type: map_at_3 value: 47.144000000000005 - type: map_at_5 value: 49.417 - type: mrr_at_1 value: 42.571 - type: mrr_at_10 value: 54.086999999999996 - type: mrr_at_100 value: 54.739000000000004 - type: mrr_at_1000 value: 54.762 - type: mrr_at_3 value: 51.285000000000004 - type: mrr_at_5 value: 53.0 - type: ndcg_at_1 value: 42.571 - type: ndcg_at_10 value: 57.282 - type: ndcg_at_100 value: 61.477000000000004 - type: ndcg_at_1000 value: 62.426 - type: ndcg_at_3 value: 51.0 - type: ndcg_at_5 value: 54.346000000000004 - type: precision_at_1 value: 42.571 - type: precision_at_10 value: 9.467 - type: precision_at_100 value: 1.2550000000000001 - type: precision_at_1000 value: 0.13799999999999998 - type: precision_at_3 value: 23.114 - type: precision_at_5 value: 16.250999999999998 - type: recall_at_1 value: 
37.188 - type: recall_at_10 value: 73.068 - type: recall_at_100 value: 91.203 - type: recall_at_1000 value: 97.916 - type: recall_at_3 value: 56.552 - type: recall_at_5 value: 64.567 - task: type: Retrieval dataset: type: mteb/cqadupstack-gis name: MTEB CQADupstackGisRetrieval config: default split: test revision: 5003b3064772da1887988e05400cf3806fe491f2 metrics: - type: map_at_1 value: 25.041000000000004 - type: map_at_10 value: 33.86 - type: map_at_100 value: 34.988 - type: map_at_1000 value: 35.064 - type: map_at_3 value: 31.049 - type: map_at_5 value: 32.845 - type: mrr_at_1 value: 26.893 - type: mrr_at_10 value: 35.594 - type: mrr_at_100 value: 36.617 - type: mrr_at_1000 value: 36.671 - type: mrr_at_3 value: 33.051 - type: mrr_at_5 value: 34.61 - type: ndcg_at_1 value: 26.893 - type: ndcg_at_10 value: 38.674 - type: ndcg_at_100 value: 44.178 - type: ndcg_at_1000 value: 46.089999999999996 - type: ndcg_at_3 value: 33.485 - type: ndcg_at_5 value: 36.402 - type: precision_at_1 value: 26.893 - type: precision_at_10 value: 5.989 - type: precision_at_100 value: 0.918 - type: precision_at_1000 value: 0.11100000000000002 - type: precision_at_3 value: 14.2 - type: precision_at_5 value: 10.26 - type: recall_at_1 value: 25.041000000000004 - type: recall_at_10 value: 51.666000000000004 - type: recall_at_100 value: 76.896 - type: recall_at_1000 value: 91.243 - type: recall_at_3 value: 38.035999999999994 - type: recall_at_5 value: 44.999 - task: type: Retrieval dataset: type: mteb/cqadupstack-mathematica name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: 90fceea13679c63fe563ded68f3b6f06e50061de metrics: - type: map_at_1 value: 15.909999999999998 - type: map_at_10 value: 23.901 - type: map_at_100 value: 25.165 - type: map_at_1000 value: 25.291000000000004 - type: map_at_3 value: 21.356 - type: map_at_5 value: 22.816 - type: mrr_at_1 value: 20.025000000000002 - type: mrr_at_10 value: 28.382 - type: mrr_at_100 value: 29.465000000000003 - type: 
mrr_at_1000 value: 29.535 - type: mrr_at_3 value: 25.933 - type: mrr_at_5 value: 27.332 - type: ndcg_at_1 value: 20.025000000000002 - type: ndcg_at_10 value: 29.099000000000004 - type: ndcg_at_100 value: 35.127 - type: ndcg_at_1000 value: 38.096000000000004 - type: ndcg_at_3 value: 24.464 - type: ndcg_at_5 value: 26.709 - type: precision_at_1 value: 20.025000000000002 - type: precision_at_10 value: 5.398 - type: precision_at_100 value: 0.9690000000000001 - type: precision_at_1000 value: 0.13699999999999998 - type: precision_at_3 value: 11.774 - type: precision_at_5 value: 8.632 - type: recall_at_1 value: 15.909999999999998 - type: recall_at_10 value: 40.672000000000004 - type: recall_at_100 value: 66.855 - type: recall_at_1000 value: 87.922 - type: recall_at_3 value: 28.069 - type: recall_at_5 value: 33.812 - task: type: Retrieval dataset: type: mteb/cqadupstack-physics name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: 79531abbd1fb92d06c6d6315a0cbbbf5bb247ea4 metrics: - type: map_at_1 value: 30.175 - type: map_at_10 value: 41.36 - type: map_at_100 value: 42.701 - type: map_at_1000 value: 42.817 - type: map_at_3 value: 37.931 - type: map_at_5 value: 39.943 - type: mrr_at_1 value: 35.611 - type: mrr_at_10 value: 46.346 - type: mrr_at_100 value: 47.160000000000004 - type: mrr_at_1000 value: 47.203 - type: mrr_at_3 value: 43.712 - type: mrr_at_5 value: 45.367000000000004 - type: ndcg_at_1 value: 35.611 - type: ndcg_at_10 value: 47.532000000000004 - type: ndcg_at_100 value: 53.003 - type: ndcg_at_1000 value: 55.007 - type: ndcg_at_3 value: 42.043 - type: ndcg_at_5 value: 44.86 - type: precision_at_1 value: 35.611 - type: precision_at_10 value: 8.624 - type: precision_at_100 value: 1.332 - type: precision_at_1000 value: 0.169 - type: precision_at_3 value: 20.083000000000002 - type: precision_at_5 value: 14.437 - type: recall_at_1 value: 30.175 - type: recall_at_10 value: 60.5 - type: recall_at_100 value: 83.399 - type: recall_at_1000 value: 
96.255 - type: recall_at_3 value: 45.448 - type: recall_at_5 value: 52.432 - task: type: Retrieval dataset: type: mteb/cqadupstack-programmers name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: 6184bc1440d2dbc7612be22b50686b8826d22b32 metrics: - type: map_at_1 value: 22.467000000000002 - type: map_at_10 value: 33.812999999999995 - type: map_at_100 value: 35.248000000000005 - type: map_at_1000 value: 35.359 - type: map_at_3 value: 30.316 - type: map_at_5 value: 32.233000000000004 - type: mrr_at_1 value: 28.310999999999996 - type: mrr_at_10 value: 38.979 - type: mrr_at_100 value: 39.937 - type: mrr_at_1000 value: 39.989999999999995 - type: mrr_at_3 value: 36.244 - type: mrr_at_5 value: 37.871 - type: ndcg_at_1 value: 28.310999999999996 - type: ndcg_at_10 value: 40.282000000000004 - type: ndcg_at_100 value: 46.22 - type: ndcg_at_1000 value: 48.507 - type: ndcg_at_3 value: 34.596 - type: ndcg_at_5 value: 37.267 - type: precision_at_1 value: 28.310999999999996 - type: precision_at_10 value: 7.831 - type: precision_at_100 value: 1.257 - type: precision_at_1000 value: 0.164 - type: precision_at_3 value: 17.275 - type: precision_at_5 value: 12.556999999999999 - type: recall_at_1 value: 22.467000000000002 - type: recall_at_10 value: 54.14099999999999 - type: recall_at_100 value: 79.593 - type: recall_at_1000 value: 95.063 - type: recall_at_3 value: 38.539 - type: recall_at_5 value: 45.403 - task: type: Retrieval dataset: type: mteb/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: map_at_1 value: 24.18591666666667 - type: map_at_10 value: 33.84258333333333 - type: map_at_100 value: 35.11391666666666 - type: map_at_1000 value: 35.23258333333333 - type: map_at_3 value: 30.764249999999997 - type: map_at_5 value: 32.52333333333334 - type: mrr_at_1 value: 28.54733333333333 - type: mrr_at_10 value: 37.81725 - type: mrr_at_100 value: 38.716499999999996 - type: 
mrr_at_1000 value: 38.77458333333333 - type: mrr_at_3 value: 35.157833333333336 - type: mrr_at_5 value: 36.69816666666667 - type: ndcg_at_1 value: 28.54733333333333 - type: ndcg_at_10 value: 39.51508333333334 - type: ndcg_at_100 value: 44.95316666666666 - type: ndcg_at_1000 value: 47.257083333333334 - type: ndcg_at_3 value: 34.205833333333324 - type: ndcg_at_5 value: 36.78266666666667 - type: precision_at_1 value: 28.54733333333333 - type: precision_at_10 value: 7.082583333333334 - type: precision_at_100 value: 1.1590833333333332 - type: precision_at_1000 value: 0.15516666666666662 - type: precision_at_3 value: 15.908750000000001 - type: precision_at_5 value: 11.505416666666669 - type: recall_at_1 value: 24.18591666666667 - type: recall_at_10 value: 52.38758333333333 - type: recall_at_100 value: 76.13666666666667 - type: recall_at_1000 value: 91.99066666666667 - type: recall_at_3 value: 37.78333333333334 - type: recall_at_5 value: 44.30141666666666 - task: type: Retrieval dataset: type: mteb/cqadupstack-stats name: MTEB CQADupstackStatsRetrieval config: default split: test revision: 65ac3a16b8e91f9cee4c9828cc7c335575432a2a metrics: - type: map_at_1 value: 21.975 - type: map_at_10 value: 29.781000000000002 - type: map_at_100 value: 30.847 - type: map_at_1000 value: 30.94 - type: map_at_3 value: 27.167 - type: map_at_5 value: 28.633999999999997 - type: mrr_at_1 value: 24.387 - type: mrr_at_10 value: 32.476 - type: mrr_at_100 value: 33.337 - type: mrr_at_1000 value: 33.403 - type: mrr_at_3 value: 29.881999999999998 - type: mrr_at_5 value: 31.339 - type: ndcg_at_1 value: 24.387 - type: ndcg_at_10 value: 34.596 - type: ndcg_at_100 value: 39.635 - type: ndcg_at_1000 value: 42.079 - type: ndcg_at_3 value: 29.516 - type: ndcg_at_5 value: 31.959 - type: precision_at_1 value: 24.387 - type: precision_at_10 value: 5.6129999999999995 - type: precision_at_100 value: 0.8909999999999999 - type: precision_at_1000 value: 0.117 - type: precision_at_3 value: 12.73 - type: 
precision_at_5 value: 9.171999999999999 - type: recall_at_1 value: 21.975 - type: recall_at_10 value: 46.826 - type: recall_at_100 value: 69.554 - type: recall_at_1000 value: 87.749 - type: recall_at_3 value: 33.016 - type: recall_at_5 value: 38.97 - task: type: Retrieval dataset: type: mteb/cqadupstack-tex name: MTEB CQADupstackTexRetrieval config: default split: test revision: 46989137a86843e03a6195de44b09deda022eec7 metrics: - type: map_at_1 value: 15.614 - type: map_at_10 value: 22.927 - type: map_at_100 value: 24.185000000000002 - type: map_at_1000 value: 24.319 - type: map_at_3 value: 20.596 - type: map_at_5 value: 21.854000000000003 - type: mrr_at_1 value: 18.858 - type: mrr_at_10 value: 26.535999999999998 - type: mrr_at_100 value: 27.582 - type: mrr_at_1000 value: 27.665 - type: mrr_at_3 value: 24.295 - type: mrr_at_5 value: 25.532 - type: ndcg_at_1 value: 18.858 - type: ndcg_at_10 value: 27.583000000000002 - type: ndcg_at_100 value: 33.635 - type: ndcg_at_1000 value: 36.647 - type: ndcg_at_3 value: 23.348 - type: ndcg_at_5 value: 25.257 - type: precision_at_1 value: 18.858 - type: precision_at_10 value: 5.158 - type: precision_at_100 value: 0.964 - type: precision_at_1000 value: 0.13999999999999999 - type: precision_at_3 value: 11.092 - type: precision_at_5 value: 8.1 - type: recall_at_1 value: 15.614 - type: recall_at_10 value: 37.916 - type: recall_at_100 value: 65.205 - type: recall_at_1000 value: 86.453 - type: recall_at_3 value: 26.137 - type: recall_at_5 value: 31.087999999999997 - task: type: Retrieval dataset: type: mteb/cqadupstack-unix name: MTEB CQADupstackUnixRetrieval config: default split: test revision: 6c6430d3a6d36f8d2a829195bc5dc94d7e063e53 metrics: - type: map_at_1 value: 23.078000000000003 - type: map_at_10 value: 31.941999999999997 - type: map_at_100 value: 33.196999999999996 - type: map_at_1000 value: 33.303 - type: map_at_3 value: 28.927000000000003 - type: map_at_5 value: 30.707 - type: mrr_at_1 value: 26.866 - type: mrr_at_10 
value: 35.557 - type: mrr_at_100 value: 36.569 - type: mrr_at_1000 value: 36.632 - type: mrr_at_3 value: 32.897999999999996 - type: mrr_at_5 value: 34.437 - type: ndcg_at_1 value: 26.866 - type: ndcg_at_10 value: 37.372 - type: ndcg_at_100 value: 43.248 - type: ndcg_at_1000 value: 45.632 - type: ndcg_at_3 value: 31.852999999999998 - type: ndcg_at_5 value: 34.582 - type: precision_at_1 value: 26.866 - type: precision_at_10 value: 6.511 - type: precision_at_100 value: 1.078 - type: precision_at_1000 value: 0.13899999999999998 - type: precision_at_3 value: 14.582999999999998 - type: precision_at_5 value: 10.634 - type: recall_at_1 value: 23.078000000000003 - type: recall_at_10 value: 50.334 - type: recall_at_100 value: 75.787 - type: recall_at_1000 value: 92.485 - type: recall_at_3 value: 35.386 - type: recall_at_5 value: 42.225 - task: type: Retrieval dataset: type: mteb/cqadupstack-webmasters name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: 160c094312a0e1facb97e55eeddb698c0abe3571 metrics: - type: map_at_1 value: 22.203999999999997 - type: map_at_10 value: 31.276 - type: map_at_100 value: 32.844 - type: map_at_1000 value: 33.062999999999995 - type: map_at_3 value: 27.733999999999998 - type: map_at_5 value: 29.64 - type: mrr_at_1 value: 27.272999999999996 - type: mrr_at_10 value: 36.083 - type: mrr_at_100 value: 37.008 - type: mrr_at_1000 value: 37.076 - type: mrr_at_3 value: 33.004 - type: mrr_at_5 value: 34.664 - type: ndcg_at_1 value: 27.272999999999996 - type: ndcg_at_10 value: 37.763000000000005 - type: ndcg_at_100 value: 43.566 - type: ndcg_at_1000 value: 46.356 - type: ndcg_at_3 value: 31.673000000000002 - type: ndcg_at_5 value: 34.501 - type: precision_at_1 value: 27.272999999999996 - type: precision_at_10 value: 7.470000000000001 - type: precision_at_100 value: 1.502 - type: precision_at_1000 value: 0.24 - type: precision_at_3 value: 14.756 - type: precision_at_5 value: 11.225 - type: recall_at_1 value: 22.203999999999997 - 
type: recall_at_10 value: 51.437999999999995 - type: recall_at_100 value: 76.845 - type: recall_at_1000 value: 94.38600000000001 - type: recall_at_3 value: 34.258 - type: recall_at_5 value: 41.512 - task: type: Retrieval dataset: type: mteb/cqadupstack-wordpress name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: map_at_1 value: 17.474 - type: map_at_10 value: 26.362999999999996 - type: map_at_100 value: 27.456999999999997 - type: map_at_1000 value: 27.567999999999998 - type: map_at_3 value: 23.518 - type: map_at_5 value: 25.068 - type: mrr_at_1 value: 18.669 - type: mrr_at_10 value: 27.998 - type: mrr_at_100 value: 28.953 - type: mrr_at_1000 value: 29.03 - type: mrr_at_3 value: 25.230999999999998 - type: mrr_at_5 value: 26.654 - type: ndcg_at_1 value: 18.669 - type: ndcg_at_10 value: 31.684 - type: ndcg_at_100 value: 36.864999999999995 - type: ndcg_at_1000 value: 39.555 - type: ndcg_at_3 value: 26.057000000000002 - type: ndcg_at_5 value: 28.587 - type: precision_at_1 value: 18.669 - type: precision_at_10 value: 5.3420000000000005 - type: precision_at_100 value: 0.847 - type: precision_at_1000 value: 0.12 - type: precision_at_3 value: 11.583 - type: precision_at_5 value: 8.466 - type: recall_at_1 value: 17.474 - type: recall_at_10 value: 46.497 - type: recall_at_100 value: 69.977 - type: recall_at_1000 value: 89.872 - type: recall_at_3 value: 31.385999999999996 - type: recall_at_5 value: 37.283 - task: type: Retrieval dataset: type: mteb/climate-fever name: MTEB ClimateFEVER config: default split: test revision: 47f2ac6acb640fc46020b02a5b59fdda04d39380 metrics: - type: map_at_1 value: 17.173 - type: map_at_10 value: 30.407 - type: map_at_100 value: 32.528 - type: map_at_1000 value: 32.698 - type: map_at_3 value: 25.523 - type: map_at_5 value: 28.038 - type: mrr_at_1 value: 38.958 - type: mrr_at_10 value: 51.515 - type: mrr_at_100 value: 52.214000000000006 - type: mrr_at_1000 value: 
52.237 - type: mrr_at_3 value: 48.502 - type: mrr_at_5 value: 50.251000000000005 - type: ndcg_at_1 value: 38.958 - type: ndcg_at_10 value: 40.355000000000004 - type: ndcg_at_100 value: 47.68 - type: ndcg_at_1000 value: 50.370000000000005 - type: ndcg_at_3 value: 33.946 - type: ndcg_at_5 value: 36.057 - type: precision_at_1 value: 38.958 - type: precision_at_10 value: 12.508 - type: precision_at_100 value: 2.054 - type: precision_at_1000 value: 0.256 - type: precision_at_3 value: 25.581 - type: precision_at_5 value: 19.256999999999998 - type: recall_at_1 value: 17.173 - type: recall_at_10 value: 46.967 - type: recall_at_100 value: 71.47200000000001 - type: recall_at_1000 value: 86.238 - type: recall_at_3 value: 30.961 - type: recall_at_5 value: 37.539 - task: type: Retrieval dataset: type: mteb/dbpedia name: MTEB DBPedia config: default split: test revision: c0f706b76e590d620bd6618b3ca8efdd34e2d659 metrics: - type: map_at_1 value: 8.999 - type: map_at_10 value: 18.989 - type: map_at_100 value: 26.133 - type: map_at_1000 value: 27.666 - type: map_at_3 value: 13.918 - type: map_at_5 value: 16.473 - type: mrr_at_1 value: 66.25 - type: mrr_at_10 value: 74.161 - type: mrr_at_100 value: 74.516 - type: mrr_at_1000 value: 74.524 - type: mrr_at_3 value: 72.875 - type: mrr_at_5 value: 73.613 - type: ndcg_at_1 value: 54.37499999999999 - type: ndcg_at_10 value: 39.902 - type: ndcg_at_100 value: 44.212 - type: ndcg_at_1000 value: 51.62 - type: ndcg_at_3 value: 45.193 - type: ndcg_at_5 value: 42.541000000000004 - type: precision_at_1 value: 66.25 - type: precision_at_10 value: 30.425 - type: precision_at_100 value: 9.754999999999999 - type: precision_at_1000 value: 2.043 - type: precision_at_3 value: 48.25 - type: precision_at_5 value: 40.65 - type: recall_at_1 value: 8.999 - type: recall_at_10 value: 24.133 - type: recall_at_100 value: 49.138999999999996 - type: recall_at_1000 value: 72.639 - type: recall_at_3 value: 15.287999999999998 - type: recall_at_5 value: 19.415 - task: 
type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 46.38999999999999 - type: f1 value: 41.444205512055234 - task: type: Retrieval dataset: type: mteb/fever name: MTEB FEVER config: default split: test revision: bea83ef9e8fb933d90a2f1d5515737465d613e12 metrics: - type: map_at_1 value: 87.35000000000001 - type: map_at_10 value: 92.837 - type: map_at_100 value: 92.996 - type: map_at_1000 value: 93.006 - type: map_at_3 value: 92.187 - type: map_at_5 value: 92.595 - type: mrr_at_1 value: 93.864 - type: mrr_at_10 value: 96.723 - type: mrr_at_100 value: 96.72500000000001 - type: mrr_at_1000 value: 96.72500000000001 - type: mrr_at_3 value: 96.64 - type: mrr_at_5 value: 96.71499999999999 - type: ndcg_at_1 value: 93.864 - type: ndcg_at_10 value: 94.813 - type: ndcg_at_100 value: 95.243 - type: ndcg_at_1000 value: 95.38600000000001 - type: ndcg_at_3 value: 94.196 - type: ndcg_at_5 value: 94.521 - type: precision_at_1 value: 93.864 - type: precision_at_10 value: 10.951 - type: precision_at_100 value: 1.1400000000000001 - type: precision_at_1000 value: 0.117 - type: precision_at_3 value: 35.114000000000004 - type: precision_at_5 value: 21.476 - type: recall_at_1 value: 87.35000000000001 - type: recall_at_10 value: 96.941 - type: recall_at_100 value: 98.397 - type: recall_at_1000 value: 99.21600000000001 - type: recall_at_3 value: 95.149 - type: recall_at_5 value: 96.131 - task: type: Retrieval dataset: type: mteb/fiqa name: MTEB FiQA2018 config: default split: test revision: 27a168819829fe9bcd655c2df245fb19452e8e06 metrics: - type: map_at_1 value: 24.476 - type: map_at_10 value: 40.11 - type: map_at_100 value: 42.229 - type: map_at_1000 value: 42.378 - type: map_at_3 value: 34.512 - type: map_at_5 value: 38.037 - type: mrr_at_1 value: 47.839999999999996 - type: mrr_at_10 value: 57.053 - type: mrr_at_100 value: 57.772 - type: mrr_at_1000 
value: 57.799 - type: mrr_at_3 value: 54.552 - type: mrr_at_5 value: 56.011 - type: ndcg_at_1 value: 47.839999999999996 - type: ndcg_at_10 value: 48.650999999999996 - type: ndcg_at_100 value: 55.681000000000004 - type: ndcg_at_1000 value: 57.979 - type: ndcg_at_3 value: 43.923 - type: ndcg_at_5 value: 46.037 - type: precision_at_1 value: 47.839999999999996 - type: precision_at_10 value: 13.395000000000001 - type: precision_at_100 value: 2.0660000000000003 - type: precision_at_1000 value: 0.248 - type: precision_at_3 value: 29.064 - type: precision_at_5 value: 22.006 - type: recall_at_1 value: 24.476 - type: recall_at_10 value: 56.216 - type: recall_at_100 value: 81.798 - type: recall_at_1000 value: 95.48299999999999 - type: recall_at_3 value: 39.357 - type: recall_at_5 value: 47.802 - task: type: Retrieval dataset: type: mteb/hotpotqa name: MTEB HotpotQA config: default split: test revision: ab518f4d6fcca38d87c25209f94beba119d02014 metrics: - type: map_at_1 value: 42.728 - type: map_at_10 value: 57.737 - type: map_at_100 value: 58.531 - type: map_at_1000 value: 58.594 - type: map_at_3 value: 54.869 - type: map_at_5 value: 56.55 - type: mrr_at_1 value: 85.456 - type: mrr_at_10 value: 90.062 - type: mrr_at_100 value: 90.159 - type: mrr_at_1000 value: 90.16 - type: mrr_at_3 value: 89.37899999999999 - type: mrr_at_5 value: 89.81 - type: ndcg_at_1 value: 85.456 - type: ndcg_at_10 value: 67.755 - type: ndcg_at_100 value: 70.341 - type: ndcg_at_1000 value: 71.538 - type: ndcg_at_3 value: 63.735 - type: ndcg_at_5 value: 65.823 - type: precision_at_1 value: 85.456 - type: precision_at_10 value: 13.450000000000001 - type: precision_at_100 value: 1.545 - type: precision_at_1000 value: 0.16999999999999998 - type: precision_at_3 value: 38.861000000000004 - type: precision_at_5 value: 24.964 - type: recall_at_1 value: 42.728 - type: recall_at_10 value: 67.252 - type: recall_at_100 value: 77.265 - type: recall_at_1000 value: 85.246 - type: recall_at_3 value: 58.292 - type: 
recall_at_5 value: 62.41100000000001 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 87.4836 - type: ap value: 82.29552224030336 - type: f1 value: 87.42791432227448 - task: type: Retrieval dataset: type: mteb/msmarco name: MTEB MSMARCO config: default split: dev revision: c5a29a104738b98a9e76336939199e264163d4a0 metrics: - type: map_at_1 value: 23.015 - type: map_at_10 value: 35.621 - type: map_at_100 value: 36.809 - type: map_at_1000 value: 36.853 - type: map_at_3 value: 31.832 - type: map_at_5 value: 34.006 - type: mrr_at_1 value: 23.738999999999997 - type: mrr_at_10 value: 36.309999999999995 - type: mrr_at_100 value: 37.422 - type: mrr_at_1000 value: 37.461 - type: mrr_at_3 value: 32.592999999999996 - type: mrr_at_5 value: 34.736 - type: ndcg_at_1 value: 23.724999999999998 - type: ndcg_at_10 value: 42.617 - type: ndcg_at_100 value: 48.217999999999996 - type: ndcg_at_1000 value: 49.309 - type: ndcg_at_3 value: 34.905 - type: ndcg_at_5 value: 38.769 - type: precision_at_1 value: 23.724999999999998 - type: precision_at_10 value: 6.689 - type: precision_at_100 value: 0.9480000000000001 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 14.89 - type: precision_at_5 value: 10.897 - type: recall_at_1 value: 23.015 - type: recall_at_10 value: 64.041 - type: recall_at_100 value: 89.724 - type: recall_at_1000 value: 98.00999999999999 - type: recall_at_3 value: 43.064 - type: recall_at_5 value: 52.31099999999999 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 96.49794801641588 - type: f1 value: 96.28931114498003 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: 
ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 82.81121751025992 - type: f1 value: 63.18740125901853 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 77.66644250168123 - type: f1 value: 74.93211186867839 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 81.77202420981843 - type: f1 value: 81.63681969283554 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 34.596687684870645 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 32.26965660101405 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 31.33619694846802 - type: mrr value: 32.53719657720334 - task: type: Retrieval dataset: type: mteb/nfcorpus name: MTEB NFCorpus config: default split: test revision: ec0fa4fe99da2ff19ca1214b7966684033a58814 metrics: - type: map_at_1 value: 6.0729999999999995 - type: map_at_10 value: 13.245999999999999 - type: map_at_100 value: 16.747999999999998 - type: map_at_1000 value: 18.163 - type: map_at_3 value: 10.064 - type: map_at_5 value: 11.513 - type: mrr_at_1 value: 49.536 - type: mrr_at_10 value: 58.092 - type: mrr_at_100 value: 58.752 - type: mrr_at_1000 value: 58.78 - type: mrr_at_3 value: 56.398 - type: mrr_at_5 value: 57.389 - type: ndcg_at_1 value: 
47.059 - type: ndcg_at_10 value: 35.881 - type: ndcg_at_100 value: 32.751999999999995 - type: ndcg_at_1000 value: 41.498000000000005 - type: ndcg_at_3 value: 42.518 - type: ndcg_at_5 value: 39.550999999999995 - type: precision_at_1 value: 49.536 - type: precision_at_10 value: 26.316 - type: precision_at_100 value: 8.084 - type: precision_at_1000 value: 2.081 - type: precision_at_3 value: 39.938 - type: precision_at_5 value: 34.056 - type: recall_at_1 value: 6.0729999999999995 - type: recall_at_10 value: 16.593 - type: recall_at_100 value: 32.883 - type: recall_at_1000 value: 64.654 - type: recall_at_3 value: 11.174000000000001 - type: recall_at_5 value: 13.528 - task: type: Retrieval dataset: type: mteb/nq name: MTEB NQ config: default split: test revision: b774495ed302d8c44a3a7ea25c90dbce03968f31 metrics: - type: map_at_1 value: 30.043 - type: map_at_10 value: 45.318999999999996 - type: map_at_100 value: 46.381 - type: map_at_1000 value: 46.412 - type: map_at_3 value: 40.941 - type: map_at_5 value: 43.662 - type: mrr_at_1 value: 33.98 - type: mrr_at_10 value: 47.870000000000005 - type: mrr_at_100 value: 48.681999999999995 - type: mrr_at_1000 value: 48.703 - type: mrr_at_3 value: 44.341 - type: mrr_at_5 value: 46.547 - type: ndcg_at_1 value: 33.98 - type: ndcg_at_10 value: 52.957 - type: ndcg_at_100 value: 57.434 - type: ndcg_at_1000 value: 58.103 - type: ndcg_at_3 value: 44.896 - type: ndcg_at_5 value: 49.353 - type: precision_at_1 value: 33.98 - type: precision_at_10 value: 8.786 - type: precision_at_100 value: 1.1280000000000001 - type: precision_at_1000 value: 0.11900000000000001 - type: precision_at_3 value: 20.577 - type: precision_at_5 value: 14.942 - type: recall_at_1 value: 30.043 - type: recall_at_10 value: 73.593 - type: recall_at_100 value: 93.026 - type: recall_at_1000 value: 97.943 - type: recall_at_3 value: 52.955 - type: recall_at_5 value: 63.132 - task: type: Retrieval dataset: type: mteb/quora name: MTEB QuoraRetrieval config: default split: test 
revision: None metrics: - type: map_at_1 value: 70.808 - type: map_at_10 value: 84.675 - type: map_at_100 value: 85.322 - type: map_at_1000 value: 85.33800000000001 - type: map_at_3 value: 81.68900000000001 - type: map_at_5 value: 83.543 - type: mrr_at_1 value: 81.5 - type: mrr_at_10 value: 87.59700000000001 - type: mrr_at_100 value: 87.705 - type: mrr_at_1000 value: 87.70599999999999 - type: mrr_at_3 value: 86.607 - type: mrr_at_5 value: 87.289 - type: ndcg_at_1 value: 81.51 - type: ndcg_at_10 value: 88.41799999999999 - type: ndcg_at_100 value: 89.644 - type: ndcg_at_1000 value: 89.725 - type: ndcg_at_3 value: 85.49900000000001 - type: ndcg_at_5 value: 87.078 - type: precision_at_1 value: 81.51 - type: precision_at_10 value: 13.438 - type: precision_at_100 value: 1.532 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 37.363 - type: precision_at_5 value: 24.57 - type: recall_at_1 value: 70.808 - type: recall_at_10 value: 95.575 - type: recall_at_100 value: 99.667 - type: recall_at_1000 value: 99.98899999999999 - type: recall_at_3 value: 87.223 - type: recall_at_5 value: 91.682 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 58.614831329137715 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 66.86580408560826 - task: type: Retrieval dataset: type: mteb/scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 5.093 - type: map_at_10 value: 13.014000000000001 - type: map_at_100 value: 15.412999999999998 - type: map_at_1000 value: 15.756999999999998 - type: map_at_3 value: 9.216000000000001 - type: map_at_5 value: 11.036999999999999 - type: mrr_at_1 value: 25.1 - type: mrr_at_10 value: 37.133 
- type: mrr_at_100 value: 38.165 - type: mrr_at_1000 value: 38.198 - type: mrr_at_3 value: 33.217 - type: mrr_at_5 value: 35.732 - type: ndcg_at_1 value: 25.1 - type: ndcg_at_10 value: 21.918000000000003 - type: ndcg_at_100 value: 30.983 - type: ndcg_at_1000 value: 36.629 - type: ndcg_at_3 value: 20.544999999999998 - type: ndcg_at_5 value: 18.192 - type: precision_at_1 value: 25.1 - type: precision_at_10 value: 11.44 - type: precision_at_100 value: 2.459 - type: precision_at_1000 value: 0.381 - type: precision_at_3 value: 19.267 - type: precision_at_5 value: 16.16 - type: recall_at_1 value: 5.093 - type: recall_at_10 value: 23.215 - type: recall_at_100 value: 49.902 - type: recall_at_1000 value: 77.403 - type: recall_at_3 value: 11.733 - type: recall_at_5 value: 16.372999999999998 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 82.9365442977452 - type: cos_sim_spearman value: 79.36960687383745 - type: euclidean_pearson value: 79.6045204840714 - type: euclidean_spearman value: 79.26382712751337 - type: manhattan_pearson value: 79.4805084789529 - type: manhattan_spearman value: 79.21847863209523 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 83.27906192961453 - type: cos_sim_spearman value: 74.38364712099211 - type: euclidean_pearson value: 78.54358927241223 - type: euclidean_spearman value: 74.22185560806376 - type: manhattan_pearson value: 78.50904327377751 - type: manhattan_spearman value: 74.2627500781748 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 84.66863742649639 - type: cos_sim_spearman value: 84.70630905216271 - type: euclidean_pearson value: 
84.64498334705334 - type: euclidean_spearman value: 84.87204770690148 - type: manhattan_pearson value: 84.65774227976077 - type: manhattan_spearman value: 84.91251851797985 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 83.1577763924467 - type: cos_sim_spearman value: 80.10314039230198 - type: euclidean_pearson value: 81.51346991046043 - type: euclidean_spearman value: 80.08678485109435 - type: manhattan_pearson value: 81.57058914661894 - type: manhattan_spearman value: 80.1516230725106 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 86.40310839662533 - type: cos_sim_spearman value: 87.16293477217867 - type: euclidean_pearson value: 86.50688711184775 - type: euclidean_spearman value: 87.08651444923031 - type: manhattan_pearson value: 86.54674677557857 - type: manhattan_spearman value: 87.15079017870971 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 84.32886275207817 - type: cos_sim_spearman value: 85.0190460590732 - type: euclidean_pearson value: 84.42553652784679 - type: euclidean_spearman value: 85.20027364279328 - type: manhattan_pearson value: 84.42926246281078 - type: manhattan_spearman value: 85.20187419804306 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 90.76732216967812 - type: cos_sim_spearman value: 90.63701653633909 - type: euclidean_pearson value: 90.26678186114682 - type: euclidean_spearman value: 90.67288073455427 - type: manhattan_pearson value: 90.20772020584582 - type: manhattan_spearman 
value: 90.60764863983702 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_pearson value: 69.09280387698125 - type: cos_sim_spearman value: 68.62743151172162 - type: euclidean_pearson value: 69.89386398104689 - type: euclidean_spearman value: 68.71191066733556 - type: manhattan_pearson value: 69.92516500604872 - type: manhattan_spearman value: 68.80452846992576 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 86.13178592019887 - type: cos_sim_spearman value: 86.03947178806887 - type: euclidean_pearson value: 85.87029414285313 - type: euclidean_spearman value: 86.04960843306998 - type: manhattan_pearson value: 85.92946858580146 - type: manhattan_spearman value: 86.12575341860442 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 85.16657063002837 - type: mrr value: 95.73671063867141 - task: type: Retrieval dataset: type: mteb/scifact name: MTEB SciFact config: default split: test revision: 0228b52cf27578f30900b9e5271d331663a030d7 metrics: - type: map_at_1 value: 63.510999999999996 - type: map_at_10 value: 72.76899999999999 - type: map_at_100 value: 73.303 - type: map_at_1000 value: 73.32499999999999 - type: map_at_3 value: 70.514 - type: map_at_5 value: 71.929 - type: mrr_at_1 value: 66.333 - type: mrr_at_10 value: 73.75 - type: mrr_at_100 value: 74.119 - type: mrr_at_1000 value: 74.138 - type: mrr_at_3 value: 72.222 - type: mrr_at_5 value: 73.122 - type: ndcg_at_1 value: 66.333 - type: ndcg_at_10 value: 76.774 - type: ndcg_at_100 value: 78.78500000000001 - type: ndcg_at_1000 value: 79.254 - type: ndcg_at_3 value: 73.088 - type: ndcg_at_5 value: 75.002 - 
type: precision_at_1 value: 66.333 - type: precision_at_10 value: 9.833 - type: precision_at_100 value: 1.093 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 28.222 - type: precision_at_5 value: 18.333 - type: recall_at_1 value: 63.510999999999996 - type: recall_at_10 value: 87.98899999999999 - type: recall_at_100 value: 96.5 - type: recall_at_1000 value: 100.0 - type: recall_at_3 value: 77.86699999999999 - type: recall_at_5 value: 82.73899999999999 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.78514851485149 - type: cos_sim_ap value: 94.94214383862038 - type: cos_sim_f1 value: 89.02255639097744 - type: cos_sim_precision value: 89.2462311557789 - type: cos_sim_recall value: 88.8 - type: dot_accuracy value: 99.78217821782178 - type: dot_ap value: 94.69965247836805 - type: dot_f1 value: 88.78695208970439 - type: dot_precision value: 90.54054054054053 - type: dot_recall value: 87.1 - type: euclidean_accuracy value: 99.78118811881188 - type: euclidean_ap value: 94.9865187695411 - type: euclidean_f1 value: 88.99950223992036 - type: euclidean_precision value: 88.60257680872151 - type: euclidean_recall value: 89.4 - type: manhattan_accuracy value: 99.78811881188119 - type: manhattan_ap value: 95.0021236766459 - type: manhattan_f1 value: 89.12071535022356 - type: manhattan_precision value: 88.54886475814413 - type: manhattan_recall value: 89.7 - type: max_accuracy value: 99.78811881188119 - type: max_ap value: 95.0021236766459 - type: max_f1 value: 89.12071535022356 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 68.93190546593995 - task: type: Clustering dataset: 
type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 37.602808534760655 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 52.29214480978073 - type: mrr value: 53.123169722434426 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.967800769650022 - type: cos_sim_spearman value: 31.168490040206926 - type: dot_pearson value: 30.888603021128553 - type: dot_spearman value: 31.028241262520385 - task: type: Retrieval dataset: type: mteb/trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.22300000000000003 - type: map_at_10 value: 1.781 - type: map_at_100 value: 9.905999999999999 - type: map_at_1000 value: 23.455000000000002 - type: map_at_3 value: 0.569 - type: map_at_5 value: 0.918 - type: mrr_at_1 value: 84.0 - type: mrr_at_10 value: 91.067 - type: mrr_at_100 value: 91.067 - type: mrr_at_1000 value: 91.067 - type: mrr_at_3 value: 90.667 - type: mrr_at_5 value: 91.067 - type: ndcg_at_1 value: 78.0 - type: ndcg_at_10 value: 73.13499999999999 - type: ndcg_at_100 value: 55.32 - type: ndcg_at_1000 value: 49.532 - type: ndcg_at_3 value: 73.715 - type: ndcg_at_5 value: 72.74199999999999 - type: precision_at_1 value: 84.0 - type: precision_at_10 value: 78.8 - type: precision_at_100 value: 56.32 - type: precision_at_1000 value: 21.504 - type: precision_at_3 value: 77.333 - type: precision_at_5 value: 78.0 - type: recall_at_1 value: 0.22300000000000003 - type: recall_at_10 value: 2.049 - type: recall_at_100 value: 13.553 - type: recall_at_1000 value: 46.367999999999995 - 
type: recall_at_3 value: 0.604 - type: recall_at_5 value: 1.015 - task: type: Retrieval dataset: type: mteb/touche2020 name: MTEB Touche2020 config: default split: test revision: a34f9a33db75fa0cbb21bb5cfc3dae8dc8bec93f metrics: - type: map_at_1 value: 3.0380000000000003 - type: map_at_10 value: 10.188 - type: map_at_100 value: 16.395 - type: map_at_1000 value: 18.024 - type: map_at_3 value: 6.236 - type: map_at_5 value: 7.276000000000001 - type: mrr_at_1 value: 34.694 - type: mrr_at_10 value: 46.292 - type: mrr_at_100 value: 47.446 - type: mrr_at_1000 value: 47.446 - type: mrr_at_3 value: 41.156 - type: mrr_at_5 value: 44.32 - type: ndcg_at_1 value: 32.653 - type: ndcg_at_10 value: 25.219 - type: ndcg_at_100 value: 37.802 - type: ndcg_at_1000 value: 49.274 - type: ndcg_at_3 value: 28.605999999999998 - type: ndcg_at_5 value: 26.21 - type: precision_at_1 value: 34.694 - type: precision_at_10 value: 21.837 - type: precision_at_100 value: 7.776 - type: precision_at_1000 value: 1.522 - type: precision_at_3 value: 28.571 - type: precision_at_5 value: 25.306 - type: recall_at_1 value: 3.0380000000000003 - type: recall_at_10 value: 16.298000000000002 - type: recall_at_100 value: 48.712 - type: recall_at_1000 value: 83.16799999999999 - type: recall_at_3 value: 7.265000000000001 - type: recall_at_5 value: 9.551 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 83.978 - type: ap value: 24.751887949330015 - type: f1 value: 66.8685134049279 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 61.573288058856825 - type: f1 value: 61.973261751726604 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering 
name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 48.75483298792469 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 86.36824223639506 - type: cos_sim_ap value: 75.53126388573047 - type: cos_sim_f1 value: 67.9912831688245 - type: cos_sim_precision value: 66.11817501869858 - type: cos_sim_recall value: 69.9736147757256 - type: dot_accuracy value: 86.39804494248078 - type: dot_ap value: 75.27598891718046 - type: dot_f1 value: 67.91146284159763 - type: dot_precision value: 63.90505003490807 - type: dot_recall value: 72.45382585751979 - type: euclidean_accuracy value: 86.36228169517793 - type: euclidean_ap value: 75.51438087434647 - type: euclidean_f1 value: 68.02370523061066 - type: euclidean_precision value: 66.46525679758308 - type: euclidean_recall value: 69.65699208443272 - type: manhattan_accuracy value: 86.46361089586935 - type: manhattan_ap value: 75.50800785730111 - type: manhattan_f1 value: 67.9220437187253 - type: manhattan_precision value: 67.79705573080967 - type: manhattan_recall value: 68.04749340369392 - type: max_accuracy value: 86.46361089586935 - type: max_ap value: 75.53126388573047 - type: max_f1 value: 68.02370523061066 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 88.80350836341057 - type: cos_sim_ap value: 85.51101933260743 - type: cos_sim_f1 value: 77.9152271629704 - type: cos_sim_precision value: 75.27815662910056 - type: cos_sim_recall value: 80.74376347397599 - type: dot_accuracy value: 88.84425815966158 - type: dot_ap value: 85.49726945962519 - type: dot_f1 value: 
77.94445269567801 - type: dot_precision value: 75.27251864601261 - type: dot_recall value: 80.81305820757623 - type: euclidean_accuracy value: 88.80350836341057 - type: euclidean_ap value: 85.4882880790211 - type: euclidean_f1 value: 77.87063284615103 - type: euclidean_precision value: 74.61022927689595 - type: euclidean_recall value: 81.42901139513397 - type: manhattan_accuracy value: 88.7161873714441 - type: manhattan_ap value: 85.45753871906821 - type: manhattan_f1 value: 77.8686401480111 - type: manhattan_precision value: 74.95903683123174 - type: manhattan_recall value: 81.01324299353249 - type: max_accuracy value: 88.84425815966158 - type: max_ap value: 85.51101933260743 - type: max_f1 value: 77.94445269567801 --- <!-- **English** | 中文 --> # gte-base-en-v1.5 We introduce series, upgraded embeddings that support a context length of up to **8192** while further enhancing model performance. The models are built upon the encoder backbone (BERT + RoPE + GLU). The series achieves state-of-the-art scores on the MTEB benchmark within the same model size category and provides competitive results on the LoCo long-context retrieval tests (refer to Evaluation). We also present the []( a SOTA instruction-tuned multi-lingual embedding model that ranked 2nd in MTEB and 1st in C-MTEB. <!-- Provide a longer summary of what this model is. --> - **Developed by:** Institute for Intelligent Computing, Alibaba Group - **Model type:** Text Embeddings - **Paper:** mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval <!-- - **Demo [optional]:** [More Information Needed] --> ### Model list | Models | Language | Model Size | Max Seq. 
Length | Dimension | MTEB-en | LoCo | |:-----: | :-----: |:-----: |:-----: |:-----: | :-----: | :-----: | |[]( Multiple | 7720 | 32768 | 4096 | 67.34 | 87.57 | |[]( | English | 434 | 8192 | 1024 | 65.39 | 86.71 | |[]( | English | 137 | 8192 | 768 | 64.11 | 87.44 | ## How to Get Started with the Model Use the code below to get started with the model. **It is recommended to install xformers and enable unpadding for acceleration; refer to enable-unpadding-and-xformers.** Use with : Use with : Use with infinity: Infinity is an MIT-licensed server for OpenAI-compatible deployment. ## Training Details ### Training Data - Masked language modeling (MLM): - Weakly supervised contrastive pre-training (CPT): GTE pre-training data - Supervised contrastive fine-tuning: GTE fine-tuning data ### Training Procedure To enable the backbone model to support a context length of 8192, we adopted a multi-stage training strategy. The model first undergoes preliminary MLM pre-training on shorter lengths. We then resample the data, reducing the proportion of short texts, and continue the MLM pre-training. The entire training process is as follows: - MLM-2048: lr 5e-4, mlm_probability 0.3, batch_size 4096, num_steps 70000, rope_base 10000 - MLM-8192: lr 5e-5, mlm_probability 0.3, batch_size 1024, num_steps 20000, rope_base 500000 - CPT: max_len 512, lr 2e-4, batch_size 32768, num_steps 100000 - Fine-tuning: TODO ## Evaluation ### MTEB The results of other models are retrieved from the MTEB leaderboard. The gte evaluation setting: , and set ntk scaling factor to 2 (equivalent to rope_base * 2). | Model Name | Param Size (M) | Dimension | Sequence Length | Average (56) | Class. (12) | Clust. (11) | Pair Class. (3) | Reran. (4) | Retr. (15) | STS (10) | Summ. 
(1) | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | **gte-large-en-v1.5** | 434 | 1024 | 8192 | **65.39** | 77.75 | 47.95 | 84.63 | 58.50 | 57.91 | 81.43 | 30.91 | | mxbai-embed-large-v1 | 335 | 1024 | 512 | 64.68 | 75.64 | 46.71 | 87.2 | 60.11 | 54.39 | 85 | 32.71 | | multilingual-e5-large-instruct | 560 | 1024 | 514 | 64.41 | 77.56 | 47.1 | 86.19 | 58.58 | 52.47 | 84.78 | 30.39 | | bge-large-en-v1.5| 335 | 1024 | 512 | 64.23 | 75.97 | 46.08 | 87.12 | 60.03 | 54.29 | 83.11 | 31.61 | | **gte-base-en-v1.5** | 137 | 768 | 8192 | **64.11** | 77.17 | 46.82 | 85.33 | 57.66 | 54.09 | 81.97 | 31.17 | | bge-base-en-v1.5| 109 | 768 | 512 | 63.55 | 75.53 | 45.77 | 86.55 | 58.86 | 53.25 | 82.4 | 31.07 | ### LoCo | Model Name | Dimension | Sequence Length | Average (5) | QsmsumRetrieval | SummScreenRetrieval | QasperAbastractRetrieval | QasperTitleRetrieval | GovReportRetrieval | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | gte-qwen1.5-7b | 4096 | 32768 | 87.57 | 49.37 | 93.10 | 99.67 | 97.54 | 98.21 | | gte-large-v1.5 |1024 | 8192 | 86.71 | 44.55 | 92.61 | 99.82 | 97.81 | 98.74 | | gte-base-v1.5 | 768 | 8192 | 87.44 | 49.91 | 91.78 | 99.82 | 97.13 | 98.58 | ## Citation If you find our paper or models helpful, please consider citing them as follows:",
|
26 |
+
"model_explanation_gemini": "Generates sentence embeddings for English text to perform tasks like classification, retrieval, clustering, and similarity scoring."
|
27 |
+
}
|
data/model_data_json/Alibaba-NLP_gte-large-en-v1.5.json
ADDED
@@ -0,0 +1,28 @@
|
1 |
+
{
|
2 |
+
"model_id": "Alibaba-NLP/gte-large-en-v1.5",
|
3 |
+
"downloads": 886564,
|
4 |
+
"tags": [
|
5 |
+
"transformers",
|
6 |
+
"onnx",
|
7 |
+
"safetensors",
|
8 |
+
"new",
|
9 |
+
"feature-extraction",
|
10 |
+
"sentence-transformers",
|
11 |
+
"gte",
|
12 |
+
"mteb",
|
13 |
+
"transformers.js",
|
14 |
+
"sentence-similarity",
|
15 |
+
"custom_code",
|
16 |
+
"en",
|
17 |
+
"dataset:allenai/c4",
|
18 |
+
"arxiv:2407.19669",
|
19 |
+
"arxiv:2308.03281",
|
20 |
+
"license:apache-2.0",
|
21 |
+
"model-index",
|
22 |
+
"text-embeddings-inference",
|
23 |
+
"endpoints_compatible",
|
24 |
+
"region:us"
|
25 |
+
],
|
26 |
+
"description": "--- datasets: - allenai/c4 library_name: transformers tags: - sentence-transformers - gte - mteb - transformers.js - sentence-similarity license: apache-2.0 language: - en model-index: - name: gte-large-en-v1.5 results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 73.01492537313432 - type: ap value: 35.05341696659522 - type: f1 value: 66.71270310883853 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 93.97189999999999 - type: ap value: 90.5952493948908 - type: f1 value: 93.95848137716877 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 54.196 - type: f1 value: 53.80122334012787 - task: type: Retrieval dataset: type: mteb/arguana name: MTEB ArguAna config: default split: test revision: c22ab2a51041ffd869aaddef7af8d8215647e41a metrics: - type: map_at_1 value: 47.297 - type: map_at_10 value: 64.303 - type: map_at_100 value: 64.541 - type: map_at_1000 value: 64.541 - type: map_at_3 value: 60.728 - type: map_at_5 value: 63.114000000000004 - type: mrr_at_1 value: 48.435 - type: mrr_at_10 value: 64.657 - type: mrr_at_100 value: 64.901 - type: mrr_at_1000 value: 64.901 - type: mrr_at_3 value: 61.06 - type: mrr_at_5 value: 63.514 - type: ndcg_at_1 value: 47.297 - type: ndcg_at_10 value: 72.107 - type: ndcg_at_100 value: 72.963 - type: ndcg_at_1000 value: 72.963 - type: ndcg_at_3 value: 65.063 - type: ndcg_at_5 value: 69.352 - type: precision_at_1 value: 47.297 - type: precision_at_10 value: 9.623 - type: precision_at_100 value: 0.996 - type: 
precision_at_1000 value: 0.1 - type: precision_at_3 value: 25.865 - type: precision_at_5 value: 17.596 - type: recall_at_1 value: 47.297 - type: recall_at_10 value: 96.23 - type: recall_at_100 value: 99.644 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 77.596 - type: recall_at_5 value: 87.98 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 48.467787861077475 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 43.39198391914257 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 63.12794820591384 - type: mrr value: 75.9331442641692 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 87.85062993863319 - type: cos_sim_spearman value: 85.39049989733459 - type: euclidean_pearson value: 86.00222680278333 - type: euclidean_spearman value: 85.45556162077396 - type: manhattan_pearson value: 85.88769871785621 - type: manhattan_spearman value: 85.11760211290839 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 87.32792207792208 - type: f1 value: 87.29132945999555 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 40.5779328301945 - task: 
type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 37.94425623865118 - task: type: Retrieval dataset: type: mteb/cqadupstack-android name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: f46a197baaae43b4f621051089b82a364682dfeb metrics: - type: map_at_1 value: 32.978 - type: map_at_10 value: 44.45 - type: map_at_100 value: 46.19 - type: map_at_1000 value: 46.303 - type: map_at_3 value: 40.849000000000004 - type: map_at_5 value: 42.55 - type: mrr_at_1 value: 40.629 - type: mrr_at_10 value: 50.848000000000006 - type: mrr_at_100 value: 51.669 - type: mrr_at_1000 value: 51.705 - type: mrr_at_3 value: 47.997 - type: mrr_at_5 value: 49.506 - type: ndcg_at_1 value: 40.629 - type: ndcg_at_10 value: 51.102000000000004 - type: ndcg_at_100 value: 57.159000000000006 - type: ndcg_at_1000 value: 58.669000000000004 - type: ndcg_at_3 value: 45.738 - type: ndcg_at_5 value: 47.632999999999996 - type: precision_at_1 value: 40.629 - type: precision_at_10 value: 9.700000000000001 - type: precision_at_100 value: 1.5970000000000002 - type: precision_at_1000 value: 0.202 - type: precision_at_3 value: 21.698 - type: precision_at_5 value: 15.393 - type: recall_at_1 value: 32.978 - type: recall_at_10 value: 63.711 - type: recall_at_100 value: 88.39399999999999 - type: recall_at_1000 value: 97.513 - type: recall_at_3 value: 48.025 - type: recall_at_5 value: 53.52 - task: type: Retrieval dataset: type: mteb/cqadupstack-english name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: ad9991cb51e31e31e430383c75ffb2885547b5f0 metrics: - type: map_at_1 value: 30.767 - type: map_at_10 value: 42.195 - type: map_at_100 value: 43.541999999999994 - type: map_at_1000 value: 43.673 - type: map_at_3 value: 38.561 - type: map_at_5 value: 40.532000000000004 - type: mrr_at_1 value: 38.79 - type: mrr_at_10 value: 
48.021 - type: mrr_at_100 value: 48.735 - type: mrr_at_1000 value: 48.776 - type: mrr_at_3 value: 45.594 - type: mrr_at_5 value: 46.986 - type: ndcg_at_1 value: 38.79 - type: ndcg_at_10 value: 48.468 - type: ndcg_at_100 value: 53.037 - type: ndcg_at_1000 value: 55.001999999999995 - type: ndcg_at_3 value: 43.409 - type: ndcg_at_5 value: 45.654 - type: precision_at_1 value: 38.79 - type: precision_at_10 value: 9.452 - type: precision_at_100 value: 1.518 - type: precision_at_1000 value: 0.201 - type: precision_at_3 value: 21.21 - type: precision_at_5 value: 15.171999999999999 - type: recall_at_1 value: 30.767 - type: recall_at_10 value: 60.118 - type: recall_at_100 value: 79.271 - type: recall_at_1000 value: 91.43299999999999 - type: recall_at_3 value: 45.36 - type: recall_at_5 value: 51.705 - task: type: Retrieval dataset: type: mteb/cqadupstack-gaming name: MTEB CQADupstackGamingRetrieval config: default split: test revision: 4885aa143210c98657558c04aaf3dc47cfb54340 metrics: - type: map_at_1 value: 40.007 - type: map_at_10 value: 53.529 - type: map_at_100 value: 54.602 - type: map_at_1000 value: 54.647 - type: map_at_3 value: 49.951 - type: map_at_5 value: 52.066 - type: mrr_at_1 value: 45.705 - type: mrr_at_10 value: 56.745000000000005 - type: mrr_at_100 value: 57.43899999999999 - type: mrr_at_1000 value: 57.462999999999994 - type: mrr_at_3 value: 54.25299999999999 - type: mrr_at_5 value: 55.842000000000006 - type: ndcg_at_1 value: 45.705 - type: ndcg_at_10 value: 59.809 - type: ndcg_at_100 value: 63.837999999999994 - type: ndcg_at_1000 value: 64.729 - type: ndcg_at_3 value: 53.994 - type: ndcg_at_5 value: 57.028 - type: precision_at_1 value: 45.705 - type: precision_at_10 value: 9.762 - type: precision_at_100 value: 1.275 - type: precision_at_1000 value: 0.13899999999999998 - type: precision_at_3 value: 24.368000000000002 - type: precision_at_5 value: 16.84 - type: recall_at_1 value: 40.007 - type: recall_at_10 value: 75.017 - type: recall_at_100 value: 
91.99000000000001 - type: recall_at_1000 value: 98.265 - type: recall_at_3 value: 59.704 - type: recall_at_5 value: 67.109 - task: type: Retrieval dataset: type: mteb/cqadupstack-gis name: MTEB CQADupstackGisRetrieval config: default split: test revision: 5003b3064772da1887988e05400cf3806fe491f2 metrics: - type: map_at_1 value: 26.639000000000003 - type: map_at_10 value: 35.926 - type: map_at_100 value: 37.126999999999995 - type: map_at_1000 value: 37.202 - type: map_at_3 value: 32.989000000000004 - type: map_at_5 value: 34.465 - type: mrr_at_1 value: 28.475 - type: mrr_at_10 value: 37.7 - type: mrr_at_100 value: 38.753 - type: mrr_at_1000 value: 38.807 - type: mrr_at_3 value: 35.066 - type: mrr_at_5 value: 36.512 - type: ndcg_at_1 value: 28.475 - type: ndcg_at_10 value: 41.245 - type: ndcg_at_100 value: 46.814 - type: ndcg_at_1000 value: 48.571 - type: ndcg_at_3 value: 35.528999999999996 - type: ndcg_at_5 value: 38.066 - type: precision_at_1 value: 28.475 - type: precision_at_10 value: 6.497 - type: precision_at_100 value: 0.9650000000000001 - type: precision_at_1000 value: 0.11499999999999999 - type: precision_at_3 value: 15.065999999999999 - type: precision_at_5 value: 10.599 - type: recall_at_1 value: 26.639000000000003 - type: recall_at_10 value: 55.759 - type: recall_at_100 value: 80.913 - type: recall_at_1000 value: 93.929 - type: recall_at_3 value: 40.454 - type: recall_at_5 value: 46.439 - task: type: Retrieval dataset: type: mteb/cqadupstack-mathematica name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: 90fceea13679c63fe563ded68f3b6f06e50061de metrics: - type: map_at_1 value: 15.767999999999999 - type: map_at_10 value: 24.811 - type: map_at_100 value: 26.064999999999998 - type: map_at_1000 value: 26.186999999999998 - type: map_at_3 value: 21.736 - type: map_at_5 value: 23.283 - type: mrr_at_1 value: 19.527 - type: mrr_at_10 value: 29.179 - type: mrr_at_100 value: 30.153999999999996 - type: mrr_at_1000 value: 
30.215999999999998 - type: mrr_at_3 value: 26.223000000000003 - type: mrr_at_5 value: 27.733999999999998 - type: ndcg_at_1 value: 19.527 - type: ndcg_at_10 value: 30.786 - type: ndcg_at_100 value: 36.644 - type: ndcg_at_1000 value: 39.440999999999995 - type: ndcg_at_3 value: 24.958 - type: ndcg_at_5 value: 27.392 - type: precision_at_1 value: 19.527 - type: precision_at_10 value: 5.995 - type: precision_at_100 value: 1.03 - type: precision_at_1000 value: 0.14100000000000001 - type: precision_at_3 value: 12.520999999999999 - type: precision_at_5 value: 9.129 - type: recall_at_1 value: 15.767999999999999 - type: recall_at_10 value: 44.824000000000005 - type: recall_at_100 value: 70.186 - type: recall_at_1000 value: 89.934 - type: recall_at_3 value: 28.607 - type: recall_at_5 value: 34.836 - task: type: Retrieval dataset: type: mteb/cqadupstack-physics name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: 79531abbd1fb92d06c6d6315a0cbbbf5bb247ea4 metrics: - type: map_at_1 value: 31.952 - type: map_at_10 value: 44.438 - type: map_at_100 value: 45.778 - type: map_at_1000 value: 45.883 - type: map_at_3 value: 41.044000000000004 - type: map_at_5 value: 42.986000000000004 - type: mrr_at_1 value: 39.172000000000004 - type: mrr_at_10 value: 49.76 - type: mrr_at_100 value: 50.583999999999996 - type: mrr_at_1000 value: 50.621 - type: mrr_at_3 value: 47.353 - type: mrr_at_5 value: 48.739 - type: ndcg_at_1 value: 39.172000000000004 - type: ndcg_at_10 value: 50.760000000000005 - type: ndcg_at_100 value: 56.084 - type: ndcg_at_1000 value: 57.865 - type: ndcg_at_3 value: 45.663 - type: ndcg_at_5 value: 48.178 - type: precision_at_1 value: 39.172000000000004 - type: precision_at_10 value: 9.22 - type: precision_at_100 value: 1.387 - type: precision_at_1000 value: 0.17099999999999999 - type: precision_at_3 value: 21.976000000000003 - type: precision_at_5 value: 15.457 - type: recall_at_1 value: 31.952 - type: recall_at_10 value: 63.900999999999996 - type: 
recall_at_100 value: 85.676 - type: recall_at_1000 value: 97.03699999999999 - type: recall_at_3 value: 49.781 - type: recall_at_5 value: 56.330000000000005 - task: type: Retrieval dataset: type: mteb/cqadupstack-programmers name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: 6184bc1440d2dbc7612be22b50686b8826d22b32 metrics: - type: map_at_1 value: 25.332 - type: map_at_10 value: 36.874 - type: map_at_100 value: 38.340999999999994 - type: map_at_1000 value: 38.452 - type: map_at_3 value: 33.068 - type: map_at_5 value: 35.324 - type: mrr_at_1 value: 30.822 - type: mrr_at_10 value: 41.641 - type: mrr_at_100 value: 42.519 - type: mrr_at_1000 value: 42.573 - type: mrr_at_3 value: 38.413000000000004 - type: mrr_at_5 value: 40.542 - type: ndcg_at_1 value: 30.822 - type: ndcg_at_10 value: 43.414 - type: ndcg_at_100 value: 49.196 - type: ndcg_at_1000 value: 51.237 - type: ndcg_at_3 value: 37.230000000000004 - type: ndcg_at_5 value: 40.405 - type: precision_at_1 value: 30.822 - type: precision_at_10 value: 8.379 - type: precision_at_100 value: 1.315 - type: precision_at_1000 value: 0.168 - type: precision_at_3 value: 18.417 - type: precision_at_5 value: 13.744 - type: recall_at_1 value: 25.332 - type: recall_at_10 value: 57.774 - type: recall_at_100 value: 82.071 - type: recall_at_1000 value: 95.60600000000001 - type: recall_at_3 value: 40.722 - type: recall_at_5 value: 48.754999999999995 - task: type: Retrieval dataset: type: mteb/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: map_at_1 value: 25.91033333333334 - type: map_at_10 value: 36.23225000000001 - type: map_at_100 value: 37.55766666666667 - type: map_at_1000 value: 37.672583333333336 - type: map_at_3 value: 32.95666666666667 - type: map_at_5 value: 34.73375 - type: mrr_at_1 value: 30.634 - type: mrr_at_10 value: 40.19449999999999 - type: mrr_at_100 value: 41.099250000000005 - type: mrr_at_1000 
value: 41.15091666666667 - type: mrr_at_3 value: 37.4615 - type: mrr_at_5 value: 39.00216666666667 - type: ndcg_at_1 value: 30.634 - type: ndcg_at_10 value: 42.162166666666664 - type: ndcg_at_100 value: 47.60708333333333 - type: ndcg_at_1000 value: 49.68616666666666 - type: ndcg_at_3 value: 36.60316666666666 - type: ndcg_at_5 value: 39.15616666666668 - type: precision_at_1 value: 30.634 - type: precision_at_10 value: 7.6193333333333335 - type: precision_at_100 value: 1.2198333333333333 - type: precision_at_1000 value: 0.15975000000000003 - type: precision_at_3 value: 17.087 - type: precision_at_5 value: 12.298333333333334 - type: recall_at_1 value: 25.91033333333334 - type: recall_at_10 value: 55.67300000000001 - type: recall_at_100 value: 79.20608333333334 - type: recall_at_1000 value: 93.34866666666667 - type: recall_at_3 value: 40.34858333333333 - type: recall_at_5 value: 46.834083333333325 - task: type: Retrieval dataset: type: mteb/cqadupstack-stats name: MTEB CQADupstackStatsRetrieval config: default split: test revision: 65ac3a16b8e91f9cee4c9828cc7c335575432a2a metrics: - type: map_at_1 value: 25.006 - type: map_at_10 value: 32.177 - type: map_at_100 value: 33.324999999999996 - type: map_at_1000 value: 33.419 - type: map_at_3 value: 29.952 - type: map_at_5 value: 31.095 - type: mrr_at_1 value: 28.066999999999997 - type: mrr_at_10 value: 34.995 - type: mrr_at_100 value: 35.978 - type: mrr_at_1000 value: 36.042 - type: mrr_at_3 value: 33.103 - type: mrr_at_5 value: 34.001 - type: ndcg_at_1 value: 28.066999999999997 - type: ndcg_at_10 value: 36.481 - type: ndcg_at_100 value: 42.022999999999996 - type: ndcg_at_1000 value: 44.377 - type: ndcg_at_3 value: 32.394 - type: ndcg_at_5 value: 34.108 - type: precision_at_1 value: 28.066999999999997 - type: precision_at_10 value: 5.736 - type: precision_at_100 value: 0.9259999999999999 - type: precision_at_1000 value: 0.12 - type: precision_at_3 value: 13.804 - type: precision_at_5 value: 9.508999999999999 - type: 
recall_at_1 value: 25.006 - type: recall_at_10 value: 46.972 - type: recall_at_100 value: 72.138 - type: recall_at_1000 value: 89.479 - type: recall_at_3 value: 35.793 - type: recall_at_5 value: 39.947 - task: type: Retrieval dataset: type: mteb/cqadupstack-tex name: MTEB CQADupstackTexRetrieval config: default split: test revision: 46989137a86843e03a6195de44b09deda022eec7 metrics: - type: map_at_1 value: 16.07 - type: map_at_10 value: 24.447 - type: map_at_100 value: 25.685999999999996 - type: map_at_1000 value: 25.813999999999997 - type: map_at_3 value: 21.634 - type: map_at_5 value: 23.133 - type: mrr_at_1 value: 19.580000000000002 - type: mrr_at_10 value: 28.127999999999997 - type: mrr_at_100 value: 29.119 - type: mrr_at_1000 value: 29.192 - type: mrr_at_3 value: 25.509999999999998 - type: mrr_at_5 value: 26.878 - type: ndcg_at_1 value: 19.580000000000002 - type: ndcg_at_10 value: 29.804000000000002 - type: ndcg_at_100 value: 35.555 - type: ndcg_at_1000 value: 38.421 - type: ndcg_at_3 value: 24.654999999999998 - type: ndcg_at_5 value: 26.881 - type: precision_at_1 value: 19.580000000000002 - type: precision_at_10 value: 5.736 - type: precision_at_100 value: 1.005 - type: precision_at_1000 value: 0.145 - type: precision_at_3 value: 12.033000000000001 - type: precision_at_5 value: 8.871 - type: recall_at_1 value: 16.07 - type: recall_at_10 value: 42.364000000000004 - type: recall_at_100 value: 68.01899999999999 - type: recall_at_1000 value: 88.122 - type: recall_at_3 value: 27.846 - type: recall_at_5 value: 33.638 - task: type: Retrieval dataset: type: mteb/cqadupstack-unix name: MTEB CQADupstackUnixRetrieval config: default split: test revision: 6c6430d3a6d36f8d2a829195bc5dc94d7e063e53 metrics: - type: map_at_1 value: 26.365 - type: map_at_10 value: 36.591 - type: map_at_100 value: 37.730000000000004 - type: map_at_1000 value: 37.84 - type: map_at_3 value: 33.403 - type: map_at_5 value: 35.272999999999996 - type: mrr_at_1 value: 30.503999999999998 - type: 
mrr_at_10 value: 39.940999999999995 - type: mrr_at_100 value: 40.818 - type: mrr_at_1000 value: 40.876000000000005 - type: mrr_at_3 value: 37.065 - type: mrr_at_5 value: 38.814 - type: ndcg_at_1 value: 30.503999999999998 - type: ndcg_at_10 value: 42.185 - type: ndcg_at_100 value: 47.416000000000004 - type: ndcg_at_1000 value: 49.705 - type: ndcg_at_3 value: 36.568 - type: ndcg_at_5 value: 39.416000000000004 - type: precision_at_1 value: 30.503999999999998 - type: precision_at_10 value: 7.276000000000001 - type: precision_at_100 value: 1.118 - type: precision_at_1000 value: 0.14300000000000002 - type: precision_at_3 value: 16.729 - type: precision_at_5 value: 12.107999999999999 - type: recall_at_1 value: 26.365 - type: recall_at_10 value: 55.616 - type: recall_at_100 value: 78.129 - type: recall_at_1000 value: 93.95599999999999 - type: recall_at_3 value: 40.686 - type: recall_at_5 value: 47.668 - task: type: Retrieval dataset: type: mteb/cqadupstack-webmasters name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: 160c094312a0e1facb97e55eeddb698c0abe3571 metrics: - type: map_at_1 value: 22.750999999999998 - type: map_at_10 value: 33.446 - type: map_at_100 value: 35.235 - type: map_at_1000 value: 35.478 - type: map_at_3 value: 29.358 - type: map_at_5 value: 31.525 - type: mrr_at_1 value: 27.668 - type: mrr_at_10 value: 37.694 - type: mrr_at_100 value: 38.732 - type: mrr_at_1000 value: 38.779 - type: mrr_at_3 value: 34.223 - type: mrr_at_5 value: 36.08 - type: ndcg_at_1 value: 27.668 - type: ndcg_at_10 value: 40.557 - type: ndcg_at_100 value: 46.605999999999995 - type: ndcg_at_1000 value: 48.917 - type: ndcg_at_3 value: 33.677 - type: ndcg_at_5 value: 36.85 - type: precision_at_1 value: 27.668 - type: precision_at_10 value: 8.3 - type: precision_at_100 value: 1.6260000000000001 - type: precision_at_1000 value: 0.253 - type: precision_at_3 value: 16.008 - type: precision_at_5 value: 12.292 - type: recall_at_1 value: 22.750999999999998 - type: 
recall_at_10 value: 55.643 - type: recall_at_100 value: 82.151 - type: recall_at_1000 value: 95.963 - type: recall_at_3 value: 36.623 - type: recall_at_5 value: 44.708 - task: type: Retrieval dataset: type: mteb/cqadupstack-wordpress name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: map_at_1 value: 17.288999999999998 - type: map_at_10 value: 25.903 - type: map_at_100 value: 27.071 - type: map_at_1000 value: 27.173000000000002 - type: map_at_3 value: 22.935 - type: map_at_5 value: 24.573 - type: mrr_at_1 value: 18.669 - type: mrr_at_10 value: 27.682000000000002 - type: mrr_at_100 value: 28.691 - type: mrr_at_1000 value: 28.761 - type: mrr_at_3 value: 24.738 - type: mrr_at_5 value: 26.392 - type: ndcg_at_1 value: 18.669 - type: ndcg_at_10 value: 31.335 - type: ndcg_at_100 value: 36.913000000000004 - type: ndcg_at_1000 value: 39.300000000000004 - type: ndcg_at_3 value: 25.423000000000002 - type: ndcg_at_5 value: 28.262999999999998 - type: precision_at_1 value: 18.669 - type: precision_at_10 value: 5.379 - type: precision_at_100 value: 0.876 - type: precision_at_1000 value: 0.11900000000000001 - type: precision_at_3 value: 11.214 - type: precision_at_5 value: 8.466 - type: recall_at_1 value: 17.288999999999998 - type: recall_at_10 value: 46.377 - type: recall_at_100 value: 71.53500000000001 - type: recall_at_1000 value: 88.947 - type: recall_at_3 value: 30.581999999999997 - type: recall_at_5 value: 37.354 - task: type: Retrieval dataset: type: mteb/climate-fever name: MTEB ClimateFEVER config: default split: test revision: 47f2ac6acb640fc46020b02a5b59fdda04d39380 metrics: - type: map_at_1 value: 21.795 - type: map_at_10 value: 37.614999999999995 - type: map_at_100 value: 40.037 - type: map_at_1000 value: 40.184999999999995 - type: map_at_3 value: 32.221 - type: map_at_5 value: 35.154999999999994 - type: mrr_at_1 value: 50.358000000000004 - type: mrr_at_10 value: 62.129 - type: 
mrr_at_100 value: 62.613 - type: mrr_at_1000 value: 62.62 - type: mrr_at_3 value: 59.272999999999996 - type: mrr_at_5 value: 61.138999999999996 - type: ndcg_at_1 value: 50.358000000000004 - type: ndcg_at_10 value: 48.362 - type: ndcg_at_100 value: 55.932 - type: ndcg_at_1000 value: 58.062999999999995 - type: ndcg_at_3 value: 42.111 - type: ndcg_at_5 value: 44.063 - type: precision_at_1 value: 50.358000000000004 - type: precision_at_10 value: 14.677999999999999 - type: precision_at_100 value: 2.2950000000000004 - type: precision_at_1000 value: 0.271 - type: precision_at_3 value: 31.77 - type: precision_at_5 value: 23.375 - type: recall_at_1 value: 21.795 - type: recall_at_10 value: 53.846000000000004 - type: recall_at_100 value: 78.952 - type: recall_at_1000 value: 90.41900000000001 - type: recall_at_3 value: 37.257 - type: recall_at_5 value: 44.661 - task: type: Retrieval dataset: type: mteb/dbpedia name: MTEB DBPedia config: default split: test revision: c0f706b76e590d620bd6618b3ca8efdd34e2d659 metrics: - type: map_at_1 value: 9.728 - type: map_at_10 value: 22.691 - type: map_at_100 value: 31.734 - type: map_at_1000 value: 33.464 - type: map_at_3 value: 16.273 - type: map_at_5 value: 19.016 - type: mrr_at_1 value: 73.25 - type: mrr_at_10 value: 80.782 - type: mrr_at_100 value: 81.01899999999999 - type: mrr_at_1000 value: 81.021 - type: mrr_at_3 value: 79.583 - type: mrr_at_5 value: 80.146 - type: ndcg_at_1 value: 59.62499999999999 - type: ndcg_at_10 value: 46.304 - type: ndcg_at_100 value: 51.23 - type: ndcg_at_1000 value: 58.048 - type: ndcg_at_3 value: 51.541000000000004 - type: ndcg_at_5 value: 48.635 - type: precision_at_1 value: 73.25 - type: precision_at_10 value: 36.375 - type: precision_at_100 value: 11.53 - type: precision_at_1000 value: 2.23 - type: precision_at_3 value: 55.583000000000006 - type: precision_at_5 value: 47.15 - type: recall_at_1 value: 9.728 - type: recall_at_10 value: 28.793999999999997 - type: recall_at_100 value: 57.885 - type: 
recall_at_1000 value: 78.759 - type: recall_at_3 value: 17.79 - type: recall_at_5 value: 21.733 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 46.775 - type: f1 value: 41.89794273264891 - task: type: Retrieval dataset: type: mteb/fever name: MTEB FEVER config: default split: test revision: bea83ef9e8fb933d90a2f1d5515737465d613e12 metrics: - type: map_at_1 value: 85.378 - type: map_at_10 value: 91.51 - type: map_at_100 value: 91.666 - type: map_at_1000 value: 91.676 - type: map_at_3 value: 90.757 - type: map_at_5 value: 91.277 - type: mrr_at_1 value: 91.839 - type: mrr_at_10 value: 95.49 - type: mrr_at_100 value: 95.493 - type: mrr_at_1000 value: 95.493 - type: mrr_at_3 value: 95.345 - type: mrr_at_5 value: 95.47200000000001 - type: ndcg_at_1 value: 91.839 - type: ndcg_at_10 value: 93.806 - type: ndcg_at_100 value: 94.255 - type: ndcg_at_1000 value: 94.399 - type: ndcg_at_3 value: 93.027 - type: ndcg_at_5 value: 93.51 - type: precision_at_1 value: 91.839 - type: precision_at_10 value: 10.93 - type: precision_at_100 value: 1.1400000000000001 - type: precision_at_1000 value: 0.117 - type: precision_at_3 value: 34.873 - type: precision_at_5 value: 21.44 - type: recall_at_1 value: 85.378 - type: recall_at_10 value: 96.814 - type: recall_at_100 value: 98.386 - type: recall_at_1000 value: 99.21600000000001 - type: recall_at_3 value: 94.643 - type: recall_at_5 value: 95.976 - task: type: Retrieval dataset: type: mteb/fiqa name: MTEB FiQA2018 config: default split: test revision: 27a168819829fe9bcd655c2df245fb19452e8e06 metrics: - type: map_at_1 value: 32.190000000000005 - type: map_at_10 value: 53.605000000000004 - type: map_at_100 value: 55.550999999999995 - type: map_at_1000 value: 55.665 - type: map_at_3 value: 46.62 - type: map_at_5 value: 50.517999999999994 - type: mrr_at_1 value: 60.34 - type: mrr_at_10 value: 
70.775 - type: mrr_at_100 value: 71.238 - type: mrr_at_1000 value: 71.244 - type: mrr_at_3 value: 68.72399999999999 - type: mrr_at_5 value: 69.959 - type: ndcg_at_1 value: 60.34 - type: ndcg_at_10 value: 63.226000000000006 - type: ndcg_at_100 value: 68.60300000000001 - type: ndcg_at_1000 value: 69.901 - type: ndcg_at_3 value: 58.048 - type: ndcg_at_5 value: 59.789 - type: precision_at_1 value: 60.34 - type: precision_at_10 value: 17.130000000000003 - type: precision_at_100 value: 2.29 - type: precision_at_1000 value: 0.256 - type: precision_at_3 value: 38.323 - type: precision_at_5 value: 27.87 - type: recall_at_1 value: 32.190000000000005 - type: recall_at_10 value: 73.041 - type: recall_at_100 value: 91.31 - type: recall_at_1000 value: 98.104 - type: recall_at_3 value: 53.70399999999999 - type: recall_at_5 value: 62.358999999999995 - task: type: Retrieval dataset: type: mteb/hotpotqa name: MTEB HotpotQA config: default split: test revision: ab518f4d6fcca38d87c25209f94beba119d02014 metrics: - type: map_at_1 value: 43.511 - type: map_at_10 value: 58.15 - type: map_at_100 value: 58.95399999999999 - type: map_at_1000 value: 59.018 - type: map_at_3 value: 55.31700000000001 - type: map_at_5 value: 57.04900000000001 - type: mrr_at_1 value: 87.022 - type: mrr_at_10 value: 91.32000000000001 - type: mrr_at_100 value: 91.401 - type: mrr_at_1000 value: 91.403 - type: mrr_at_3 value: 90.77 - type: mrr_at_5 value: 91.156 - type: ndcg_at_1 value: 87.022 - type: ndcg_at_10 value: 68.183 - type: ndcg_at_100 value: 70.781 - type: ndcg_at_1000 value: 72.009 - type: ndcg_at_3 value: 64.334 - type: ndcg_at_5 value: 66.449 - type: precision_at_1 value: 87.022 - type: precision_at_10 value: 13.406 - type: precision_at_100 value: 1.542 - type: precision_at_1000 value: 0.17099999999999999 - type: precision_at_3 value: 39.023 - type: precision_at_5 value: 25.080000000000002 - type: recall_at_1 value: 43.511 - type: recall_at_10 value: 67.02900000000001 - type: recall_at_100 value: 77.11 - 
type: recall_at_1000 value: 85.294 - type: recall_at_3 value: 58.535000000000004 - type: recall_at_5 value: 62.70099999999999 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 92.0996 - type: ap value: 87.86206089096373 - type: f1 value: 92.07554547510763 - task: type: Retrieval dataset: type: mteb/msmarco name: MTEB MSMARCO config: default split: dev revision: c5a29a104738b98a9e76336939199e264163d4a0 metrics: - type: map_at_1 value: 23.179 - type: map_at_10 value: 35.86 - type: map_at_100 value: 37.025999999999996 - type: map_at_1000 value: 37.068 - type: map_at_3 value: 31.921 - type: map_at_5 value: 34.172000000000004 - type: mrr_at_1 value: 23.926 - type: mrr_at_10 value: 36.525999999999996 - type: mrr_at_100 value: 37.627 - type: mrr_at_1000 value: 37.665 - type: mrr_at_3 value: 32.653 - type: mrr_at_5 value: 34.897 - type: ndcg_at_1 value: 23.910999999999998 - type: ndcg_at_10 value: 42.927 - type: ndcg_at_100 value: 48.464 - type: ndcg_at_1000 value: 49.533 - type: ndcg_at_3 value: 34.910000000000004 - type: ndcg_at_5 value: 38.937 - type: precision_at_1 value: 23.910999999999998 - type: precision_at_10 value: 6.758 - type: precision_at_100 value: 0.9520000000000001 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 14.838000000000001 - type: precision_at_5 value: 10.934000000000001 - type: recall_at_1 value: 23.179 - type: recall_at_10 value: 64.622 - type: recall_at_100 value: 90.135 - type: recall_at_1000 value: 98.301 - type: recall_at_3 value: 42.836999999999996 - type: recall_at_5 value: 52.512 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 96.59598723210215 - type: f1 value: 96.41913500001952 - task: type: Classification dataset: 
type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 82.89557683538533 - type: f1 value: 63.379319722356264 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 78.93745796906524 - type: f1 value: 75.71616541785902 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 81.41223940820443 - type: f1 value: 81.2877893719078 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 35.03682528325662 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 32.942529406124 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 31.459949660460317 - type: mrr value: 32.70509582031616 - task: type: Retrieval dataset: type: mteb/nfcorpus name: MTEB NFCorpus config: default split: test revision: ec0fa4fe99da2ff19ca1214b7966684033a58814 metrics: - type: map_at_1 value: 6.497 - type: map_at_10 value: 13.843 - type: map_at_100 value: 17.713 - type: map_at_1000 value: 19.241 - type: map_at_3 value: 10.096 - type: map_at_5 value: 11.85 - type: mrr_at_1 value: 48.916 - type: mrr_at_10 value: 57.764 - type: mrr_at_100 value: 58.251 - type: mrr_at_1000 value: 58.282999999999994 - type: mrr_at_3 
value: 55.623999999999995 - type: mrr_at_5 value: 57.018 - type: ndcg_at_1 value: 46.594 - type: ndcg_at_10 value: 36.945 - type: ndcg_at_100 value: 34.06 - type: ndcg_at_1000 value: 43.05 - type: ndcg_at_3 value: 41.738 - type: ndcg_at_5 value: 39.330999999999996 - type: precision_at_1 value: 48.916 - type: precision_at_10 value: 27.43 - type: precision_at_100 value: 8.616 - type: precision_at_1000 value: 2.155 - type: precision_at_3 value: 39.112 - type: precision_at_5 value: 33.808 - type: recall_at_1 value: 6.497 - type: recall_at_10 value: 18.163 - type: recall_at_100 value: 34.566 - type: recall_at_1000 value: 67.15 - type: recall_at_3 value: 11.100999999999999 - type: recall_at_5 value: 14.205000000000002 - task: type: Retrieval dataset: type: mteb/nq name: MTEB NQ config: default split: test revision: b774495ed302d8c44a3a7ea25c90dbce03968f31 metrics: - type: map_at_1 value: 31.916 - type: map_at_10 value: 48.123 - type: map_at_100 value: 49.103 - type: map_at_1000 value: 49.131 - type: map_at_3 value: 43.711 - type: map_at_5 value: 46.323 - type: mrr_at_1 value: 36.181999999999995 - type: mrr_at_10 value: 50.617999999999995 - type: mrr_at_100 value: 51.329 - type: mrr_at_1000 value: 51.348000000000006 - type: mrr_at_3 value: 47.010999999999996 - type: mrr_at_5 value: 49.175000000000004 - type: ndcg_at_1 value: 36.181999999999995 - type: ndcg_at_10 value: 56.077999999999996 - type: ndcg_at_100 value: 60.037 - type: ndcg_at_1000 value: 60.63499999999999 - type: ndcg_at_3 value: 47.859 - type: ndcg_at_5 value: 52.178999999999995 - type: precision_at_1 value: 36.181999999999995 - type: precision_at_10 value: 9.284 - type: precision_at_100 value: 1.149 - type: precision_at_1000 value: 0.121 - type: precision_at_3 value: 22.006999999999998 - type: precision_at_5 value: 15.695 - type: recall_at_1 value: 31.916 - type: recall_at_10 value: 77.771 - type: recall_at_100 value: 94.602 - type: recall_at_1000 value: 98.967 - type: recall_at_3 value: 56.528 - type: 
recall_at_5 value: 66.527 - task: type: Retrieval dataset: type: mteb/quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 71.486 - type: map_at_10 value: 85.978 - type: map_at_100 value: 86.587 - type: map_at_1000 value: 86.598 - type: map_at_3 value: 83.04899999999999 - type: map_at_5 value: 84.857 - type: mrr_at_1 value: 82.32000000000001 - type: mrr_at_10 value: 88.64 - type: mrr_at_100 value: 88.702 - type: mrr_at_1000 value: 88.702 - type: mrr_at_3 value: 87.735 - type: mrr_at_5 value: 88.36 - type: ndcg_at_1 value: 82.34 - type: ndcg_at_10 value: 89.67 - type: ndcg_at_100 value: 90.642 - type: ndcg_at_1000 value: 90.688 - type: ndcg_at_3 value: 86.932 - type: ndcg_at_5 value: 88.408 - type: precision_at_1 value: 82.34 - type: precision_at_10 value: 13.675999999999998 - type: precision_at_100 value: 1.544 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 38.24 - type: precision_at_5 value: 25.068 - type: recall_at_1 value: 71.486 - type: recall_at_10 value: 96.844 - type: recall_at_100 value: 99.843 - type: recall_at_1000 value: 99.996 - type: recall_at_3 value: 88.92099999999999 - type: recall_at_5 value: 93.215 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 59.75758437908334 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 68.03497914092789 - task: type: Retrieval dataset: type: mteb/scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 5.808 - type: map_at_10 value: 16.059 - type: map_at_100 value: 19.048000000000002 - type: map_at_1000 value: 19.43 - type: map_at_3 value: 10.953 - type: map_at_5 value: 13.363 - type: 
mrr_at_1 value: 28.7 - type: mrr_at_10 value: 42.436 - type: mrr_at_100 value: 43.599 - type: mrr_at_1000 value: 43.62 - type: mrr_at_3 value: 38.45 - type: mrr_at_5 value: 40.89 - type: ndcg_at_1 value: 28.7 - type: ndcg_at_10 value: 26.346000000000004 - type: ndcg_at_100 value: 36.758 - type: ndcg_at_1000 value: 42.113 - type: ndcg_at_3 value: 24.254 - type: ndcg_at_5 value: 21.506 - type: precision_at_1 value: 28.7 - type: precision_at_10 value: 13.969999999999999 - type: precision_at_100 value: 2.881 - type: precision_at_1000 value: 0.414 - type: precision_at_3 value: 22.933 - type: precision_at_5 value: 19.220000000000002 - type: recall_at_1 value: 5.808 - type: recall_at_10 value: 28.310000000000002 - type: recall_at_100 value: 58.475 - type: recall_at_1000 value: 84.072 - type: recall_at_3 value: 13.957 - type: recall_at_5 value: 19.515 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 82.39274129958557 - type: cos_sim_spearman value: 79.78021235170053 - type: euclidean_pearson value: 79.35335401300166 - type: euclidean_spearman value: 79.7271870968275 - type: manhattan_pearson value: 79.35256263340601 - type: manhattan_spearman value: 79.76036386976321 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 83.99130429246708 - type: cos_sim_spearman value: 73.88322811171203 - type: euclidean_pearson value: 80.7569419170376 - type: euclidean_spearman value: 73.82542155409597 - type: manhattan_pearson value: 80.79468183847625 - type: manhattan_spearman value: 73.87027144047784 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 84.88548789489907 - type: cos_sim_spearman 
value: 85.07535893847255 - type: euclidean_pearson value: 84.6637222061494 - type: euclidean_spearman value: 85.14200626702456 - type: manhattan_pearson value: 84.75327892344734 - type: manhattan_spearman value: 85.24406181838596 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 82.88140039325008 - type: cos_sim_spearman value: 79.61211268112362 - type: euclidean_pearson value: 81.29639728816458 - type: euclidean_spearman value: 79.51284578041442 - type: manhattan_pearson value: 81.3381797137111 - type: manhattan_spearman value: 79.55683684039808 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 85.16716737270485 - type: cos_sim_spearman value: 86.14823841857738 - type: euclidean_pearson value: 85.36325733440725 - type: euclidean_spearman value: 86.04919691402029 - type: manhattan_pearson value: 85.3147511385052 - type: manhattan_spearman value: 86.00676205857764 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 80.34266645861588 - type: cos_sim_spearman value: 81.59914035005882 - type: euclidean_pearson value: 81.15053076245988 - type: euclidean_spearman value: 81.52776915798489 - type: manhattan_pearson value: 81.1819647418673 - type: manhattan_spearman value: 81.57479527353556 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 89.38263326821439 - type: cos_sim_spearman value: 89.10946308202642 - type: euclidean_pearson value: 88.87831312540068 - type: euclidean_spearman value: 89.03615865973664 - type: 
manhattan_pearson value: 88.79835539970384 - type: manhattan_spearman value: 88.9766156339753 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_pearson value: 70.1574915581685 - type: cos_sim_spearman value: 70.59144980004054 - type: euclidean_pearson value: 71.43246306918755 - type: euclidean_spearman value: 70.5544189562984 - type: manhattan_pearson value: 71.4071414609503 - type: manhattan_spearman value: 70.31799126163712 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 83.36215796635351 - type: cos_sim_spearman value: 83.07276756467208 - type: euclidean_pearson value: 83.06690453635584 - type: euclidean_spearman value: 82.9635366303289 - type: manhattan_pearson value: 83.04994049700815 - type: manhattan_spearman value: 82.98120125356036 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 86.92530011616722 - type: mrr value: 96.21826793395421 - task: type: Retrieval dataset: type: mteb/scifact name: MTEB SciFact config: default split: test revision: 0228b52cf27578f30900b9e5271d331663a030d7 metrics: - type: map_at_1 value: 65.75 - type: map_at_10 value: 77.701 - type: map_at_100 value: 78.005 - type: map_at_1000 value: 78.006 - type: map_at_3 value: 75.48 - type: map_at_5 value: 76.927 - type: mrr_at_1 value: 68.333 - type: mrr_at_10 value: 78.511 - type: mrr_at_100 value: 78.704 - type: mrr_at_1000 value: 78.704 - type: mrr_at_3 value: 77 - type: mrr_at_5 value: 78.083 - type: ndcg_at_1 value: 68.333 - type: ndcg_at_10 value: 82.42699999999999 - type: ndcg_at_100 value: 83.486 - type: ndcg_at_1000 value: 83.511 - type: ndcg_at_3 value: 
78.96300000000001 - type: ndcg_at_5 value: 81.028 - type: precision_at_1 value: 68.333 - type: precision_at_10 value: 10.667 - type: precision_at_100 value: 1.127 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 31.333 - type: precision_at_5 value: 20.133000000000003 - type: recall_at_1 value: 65.75 - type: recall_at_10 value: 95.578 - type: recall_at_100 value: 99.833 - type: recall_at_1000 value: 100 - type: recall_at_3 value: 86.506 - type: recall_at_5 value: 91.75 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.75247524752476 - type: cos_sim_ap value: 94.16065078045173 - type: cos_sim_f1 value: 87.22986247544205 - type: cos_sim_precision value: 85.71428571428571 - type: cos_sim_recall value: 88.8 - type: dot_accuracy value: 99.74554455445545 - type: dot_ap value: 93.90633887037264 - type: dot_f1 value: 86.9873417721519 - type: dot_precision value: 88.1025641025641 - type: dot_recall value: 85.9 - type: euclidean_accuracy value: 99.75247524752476 - type: euclidean_ap value: 94.17466319018055 - type: euclidean_f1 value: 87.3405299313052 - type: euclidean_precision value: 85.74181117533719 - type: euclidean_recall value: 89 - type: manhattan_accuracy value: 99.75445544554455 - type: manhattan_ap value: 94.27688371923577 - type: manhattan_f1 value: 87.74002954209749 - type: manhattan_precision value: 86.42095053346266 - type: manhattan_recall value: 89.1 - type: max_accuracy value: 99.75445544554455 - type: max_ap value: 94.27688371923577 - type: max_f1 value: 87.74002954209749 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 71.26500637517056 - task: type: 
Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 39.17507906280528 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 52.4848744828509 - type: mrr value: 53.33678168236992 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.599864323827887 - type: cos_sim_spearman value: 30.91116204665598 - type: dot_pearson value: 30.82637894269936 - type: dot_spearman value: 30.957573868416066 - task: type: Retrieval dataset: type: mteb/trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.23600000000000002 - type: map_at_10 value: 1.892 - type: map_at_100 value: 11.586 - type: map_at_1000 value: 27.761999999999997 - type: map_at_3 value: 0.653 - type: map_at_5 value: 1.028 - type: mrr_at_1 value: 88 - type: mrr_at_10 value: 94 - type: mrr_at_100 value: 94 - type: mrr_at_1000 value: 94 - type: mrr_at_3 value: 94 - type: mrr_at_5 value: 94 - type: ndcg_at_1 value: 82 - type: ndcg_at_10 value: 77.48899999999999 - type: ndcg_at_100 value: 60.141 - type: ndcg_at_1000 value: 54.228 - type: ndcg_at_3 value: 82.358 - type: ndcg_at_5 value: 80.449 - type: precision_at_1 value: 88 - type: precision_at_10 value: 82.19999999999999 - type: precision_at_100 value: 61.760000000000005 - type: precision_at_1000 value: 23.684 - type: precision_at_3 value: 88 - type: precision_at_5 value: 85.6 - type: recall_at_1 value: 0.23600000000000002 - type: recall_at_10 value: 2.117 - type: recall_at_100 value: 14.985000000000001 - type: recall_at_1000 value: 51.107 - type: 
recall_at_3 value: 0.688 - type: recall_at_5 value: 1.1039999999999999 - task: type: Retrieval dataset: type: mteb/touche2020 name: MTEB Touche2020 config: default split: test revision: a34f9a33db75fa0cbb21bb5cfc3dae8dc8bec93f metrics: - type: map_at_1 value: 2.3040000000000003 - type: map_at_10 value: 9.025 - type: map_at_100 value: 15.312999999999999 - type: map_at_1000 value: 16.954 - type: map_at_3 value: 4.981 - type: map_at_5 value: 6.32 - type: mrr_at_1 value: 24.490000000000002 - type: mrr_at_10 value: 39.835 - type: mrr_at_100 value: 40.8 - type: mrr_at_1000 value: 40.8 - type: mrr_at_3 value: 35.034 - type: mrr_at_5 value: 37.687 - type: ndcg_at_1 value: 22.448999999999998 - type: ndcg_at_10 value: 22.545 - type: ndcg_at_100 value: 35.931999999999995 - type: ndcg_at_1000 value: 47.665 - type: ndcg_at_3 value: 23.311 - type: ndcg_at_5 value: 22.421 - type: precision_at_1 value: 24.490000000000002 - type: precision_at_10 value: 20.408 - type: precision_at_100 value: 7.815999999999999 - type: precision_at_1000 value: 1.553 - type: precision_at_3 value: 25.169999999999998 - type: precision_at_5 value: 23.265 - type: recall_at_1 value: 2.3040000000000003 - type: recall_at_10 value: 15.693999999999999 - type: recall_at_100 value: 48.917 - type: recall_at_1000 value: 84.964 - type: recall_at_3 value: 6.026 - type: recall_at_5 value: 9.066 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 82.6074 - type: ap value: 23.187467098602013 - type: f1 value: 65.36829506379657 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 63.16355404640635 - type: f1 value: 63.534725639863346 - task: type: Clustering 
dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 50.91004094411276 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 86.55301901412649 - type: cos_sim_ap value: 75.25312618556728 - type: cos_sim_f1 value: 68.76561719140429 - type: cos_sim_precision value: 65.3061224489796 - type: cos_sim_recall value: 72.61213720316623 - type: dot_accuracy value: 86.29671574178936 - type: dot_ap value: 75.11910195501207 - type: dot_f1 value: 68.44048376830045 - type: dot_precision value: 66.12546125461255 - type: dot_recall value: 70.92348284960423 - type: euclidean_accuracy value: 86.5828217202122 - type: euclidean_ap value: 75.22986344900924 - type: euclidean_f1 value: 68.81267797449549 - type: euclidean_precision value: 64.8238861674831 - type: euclidean_recall value: 73.3245382585752 - type: manhattan_accuracy value: 86.61262442629791 - type: manhattan_ap value: 75.24401608557328 - type: manhattan_f1 value: 68.80473982483257 - type: manhattan_precision value: 67.21187720181177 - type: manhattan_recall value: 70.47493403693932 - type: max_accuracy value: 86.61262442629791 - type: max_ap value: 75.25312618556728 - type: max_f1 value: 68.81267797449549 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 88.10688089416696 - type: cos_sim_ap value: 84.17862178779863 - type: cos_sim_f1 value: 76.17305208781748 - type: cos_sim_precision value: 71.31246641590543 - type: cos_sim_recall value: 81.74468740375731 - type: dot_accuracy value: 88.1844995536927 - type: dot_ap 
value: 84.33816725235876 - type: dot_f1 value: 76.43554032918746 - type: dot_precision value: 74.01557767200346 - type: dot_recall value: 79.0190945488143 - type: euclidean_accuracy value: 88.07001203089223 - type: euclidean_ap value: 84.12267000814985 - type: euclidean_f1 value: 76.12232600180778 - type: euclidean_precision value: 74.50604541433205 - type: euclidean_recall value: 77.81028641823221 - type: manhattan_accuracy value: 88.06419063142779 - type: manhattan_ap value: 84.11648917164187 - type: manhattan_f1 value: 76.20579953925474 - type: manhattan_precision value: 72.56772755762935 - type: manhattan_recall value: 80.22790267939637 - type: max_accuracy value: 88.1844995536927 - type: max_ap value: 84.33816725235876 - type: max_f1 value: 76.43554032918746 --- <!-- **English** | 中文 --> # gte-large-en-v1.5 We introduce series, upgraded embeddings that support a context length of up to **8192**, while further enhancing model performance. The models are built upon the encoder backbone (BERT + RoPE + GLU). The series achieves state-of-the-art scores on the MTEB benchmark within the same model size category and provides competitive results on the LoCo long-context retrieval tests (refer to Evaluation). We also present the []( a SOTA instruction-tuned multi-lingual embedding model that ranked 2nd in MTEB and 1st in C-MTEB. <!-- Provide a longer summary of what this model is. --> - **Developed by:** Institute for Intelligent Computing, Alibaba Group - **Model type:** Text Embeddings - **Paper:** mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval <!-- - **Demo [optional]:** [More Information Needed] --> ### Model list | Models | Language | Model Size | Max Seq. 
Length | Dimension | MTEB-en | LoCo | |:-----: | :-----: |:-----: |:-----: |:-----: | :-----: | :-----: | |[]( Multiple | 7720 | 32768 | 4096 | 67.34 | 87.57 | |[]( | English | 434 | 8192 | 1024 | 65.39 | 86.71 | |[]( | English | 137 | 8192 | 768 | 64.11 | 87.44 | ## How to Get Started with the Model Use the code below to get started with the model. **It is recommended to install xformers and enable unpadding for acceleration, refer to enable-unpadding-and-xformers.** Use with sentence-transformers: Use with : ## Training Details ### Training Data - Masked language modeling (MLM): - Weak-supervised contrastive pre-training (CPT): GTE pre-training data - Supervised contrastive fine-tuning: GTE fine-tuning data ### Training Procedure To enable the backbone model to support a context length of 8192, we adopted a multi-stage training strategy. The model first undergoes preliminary MLM pre-training on shorter lengths. And then, we resample the data, reducing the proportion of short texts, and continue the MLM pre-training. The entire training process is as follows: - MLM-512: lr 2e-4, mlm_probability 0.3, batch_size 4096, num_steps 300000, rope_base 10000 - MLM-2048: lr 5e-5, mlm_probability 0.3, batch_size 4096, num_steps 30000, rope_base 10000 - MLM-8192: lr 5e-5, mlm_probability 0.3, batch_size 1024, num_steps 30000, rope_base 160000 - CPT: max_len 512, lr 5e-5, batch_size 28672, num_steps 100000 - Fine-tuning: TODO ## Evaluation ### MTEB The results of other models are retrieved from MTEB leaderboard. The gte evaluation setting: , and set ntk scaling factor to 2 (equivalent to rope_base * 2). | Model Name | Param Size (M) | Dimension | Sequence Length | Average (56) | Class. (12) | Clust. (11) | Pair Class. (3) | Reran. (4) | Retr. (15) | STS (10) | Summ. 
(1) | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | **gte-large-en-v1.5** | 409 | 1024 | 8192 | **65.39** | 77.75 | 47.95 | 84.63 | 58.50 | 57.91 | 81.43 | 30.91 | | mxbai-embed-large-v1 | 335 | 1024 | 512 | 64.68 | 75.64 | 46.71 | 87.2 | 60.11 | 54.39 | 85 | 32.71 | | multilingual-e5-large-instruct | 560 | 1024 | 514 | 64.41 | 77.56 | 47.1 | 86.19 | 58.58 | 52.47 | 84.78 | 30.39 | | bge-large-en-v1.5| 335 | 1024 | 512 | 64.23 | 75.97 | 46.08 | 87.12 | 60.03 | 54.29 | 83.11 | 31.61 | | **gte-base-en-v1.5** | 137 | 768 | 8192 | **64.11** | 77.17 | 46.82 | 85.33 | 57.66 | 54.09 | 81.97 | 31.17 | | bge-base-en-v1.5| 109 | 768 | 512 | 63.55 | 75.53 | 45.77 | 86.55 | 58.86 | 53.25 | 82.4 | 31.07 | ### LoCo | Model Name | Dimension | Sequence Length | Average (5) | QsmsumRetrieval | SummScreenRetrieval | QasperAbastractRetrieval | QasperTitleRetrieval | GovReportRetrieval | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | gte-qwen1.5-7b | 4096 | 32768 | 87.57 | 49.37 | 93.10 | 99.67 | 97.54 | 98.21 | | gte-large-v1.5 |1024 | 8192 | 86.71 | 44.55 | 92.61 | 99.82 | 97.81 | 98.74 | | gte-base-v1.5 | 768 | 8192 | 87.44 | 49.91 | 91.78 | 99.82 | 97.13 | 98.58 | ## Citation If you find our paper or models helpful, please consider citing them as follows:",
|
27 |
+
"model_explanation_gemini": "Generates sentence embeddings for English text to perform tasks like classification, retrieval, clustering, and similarity scoring."
|
28 |
+
}
|
data/model_data_json/Alibaba-NLP_gte-multilingual-base.json
ADDED
@@ -0,0 +1,104 @@
|
1 |
+
{
|
2 |
+
"model_id": "Alibaba-NLP/gte-multilingual-base",
|
3 |
+
"downloads": 1331518,
|
4 |
+
"tags": [
|
5 |
+
"sentence-transformers",
|
6 |
+
"safetensors",
|
7 |
+
"new",
|
8 |
+
"feature-extraction",
|
9 |
+
"mteb",
|
10 |
+
"transformers",
|
11 |
+
"multilingual",
|
12 |
+
"sentence-similarity",
|
13 |
+
"custom_code",
|
14 |
+
"af",
|
15 |
+
"ar",
|
16 |
+
"az",
|
17 |
+
"be",
|
18 |
+
"bg",
|
19 |
+
"bn",
|
20 |
+
"ca",
|
21 |
+
"ceb",
|
22 |
+
"cs",
|
23 |
+
"cy",
|
24 |
+
"da",
|
25 |
+
"de",
|
26 |
+
"el",
|
27 |
+
"en",
|
28 |
+
"es",
|
29 |
+
"et",
|
30 |
+
"eu",
|
31 |
+
"fa",
|
32 |
+
"fi",
|
33 |
+
"fr",
|
34 |
+
"gl",
|
35 |
+
"gu",
|
36 |
+
"he",
|
37 |
+
"hi",
|
38 |
+
"hr",
|
39 |
+
"ht",
|
40 |
+
"hu",
|
41 |
+
"hy",
|
42 |
+
"id",
|
43 |
+
"is",
|
44 |
+
"it",
|
45 |
+
"ja",
|
46 |
+
"jv",
|
47 |
+
"ka",
|
48 |
+
"kk",
|
49 |
+
"km",
|
50 |
+
"kn",
|
51 |
+
"ko",
|
52 |
+
"ky",
|
53 |
+
"lo",
|
54 |
+
"lt",
|
55 |
+
"lv",
|
56 |
+
"mk",
|
57 |
+
"ml",
|
58 |
+
"mn",
|
59 |
+
"mr",
|
60 |
+
"ms",
|
61 |
+
"my",
|
62 |
+
"ne",
|
63 |
+
"nl",
|
64 |
+
"no",
|
65 |
+
"pa",
|
66 |
+
"pl",
|
67 |
+
"pt",
|
68 |
+
"qu",
|
69 |
+
"ro",
|
70 |
+
"ru",
|
71 |
+
"si",
|
72 |
+
"sk",
|
73 |
+
"sl",
|
74 |
+
"so",
|
75 |
+
"sq",
|
76 |
+
"sr",
|
77 |
+
"sv",
|
78 |
+
"sw",
|
79 |
+
"ta",
|
80 |
+
"te",
|
81 |
+
"th",
|
82 |
+
"tl",
|
83 |
+
"tr",
|
84 |
+
"uk",
|
85 |
+
"ur",
|
86 |
+
"vi",
|
87 |
+
"yo",
|
88 |
+
"zh",
|
89 |
+
"arxiv:2407.19669",
|
90 |
+
"arxiv:2210.09984",
|
91 |
+
"arxiv:2402.03216",
|
92 |
+
"arxiv:2007.15207",
|
93 |
+
"arxiv:2104.08663",
|
94 |
+
"arxiv:2402.07440",
|
95 |
+
"license:apache-2.0",
|
96 |
+
"model-index",
|
97 |
+
"autotrain_compatible",
|
98 |
+
"text-embeddings-inference",
|
99 |
+
"endpoints_compatible",
|
100 |
+
"region:us"
|
101 |
+
],
|
102 |
+
"description": "--- tags: - mteb - sentence-transformers - transformers - multilingual - sentence-similarity license: apache-2.0 language: - af - ar - az - be - bg - bn - ca - ceb - cs - cy - da - de - el - en - es - et - eu - fa - fi - fr - gl - gu - he - hi - hr - ht - hu - hy - id - is - it - ja - jv - ka - kk - km - kn - ko - ky - lo - lt - lv - mk - ml - mn - mr - ms - my - ne - nl - 'no' - pa - pl - pt - qu - ro - ru - si - sk - sl - so - sq - sr - sv - sw - ta - te - th - tl - tr - uk - ur - vi - yo - zh model-index: - name: gte-multilingual-base (dense) results: - task: type: Clustering dataset: type: PL-MTEB/8tags-clustering name: MTEB 8TagsClustering config: default split: test revision: None metrics: - type: v_measure value: 33.66681726329994 - task: type: STS dataset: type: C-MTEB/AFQMC name: MTEB AFQMC config: default split: validation revision: b44c3b011063adb25877c13823db83bb193913c4 metrics: - type: cos_sim_spearman value: 43.54760696384009 - task: type: STS dataset: type: C-MTEB/ATEC name: MTEB ATEC config: default split: test revision: 0f319b1142f28d00e055a6770f3f726ae9b7d865 metrics: - type: cos_sim_spearman value: 48.91186363417501 - task: type: Classification dataset: type: PL-MTEB/allegro-reviews name: MTEB AllegroReviews config: default split: test revision: None metrics: - type: accuracy value: 41.689860834990064 - task: type: Clustering dataset: type: lyon-nlp/alloprof name: MTEB AlloProfClusteringP2P config: default split: test revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b metrics: - type: v_measure value: 54.20241337977897 - task: type: Clustering dataset: type: lyon-nlp/alloprof name: MTEB AlloProfClusteringS2S config: default split: test revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b metrics: - type: v_measure value: 44.34083695608643 - task: type: Reranking dataset: type: lyon-nlp/mteb-fr-reranking-alloprof-s2p name: MTEB AlloprofReranking config: default split: test revision: 666fdacebe0291776e86f29345663dfaf80a0db9 metrics: 
- type: map value: 64.91495250072002 - task: type: Retrieval dataset: type: lyon-nlp/alloprof name: MTEB AlloprofRetrieval config: default split: test revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b metrics: - type: ndcg_at_10 value: 53.638 - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 75.95522388059702 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 80.717625 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 43.64199999999999 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (de) config: de split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 40.108 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (es) config: es split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 40.169999999999995 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (fr) config: fr split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 39.56799999999999 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (ja) config: ja split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 35.75000000000001 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB 
AmazonReviewsClassification (zh) config: zh split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 33.342000000000006 - task: type: Retrieval dataset: type: mteb/arguana name: MTEB ArguAna config: default split: test revision: c22ab2a51041ffd869aaddef7af8d8215647e41a metrics: - type: ndcg_at_10 value: 58.231 - task: type: Retrieval dataset: type: clarin-knext/arguana-pl name: MTEB ArguAna-PL config: default split: test revision: 63fc86750af76253e8c760fc9e534bbf24d260a2 metrics: - type: ndcg_at_10 value: 53.166000000000004 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 46.01900557959478 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 41.06626465345723 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 61.87514497610431 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_spearman value: 81.21450112991194 - task: type: STS dataset: type: C-MTEB/BQ name: MTEB BQ config: default split: test revision: e3dda5e115e487b39ec7e618c0c6a29137052a55 metrics: - type: cos_sim_spearman value: 51.71589543397271 - task: type: Retrieval dataset: type: maastrichtlawtech/bsard name: MTEB BSARDRetrieval config: default split: test revision: 5effa1b9b5fa3b0f9e12523e6e43e5f86a6e6d59 metrics: - type: ndcg_at_10 value: 26.115 - task: type: BitextMining dataset: type: mteb/bucc-bitext-mining name: MTEB BUCC (de-en) config: de-en split: test revision: 
d51519689f32196a32af33b075a01d0e7c51e252 metrics: - type: f1 value: 98.6169102296451 - task: type: BitextMining dataset: type: mteb/bucc-bitext-mining name: MTEB BUCC (fr-en) config: fr-en split: test revision: d51519689f32196a32af33b075a01d0e7c51e252 metrics: - type: f1 value: 97.89603052314916 - task: type: BitextMining dataset: type: mteb/bucc-bitext-mining name: MTEB BUCC (ru-en) config: ru-en split: test revision: d51519689f32196a32af33b075a01d0e7c51e252 metrics: - type: f1 value: 97.12388869645537 - task: type: BitextMining dataset: type: mteb/bucc-bitext-mining name: MTEB BUCC (zh-en) config: zh-en split: test revision: d51519689f32196a32af33b075a01d0e7c51e252 metrics: - type: f1 value: 98.15692469720906 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 85.36038961038962 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 37.5903826674123 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 34.21474277151329 - task: type: Classification dataset: type: PL-MTEB/cbd name: MTEB CBD config: default split: test revision: None metrics: - type: accuracy value: 62.519999999999996 - task: type: PairClassification dataset: type: PL-MTEB/cdsce-pairclassification name: MTEB CDSC-E config: default split: test revision: None metrics: - type: cos_sim_ap value: 74.90132799162956 - task: type: STS dataset: type: PL-MTEB/cdscr-sts name: MTEB CDSC-R config: default split: test revision: None metrics: - type: cos_sim_spearman value: 90.30727955142524 - task: type: Clustering dataset: type: 
C-MTEB/CLSClusteringP2P name: MTEB CLSClusteringP2P config: default split: test revision: 4b6227591c6c1a73bc76b1055f3b7f3588e72476 metrics: - type: v_measure value: 37.94850105022274 - task: type: Clustering dataset: type: C-MTEB/CLSClusteringS2S name: MTEB CLSClusteringS2S config: default split: test revision: e458b3f5414b62b7f9f83499ac1f5497ae2e869f metrics: - type: v_measure value: 38.11958675421534 - task: type: Reranking dataset: type: C-MTEB/CMedQAv1-reranking name: MTEB CMedQAv1 config: default split: test revision: 8d7f1e942507dac42dc58017c1a001c3717da7df metrics: - type: map value: 86.10950950485399 - task: type: Reranking dataset: type: C-MTEB/CMedQAv2-reranking name: MTEB CMedQAv2 config: default split: test revision: 23d186750531a14a0357ca22cd92d712fd512ea0 metrics: - type: map value: 87.28038294231966 - task: type: Retrieval dataset: type: mteb/cqadupstack-android name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: f46a197baaae43b4f621051089b82a364682dfeb metrics: - type: ndcg_at_10 value: 47.099000000000004 - task: type: Retrieval dataset: type: mteb/cqadupstack-english name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: ad9991cb51e31e31e430383c75ffb2885547b5f0 metrics: - type: ndcg_at_10 value: 45.973000000000006 - task: type: Retrieval dataset: type: mteb/cqadupstack-gaming name: MTEB CQADupstackGamingRetrieval config: default split: test revision: 4885aa143210c98657558c04aaf3dc47cfb54340 metrics: - type: ndcg_at_10 value: 55.606 - task: type: Retrieval dataset: type: mteb/cqadupstack-gis name: MTEB CQADupstackGisRetrieval config: default split: test revision: 5003b3064772da1887988e05400cf3806fe491f2 metrics: - type: ndcg_at_10 value: 36.638 - task: type: Retrieval dataset: type: mteb/cqadupstack-mathematica name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: 90fceea13679c63fe563ded68f3b6f06e50061de metrics: - type: ndcg_at_10 value: 30.711 - task: type: Retrieval 
dataset: type: mteb/cqadupstack-physics name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: 79531abbd1fb92d06c6d6315a0cbbbf5bb247ea4 metrics: - type: ndcg_at_10 value: 44.523 - task: type: Retrieval dataset: type: mteb/cqadupstack-programmers name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: 6184bc1440d2dbc7612be22b50686b8826d22b32 metrics: - type: ndcg_at_10 value: 37.940000000000005 - task: type: Retrieval dataset: type: mteb/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: ndcg_at_10 value: 38.12183333333333 - task: type: Retrieval dataset: type: mteb/cqadupstack-stats name: MTEB CQADupstackStatsRetrieval config: default split: test revision: 65ac3a16b8e91f9cee4c9828cc7c335575432a2a metrics: - type: ndcg_at_10 value: 32.684000000000005 - task: type: Retrieval dataset: type: mteb/cqadupstack-tex name: MTEB CQADupstackTexRetrieval config: default split: test revision: 46989137a86843e03a6195de44b09deda022eec7 metrics: - type: ndcg_at_10 value: 26.735 - task: type: Retrieval dataset: type: mteb/cqadupstack-unix name: MTEB CQADupstackUnixRetrieval config: default split: test revision: 6c6430d3a6d36f8d2a829195bc5dc94d7e063e53 metrics: - type: ndcg_at_10 value: 36.933 - task: type: Retrieval dataset: type: mteb/cqadupstack-webmasters name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: 160c094312a0e1facb97e55eeddb698c0abe3571 metrics: - type: ndcg_at_10 value: 33.747 - task: type: Retrieval dataset: type: mteb/cqadupstack-wordpress name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: ndcg_at_10 value: 28.872999999999998 - task: type: Retrieval dataset: type: mteb/climate-fever name: MTEB ClimateFEVER config: default split: test revision: 47f2ac6acb640fc46020b02a5b59fdda04d39380 metrics: - type: ndcg_at_10 value: 
34.833 - task: type: Retrieval dataset: type: C-MTEB/CmedqaRetrieval name: MTEB CmedqaRetrieval config: default split: dev revision: cd540c506dae1cf9e9a59c3e06f42030d54e7301 metrics: - type: ndcg_at_10 value: 43.78 - task: type: PairClassification dataset: type: C-MTEB/CMNLI name: MTEB Cmnli config: default split: validation revision: 41bc36f332156f7adc9e38f53777c959b2ae9766 metrics: - type: cos_sim_ap value: 84.00640599186677 - task: type: Retrieval dataset: type: C-MTEB/CovidRetrieval name: MTEB CovidRetrieval config: default split: dev revision: 1271c7809071a13532e05f25fb53511ffce77117 metrics: - type: ndcg_at_10 value: 80.60000000000001 - task: type: Retrieval dataset: type: mteb/dbpedia name: MTEB DBPedia config: default split: test revision: c0f706b76e590d620bd6618b3ca8efdd34e2d659 metrics: - type: ndcg_at_10 value: 40.116 - task: type: Retrieval dataset: type: clarin-knext/dbpedia-pl name: MTEB DBPedia-PL config: default split: test revision: 76afe41d9af165cc40999fcaa92312b8b012064a metrics: - type: ndcg_at_10 value: 32.498 - task: type: Retrieval dataset: type: C-MTEB/DuRetrieval name: MTEB DuRetrieval config: default split: dev revision: a1a333e290fe30b10f3f56498e3a0d911a693ced metrics: - type: ndcg_at_10 value: 87.547 - task: type: Retrieval dataset: type: C-MTEB/EcomRetrieval name: MTEB EcomRetrieval config: default split: dev revision: 687de13dc7294d6fd9be10c6945f9e8fec8166b9 metrics: - type: ndcg_at_10 value: 64.85 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 47.949999999999996 - task: type: Retrieval dataset: type: mteb/fever name: MTEB FEVER config: default split: test revision: bea83ef9e8fb933d90a2f1d5515737465d613e12 metrics: - type: ndcg_at_10 value: 92.111 - task: type: Retrieval dataset: type: clarin-knext/fiqa-pl name: MTEB FiQA-PL config: default split: test revision: 
2e535829717f8bf9dc829b7f911cc5bbd4e6608e metrics: - type: ndcg_at_10 value: 28.962 - task: type: Retrieval dataset: type: mteb/fiqa name: MTEB FiQA2018 config: default split: test revision: 27a168819829fe9bcd655c2df245fb19452e8e06 metrics: - type: ndcg_at_10 value: 45.005 - task: type: Clustering dataset: type: lyon-nlp/clustering-hal-s2s name: MTEB HALClusteringS2S config: default split: test revision: e06ebbbb123f8144bef1a5d18796f3dec9ae2915 metrics: - type: v_measure value: 25.133776435657595 - task: type: Retrieval dataset: type: mteb/hotpotqa name: MTEB HotpotQA config: default split: test revision: ab518f4d6fcca38d87c25209f94beba119d02014 metrics: - type: ndcg_at_10 value: 63.036 - task: type: Retrieval dataset: type: clarin-knext/hotpotqa-pl name: MTEB HotpotQA-PL config: default split: test revision: a0bd479ac97b4ccb5bd6ce320c415d0bb4beb907 metrics: - type: ndcg_at_10 value: 56.904999999999994 - task: type: Classification dataset: type: C-MTEB/IFlyTek-classification name: MTEB IFlyTek config: default split: validation revision: 421605374b29664c5fc098418fe20ada9bd55f8a metrics: - type: accuracy value: 44.59407464409388 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 74.912 - task: type: Classification dataset: type: C-MTEB/JDReview-classification name: MTEB JDReview config: default split: test revision: b7c64bd89eb87f8ded463478346f76731f07bf8b metrics: - type: accuracy value: 79.26829268292683 - task: type: STS dataset: type: C-MTEB/LCQMC name: MTEB LCQMC config: default split: test revision: 17f9b096f80380fce5ed12a9be8be7784b337daf metrics: - type: cos_sim_spearman value: 74.8601229809791 - task: type: Clustering dataset: type: mlsum name: MTEB MLSUMClusteringP2P config: default split: test revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7 metrics: - type: v_measure value: 42.331902754246556 - task: type: 
Clustering dataset: type: mlsum name: MTEB MLSUMClusteringS2S config: default split: test revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7 metrics: - type: v_measure value: 40.92029335502153 - task: type: Reranking dataset: type: C-MTEB/Mmarco-reranking name: MTEB MMarcoReranking config: default split: dev revision: 8e0c766dbe9e16e1d221116a3f36795fbade07f6 metrics: - type: map value: 32.19266316591337 - task: type: Retrieval dataset: type: C-MTEB/MMarcoRetrieval name: MTEB MMarcoRetrieval config: default split: dev revision: 539bbde593d947e2a124ba72651aafc09eb33fc2 metrics: - type: ndcg_at_10 value: 79.346 - task: type: Retrieval dataset: type: mteb/msmarco name: MTEB MSMARCO config: default split: dev revision: c5a29a104738b98a9e76336939199e264163d4a0 metrics: - type: ndcg_at_10 value: 39.922999999999995 - task: type: Retrieval dataset: type: clarin-knext/msmarco-pl name: MTEB MSMARCO-PL config: default split: test revision: 8634c07806d5cce3a6138e260e59b81760a0a640 metrics: - type: ndcg_at_10 value: 55.620999999999995 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 92.53989968080255 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (de) config: de split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 88.26993519301212 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (es) config: es split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 90.87725150100067 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (fr) config: fr split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 87.48512370811149 - task: type: Classification dataset: type: 
mteb/mtop_domain name: MTEB MTOPDomainClassification (hi) config: hi split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 89.45141627823591 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (th) config: th split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 83.45750452079565 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 72.57637938896488 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (de) config: de split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 63.50803043110736 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (es) config: es split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 71.6577718478986 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (fr) config: fr split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 64.05887879736925 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (hi) config: hi split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 65.27070634636071 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (th) config: th split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 63.04520795660037 - task: type: Classification dataset: type: masakhane/masakhanews name: MTEB MasakhaNEWSClassification (fra) config: fra split: test revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 metrics: - type: accuracy value: 
80.66350710900474 - task: type: Clustering dataset: type: masakhane/masakhanews name: MTEB MasakhaNEWSClusteringP2P (fra) config: fra split: test revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 metrics: - type: v_measure value: 44.016506455899425 - task: type: Clustering dataset: type: masakhane/masakhanews name: MTEB MasakhaNEWSClusteringS2S (fra) config: fra split: test revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 metrics: - type: v_measure value: 40.67730129573544 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (af) config: af split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 57.94552790854068 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (am) config: am split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 49.273705447209146 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ar) config: ar split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 55.490921318090116 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (az) config: az split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 60.97511768661733 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (bn) config: bn split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 57.5689307330195 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (cy) config: cy split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 48.34902488231337 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: 
MTEB MassiveIntentClassification (da) config: da split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 63.6684599865501 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (de) config: de split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 62.54539340954942 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (el) config: el split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 63.08675184936112 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 72.12508406186953 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (es) config: es split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 67.41425689307331 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (fa) config: fa split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 65.59515803631474 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (fi) config: fi split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 62.90517821116342 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (fr) config: fr split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 67.91526563550774 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (he) config: he split: test revision: 
31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 55.198386012104905 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (hi) config: hi split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 65.04371217215869 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (hu) config: hu split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 63.31203765971756 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (hy) config: hy split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 55.521183591123055 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (id) config: id split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 66.06254203093476 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (is) config: is split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 56.01546738399461 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (it) config: it split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 67.27975790181574 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ja) config: ja split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 66.79556153328849 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (jv) config: jv split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 
50.18493611297915 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ka) config: ka split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 47.888365837256224 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (km) config: km split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 50.79690652320108 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (kn) config: kn split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 57.225958305312716 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ko) config: ko split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 64.58641560188299 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (lv) config: lv split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 59.08204438466711 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ml) config: ml split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 59.54606590450572 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (mn) config: mn split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 53.443174176193665 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ms) config: ms split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 61.65097511768661 - task: type: Classification dataset: type: 
mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (my) config: my split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 53.45662407531944 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (nb) config: nb split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 63.739071956960316 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (nl) config: nl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 66.36180228648286 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (pl) config: pl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 66.3920645595158 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (pt) config: pt split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 68.06993947545395 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ro) config: ro split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 63.123739071956955 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ru) config: ru split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 67.46133154001346 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (sl) config: sl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 60.54472091459314 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (sq) config: 
sq split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 58.204438466711494 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (sv) config: sv split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 65.69603227975792 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (sw) config: sw split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 51.684599865501 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ta) config: ta split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 58.523873570948226 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (te) config: te split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 58.53396099529253 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (th) config: th split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 61.88298587760591 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (tl) config: tl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 56.65097511768662 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (tr) config: tr split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 64.8453261600538 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ur) config: ur split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - 
type: accuracy value: 58.6247478143914 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (vi) config: vi split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 64.16274377942166 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (zh-CN) config: zh-CN split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 69.61667787491594 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (zh-TW) config: zh-TW split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 64.17283120376598 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (af) config: af split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 64.89912575655683 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (am) config: am split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 57.27975790181573 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ar) config: ar split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 62.269670477471415 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (az) config: az split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 65.10423671822461 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (bn) config: bn split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 62.40753194351043 - task: 
type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (cy) config: cy split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 55.369872225958304 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (da) config: da split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.60726294552792 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (de) config: de split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.30262273032952 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (el) config: el split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 69.52925353059851 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 76.28446536650976 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (es) config: es split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 72.45460659045058 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (fa) config: fa split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.26563550773368 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (fi) config: fi split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 67.20578345662408 - task: type: Classification dataset: type: 
mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (fr) config: fr split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 72.64963012777405 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (he) config: he split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 61.698049764626774 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (hi) config: hi split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.14458641560188 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (hu) config: hu split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.51445864156018 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (hy) config: hy split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 60.13786146603901 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (id) config: id split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.61533288500337 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (is) config: is split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 61.526563550773375 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (it) config: it split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.99731002017484 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB 
MassiveScenarioClassification (ja) config: ja split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.59381304640216 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (jv) config: jv split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 57.010759919300604 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ka) config: ka split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 53.26160053799597 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (km) config: km split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 57.800941492938804 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (kn) config: kn split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 62.387357094821795 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ko) config: ko split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 69.5359784801614 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (lv) config: lv split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 63.36919973100203 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ml) config: ml split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 64.81506388702084 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (mn) config: mn 
split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 59.35104236718225 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ms) config: ms split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 66.67787491593813 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (my) config: my split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 59.4250168123739 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (nb) config: nb split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.49630127774043 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (nl) config: nl split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.95696032279758 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (pl) config: pl split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.11768661735036 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (pt) config: pt split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.86953597848016 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ro) config: ro split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 68.51042367182247 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ru) config: ru split: test revision: 
7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.65097511768661 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (sl) config: sl split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 66.81573638197713 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (sq) config: sq split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 65.26227303295225 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (sv) config: sv split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 72.51513113651646 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (sw) config: sw split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 58.29858776059179 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ta) config: ta split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 62.72696704774714 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (te) config: te split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 66.57700067249496 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (th) config: th split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 68.22797579018157 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (tl) config: tl split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 
metrics: - type: accuracy value: 61.97041022192333 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (tr) config: tr split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.72629455279085 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ur) config: ur split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 63.16072629455278 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (vi) config: vi split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 67.92199058507062 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (zh-CN) config: zh-CN split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 74.40484196368527 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (zh-TW) config: zh-TW split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.61398789509079 - task: type: Retrieval dataset: type: C-MTEB/MedicalRetrieval name: MTEB MedicalRetrieval config: default split: dev revision: 2039188fb5800a9803ba5048df7b76e6fb151fc6 metrics: - type: ndcg_at_10 value: 61.934999999999995 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 33.052031054565205 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 31.969909524076794 - task: type: Reranking 
dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 31.7530992892652 - task: type: Retrieval dataset: type: jinaai/mintakaqa name: MTEB MintakaRetrieval (fr) config: fr split: test revision: efa78cc2f74bbcd21eff2261f9e13aebe40b814e metrics: - type: ndcg_at_10 value: 34.705999999999996 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (ar) config: ar split: test revision: None metrics: - type: ndcg_at_10 value: 55.166000000000004 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (de) config: de split: test revision: None metrics: - type: ndcg_at_10 value: 55.155 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (en) config: en split: test revision: None metrics: - type: ndcg_at_10 value: 50.993 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (es) config: es split: test revision: None metrics: - type: ndcg_at_10 value: 81.228 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (fr) config: fr split: test revision: None metrics: - type: ndcg_at_10 value: 76.19 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (hi) config: hi split: test revision: None metrics: - type: ndcg_at_10 value: 45.206 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (it) config: it split: test revision: None metrics: - type: ndcg_at_10 value: 66.741 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (ja) config: ja split: test revision: None metrics: - type: ndcg_at_10 value: 52.111 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (ko) config: ko split: test revision: None metrics: - type: ndcg_at_10 value: 46.733000000000004 - task: type: Retrieval dataset: type: Shitao/MLDR 
name: MTEB MultiLongDocRetrieval (pt) config: pt split: test revision: None metrics: - type: ndcg_at_10 value: 79.105 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (ru) config: ru split: test revision: None metrics: - type: ndcg_at_10 value: 64.21 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (th) config: th split: test revision: None metrics: - type: ndcg_at_10 value: 35.467 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (zh) config: zh split: test revision: None metrics: - type: ndcg_at_10 value: 27.419 - task: type: Classification dataset: type: C-MTEB/MultilingualSentiment-classification name: MTEB MultilingualSentiment config: default split: validation revision: 46958b007a63fdbf239b7672c25d0bea67b5ea1a metrics: - type: accuracy value: 61.02000000000001 - task: type: Retrieval dataset: type: mteb/nfcorpus name: MTEB NFCorpus config: default split: test revision: ec0fa4fe99da2ff19ca1214b7966684033a58814 metrics: - type: ndcg_at_10 value: 36.65 - task: type: Retrieval dataset: type: clarin-knext/nfcorpus-pl name: MTEB NFCorpus-PL config: default split: test revision: 9a6f9567fda928260afed2de480d79c98bf0bec0 metrics: - type: ndcg_at_10 value: 26.831 - task: type: Retrieval dataset: type: mteb/nq name: MTEB NQ config: default split: test revision: b774495ed302d8c44a3a7ea25c90dbce03968f31 metrics: - type: ndcg_at_10 value: 58.111000000000004 - task: type: Retrieval dataset: type: clarin-knext/nq-pl name: MTEB NQ-PL config: default split: test revision: f171245712cf85dd4700b06bef18001578d0ca8d metrics: - type: ndcg_at_10 value: 43.126999999999995 - task: type: PairClassification dataset: type: C-MTEB/OCNLI name: MTEB Ocnli config: default split: validation revision: 66e76a618a34d6d565d5538088562851e6daa7ec metrics: - type: cos_sim_ap value: 72.67630697316041 - task: type: Classification dataset: type: C-MTEB/OnlineShopping-classification name: MTEB 
OnlineShopping config: default split: test revision: e610f2ebd179a8fda30ae534c3878750a96db120 metrics: - type: accuracy value: 84.85000000000001 - task: type: PairClassification dataset: type: GEM/opusparcus name: MTEB OpusparcusPC (fr) config: fr split: test revision: 9e9b1f8ef51616073f47f306f7f47dd91663f86a metrics: - type: cos_sim_ap value: 100 - task: type: Classification dataset: type: laugustyniak/abusive-clauses-pl name: MTEB PAC config: default split: test revision: None metrics: - type: accuracy value: 65.99189110918043 - task: type: STS dataset: type: C-MTEB/PAWSX name: MTEB PAWSX config: default split: test revision: 9c6a90e430ac22b5779fb019a23e820b11a8b5e1 metrics: - type: cos_sim_spearman value: 16.124364530596228 - task: type: PairClassification dataset: type: PL-MTEB/ppc-pairclassification name: MTEB PPC config: default split: test revision: None metrics: - type: cos_sim_ap value: 92.43431057460192 - task: type: PairClassification dataset: type: PL-MTEB/psc-pairclassification name: MTEB PSC config: default split: test revision: None metrics: - type: cos_sim_ap value: 99.06090138049724 - task: type: PairClassification dataset: type: paws-x name: MTEB PawsX (fr) config: fr split: test revision: 8a04d940a42cd40658986fdd8e3da561533a3646 metrics: - type: cos_sim_ap value: 58.9314954874314 - task: type: Classification dataset: type: PL-MTEB/polemo2_in name: MTEB PolEmo2.0-IN config: default split: test revision: None metrics: - type: accuracy value: 69.59833795013851 - task: type: Classification dataset: type: PL-MTEB/polemo2_out name: MTEB PolEmo2.0-OUT config: default split: test revision: None metrics: - type: accuracy value: 44.73684210526315 - task: type: STS dataset: type: C-MTEB/QBQTC name: MTEB QBQTC config: default split: test revision: 790b0510dc52b1553e8c49f3d2afb48c0e5c48b7 metrics: - type: cos_sim_spearman value: 39.36450754137984 - task: type: Retrieval dataset: type: clarin-knext/quora-pl name: MTEB Quora-PL config: default split: test 
revision: 0be27e93455051e531182b85e85e425aba12e9d4 metrics: - type: ndcg_at_10 value: 80.76299999999999 - task: type: Retrieval dataset: type: mteb/quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: ndcg_at_10 value: 88.022 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 55.719165988934385 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 62.25390069273025 - task: type: Retrieval dataset: type: mteb/scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: ndcg_at_10 value: 18.243000000000002 - task: type: Retrieval dataset: type: clarin-knext/scidocs-pl name: MTEB SCIDOCS-PL config: default split: test revision: 45452b03f05560207ef19149545f168e596c9337 metrics: - type: ndcg_at_10 value: 14.219000000000001 - task: type: PairClassification dataset: type: PL-MTEB/sicke-pl-pairclassification name: MTEB SICK-E-PL config: default split: test revision: None metrics: - type: cos_sim_ap value: 75.4022630307816 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_spearman value: 79.34269390198548 - task: type: STS dataset: type: PL-MTEB/sickr-pl-sts name: MTEB SICK-R-PL config: default split: test revision: None metrics: - type: cos_sim_spearman value: 74.0651660446132 - task: type: STS dataset: type: Lajavaness/SICK-fr name: MTEB SICKFr config: default split: test revision: e077ab4cf4774a1e36d86d593b150422fafd8e8a metrics: - type: cos_sim_spearman value: 78.62693119733123 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: 
a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_spearman value: 77.50660544631359 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_spearman value: 85.55415077723738 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_spearman value: 81.67550814479077 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_spearman value: 88.94601412322764 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_spearman value: 84.33844259337481 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (ko-ko) config: ko-ko split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 81.58650681159105 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (ar-ar) config: ar-ar split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 78.82472265884256 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-ar) config: en-ar split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 76.43637938260397 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-de) config: en-de split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 84.71008299464059 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman 
value: 88.88074713413747 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-tr) config: en-tr split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 76.36405640457285 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (es-en) config: es-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 83.84737910084762 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (es-es) config: es-es split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 87.03931621433031 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (fr-en) config: fr-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 84.43335591752246 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (it-en) config: it-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 83.85268648747021 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (nl-en) config: nl-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 82.45786516224341 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 67.20227303970304 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (de) config: de split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 60.892838305537126 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (es) config: es split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 
72.01876318464508 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (pl) config: pl split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 42.3879320510127 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (tr) config: tr split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 65.54048784845729 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (ar) config: ar split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 58.55244068334867 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (ru) config: ru split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 66.48710288440624 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (zh) config: zh split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 66.585754901838 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (fr) config: fr split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 81.03001290557805 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (de-en) config: de-en split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 62.28001859884359 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (es-en) config: es-en split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 79.64106342105019 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (it) config: it split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 78.27915339361124 - task: type: STS dataset: 
type: mteb/sts22-crosslingual-sts name: MTEB STS22 (pl-en) config: pl-en split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 78.28574268257462 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (zh-en) config: zh-en split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 72.92658860751482 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (es-it) config: es-it split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 74.83418886368217 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (de-fr) config: de-fr split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 56.01064022625769 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (de-pl) config: de-pl split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 53.64332829635126 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (fr-pl) config: fr-pl split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 73.24670207647144 - task: type: STS dataset: type: C-MTEB/STSB name: MTEB STSB config: default split: test revision: 0cde68302b3541bb8b3c340dc0644b0b745b3dc0 metrics: - type: cos_sim_spearman value: 80.7157790971544 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_spearman value: 86.45763616928973 - task: type: STS dataset: type: stsb_multi_mt name: MTEB STSBenchmarkMultilingualSTS (fr) config: fr split: test revision: 93d57ef91790589e3ce9c365164337a8a78b7632 metrics: - type: cos_sim_spearman value: 84.4335500335282 - task: type: Reranking dataset: type: 
mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 84.15276484499303 - task: type: Retrieval dataset: type: mteb/scifact name: MTEB SciFact config: default split: test revision: 0228b52cf27578f30900b9e5271d331663a030d7 metrics: - type: ndcg_at_10 value: 73.433 - task: type: Retrieval dataset: type: clarin-knext/scifact-pl name: MTEB SciFact-PL config: default split: test revision: 47932a35f045ef8ed01ba82bf9ff67f6e109207e metrics: - type: ndcg_at_10 value: 58.919999999999995 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_ap value: 95.40564890916419 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 63.41856697730145 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 31.709285904909112 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 52.09341030060322 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_spearman value: 30.58262517835034 - task: type: Summarization dataset: type: lyon-nlp/summarization-summeval-fr-p2p name: MTEB SummEvalFr config: default split: test revision: b385812de6a9577b6f4d0f88c6a6e35395a94054 metrics: - type: cos_sim_spearman 
value: 29.744542072951358 - task: type: Reranking dataset: type: lyon-nlp/mteb-fr-reranking-syntec-s2p name: MTEB SyntecReranking config: default split: test revision: b205c5084a0934ce8af14338bf03feb19499c84d metrics: - type: map value: 88.03333333333333 - task: type: Retrieval dataset: type: lyon-nlp/mteb-fr-retrieval-syntec-s2p name: MTEB SyntecRetrieval config: default split: test revision: 77f7e271bf4a92b24fce5119f3486b583ca016ff metrics: - type: ndcg_at_10 value: 83.043 - task: type: Reranking dataset: type: C-MTEB/T2Reranking name: MTEB T2Reranking config: default split: dev revision: 76631901a18387f85eaa53e5450019b87ad58ef9 metrics: - type: map value: 67.08577894804324 - task: type: Retrieval dataset: type: C-MTEB/T2Retrieval name: MTEB T2Retrieval config: default split: dev revision: 8731a845f1bf500a4f111cf1070785c793d10e64 metrics: - type: ndcg_at_10 value: 84.718 - task: type: Classification dataset: type: C-MTEB/TNews-classification name: MTEB TNews config: default split: validation revision: 317f262bf1e6126357bbe89e875451e4b0938fe4 metrics: - type: accuracy value: 48.726 - task: type: Retrieval dataset: type: mteb/trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: ndcg_at_10 value: 57.56 - task: type: Retrieval dataset: type: clarin-knext/trec-covid-pl name: MTEB TRECCOVID-PL config: default split: test revision: 81bcb408f33366c2a20ac54adafad1ae7e877fdd metrics: - type: ndcg_at_10 value: 59.355999999999995 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (sqi-eng) config: sqi-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 82.765 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (fry-eng) config: fry-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 73.69942196531792 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB 
Tatoeba (kur-eng) config: kur-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 32.86585365853657 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tur-eng) config: tur-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 95.81666666666666 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (deu-eng) config: deu-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 97.75 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nld-eng) config: nld-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 93.78333333333335 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ron-eng) config: ron-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 90.72333333333333 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ang-eng) config: ang-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 42.45202558635395 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ido-eng) config: ido-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 77.59238095238095 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (jav-eng) config: jav-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 35.69686411149825 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (isl-eng) config: isl-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 82.59333333333333 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba 
(slv-eng) config: slv-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 84.1456922987907 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cym-eng) config: cym-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 52.47462133594857 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kaz-eng) config: kaz-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 67.62965440356746 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (est-eng) config: est-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 79.48412698412699 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (heb-eng) config: heb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 75.85 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (gla-eng) config: gla-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 27.32600866497127 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mar-eng) config: mar-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 84.38 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (lat-eng) config: lat-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 42.98888712165028 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (bel-eng) config: bel-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 85.55690476190476 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (pms-eng) config: pms-eng 
split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 46.68466031323174 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (gle-eng) config: gle-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 32.73071428571428 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (pes-eng) config: pes-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 88.26333333333334 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nob-eng) config: nob-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 96.61666666666666 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (bul-eng) config: bul-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 91.30666666666666 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cbk-eng) config: cbk-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 70.03714285714285 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hun-eng) config: hun-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 89.09 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (uig-eng) config: uig-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 59.570476190476185 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (rus-eng) config: rus-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 92.9 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (spa-eng) config: spa-eng split: test revision: 
9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 97.68333333333334 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hye-eng) config: hye-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 80.40880503144653 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tel-eng) config: tel-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 89.7008547008547 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (afr-eng) config: afr-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 81.84833333333333 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mon-eng) config: mon-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 71.69696969696969 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (arz-eng) config: arz-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 55.76985790822269 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hrv-eng) config: hrv-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 91.66666666666666 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nov-eng) config: nov-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 68.36668519547896 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (gsw-eng) config: gsw-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 36.73992673992674 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nds-eng) config: nds-eng split: test revision: 
9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 63.420952380952365 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ukr-eng) config: ukr-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 91.28999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (uzb-eng) config: uzb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 40.95392490046146 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (lit-eng) config: lit-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 77.58936507936508 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ina-eng) config: ina-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 91.28999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (lfn-eng) config: lfn-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 63.563650793650794 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (zsm-eng) config: zsm-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 94.35 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ita-eng) config: ita-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 91.43 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cmn-eng) config: cmn-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 95.73333333333332 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (lvs-eng) config: lvs-eng split: test revision: 
9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 79.38666666666667 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (glg-eng) config: glg-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 89.64 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ceb-eng) config: ceb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 21.257184628237262 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (bre-eng) config: bre-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 13.592316017316017 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ben-eng) config: ben-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 73.22666666666666 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (swg-eng) config: swg-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 51.711309523809526 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (arq-eng) config: arq-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 24.98790634904795 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kab-eng) config: kab-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 17.19218192918193 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (fra-eng) config: fra-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 93.26666666666667 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (por-eng) config: por-eng split: test revision: 
9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 94.57333333333334 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tat-eng) config: tat-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 42.35127206127206 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (oci-eng) config: oci-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 51.12318903318903 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (pol-eng) config: pol-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 94.89999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (war-eng) config: war-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 23.856320290390055 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (aze-eng) config: aze-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 79.52833333333334 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (vie-eng) config: vie-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 95.93333333333334 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nno-eng) config: nno-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 90.75333333333333 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cha-eng) config: cha-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 30.802919708029197 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mhr-eng) config: mhr-eng split: test revision: 
9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 15.984076294076294 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (dan-eng) config: dan-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 91.82666666666667 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ell-eng) config: ell-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 91.9 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (amh-eng) config: amh-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 76.36054421768706 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (pam-eng) config: pam-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 9.232711399711398 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hsb-eng) config: hsb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 45.640803181175855 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (srp-eng) config: srp-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 86.29 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (epo-eng) config: epo-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 88.90833333333332 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kzj-eng) config: kzj-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 11.11880248978075 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (awa-eng) config: awa-eng split: test revision: 
9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 48.45839345839346 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (fao-eng) config: fao-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 65.68157033805888 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mal-eng) config: mal-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 94.63852498786997 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ile-eng) config: ile-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 81.67904761904761 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (bos-eng) config: bos-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 89.35969868173258 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cor-eng) config: cor-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 5.957229437229437 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cat-eng) config: cat-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 91.50333333333333 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (eus-eng) config: eus-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 63.75498778998778 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (yue-eng) config: yue-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 82.99190476190476 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (swe-eng) config: swe-eng split: test revision: 
9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 92.95 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (dtp-eng) config: dtp-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 9.054042624042623 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kat-eng) config: kat-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 72.77064981488574 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (jpn-eng) config: jpn-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 93.14 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (csb-eng) config: csb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 29.976786498525627 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (xho-eng) config: xho-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 67.6525821596244 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (orv-eng) config: orv-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 33.12964812964813 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ind-eng) config: ind-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 92.30666666666666 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tuk-eng) config: tuk-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 34.36077879427633 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (max-eng) config: max-eng split: test revision: 
9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 52.571845212690285 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (swh-eng) config: swh-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 58.13107263107262 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hin-eng) config: hin-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 93.33333333333333 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (dsb-eng) config: dsb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 42.87370133925458 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ber-eng) config: ber-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 20.394327616827614 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tam-eng) config: tam-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 84.29967426710098 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (slk-eng) config: slk-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 88.80666666666667 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tgl-eng) config: tgl-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 67.23062271062273 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ast-eng) config: ast-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 78.08398950131233 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mkd-eng) config: mkd-eng split: test revision: 
9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 77.85166666666666 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (khm-eng) config: khm-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 67.63004001231148 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ces-eng) config: ces-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 89.77000000000001 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tzl-eng) config: tzl-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 40.2654503616042 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (urd-eng) config: urd-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 83.90333333333334 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ara-eng) config: ara-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 77.80666666666666 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kor-eng) config: kor-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 84.08 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (yid-eng) config: yid-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 60.43098607367475 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (fin-eng) config: fin-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 88.19333333333333 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tha-eng) config: tha-eng split: test revision: 
9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 90.55352798053529 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (wuu-eng) config: wuu-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 88.44999999999999 - task: type: Clustering dataset: type: C-MTEB/ThuNewsClusteringP2P name: MTEB ThuNewsClusteringP2P config: default split: test revision: 5798586b105c0434e4f0fe5e767abe619442cf93 metrics: - type: v_measure value: 57.25416429643288 - task: type: Clustering dataset: type: C-MTEB/ThuNewsClusteringS2S name: MTEB ThuNewsClusteringS2S config: default split: test revision: 8a8b2caeda43f39e13c4bc5bea0f8a667896e10d metrics: - type: v_measure value: 56.616646560243524 - task: type: Retrieval dataset: type: mteb/touche2020 name: MTEB Touche2020 config: default split: test revision: a34f9a33db75fa0cbb21bb5cfc3dae8dc8bec93f metrics: - type: ndcg_at_10 value: 22.819 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 71.02579999999999 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 57.60045274476514 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 50.346666699466205 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_ap value: 71.88199004440489 - task: type: 
PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_ap value: 85.41587779677383 - task: type: Retrieval dataset: type: C-MTEB/VideoRetrieval name: MTEB VideoRetrieval config: default split: dev revision: 58c2597a5943a2ba48f4668c3b90d796283c5639 metrics: - type: ndcg_at_10 value: 72.792 - task: type: Classification dataset: type: C-MTEB/waimai-classification name: MTEB Waimai config: default split: test revision: 339287def212450dcaa9df8c22bf93e9980c7023 metrics: - type: accuracy value: 82.58000000000001 - task: type: Retrieval dataset: type: jinaai/xpqa name: MTEB XPQARetrieval (fr) config: fr split: test revision: c99d599f0a6ab9b85b065da6f9d94f9cf731679f metrics: - type: ndcg_at_10 value: 67.327 --- ## gte-multilingual-base The **gte-multilingual-base** model is the latest in the GTE (General Text Embedding) family of models, featuring several key attributes: - **High Performance**: Achieves state-of-the-art (SOTA) results in multilingual retrieval tasks and multi-task representation model evaluations when compared to models of similar size. - **Training Architecture**: Trained using an encoder-only transformer architecture, resulting in a smaller model size. Unlike previous models based on decoder-only LLM architectures (e.g., gte-qwen2-1.5b-instruct), this model has lower hardware requirements for inference, offering a 10x increase in inference speed. - **Long Context**: Supports text lengths up to **8192** tokens. - **Multilingual Capability**: Supports over **70** languages. - **Elastic Dense Embedding**: Supports elastic dense output representations while maintaining the effectiveness of downstream tasks, which significantly reduces storage costs and improves execution efficiency. - **Sparse Vectors**: In addition to dense representations, it can also generate sparse vectors. 
**Paper**: mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval ## Model Information - Model Size: 305M - Embedding Dimension: 768 - Max Input Tokens: 8192 ## Usage - **It is recommended to install xformers and enable unpadding for acceleration; refer to enable-unpadding-and-xformers.** - **How to use it offline: new-impl/discussions/2** - **How to use with TEI: refs/pr/7** ### Get Dense Embeddings with Transformers ### Use with sentence-transformers ### Use with infinity Usage via docker and infinity, MIT Licensed. ### Use with custom code to get dense embeddings and sparse token weights ## Evaluation We validated the performance of the **gte-multilingual-base** model on multiple downstream tasks, including multilingual retrieval, cross-lingual retrieval, long text retrieval, and general text representation evaluation on the MTEB Leaderboard, among others. ### Retrieval Task Retrieval results on MIRACL and MLDR (multilingual), MKQA (crosslingual), BEIR and LoCo (English). !image - Detailed results on MLDR !image - Detailed results on LoCo ### MTEB Results on MTEB English, Chinese, French, Polish !image **More detailed experimental results can be found in the paper**. ## Cloud API Services In addition to the open-source GTE series models, GTE series models are also available as commercial API services on Alibaba Cloud. - Embedding Models: Three versions of the text embedding models are available: text-embedding-v1/v2/v3, with v3 being the latest API service. - ReRank Models: The gte-rerank model service is available. Note that the models behind the commercial APIs are not entirely identical to the open-source models. ## Citation If you find our paper or models helpful, please consider citing:",
|
103 |
+
"model_explanation_gemini": "Multilingual sentence embedding model for tasks like sentence similarity, clustering, classification, and retrieval across numerous languages."
|
104 |
+
}
|
data/model_data_json/Alibaba-NLP_gte-multilingual-reranker-base.json
ADDED
@@ -0,0 +1,94 @@
|
1 |
+
{
|
2 |
+
"model_id": "Alibaba-NLP/gte-multilingual-reranker-base",
|
3 |
+
"downloads": 228024,
|
4 |
+
"tags": [
|
5 |
+
"sentence-transformers",
|
6 |
+
"safetensors",
|
7 |
+
"new",
|
8 |
+
"text-classification",
|
9 |
+
"transformers",
|
10 |
+
"text-embeddings-inference",
|
11 |
+
"text-ranking",
|
12 |
+
"custom_code",
|
13 |
+
"af",
|
14 |
+
"ar",
|
15 |
+
"az",
|
16 |
+
"be",
|
17 |
+
"bg",
|
18 |
+
"bn",
|
19 |
+
"ca",
|
20 |
+
"ceb",
|
21 |
+
"cs",
|
22 |
+
"cy",
|
23 |
+
"da",
|
24 |
+
"de",
|
25 |
+
"el",
|
26 |
+
"en",
|
27 |
+
"es",
|
28 |
+
"et",
|
29 |
+
"eu",
|
30 |
+
"fa",
|
31 |
+
"fi",
|
32 |
+
"fr",
|
33 |
+
"gl",
|
34 |
+
"gu",
|
35 |
+
"he",
|
36 |
+
"hi",
|
37 |
+
"hr",
|
38 |
+
"ht",
|
39 |
+
"hu",
|
40 |
+
"hy",
|
41 |
+
"id",
|
42 |
+
"is",
|
43 |
+
"it",
|
44 |
+
"ja",
|
45 |
+
"jv",
|
46 |
+
"ka",
|
47 |
+
"kk",
|
48 |
+
"km",
|
49 |
+
"kn",
|
50 |
+
"ko",
|
51 |
+
"ky",
|
52 |
+
"lo",
|
53 |
+
"lt",
|
54 |
+
"lv",
|
55 |
+
"mk",
|
56 |
+
"ml",
|
57 |
+
"mn",
|
58 |
+
"mr",
|
59 |
+
"ms",
|
60 |
+
"my",
|
61 |
+
"ne",
|
62 |
+
"nl",
|
63 |
+
"no",
|
64 |
+
"pa",
|
65 |
+
"pl",
|
66 |
+
"pt",
|
67 |
+
"qu",
|
68 |
+
"ro",
|
69 |
+
"ru",
|
70 |
+
"si",
|
71 |
+
"sk",
|
72 |
+
"sl",
|
73 |
+
"so",
|
74 |
+
"sq",
|
75 |
+
"sr",
|
76 |
+
"sv",
|
77 |
+
"sw",
|
78 |
+
"ta",
|
79 |
+
"te",
|
80 |
+
"th",
|
81 |
+
"tl",
|
82 |
+
"tr",
|
83 |
+
"uk",
|
84 |
+
"ur",
|
85 |
+
"vi",
|
86 |
+
"yo",
|
87 |
+
"zh",
|
88 |
+
"arxiv:2407.19669",
|
89 |
+
"license:apache-2.0",
|
90 |
+
"region:us"
|
91 |
+
],
|
92 |
+
"description": "--- license: apache-2.0 pipeline_tag: text-ranking tags: - transformers - sentence-transformers - text-embeddings-inference language: - af - ar - az - be - bg - bn - ca - ceb - cs - cy - da - de - el - en - es - et - eu - fa - fi - fr - gl - gu - he - hi - hr - ht - hu - hy - id - is - it - ja - jv - ka - kk - km - kn - ko - ky - lo - lt - lv - mk - ml - mn - mr - ms - my - ne - nl - 'no' - pa - pl - pt - qu - ro - ru - si - sk - sl - so - sq - sr - sv - sw - ta - te - th - tl - tr - uk - ur - vi - yo - zh library_name: sentence-transformers --- ## gte-multilingual-reranker-base The **gte-multilingual-reranker-base** model is the first reranker model in the GTE family of models, featuring several key attributes: - **High Performance**: Achieves state-of-the-art (SOTA) results in multilingual retrieval tasks and multi-task representation model evaluations when compared to reranker models of similar size. - **Training Architecture**: Trained using an encoder-only transformer architecture, resulting in a smaller model size. Unlike previous models based on decoder-only LLM architectures (e.g., gte-qwen2-1.5b-instruct), this model has lower hardware requirements for inference, offering a 10x increase in inference speed. - **Long Context**: Supports text lengths up to **8192** tokens. - **Multilingual Capability**: Supports over **70** languages. ## Model Information - Model Size: 306M - Max Input Tokens: 8192 ### Usage - **It is recommended to install xformers and enable unpadding for acceleration; refer to enable-unpadding-and-xformers.** - **How to use it offline: new-impl/discussions/2** Using Hugging Face transformers (transformers>=4.36.0) Usage with infinity: Infinity, an MIT-licensed inference REST API server. ## Evaluation Results of reranking based on multiple text retrieval datasets !image **More detailed experimental results can be found in the paper**. 
## Cloud API Services In addition to the open-source GTE series models, GTE series models are also available as commercial API services on Alibaba Cloud. - Embedding Models: Three versions of the text embedding models are available: text-embedding-v1/v2/v3, with v3 being the latest API service. - ReRank Models: The gte-rerank model service is available. Note that the models behind the commercial APIs are not entirely identical to the open-source models. ## Citation If you find our paper or models helpful, please consider citing:",
|
93 |
+
"model_explanation_gemini": "Multilingual text reranking model supporting 70+ languages with high performance, long-context handling (8192 tokens), and efficient inference for retrieval tasks."
|
94 |
+
}
|
data/model_data_json/AutonLab_MOMENT-1-large.json
ADDED
@@ -0,0 +1,25 @@
|
1 |
+
{
|
2 |
+
"model_id": "AutonLab/MOMENT-1-large",
|
3 |
+
"downloads": 147972,
|
4 |
+
"tags": [
|
5 |
+
"transformers",
|
6 |
+
"pytorch",
|
7 |
+
"safetensors",
|
8 |
+
"time series",
|
9 |
+
"forecasting",
|
10 |
+
"classification",
|
11 |
+
"anomaly detection",
|
12 |
+
"imputation",
|
13 |
+
"pretrained models",
|
14 |
+
"foundation models",
|
15 |
+
"time-series",
|
16 |
+
"time-series-forecasting",
|
17 |
+
"dataset:AutonLab/Timeseries-PILE",
|
18 |
+
"arxiv:2402.03885",
|
19 |
+
"license:mit",
|
20 |
+
"endpoints_compatible",
|
21 |
+
"region:us"
|
22 |
+
],
|
23 |
+
"description": "--- license: mit datasets: - AutonLab/Timeseries-PILE metrics: - accuracy - mse - mae - f1 tags: - time series - forecasting - classification - anomaly detection - imputation - transformers - pretrained models - foundation models - time-series pipeline_tag: time-series-forecasting --- # MOMENT-Large MOMENT is a family of foundation models for general-purpose time-series analysis. The models in this family (1) serve as a building block for diverse **time-series analysis tasks** (e.g., forecasting, classification, anomaly detection, and imputation), (2) are effective **out-of-the-box**, i.e., with no (or few) task-specific exemplars (enabling e.g., zero-shot forecasting, few-shot classification, etc.), and (3) are **tunable** using in-distribution and task-specific data to improve performance. For details on MOMENT models, training data, and experimental results, please refer to the paper MOMENT: A Family of Open Time-series Foundation Models. MOMENT-1 comes in 3 sizes: Small, Base, and Large. # Usage **Recommended Python Version:** Python 3.11 (support for additional versions is expected soon). You can install the package using pip: Alternatively, to install the latest version directly from the GitHub repository: To load the pre-trained model for one of the tasks, use one of the following code snippets: **Forecasting** **Classification** **Anomaly Detection, Imputation, and Pre-training** **Representation Learning** ### Tutorials Here is the list of tutorials and reproducible experiments to get started with MOMENT for various tasks: - Forecasting - Classification - Anomaly Detection - Imputation - Representation Learning - Real-world Electrocardiogram (ECG) Case Study -- This tutorial also shows how to fine-tune MOMENT for a real-world ECG classification problem, performing training and inference on multiple GPUs and parameter-efficient fine-tuning (PEFT). 
## Model Details ### Model Description - **Developed by:** Auton Lab, Carnegie Mellon University - **Model type:** Time-series Foundation Model - **License:** MIT License ### Model Sources <!-- Provide the basic links for the model. --> - **Repository:** (Pre-training and research code coming out soon!) - **Paper:** - **Demo:** ## Environmental Impact <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly --> We train multiple models over many days resulting in significant energy usage and a sizeable carbon footprint. However, we hope that releasing our models will ensure that future time-series modeling efforts are quicker and more efficient, resulting in lower carbon emissions. We use the Total Graphics Power (TGP) to calculate the total power consumed for training MOMENT models, although the total power consumed by the GPU will likely vary a little based on the GPU utilization while training our model. Our calculations do not account for power demands from other sources of our compute. We use 336.566 kg CO2/MWh as the standard value of CO2 emission per megawatt hour of energy consumed for Pittsburgh. - **Hardware Type:** NVIDIA RTX A6000 GPU - **GPU Hours:** 404 - **Compute Region:** Pittsburgh, USA - **Carbon Emission (tCO2eq):** #### Hardware All models were trained and evaluated on a computing cluster consisting of 128 AMD EPYC 7502 CPUs, 503 GB of RAM, and 8 NVIDIA RTX A6000 GPUs each with 49 GiB RAM. All MOMENT variants were trained on a single A6000 GPU (without any data or model parallelism). ## Citation <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. --> **BibTeX:** If you use MOMENT please cite our paper: **APA:** Goswami, M., Szafer, K., Choudhry, A., Cai, Y., Li, S., & Dubrawski, A. (2024). MOMENT: A Family of Open Time-series Foundation Models. 
In International Conference on Machine Learning. PMLR.",
|
24 |
+
"model_explanation_gemini": "A foundation model for general-purpose time-series analysis, handling tasks like forecasting, classification, anomaly detection, and imputation with out-of-the-box effectiveness and tunability."
|
25 |
+
}
|
data/model_data_json/BAAI_bge-base-en-v1.5.json
ADDED
@@ -0,0 +1,29 @@
|
1 |
+
{
|
2 |
+
"model_id": "BAAI/bge-base-en-v1.5",
|
3 |
+
"downloads": 2100857,
|
4 |
+
"tags": [
|
5 |
+
"sentence-transformers",
|
6 |
+
"pytorch",
|
7 |
+
"onnx",
|
8 |
+
"safetensors",
|
9 |
+
"bert",
|
10 |
+
"feature-extraction",
|
11 |
+
"sentence-similarity",
|
12 |
+
"transformers",
|
13 |
+
"mteb",
|
14 |
+
"en",
|
15 |
+
"arxiv:2401.03462",
|
16 |
+
"arxiv:2312.15503",
|
17 |
+
"arxiv:2311.13534",
|
18 |
+
"arxiv:2310.07554",
|
19 |
+
"arxiv:2309.07597",
|
20 |
+
"license:mit",
|
21 |
+
"model-index",
|
22 |
+
"autotrain_compatible",
|
23 |
+
"text-embeddings-inference",
|
24 |
+
"endpoints_compatible",
|
25 |
+
"region:us"
|
26 |
+
],
|
27 |
+
"description": "--- tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers - mteb model-index: - name: bge-base-en-v1.5 results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 76.14925373134328 - type: ap value: 39.32336517995478 - type: f1 value: 70.16902252611425 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 93.386825 - type: ap value: 90.21276917991995 - type: f1 value: 93.37741030006174 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 48.846000000000004 - type: f1 value: 48.14646269778261 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 40.754000000000005 - type: map_at_10 value: 55.761 - type: map_at_100 value: 56.330999999999996 - type: map_at_1000 value: 56.333999999999996 - type: map_at_3 value: 51.92 - type: map_at_5 value: 54.010999999999996 - type: mrr_at_1 value: 41.181 - type: mrr_at_10 value: 55.967999999999996 - type: mrr_at_100 value: 56.538 - type: mrr_at_1000 value: 56.542 - type: mrr_at_3 value: 51.980000000000004 - type: mrr_at_5 value: 54.208999999999996 - type: ndcg_at_1 value: 40.754000000000005 - type: ndcg_at_10 value: 63.605000000000004 - type: ndcg_at_100 value: 66.05199999999999 - type: ndcg_at_1000 value: 66.12 - type: ndcg_at_3 value: 55.708 - type: ndcg_at_5 value: 59.452000000000005 - type: precision_at_1 value: 40.754000000000005 - type: precision_at_10 value: 8.841000000000001 - type: 
precision_at_100 value: 0.991 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 22.238 - type: precision_at_5 value: 15.149000000000001 - type: recall_at_1 value: 40.754000000000005 - type: recall_at_10 value: 88.407 - type: recall_at_100 value: 99.14699999999999 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 66.714 - type: recall_at_5 value: 75.747 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 48.74884539679369 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 42.8075893810716 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 62.128470519187736 - type: mrr value: 74.28065778481289 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 89.24629081484655 - type: cos_sim_spearman value: 86.93752309911496 - type: euclidean_pearson value: 87.58589628573816 - type: euclidean_spearman value: 88.05622328825284 - type: manhattan_pearson value: 87.5594959805773 - type: manhattan_spearman value: 88.19658793233961 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 86.9512987012987 - type: f1 value: 86.92515357973708 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 
65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 39.10263762928872 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 36.69711517426737 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 32.327 - type: map_at_10 value: 44.099 - type: map_at_100 value: 45.525 - type: map_at_1000 value: 45.641999999999996 - type: map_at_3 value: 40.47 - type: map_at_5 value: 42.36 - type: mrr_at_1 value: 39.199 - type: mrr_at_10 value: 49.651 - type: mrr_at_100 value: 50.29 - type: mrr_at_1000 value: 50.329 - type: mrr_at_3 value: 46.924 - type: mrr_at_5 value: 48.548 - type: ndcg_at_1 value: 39.199 - type: ndcg_at_10 value: 50.773 - type: ndcg_at_100 value: 55.67999999999999 - type: ndcg_at_1000 value: 57.495 - type: ndcg_at_3 value: 45.513999999999996 - type: ndcg_at_5 value: 47.703 - type: precision_at_1 value: 39.199 - type: precision_at_10 value: 9.914000000000001 - type: precision_at_100 value: 1.5310000000000001 - type: precision_at_1000 value: 0.198 - type: precision_at_3 value: 21.984 - type: precision_at_5 value: 15.737000000000002 - type: recall_at_1 value: 32.327 - type: recall_at_10 value: 63.743 - type: recall_at_100 value: 84.538 - type: recall_at_1000 value: 96.089 - type: recall_at_3 value: 48.065000000000005 - type: recall_at_5 value: 54.519 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 32.671 - type: map_at_10 value: 42.954 - type: map_at_100 value: 44.151 - type: map_at_1000 value: 44.287 - type: map_at_3 value: 39.912 - type: map_at_5 value: 41.798 - type: mrr_at_1 value: 41.465 - type: mrr_at_10 value: 49.351 - type: mrr_at_100 value: 
49.980000000000004 - type: mrr_at_1000 value: 50.016000000000005 - type: mrr_at_3 value: 47.144000000000005 - type: mrr_at_5 value: 48.592999999999996 - type: ndcg_at_1 value: 41.465 - type: ndcg_at_10 value: 48.565999999999995 - type: ndcg_at_100 value: 52.76499999999999 - type: ndcg_at_1000 value: 54.749 - type: ndcg_at_3 value: 44.57 - type: ndcg_at_5 value: 46.759 - type: precision_at_1 value: 41.465 - type: precision_at_10 value: 9.107999999999999 - type: precision_at_100 value: 1.433 - type: precision_at_1000 value: 0.191 - type: precision_at_3 value: 21.423000000000002 - type: precision_at_5 value: 15.414 - type: recall_at_1 value: 32.671 - type: recall_at_10 value: 57.738 - type: recall_at_100 value: 75.86500000000001 - type: recall_at_1000 value: 88.36 - type: recall_at_3 value: 45.626 - type: recall_at_5 value: 51.812000000000005 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 41.185 - type: map_at_10 value: 53.929 - type: map_at_100 value: 54.92 - type: map_at_1000 value: 54.967999999999996 - type: map_at_3 value: 50.70400000000001 - type: map_at_5 value: 52.673 - type: mrr_at_1 value: 47.398 - type: mrr_at_10 value: 57.303000000000004 - type: mrr_at_100 value: 57.959 - type: mrr_at_1000 value: 57.985 - type: mrr_at_3 value: 54.932 - type: mrr_at_5 value: 56.464999999999996 - type: ndcg_at_1 value: 47.398 - type: ndcg_at_10 value: 59.653 - type: ndcg_at_100 value: 63.627 - type: ndcg_at_1000 value: 64.596 - type: ndcg_at_3 value: 54.455 - type: ndcg_at_5 value: 57.245000000000005 - type: precision_at_1 value: 47.398 - type: precision_at_10 value: 9.524000000000001 - type: precision_at_100 value: 1.243 - type: precision_at_1000 value: 0.13699999999999998 - type: precision_at_3 value: 24.389 - type: precision_at_5 value: 16.752 - type: recall_at_1 value: 41.185 - type: recall_at_10 value: 73.193 - type: recall_at_100 value: 90.357 - 
type: recall_at_1000 value: 97.253 - type: recall_at_3 value: 59.199999999999996 - type: recall_at_5 value: 66.118 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.27 - type: map_at_10 value: 36.223 - type: map_at_100 value: 37.218 - type: map_at_1000 value: 37.293 - type: map_at_3 value: 33.503 - type: map_at_5 value: 35.097 - type: mrr_at_1 value: 29.492 - type: mrr_at_10 value: 38.352000000000004 - type: mrr_at_100 value: 39.188 - type: mrr_at_1000 value: 39.247 - type: mrr_at_3 value: 35.876000000000005 - type: mrr_at_5 value: 37.401 - type: ndcg_at_1 value: 29.492 - type: ndcg_at_10 value: 41.239 - type: ndcg_at_100 value: 46.066 - type: ndcg_at_1000 value: 47.992000000000004 - type: ndcg_at_3 value: 36.11 - type: ndcg_at_5 value: 38.772 - type: precision_at_1 value: 29.492 - type: precision_at_10 value: 6.260000000000001 - type: precision_at_100 value: 0.914 - type: precision_at_1000 value: 0.11100000000000002 - type: precision_at_3 value: 15.104000000000001 - type: precision_at_5 value: 10.644 - type: recall_at_1 value: 27.27 - type: recall_at_10 value: 54.589 - type: recall_at_100 value: 76.70700000000001 - type: recall_at_1000 value: 91.158 - type: recall_at_3 value: 40.974 - type: recall_at_5 value: 47.327000000000005 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 17.848 - type: map_at_10 value: 26.207 - type: map_at_100 value: 27.478 - type: map_at_1000 value: 27.602 - type: map_at_3 value: 23.405 - type: map_at_5 value: 24.98 - type: mrr_at_1 value: 21.891 - type: mrr_at_10 value: 31.041999999999998 - type: mrr_at_100 value: 32.092 - type: mrr_at_1000 value: 32.151999999999994 - type: mrr_at_3 value: 28.358 - type: mrr_at_5 value: 29.969 - type: ndcg_at_1 value: 21.891 - type: ndcg_at_10 value: 31.585 - type: 
ndcg_at_100 value: 37.531 - type: ndcg_at_1000 value: 40.256 - type: ndcg_at_3 value: 26.508 - type: ndcg_at_5 value: 28.894 - type: precision_at_1 value: 21.891 - type: precision_at_10 value: 5.795999999999999 - type: precision_at_100 value: 0.9990000000000001 - type: precision_at_1000 value: 0.13799999999999998 - type: precision_at_3 value: 12.769 - type: precision_at_5 value: 9.279 - type: recall_at_1 value: 17.848 - type: recall_at_10 value: 43.452 - type: recall_at_100 value: 69.216 - type: recall_at_1000 value: 88.102 - type: recall_at_3 value: 29.18 - type: recall_at_5 value: 35.347 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 30.94 - type: map_at_10 value: 41.248000000000005 - type: map_at_100 value: 42.495 - type: map_at_1000 value: 42.602000000000004 - type: map_at_3 value: 37.939 - type: map_at_5 value: 39.924 - type: mrr_at_1 value: 37.824999999999996 - type: mrr_at_10 value: 47.041 - type: mrr_at_100 value: 47.83 - type: mrr_at_1000 value: 47.878 - type: mrr_at_3 value: 44.466 - type: mrr_at_5 value: 46.111999999999995 - type: ndcg_at_1 value: 37.824999999999996 - type: ndcg_at_10 value: 47.223 - type: ndcg_at_100 value: 52.394 - type: ndcg_at_1000 value: 54.432 - type: ndcg_at_3 value: 42.032000000000004 - type: ndcg_at_5 value: 44.772 - type: precision_at_1 value: 37.824999999999996 - type: precision_at_10 value: 8.393 - type: precision_at_100 value: 1.2890000000000001 - type: precision_at_1000 value: 0.164 - type: precision_at_3 value: 19.698 - type: precision_at_5 value: 14.013 - type: recall_at_1 value: 30.94 - type: recall_at_10 value: 59.316 - type: recall_at_100 value: 80.783 - type: recall_at_1000 value: 94.15400000000001 - type: recall_at_3 value: 44.712 - type: recall_at_5 value: 51.932 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test 
revision: None metrics: - type: map_at_1 value: 27.104 - type: map_at_10 value: 36.675999999999995 - type: map_at_100 value: 38.076 - type: map_at_1000 value: 38.189 - type: map_at_3 value: 33.733999999999995 - type: map_at_5 value: 35.287 - type: mrr_at_1 value: 33.904 - type: mrr_at_10 value: 42.55 - type: mrr_at_100 value: 43.434 - type: mrr_at_1000 value: 43.494 - type: mrr_at_3 value: 40.126 - type: mrr_at_5 value: 41.473 - type: ndcg_at_1 value: 33.904 - type: ndcg_at_10 value: 42.414 - type: ndcg_at_100 value: 48.203 - type: ndcg_at_1000 value: 50.437 - type: ndcg_at_3 value: 37.633 - type: ndcg_at_5 value: 39.67 - type: precision_at_1 value: 33.904 - type: precision_at_10 value: 7.82 - type: precision_at_100 value: 1.2409999999999999 - type: precision_at_1000 value: 0.159 - type: precision_at_3 value: 17.884 - type: precision_at_5 value: 12.648000000000001 - type: recall_at_1 value: 27.104 - type: recall_at_10 value: 53.563 - type: recall_at_100 value: 78.557 - type: recall_at_1000 value: 93.533 - type: recall_at_3 value: 39.92 - type: recall_at_5 value: 45.457 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.707749999999997 - type: map_at_10 value: 36.961 - type: map_at_100 value: 38.158833333333334 - type: map_at_1000 value: 38.270333333333326 - type: map_at_3 value: 34.07183333333334 - type: map_at_5 value: 35.69533333333334 - type: mrr_at_1 value: 32.81875 - type: mrr_at_10 value: 41.293 - type: mrr_at_100 value: 42.116499999999995 - type: mrr_at_1000 value: 42.170249999999996 - type: mrr_at_3 value: 38.83983333333333 - type: mrr_at_5 value: 40.29775 - type: ndcg_at_1 value: 32.81875 - type: ndcg_at_10 value: 42.355 - type: ndcg_at_100 value: 47.41374999999999 - type: ndcg_at_1000 value: 49.5805 - type: ndcg_at_3 value: 37.52825 - type: ndcg_at_5 value: 39.83266666666667 - type: precision_at_1 value: 32.81875 - type: precision_at_10 
value: 7.382416666666666 - type: precision_at_100 value: 1.1640833333333334 - type: precision_at_1000 value: 0.15383333333333335 - type: precision_at_3 value: 17.134166666666665 - type: precision_at_5 value: 12.174833333333336 - type: recall_at_1 value: 27.707749999999997 - type: recall_at_10 value: 53.945 - type: recall_at_100 value: 76.191 - type: recall_at_1000 value: 91.101 - type: recall_at_3 value: 40.39083333333334 - type: recall_at_5 value: 46.40083333333333 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.482 - type: map_at_10 value: 33.201 - type: map_at_100 value: 34.107 - type: map_at_1000 value: 34.197 - type: map_at_3 value: 31.174000000000003 - type: map_at_5 value: 32.279 - type: mrr_at_1 value: 29.908 - type: mrr_at_10 value: 36.235 - type: mrr_at_100 value: 37.04 - type: mrr_at_1000 value: 37.105 - type: mrr_at_3 value: 34.355999999999995 - type: mrr_at_5 value: 35.382999999999996 - type: ndcg_at_1 value: 29.908 - type: ndcg_at_10 value: 37.325 - type: ndcg_at_100 value: 41.795 - type: ndcg_at_1000 value: 44.105 - type: ndcg_at_3 value: 33.555 - type: ndcg_at_5 value: 35.266999999999996 - type: precision_at_1 value: 29.908 - type: precision_at_10 value: 5.721 - type: precision_at_100 value: 0.8630000000000001 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 14.008000000000001 - type: precision_at_5 value: 9.754999999999999 - type: recall_at_1 value: 26.482 - type: recall_at_10 value: 47.072 - type: recall_at_100 value: 67.27 - type: recall_at_1000 value: 84.371 - type: recall_at_3 value: 36.65 - type: recall_at_5 value: 40.774 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 18.815 - type: map_at_10 value: 26.369999999999997 - type: map_at_100 value: 27.458 - type: map_at_1000 
value: 27.588 - type: map_at_3 value: 23.990000000000002 - type: map_at_5 value: 25.345000000000002 - type: mrr_at_1 value: 22.953000000000003 - type: mrr_at_10 value: 30.342999999999996 - type: mrr_at_100 value: 31.241000000000003 - type: mrr_at_1000 value: 31.319000000000003 - type: mrr_at_3 value: 28.16 - type: mrr_at_5 value: 29.406 - type: ndcg_at_1 value: 22.953000000000003 - type: ndcg_at_10 value: 31.151 - type: ndcg_at_100 value: 36.309000000000005 - type: ndcg_at_1000 value: 39.227000000000004 - type: ndcg_at_3 value: 26.921 - type: ndcg_at_5 value: 28.938000000000002 - type: precision_at_1 value: 22.953000000000003 - type: precision_at_10 value: 5.602 - type: precision_at_100 value: 0.9530000000000001 - type: precision_at_1000 value: 0.13899999999999998 - type: precision_at_3 value: 12.606 - type: precision_at_5 value: 9.119 - type: recall_at_1 value: 18.815 - type: recall_at_10 value: 41.574 - type: recall_at_100 value: 64.84400000000001 - type: recall_at_1000 value: 85.406 - type: recall_at_3 value: 29.694 - type: recall_at_5 value: 34.935 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.840999999999998 - type: map_at_10 value: 36.797999999999995 - type: map_at_100 value: 37.993 - type: map_at_1000 value: 38.086999999999996 - type: map_at_3 value: 34.050999999999995 - type: map_at_5 value: 35.379 - type: mrr_at_1 value: 32.649 - type: mrr_at_10 value: 41.025 - type: mrr_at_100 value: 41.878 - type: mrr_at_1000 value: 41.929 - type: mrr_at_3 value: 38.573 - type: mrr_at_5 value: 39.715 - type: ndcg_at_1 value: 32.649 - type: ndcg_at_10 value: 42.142 - type: ndcg_at_100 value: 47.558 - type: ndcg_at_1000 value: 49.643 - type: ndcg_at_3 value: 37.12 - type: ndcg_at_5 value: 38.983000000000004 - type: precision_at_1 value: 32.649 - type: precision_at_10 value: 7.08 - type: precision_at_100 value: 1.1039999999999999 - type: 
precision_at_1000 value: 0.13899999999999998 - type: precision_at_3 value: 16.698 - type: precision_at_5 value: 11.511000000000001 - type: recall_at_1 value: 27.840999999999998 - type: recall_at_10 value: 54.245 - type: recall_at_100 value: 77.947 - type: recall_at_1000 value: 92.36999999999999 - type: recall_at_3 value: 40.146 - type: recall_at_5 value: 44.951 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.529000000000003 - type: map_at_10 value: 35.010000000000005 - type: map_at_100 value: 36.647 - type: map_at_1000 value: 36.857 - type: map_at_3 value: 31.968000000000004 - type: map_at_5 value: 33.554 - type: mrr_at_1 value: 31.818 - type: mrr_at_10 value: 39.550999999999995 - type: mrr_at_100 value: 40.54 - type: mrr_at_1000 value: 40.596 - type: mrr_at_3 value: 36.726 - type: mrr_at_5 value: 38.416 - type: ndcg_at_1 value: 31.818 - type: ndcg_at_10 value: 40.675 - type: ndcg_at_100 value: 46.548 - type: ndcg_at_1000 value: 49.126 - type: ndcg_at_3 value: 35.829 - type: ndcg_at_5 value: 38.0 - type: precision_at_1 value: 31.818 - type: precision_at_10 value: 7.826 - type: precision_at_100 value: 1.538 - type: precision_at_1000 value: 0.24 - type: precision_at_3 value: 16.601 - type: precision_at_5 value: 12.095 - type: recall_at_1 value: 26.529000000000003 - type: recall_at_10 value: 51.03 - type: recall_at_100 value: 77.556 - type: recall_at_1000 value: 93.804 - type: recall_at_3 value: 36.986000000000004 - type: recall_at_5 value: 43.096000000000004 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.480999999999998 - type: map_at_10 value: 30.817 - type: map_at_100 value: 31.838 - type: map_at_1000 value: 31.932 - type: map_at_3 value: 28.011999999999997 - type: map_at_5 value: 29.668 - type: mrr_at_1 value: 
25.323 - type: mrr_at_10 value: 33.072 - type: mrr_at_100 value: 33.926 - type: mrr_at_1000 value: 33.993 - type: mrr_at_3 value: 30.436999999999998 - type: mrr_at_5 value: 32.092 - type: ndcg_at_1 value: 25.323 - type: ndcg_at_10 value: 35.514 - type: ndcg_at_100 value: 40.489000000000004 - type: ndcg_at_1000 value: 42.908 - type: ndcg_at_3 value: 30.092000000000002 - type: ndcg_at_5 value: 32.989000000000004 - type: precision_at_1 value: 25.323 - type: precision_at_10 value: 5.545 - type: precision_at_100 value: 0.861 - type: precision_at_1000 value: 0.117 - type: precision_at_3 value: 12.446 - type: precision_at_5 value: 9.131 - type: recall_at_1 value: 23.480999999999998 - type: recall_at_10 value: 47.825 - type: recall_at_100 value: 70.652 - type: recall_at_1000 value: 88.612 - type: recall_at_3 value: 33.537 - type: recall_at_5 value: 40.542 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 13.333999999999998 - type: map_at_10 value: 22.524 - type: map_at_100 value: 24.506 - type: map_at_1000 value: 24.715 - type: map_at_3 value: 19.022 - type: map_at_5 value: 20.693 - type: mrr_at_1 value: 29.186 - type: mrr_at_10 value: 41.22 - type: mrr_at_100 value: 42.16 - type: mrr_at_1000 value: 42.192 - type: mrr_at_3 value: 38.013000000000005 - type: mrr_at_5 value: 39.704 - type: ndcg_at_1 value: 29.186 - type: ndcg_at_10 value: 31.167 - type: ndcg_at_100 value: 38.879000000000005 - type: ndcg_at_1000 value: 42.376000000000005 - type: ndcg_at_3 value: 25.817 - type: ndcg_at_5 value: 27.377000000000002 - type: precision_at_1 value: 29.186 - type: precision_at_10 value: 9.693999999999999 - type: precision_at_100 value: 1.8030000000000002 - type: precision_at_1000 value: 0.246 - type: precision_at_3 value: 19.11 - type: precision_at_5 value: 14.344999999999999 - type: recall_at_1 value: 13.333999999999998 - type: recall_at_10 value: 37.092000000000006 - type: 
recall_at_100 value: 63.651 - type: recall_at_1000 value: 83.05 - type: recall_at_3 value: 23.74 - type: recall_at_5 value: 28.655 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 9.151 - type: map_at_10 value: 19.653000000000002 - type: map_at_100 value: 28.053 - type: map_at_1000 value: 29.709000000000003 - type: map_at_3 value: 14.191 - type: map_at_5 value: 16.456 - type: mrr_at_1 value: 66.25 - type: mrr_at_10 value: 74.4 - type: mrr_at_100 value: 74.715 - type: mrr_at_1000 value: 74.726 - type: mrr_at_3 value: 72.417 - type: mrr_at_5 value: 73.667 - type: ndcg_at_1 value: 54.25 - type: ndcg_at_10 value: 40.77 - type: ndcg_at_100 value: 46.359 - type: ndcg_at_1000 value: 54.193000000000005 - type: ndcg_at_3 value: 44.832 - type: ndcg_at_5 value: 42.63 - type: precision_at_1 value: 66.25 - type: precision_at_10 value: 32.175 - type: precision_at_100 value: 10.668 - type: precision_at_1000 value: 2.067 - type: precision_at_3 value: 47.667 - type: precision_at_5 value: 41.3 - type: recall_at_1 value: 9.151 - type: recall_at_10 value: 25.003999999999998 - type: recall_at_100 value: 52.976 - type: recall_at_1000 value: 78.315 - type: recall_at_3 value: 15.487 - type: recall_at_5 value: 18.999 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 51.89999999999999 - type: f1 value: 46.47777925067403 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 73.706 - type: map_at_10 value: 82.423 - type: map_at_100 value: 82.67999999999999 - type: map_at_1000 value: 82.694 - type: map_at_3 value: 81.328 - type: map_at_5 value: 82.001 - type: mrr_at_1 value: 79.613 - type: mrr_at_10 value: 87.07000000000001 - type: mrr_at_100 value: 87.169 - type: 
mrr_at_1000 value: 87.17 - type: mrr_at_3 value: 86.404 - type: mrr_at_5 value: 86.856 - type: ndcg_at_1 value: 79.613 - type: ndcg_at_10 value: 86.289 - type: ndcg_at_100 value: 87.201 - type: ndcg_at_1000 value: 87.428 - type: ndcg_at_3 value: 84.625 - type: ndcg_at_5 value: 85.53699999999999 - type: precision_at_1 value: 79.613 - type: precision_at_10 value: 10.399 - type: precision_at_100 value: 1.1079999999999999 - type: precision_at_1000 value: 0.11499999999999999 - type: precision_at_3 value: 32.473 - type: precision_at_5 value: 20.132 - type: recall_at_1 value: 73.706 - type: recall_at_10 value: 93.559 - type: recall_at_100 value: 97.188 - type: recall_at_1000 value: 98.555 - type: recall_at_3 value: 88.98700000000001 - type: recall_at_5 value: 91.373 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 19.841 - type: map_at_10 value: 32.643 - type: map_at_100 value: 34.575 - type: map_at_1000 value: 34.736 - type: map_at_3 value: 28.317999999999998 - type: map_at_5 value: 30.964000000000002 - type: mrr_at_1 value: 39.660000000000004 - type: mrr_at_10 value: 48.620000000000005 - type: mrr_at_100 value: 49.384 - type: mrr_at_1000 value: 49.415 - type: mrr_at_3 value: 45.988 - type: mrr_at_5 value: 47.361 - type: ndcg_at_1 value: 39.660000000000004 - type: ndcg_at_10 value: 40.646 - type: ndcg_at_100 value: 47.657 - type: ndcg_at_1000 value: 50.428 - type: ndcg_at_3 value: 36.689 - type: ndcg_at_5 value: 38.211 - type: precision_at_1 value: 39.660000000000004 - type: precision_at_10 value: 11.235000000000001 - type: precision_at_100 value: 1.8530000000000002 - type: precision_at_1000 value: 0.23600000000000002 - type: precision_at_3 value: 24.587999999999997 - type: precision_at_5 value: 18.395 - type: recall_at_1 value: 19.841 - type: recall_at_10 value: 48.135 - type: recall_at_100 value: 74.224 - type: recall_at_1000 value: 90.826 - type: recall_at_3 value: 33.536 - 
type: recall_at_5 value: 40.311 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 40.358 - type: map_at_10 value: 64.497 - type: map_at_100 value: 65.362 - type: map_at_1000 value: 65.41900000000001 - type: map_at_3 value: 61.06700000000001 - type: map_at_5 value: 63.317 - type: mrr_at_1 value: 80.716 - type: mrr_at_10 value: 86.10799999999999 - type: mrr_at_100 value: 86.265 - type: mrr_at_1000 value: 86.27 - type: mrr_at_3 value: 85.271 - type: mrr_at_5 value: 85.82499999999999 - type: ndcg_at_1 value: 80.716 - type: ndcg_at_10 value: 72.597 - type: ndcg_at_100 value: 75.549 - type: ndcg_at_1000 value: 76.61 - type: ndcg_at_3 value: 67.874 - type: ndcg_at_5 value: 70.655 - type: precision_at_1 value: 80.716 - type: precision_at_10 value: 15.148 - type: precision_at_100 value: 1.745 - type: precision_at_1000 value: 0.188 - type: precision_at_3 value: 43.597 - type: precision_at_5 value: 28.351 - type: recall_at_1 value: 40.358 - type: recall_at_10 value: 75.739 - type: recall_at_100 value: 87.259 - type: recall_at_1000 value: 94.234 - type: recall_at_3 value: 65.39500000000001 - type: recall_at_5 value: 70.878 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 90.80799999999998 - type: ap value: 86.81350378180757 - type: f1 value: 90.79901248314215 - task: type: Retrieval dataset: type: msmarco name: MTEB MSMARCO config: default split: dev revision: None metrics: - type: map_at_1 value: 22.096 - type: map_at_10 value: 34.384 - type: map_at_100 value: 35.541 - type: map_at_1000 value: 35.589999999999996 - type: map_at_3 value: 30.496000000000002 - type: map_at_5 value: 32.718 - type: mrr_at_1 value: 22.750999999999998 - type: mrr_at_10 value: 35.024 - type: mrr_at_100 value: 36.125 - type: mrr_at_1000 value: 36.168 - type: 
mrr_at_3 value: 31.225 - type: mrr_at_5 value: 33.416000000000004 - type: ndcg_at_1 value: 22.750999999999998 - type: ndcg_at_10 value: 41.351 - type: ndcg_at_100 value: 46.92 - type: ndcg_at_1000 value: 48.111 - type: ndcg_at_3 value: 33.439 - type: ndcg_at_5 value: 37.407000000000004 - type: precision_at_1 value: 22.750999999999998 - type: precision_at_10 value: 6.564 - type: precision_at_100 value: 0.935 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 14.288 - type: precision_at_5 value: 10.581999999999999 - type: recall_at_1 value: 22.096 - type: recall_at_10 value: 62.771 - type: recall_at_100 value: 88.529 - type: recall_at_1000 value: 97.55 - type: recall_at_3 value: 41.245 - type: recall_at_5 value: 50.788 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 94.16780665754673 - type: f1 value: 93.96331194859894 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 76.90606475148198 - type: f1 value: 58.58344986604187 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 76.14660390047075 - type: f1 value: 74.31533923533614 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 80.16139878950908 - type: f1 value: 80.18532656824924 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: 
e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 32.949880906135085 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 31.56300351524862 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 31.196521894371315 - type: mrr value: 32.22644231694389 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 6.783 - type: map_at_10 value: 14.549000000000001 - type: map_at_100 value: 18.433 - type: map_at_1000 value: 19.949 - type: map_at_3 value: 10.936 - type: map_at_5 value: 12.514 - type: mrr_at_1 value: 47.368 - type: mrr_at_10 value: 56.42 - type: mrr_at_100 value: 56.908 - type: mrr_at_1000 value: 56.95 - type: mrr_at_3 value: 54.283 - type: mrr_at_5 value: 55.568 - type: ndcg_at_1 value: 45.666000000000004 - type: ndcg_at_10 value: 37.389 - type: ndcg_at_100 value: 34.253 - type: ndcg_at_1000 value: 43.059999999999995 - type: ndcg_at_3 value: 42.725 - type: ndcg_at_5 value: 40.193 - type: precision_at_1 value: 47.368 - type: precision_at_10 value: 27.988000000000003 - type: precision_at_100 value: 8.672 - type: precision_at_1000 value: 2.164 - type: precision_at_3 value: 40.248 - type: precision_at_5 value: 34.737 - type: recall_at_1 value: 6.783 - type: recall_at_10 value: 17.838 - type: recall_at_100 value: 33.672000000000004 - type: recall_at_1000 value: 66.166 - type: recall_at_3 value: 11.849 - type: recall_at_5 value: 14.205000000000002 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 31.698999999999998 - type: map_at_10 value: 46.556 - type: map_at_100 value: 47.652 - 
type: map_at_1000 value: 47.68 - type: map_at_3 value: 42.492000000000004 - type: map_at_5 value: 44.763999999999996 - type: mrr_at_1 value: 35.747 - type: mrr_at_10 value: 49.242999999999995 - type: mrr_at_100 value: 50.052 - type: mrr_at_1000 value: 50.068 - type: mrr_at_3 value: 45.867000000000004 - type: mrr_at_5 value: 47.778999999999996 - type: ndcg_at_1 value: 35.717999999999996 - type: ndcg_at_10 value: 54.14600000000001 - type: ndcg_at_100 value: 58.672999999999995 - type: ndcg_at_1000 value: 59.279 - type: ndcg_at_3 value: 46.407 - type: ndcg_at_5 value: 50.181 - type: precision_at_1 value: 35.717999999999996 - type: precision_at_10 value: 8.844000000000001 - type: precision_at_100 value: 1.139 - type: precision_at_1000 value: 0.12 - type: precision_at_3 value: 20.993000000000002 - type: precision_at_5 value: 14.791000000000002 - type: recall_at_1 value: 31.698999999999998 - type: recall_at_10 value: 74.693 - type: recall_at_100 value: 94.15299999999999 - type: recall_at_1000 value: 98.585 - type: recall_at_3 value: 54.388999999999996 - type: recall_at_5 value: 63.08200000000001 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 71.283 - type: map_at_10 value: 85.24000000000001 - type: map_at_100 value: 85.882 - type: map_at_1000 value: 85.897 - type: map_at_3 value: 82.326 - type: map_at_5 value: 84.177 - type: mrr_at_1 value: 82.21000000000001 - type: mrr_at_10 value: 88.228 - type: mrr_at_100 value: 88.32 - type: mrr_at_1000 value: 88.32 - type: mrr_at_3 value: 87.323 - type: mrr_at_5 value: 87.94800000000001 - type: ndcg_at_1 value: 82.17999999999999 - type: ndcg_at_10 value: 88.9 - type: ndcg_at_100 value: 90.079 - type: ndcg_at_1000 value: 90.158 - type: ndcg_at_3 value: 86.18299999999999 - type: ndcg_at_5 value: 87.71799999999999 - type: precision_at_1 value: 82.17999999999999 - type: precision_at_10 value: 13.464 - type: precision_at_100 value: 1.533 - 
type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 37.693 - type: precision_at_5 value: 24.792 - type: recall_at_1 value: 71.283 - type: recall_at_10 value: 95.742 - type: recall_at_100 value: 99.67200000000001 - type: recall_at_1000 value: 99.981 - type: recall_at_3 value: 87.888 - type: recall_at_5 value: 92.24 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 56.24267063669042 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 62.88056988932578 - task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 4.903 - type: map_at_10 value: 13.202 - type: map_at_100 value: 15.5 - type: map_at_1000 value: 15.870999999999999 - type: map_at_3 value: 9.407 - type: map_at_5 value: 11.238 - type: mrr_at_1 value: 24.2 - type: mrr_at_10 value: 35.867 - type: mrr_at_100 value: 37.001 - type: mrr_at_1000 value: 37.043 - type: mrr_at_3 value: 32.5 - type: mrr_at_5 value: 34.35 - type: ndcg_at_1 value: 24.2 - type: ndcg_at_10 value: 21.731 - type: ndcg_at_100 value: 30.7 - type: ndcg_at_1000 value: 36.618 - type: ndcg_at_3 value: 20.72 - type: ndcg_at_5 value: 17.954 - type: precision_at_1 value: 24.2 - type: precision_at_10 value: 11.33 - type: precision_at_100 value: 2.4410000000000003 - type: precision_at_1000 value: 0.386 - type: precision_at_3 value: 19.667 - type: precision_at_5 value: 15.86 - type: recall_at_1 value: 4.903 - type: recall_at_10 value: 22.962 - type: recall_at_100 value: 49.563 - type: recall_at_1000 value: 78.238 - type: recall_at_3 value: 11.953 - type: recall_at_5 value: 16.067999999999998 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB 
SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 84.12694254604078 - type: cos_sim_spearman value: 80.30141815181918 - type: euclidean_pearson value: 81.34015449877128 - type: euclidean_spearman value: 80.13984197010849 - type: manhattan_pearson value: 81.31767068124086 - type: manhattan_spearman value: 80.11720513114103 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 86.13112984010417 - type: cos_sim_spearman value: 78.03063573402875 - type: euclidean_pearson value: 83.51928418844804 - type: euclidean_spearman value: 78.4045235411144 - type: manhattan_pearson value: 83.49981637388689 - type: manhattan_spearman value: 78.4042575139372 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 82.50327987379504 - type: cos_sim_spearman value: 84.18556767756205 - type: euclidean_pearson value: 82.69684424327679 - type: euclidean_spearman value: 83.5368106038335 - type: manhattan_pearson value: 82.57967581007374 - type: manhattan_spearman value: 83.43009053133697 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 82.50756863007814 - type: cos_sim_spearman value: 82.27204331279108 - type: euclidean_pearson value: 81.39535251429741 - type: euclidean_spearman value: 81.84386626336239 - type: manhattan_pearson value: 81.34281737280695 - type: manhattan_spearman value: 81.81149375673166 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 86.8727714856726 - type: cos_sim_spearman 
value: 87.95738287792312 - type: euclidean_pearson value: 86.62920602795887 - type: euclidean_spearman value: 87.05207355381243 - type: manhattan_pearson value: 86.53587918472225 - type: manhattan_spearman value: 86.95382961029586 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 83.52240359769479 - type: cos_sim_spearman value: 85.47685776238286 - type: euclidean_pearson value: 84.25815333483058 - type: euclidean_spearman value: 85.27415639683198 - type: manhattan_pearson value: 84.29127757025637 - type: manhattan_spearman value: 85.30226224917351 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 86.42501708915708 - type: cos_sim_spearman value: 86.42276182795041 - type: euclidean_pearson value: 86.5408207354761 - type: euclidean_spearman value: 85.46096321750838 - type: manhattan_pearson value: 86.54177303026881 - type: manhattan_spearman value: 85.50313151916117 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 64.86521089250766 - type: cos_sim_spearman value: 65.94868540323003 - type: euclidean_pearson value: 67.16569626533084 - type: euclidean_spearman value: 66.37667004134917 - type: manhattan_pearson value: 67.1482365102333 - type: manhattan_spearman value: 66.53240122580029 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 84.64746265365318 - type: cos_sim_spearman value: 86.41888825906786 - type: euclidean_pearson value: 85.27453642725811 - type: euclidean_spearman value: 
85.94095796602544 - type: manhattan_pearson value: 85.28643660505334 - type: manhattan_spearman value: 85.95028003260744 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 87.48903153618527 - type: mrr value: 96.41081503826601 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 58.594 - type: map_at_10 value: 69.296 - type: map_at_100 value: 69.782 - type: map_at_1000 value: 69.795 - type: map_at_3 value: 66.23 - type: map_at_5 value: 68.293 - type: mrr_at_1 value: 61.667 - type: mrr_at_10 value: 70.339 - type: mrr_at_100 value: 70.708 - type: mrr_at_1000 value: 70.722 - type: mrr_at_3 value: 68.0 - type: mrr_at_5 value: 69.56700000000001 - type: ndcg_at_1 value: 61.667 - type: ndcg_at_10 value: 74.039 - type: ndcg_at_100 value: 76.103 - type: ndcg_at_1000 value: 76.47800000000001 - type: ndcg_at_3 value: 68.967 - type: ndcg_at_5 value: 71.96900000000001 - type: precision_at_1 value: 61.667 - type: precision_at_10 value: 9.866999999999999 - type: precision_at_100 value: 1.097 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 27.111 - type: precision_at_5 value: 18.2 - type: recall_at_1 value: 58.594 - type: recall_at_10 value: 87.422 - type: recall_at_100 value: 96.667 - type: recall_at_1000 value: 99.667 - type: recall_at_3 value: 74.217 - type: recall_at_5 value: 81.539 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.85049504950496 - type: cos_sim_ap value: 96.33111544137081 - type: cos_sim_f1 value: 92.35443037974684 - type: cos_sim_precision value: 93.53846153846153 - type: cos_sim_recall value: 91.2 
- type: dot_accuracy value: 99.82376237623762 - type: dot_ap value: 95.38082527310888 - type: dot_f1 value: 90.90909090909092 - type: dot_precision value: 92.90187891440502 - type: dot_recall value: 89.0 - type: euclidean_accuracy value: 99.84851485148515 - type: euclidean_ap value: 96.32316003996347 - type: euclidean_f1 value: 92.2071392659628 - type: euclidean_precision value: 92.71991911021233 - type: euclidean_recall value: 91.7 - type: manhattan_accuracy value: 99.84851485148515 - type: manhattan_ap value: 96.3655668249217 - type: manhattan_f1 value: 92.18356026222895 - type: manhattan_precision value: 92.98067141403867 - type: manhattan_recall value: 91.4 - type: max_accuracy value: 99.85049504950496 - type: max_ap value: 96.3655668249217 - type: max_f1 value: 92.35443037974684 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 65.94861371629051 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 35.009430451385 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 54.61164066427969 - type: mrr value: 55.49710603938544 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.622620124907662 - type: cos_sim_spearman value: 31.0678351356163 - type: dot_pearson value: 30.863727693306814 - type: dot_spearman value: 31.230306567021255 - task: type: Retrieval dataset: type: trec-covid name: MTEB TRECCOVID config: default 
split: test revision: None metrics: - type: map_at_1 value: 0.22 - type: map_at_10 value: 2.011 - type: map_at_100 value: 10.974 - type: map_at_1000 value: 25.819 - type: map_at_3 value: 0.6649999999999999 - type: map_at_5 value: 1.076 - type: mrr_at_1 value: 86.0 - type: mrr_at_10 value: 91.8 - type: mrr_at_100 value: 91.8 - type: mrr_at_1000 value: 91.8 - type: mrr_at_3 value: 91.0 - type: mrr_at_5 value: 91.8 - type: ndcg_at_1 value: 82.0 - type: ndcg_at_10 value: 78.07300000000001 - type: ndcg_at_100 value: 58.231 - type: ndcg_at_1000 value: 51.153000000000006 - type: ndcg_at_3 value: 81.123 - type: ndcg_at_5 value: 81.059 - type: precision_at_1 value: 86.0 - type: precision_at_10 value: 83.0 - type: precision_at_100 value: 59.38 - type: precision_at_1000 value: 22.55 - type: precision_at_3 value: 87.333 - type: precision_at_5 value: 86.8 - type: recall_at_1 value: 0.22 - type: recall_at_10 value: 2.2079999999999997 - type: recall_at_100 value: 14.069 - type: recall_at_1000 value: 47.678 - type: recall_at_3 value: 0.7040000000000001 - type: recall_at_5 value: 1.161 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 2.809 - type: map_at_10 value: 10.394 - type: map_at_100 value: 16.598 - type: map_at_1000 value: 18.142 - type: map_at_3 value: 5.572 - type: map_at_5 value: 7.1370000000000005 - type: mrr_at_1 value: 32.653 - type: mrr_at_10 value: 46.564 - type: mrr_at_100 value: 47.469 - type: mrr_at_1000 value: 47.469 - type: mrr_at_3 value: 42.177 - type: mrr_at_5 value: 44.524 - type: ndcg_at_1 value: 30.612000000000002 - type: ndcg_at_10 value: 25.701 - type: ndcg_at_100 value: 37.532 - type: ndcg_at_1000 value: 48.757 - type: ndcg_at_3 value: 28.199999999999996 - type: ndcg_at_5 value: 25.987 - type: precision_at_1 value: 32.653 - type: precision_at_10 value: 23.469 - type: precision_at_100 value: 7.9799999999999995 - type: precision_at_1000 value: 
1.5350000000000001 - type: precision_at_3 value: 29.932 - type: precision_at_5 value: 26.122 - type: recall_at_1 value: 2.809 - type: recall_at_10 value: 16.887 - type: recall_at_100 value: 48.67 - type: recall_at_1000 value: 82.89699999999999 - type: recall_at_3 value: 6.521000000000001 - type: recall_at_5 value: 9.609 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 71.57860000000001 - type: ap value: 13.82629211536393 - type: f1 value: 54.59860966183956 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 59.38030560271647 - type: f1 value: 59.69685552567865 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 51.4736717043405 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 86.92853311080646 - type: cos_sim_ap value: 77.67872502591382 - type: cos_sim_f1 value: 70.33941236068895 - type: cos_sim_precision value: 67.63273258645884 - type: cos_sim_recall value: 73.27176781002639 - type: dot_accuracy value: 85.79603027954938 - type: dot_ap value: 73.73786190233379 - type: dot_f1 value: 67.3437901774235 - type: dot_precision value: 65.67201604814443 - type: dot_recall value: 69.10290237467018 - type: euclidean_accuracy value: 86.94045419324074 - type: euclidean_ap value: 77.6687791535167 - type: euclidean_f1 value: 70.47209214023542 - type: 
euclidean_precision value: 67.7207492094381 - type: euclidean_recall value: 73.45646437994723 - type: manhattan_accuracy value: 86.87488823985218 - type: manhattan_ap value: 77.63373392430728 - type: manhattan_f1 value: 70.40920716112532 - type: manhattan_precision value: 68.31265508684864 - type: manhattan_recall value: 72.63852242744063 - type: max_accuracy value: 86.94045419324074 - type: max_ap value: 77.67872502591382 - type: max_f1 value: 70.47209214023542 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 88.67155664221679 - type: cos_sim_ap value: 85.64591703003417 - type: cos_sim_f1 value: 77.59531005352656 - type: cos_sim_precision value: 73.60967184801382 - type: cos_sim_recall value: 82.03726516784724 - type: dot_accuracy value: 88.41541506578181 - type: dot_ap value: 84.6482788957769 - type: dot_f1 value: 77.04748541466657 - type: dot_precision value: 74.02440754931176 - type: dot_recall value: 80.3279950723745 - type: euclidean_accuracy value: 88.63080684596576 - type: euclidean_ap value: 85.44570045321562 - type: euclidean_f1 value: 77.28769403336106 - type: euclidean_precision value: 72.90600040958427 - type: euclidean_recall value: 82.22975053895904 - type: manhattan_accuracy value: 88.59393798269105 - type: manhattan_ap value: 85.40271361038187 - type: manhattan_f1 value: 77.17606419344392 - type: manhattan_precision value: 72.4447747078295 - type: manhattan_recall value: 82.5685247921158 - type: max_accuracy value: 88.67155664221679 - type: max_ap value: 85.64591703003417 - type: max_f1 value: 77.59531005352656 license: mit language: - en --- <h1 align=\"center\">FlagEmbedding</h1> <h4 align=\"center\"> <p> <a href=#model-list>Model List</a> | <a href=#frequently-asked-questions>FAQ</a> | <a href=#usage>Usage</a> | <a href=\"#evaluation\">Evaluation</a> | <a 
href=\"#train\">Train</a> | <a href=\"#contact\">Contact</a> | <a href=\"#citation\">Citation</a> | <a href=\"#license\">License</a> <p> </h4> For more details, please refer to our GitHub: FlagEmbedding. If you are looking for a model that supports more languages, longer texts, and other retrieval methods, you can try using bge-m3. English | 中文 FlagEmbedding focuses on retrieval-augmented LLMs and currently consists of the following projects: - **Long-Context LLM**: Activation Beacon - **Fine-tuning of LM**: LM-Cocktail - **Dense Retrieval**: BGE-M3, LLM Embedder, BGE Embedding - **Reranker Model**: BGE Reranker - **Benchmark**: C-MTEB ## News - 1/30/2024: Release **BGE-M3**, a new member of the BGE model series! M3 stands for **M**ulti-linguality (100+ languages), **M**ulti-granularities (input length up to 8192), **M**ulti-Functionality (unification of dense, lexical, multi-vec/colbert retrieval). It is the first embedding model that supports all three retrieval methods, achieving new SOTA on multi-lingual (MIRACL) and cross-lingual (MKQA) benchmarks. Technical Report and Code. :fire: - 1/9/2024: Release Activation-Beacon, an effective, efficient, compatible, and low-cost (training) method to extend the context length of LLMs. Technical Report :fire: - 12/24/2023: Release **LLaRA**, a LLaMA-7B based dense retriever, leading to state-of-the-art performance on MS MARCO and BEIR. Model and code will be open-sourced. Please stay tuned. Technical Report :fire: - 11/23/2023: Release LM-Cocktail, a method to maintain general capabilities during fine-tuning by merging multiple language models. Technical Report :fire: - 10/12/2023: Release LLM-Embedder, a unified embedding model to support diverse retrieval augmentation needs for LLMs. Technical Report - 09/15/2023: The technical report and massive training data of BGE have been released - 09/12/2023: New models: - **New reranker model**: release cross-encoder models and , which are more powerful than the embedding models. 
We recommend using or fine-tuning them to re-rank the top-k documents returned by embedding models. - **Updated embedding model**: release embedding model to alleviate the issue of the similarity distribution and enhance its retrieval ability without instruction. <details> <summary>More</summary> <!-- ### More --> - 09/07/2023: Update fine-tune code: add a script to mine hard negatives and support adding an instruction during fine-tuning. - 08/09/2023: BGE models are integrated into **Langchain**; you can use them like this. The C-MTEB **leaderboard** is available. - 08/05/2023: Release base-scale and small-scale models, **best performance among the models of the same size 🤗** - 08/02/2023: Release (short for BAAI General Embedding) Models, **rank 1st on MTEB and C-MTEB benchmark!** :tada: :tada: - 08/01/2023: We release the Chinese Massive Text Embedding Benchmark (**C-MTEB**), consisting of 31 test datasets. </details> ## Model List is short for . | Model | Language | | Description | query instruction for retrieval [1] | |:-------------------------------|:--------:| :--------:| :--------:|:--------:| | BAAI/bge-m3 | Multilingual | Inference Fine-tune | Multi-Functionality(dense retrieval, sparse retrieval, multi-vector(colbert)), Multi-Linguality, and Multi-Granularity(8192 tokens) | | | BAAI/llm-embedder | English | Inference Fine-tune | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See README | | BAAI/bge-reranker-large | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-reranker-base | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-large-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-en-v1.5 | 
English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-en | English | Inference Fine-tune | :trophy: rank **1st** in MTEB leaderboard | | | BAAI/bge-base-en | English | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-en | English | Inference Fine-tune | a small-scale model but with competitive performance | | | BAAI/bge-large-zh | Chinese | Inference Fine-tune | :trophy: rank **1st** in C-MTEB benchmark | | | BAAI/bge-base-zh | Chinese | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-zh | Chinese | Inference Fine-tune | a small-scale model but with competitive performance | | [1\\]: If you need to search for passages relevant to a query, we suggest adding the instruction to the query; in other cases, no instruction is needed, and you can use the original query directly. In all cases, **no instruction** needs to be added to passages. [2\\]: Unlike an embedding model, a reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. To balance accuracy and time cost, a cross-encoder is widely used to re-rank the top-k documents retrieved by other, simpler models. For example, use the bge embedding model to retrieve the top 100 relevant documents, and then use the bge reranker to re-rank those 100 documents to get the final top-3 results. All models have been uploaded to Huggingface Hub, and you can see them at If you cannot open the Huggingface Hub, you can also download the models at . ## Frequently asked questions <details> <summary>1. 
How to fine-tune the bge embedding model?</summary> <!-- ### How to fine-tune bge embedding model? --> Follow this example to prepare data and fine-tune your model. Some suggestions: - Mine hard negatives following this example, which can improve retrieval performance. - If you pre-train bge on your data, the pre-trained model cannot be used directly to calculate similarity; it must first be fine-tuned with contrastive learning. - If the accuracy of the fine-tuned model is still not high enough, we recommend using or fine-tuning the cross-encoder model (bge-reranker) to re-rank the top-k results. Hard negatives are also needed to fine-tune the reranker. </details> <details> <summary>2. The similarity score between two dissimilar sentences is higher than 0.5</summary> <!-- ### The similarity score between two dissimilar sentences is higher than 0.5 --> **We suggest using bge v1.5, which alleviates the issue of the similarity distribution.** Since we fine-tune the models by contrastive learning with a temperature of 0.01, the similarity scores of the current BGE model fall roughly in the interval \\[0.6, 1\\]. So a similarity score greater than 0.5 does not indicate that the two sentences are similar. For downstream tasks, such as passage retrieval or semantic similarity, **what matters is the relative order of the scores, not the absolute value.** If you need to filter similar sentences based on a similarity threshold, please select an appropriate threshold based on the similarity distribution on your data (such as 0.8, 0.85, or even 0.9). </details> <details> <summary>3. When does the query instruction need to be used</summary> <!-- ### When does the query instruction need to be used --> For the , we improve its retrieval ability when not using instruction. Omitting the instruction causes only a slight degradation in retrieval performance compared with using it. So you can generate embeddings without instruction in all cases for convenience. 
For a retrieval task that uses short queries to find long related documents, it is recommended to add instructions to these short queries. **The best way to decide whether to add instructions to queries is to choose the setting that achieves better performance on your task.** In all cases, no instruction needs to be added to the documents/passages. </details> ## Usage ### Usage for Embedding Model Here are some examples of using models with FlagEmbedding, Sentence-Transformers, Langchain, or Huggingface Transformers. #### Using FlagEmbedding If it doesn't work for you, you can see FlagEmbedding for more methods to install FlagEmbedding. For the value of the argument , see Model List. By default, FlagModel will use all available GPUs when encoding. Please set to select specific GPUs. You can also set to make all GPUs unavailable. #### Using Sentence-Transformers You can also use the models with sentence-transformers: For an s2p (short query to long passage) retrieval task, each short query should start with an instruction (for instructions, see Model List). The instruction is not needed for passages. #### Using Langchain You can use in langchain like this: #### Using HuggingFace Transformers With the transformers package, you can use the model like this: First, you pass your input through the transformer model, then you select the last hidden state of the first token (i.e., [CLS]) as the sentence embedding. #### Usage of the ONNX files #### Usage via infinity It's also possible to deploy the ONNX files with the infinity_emb pip package. ### Usage for Reranker Unlike an embedding model, a reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. You can get a relevance score by inputting a query and a passage to the reranker. The reranker is optimized with cross-entropy loss, so the relevance score is not bounded to a specific range. 
#### Using FlagEmbedding Get relevance scores (higher scores indicate more relevance): #### Using Huggingface transformers ## Evaluation models achieve **state-of-the-art performance on both MTEB and C-MTEB leaderboard!** For more details and evaluation tools see our scripts. - **MTEB**: | Model Name | Dimension | Sequence Length | Average (56) | Retrieval (15) |Clustering (11) | Pair Classification (3) | Reranking (4) | STS (10) | Summarization (1) | Classification (12) | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | BAAI/bge-large-en-v1.5 | 1024 | 512 | **64.23** | **54.29** | 46.08 | 87.12 | 60.03 | 83.11 | 31.61 | 75.97 | | BAAI/bge-base-en-v1.5 | 768 | 512 | 63.55 | 53.25 | 45.77 | 86.55 | 58.86 | 82.4 | 31.07 | 75.53 | | BAAI/bge-small-en-v1.5 | 384 | 512 | 62.17 |51.68 | 43.82 | 84.92 | 58.36 | 81.59 | 30.12 | 74.14 | | bge-large-en | 1024 | 512 | 63.98 | 53.9 | 46.98 | 85.8 | 59.48 | 81.56 | 32.06 | 76.21 | | bge-base-en | 768 | 512 | 63.36 | 53.0 | 46.32 | 85.86 | 58.7 | 81.84 | 29.27 | 75.27 | | gte-large | 1024 | 512 | 63.13 | 52.22 | 46.84 | 85.00 | 59.13 | 83.35 | 31.66 | 73.33 | | gte-base | 768 | 512 | 62.39 | 51.14 | 46.2 | 84.57 | 58.61 | 82.3 | 31.17 | 73.01 | | e5-large-v2 | 1024| 512 | 62.25 | 50.56 | 44.49 | 86.03 | 56.61 | 82.05 | 30.19 | 75.24 | | bge-small-en | 384 | 512 | 62.11 | 51.82 | 44.31 | 83.78 | 57.97 | 80.72 | 30.53 | 74.37 | | instructor-xl | 768 | 512 | 61.79 | 49.26 | 44.74 | 86.62 | 57.29 | 83.06 | 32.32 | 61.79 | | e5-base-v2 | 768 | 512 | 61.5 | 50.29 | 43.80 | 85.73 | 55.91 | 81.05 | 30.28 | 73.84 | | gte-small | 384 | 512 | 61.36 | 49.46 | 44.89 | 83.54 | 57.7 | 82.07 | 30.42 | 72.31 | | text-embedding-ada-002 | 1536 | 8192 | 60.99 | 49.25 | 45.9 | 84.89 | 56.32 | 80.97 | 30.8 | 70.93 | | e5-small-v2 | 384 | 512 | 59.93 | 49.04 | 39.92 | 84.67 | 54.32 | 80.39 | 31.16 | 72.94 | | sentence-t5-xxl | 768 | 512 | 59.51 | 42.24 | 43.72 | 85.06 | 56.42 | 82.63 | 30.08 | 73.42 | | all-mpnet-base-v2 | 768 | 
514 | 57.78 | 43.81 | 43.69 | 83.04 | 59.36 | 80.28 | 27.49 | 65.07 | | sgpt-bloom-7b1-msmarco | 4096 | 2048 | 57.59 | 48.22 | 38.93 | 81.9 | 55.65 | 77.74 | 33.6 | 66.19 | - **C-MTEB**: We create the benchmark C-MTEB for Chinese text embedding which consists of 31 datasets from 6 tasks. Please refer to C_MTEB for a detailed introduction. | Model | Embedding dimension | Avg | Retrieval | STS | PairClassification | Classification | Reranking | Clustering | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | **BAAI/bge-large-zh-v1.5** | 1024 | **64.53** | 70.46 | 56.25 | 81.6 | 69.13 | 65.84 | 48.99 | | BAAI/bge-base-zh-v1.5 | 768 | 63.13 | 69.49 | 53.72 | 79.75 | 68.07 | 65.39 | 47.53 | | BAAI/bge-small-zh-v1.5 | 512 | 57.82 | 61.77 | 49.11 | 70.41 | 63.96 | 60.92 | 44.18 | | BAAI/bge-large-zh | 1024 | 64.20 | 71.53 | 54.98 | 78.94 | 68.32 | 65.11 | 48.39 | | bge-large-zh-noinstruct | 1024 | 63.53 | 70.55 | 53 | 76.77 | 68.58 | 64.91 | 50.01 | | BAAI/bge-base-zh | 768 | 62.96 | 69.53 | 54.12 | 77.5 | 67.07 | 64.91 | 47.63 | | multilingual-e5-large | 1024 | 58.79 | 63.66 | 48.44 | 69.89 | 67.34 | 56.00 | 48.23 | | BAAI/bge-small-zh | 512 | 58.27 | 63.07 | 49.45 | 70.35 | 63.64 | 61.48 | 45.09 | | m3e-base | 768 | 57.10 | 56.91 | 50.47 | 63.99 | 67.52 | 59.34 | 47.68 | | m3e-large | 1024 | 57.05 | 54.75 | 50.42 | 64.3 | 68.2 | 59.66 | 48.88 | | multilingual-e5-base | 768 | 55.48 | 61.63 | 46.49 | 67.07 | 65.35 | 54.35 | 40.68 | | multilingual-e5-small | 384 | 55.38 | 59.95 | 45.27 | 66.45 | 65.85 | 53.86 | 45.26 | | text-embedding-ada-002(OpenAI) | 1536 | 53.02 | 52.0 | 43.35 | 69.56 | 64.31 | 54.28 | 45.68 | | luotuo | 1024 | 49.37 | 44.4 | 42.78 | 66.62 | 61 | 49.25 | 44.39 | | text2vec-base | 768 | 47.63 | 38.79 | 43.41 | 67.41 | 62.19 | 49.45 | 37.66 | | text2vec-large | 1024 | 47.36 | 41.94 | 44.97 | 70.86 | 60.66 | 49.16 | 30.02 | - **Reranking**: See C_MTEB for evaluation script. 
| Model | T2Reranking | T2RerankingZh2En\\* | T2RerankingEn2Zh\\* | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | text2vec-base-multilingual | 64.66 | 62.94 | 62.51 | 14.37 | 48.46 | 48.6 | 50.26 | | multilingual-e5-small | 65.62 | 60.94 | 56.41 | 29.91 | 67.26 | 66.54 | 57.78 | | multilingual-e5-large | 64.55 | 61.61 | 54.28 | 28.6 | 67.42 | 67.92 | 57.4 | | multilingual-e5-base | 64.21 | 62.13 | 54.68 | 29.5 | 66.23 | 66.98 | 57.29 | | m3e-base | 66.03 | 62.74 | 56.07 | 17.51 | 77.05 | 76.76 | 59.36 | | m3e-large | 66.13 | 62.72 | 56.1 | 16.46 | 77.76 | 78.27 | 59.57 | | bge-base-zh-v1.5 | 66.49 | 63.25 | 57.02 | 29.74 | 80.47 | 84.88 | 63.64 | | bge-large-zh-v1.5 | 65.74 | 63.39 | 57.03 | 28.74 | 83.45 | 85.44 | 63.97 | | BAAI/bge-reranker-base | 67.28 | 63.95 | 60.45 | 35.46 | 81.26 | 84.1 | 65.42 | | BAAI/bge-reranker-large | 67.6 | 64.03 | 61.44 | 37.16 | 82.15 | 84.18 | 66.09 | \\* : T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks ## Train ### BAAI Embedding We pre-train the models using retromae and train them on large-scale pair data using contrastive learning. **You can fine-tune the embedding model on your data following our examples.** We also provide a pre-train example. Note that the goal of pre-training is to reconstruct the text, so the pre-trained model cannot be used for similarity calculation directly; it needs to be fine-tuned first. For more training details for bge, see baai_general_embedding. ### BGE Reranker A cross-encoder performs full attention over the input pair, which is more accurate than the embedding model (i.e., bi-encoder) but also more time-consuming. Therefore, it can be used to re-rank the top-k documents returned by the embedding model. 
We train the cross-encoder on multilingual pair data. The data format is the same as for the embedding model, so you can fine-tune it easily by following our example. For more details, please refer to ./FlagEmbedding/reranker/README.md ## Contact If you have any questions or suggestions related to this project, feel free to open an issue or pull request. You can also email Shitao Xiao([email protected]) and Zheng Liu([email protected]). ## Citation If you find this repository useful, please consider giving it a star :star: and a citation ## License FlagEmbedding is licensed under the MIT License. The released models can be used for commercial purposes free of charge.",
"model_explanation_gemini": "Generates sentence embeddings for tasks like text classification, retrieval, clustering, and similarity measurement."
}
data/model_data_json/BAAI_bge-base-en.json
ADDED
@@ -0,0 +1,23 @@
{
"model_id": "BAAI/bge-base-en",
"downloads": 180226,
"tags": [
"transformers",
"pytorch",
"onnx",
"safetensors",
"bert",
"feature-extraction",
"mteb",
"en",
"arxiv:2310.07554",
"arxiv:2309.07597",
"license:mit",
"model-index",
"text-embeddings-inference",
"endpoints_compatible",
"region:us"
],
"description": "--- tags: - mteb model-index: - name: bge-base-en results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 75.73134328358209 - type: ap value: 38.97277232632892 - type: f1 value: 69.81740361139785 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 92.56522500000001 - type: ap value: 88.88821771869553 - type: f1 value: 92.54817512659696 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 46.91 - type: f1 value: 46.28536394320311 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 38.834 - type: map_at_10 value: 53.564 - type: map_at_100 value: 54.230000000000004 - type: map_at_1000 value: 54.235 - type: map_at_3 value: 49.49 - type: map_at_5 value: 51.784 - type: mrr_at_1 value: 39.26 - type: mrr_at_10 value: 53.744 - type: mrr_at_100 value: 54.410000000000004 - type: mrr_at_1000 value: 54.415 - type: mrr_at_3 value: 49.656 - type: mrr_at_5 value: 52.018 - type: ndcg_at_1 value: 38.834 - type: ndcg_at_10 value: 61.487 - type: ndcg_at_100 value: 64.303 - type: ndcg_at_1000 value: 64.408 - type: ndcg_at_3 value: 53.116 - type: ndcg_at_5 value: 57.248 - type: precision_at_1 value: 38.834 - type: precision_at_10 value: 8.663 - type: precision_at_100 value: 0.989 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 21.218999999999998 - type: precision_at_5 value: 14.737 - type: recall_at_1 value: 38.834 - type: recall_at_10 value: 86.629 - type: 
recall_at_100 value: 98.86200000000001 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 63.656 - type: recall_at_5 value: 73.68400000000001 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 48.88475477433035 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 42.85053138403176 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 62.23221013208242 - type: mrr value: 74.64857318735436 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 87.4403443247284 - type: cos_sim_spearman value: 85.5326718115169 - type: euclidean_pearson value: 86.0114007449595 - type: euclidean_spearman value: 86.05979225604875 - type: manhattan_pearson value: 86.05423806568598 - type: manhattan_spearman value: 86.02485170086835 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 86.44480519480518 - type: f1 value: 86.41301900941988 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 40.17547250880036 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 
258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 37.74514172687293 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 32.096000000000004 - type: map_at_10 value: 43.345 - type: map_at_100 value: 44.73 - type: map_at_1000 value: 44.85 - type: map_at_3 value: 39.956 - type: map_at_5 value: 41.727 - type: mrr_at_1 value: 38.769999999999996 - type: mrr_at_10 value: 48.742000000000004 - type: mrr_at_100 value: 49.474000000000004 - type: mrr_at_1000 value: 49.513 - type: mrr_at_3 value: 46.161 - type: mrr_at_5 value: 47.721000000000004 - type: ndcg_at_1 value: 38.769999999999996 - type: ndcg_at_10 value: 49.464999999999996 - type: ndcg_at_100 value: 54.632000000000005 - type: ndcg_at_1000 value: 56.52 - type: ndcg_at_3 value: 44.687 - type: ndcg_at_5 value: 46.814 - type: precision_at_1 value: 38.769999999999996 - type: precision_at_10 value: 9.471 - type: precision_at_100 value: 1.4909999999999999 - type: precision_at_1000 value: 0.194 - type: precision_at_3 value: 21.268 - type: precision_at_5 value: 15.079 - type: recall_at_1 value: 32.096000000000004 - type: recall_at_10 value: 60.99099999999999 - type: recall_at_100 value: 83.075 - type: recall_at_1000 value: 95.178 - type: recall_at_3 value: 47.009 - type: recall_at_5 value: 53.348 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 32.588 - type: map_at_10 value: 42.251 - type: map_at_100 value: 43.478 - type: map_at_1000 value: 43.617 - type: map_at_3 value: 39.381 - type: map_at_5 value: 41.141 - type: mrr_at_1 value: 41.21 - type: mrr_at_10 value: 48.765 - type: mrr_at_100 value: 49.403000000000006 - type: mrr_at_1000 value: 49.451 - type: mrr_at_3 value: 46.73 - type: mrr_at_5 value: 47.965999999999994 - type: ndcg_at_1 value: 41.21 - type: 
ndcg_at_10 value: 47.704 - type: ndcg_at_100 value: 51.916 - type: ndcg_at_1000 value: 54.013999999999996 - type: ndcg_at_3 value: 44.007000000000005 - type: ndcg_at_5 value: 45.936 - type: precision_at_1 value: 41.21 - type: precision_at_10 value: 8.885 - type: precision_at_100 value: 1.409 - type: precision_at_1000 value: 0.189 - type: precision_at_3 value: 21.274 - type: precision_at_5 value: 15.045 - type: recall_at_1 value: 32.588 - type: recall_at_10 value: 56.333 - type: recall_at_100 value: 74.251 - type: recall_at_1000 value: 87.518 - type: recall_at_3 value: 44.962 - type: recall_at_5 value: 50.609 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 40.308 - type: map_at_10 value: 53.12 - type: map_at_100 value: 54.123 - type: map_at_1000 value: 54.173 - type: map_at_3 value: 50.017999999999994 - type: map_at_5 value: 51.902 - type: mrr_at_1 value: 46.394999999999996 - type: mrr_at_10 value: 56.531 - type: mrr_at_100 value: 57.19800000000001 - type: mrr_at_1000 value: 57.225 - type: mrr_at_3 value: 54.368 - type: mrr_at_5 value: 55.713 - type: ndcg_at_1 value: 46.394999999999996 - type: ndcg_at_10 value: 58.811 - type: ndcg_at_100 value: 62.834 - type: ndcg_at_1000 value: 63.849999999999994 - type: ndcg_at_3 value: 53.88699999999999 - type: ndcg_at_5 value: 56.477999999999994 - type: precision_at_1 value: 46.394999999999996 - type: precision_at_10 value: 9.398 - type: precision_at_100 value: 1.2309999999999999 - type: precision_at_1000 value: 0.136 - type: precision_at_3 value: 24.221999999999998 - type: precision_at_5 value: 16.539 - type: recall_at_1 value: 40.308 - type: recall_at_10 value: 72.146 - type: recall_at_100 value: 89.60900000000001 - type: recall_at_1000 value: 96.733 - type: recall_at_3 value: 58.91499999999999 - type: recall_at_5 value: 65.34299999999999 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB 
CQADupstackGisRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.383000000000003 - type: map_at_10 value: 35.802 - type: map_at_100 value: 36.756 - type: map_at_1000 value: 36.826 - type: map_at_3 value: 32.923 - type: map_at_5 value: 34.577999999999996 - type: mrr_at_1 value: 29.604999999999997 - type: mrr_at_10 value: 37.918 - type: mrr_at_100 value: 38.732 - type: mrr_at_1000 value: 38.786 - type: mrr_at_3 value: 35.198 - type: mrr_at_5 value: 36.808 - type: ndcg_at_1 value: 29.604999999999997 - type: ndcg_at_10 value: 40.836 - type: ndcg_at_100 value: 45.622 - type: ndcg_at_1000 value: 47.427 - type: ndcg_at_3 value: 35.208 - type: ndcg_at_5 value: 38.066 - type: precision_at_1 value: 29.604999999999997 - type: precision_at_10 value: 6.226 - type: precision_at_100 value: 0.9079999999999999 - type: precision_at_1000 value: 0.11 - type: precision_at_3 value: 14.463000000000001 - type: precision_at_5 value: 10.35 - type: recall_at_1 value: 27.383000000000003 - type: recall_at_10 value: 54.434000000000005 - type: recall_at_100 value: 76.632 - type: recall_at_1000 value: 90.25 - type: recall_at_3 value: 39.275 - type: recall_at_5 value: 46.225 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 17.885 - type: map_at_10 value: 25.724000000000004 - type: map_at_100 value: 26.992 - type: map_at_1000 value: 27.107999999999997 - type: map_at_3 value: 23.04 - type: map_at_5 value: 24.529 - type: mrr_at_1 value: 22.264 - type: mrr_at_10 value: 30.548 - type: mrr_at_100 value: 31.593 - type: mrr_at_1000 value: 31.657999999999998 - type: mrr_at_3 value: 27.756999999999998 - type: mrr_at_5 value: 29.398999999999997 - type: ndcg_at_1 value: 22.264 - type: ndcg_at_10 value: 30.902 - type: ndcg_at_100 value: 36.918 - type: ndcg_at_1000 value: 39.735 - type: ndcg_at_3 value: 25.915 - type: ndcg_at_5 value: 
28.255999999999997 - type: precision_at_1 value: 22.264 - type: precision_at_10 value: 5.634 - type: precision_at_100 value: 0.9939999999999999 - type: precision_at_1000 value: 0.13699999999999998 - type: precision_at_3 value: 12.396 - type: precision_at_5 value: 9.055 - type: recall_at_1 value: 17.885 - type: recall_at_10 value: 42.237 - type: recall_at_100 value: 68.489 - type: recall_at_1000 value: 88.721 - type: recall_at_3 value: 28.283 - type: recall_at_5 value: 34.300000000000004 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 29.737000000000002 - type: map_at_10 value: 39.757 - type: map_at_100 value: 40.992 - type: map_at_1000 value: 41.102 - type: map_at_3 value: 36.612 - type: map_at_5 value: 38.413000000000004 - type: mrr_at_1 value: 35.804 - type: mrr_at_10 value: 45.178000000000004 - type: mrr_at_100 value: 45.975 - type: mrr_at_1000 value: 46.021 - type: mrr_at_3 value: 42.541000000000004 - type: mrr_at_5 value: 44.167 - type: ndcg_at_1 value: 35.804 - type: ndcg_at_10 value: 45.608 - type: ndcg_at_100 value: 50.746 - type: ndcg_at_1000 value: 52.839999999999996 - type: ndcg_at_3 value: 40.52 - type: ndcg_at_5 value: 43.051 - type: precision_at_1 value: 35.804 - type: precision_at_10 value: 8.104 - type: precision_at_100 value: 1.256 - type: precision_at_1000 value: 0.161 - type: precision_at_3 value: 19.121 - type: precision_at_5 value: 13.532 - type: recall_at_1 value: 29.737000000000002 - type: recall_at_10 value: 57.66 - type: recall_at_100 value: 79.121 - type: recall_at_1000 value: 93.023 - type: recall_at_3 value: 43.13 - type: recall_at_5 value: 49.836000000000006 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.299 - type: map_at_10 value: 35.617 - type: map_at_100 value: 36.972 - type: 
map_at_1000 value: 37.096000000000004 - type: map_at_3 value: 32.653999999999996 - type: map_at_5 value: 34.363 - type: mrr_at_1 value: 32.877 - type: mrr_at_10 value: 41.423 - type: mrr_at_100 value: 42.333999999999996 - type: mrr_at_1000 value: 42.398 - type: mrr_at_3 value: 39.193 - type: mrr_at_5 value: 40.426 - type: ndcg_at_1 value: 32.877 - type: ndcg_at_10 value: 41.271 - type: ndcg_at_100 value: 46.843 - type: ndcg_at_1000 value: 49.366 - type: ndcg_at_3 value: 36.735 - type: ndcg_at_5 value: 38.775999999999996 - type: precision_at_1 value: 32.877 - type: precision_at_10 value: 7.580000000000001 - type: precision_at_100 value: 1.192 - type: precision_at_1000 value: 0.158 - type: precision_at_3 value: 17.541999999999998 - type: precision_at_5 value: 12.443 - type: recall_at_1 value: 26.299 - type: recall_at_10 value: 52.256 - type: recall_at_100 value: 75.919 - type: recall_at_1000 value: 93.185 - type: recall_at_3 value: 39.271 - type: recall_at_5 value: 44.901 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.05741666666667 - type: map_at_10 value: 36.086416666666665 - type: map_at_100 value: 37.26916666666667 - type: map_at_1000 value: 37.38191666666666 - type: map_at_3 value: 33.34225 - type: map_at_5 value: 34.86425 - type: mrr_at_1 value: 32.06008333333333 - type: mrr_at_10 value: 40.36658333333333 - type: mrr_at_100 value: 41.206500000000005 - type: mrr_at_1000 value: 41.261083333333325 - type: mrr_at_3 value: 38.01208333333334 - type: mrr_at_5 value: 39.36858333333333 - type: ndcg_at_1 value: 32.06008333333333 - type: ndcg_at_10 value: 41.3535 - type: ndcg_at_100 value: 46.42066666666666 - type: ndcg_at_1000 value: 48.655166666666666 - type: ndcg_at_3 value: 36.78041666666667 - type: ndcg_at_5 value: 38.91783333333334 - type: precision_at_1 value: 32.06008333333333 - type: precision_at_10 value: 7.169833333333332 - type: 
precision_at_100 value: 1.1395 - type: precision_at_1000 value: 0.15158333333333332 - type: precision_at_3 value: 16.852 - type: precision_at_5 value: 11.8645 - type: recall_at_1 value: 27.05741666666667 - type: recall_at_10 value: 52.64491666666666 - type: recall_at_100 value: 74.99791666666667 - type: recall_at_1000 value: 90.50524999999999 - type: recall_at_3 value: 39.684000000000005 - type: recall_at_5 value: 45.37225 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 25.607999999999997 - type: map_at_10 value: 32.28 - type: map_at_100 value: 33.261 - type: map_at_1000 value: 33.346 - type: map_at_3 value: 30.514999999999997 - type: map_at_5 value: 31.415 - type: mrr_at_1 value: 28.988000000000003 - type: mrr_at_10 value: 35.384 - type: mrr_at_100 value: 36.24 - type: mrr_at_1000 value: 36.299 - type: mrr_at_3 value: 33.717000000000006 - type: mrr_at_5 value: 34.507 - type: ndcg_at_1 value: 28.988000000000003 - type: ndcg_at_10 value: 36.248000000000005 - type: ndcg_at_100 value: 41.034 - type: ndcg_at_1000 value: 43.35 - type: ndcg_at_3 value: 32.987 - type: ndcg_at_5 value: 34.333999999999996 - type: precision_at_1 value: 28.988000000000003 - type: precision_at_10 value: 5.506 - type: precision_at_100 value: 0.853 - type: precision_at_1000 value: 0.11199999999999999 - type: precision_at_3 value: 14.11 - type: precision_at_5 value: 9.417 - type: recall_at_1 value: 25.607999999999997 - type: recall_at_10 value: 45.344 - type: recall_at_100 value: 67.132 - type: recall_at_1000 value: 84.676 - type: recall_at_3 value: 36.02 - type: recall_at_5 value: 39.613 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 18.44 - type: map_at_10 value: 25.651000000000003 - type: map_at_100 value: 26.735 - type: map_at_1000 value: 26.86 - type: 
map_at_3 value: 23.409 - type: map_at_5 value: 24.604 - type: mrr_at_1 value: 22.195 - type: mrr_at_10 value: 29.482000000000003 - type: mrr_at_100 value: 30.395 - type: mrr_at_1000 value: 30.471999999999998 - type: mrr_at_3 value: 27.409 - type: mrr_at_5 value: 28.553 - type: ndcg_at_1 value: 22.195 - type: ndcg_at_10 value: 30.242 - type: ndcg_at_100 value: 35.397 - type: ndcg_at_1000 value: 38.287 - type: ndcg_at_3 value: 26.201 - type: ndcg_at_5 value: 28.008 - type: precision_at_1 value: 22.195 - type: precision_at_10 value: 5.372 - type: precision_at_100 value: 0.9259999999999999 - type: precision_at_1000 value: 0.135 - type: precision_at_3 value: 12.228 - type: precision_at_5 value: 8.727 - type: recall_at_1 value: 18.44 - type: recall_at_10 value: 40.325 - type: recall_at_100 value: 63.504000000000005 - type: recall_at_1000 value: 83.909 - type: recall_at_3 value: 28.925 - type: recall_at_5 value: 33.641 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.535999999999998 - type: map_at_10 value: 35.358000000000004 - type: map_at_100 value: 36.498999999999995 - type: map_at_1000 value: 36.597 - type: map_at_3 value: 32.598 - type: map_at_5 value: 34.185 - type: mrr_at_1 value: 31.25 - type: mrr_at_10 value: 39.593 - type: mrr_at_100 value: 40.443 - type: mrr_at_1000 value: 40.498 - type: mrr_at_3 value: 37.018 - type: mrr_at_5 value: 38.492 - type: ndcg_at_1 value: 31.25 - type: ndcg_at_10 value: 40.71 - type: ndcg_at_100 value: 46.079 - type: ndcg_at_1000 value: 48.287 - type: ndcg_at_3 value: 35.667 - type: ndcg_at_5 value: 38.080000000000005 - type: precision_at_1 value: 31.25 - type: precision_at_10 value: 6.847 - type: precision_at_100 value: 1.079 - type: precision_at_1000 value: 0.13699999999999998 - type: precision_at_3 value: 16.262 - type: precision_at_5 value: 11.455 - type: recall_at_1 value: 26.535999999999998 - type: 
recall_at_10 value: 52.92099999999999 - type: recall_at_100 value: 76.669 - type: recall_at_1000 value: 92.096 - type: recall_at_3 value: 38.956 - type: recall_at_5 value: 45.239000000000004 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.691 - type: map_at_10 value: 33.417 - type: map_at_100 value: 35.036 - type: map_at_1000 value: 35.251 - type: map_at_3 value: 30.646 - type: map_at_5 value: 32.177 - type: mrr_at_1 value: 30.04 - type: mrr_at_10 value: 37.905 - type: mrr_at_100 value: 38.929 - type: mrr_at_1000 value: 38.983000000000004 - type: mrr_at_3 value: 35.276999999999994 - type: mrr_at_5 value: 36.897000000000006 - type: ndcg_at_1 value: 30.04 - type: ndcg_at_10 value: 39.037 - type: ndcg_at_100 value: 44.944 - type: ndcg_at_1000 value: 47.644 - type: ndcg_at_3 value: 34.833999999999996 - type: ndcg_at_5 value: 36.83 - type: precision_at_1 value: 30.04 - type: precision_at_10 value: 7.4510000000000005 - type: precision_at_100 value: 1.492 - type: precision_at_1000 value: 0.234 - type: precision_at_3 value: 16.337 - type: precision_at_5 value: 11.897 - type: recall_at_1 value: 24.691 - type: recall_at_10 value: 49.303999999999995 - type: recall_at_100 value: 76.20400000000001 - type: recall_at_1000 value: 93.30000000000001 - type: recall_at_3 value: 36.594 - type: recall_at_5 value: 42.41 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.118 - type: map_at_10 value: 30.714999999999996 - type: map_at_100 value: 31.656000000000002 - type: map_at_1000 value: 31.757 - type: map_at_3 value: 28.355000000000004 - type: map_at_5 value: 29.337000000000003 - type: mrr_at_1 value: 25.323 - type: mrr_at_10 value: 32.93 - type: mrr_at_100 value: 33.762 - type: mrr_at_1000 value: 33.829 - type: mrr_at_3 value: 
30.775999999999996 - type: mrr_at_5 value: 31.774 - type: ndcg_at_1 value: 25.323 - type: ndcg_at_10 value: 35.408 - type: ndcg_at_100 value: 40.083 - type: ndcg_at_1000 value: 42.542 - type: ndcg_at_3 value: 30.717 - type: ndcg_at_5 value: 32.385000000000005 - type: precision_at_1 value: 25.323 - type: precision_at_10 value: 5.564 - type: precision_at_100 value: 0.843 - type: precision_at_1000 value: 0.116 - type: precision_at_3 value: 13.001 - type: precision_at_5 value: 8.834999999999999 - type: recall_at_1 value: 23.118 - type: recall_at_10 value: 47.788000000000004 - type: recall_at_100 value: 69.37 - type: recall_at_1000 value: 87.47399999999999 - type: recall_at_3 value: 34.868 - type: recall_at_5 value: 39.001999999999995 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 14.288 - type: map_at_10 value: 23.256 - type: map_at_100 value: 25.115 - type: map_at_1000 value: 25.319000000000003 - type: map_at_3 value: 20.005 - type: map_at_5 value: 21.529999999999998 - type: mrr_at_1 value: 31.401 - type: mrr_at_10 value: 42.251 - type: mrr_at_100 value: 43.236999999999995 - type: mrr_at_1000 value: 43.272 - type: mrr_at_3 value: 39.164 - type: mrr_at_5 value: 40.881 - type: ndcg_at_1 value: 31.401 - type: ndcg_at_10 value: 31.615 - type: ndcg_at_100 value: 38.982 - type: ndcg_at_1000 value: 42.496 - type: ndcg_at_3 value: 26.608999999999998 - type: ndcg_at_5 value: 28.048000000000002 - type: precision_at_1 value: 31.401 - type: precision_at_10 value: 9.536999999999999 - type: precision_at_100 value: 1.763 - type: precision_at_1000 value: 0.241 - type: precision_at_3 value: 19.153000000000002 - type: precision_at_5 value: 14.228 - type: recall_at_1 value: 14.288 - type: recall_at_10 value: 36.717 - type: recall_at_100 value: 61.9 - type: recall_at_1000 value: 81.676 - type: recall_at_3 value: 24.203 - type: recall_at_5 value: 28.793999999999997 - task: type: 
Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 9.019 - type: map_at_10 value: 19.963 - type: map_at_100 value: 28.834 - type: map_at_1000 value: 30.537999999999997 - type: map_at_3 value: 14.45 - type: map_at_5 value: 16.817999999999998 - type: mrr_at_1 value: 65.75 - type: mrr_at_10 value: 74.646 - type: mrr_at_100 value: 74.946 - type: mrr_at_1000 value: 74.95100000000001 - type: mrr_at_3 value: 72.625 - type: mrr_at_5 value: 74.012 - type: ndcg_at_1 value: 54 - type: ndcg_at_10 value: 42.014 - type: ndcg_at_100 value: 47.527 - type: ndcg_at_1000 value: 54.911 - type: ndcg_at_3 value: 46.586 - type: ndcg_at_5 value: 43.836999999999996 - type: precision_at_1 value: 65.75 - type: precision_at_10 value: 33.475 - type: precision_at_100 value: 11.16 - type: precision_at_1000 value: 2.145 - type: precision_at_3 value: 50.083 - type: precision_at_5 value: 42.55 - type: recall_at_1 value: 9.019 - type: recall_at_10 value: 25.558999999999997 - type: recall_at_100 value: 53.937999999999995 - type: recall_at_1000 value: 77.67399999999999 - type: recall_at_3 value: 15.456 - type: recall_at_5 value: 19.259 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 52.635 - type: f1 value: 47.692783881403926 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 76.893 - type: map_at_10 value: 84.897 - type: map_at_100 value: 85.122 - type: map_at_1000 value: 85.135 - type: map_at_3 value: 83.88 - type: map_at_5 value: 84.565 - type: mrr_at_1 value: 83.003 - type: mrr_at_10 value: 89.506 - type: mrr_at_100 value: 89.574 - type: mrr_at_1000 value: 89.575 - type: mrr_at_3 value: 88.991 - type: mrr_at_5 value: 89.349 - type: ndcg_at_1 value: 83.003 - type: ndcg_at_10 
value: 88.351 - type: ndcg_at_100 value: 89.128 - type: ndcg_at_1000 value: 89.34100000000001 - type: ndcg_at_3 value: 86.92 - type: ndcg_at_5 value: 87.78200000000001 - type: precision_at_1 value: 83.003 - type: precision_at_10 value: 10.517999999999999 - type: precision_at_100 value: 1.115 - type: precision_at_1000 value: 0.11499999999999999 - type: precision_at_3 value: 33.062999999999995 - type: precision_at_5 value: 20.498 - type: recall_at_1 value: 76.893 - type: recall_at_10 value: 94.374 - type: recall_at_100 value: 97.409 - type: recall_at_1000 value: 98.687 - type: recall_at_3 value: 90.513 - type: recall_at_5 value: 92.709 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 20.829 - type: map_at_10 value: 32.86 - type: map_at_100 value: 34.838 - type: map_at_1000 value: 35.006 - type: map_at_3 value: 28.597 - type: map_at_5 value: 31.056 - type: mrr_at_1 value: 41.358 - type: mrr_at_10 value: 49.542 - type: mrr_at_100 value: 50.29900000000001 - type: mrr_at_1000 value: 50.334999999999994 - type: mrr_at_3 value: 46.579 - type: mrr_at_5 value: 48.408 - type: ndcg_at_1 value: 41.358 - type: ndcg_at_10 value: 40.758 - type: ndcg_at_100 value: 47.799 - type: ndcg_at_1000 value: 50.589 - type: ndcg_at_3 value: 36.695 - type: ndcg_at_5 value: 38.193 - type: precision_at_1 value: 41.358 - type: precision_at_10 value: 11.142000000000001 - type: precision_at_100 value: 1.8350000000000002 - type: precision_at_1000 value: 0.234 - type: precision_at_3 value: 24.023 - type: precision_at_5 value: 17.963 - type: recall_at_1 value: 20.829 - type: recall_at_10 value: 47.467999999999996 - type: recall_at_100 value: 73.593 - type: recall_at_1000 value: 90.122 - type: recall_at_3 value: 32.74 - type: recall_at_5 value: 39.608 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 40.324 - type: 
map_at_10 value: 64.183 - type: map_at_100 value: 65.037 - type: map_at_1000 value: 65.094 - type: map_at_3 value: 60.663 - type: map_at_5 value: 62.951 - type: mrr_at_1 value: 80.648 - type: mrr_at_10 value: 86.005 - type: mrr_at_100 value: 86.157 - type: mrr_at_1000 value: 86.162 - type: mrr_at_3 value: 85.116 - type: mrr_at_5 value: 85.703 - type: ndcg_at_1 value: 80.648 - type: ndcg_at_10 value: 72.351 - type: ndcg_at_100 value: 75.279 - type: ndcg_at_1000 value: 76.357 - type: ndcg_at_3 value: 67.484 - type: ndcg_at_5 value: 70.31500000000001 - type: precision_at_1 value: 80.648 - type: precision_at_10 value: 15.103 - type: precision_at_100 value: 1.7399999999999998 - type: precision_at_1000 value: 0.188 - type: precision_at_3 value: 43.232 - type: precision_at_5 value: 28.165000000000003 - type: recall_at_1 value: 40.324 - type: recall_at_10 value: 75.517 - type: recall_at_100 value: 86.982 - type: recall_at_1000 value: 94.072 - type: recall_at_3 value: 64.848 - type: recall_at_5 value: 70.41199999999999 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 91.4 - type: ap value: 87.4422032289312 - type: f1 value: 91.39249564302281 - task: type: Retrieval dataset: type: msmarco name: MTEB MSMARCO config: default split: dev revision: None metrics: - type: map_at_1 value: 22.03 - type: map_at_10 value: 34.402 - type: map_at_100 value: 35.599 - type: map_at_1000 value: 35.648 - type: map_at_3 value: 30.603 - type: map_at_5 value: 32.889 - type: mrr_at_1 value: 22.679 - type: mrr_at_10 value: 35.021 - type: mrr_at_100 value: 36.162 - type: mrr_at_1000 value: 36.205 - type: mrr_at_3 value: 31.319999999999997 - type: mrr_at_5 value: 33.562 - type: ndcg_at_1 value: 22.692999999999998 - type: ndcg_at_10 value: 41.258 - type: ndcg_at_100 value: 46.967 - type: ndcg_at_1000 value: 48.175000000000004 - type: ndcg_at_3 value: 
33.611000000000004 - type: ndcg_at_5 value: 37.675 - type: precision_at_1 value: 22.692999999999998 - type: precision_at_10 value: 6.5089999999999995 - type: precision_at_100 value: 0.936 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 14.413 - type: precision_at_5 value: 10.702 - type: recall_at_1 value: 22.03 - type: recall_at_10 value: 62.248000000000005 - type: recall_at_100 value: 88.524 - type: recall_at_1000 value: 97.714 - type: recall_at_3 value: 41.617 - type: recall_at_5 value: 51.359 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 94.36844505243957 - type: f1 value: 94.12408743818202 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 76.43410852713177 - type: f1 value: 58.501855709435624 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 76.04909213180902 - type: f1 value: 74.1800860395823 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 79.76126429051781 - type: f1 value: 79.85705217473232 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 34.70119520292863 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 
35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 32.33544316467486 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 30.75499243990726 - type: mrr value: 31.70602251821063 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 6.451999999999999 - type: map_at_10 value: 13.918 - type: map_at_100 value: 17.316000000000003 - type: map_at_1000 value: 18.747 - type: map_at_3 value: 10.471 - type: map_at_5 value: 12.104 - type: mrr_at_1 value: 46.749 - type: mrr_at_10 value: 55.717000000000006 - type: mrr_at_100 value: 56.249 - type: mrr_at_1000 value: 56.288000000000004 - type: mrr_at_3 value: 53.818 - type: mrr_at_5 value: 55.103 - type: ndcg_at_1 value: 45.201 - type: ndcg_at_10 value: 35.539 - type: ndcg_at_100 value: 32.586 - type: ndcg_at_1000 value: 41.486000000000004 - type: ndcg_at_3 value: 41.174 - type: ndcg_at_5 value: 38.939 - type: precision_at_1 value: 46.749 - type: precision_at_10 value: 25.944 - type: precision_at_100 value: 8.084 - type: precision_at_1000 value: 2.076 - type: precision_at_3 value: 38.7 - type: precision_at_5 value: 33.56 - type: recall_at_1 value: 6.451999999999999 - type: recall_at_10 value: 17.302 - type: recall_at_100 value: 32.14 - type: recall_at_1000 value: 64.12 - type: recall_at_3 value: 11.219 - type: recall_at_5 value: 13.993 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 32.037 - type: map_at_10 value: 46.565 - type: map_at_100 value: 47.606 - type: map_at_1000 value: 47.636 - type: map_at_3 value: 42.459 - type: map_at_5 value: 44.762 - type: mrr_at_1 value: 36.181999999999995 - type: mrr_at_10 value: 49.291000000000004 - type: mrr_at_100 value: 50.059 - type: mrr_at_1000 value: 50.078 - 
type: mrr_at_3 value: 45.829 - type: mrr_at_5 value: 47.797 - type: ndcg_at_1 value: 36.153 - type: ndcg_at_10 value: 53.983000000000004 - type: ndcg_at_100 value: 58.347 - type: ndcg_at_1000 value: 59.058 - type: ndcg_at_3 value: 46.198 - type: ndcg_at_5 value: 50.022 - type: precision_at_1 value: 36.153 - type: precision_at_10 value: 8.763 - type: precision_at_100 value: 1.123 - type: precision_at_1000 value: 0.11900000000000001 - type: precision_at_3 value: 20.751 - type: precision_at_5 value: 14.646999999999998 - type: recall_at_1 value: 32.037 - type: recall_at_10 value: 74.008 - type: recall_at_100 value: 92.893 - type: recall_at_1000 value: 98.16 - type: recall_at_3 value: 53.705999999999996 - type: recall_at_5 value: 62.495 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 71.152 - type: map_at_10 value: 85.104 - type: map_at_100 value: 85.745 - type: map_at_1000 value: 85.761 - type: map_at_3 value: 82.175 - type: map_at_5 value: 84.066 - type: mrr_at_1 value: 82.03 - type: mrr_at_10 value: 88.115 - type: mrr_at_100 value: 88.21 - type: mrr_at_1000 value: 88.211 - type: mrr_at_3 value: 87.19200000000001 - type: mrr_at_5 value: 87.85 - type: ndcg_at_1 value: 82.03 - type: ndcg_at_10 value: 88.78 - type: ndcg_at_100 value: 89.96300000000001 - type: ndcg_at_1000 value: 90.056 - type: ndcg_at_3 value: 86.051 - type: ndcg_at_5 value: 87.63499999999999 - type: precision_at_1 value: 82.03 - type: precision_at_10 value: 13.450000000000001 - type: precision_at_100 value: 1.5310000000000001 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 37.627 - type: precision_at_5 value: 24.784 - type: recall_at_1 value: 71.152 - type: recall_at_10 value: 95.649 - type: recall_at_100 value: 99.58200000000001 - type: recall_at_1000 value: 99.981 - type: recall_at_3 value: 87.767 - type: recall_at_5 value: 92.233 - task: type: Clustering dataset: type: 
mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 56.48713646277477 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 63.394940772438545 - task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 5.043 - type: map_at_10 value: 12.949 - type: map_at_100 value: 15.146 - type: map_at_1000 value: 15.495000000000001 - type: map_at_3 value: 9.333 - type: map_at_5 value: 11.312999999999999 - type: mrr_at_1 value: 24.9 - type: mrr_at_10 value: 35.958 - type: mrr_at_100 value: 37.152 - type: mrr_at_1000 value: 37.201 - type: mrr_at_3 value: 32.667 - type: mrr_at_5 value: 34.567 - type: ndcg_at_1 value: 24.9 - type: ndcg_at_10 value: 21.298000000000002 - type: ndcg_at_100 value: 29.849999999999998 - type: ndcg_at_1000 value: 35.506 - type: ndcg_at_3 value: 20.548 - type: ndcg_at_5 value: 18.064 - type: precision_at_1 value: 24.9 - type: precision_at_10 value: 10.9 - type: precision_at_100 value: 2.331 - type: precision_at_1000 value: 0.367 - type: precision_at_3 value: 19.267 - type: precision_at_5 value: 15.939999999999998 - type: recall_at_1 value: 5.043 - type: recall_at_10 value: 22.092 - type: recall_at_100 value: 47.323 - type: recall_at_1000 value: 74.553 - type: recall_at_3 value: 11.728 - type: recall_at_5 value: 16.188 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 83.7007085938325 - type: cos_sim_spearman value: 80.0171084446234 - type: euclidean_pearson value: 81.28133218355893 - type: euclidean_spearman value: 79.99291731740131 - type: manhattan_pearson value: 
81.22926922327846 - type: manhattan_spearman value: 79.94444878127038 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 85.7411883252923 - type: cos_sim_spearman value: 77.93462937801245 - type: euclidean_pearson value: 83.00858563882404 - type: euclidean_spearman value: 77.82717362433257 - type: manhattan_pearson value: 82.92887645790769 - type: manhattan_spearman value: 77.78807488222115 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 82.04222459361023 - type: cos_sim_spearman value: 83.85931509330395 - type: euclidean_pearson value: 83.26916063876055 - type: euclidean_spearman value: 83.98621985648353 - type: manhattan_pearson value: 83.14935679184327 - type: manhattan_spearman value: 83.87938828586304 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 81.41136639535318 - type: cos_sim_spearman value: 81.51200091040481 - type: euclidean_pearson value: 81.45382456114775 - type: euclidean_spearman value: 81.46201181707931 - type: manhattan_pearson value: 81.37243088439584 - type: manhattan_spearman value: 81.39828421893426 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 85.71942451732227 - type: cos_sim_spearman value: 87.33044482064973 - type: euclidean_pearson value: 86.58580899365178 - type: euclidean_spearman value: 87.09206723832895 - type: manhattan_pearson value: 86.47460784157013 - type: manhattan_spearman value: 86.98367656583076 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test 
revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 83.55868078863449 - type: cos_sim_spearman value: 85.38299230074065 - type: euclidean_pearson value: 84.64715256244595 - type: euclidean_spearman value: 85.49112229604047 - type: manhattan_pearson value: 84.60814346792462 - type: manhattan_spearman value: 85.44886026766822 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 84.99292526370614 - type: cos_sim_spearman value: 85.58139465695983 - type: euclidean_pearson value: 86.51325066734084 - type: euclidean_spearman value: 85.56736418284562 - type: manhattan_pearson value: 86.48190836601357 - type: manhattan_spearman value: 85.51616256224258 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 64.54124715078807 - type: cos_sim_spearman value: 65.32134275948374 - type: euclidean_pearson value: 67.09791698300816 - type: euclidean_spearman value: 65.79468982468465 - type: manhattan_pearson value: 67.13304723693966 - type: manhattan_spearman value: 65.68439995849283 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 83.4231099581624 - type: cos_sim_spearman value: 85.95475815226862 - type: euclidean_pearson value: 85.00339401999706 - type: euclidean_spearman value: 85.74133081802971 - type: manhattan_pearson value: 85.00407987181666 - type: manhattan_spearman value: 85.77509596397363 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 87.25666719585716 - type: 
mrr value: 96.32769917083642 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 57.828 - type: map_at_10 value: 68.369 - type: map_at_100 value: 68.83399999999999 - type: map_at_1000 value: 68.856 - type: map_at_3 value: 65.38000000000001 - type: map_at_5 value: 67.06299999999999 - type: mrr_at_1 value: 61 - type: mrr_at_10 value: 69.45400000000001 - type: mrr_at_100 value: 69.785 - type: mrr_at_1000 value: 69.807 - type: mrr_at_3 value: 67 - type: mrr_at_5 value: 68.43299999999999 - type: ndcg_at_1 value: 61 - type: ndcg_at_10 value: 73.258 - type: ndcg_at_100 value: 75.173 - type: ndcg_at_1000 value: 75.696 - type: ndcg_at_3 value: 68.162 - type: ndcg_at_5 value: 70.53399999999999 - type: precision_at_1 value: 61 - type: precision_at_10 value: 9.8 - type: precision_at_100 value: 1.087 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 27 - type: precision_at_5 value: 17.666999999999998 - type: recall_at_1 value: 57.828 - type: recall_at_10 value: 87.122 - type: recall_at_100 value: 95.667 - type: recall_at_1000 value: 99.667 - type: recall_at_3 value: 73.139 - type: recall_at_5 value: 79.361 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.85247524752475 - type: cos_sim_ap value: 96.25640197639723 - type: cos_sim_f1 value: 92.37851662404091 - type: cos_sim_precision value: 94.55497382198953 - type: cos_sim_recall value: 90.3 - type: dot_accuracy value: 99.76138613861386 - type: dot_ap value: 93.40295864389073 - type: dot_f1 value: 87.64267990074441 - type: dot_precision value: 86.99507389162562 - type: dot_recall value: 88.3 - type: euclidean_accuracy value: 99.85049504950496 - type: euclidean_ap value: 96.24254350525462 - type: 
euclidean_f1 value: 92.32323232323232 - type: euclidean_precision value: 93.26530612244898 - type: euclidean_recall value: 91.4 - type: manhattan_accuracy value: 99.85346534653465 - type: manhattan_ap value: 96.2635334753325 - type: manhattan_f1 value: 92.37899073120495 - type: manhattan_precision value: 95.22292993630573 - type: manhattan_recall value: 89.7 - type: max_accuracy value: 99.85346534653465 - type: max_ap value: 96.2635334753325 - type: max_f1 value: 92.37899073120495 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 65.83905786483794 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 35.031896152126436 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 54.551326709447146 - type: mrr value: 55.43758222986165 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.305688567308874 - type: cos_sim_spearman value: 29.27135743434515 - type: dot_pearson value: 30.336741878796563 - type: dot_spearman value: 30.513365725895937 - task: type: Retrieval dataset: type: trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.245 - type: map_at_10 value: 1.92 - type: map_at_100 value: 10.519 - type: map_at_1000 value: 23.874000000000002 - type: map_at_3 value: 0.629 - type: map_at_5 value: 1.0290000000000001 - type: mrr_at_1 value: 88 - type: mrr_at_10 value: 93.5 - 
type: mrr_at_100 value: 93.5 - type: mrr_at_1000 value: 93.5 - type: mrr_at_3 value: 93 - type: mrr_at_5 value: 93.5 - type: ndcg_at_1 value: 84 - type: ndcg_at_10 value: 76.447 - type: ndcg_at_100 value: 56.516 - type: ndcg_at_1000 value: 48.583999999999996 - type: ndcg_at_3 value: 78.877 - type: ndcg_at_5 value: 79.174 - type: precision_at_1 value: 88 - type: precision_at_10 value: 80.60000000000001 - type: precision_at_100 value: 57.64 - type: precision_at_1000 value: 21.227999999999998 - type: precision_at_3 value: 82 - type: precision_at_5 value: 83.6 - type: recall_at_1 value: 0.245 - type: recall_at_10 value: 2.128 - type: recall_at_100 value: 13.767 - type: recall_at_1000 value: 44.958 - type: recall_at_3 value: 0.654 - type: recall_at_5 value: 1.111 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 2.5170000000000003 - type: map_at_10 value: 10.915 - type: map_at_100 value: 17.535 - type: map_at_1000 value: 19.042 - type: map_at_3 value: 5.689 - type: map_at_5 value: 7.837 - type: mrr_at_1 value: 34.694 - type: mrr_at_10 value: 49.547999999999995 - type: mrr_at_100 value: 50.653000000000006 - type: mrr_at_1000 value: 50.653000000000006 - type: mrr_at_3 value: 44.558 - type: mrr_at_5 value: 48.333 - type: ndcg_at_1 value: 32.653 - type: ndcg_at_10 value: 26.543 - type: ndcg_at_100 value: 38.946 - type: ndcg_at_1000 value: 49.406 - type: ndcg_at_3 value: 29.903000000000002 - type: ndcg_at_5 value: 29.231 - type: precision_at_1 value: 34.694 - type: precision_at_10 value: 23.265 - type: precision_at_100 value: 8.102 - type: precision_at_1000 value: 1.5 - type: precision_at_3 value: 31.293 - type: precision_at_5 value: 29.796 - type: recall_at_1 value: 2.5170000000000003 - type: recall_at_10 value: 16.88 - type: recall_at_100 value: 49.381 - type: recall_at_1000 value: 81.23899999999999 - type: recall_at_3 value: 6.965000000000001 - type: recall_at_5 value: 
10.847999999999999 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 71.5942 - type: ap value: 13.92074156956546 - type: f1 value: 54.671999698839066 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 59.39728353140916 - type: f1 value: 59.68980496759517 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 52.11181870104935 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 86.46957143708649 - type: cos_sim_ap value: 76.16120197845457 - type: cos_sim_f1 value: 69.69919295671315 - type: cos_sim_precision value: 64.94986326344576 - type: cos_sim_recall value: 75.19788918205805 - type: dot_accuracy value: 83.0780234845324 - type: dot_ap value: 64.21717343541934 - type: dot_f1 value: 59.48375497624245 - type: dot_precision value: 57.94345759319489 - type: dot_recall value: 61.108179419525065 - type: euclidean_accuracy value: 86.6543482148179 - type: euclidean_ap value: 76.4527555010203 - type: euclidean_f1 value: 70.10156056477584 - type: euclidean_precision value: 66.05975723622782 - type: euclidean_recall value: 74.67018469656992 - type: manhattan_accuracy value: 86.66030875603504 - type: manhattan_ap value: 76.40304567255436 - type: manhattan_f1 value: 70.05275426328058 - type: manhattan_precision value: 65.4666360926393 - type: manhattan_recall value: 
75.32981530343008 - type: max_accuracy value: 86.66030875603504 - type: max_ap value: 76.4527555010203 - type: max_f1 value: 70.10156056477584 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 88.42123646524624 - type: cos_sim_ap value: 85.15431437761646 - type: cos_sim_f1 value: 76.98069301530742 - type: cos_sim_precision value: 72.9314502239063 - type: cos_sim_recall value: 81.50600554357868 - type: dot_accuracy value: 86.70974502270346 - type: dot_ap value: 80.77621563599457 - type: dot_f1 value: 73.87058697285117 - type: dot_precision value: 68.98256396552877 - type: dot_recall value: 79.50415768401602 - type: euclidean_accuracy value: 88.46392672798541 - type: euclidean_ap value: 85.20370297495491 - type: euclidean_f1 value: 77.01372369624886 - type: euclidean_precision value: 73.39052800446397 - type: euclidean_recall value: 81.01324299353249 - type: manhattan_accuracy value: 88.43481973066325 - type: manhattan_ap value: 85.16318289864545 - type: manhattan_f1 value: 76.90884877182597 - type: manhattan_precision value: 74.01737396753062 - type: manhattan_recall value: 80.03541730828458 - type: max_accuracy value: 88.46392672798541 - type: max_ap value: 85.20370297495491 - type: max_f1 value: 77.01372369624886 license: mit language: - en --- **Recommend switching to newest BAAI/bge-base-en-v1.5, which has more reasonable similarity distribution and same method of usage.** <h1 align=\"center\">FlagEmbedding</h1> <h4 align=\"center\"> <p> <a href=#model-list>Model List</a> | <a href=#frequently-asked-questions>FAQ</a> | <a href=#usage>Usage</a> | <a href=\"#evaluation\">Evaluation</a> | <a href=\"#train\">Train</a> | <a href=\"#contact\">Contact</a> | <a href=\"#citation\">Citation</a> | <a href=\"#license\">License</a> <p> </h4> More details please refer to our Github: 
FlagEmbedding. English | 中文 FlagEmbedding can map any text to a low-dimensional dense vector, which can be used for tasks like retrieval, classification, clustering, or semantic search. It can also be used in vector databases for LLMs. ************* 🌟**Updates**🌟 ************* - 10/12/2023: Release LLM-Embedder, a unified embedding model to support diverse retrieval augmentation needs for LLMs. Paper :fire: - 09/15/2023: The technical report of BGE has been released - 09/15/2023: The massive training data of BGE has been released - 09/12/2023: New models: - **New reranker model**: release cross-encoder reranker models, which are more powerful than the embedding models. We recommend using or fine-tuning them to re-rank the top-k documents returned by embedding models. - **Updated embedding model**: release updated embedding models to alleviate the issue of the similarity distribution and enhance retrieval ability without instruction. <details> <summary>More</summary> <!-- ### More --> - 09/07/2023: Update fine-tune code: add a script to mine hard negatives and support adding instructions during fine-tuning. - 08/09/2023: BGE models are integrated into **Langchain**, and you can use them like this; the C-MTEB **leaderboard** is available. - 08/05/2023: Release base-scale and small-scale models, **best performance among the models of the same size 🤗** - 08/02/2023: Release BGE (short for BAAI General Embedding) models, **rank 1st on MTEB and C-MTEB benchmark!** :tada: :tada: - 08/01/2023: We release the Chinese Massive Text Embedding Benchmark (**C-MTEB**), consisting of 31 test datasets. </details> ## Model List BGE is short for BAAI General Embedding. 
| Model | Language | | Description | query instruction for retrieval [1] | |:-------------------------------|:--------:| :--------:| :--------:|:--------:| | BAAI/llm-embedder | English | Inference Fine-tune | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See README | | BAAI/bge-reranker-large | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-reranker-base | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-large-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-en | English | Inference Fine-tune | :trophy: rank **1st** in MTEB leaderboard | | | BAAI/bge-base-en | English | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-en | English | Inference Fine-tune |a small-scale model but with competitive performance | | | BAAI/bge-large-zh | Chinese | Inference Fine-tune | :trophy: rank **1st** in C-MTEB benchmark | | | BAAI/bge-base-zh | Chinese | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-zh | Chinese | Inference Fine-tune | a small-scale model but with competitive performance | | [1\\]: If you need to search the relevant passages to a 
query, we suggest adding the instruction to the query; in other cases, no instruction is needed, and you can use the original query directly. In all cases, **no instruction** needs to be added to passages. [2\\]: Unlike an embedding model, a reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. To balance accuracy and time cost, a cross-encoder is widely used to re-rank the top-k documents retrieved by simpler models. For example, use a bge embedding model to retrieve the top 100 relevant documents, and then use a bge reranker to re-rank those 100 documents to get the final top-3 results. All models have been uploaded to Huggingface Hub, and you can see them at . If you cannot open the Huggingface Hub, you can also download the models at . ## Frequently asked questions <details> <summary>1. How to fine-tune bge embedding model?</summary> <!-- ### How to fine-tune bge embedding model? --> Follow this example to prepare data and fine-tune your model. Some suggestions: - Mine hard negatives following this example, which can improve the retrieval performance. - If you pre-train bge on your data, the pre-trained model cannot be used directly to calculate similarity; it must first be fine-tuned with contrastive learning. - If the accuracy of the fine-tuned model is still not high, it is recommended to use or fine-tune the cross-encoder model (bge-reranker) to re-rank the top-k results. Hard negatives are also needed to fine-tune the reranker. </details> <details> <summary>2. The similarity score between two dissimilar sentences is higher than 0.5</summary> <!-- ### The similarity score between two dissimilar sentences is higher than 0.5 --> **We suggest using bge v1.5, which alleviates the issue of the similarity distribution.** Since we fine-tune the models by contrastive learning with a temperature of 0.01, the similarity distribution of the current BGE model lies roughly in the interval \\[0.6, 1\\]. 
So a similarity score greater than 0.5 does not indicate that the two sentences are similar. For downstream tasks, such as passage retrieval or semantic similarity, **what matters is the relative order of the scores, not the absolute value.** If you need to filter similar sentences based on a similarity threshold, please select an appropriate threshold based on the similarity distribution on your data (such as 0.8, 0.85, or even 0.9). </details> <details> <summary>3. When does the query instruction need to be used</summary> <!-- ### When does the query instruction need to be used --> For bge v1.5, we improved retrieval ability when no instruction is used. Omitting the instruction causes only a slight degradation in retrieval performance compared with using it, so for convenience you can generate embeddings without an instruction in all cases. For a retrieval task that uses short queries to find long related documents, it is recommended to add instructions to these short queries. **The best way to decide whether to add instructions to queries is to choose the setting that achieves better performance on your task.** In all cases, no instruction needs to be added to the documents/passages. </details> ## Usage ### Usage for Embedding Model Here are some examples for using models with FlagEmbedding, Sentence-Transformers, Langchain, or Huggingface Transformers. #### Using FlagEmbedding If it doesn't work for you, you can see FlagEmbedding for more methods to install FlagEmbedding. For the value of the argument , see Model List. By default, FlagModel will use all available GPUs when encoding. Please set to select specific GPUs. You also can set to make all GPUs unavailable. #### Using Sentence-Transformers You can also use the models with sentence-transformers: For s2p (short query to long passage) retrieval tasks, each short query should start with an instruction (for instructions, see Model List). The instruction is not needed for passages. 
#### Using Langchain You can use in langchain like this: #### Using HuggingFace Transformers With the transformers package, you can use the model like this: First, you pass your input through the transformer model, then you select the last hidden state of the first token (i.e., [CLS]) as the sentence embedding. ### Usage for Reranker Unlike an embedding model, a reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. You can get a relevance score by feeding a query and a passage to the reranker. The reranker is optimized with a cross-entropy loss, so the relevance score is not bounded to a specific range. #### Using FlagEmbedding Get relevance scores (higher scores indicate more relevance): #### Using Huggingface transformers ## Evaluation BGE models achieve **state-of-the-art performance on both MTEB and C-MTEB leaderboard!** For more details and evaluation tools, see our scripts. - **MTEB**: | Model Name | Dimension | Sequence Length | Average (56) | Retrieval (15) |Clustering (11) | Pair Classification (3) | Reranking (4) | STS (10) | Summarization (1) | Classification (12) | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | BAAI/bge-large-en-v1.5 | 1024 | 512 | **64.23** | **54.29** | 46.08 | 87.12 | 60.03 | 83.11 | 31.61 | 75.97 | | BAAI/bge-base-en-v1.5 | 768 | 512 | 63.55 | 53.25 | 45.77 | 86.55 | 58.86 | 82.4 | 31.07 | 75.53 | | BAAI/bge-small-en-v1.5 | 384 | 512 | 62.17 |51.68 | 43.82 | 84.92 | 58.36 | 81.59 | 30.12 | 74.14 | | bge-large-en | 1024 | 512 | 63.98 | 53.9 | 46.98 | 85.8 | 59.48 | 81.56 | 32.06 | 76.21 | | bge-base-en | 768 | 512 | 63.36 | 53.0 | 46.32 | 85.86 | 58.7 | 81.84 | 29.27 | 75.27 | | gte-large | 1024 | 512 | 63.13 | 52.22 | 46.84 | 85.00 | 59.13 | 83.35 | 31.66 | 73.33 | | gte-base | 768 | 512 | 62.39 | 51.14 | 46.2 | 84.57 | 58.61 | 82.3 | 31.17 | 73.01 | | e5-large-v2 | 1024| 512 | 62.25 | 50.56 | 44.49 | 86.03 | 56.61 | 82.05 | 30.19 | 75.24 | | bge-small-en | 384 | 512 | 
62.11 | 51.82 | 44.31 | 83.78 | 57.97 | 80.72 | 30.53 | 74.37 | | instructor-xl | 768 | 512 | 61.79 | 49.26 | 44.74 | 86.62 | 57.29 | 83.06 | 32.32 | 61.79 | | e5-base-v2 | 768 | 512 | 61.5 | 50.29 | 43.80 | 85.73 | 55.91 | 81.05 | 30.28 | 73.84 | | gte-small | 384 | 512 | 61.36 | 49.46 | 44.89 | 83.54 | 57.7 | 82.07 | 30.42 | 72.31 | | text-embedding-ada-002 | 1536 | 8192 | 60.99 | 49.25 | 45.9 | 84.89 | 56.32 | 80.97 | 30.8 | 70.93 | | e5-small-v2 | 384 | 512 | 59.93 | 49.04 | 39.92 | 84.67 | 54.32 | 80.39 | 31.16 | 72.94 | | sentence-t5-xxl | 768 | 512 | 59.51 | 42.24 | 43.72 | 85.06 | 56.42 | 82.63 | 30.08 | 73.42 | | all-mpnet-base-v2 | 768 | 514 | 57.78 | 43.81 | 43.69 | 83.04 | 59.36 | 80.28 | 27.49 | 65.07 | | sgpt-bloom-7b1-msmarco | 4096 | 2048 | 57.59 | 48.22 | 38.93 | 81.9 | 55.65 | 77.74 | 33.6 | 66.19 | - **C-MTEB**: We create the benchmark C-MTEB for Chinese text embedding which consists of 31 datasets from 6 tasks. Please refer to C_MTEB for a detailed introduction. | Model | Embedding dimension | Avg | Retrieval | STS | PairClassification | Classification | Reranking | Clustering | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | **BAAI/bge-large-zh-v1.5** | 1024 | **64.53** | 70.46 | 56.25 | 81.6 | 69.13 | 65.84 | 48.99 | | BAAI/bge-base-zh-v1.5 | 768 | 63.13 | 69.49 | 53.72 | 79.75 | 68.07 | 65.39 | 47.53 | | BAAI/bge-small-zh-v1.5 | 512 | 57.82 | 61.77 | 49.11 | 70.41 | 63.96 | 60.92 | 44.18 | | BAAI/bge-large-zh | 1024 | 64.20 | 71.53 | 54.98 | 78.94 | 68.32 | 65.11 | 48.39 | | bge-large-zh-noinstruct | 1024 | 63.53 | 70.55 | 53 | 76.77 | 68.58 | 64.91 | 50.01 | | BAAI/bge-base-zh | 768 | 62.96 | 69.53 | 54.12 | 77.5 | 67.07 | 64.91 | 47.63 | | multilingual-e5-large | 1024 | 58.79 | 63.66 | 48.44 | 69.89 | 67.34 | 56.00 | 48.23 | | BAAI/bge-small-zh | 512 | 58.27 | 63.07 | 49.45 | 70.35 | 63.64 | 61.48 | 45.09 | | m3e-base | 768 | 57.10 | 56.91 | 50.47 | 63.99 | 67.52 | 
59.34 | 47.68 | | m3e-large | 1024 | 57.05 | 54.75 | 50.42 | 64.3 | 68.2 | 59.66 | 48.88 | | multilingual-e5-base | 768 | 55.48 | 61.63 | 46.49 | 67.07 | 65.35 | 54.35 | 40.68 | | multilingual-e5-small | 384 | 55.38 | 59.95 | 45.27 | 66.45 | 65.85 | 53.86 | 45.26 | | text-embedding-ada-002(OpenAI) | 1536 | 53.02 | 52.0 | 43.35 | 69.56 | 64.31 | 54.28 | 45.68 | | luotuo | 1024 | 49.37 | 44.4 | 42.78 | 66.62 | 61 | 49.25 | 44.39 | | text2vec-base | 768 | 47.63 | 38.79 | 43.41 | 67.41 | 62.19 | 49.45 | 37.66 | | text2vec-large | 1024 | 47.36 | 41.94 | 44.97 | 70.86 | 60.66 | 49.16 | 30.02 | - **Reranking**: See C_MTEB for evaluation script. | Model | T2Reranking | T2RerankingZh2En\\* | T2RerankingEn2Zh\\* | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | text2vec-base-multilingual | 64.66 | 62.94 | 62.51 | 14.37 | 48.46 | 48.6 | 50.26 | | multilingual-e5-small | 65.62 | 60.94 | 56.41 | 29.91 | 67.26 | 66.54 | 57.78 | | multilingual-e5-large | 64.55 | 61.61 | 54.28 | 28.6 | 67.42 | 67.92 | 57.4 | | multilingual-e5-base | 64.21 | 62.13 | 54.68 | 29.5 | 66.23 | 66.98 | 57.29 | | m3e-base | 66.03 | 62.74 | 56.07 | 17.51 | 77.05 | 76.76 | 59.36 | | m3e-large | 66.13 | 62.72 | 56.1 | 16.46 | 77.76 | 78.27 | 59.57 | | bge-base-zh-v1.5 | 66.49 | 63.25 | 57.02 | 29.74 | 80.47 | 84.88 | 63.64 | | bge-large-zh-v1.5 | 65.74 | 63.39 | 57.03 | 28.74 | 83.45 | 85.44 | 63.97 | | BAAI/bge-reranker-base | 67.28 | 63.95 | 60.45 | 35.46 | 81.26 | 84.1 | 65.42 | | BAAI/bge-reranker-large | 67.6 | 64.03 | 61.44 | 37.16 | 82.15 | 84.18 | 66.09 | \\* : T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks ## Train ### BAAI Embedding We pre-train the models using retromae and train them on large-scale pairs data using contrastive learning. **You can fine-tune the embedding model on your data following our examples.** We also provide a pre-train example. 
Note that the goal of pre-training is to reconstruct the text; the pre-trained model cannot be used for similarity calculation directly and needs to be fine-tuned first. For more training details for bge, see baai_general_embedding. ### BGE Reranker A cross-encoder performs full attention over the input pair, which is more accurate than an embedding model (i.e., bi-encoder) but also more time-consuming. Therefore, it can be used to re-rank the top-k documents returned by an embedding model. We train the cross-encoder on multilingual pair data. The data format is the same as for the embedding model, so you can fine-tune it easily following our example. For more details, please refer to ./FlagEmbedding/reranker/README.md ## Contact If you have any questions or suggestions related to this project, feel free to open an issue or pull request. You can also email Shitao Xiao([email protected]) and Zheng Liu([email protected]). ## Citation If you find this repository useful, please consider giving a star :star: and citation ## License FlagEmbedding is licensed under the MIT License. The released models can be used for commercial purposes free of charge.",
|
22 |
+
"model_explanation_gemini": "Performs text classification, retrieval, clustering, reranking, and semantic textual similarity tasks across various datasets."
|
23 |
+
}
|
data/model_data_json/BAAI_bge-large-en-v1.5.json
ADDED
@@ -0,0 +1,29 @@
|
1 |
+
{
|
2 |
+
"model_id": "BAAI/bge-large-en-v1.5",
|
3 |
+
"downloads": 2187819,
|
4 |
+
"tags": [
|
5 |
+
"sentence-transformers",
|
6 |
+
"pytorch",
|
7 |
+
"onnx",
|
8 |
+
"safetensors",
|
9 |
+
"bert",
|
10 |
+
"feature-extraction",
|
11 |
+
"sentence-similarity",
|
12 |
+
"transformers",
|
13 |
+
"mteb",
|
14 |
+
"en",
|
15 |
+
"arxiv:2401.03462",
|
16 |
+
"arxiv:2312.15503",
|
17 |
+
"arxiv:2311.13534",
|
18 |
+
"arxiv:2310.07554",
|
19 |
+
"arxiv:2309.07597",
|
20 |
+
"license:mit",
|
21 |
+
"model-index",
|
22 |
+
"autotrain_compatible",
|
23 |
+
"text-embeddings-inference",
|
24 |
+
"endpoints_compatible",
|
25 |
+
"region:us"
|
26 |
+
],
|
27 |
+
"description": "--- tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers - mteb model-index: - name: bge-large-en-v1.5 results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 75.8507462686567 - type: ap value: 38.566457320228245 - type: f1 value: 69.69386648043475 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 92.416675 - type: ap value: 89.1928861155922 - type: f1 value: 92.39477019574215 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 48.175999999999995 - type: f1 value: 47.80712792870253 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 40.184999999999995 - type: map_at_10 value: 55.654 - type: map_at_100 value: 56.25 - type: map_at_1000 value: 56.255 - type: map_at_3 value: 51.742999999999995 - type: map_at_5 value: 54.129000000000005 - type: mrr_at_1 value: 40.967 - type: mrr_at_10 value: 55.96 - type: mrr_at_100 value: 56.54900000000001 - type: mrr_at_1000 value: 56.554 - type: mrr_at_3 value: 51.980000000000004 - type: mrr_at_5 value: 54.44 - type: ndcg_at_1 value: 40.184999999999995 - type: ndcg_at_10 value: 63.542 - type: ndcg_at_100 value: 65.96499999999999 - type: ndcg_at_1000 value: 66.08699999999999 - type: ndcg_at_3 value: 55.582 - type: ndcg_at_5 value: 59.855000000000004 - type: precision_at_1 value: 40.184999999999995 - type: precision_at_10 value: 8.841000000000001 - type: precision_at_100 value: 0.987 - 
type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 22.238 - type: precision_at_5 value: 15.405 - type: recall_at_1 value: 40.184999999999995 - type: recall_at_10 value: 88.407 - type: recall_at_100 value: 98.72 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 66.714 - type: recall_at_5 value: 77.027 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 48.567077926750066 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 43.19453389182364 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 64.46555939623092 - type: mrr value: 77.82361605768807 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 84.9554128814735 - type: cos_sim_spearman value: 84.65373612172036 - type: euclidean_pearson value: 83.2905059954138 - type: euclidean_spearman value: 84.52240782811128 - type: manhattan_pearson value: 82.99533802997436 - type: manhattan_spearman value: 84.20673798475734 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 87.78896103896103 - type: f1 value: 87.77189310964883 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 
39.714538337650495 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 36.90108349284447 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 32.795 - type: map_at_10 value: 43.669000000000004 - type: map_at_100 value: 45.151 - type: map_at_1000 value: 45.278 - type: map_at_3 value: 40.006 - type: map_at_5 value: 42.059999999999995 - type: mrr_at_1 value: 39.771 - type: mrr_at_10 value: 49.826 - type: mrr_at_100 value: 50.504000000000005 - type: mrr_at_1000 value: 50.549 - type: mrr_at_3 value: 47.115 - type: mrr_at_5 value: 48.832 - type: ndcg_at_1 value: 39.771 - type: ndcg_at_10 value: 50.217999999999996 - type: ndcg_at_100 value: 55.454 - type: ndcg_at_1000 value: 57.37 - type: ndcg_at_3 value: 44.885000000000005 - type: ndcg_at_5 value: 47.419 - type: precision_at_1 value: 39.771 - type: precision_at_10 value: 9.642000000000001 - type: precision_at_100 value: 1.538 - type: precision_at_1000 value: 0.198 - type: precision_at_3 value: 21.268 - type: precision_at_5 value: 15.536 - type: recall_at_1 value: 32.795 - type: recall_at_10 value: 62.580999999999996 - type: recall_at_100 value: 84.438 - type: recall_at_1000 value: 96.492 - type: recall_at_3 value: 47.071000000000005 - type: recall_at_5 value: 54.079 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 32.671 - type: map_at_10 value: 43.334 - type: map_at_100 value: 44.566 - type: map_at_1000 value: 44.702999999999996 - type: map_at_3 value: 40.343 - type: map_at_5 value: 41.983 - type: mrr_at_1 value: 40.764 - type: mrr_at_10 value: 49.382 - type: mrr_at_100 value: 49.988 - type: mrr_at_1000 value: 
50.03300000000001 - type: mrr_at_3 value: 47.293 - type: mrr_at_5 value: 48.51 - type: ndcg_at_1 value: 40.764 - type: ndcg_at_10 value: 49.039 - type: ndcg_at_100 value: 53.259 - type: ndcg_at_1000 value: 55.253 - type: ndcg_at_3 value: 45.091 - type: ndcg_at_5 value: 46.839999999999996 - type: precision_at_1 value: 40.764 - type: precision_at_10 value: 9.191 - type: precision_at_100 value: 1.476 - type: precision_at_1000 value: 0.19499999999999998 - type: precision_at_3 value: 21.72 - type: precision_at_5 value: 15.299 - type: recall_at_1 value: 32.671 - type: recall_at_10 value: 58.816 - type: recall_at_100 value: 76.654 - type: recall_at_1000 value: 89.05999999999999 - type: recall_at_3 value: 46.743 - type: recall_at_5 value: 51.783 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 40.328 - type: map_at_10 value: 53.32599999999999 - type: map_at_100 value: 54.37499999999999 - type: map_at_1000 value: 54.429 - type: map_at_3 value: 49.902 - type: map_at_5 value: 52.002 - type: mrr_at_1 value: 46.332 - type: mrr_at_10 value: 56.858 - type: mrr_at_100 value: 57.522 - type: mrr_at_1000 value: 57.54899999999999 - type: mrr_at_3 value: 54.472 - type: mrr_at_5 value: 55.996 - type: ndcg_at_1 value: 46.332 - type: ndcg_at_10 value: 59.313 - type: ndcg_at_100 value: 63.266999999999996 - type: ndcg_at_1000 value: 64.36 - type: ndcg_at_3 value: 53.815000000000005 - type: ndcg_at_5 value: 56.814 - type: precision_at_1 value: 46.332 - type: precision_at_10 value: 9.53 - type: precision_at_100 value: 1.238 - type: precision_at_1000 value: 0.13699999999999998 - type: precision_at_3 value: 24.054000000000002 - type: precision_at_5 value: 16.589000000000002 - type: recall_at_1 value: 40.328 - type: recall_at_10 value: 73.421 - type: recall_at_100 value: 90.059 - type: recall_at_1000 value: 97.81 - type: recall_at_3 value: 59.009 - type: recall_at_5 value: 
66.352 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.424 - type: map_at_10 value: 36.332 - type: map_at_100 value: 37.347 - type: map_at_1000 value: 37.422 - type: map_at_3 value: 33.743 - type: map_at_5 value: 35.176 - type: mrr_at_1 value: 29.153000000000002 - type: mrr_at_10 value: 38.233 - type: mrr_at_100 value: 39.109 - type: mrr_at_1000 value: 39.164 - type: mrr_at_3 value: 35.876000000000005 - type: mrr_at_5 value: 37.169000000000004 - type: ndcg_at_1 value: 29.153000000000002 - type: ndcg_at_10 value: 41.439 - type: ndcg_at_100 value: 46.42 - type: ndcg_at_1000 value: 48.242000000000004 - type: ndcg_at_3 value: 36.362 - type: ndcg_at_5 value: 38.743 - type: precision_at_1 value: 29.153000000000002 - type: precision_at_10 value: 6.315999999999999 - type: precision_at_100 value: 0.927 - type: precision_at_1000 value: 0.11199999999999999 - type: precision_at_3 value: 15.443000000000001 - type: precision_at_5 value: 10.644 - type: recall_at_1 value: 27.424 - type: recall_at_10 value: 55.364000000000004 - type: recall_at_100 value: 78.211 - type: recall_at_1000 value: 91.74600000000001 - type: recall_at_3 value: 41.379 - type: recall_at_5 value: 47.14 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 19.601 - type: map_at_10 value: 27.826 - type: map_at_100 value: 29.017 - type: map_at_1000 value: 29.137 - type: map_at_3 value: 25.125999999999998 - type: map_at_5 value: 26.765 - type: mrr_at_1 value: 24.005000000000003 - type: mrr_at_10 value: 32.716 - type: mrr_at_100 value: 33.631 - type: mrr_at_1000 value: 33.694 - type: mrr_at_3 value: 29.934 - type: mrr_at_5 value: 31.630999999999997 - type: ndcg_at_1 value: 24.005000000000003 - type: ndcg_at_10 value: 33.158 - type: ndcg_at_100 value: 38.739000000000004 - type: 
ndcg_at_1000 value: 41.495 - type: ndcg_at_3 value: 28.185 - type: ndcg_at_5 value: 30.796 - type: precision_at_1 value: 24.005000000000003 - type: precision_at_10 value: 5.908 - type: precision_at_100 value: 1.005 - type: precision_at_1000 value: 0.13899999999999998 - type: precision_at_3 value: 13.391 - type: precision_at_5 value: 9.876 - type: recall_at_1 value: 19.601 - type: recall_at_10 value: 44.746 - type: recall_at_100 value: 68.82300000000001 - type: recall_at_1000 value: 88.215 - type: recall_at_3 value: 31.239 - type: recall_at_5 value: 37.695 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 30.130000000000003 - type: map_at_10 value: 40.96 - type: map_at_100 value: 42.282 - type: map_at_1000 value: 42.392 - type: map_at_3 value: 37.889 - type: map_at_5 value: 39.661 - type: mrr_at_1 value: 36.958999999999996 - type: mrr_at_10 value: 46.835 - type: mrr_at_100 value: 47.644 - type: mrr_at_1000 value: 47.688 - type: mrr_at_3 value: 44.562000000000005 - type: mrr_at_5 value: 45.938 - type: ndcg_at_1 value: 36.958999999999996 - type: ndcg_at_10 value: 47.06 - type: ndcg_at_100 value: 52.345 - type: ndcg_at_1000 value: 54.35 - type: ndcg_at_3 value: 42.301 - type: ndcg_at_5 value: 44.635999999999996 - type: precision_at_1 value: 36.958999999999996 - type: precision_at_10 value: 8.479000000000001 - type: precision_at_100 value: 1.284 - type: precision_at_1000 value: 0.163 - type: precision_at_3 value: 20.244 - type: precision_at_5 value: 14.224999999999998 - type: recall_at_1 value: 30.130000000000003 - type: recall_at_10 value: 59.27 - type: recall_at_100 value: 81.195 - type: recall_at_1000 value: 94.21199999999999 - type: recall_at_3 value: 45.885 - type: recall_at_5 value: 52.016 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: None metrics: - type: 
map_at_1 value: 26.169999999999998 - type: map_at_10 value: 36.451 - type: map_at_100 value: 37.791000000000004 - type: map_at_1000 value: 37.897 - type: map_at_3 value: 33.109 - type: map_at_5 value: 34.937000000000005 - type: mrr_at_1 value: 32.877 - type: mrr_at_10 value: 42.368 - type: mrr_at_100 value: 43.201 - type: mrr_at_1000 value: 43.259 - type: mrr_at_3 value: 39.763999999999996 - type: mrr_at_5 value: 41.260000000000005 - type: ndcg_at_1 value: 32.877 - type: ndcg_at_10 value: 42.659000000000006 - type: ndcg_at_100 value: 48.161 - type: ndcg_at_1000 value: 50.345 - type: ndcg_at_3 value: 37.302 - type: ndcg_at_5 value: 39.722 - type: precision_at_1 value: 32.877 - type: precision_at_10 value: 7.9 - type: precision_at_100 value: 1.236 - type: precision_at_1000 value: 0.158 - type: precision_at_3 value: 17.846 - type: precision_at_5 value: 12.9 - type: recall_at_1 value: 26.169999999999998 - type: recall_at_10 value: 55.35 - type: recall_at_100 value: 78.755 - type: recall_at_1000 value: 93.518 - type: recall_at_3 value: 40.176 - type: recall_at_5 value: 46.589000000000006 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.15516666666667 - type: map_at_10 value: 36.65741666666667 - type: map_at_100 value: 37.84991666666666 - type: map_at_1000 value: 37.96316666666667 - type: map_at_3 value: 33.74974999999999 - type: map_at_5 value: 35.3765 - type: mrr_at_1 value: 32.08233333333334 - type: mrr_at_10 value: 41.033833333333334 - type: mrr_at_100 value: 41.84524999999999 - type: mrr_at_1000 value: 41.89983333333333 - type: mrr_at_3 value: 38.62008333333333 - type: mrr_at_5 value: 40.03441666666666 - type: ndcg_at_1 value: 32.08233333333334 - type: ndcg_at_10 value: 42.229 - type: ndcg_at_100 value: 47.26716666666667 - type: ndcg_at_1000 value: 49.43466666666667 - type: ndcg_at_3 value: 37.36408333333333 - type: ndcg_at_5 value: 39.6715 - type: 
precision_at_1 value: 32.08233333333334 - type: precision_at_10 value: 7.382583333333334 - type: precision_at_100 value: 1.16625 - type: precision_at_1000 value: 0.15408333333333332 - type: precision_at_3 value: 17.218 - type: precision_at_5 value: 12.21875 - type: recall_at_1 value: 27.15516666666667 - type: recall_at_10 value: 54.36683333333333 - type: recall_at_100 value: 76.37183333333333 - type: recall_at_1000 value: 91.26183333333333 - type: recall_at_3 value: 40.769916666666674 - type: recall_at_5 value: 46.702333333333335 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 25.749 - type: map_at_10 value: 33.001999999999995 - type: map_at_100 value: 33.891 - type: map_at_1000 value: 33.993 - type: map_at_3 value: 30.703999999999997 - type: map_at_5 value: 31.959 - type: mrr_at_1 value: 28.834 - type: mrr_at_10 value: 35.955 - type: mrr_at_100 value: 36.709 - type: mrr_at_1000 value: 36.779 - type: mrr_at_3 value: 33.947 - type: mrr_at_5 value: 35.089 - type: ndcg_at_1 value: 28.834 - type: ndcg_at_10 value: 37.329 - type: ndcg_at_100 value: 41.79 - type: ndcg_at_1000 value: 44.169000000000004 - type: ndcg_at_3 value: 33.184999999999995 - type: ndcg_at_5 value: 35.107 - type: precision_at_1 value: 28.834 - type: precision_at_10 value: 5.7669999999999995 - type: precision_at_100 value: 0.876 - type: precision_at_1000 value: 0.11399999999999999 - type: precision_at_3 value: 14.213000000000001 - type: precision_at_5 value: 9.754999999999999 - type: recall_at_1 value: 25.749 - type: recall_at_10 value: 47.791 - type: recall_at_100 value: 68.255 - type: recall_at_1000 value: 85.749 - type: recall_at_3 value: 36.199 - type: recall_at_5 value: 41.071999999999996 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 17.777 - type: map_at_10 
value: 25.201 - type: map_at_100 value: 26.423999999999996 - type: map_at_1000 value: 26.544 - type: map_at_3 value: 22.869 - type: map_at_5 value: 24.023 - type: mrr_at_1 value: 21.473 - type: mrr_at_10 value: 29.12 - type: mrr_at_100 value: 30.144 - type: mrr_at_1000 value: 30.215999999999998 - type: mrr_at_3 value: 26.933 - type: mrr_at_5 value: 28.051 - type: ndcg_at_1 value: 21.473 - type: ndcg_at_10 value: 30.003 - type: ndcg_at_100 value: 35.766 - type: ndcg_at_1000 value: 38.501000000000005 - type: ndcg_at_3 value: 25.773000000000003 - type: ndcg_at_5 value: 27.462999999999997 - type: precision_at_1 value: 21.473 - type: precision_at_10 value: 5.482 - type: precision_at_100 value: 0.975 - type: precision_at_1000 value: 0.13799999999999998 - type: precision_at_3 value: 12.205 - type: precision_at_5 value: 8.692 - type: recall_at_1 value: 17.777 - type: recall_at_10 value: 40.582 - type: recall_at_100 value: 66.305 - type: recall_at_1000 value: 85.636 - type: recall_at_3 value: 28.687 - type: recall_at_5 value: 33.089 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.677 - type: map_at_10 value: 36.309000000000005 - type: map_at_100 value: 37.403999999999996 - type: map_at_1000 value: 37.496 - type: map_at_3 value: 33.382 - type: map_at_5 value: 34.98 - type: mrr_at_1 value: 31.343 - type: mrr_at_10 value: 40.549 - type: mrr_at_100 value: 41.342 - type: mrr_at_1000 value: 41.397 - type: mrr_at_3 value: 38.029 - type: mrr_at_5 value: 39.451 - type: ndcg_at_1 value: 31.343 - type: ndcg_at_10 value: 42.1 - type: ndcg_at_100 value: 47.089999999999996 - type: ndcg_at_1000 value: 49.222 - type: ndcg_at_3 value: 36.836999999999996 - type: ndcg_at_5 value: 39.21 - type: precision_at_1 value: 31.343 - type: precision_at_10 value: 7.164 - type: precision_at_100 value: 1.0959999999999999 - type: precision_at_1000 value: 0.13899999999999998 - type: 
precision_at_3 value: 16.915 - type: precision_at_5 value: 11.940000000000001 - type: recall_at_1 value: 26.677 - type: recall_at_10 value: 55.54599999999999 - type: recall_at_100 value: 77.094 - type: recall_at_1000 value: 92.01 - type: recall_at_3 value: 41.191 - type: recall_at_5 value: 47.006 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.501 - type: map_at_10 value: 33.102 - type: map_at_100 value: 34.676 - type: map_at_1000 value: 34.888000000000005 - type: map_at_3 value: 29.944 - type: map_at_5 value: 31.613999999999997 - type: mrr_at_1 value: 29.447000000000003 - type: mrr_at_10 value: 37.996 - type: mrr_at_100 value: 38.946 - type: mrr_at_1000 value: 38.995000000000005 - type: mrr_at_3 value: 35.079 - type: mrr_at_5 value: 36.69 - type: ndcg_at_1 value: 29.447000000000003 - type: ndcg_at_10 value: 39.232 - type: ndcg_at_100 value: 45.247 - type: ndcg_at_1000 value: 47.613 - type: ndcg_at_3 value: 33.922999999999995 - type: ndcg_at_5 value: 36.284 - type: precision_at_1 value: 29.447000000000003 - type: precision_at_10 value: 7.648000000000001 - type: precision_at_100 value: 1.516 - type: precision_at_1000 value: 0.23900000000000002 - type: precision_at_3 value: 16.008 - type: precision_at_5 value: 11.779 - type: recall_at_1 value: 24.501 - type: recall_at_10 value: 51.18899999999999 - type: recall_at_100 value: 78.437 - type: recall_at_1000 value: 92.842 - type: recall_at_3 value: 35.808 - type: recall_at_5 value: 42.197 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 22.039 - type: map_at_10 value: 30.377 - type: map_at_100 value: 31.275 - type: map_at_1000 value: 31.379 - type: map_at_3 value: 27.98 - type: map_at_5 value: 29.358 - type: mrr_at_1 value: 24.03 - type: mrr_at_10 value: 32.568000000000005 
- type: mrr_at_100 value: 33.403 - type: mrr_at_1000 value: 33.475 - type: mrr_at_3 value: 30.436999999999998 - type: mrr_at_5 value: 31.796000000000003 - type: ndcg_at_1 value: 24.03 - type: ndcg_at_10 value: 35.198 - type: ndcg_at_100 value: 39.668 - type: ndcg_at_1000 value: 42.296 - type: ndcg_at_3 value: 30.709999999999997 - type: ndcg_at_5 value: 33.024 - type: precision_at_1 value: 24.03 - type: precision_at_10 value: 5.564 - type: precision_at_100 value: 0.828 - type: precision_at_1000 value: 0.117 - type: precision_at_3 value: 13.309000000000001 - type: precision_at_5 value: 9.39 - type: recall_at_1 value: 22.039 - type: recall_at_10 value: 47.746 - type: recall_at_100 value: 68.23599999999999 - type: recall_at_1000 value: 87.852 - type: recall_at_3 value: 35.852000000000004 - type: recall_at_5 value: 41.410000000000004 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 15.692999999999998 - type: map_at_10 value: 26.903 - type: map_at_100 value: 28.987000000000002 - type: map_at_1000 value: 29.176999999999996 - type: map_at_3 value: 22.137 - type: map_at_5 value: 24.758 - type: mrr_at_1 value: 35.57 - type: mrr_at_10 value: 47.821999999999996 - type: mrr_at_100 value: 48.608000000000004 - type: mrr_at_1000 value: 48.638999999999996 - type: mrr_at_3 value: 44.452000000000005 - type: mrr_at_5 value: 46.546 - type: ndcg_at_1 value: 35.57 - type: ndcg_at_10 value: 36.567 - type: ndcg_at_100 value: 44.085 - type: ndcg_at_1000 value: 47.24 - type: ndcg_at_3 value: 29.964000000000002 - type: ndcg_at_5 value: 32.511 - type: precision_at_1 value: 35.57 - type: precision_at_10 value: 11.485 - type: precision_at_100 value: 1.9619999999999997 - type: precision_at_1000 value: 0.256 - type: precision_at_3 value: 22.237000000000002 - type: precision_at_5 value: 17.471999999999998 - type: recall_at_1 value: 15.692999999999998 - type: recall_at_10 value: 43.056 - type: 
recall_at_100 value: 68.628 - type: recall_at_1000 value: 86.075 - type: recall_at_3 value: 26.918999999999997 - type: recall_at_5 value: 34.14 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 9.53 - type: map_at_10 value: 20.951 - type: map_at_100 value: 30.136000000000003 - type: map_at_1000 value: 31.801000000000002 - type: map_at_3 value: 15.021 - type: map_at_5 value: 17.471999999999998 - type: mrr_at_1 value: 71.0 - type: mrr_at_10 value: 79.176 - type: mrr_at_100 value: 79.418 - type: mrr_at_1000 value: 79.426 - type: mrr_at_3 value: 78.125 - type: mrr_at_5 value: 78.61200000000001 - type: ndcg_at_1 value: 58.5 - type: ndcg_at_10 value: 44.106 - type: ndcg_at_100 value: 49.268 - type: ndcg_at_1000 value: 56.711999999999996 - type: ndcg_at_3 value: 48.934 - type: ndcg_at_5 value: 45.826 - type: precision_at_1 value: 71.0 - type: precision_at_10 value: 35.0 - type: precision_at_100 value: 11.360000000000001 - type: precision_at_1000 value: 2.046 - type: precision_at_3 value: 52.833 - type: precision_at_5 value: 44.15 - type: recall_at_1 value: 9.53 - type: recall_at_10 value: 26.811 - type: recall_at_100 value: 55.916999999999994 - type: recall_at_1000 value: 79.973 - type: recall_at_3 value: 16.413 - type: recall_at_5 value: 19.980999999999998 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 51.519999999999996 - type: f1 value: 46.36601294761231 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 74.413 - type: map_at_10 value: 83.414 - type: map_at_100 value: 83.621 - type: map_at_1000 value: 83.635 - type: map_at_3 value: 82.337 - type: map_at_5 value: 83.039 - type: mrr_at_1 value: 80.19800000000001 - type: mrr_at_10 value: 
87.715 - type: mrr_at_100 value: 87.778 - type: mrr_at_1000 value: 87.779 - type: mrr_at_3 value: 87.106 - type: mrr_at_5 value: 87.555 - type: ndcg_at_1 value: 80.19800000000001 - type: ndcg_at_10 value: 87.182 - type: ndcg_at_100 value: 87.90299999999999 - type: ndcg_at_1000 value: 88.143 - type: ndcg_at_3 value: 85.60600000000001 - type: ndcg_at_5 value: 86.541 - type: precision_at_1 value: 80.19800000000001 - type: precision_at_10 value: 10.531 - type: precision_at_100 value: 1.113 - type: precision_at_1000 value: 0.11499999999999999 - type: precision_at_3 value: 32.933 - type: precision_at_5 value: 20.429 - type: recall_at_1 value: 74.413 - type: recall_at_10 value: 94.363 - type: recall_at_100 value: 97.165 - type: recall_at_1000 value: 98.668 - type: recall_at_3 value: 90.108 - type: recall_at_5 value: 92.52 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 22.701 - type: map_at_10 value: 37.122 - type: map_at_100 value: 39.178000000000004 - type: map_at_1000 value: 39.326 - type: map_at_3 value: 32.971000000000004 - type: map_at_5 value: 35.332 - type: mrr_at_1 value: 44.753 - type: mrr_at_10 value: 53.452 - type: mrr_at_100 value: 54.198 - type: mrr_at_1000 value: 54.225 - type: mrr_at_3 value: 50.952 - type: mrr_at_5 value: 52.464 - type: ndcg_at_1 value: 44.753 - type: ndcg_at_10 value: 45.021 - type: ndcg_at_100 value: 52.028 - type: ndcg_at_1000 value: 54.596000000000004 - type: ndcg_at_3 value: 41.622 - type: ndcg_at_5 value: 42.736000000000004 - type: precision_at_1 value: 44.753 - type: precision_at_10 value: 12.284 - type: precision_at_100 value: 1.955 - type: precision_at_1000 value: 0.243 - type: precision_at_3 value: 27.828999999999997 - type: precision_at_5 value: 20.061999999999998 - type: recall_at_1 value: 22.701 - type: recall_at_10 value: 51.432 - type: recall_at_100 value: 77.009 - type: recall_at_1000 value: 92.511 - type: recall_at_3 value: 
37.919000000000004 - type: recall_at_5 value: 44.131 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 40.189 - type: map_at_10 value: 66.24600000000001 - type: map_at_100 value: 67.098 - type: map_at_1000 value: 67.149 - type: map_at_3 value: 62.684 - type: map_at_5 value: 64.974 - type: mrr_at_1 value: 80.378 - type: mrr_at_10 value: 86.127 - type: mrr_at_100 value: 86.29299999999999 - type: mrr_at_1000 value: 86.297 - type: mrr_at_3 value: 85.31400000000001 - type: mrr_at_5 value: 85.858 - type: ndcg_at_1 value: 80.378 - type: ndcg_at_10 value: 74.101 - type: ndcg_at_100 value: 76.993 - type: ndcg_at_1000 value: 77.948 - type: ndcg_at_3 value: 69.232 - type: ndcg_at_5 value: 72.04599999999999 - type: precision_at_1 value: 80.378 - type: precision_at_10 value: 15.595999999999998 - type: precision_at_100 value: 1.7840000000000003 - type: precision_at_1000 value: 0.191 - type: precision_at_3 value: 44.884 - type: precision_at_5 value: 29.145 - type: recall_at_1 value: 40.189 - type: recall_at_10 value: 77.981 - type: recall_at_100 value: 89.21 - type: recall_at_1000 value: 95.48299999999999 - type: recall_at_3 value: 67.326 - type: recall_at_5 value: 72.863 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 92.84599999999999 - type: ap value: 89.4710787567357 - type: f1 value: 92.83752676932258 - task: type: Retrieval dataset: type: msmarco name: MTEB MSMARCO config: default split: dev revision: None metrics: - type: map_at_1 value: 23.132 - type: map_at_10 value: 35.543 - type: map_at_100 value: 36.702 - type: map_at_1000 value: 36.748999999999995 - type: map_at_3 value: 31.737 - type: map_at_5 value: 33.927 - type: mrr_at_1 value: 23.782 - type: mrr_at_10 value: 36.204 - type: mrr_at_100 value: 37.29 - type: mrr_at_1000 value: 
37.330999999999996 - type: mrr_at_3 value: 32.458999999999996 - type: mrr_at_5 value: 34.631 - type: ndcg_at_1 value: 23.782 - type: ndcg_at_10 value: 42.492999999999995 - type: ndcg_at_100 value: 47.985 - type: ndcg_at_1000 value: 49.141 - type: ndcg_at_3 value: 34.748000000000005 - type: ndcg_at_5 value: 38.651 - type: precision_at_1 value: 23.782 - type: precision_at_10 value: 6.665 - type: precision_at_100 value: 0.941 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 14.776 - type: precision_at_5 value: 10.84 - type: recall_at_1 value: 23.132 - type: recall_at_10 value: 63.794 - type: recall_at_100 value: 89.027 - type: recall_at_1000 value: 97.807 - type: recall_at_3 value: 42.765 - type: recall_at_5 value: 52.11 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 94.59188326493388 - type: f1 value: 94.3842594786827 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 79.49384404924761 - type: f1 value: 59.7580539534629 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 77.56220578345663 - type: f1 value: 75.27228165561478 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 80.53463349024884 - type: f1 value: 80.4893958236536 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: 
e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 32.56100273484962 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 31.470380028839607 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 32.06102792457849 - type: mrr value: 33.30709199672238 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 6.776999999999999 - type: map_at_10 value: 14.924000000000001 - type: map_at_100 value: 18.955 - type: map_at_1000 value: 20.538999999999998 - type: map_at_3 value: 10.982 - type: map_at_5 value: 12.679000000000002 - type: mrr_at_1 value: 47.988 - type: mrr_at_10 value: 57.232000000000006 - type: mrr_at_100 value: 57.818999999999996 - type: mrr_at_1000 value: 57.847 - type: mrr_at_3 value: 54.901999999999994 - type: mrr_at_5 value: 56.481 - type: ndcg_at_1 value: 46.594 - type: ndcg_at_10 value: 38.129000000000005 - type: ndcg_at_100 value: 35.54 - type: ndcg_at_1000 value: 44.172 - type: ndcg_at_3 value: 43.025999999999996 - type: ndcg_at_5 value: 41.052 - type: precision_at_1 value: 47.988 - type: precision_at_10 value: 28.111000000000004 - type: precision_at_100 value: 8.929 - type: precision_at_1000 value: 2.185 - type: precision_at_3 value: 40.144000000000005 - type: precision_at_5 value: 35.232 - type: recall_at_1 value: 6.776999999999999 - type: recall_at_10 value: 19.289 - type: recall_at_100 value: 36.359 - type: recall_at_1000 value: 67.54 - type: recall_at_3 value: 11.869 - type: recall_at_5 value: 14.999 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 
31.108000000000004 - type: map_at_10 value: 47.126000000000005 - type: map_at_100 value: 48.171 - type: map_at_1000 value: 48.199 - type: map_at_3 value: 42.734 - type: map_at_5 value: 45.362 - type: mrr_at_1 value: 34.936 - type: mrr_at_10 value: 49.571 - type: mrr_at_100 value: 50.345 - type: mrr_at_1000 value: 50.363 - type: mrr_at_3 value: 45.959 - type: mrr_at_5 value: 48.165 - type: ndcg_at_1 value: 34.936 - type: ndcg_at_10 value: 55.028999999999996 - type: ndcg_at_100 value: 59.244 - type: ndcg_at_1000 value: 59.861 - type: ndcg_at_3 value: 46.872 - type: ndcg_at_5 value: 51.217999999999996 - type: precision_at_1 value: 34.936 - type: precision_at_10 value: 9.099 - type: precision_at_100 value: 1.145 - type: precision_at_1000 value: 0.12 - type: precision_at_3 value: 21.456 - type: precision_at_5 value: 15.411 - type: recall_at_1 value: 31.108000000000004 - type: recall_at_10 value: 76.53999999999999 - type: recall_at_100 value: 94.39 - type: recall_at_1000 value: 98.947 - type: recall_at_3 value: 55.572 - type: recall_at_5 value: 65.525 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 71.56400000000001 - type: map_at_10 value: 85.482 - type: map_at_100 value: 86.114 - type: map_at_1000 value: 86.13 - type: map_at_3 value: 82.607 - type: map_at_5 value: 84.405 - type: mrr_at_1 value: 82.42 - type: mrr_at_10 value: 88.304 - type: mrr_at_100 value: 88.399 - type: mrr_at_1000 value: 88.399 - type: mrr_at_3 value: 87.37 - type: mrr_at_5 value: 88.024 - type: ndcg_at_1 value: 82.45 - type: ndcg_at_10 value: 89.06500000000001 - type: ndcg_at_100 value: 90.232 - type: ndcg_at_1000 value: 90.305 - type: ndcg_at_3 value: 86.375 - type: ndcg_at_5 value: 87.85300000000001 - type: precision_at_1 value: 82.45 - type: precision_at_10 value: 13.486999999999998 - type: precision_at_100 value: 1.534 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 37.813 - 
type: precision_at_5 value: 24.773999999999997 - type: recall_at_1 value: 71.56400000000001 - type: recall_at_10 value: 95.812 - type: recall_at_100 value: 99.7 - type: recall_at_1000 value: 99.979 - type: recall_at_3 value: 87.966 - type: recall_at_5 value: 92.268 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 57.241876648614145 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 64.66212576446223 - task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 5.308 - type: map_at_10 value: 13.803 - type: map_at_100 value: 16.176 - type: map_at_1000 value: 16.561 - type: map_at_3 value: 9.761000000000001 - type: map_at_5 value: 11.802 - type: mrr_at_1 value: 26.200000000000003 - type: mrr_at_10 value: 37.621 - type: mrr_at_100 value: 38.767 - type: mrr_at_1000 value: 38.815 - type: mrr_at_3 value: 34.117 - type: mrr_at_5 value: 36.107 - type: ndcg_at_1 value: 26.200000000000003 - type: ndcg_at_10 value: 22.64 - type: ndcg_at_100 value: 31.567 - type: ndcg_at_1000 value: 37.623 - type: ndcg_at_3 value: 21.435000000000002 - type: ndcg_at_5 value: 18.87 - type: precision_at_1 value: 26.200000000000003 - type: precision_at_10 value: 11.74 - type: precision_at_100 value: 2.465 - type: precision_at_1000 value: 0.391 - type: precision_at_3 value: 20.033 - type: precision_at_5 value: 16.64 - type: recall_at_1 value: 5.308 - type: recall_at_10 value: 23.794999999999998 - type: recall_at_100 value: 50.015 - type: recall_at_1000 value: 79.283 - type: recall_at_3 value: 12.178 - type: recall_at_5 value: 16.882 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: 
default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 84.93231134675553 - type: cos_sim_spearman value: 81.68319292603205 - type: euclidean_pearson value: 81.8396814380367 - type: euclidean_spearman value: 81.24641903349945 - type: manhattan_pearson value: 81.84698799204274 - type: manhattan_spearman value: 81.24269997904105 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 86.73241671587446 - type: cos_sim_spearman value: 79.05091082971826 - type: euclidean_pearson value: 83.91146869578044 - type: euclidean_spearman value: 79.87978465370936 - type: manhattan_pearson value: 83.90888338917678 - type: manhattan_spearman value: 79.87482848584241 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 85.14970731146177 - type: cos_sim_spearman value: 86.37363490084627 - type: euclidean_pearson value: 83.02154218530433 - type: euclidean_spearman value: 83.80258761957367 - type: manhattan_pearson value: 83.01664495119347 - type: manhattan_spearman value: 83.77567458007952 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 83.40474139886784 - type: cos_sim_spearman value: 82.77768789165984 - type: euclidean_pearson value: 80.7065877443695 - type: euclidean_spearman value: 81.375940662505 - type: manhattan_pearson value: 80.6507552270278 - type: manhattan_spearman value: 81.32782179098741 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 87.08585968722274 - type: cos_sim_spearman value: 
88.03110031451399 - type: euclidean_pearson value: 85.74012019602384 - type: euclidean_spearman value: 86.13592849438209 - type: manhattan_pearson value: 85.74404842369206 - type: manhattan_spearman value: 86.14492318960154 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 84.95069052788875 - type: cos_sim_spearman value: 86.4867991595147 - type: euclidean_pearson value: 84.31013325754635 - type: euclidean_spearman value: 85.01529258006482 - type: manhattan_pearson value: 84.26995570085374 - type: manhattan_spearman value: 84.96982104986162 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 87.54617647971897 - type: cos_sim_spearman value: 87.49834181751034 - type: euclidean_pearson value: 86.01015322577122 - type: euclidean_spearman value: 84.63362652063199 - type: manhattan_pearson value: 86.13807574475706 - type: manhattan_spearman value: 84.7772370721132 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 67.20047755786615 - type: cos_sim_spearman value: 67.05324077987636 - type: euclidean_pearson value: 66.91930642976601 - type: euclidean_spearman value: 65.21491856099105 - type: manhattan_pearson value: 66.78756851976624 - type: manhattan_spearman value: 65.12356257740728 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 86.19852871539686 - type: cos_sim_spearman value: 87.5161895296395 - type: euclidean_pearson value: 84.59848645207485 - type: euclidean_spearman value: 85.26427328757919 - 
type: manhattan_pearson value: 84.59747366996524 - type: manhattan_spearman value: 85.24045855146915 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 87.63320317811032 - type: mrr value: 96.26242947321379 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 60.928000000000004 - type: map_at_10 value: 70.112 - type: map_at_100 value: 70.59299999999999 - type: map_at_1000 value: 70.623 - type: map_at_3 value: 66.846 - type: map_at_5 value: 68.447 - type: mrr_at_1 value: 64.0 - type: mrr_at_10 value: 71.212 - type: mrr_at_100 value: 71.616 - type: mrr_at_1000 value: 71.64500000000001 - type: mrr_at_3 value: 68.77799999999999 - type: mrr_at_5 value: 70.094 - type: ndcg_at_1 value: 64.0 - type: ndcg_at_10 value: 74.607 - type: ndcg_at_100 value: 76.416 - type: ndcg_at_1000 value: 77.102 - type: ndcg_at_3 value: 69.126 - type: ndcg_at_5 value: 71.41300000000001 - type: precision_at_1 value: 64.0 - type: precision_at_10 value: 9.933 - type: precision_at_100 value: 1.077 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 26.556 - type: precision_at_5 value: 17.467 - type: recall_at_1 value: 60.928000000000004 - type: recall_at_10 value: 87.322 - type: recall_at_100 value: 94.833 - type: recall_at_1000 value: 100.0 - type: recall_at_3 value: 72.628 - type: recall_at_5 value: 78.428 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.86237623762376 - type: cos_sim_ap value: 96.72586477206649 - type: cos_sim_f1 value: 93.01858362631845 - type: cos_sim_precision value: 93.4409687184662 - type: cos_sim_recall value: 
92.60000000000001 - type: dot_accuracy value: 99.78019801980199 - type: dot_ap value: 93.72748205246228 - type: dot_f1 value: 89.04109589041096 - type: dot_precision value: 87.16475095785441 - type: dot_recall value: 91.0 - type: euclidean_accuracy value: 99.85445544554456 - type: euclidean_ap value: 96.6661459876145 - type: euclidean_f1 value: 92.58337481333997 - type: euclidean_precision value: 92.17046580773042 - type: euclidean_recall value: 93.0 - type: manhattan_accuracy value: 99.85445544554456 - type: manhattan_ap value: 96.6883549244056 - type: manhattan_f1 value: 92.57598405580468 - type: manhattan_precision value: 92.25422045680239 - type: manhattan_recall value: 92.9 - type: max_accuracy value: 99.86237623762376 - type: max_ap value: 96.72586477206649 - type: max_f1 value: 93.01858362631845 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 66.39930057069995 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 34.96398659903402 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 55.946944700355395 - type: mrr value: 56.97151398438164 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 31.541657650692905 - type: cos_sim_spearman value: 31.605804192286303 - type: dot_pearson value: 28.26905996736398 - type: dot_spearman value: 27.864801765851187 - task: type: Retrieval dataset: type: trec-covid name: MTEB 
TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.22599999999999998 - type: map_at_10 value: 1.8870000000000002 - type: map_at_100 value: 9.78 - type: map_at_1000 value: 22.514 - type: map_at_3 value: 0.6669999999999999 - type: map_at_5 value: 1.077 - type: mrr_at_1 value: 82.0 - type: mrr_at_10 value: 89.86699999999999 - type: mrr_at_100 value: 89.86699999999999 - type: mrr_at_1000 value: 89.86699999999999 - type: mrr_at_3 value: 89.667 - type: mrr_at_5 value: 89.667 - type: ndcg_at_1 value: 79.0 - type: ndcg_at_10 value: 74.818 - type: ndcg_at_100 value: 53.715999999999994 - type: ndcg_at_1000 value: 47.082 - type: ndcg_at_3 value: 82.134 - type: ndcg_at_5 value: 79.81899999999999 - type: precision_at_1 value: 82.0 - type: precision_at_10 value: 78.0 - type: precision_at_100 value: 54.48 - type: precision_at_1000 value: 20.518 - type: precision_at_3 value: 87.333 - type: precision_at_5 value: 85.2 - type: recall_at_1 value: 0.22599999999999998 - type: recall_at_10 value: 2.072 - type: recall_at_100 value: 13.013 - type: recall_at_1000 value: 43.462 - type: recall_at_3 value: 0.695 - type: recall_at_5 value: 1.139 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 2.328 - type: map_at_10 value: 9.795 - type: map_at_100 value: 15.801000000000002 - type: map_at_1000 value: 17.23 - type: map_at_3 value: 4.734 - type: map_at_5 value: 6.644 - type: mrr_at_1 value: 30.612000000000002 - type: mrr_at_10 value: 46.902 - type: mrr_at_100 value: 47.495 - type: mrr_at_1000 value: 47.495 - type: mrr_at_3 value: 41.156 - type: mrr_at_5 value: 44.218 - type: ndcg_at_1 value: 28.571 - type: ndcg_at_10 value: 24.806 - type: ndcg_at_100 value: 36.419000000000004 - type: ndcg_at_1000 value: 47.272999999999996 - type: ndcg_at_3 value: 25.666 - type: ndcg_at_5 value: 25.448999999999998 - type: precision_at_1 value: 30.612000000000002 - type: 
precision_at_10 value: 23.061 - type: precision_at_100 value: 7.714 - type: precision_at_1000 value: 1.484 - type: precision_at_3 value: 26.531 - type: precision_at_5 value: 26.122 - type: recall_at_1 value: 2.328 - type: recall_at_10 value: 16.524 - type: recall_at_100 value: 47.179 - type: recall_at_1000 value: 81.22200000000001 - type: recall_at_3 value: 5.745 - type: recall_at_5 value: 9.339 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 70.9142 - type: ap value: 14.335574772555415 - type: f1 value: 54.62839595194111 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 59.94340690435768 - type: f1 value: 60.286487936731916 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 51.26597708987974 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 87.48882398521786 - type: cos_sim_ap value: 79.04326607602204 - type: cos_sim_f1 value: 71.64566826860633 - type: cos_sim_precision value: 70.55512918905092 - type: cos_sim_recall value: 72.77044854881267 - type: dot_accuracy value: 84.19264469213805 - type: dot_ap value: 67.96360043562528 - type: dot_f1 value: 64.06418393006827 - type: dot_precision value: 58.64941898706424 - type: dot_recall value: 70.58047493403694 - type: euclidean_accuracy value: 87.45902127913214 - type: euclidean_ap value: 
78.9742237648272 - type: euclidean_f1 value: 71.5553235908142 - type: euclidean_precision value: 70.77955601445535 - type: euclidean_recall value: 72.34828496042216 - type: manhattan_accuracy value: 87.41729749061214 - type: manhattan_ap value: 78.90073137580596 - type: manhattan_f1 value: 71.3942611553533 - type: manhattan_precision value: 68.52705653967483 - type: manhattan_recall value: 74.51187335092348 - type: max_accuracy value: 87.48882398521786 - type: max_ap value: 79.04326607602204 - type: max_f1 value: 71.64566826860633 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 88.68125897465751 - type: cos_sim_ap value: 85.6003454431979 - type: cos_sim_f1 value: 77.6957163958641 - type: cos_sim_precision value: 73.0110366307807 - type: cos_sim_recall value: 83.02279026793964 - type: dot_accuracy value: 87.7672992587418 - type: dot_ap value: 82.4971301112899 - type: dot_f1 value: 75.90528233151184 - type: dot_precision value: 72.0370626469368 - type: dot_recall value: 80.21250384970742 - type: euclidean_accuracy value: 88.4503434625684 - type: euclidean_ap value: 84.91949884748384 - type: euclidean_f1 value: 76.92365018444684 - type: euclidean_precision value: 74.53245721712759 - type: euclidean_recall value: 79.47336002463813 - type: manhattan_accuracy value: 88.47556952691427 - type: manhattan_ap value: 84.8963689101517 - type: manhattan_f1 value: 76.85901249256395 - type: manhattan_precision value: 74.31693989071039 - type: manhattan_recall value: 79.58115183246073 - type: max_accuracy value: 88.68125897465751 - type: max_ap value: 85.6003454431979 - type: max_f1 value: 77.6957163958641 license: mit language: - en --- <h1 align=\"center\">FlagEmbedding</h1> <h4 align=\"center\"> <p> <a href=#model-list>Model List</a> | <a href=#frequently-asked-questions>FAQ</a> | <a 
href=#usage>Usage</a> | <a href=\"#evaluation\">Evaluation</a> | <a href=\"#train\">Train</a> | <a href=\"#contact\">Contact</a> | <a href=\"#citation\">Citation</a> | <a href=\"#license\">License</a> <p> </h4> For more details please refer to our Github: FlagEmbedding. If you are looking for a model that supports more languages, longer texts, and other retrieval methods, you can try using bge-m3. English | 中文 FlagEmbedding focuses on retrieval-augmented LLMs, consisting of the following projects currently: - **Long-Context LLM**: Activation Beacon - **Fine-tuning of LM** : LM-Cocktail - **Dense Retrieval**: BGE-M3, LLM Embedder, BGE Embedding - **Reranker Model**: BGE Reranker - **Benchmark**: C-MTEB ## News - 1/30/2024: Release **BGE-M3**, a new member to BGE model series! M3 stands for **M**ulti-linguality (100+ languages), **M**ulti-granularities (input length up to 8192), **M**ulti-Functionality (unification of dense, lexical, multi-vec/colbert retrieval). It is the first embedding model that supports all three retrieval methods, achieving new SOTA on multi-lingual (MIRACL) and cross-lingual (MKQA) benchmarks. Technical Report and Code. :fire: - 1/9/2024: Release Activation-Beacon, an effective, efficient, compatible, and low-cost (training) method to extend the context length of LLM. Technical Report :fire: - 12/24/2023: Release **LLaRA**, a LLaMA-7B based dense retriever, leading to state-of-the-art performances on MS MARCO and BEIR. Model and code will be open-sourced. Please stay tuned. Technical Report :fire: - 11/23/2023: Release LM-Cocktail, a method to maintain general capabilities during fine-tuning by merging multiple language models. Technical Report :fire: - 10/12/2023: Release LLM-Embedder, a unified embedding model to support diverse retrieval augmentation needs for LLMs. 
Technical Report - 09/15/2023: The technical report and massive training data of BGE has been released - 09/12/2023: New models: - **New reranker model**: release cross-encoder models and , which are more powerful than embedding model. We recommend to use/fine-tune them to re-rank top-k documents returned by embedding models. - **update embedding model**: release embedding model to alleviate the issue of the similarity distribution, and enhance its retrieval ability without instruction. <details> <summary>More</summary> <!-- ### More --> - 09/07/2023: Update fine-tune code: Add script to mine hard negatives and support adding instruction during fine-tuning. - 08/09/2023: BGE Models are integrated into **Langchain**, you can use it like this; C-MTEB **leaderboard** is available. - 08/05/2023: Release base-scale and small-scale models, **best performance among the models of the same size 🤗** - 08/02/2023: Release (short for BAAI General Embedding) Models, **rank 1st on MTEB and C-MTEB benchmark!** :tada: :tada: - 08/01/2023: We release the Chinese Massive Text Embedding Benchmark (**C-MTEB**), consisting of 31 test dataset. </details> ## Model List is short for . 
| Model | Language | | Description | query instruction for retrieval [1] | |:-------------------------------|:--------:| :--------:| :--------:|:--------:| | BAAI/bge-m3 | Multilingual | Inference Fine-tune | Multi-Functionality(dense retrieval, sparse retrieval, multi-vector(colbert)), Multi-Linguality, and Multi-Granularity(8192 tokens) | | | BAAI/llm-embedder | English | Inference Fine-tune | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See README | | BAAI/bge-reranker-large | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-reranker-base | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-large-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-en | English | Inference Fine-tune | :trophy: rank **1st** in MTEB leaderboard | | | BAAI/bge-base-en | English | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-en | English | Inference Fine-tune |a small-scale model but with competitive performance | | | BAAI/bge-large-zh | Chinese | Inference Fine-tune | :trophy: rank **1st** in C-MTEB benchmark | | | BAAI/bge-base-zh | Chinese | Inference Fine-tune | a base-scale model but with 
similar ability to | | | BAAI/bge-small-zh | Chinese | Inference Fine-tune | a small-scale model but with competitive performance | | [1\\]: If you need to search the relevant passages to a query, we suggest to add the instruction to the query; in other cases, no instruction is needed, just use the original query directly. In all cases, **no instruction** needs to be added to passages. [2\\]: Different from embedding model, reranker uses question and document as input and directly output similarity instead of embedding. To balance the accuracy and time cost, cross-encoder is widely used to re-rank top-k documents retrieved by other simple models. For examples, use bge embedding model to retrieve top 100 relevant documents, and then use bge reranker to re-rank the top 100 document to get the final top-3 results. All models have been uploaded to Huggingface Hub, and you can see them at If you cannot open the Huggingface Hub, you also can download the models at . ## Frequently asked questions <details> <summary>1. How to fine-tune bge embedding model?</summary> <!-- ### How to fine-tune bge embedding model? --> Following this example to prepare data and fine-tune your model. Some suggestions: - Mine hard negatives following this example, which can improve the retrieval performance. - If you pre-train bge on your data, the pre-trained model cannot be directly used to calculate similarity, and it must be fine-tuned with contrastive learning before computing similarity. - If the accuracy of the fine-tuned model is still not high, it is recommended to use/fine-tune the cross-encoder model (bge-reranker) to re-rank top-k results. Hard negatives also are needed to fine-tune reranker. </details> <details> <summary>2. 
The similarity score between two dissimilar sentences is higher than 0.5</summary> <!-- ### The similarity score between two dissimilar sentences is higher than 0.5 --> **Suggest to use bge v1.5, which alleviates the issue of the similarity distribution.** Since we finetune the models by contrastive learning with a temperature of 0.01, the similarity distribution of the current BGE model is about in the interval \\[0.6, 1\\]. So a similarity score greater than 0.5 does not indicate that the two sentences are similar. For downstream tasks, such as passage retrieval or semantic similarity, **what matters is the relative order of the scores, not the absolute value.** If you need to filter similar sentences based on a similarity threshold, please select an appropriate similarity threshold based on the similarity distribution on your data (such as 0.8, 0.85, or even 0.9). </details> <details> <summary>3. When does the query instruction need to be used</summary> <!-- ### When does the query instruction need to be used --> For the , we improve its retrieval ability when not using instruction. No instruction only has a slight degradation in retrieval performance compared with using instruction. So you can generate embedding without instruction in all cases for convenience. For a retrieval task that uses short queries to find long related documents, it is recommended to add instructions for these short queries. **The best method to decide whether to add instructions for queries is choosing the setting that achieves better performance on your task.** In all cases, the documents/passages do not need to add the instruction. </details> ## Usage ### Usage for Embedding Model Here are some examples for using models with FlagEmbedding, Sentence-Transformers, Langchain, or Huggingface Transformers. #### Using FlagEmbedding If it doesn't work for you, you can see FlagEmbedding for more methods to install FlagEmbedding. For the value of the argument , see Model List. 
By default, FlagModel will use all available GPUs when encoding. Please set to select specific GPUs. You can also set to make all GPUs unavailable. #### Using Sentence-Transformers You can also use the models with sentence-transformers: For s2p (short query to long passage) retrieval tasks, each short query should start with an instruction (for the instructions, see Model List). The instruction is not needed for passages. #### Using Langchain You can use in langchain like this: #### Using HuggingFace Transformers With the transformers package, you can use the model like this: First, you pass your input through the transformer model, then you select the last hidden state of the first token (i.e., [CLS]) as the sentence embedding. #### Usage of the ONNX files It is also possible to deploy the ONNX files with the infinity_emb pip package. ### Usage for Reranker Unlike an embedding model, a reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. You can get a relevance score by passing a query and a passage to the reranker. The reranker is optimized with a cross-entropy loss, so the relevance score is not bounded to a specific range. #### Using FlagEmbedding Get relevance scores (higher scores indicate more relevance): #### Using Huggingface transformers ## Evaluation models achieve **state-of-the-art performance on both MTEB and C-MTEB leaderboard!** For more details and evaluation tools see our scripts. 
- **MTEB**: | Model Name | Dimension | Sequence Length | Average (56) | Retrieval (15) |Clustering (11) | Pair Classification (3) | Reranking (4) | STS (10) | Summarization (1) | Classification (12) | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | BAAI/bge-large-en-v1.5 | 1024 | 512 | **64.23** | **54.29** | 46.08 | 87.12 | 60.03 | 83.11 | 31.61 | 75.97 | | BAAI/bge-base-en-v1.5 | 768 | 512 | 63.55 | 53.25 | 45.77 | 86.55 | 58.86 | 82.4 | 31.07 | 75.53 | | BAAI/bge-small-en-v1.5 | 384 | 512 | 62.17 |51.68 | 43.82 | 84.92 | 58.36 | 81.59 | 30.12 | 74.14 | | bge-large-en | 1024 | 512 | 63.98 | 53.9 | 46.98 | 85.8 | 59.48 | 81.56 | 32.06 | 76.21 | | bge-base-en | 768 | 512 | 63.36 | 53.0 | 46.32 | 85.86 | 58.7 | 81.84 | 29.27 | 75.27 | | gte-large | 1024 | 512 | 63.13 | 52.22 | 46.84 | 85.00 | 59.13 | 83.35 | 31.66 | 73.33 | | gte-base | 768 | 512 | 62.39 | 51.14 | 46.2 | 84.57 | 58.61 | 82.3 | 31.17 | 73.01 | | e5-large-v2 | 1024| 512 | 62.25 | 50.56 | 44.49 | 86.03 | 56.61 | 82.05 | 30.19 | 75.24 | | bge-small-en | 384 | 512 | 62.11 | 51.82 | 44.31 | 83.78 | 57.97 | 80.72 | 30.53 | 74.37 | | instructor-xl | 768 | 512 | 61.79 | 49.26 | 44.74 | 86.62 | 57.29 | 83.06 | 32.32 | 61.79 | | e5-base-v2 | 768 | 512 | 61.5 | 50.29 | 43.80 | 85.73 | 55.91 | 81.05 | 30.28 | 73.84 | | gte-small | 384 | 512 | 61.36 | 49.46 | 44.89 | 83.54 | 57.7 | 82.07 | 30.42 | 72.31 | | text-embedding-ada-002 | 1536 | 8192 | 60.99 | 49.25 | 45.9 | 84.89 | 56.32 | 80.97 | 30.8 | 70.93 | | e5-small-v2 | 384 | 512 | 59.93 | 49.04 | 39.92 | 84.67 | 54.32 | 80.39 | 31.16 | 72.94 | | sentence-t5-xxl | 768 | 512 | 59.51 | 42.24 | 43.72 | 85.06 | 56.42 | 82.63 | 30.08 | 73.42 | | all-mpnet-base-v2 | 768 | 514 | 57.78 | 43.81 | 43.69 | 83.04 | 59.36 | 80.28 | 27.49 | 65.07 | | sgpt-bloom-7b1-msmarco | 4096 | 2048 | 57.59 | 48.22 | 38.93 | 81.9 | 55.65 | 77.74 | 33.6 | 66.19 | - **C-MTEB**: We create the benchmark C-MTEB for Chinese text embedding which consists of 31 datasets 
from 6 tasks. Please refer to C_MTEB for a detailed introduction. | Model | Embedding dimension | Avg | Retrieval | STS | PairClassification | Classification | Reranking | Clustering | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | **BAAI/bge-large-zh-v1.5** | 1024 | **64.53** | 70.46 | 56.25 | 81.6 | 69.13 | 65.84 | 48.99 | | BAAI/bge-base-zh-v1.5 | 768 | 63.13 | 69.49 | 53.72 | 79.75 | 68.07 | 65.39 | 47.53 | | BAAI/bge-small-zh-v1.5 | 512 | 57.82 | 61.77 | 49.11 | 70.41 | 63.96 | 60.92 | 44.18 | | BAAI/bge-large-zh | 1024 | 64.20 | 71.53 | 54.98 | 78.94 | 68.32 | 65.11 | 48.39 | | bge-large-zh-noinstruct | 1024 | 63.53 | 70.55 | 53 | 76.77 | 68.58 | 64.91 | 50.01 | | BAAI/bge-base-zh | 768 | 62.96 | 69.53 | 54.12 | 77.5 | 67.07 | 64.91 | 47.63 | | multilingual-e5-large | 1024 | 58.79 | 63.66 | 48.44 | 69.89 | 67.34 | 56.00 | 48.23 | | BAAI/bge-small-zh | 512 | 58.27 | 63.07 | 49.45 | 70.35 | 63.64 | 61.48 | 45.09 | | m3e-base | 768 | 57.10 | 56.91 | 50.47 | 63.99 | 67.52 | 59.34 | 47.68 | | m3e-large | 1024 | 57.05 | 54.75 | 50.42 | 64.3 | 68.2 | 59.66 | 48.88 | | multilingual-e5-base | 768 | 55.48 | 61.63 | 46.49 | 67.07 | 65.35 | 54.35 | 40.68 | | multilingual-e5-small | 384 | 55.38 | 59.95 | 45.27 | 66.45 | 65.85 | 53.86 | 45.26 | | text-embedding-ada-002(OpenAI) | 1536 | 53.02 | 52.0 | 43.35 | 69.56 | 64.31 | 54.28 | 45.68 | | luotuo | 1024 | 49.37 | 44.4 | 42.78 | 66.62 | 61 | 49.25 | 44.39 | | text2vec-base | 768 | 47.63 | 38.79 | 43.41 | 67.41 | 62.19 | 49.45 | 37.66 | | text2vec-large | 1024 | 47.36 | 41.94 | 44.97 | 70.86 | 60.66 | 49.16 | 30.02 | - **Reranking**: See C_MTEB for evaluation script. 
| Model | T2Reranking | T2RerankingZh2En\\* | T2RerankingEn2Zh\\* | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | text2vec-base-multilingual | 64.66 | 62.94 | 62.51 | 14.37 | 48.46 | 48.6 | 50.26 | | multilingual-e5-small | 65.62 | 60.94 | 56.41 | 29.91 | 67.26 | 66.54 | 57.78 | | multilingual-e5-large | 64.55 | 61.61 | 54.28 | 28.6 | 67.42 | 67.92 | 57.4 | | multilingual-e5-base | 64.21 | 62.13 | 54.68 | 29.5 | 66.23 | 66.98 | 57.29 | | m3e-base | 66.03 | 62.74 | 56.07 | 17.51 | 77.05 | 76.76 | 59.36 | | m3e-large | 66.13 | 62.72 | 56.1 | 16.46 | 77.76 | 78.27 | 59.57 | | bge-base-zh-v1.5 | 66.49 | 63.25 | 57.02 | 29.74 | 80.47 | 84.88 | 63.64 | | bge-large-zh-v1.5 | 65.74 | 63.39 | 57.03 | 28.74 | 83.45 | 85.44 | 63.97 | | BAAI/bge-reranker-base | 67.28 | 63.95 | 60.45 | 35.46 | 81.26 | 84.1 | 65.42 | | BAAI/bge-reranker-large | 67.6 | 64.03 | 61.44 | 37.16 | 82.15 | 84.18 | 66.09 | \\* : T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks ## Train ### BAAI Embedding We pre-train the models using retromae and train them on large-scale pairs data using contrastive learning. **You can fine-tune the embedding model on your data following our examples.** We also provide a pre-train example. Note that the goal of pre-training is to reconstruct the text, and the pre-trained model cannot be used for similarity calculation directly, it needs to be fine-tuned. More training details for bge see baai_general_embedding. ### BGE Reranker Cross-encoder will perform full-attention over the input pair, which is more accurate than embedding model (i.e., bi-encoder) but more time-consuming than embedding model. Therefore, it can be used to re-rank the top-k documents returned by embedding model. 
We train the cross-encoder on multilingual pair data. The data format is the same as for the embedding model, so you can fine-tune it easily following our example. For more details, please refer to ./FlagEmbedding/reranker/README.md ## Contact If you have any questions or suggestions related to this project, feel free to open an issue or pull request. You can also email Shitao Xiao([email protected]) and Zheng Liu([email protected]). ## Citation If you find this repository useful, please consider giving it a star :star: and a citation ## License FlagEmbedding is licensed under the MIT License. The released models can be used for commercial purposes free of charge.",
|
28 |
+
"model_explanation_gemini": "Generates sentence embeddings for tasks like text classification, retrieval, clustering, and similarity measurement."
|
29 |
+
}
|
data/model_data_json/BAAI_bge-large-en.json
ADDED
@@ -0,0 +1,23 @@
1 |
+
{
|
2 |
+
"model_id": "BAAI/bge-large-en",
|
3 |
+
"downloads": 501936,
|
4 |
+
"tags": [
|
5 |
+
"transformers",
|
6 |
+
"pytorch",
|
7 |
+
"safetensors",
|
8 |
+
"bert",
|
9 |
+
"feature-extraction",
|
10 |
+
"mteb",
|
11 |
+
"sentence-transfomres",
|
12 |
+
"en",
|
13 |
+
"arxiv:2310.07554",
|
14 |
+
"arxiv:2309.07597",
|
15 |
+
"license:mit",
|
16 |
+
"model-index",
|
17 |
+
"text-embeddings-inference",
|
18 |
+
"endpoints_compatible",
|
19 |
+
"region:us"
|
20 |
+
],
|
21 |
+
"description": "--- tags: - mteb - sentence-transfomres - transformers model-index: - name: bge-large-en results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 76.94029850746269 - type: ap value: 40.00228964744091 - type: f1 value: 70.86088267934595 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 91.93745 - type: ap value: 88.24758534667426 - type: f1 value: 91.91033034217591 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 46.158 - type: f1 value: 45.78935185074774 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 39.972 - type: map_at_10 value: 54.874 - type: map_at_100 value: 55.53399999999999 - type: map_at_1000 value: 55.539 - type: map_at_3 value: 51.031000000000006 - type: map_at_5 value: 53.342999999999996 - type: mrr_at_1 value: 40.541 - type: mrr_at_10 value: 55.096000000000004 - type: mrr_at_100 value: 55.75599999999999 - type: mrr_at_1000 value: 55.761 - type: mrr_at_3 value: 51.221000000000004 - type: mrr_at_5 value: 53.568000000000005 - type: ndcg_at_1 value: 39.972 - type: ndcg_at_10 value: 62.456999999999994 - type: ndcg_at_100 value: 65.262 - type: ndcg_at_1000 value: 65.389 - type: ndcg_at_3 value: 54.673 - type: ndcg_at_5 value: 58.80499999999999 - type: precision_at_1 value: 39.972 - type: precision_at_10 value: 8.634 - type: precision_at_100 value: 0.9860000000000001 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 
21.740000000000002 - type: precision_at_5 value: 15.036 - type: recall_at_1 value: 39.972 - type: recall_at_10 value: 86.344 - type: recall_at_100 value: 98.578 - type: recall_at_1000 value: 99.57300000000001 - type: recall_at_3 value: 65.22 - type: recall_at_5 value: 75.178 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 48.94652870403906 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 43.17257160340209 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 63.97867370559182 - type: mrr value: 77.00820032537484 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 80.00986015960616 - type: cos_sim_spearman value: 80.36387933827882 - type: euclidean_pearson value: 80.32305287257296 - type: euclidean_spearman value: 82.0524720308763 - type: manhattan_pearson value: 80.19847473906454 - type: manhattan_spearman value: 81.87957652506985 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 88.00000000000001 - type: f1 value: 87.99039027511853 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 41.36932844640705 - task: type: Clustering dataset: type: 
mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 38.34983239611985 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 32.257999999999996 - type: map_at_10 value: 42.937 - type: map_at_100 value: 44.406 - type: map_at_1000 value: 44.536 - type: map_at_3 value: 39.22 - type: map_at_5 value: 41.458 - type: mrr_at_1 value: 38.769999999999996 - type: mrr_at_10 value: 48.701 - type: mrr_at_100 value: 49.431000000000004 - type: mrr_at_1000 value: 49.476 - type: mrr_at_3 value: 45.875 - type: mrr_at_5 value: 47.67 - type: ndcg_at_1 value: 38.769999999999996 - type: ndcg_at_10 value: 49.35 - type: ndcg_at_100 value: 54.618 - type: ndcg_at_1000 value: 56.655 - type: ndcg_at_3 value: 43.826 - type: ndcg_at_5 value: 46.72 - type: precision_at_1 value: 38.769999999999996 - type: precision_at_10 value: 9.328 - type: precision_at_100 value: 1.484 - type: precision_at_1000 value: 0.196 - type: precision_at_3 value: 20.649 - type: precision_at_5 value: 15.25 - type: recall_at_1 value: 32.257999999999996 - type: recall_at_10 value: 61.849 - type: recall_at_100 value: 83.70400000000001 - type: recall_at_1000 value: 96.344 - type: recall_at_3 value: 46.037 - type: recall_at_5 value: 53.724000000000004 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 32.979 - type: map_at_10 value: 43.376999999999995 - type: map_at_100 value: 44.667 - type: map_at_1000 value: 44.794 - type: map_at_3 value: 40.461999999999996 - type: map_at_5 value: 42.138 - type: mrr_at_1 value: 41.146 - type: mrr_at_10 value: 49.575 - type: mrr_at_100 value: 50.187000000000005 - type: mrr_at_1000 value: 50.231 - type: mrr_at_3 value: 47.601 - type: mrr_at_5 
value: 48.786 - type: ndcg_at_1 value: 41.146 - type: ndcg_at_10 value: 48.957 - type: ndcg_at_100 value: 53.296 - type: ndcg_at_1000 value: 55.254000000000005 - type: ndcg_at_3 value: 45.235 - type: ndcg_at_5 value: 47.014 - type: precision_at_1 value: 41.146 - type: precision_at_10 value: 9.107999999999999 - type: precision_at_100 value: 1.481 - type: precision_at_1000 value: 0.193 - type: precision_at_3 value: 21.783 - type: precision_at_5 value: 15.274 - type: recall_at_1 value: 32.979 - type: recall_at_10 value: 58.167 - type: recall_at_100 value: 76.374 - type: recall_at_1000 value: 88.836 - type: recall_at_3 value: 46.838 - type: recall_at_5 value: 52.006 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 40.326 - type: map_at_10 value: 53.468 - type: map_at_100 value: 54.454 - type: map_at_1000 value: 54.508 - type: map_at_3 value: 50.12799999999999 - type: map_at_5 value: 51.991 - type: mrr_at_1 value: 46.394999999999996 - type: mrr_at_10 value: 57.016999999999996 - type: mrr_at_100 value: 57.67099999999999 - type: mrr_at_1000 value: 57.699999999999996 - type: mrr_at_3 value: 54.65 - type: mrr_at_5 value: 56.101 - type: ndcg_at_1 value: 46.394999999999996 - type: ndcg_at_10 value: 59.507 - type: ndcg_at_100 value: 63.31099999999999 - type: ndcg_at_1000 value: 64.388 - type: ndcg_at_3 value: 54.04600000000001 - type: ndcg_at_5 value: 56.723 - type: precision_at_1 value: 46.394999999999996 - type: precision_at_10 value: 9.567 - type: precision_at_100 value: 1.234 - type: precision_at_1000 value: 0.13699999999999998 - type: precision_at_3 value: 24.117 - type: precision_at_5 value: 16.426 - type: recall_at_1 value: 40.326 - type: recall_at_10 value: 73.763 - type: recall_at_100 value: 89.927 - type: recall_at_1000 value: 97.509 - type: recall_at_3 value: 59.34 - type: recall_at_5 value: 65.915 - task: type: Retrieval dataset: type: 
BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.661 - type: map_at_10 value: 35.522 - type: map_at_100 value: 36.619 - type: map_at_1000 value: 36.693999999999996 - type: map_at_3 value: 33.154 - type: map_at_5 value: 34.353 - type: mrr_at_1 value: 28.362 - type: mrr_at_10 value: 37.403999999999996 - type: mrr_at_100 value: 38.374 - type: mrr_at_1000 value: 38.428000000000004 - type: mrr_at_3 value: 35.235 - type: mrr_at_5 value: 36.269 - type: ndcg_at_1 value: 28.362 - type: ndcg_at_10 value: 40.431 - type: ndcg_at_100 value: 45.745999999999995 - type: ndcg_at_1000 value: 47.493 - type: ndcg_at_3 value: 35.733 - type: ndcg_at_5 value: 37.722 - type: precision_at_1 value: 28.362 - type: precision_at_10 value: 6.101999999999999 - type: precision_at_100 value: 0.922 - type: precision_at_1000 value: 0.11100000000000002 - type: precision_at_3 value: 15.140999999999998 - type: precision_at_5 value: 10.305 - type: recall_at_1 value: 26.661 - type: recall_at_10 value: 53.675 - type: recall_at_100 value: 77.891 - type: recall_at_1000 value: 90.72 - type: recall_at_3 value: 40.751 - type: recall_at_5 value: 45.517 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 18.886 - type: map_at_10 value: 27.288 - type: map_at_100 value: 28.327999999999996 - type: map_at_1000 value: 28.438999999999997 - type: map_at_3 value: 24.453 - type: map_at_5 value: 25.959 - type: mrr_at_1 value: 23.134 - type: mrr_at_10 value: 32.004 - type: mrr_at_100 value: 32.789 - type: mrr_at_1000 value: 32.857 - type: mrr_at_3 value: 29.084 - type: mrr_at_5 value: 30.614 - type: ndcg_at_1 value: 23.134 - type: ndcg_at_10 value: 32.852 - type: ndcg_at_100 value: 37.972 - type: ndcg_at_1000 value: 40.656 - type: ndcg_at_3 value: 27.435 - type: ndcg_at_5 value: 29.823 - type: precision_at_1 value: 23.134 
- type: precision_at_10 value: 6.032 - type: precision_at_100 value: 0.9950000000000001 - type: precision_at_1000 value: 0.136 - type: precision_at_3 value: 13.017999999999999 - type: precision_at_5 value: 9.501999999999999 - type: recall_at_1 value: 18.886 - type: recall_at_10 value: 45.34 - type: recall_at_100 value: 67.947 - type: recall_at_1000 value: 86.924 - type: recall_at_3 value: 30.535 - type: recall_at_5 value: 36.451 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 28.994999999999997 - type: map_at_10 value: 40.04 - type: map_at_100 value: 41.435 - type: map_at_1000 value: 41.537 - type: map_at_3 value: 37.091 - type: map_at_5 value: 38.802 - type: mrr_at_1 value: 35.034 - type: mrr_at_10 value: 45.411 - type: mrr_at_100 value: 46.226 - type: mrr_at_1000 value: 46.27 - type: mrr_at_3 value: 43.086 - type: mrr_at_5 value: 44.452999999999996 - type: ndcg_at_1 value: 35.034 - type: ndcg_at_10 value: 46.076 - type: ndcg_at_100 value: 51.483000000000004 - type: ndcg_at_1000 value: 53.433 - type: ndcg_at_3 value: 41.304 - type: ndcg_at_5 value: 43.641999999999996 - type: precision_at_1 value: 35.034 - type: precision_at_10 value: 8.258000000000001 - type: precision_at_100 value: 1.268 - type: precision_at_1000 value: 0.161 - type: precision_at_3 value: 19.57 - type: precision_at_5 value: 13.782 - type: recall_at_1 value: 28.994999999999997 - type: recall_at_10 value: 58.538000000000004 - type: recall_at_100 value: 80.72399999999999 - type: recall_at_1000 value: 93.462 - type: recall_at_3 value: 45.199 - type: recall_at_5 value: 51.237 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.795 - type: map_at_10 value: 34.935 - type: map_at_100 value: 36.306 - type: map_at_1000 value: 36.417 - type: map_at_3 value: 
31.831 - type: map_at_5 value: 33.626 - type: mrr_at_1 value: 30.479 - type: mrr_at_10 value: 40.225 - type: mrr_at_100 value: 41.055 - type: mrr_at_1000 value: 41.114 - type: mrr_at_3 value: 37.538 - type: mrr_at_5 value: 39.073 - type: ndcg_at_1 value: 30.479 - type: ndcg_at_10 value: 40.949999999999996 - type: ndcg_at_100 value: 46.525 - type: ndcg_at_1000 value: 48.892 - type: ndcg_at_3 value: 35.79 - type: ndcg_at_5 value: 38.237 - type: precision_at_1 value: 30.479 - type: precision_at_10 value: 7.6259999999999994 - type: precision_at_100 value: 1.203 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 17.199 - type: precision_at_5 value: 12.466000000000001 - type: recall_at_1 value: 24.795 - type: recall_at_10 value: 53.421 - type: recall_at_100 value: 77.189 - type: recall_at_1000 value: 93.407 - type: recall_at_3 value: 39.051 - type: recall_at_5 value: 45.462 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.853499999999997 - type: map_at_10 value: 36.20433333333333 - type: map_at_100 value: 37.40391666666667 - type: map_at_1000 value: 37.515 - type: map_at_3 value: 33.39975 - type: map_at_5 value: 34.9665 - type: mrr_at_1 value: 31.62666666666667 - type: mrr_at_10 value: 40.436749999999996 - type: mrr_at_100 value: 41.260333333333335 - type: mrr_at_1000 value: 41.31525 - type: mrr_at_3 value: 38.06733333333332 - type: mrr_at_5 value: 39.41541666666667 - type: ndcg_at_1 value: 31.62666666666667 - type: ndcg_at_10 value: 41.63341666666667 - type: ndcg_at_100 value: 46.704166666666666 - type: ndcg_at_1000 value: 48.88483333333335 - type: ndcg_at_3 value: 36.896 - type: ndcg_at_5 value: 39.11891666666667 - type: precision_at_1 value: 31.62666666666667 - type: precision_at_10 value: 7.241083333333333 - type: precision_at_100 value: 1.1488333333333334 - type: precision_at_1000 value: 0.15250000000000002 - type: precision_at_3 
value: 16.908333333333335 - type: precision_at_5 value: 11.942833333333333 - type: recall_at_1 value: 26.853499999999997 - type: recall_at_10 value: 53.461333333333336 - type: recall_at_100 value: 75.63633333333333 - type: recall_at_1000 value: 90.67016666666666 - type: recall_at_3 value: 40.24241666666667 - type: recall_at_5 value: 45.98608333333333 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 25.241999999999997 - type: map_at_10 value: 31.863999999999997 - type: map_at_100 value: 32.835 - type: map_at_1000 value: 32.928000000000004 - type: map_at_3 value: 29.694 - type: map_at_5 value: 30.978 - type: mrr_at_1 value: 28.374 - type: mrr_at_10 value: 34.814 - type: mrr_at_100 value: 35.596 - type: mrr_at_1000 value: 35.666 - type: mrr_at_3 value: 32.745000000000005 - type: mrr_at_5 value: 34.049 - type: ndcg_at_1 value: 28.374 - type: ndcg_at_10 value: 35.969 - type: ndcg_at_100 value: 40.708 - type: ndcg_at_1000 value: 43.08 - type: ndcg_at_3 value: 31.968999999999998 - type: ndcg_at_5 value: 34.069 - type: precision_at_1 value: 28.374 - type: precision_at_10 value: 5.583 - type: precision_at_100 value: 0.8630000000000001 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 13.547999999999998 - type: precision_at_5 value: 9.447999999999999 - type: recall_at_1 value: 25.241999999999997 - type: recall_at_10 value: 45.711 - type: recall_at_100 value: 67.482 - type: recall_at_1000 value: 85.13300000000001 - type: recall_at_3 value: 34.622 - type: recall_at_5 value: 40.043 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 17.488999999999997 - type: map_at_10 value: 25.142999999999997 - type: map_at_100 value: 26.244 - type: map_at_1000 value: 26.363999999999997 - type: map_at_3 value: 22.654 - type: 
map_at_5 value: 24.017 - type: mrr_at_1 value: 21.198 - type: mrr_at_10 value: 28.903000000000002 - type: mrr_at_100 value: 29.860999999999997 - type: mrr_at_1000 value: 29.934 - type: mrr_at_3 value: 26.634999999999998 - type: mrr_at_5 value: 27.903 - type: ndcg_at_1 value: 21.198 - type: ndcg_at_10 value: 29.982999999999997 - type: ndcg_at_100 value: 35.275 - type: ndcg_at_1000 value: 38.074000000000005 - type: ndcg_at_3 value: 25.502999999999997 - type: ndcg_at_5 value: 27.557 - type: precision_at_1 value: 21.198 - type: precision_at_10 value: 5.502 - type: precision_at_100 value: 0.942 - type: precision_at_1000 value: 0.136 - type: precision_at_3 value: 12.044 - type: precision_at_5 value: 8.782 - type: recall_at_1 value: 17.488999999999997 - type: recall_at_10 value: 40.821000000000005 - type: recall_at_100 value: 64.567 - type: recall_at_1000 value: 84.452 - type: recall_at_3 value: 28.351 - type: recall_at_5 value: 33.645 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.066000000000003 - type: map_at_10 value: 36.134 - type: map_at_100 value: 37.285000000000004 - type: map_at_1000 value: 37.389 - type: map_at_3 value: 33.522999999999996 - type: map_at_5 value: 34.905 - type: mrr_at_1 value: 31.436999999999998 - type: mrr_at_10 value: 40.225 - type: mrr_at_100 value: 41.079 - type: mrr_at_1000 value: 41.138000000000005 - type: mrr_at_3 value: 38.074999999999996 - type: mrr_at_5 value: 39.190000000000005 - type: ndcg_at_1 value: 31.436999999999998 - type: ndcg_at_10 value: 41.494 - type: ndcg_at_100 value: 46.678999999999995 - type: ndcg_at_1000 value: 48.964 - type: ndcg_at_3 value: 36.828 - type: ndcg_at_5 value: 38.789 - type: precision_at_1 value: 31.436999999999998 - type: precision_at_10 value: 6.931 - type: precision_at_100 value: 1.072 - type: precision_at_1000 value: 0.13799999999999998 - type: precision_at_3 value: 16.729 - type: 
precision_at_5 value: 11.567 - type: recall_at_1 value: 27.066000000000003 - type: recall_at_10 value: 53.705000000000005 - type: recall_at_100 value: 75.968 - type: recall_at_1000 value: 91.937 - type: recall_at_3 value: 40.865 - type: recall_at_5 value: 45.739999999999995 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.979000000000003 - type: map_at_10 value: 32.799 - type: map_at_100 value: 34.508 - type: map_at_1000 value: 34.719 - type: map_at_3 value: 29.947000000000003 - type: map_at_5 value: 31.584 - type: mrr_at_1 value: 30.237000000000002 - type: mrr_at_10 value: 37.651 - type: mrr_at_100 value: 38.805 - type: mrr_at_1000 value: 38.851 - type: mrr_at_3 value: 35.046 - type: mrr_at_5 value: 36.548 - type: ndcg_at_1 value: 30.237000000000002 - type: ndcg_at_10 value: 38.356 - type: ndcg_at_100 value: 44.906 - type: ndcg_at_1000 value: 47.299 - type: ndcg_at_3 value: 33.717999999999996 - type: ndcg_at_5 value: 35.946 - type: precision_at_1 value: 30.237000000000002 - type: precision_at_10 value: 7.292 - type: precision_at_100 value: 1.496 - type: precision_at_1000 value: 0.23600000000000002 - type: precision_at_3 value: 15.547 - type: precision_at_5 value: 11.344 - type: recall_at_1 value: 24.979000000000003 - type: recall_at_10 value: 48.624 - type: recall_at_100 value: 77.932 - type: recall_at_1000 value: 92.66499999999999 - type: recall_at_3 value: 35.217 - type: recall_at_5 value: 41.394 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 22.566 - type: map_at_10 value: 30.945 - type: map_at_100 value: 31.759999999999998 - type: map_at_1000 value: 31.855 - type: map_at_3 value: 28.64 - type: map_at_5 value: 29.787000000000003 - type: mrr_at_1 value: 24.954 - type: mrr_at_10 value: 33.311 - type: mrr_at_100 
value: 34.050000000000004 - type: mrr_at_1000 value: 34.117999999999995 - type: mrr_at_3 value: 31.238 - type: mrr_at_5 value: 32.329 - type: ndcg_at_1 value: 24.954 - type: ndcg_at_10 value: 35.676 - type: ndcg_at_100 value: 39.931 - type: ndcg_at_1000 value: 42.43 - type: ndcg_at_3 value: 31.365 - type: ndcg_at_5 value: 33.184999999999995 - type: precision_at_1 value: 24.954 - type: precision_at_10 value: 5.564 - type: precision_at_100 value: 0.826 - type: precision_at_1000 value: 0.116 - type: precision_at_3 value: 13.555 - type: precision_at_5 value: 9.168 - type: recall_at_1 value: 22.566 - type: recall_at_10 value: 47.922 - type: recall_at_100 value: 67.931 - type: recall_at_1000 value: 86.653 - type: recall_at_3 value: 36.103 - type: recall_at_5 value: 40.699000000000005 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 16.950000000000003 - type: map_at_10 value: 28.612 - type: map_at_100 value: 30.476999999999997 - type: map_at_1000 value: 30.674 - type: map_at_3 value: 24.262 - type: map_at_5 value: 26.554 - type: mrr_at_1 value: 38.241 - type: mrr_at_10 value: 50.43 - type: mrr_at_100 value: 51.059 - type: mrr_at_1000 value: 51.090999999999994 - type: mrr_at_3 value: 47.514 - type: mrr_at_5 value: 49.246 - type: ndcg_at_1 value: 38.241 - type: ndcg_at_10 value: 38.218 - type: ndcg_at_100 value: 45.003 - type: ndcg_at_1000 value: 48.269 - type: ndcg_at_3 value: 32.568000000000005 - type: ndcg_at_5 value: 34.400999999999996 - type: precision_at_1 value: 38.241 - type: precision_at_10 value: 11.674 - type: precision_at_100 value: 1.913 - type: precision_at_1000 value: 0.252 - type: precision_at_3 value: 24.387 - type: precision_at_5 value: 18.163 - type: recall_at_1 value: 16.950000000000003 - type: recall_at_10 value: 43.769000000000005 - type: recall_at_100 value: 66.875 - type: recall_at_1000 value: 84.92699999999999 - type: recall_at_3 value: 29.353 - 
type: recall_at_5 value: 35.467 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 9.276 - type: map_at_10 value: 20.848 - type: map_at_100 value: 29.804000000000002 - type: map_at_1000 value: 31.398 - type: map_at_3 value: 14.886 - type: map_at_5 value: 17.516000000000002 - type: mrr_at_1 value: 71 - type: mrr_at_10 value: 78.724 - type: mrr_at_100 value: 78.976 - type: mrr_at_1000 value: 78.986 - type: mrr_at_3 value: 77.333 - type: mrr_at_5 value: 78.021 - type: ndcg_at_1 value: 57.875 - type: ndcg_at_10 value: 43.855 - type: ndcg_at_100 value: 48.99 - type: ndcg_at_1000 value: 56.141 - type: ndcg_at_3 value: 48.914 - type: ndcg_at_5 value: 45.961 - type: precision_at_1 value: 71 - type: precision_at_10 value: 34.575 - type: precision_at_100 value: 11.182 - type: precision_at_1000 value: 2.044 - type: precision_at_3 value: 52.5 - type: precision_at_5 value: 44.2 - type: recall_at_1 value: 9.276 - type: recall_at_10 value: 26.501 - type: recall_at_100 value: 55.72899999999999 - type: recall_at_1000 value: 78.532 - type: recall_at_3 value: 16.365 - type: recall_at_5 value: 20.154 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 52.71 - type: f1 value: 47.74801556489574 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 73.405 - type: map_at_10 value: 82.822 - type: map_at_100 value: 83.042 - type: map_at_1000 value: 83.055 - type: map_at_3 value: 81.65299999999999 - type: map_at_5 value: 82.431 - type: mrr_at_1 value: 79.178 - type: mrr_at_10 value: 87.02 - type: mrr_at_100 value: 87.095 - type: mrr_at_1000 value: 87.09700000000001 - type: mrr_at_3 value: 86.309 - type: mrr_at_5 value: 86.824 - type: ndcg_at_1 value: 79.178 - type: 
ndcg_at_10 value: 86.72 - type: ndcg_at_100 value: 87.457 - type: ndcg_at_1000 value: 87.691 - type: ndcg_at_3 value: 84.974 - type: ndcg_at_5 value: 86.032 - type: precision_at_1 value: 79.178 - type: precision_at_10 value: 10.548 - type: precision_at_100 value: 1.113 - type: precision_at_1000 value: 0.11499999999999999 - type: precision_at_3 value: 32.848 - type: precision_at_5 value: 20.45 - type: recall_at_1 value: 73.405 - type: recall_at_10 value: 94.39699999999999 - type: recall_at_100 value: 97.219 - type: recall_at_1000 value: 98.675 - type: recall_at_3 value: 89.679 - type: recall_at_5 value: 92.392 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 22.651 - type: map_at_10 value: 36.886 - type: map_at_100 value: 38.811 - type: map_at_1000 value: 38.981 - type: map_at_3 value: 32.538 - type: map_at_5 value: 34.763 - type: mrr_at_1 value: 44.444 - type: mrr_at_10 value: 53.168000000000006 - type: mrr_at_100 value: 53.839000000000006 - type: mrr_at_1000 value: 53.869 - type: mrr_at_3 value: 50.54 - type: mrr_at_5 value: 52.068000000000005 - type: ndcg_at_1 value: 44.444 - type: ndcg_at_10 value: 44.994 - type: ndcg_at_100 value: 51.599 - type: ndcg_at_1000 value: 54.339999999999996 - type: ndcg_at_3 value: 41.372 - type: ndcg_at_5 value: 42.149 - type: precision_at_1 value: 44.444 - type: precision_at_10 value: 12.407 - type: precision_at_100 value: 1.9269999999999998 - type: precision_at_1000 value: 0.242 - type: precision_at_3 value: 27.726 - type: precision_at_5 value: 19.814999999999998 - type: recall_at_1 value: 22.651 - type: recall_at_10 value: 52.075 - type: recall_at_100 value: 76.51400000000001 - type: recall_at_1000 value: 92.852 - type: recall_at_3 value: 37.236000000000004 - type: recall_at_5 value: 43.175999999999995 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 
value: 40.777 - type: map_at_10 value: 66.79899999999999 - type: map_at_100 value: 67.65299999999999 - type: map_at_1000 value: 67.706 - type: map_at_3 value: 63.352 - type: map_at_5 value: 65.52900000000001 - type: mrr_at_1 value: 81.553 - type: mrr_at_10 value: 86.983 - type: mrr_at_100 value: 87.132 - type: mrr_at_1000 value: 87.136 - type: mrr_at_3 value: 86.156 - type: mrr_at_5 value: 86.726 - type: ndcg_at_1 value: 81.553 - type: ndcg_at_10 value: 74.64 - type: ndcg_at_100 value: 77.459 - type: ndcg_at_1000 value: 78.43 - type: ndcg_at_3 value: 69.878 - type: ndcg_at_5 value: 72.59400000000001 - type: precision_at_1 value: 81.553 - type: precision_at_10 value: 15.654000000000002 - type: precision_at_100 value: 1.783 - type: precision_at_1000 value: 0.191 - type: precision_at_3 value: 45.199 - type: precision_at_5 value: 29.267 - type: recall_at_1 value: 40.777 - type: recall_at_10 value: 78.271 - type: recall_at_100 value: 89.129 - type: recall_at_1000 value: 95.49 - type: recall_at_3 value: 67.79899999999999 - type: recall_at_5 value: 73.167 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 93.5064 - type: ap value: 90.25495114444111 - type: f1 value: 93.5012434973381 - task: type: Retrieval dataset: type: msmarco name: MTEB MSMARCO config: default split: dev revision: None metrics: - type: map_at_1 value: 23.301 - type: map_at_10 value: 35.657 - type: map_at_100 value: 36.797000000000004 - type: map_at_1000 value: 36.844 - type: map_at_3 value: 31.743 - type: map_at_5 value: 34.003 - type: mrr_at_1 value: 23.854 - type: mrr_at_10 value: 36.242999999999995 - type: mrr_at_100 value: 37.32 - type: mrr_at_1000 value: 37.361 - type: mrr_at_3 value: 32.4 - type: mrr_at_5 value: 34.634 - type: ndcg_at_1 value: 23.868000000000002 - type: ndcg_at_10 value: 42.589 - type: ndcg_at_100 value: 48.031 - type: ndcg_at_1000 
value: 49.189 - type: ndcg_at_3 value: 34.649 - type: ndcg_at_5 value: 38.676 - type: precision_at_1 value: 23.868000000000002 - type: precision_at_10 value: 6.6850000000000005 - type: precision_at_100 value: 0.9400000000000001 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 14.651 - type: precision_at_5 value: 10.834000000000001 - type: recall_at_1 value: 23.301 - type: recall_at_10 value: 63.88700000000001 - type: recall_at_100 value: 88.947 - type: recall_at_1000 value: 97.783 - type: recall_at_3 value: 42.393 - type: recall_at_5 value: 52.036 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 94.64888280893753 - type: f1 value: 94.41310774203512 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 79.72184222526221 - type: f1 value: 61.522034067350106 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 79.60659045057163 - type: f1 value: 77.268649687049 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 81.83254875588432 - type: f1 value: 81.61520635919082 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 36.31529875009507 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: 
default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 31.734233714415073 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 30.994501713009452 - type: mrr value: 32.13512850703073 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 6.603000000000001 - type: map_at_10 value: 13.767999999999999 - type: map_at_100 value: 17.197000000000003 - type: map_at_1000 value: 18.615000000000002 - type: map_at_3 value: 10.567 - type: map_at_5 value: 12.078999999999999 - type: mrr_at_1 value: 44.891999999999996 - type: mrr_at_10 value: 53.75299999999999 - type: mrr_at_100 value: 54.35 - type: mrr_at_1000 value: 54.388000000000005 - type: mrr_at_3 value: 51.495999999999995 - type: mrr_at_5 value: 52.688 - type: ndcg_at_1 value: 43.189 - type: ndcg_at_10 value: 34.567 - type: ndcg_at_100 value: 32.273 - type: ndcg_at_1000 value: 41.321999999999996 - type: ndcg_at_3 value: 40.171 - type: ndcg_at_5 value: 37.502 - type: precision_at_1 value: 44.582 - type: precision_at_10 value: 25.139 - type: precision_at_100 value: 7.739999999999999 - type: precision_at_1000 value: 2.054 - type: precision_at_3 value: 37.152 - type: precision_at_5 value: 31.826999999999998 - type: recall_at_1 value: 6.603000000000001 - type: recall_at_10 value: 17.023 - type: recall_at_100 value: 32.914 - type: recall_at_1000 value: 64.44800000000001 - type: recall_at_3 value: 11.457 - type: recall_at_5 value: 13.816 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 30.026000000000003 - type: map_at_10 value: 45.429 - type: map_at_100 value: 46.45 - type: map_at_1000 value: 46.478 - type: map_at_3 value: 41.147 - type: map_at_5 value: 43.627 - type: 
mrr_at_1 value: 33.951 - type: mrr_at_10 value: 47.953 - type: mrr_at_100 value: 48.731 - type: mrr_at_1000 value: 48.751 - type: mrr_at_3 value: 44.39 - type: mrr_at_5 value: 46.533 - type: ndcg_at_1 value: 33.951 - type: ndcg_at_10 value: 53.24100000000001 - type: ndcg_at_100 value: 57.599999999999994 - type: ndcg_at_1000 value: 58.270999999999994 - type: ndcg_at_3 value: 45.190999999999995 - type: ndcg_at_5 value: 49.339 - type: precision_at_1 value: 33.951 - type: precision_at_10 value: 8.856 - type: precision_at_100 value: 1.133 - type: precision_at_1000 value: 0.12 - type: precision_at_3 value: 20.713 - type: precision_at_5 value: 14.838000000000001 - type: recall_at_1 value: 30.026000000000003 - type: recall_at_10 value: 74.512 - type: recall_at_100 value: 93.395 - type: recall_at_1000 value: 98.402 - type: recall_at_3 value: 53.677 - type: recall_at_5 value: 63.198 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 71.41300000000001 - type: map_at_10 value: 85.387 - type: map_at_100 value: 86.027 - type: map_at_1000 value: 86.041 - type: map_at_3 value: 82.543 - type: map_at_5 value: 84.304 - type: mrr_at_1 value: 82.35 - type: mrr_at_10 value: 88.248 - type: mrr_at_100 value: 88.348 - type: mrr_at_1000 value: 88.349 - type: mrr_at_3 value: 87.348 - type: mrr_at_5 value: 87.96300000000001 - type: ndcg_at_1 value: 82.37 - type: ndcg_at_10 value: 88.98 - type: ndcg_at_100 value: 90.16499999999999 - type: ndcg_at_1000 value: 90.239 - type: ndcg_at_3 value: 86.34100000000001 - type: ndcg_at_5 value: 87.761 - type: precision_at_1 value: 82.37 - type: precision_at_10 value: 13.471 - type: precision_at_100 value: 1.534 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 37.827 - type: precision_at_5 value: 24.773999999999997 - type: recall_at_1 value: 71.41300000000001 - type: recall_at_10 value: 95.748 - type: recall_at_100 value: 99.69200000000001 - 
type: recall_at_1000 value: 99.98 - type: recall_at_3 value: 87.996 - type: recall_at_5 value: 92.142 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 56.96878497780007 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 65.31371347128074 - task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 5.287 - type: map_at_10 value: 13.530000000000001 - type: map_at_100 value: 15.891 - type: map_at_1000 value: 16.245 - type: map_at_3 value: 9.612 - type: map_at_5 value: 11.672 - type: mrr_at_1 value: 26 - type: mrr_at_10 value: 37.335 - type: mrr_at_100 value: 38.443 - type: mrr_at_1000 value: 38.486 - type: mrr_at_3 value: 33.783 - type: mrr_at_5 value: 36.028 - type: ndcg_at_1 value: 26 - type: ndcg_at_10 value: 22.215 - type: ndcg_at_100 value: 31.101 - type: ndcg_at_1000 value: 36.809 - type: ndcg_at_3 value: 21.104 - type: ndcg_at_5 value: 18.759999999999998 - type: precision_at_1 value: 26 - type: precision_at_10 value: 11.43 - type: precision_at_100 value: 2.424 - type: precision_at_1000 value: 0.379 - type: precision_at_3 value: 19.7 - type: precision_at_5 value: 16.619999999999997 - type: recall_at_1 value: 5.287 - type: recall_at_10 value: 23.18 - type: recall_at_100 value: 49.208 - type: recall_at_1000 value: 76.85300000000001 - type: recall_at_3 value: 11.991999999999999 - type: recall_at_5 value: 16.85 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 83.87834913790886 - type: cos_sim_spearman value: 81.04583513112122 - type: 
euclidean_pearson value: 81.20484174558065 - type: euclidean_spearman value: 80.76430832561769 - type: manhattan_pearson value: 81.21416730978615 - type: manhattan_spearman value: 80.7797637394211 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 86.56143998865157 - type: cos_sim_spearman value: 79.75387012744471 - type: euclidean_pearson value: 83.7877519997019 - type: euclidean_spearman value: 79.90489748003296 - type: manhattan_pearson value: 83.7540590666095 - type: manhattan_spearman value: 79.86434577931573 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 83.92102564177941 - type: cos_sim_spearman value: 84.98234585939103 - type: euclidean_pearson value: 84.47729567593696 - type: euclidean_spearman value: 85.09490696194469 - type: manhattan_pearson value: 84.38622951588229 - type: manhattan_spearman value: 85.02507171545574 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 80.1891164763377 - type: cos_sim_spearman value: 80.7997969966883 - type: euclidean_pearson value: 80.48572256162396 - type: euclidean_spearman value: 80.57851903536378 - type: manhattan_pearson value: 80.4324819433651 - type: manhattan_spearman value: 80.5074526239062 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 82.64319975116025 - type: cos_sim_spearman value: 84.88671197763652 - type: euclidean_pearson value: 84.74692193293231 - type: euclidean_spearman value: 85.27151722073653 - type: manhattan_pearson value: 84.72460516785438 - type: manhattan_spearman 
value: 85.26518899786687 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 83.24687565822381 - type: cos_sim_spearman value: 85.60418454111263 - type: euclidean_pearson value: 84.85829740169851 - type: euclidean_spearman value: 85.66378014138306 - type: manhattan_pearson value: 84.84672408808835 - type: manhattan_spearman value: 85.63331924364891 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 84.87758895415485 - type: cos_sim_spearman value: 85.8193745617297 - type: euclidean_pearson value: 85.78719118848134 - type: euclidean_spearman value: 84.35797575385688 - type: manhattan_pearson value: 85.97919844815692 - type: manhattan_spearman value: 84.58334745175151 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 67.27076035963599 - type: cos_sim_spearman value: 67.21433656439973 - type: euclidean_pearson value: 68.07434078679324 - type: euclidean_spearman value: 66.0249731719049 - type: manhattan_pearson value: 67.95495198947476 - type: manhattan_spearman value: 65.99893908331886 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 82.22437747056817 - type: cos_sim_spearman value: 85.0995685206174 - type: euclidean_pearson value: 84.08616925603394 - type: euclidean_spearman value: 84.89633925691658 - type: manhattan_pearson value: 84.08332675923133 - type: manhattan_spearman value: 84.8858228112915 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default 
split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 87.6909022589666 - type: mrr value: 96.43341952165481 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 57.660999999999994 - type: map_at_10 value: 67.625 - type: map_at_100 value: 68.07600000000001 - type: map_at_1000 value: 68.10199999999999 - type: map_at_3 value: 64.50399999999999 - type: map_at_5 value: 66.281 - type: mrr_at_1 value: 61 - type: mrr_at_10 value: 68.953 - type: mrr_at_100 value: 69.327 - type: mrr_at_1000 value: 69.352 - type: mrr_at_3 value: 66.833 - type: mrr_at_5 value: 68.05 - type: ndcg_at_1 value: 61 - type: ndcg_at_10 value: 72.369 - type: ndcg_at_100 value: 74.237 - type: ndcg_at_1000 value: 74.939 - type: ndcg_at_3 value: 67.284 - type: ndcg_at_5 value: 69.72500000000001 - type: precision_at_1 value: 61 - type: precision_at_10 value: 9.733 - type: precision_at_100 value: 1.0670000000000002 - type: precision_at_1000 value: 0.11199999999999999 - type: precision_at_3 value: 26.222 - type: precision_at_5 value: 17.4 - type: recall_at_1 value: 57.660999999999994 - type: recall_at_10 value: 85.656 - type: recall_at_100 value: 93.833 - type: recall_at_1000 value: 99.333 - type: recall_at_3 value: 71.961 - type: recall_at_5 value: 78.094 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.86930693069307 - type: cos_sim_ap value: 96.76685487950894 - type: cos_sim_f1 value: 93.44587884806354 - type: cos_sim_precision value: 92.80078895463511 - type: cos_sim_recall value: 94.1 - type: dot_accuracy value: 99.54356435643564 - type: dot_ap value: 81.18659960405607 - type: dot_f1 value: 75.78008915304605 - type: dot_precision value: 75.07360157016683 - type: dot_recall 
value: 76.5 - type: euclidean_accuracy value: 99.87326732673267 - type: euclidean_ap value: 96.8102411908941 - type: euclidean_f1 value: 93.6127744510978 - type: euclidean_precision value: 93.42629482071713 - type: euclidean_recall value: 93.8 - type: manhattan_accuracy value: 99.87425742574257 - type: manhattan_ap value: 96.82857341435529 - type: manhattan_f1 value: 93.62129583124059 - type: manhattan_precision value: 94.04641775983855 - type: manhattan_recall value: 93.2 - type: max_accuracy value: 99.87425742574257 - type: max_ap value: 96.82857341435529 - type: max_f1 value: 93.62129583124059 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 65.92560972698926 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 34.92797240259008 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 55.244624045597654 - type: mrr value: 56.185303666921314 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 31.02491987312937 - type: cos_sim_spearman value: 32.055592206679734 - type: dot_pearson value: 24.731627575422557 - type: dot_spearman value: 24.308029077069733 - task: type: Retrieval dataset: type: trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.231 - type: map_at_10 value: 1.899 - type: map_at_100 value: 9.498 - type: map_at_1000 value: 20.979999999999997 - type: 
map_at_3 value: 0.652 - type: map_at_5 value: 1.069 - type: mrr_at_1 value: 88 - type: mrr_at_10 value: 93.4 - type: mrr_at_100 value: 93.4 - type: mrr_at_1000 value: 93.4 - type: mrr_at_3 value: 93 - type: mrr_at_5 value: 93.4 - type: ndcg_at_1 value: 86 - type: ndcg_at_10 value: 75.375 - type: ndcg_at_100 value: 52.891999999999996 - type: ndcg_at_1000 value: 44.952999999999996 - type: ndcg_at_3 value: 81.05 - type: ndcg_at_5 value: 80.175 - type: precision_at_1 value: 88 - type: precision_at_10 value: 79 - type: precision_at_100 value: 53.16 - type: precision_at_1000 value: 19.408 - type: precision_at_3 value: 85.333 - type: precision_at_5 value: 84 - type: recall_at_1 value: 0.231 - type: recall_at_10 value: 2.078 - type: recall_at_100 value: 12.601 - type: recall_at_1000 value: 41.296 - type: recall_at_3 value: 0.6779999999999999 - type: recall_at_5 value: 1.1360000000000001 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 2.782 - type: map_at_10 value: 10.204 - type: map_at_100 value: 16.176 - type: map_at_1000 value: 17.456 - type: map_at_3 value: 5.354 - type: map_at_5 value: 7.503 - type: mrr_at_1 value: 40.816 - type: mrr_at_10 value: 54.010000000000005 - type: mrr_at_100 value: 54.49 - type: mrr_at_1000 value: 54.49 - type: mrr_at_3 value: 48.980000000000004 - type: mrr_at_5 value: 51.735 - type: ndcg_at_1 value: 36.735 - type: ndcg_at_10 value: 26.61 - type: ndcg_at_100 value: 36.967 - type: ndcg_at_1000 value: 47.274 - type: ndcg_at_3 value: 30.363 - type: ndcg_at_5 value: 29.448999999999998 - type: precision_at_1 value: 40.816 - type: precision_at_10 value: 23.878 - type: precision_at_100 value: 7.693999999999999 - type: precision_at_1000 value: 1.4489999999999998 - type: precision_at_3 value: 31.293 - type: precision_at_5 value: 29.796 - type: recall_at_1 value: 2.782 - type: recall_at_10 value: 16.485 - type: recall_at_100 value: 46.924 - type: 
recall_at_1000 value: 79.365 - type: recall_at_3 value: 6.52 - type: recall_at_5 value: 10.48 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 70.08300000000001 - type: ap value: 13.91559884590195 - type: f1 value: 53.956838444291364 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 59.34069043576683 - type: f1 value: 59.662041994618406 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 53.70780611078653 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 87.10734934732073 - type: cos_sim_ap value: 77.58349999516054 - type: cos_sim_f1 value: 70.25391395868965 - type: cos_sim_precision value: 70.06035161374967 - type: cos_sim_recall value: 70.44854881266491 - type: dot_accuracy value: 80.60439887941826 - type: dot_ap value: 54.52935200483575 - type: dot_f1 value: 54.170444242973716 - type: dot_precision value: 47.47715534366309 - type: dot_recall value: 63.06068601583114 - type: euclidean_accuracy value: 87.26828396018358 - type: euclidean_ap value: 78.00158454104036 - type: euclidean_f1 value: 70.70292457670601 - type: euclidean_precision value: 68.79680479281079 - type: euclidean_recall value: 72.71767810026385 - type: manhattan_accuracy value: 87.11330988853788 - type: manhattan_ap value: 77.92527099601855 - type: manhattan_f1 value: 
70.76488706365502 - type: manhattan_precision value: 68.89055472263868 - type: manhattan_recall value: 72.74406332453826 - type: max_accuracy value: 87.26828396018358 - type: max_ap value: 78.00158454104036 - type: max_f1 value: 70.76488706365502 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 87.80804905499282 - type: cos_sim_ap value: 83.06187782630936 - type: cos_sim_f1 value: 74.99716435403985 - type: cos_sim_precision value: 73.67951860931579 - type: cos_sim_recall value: 76.36279642747151 - type: dot_accuracy value: 81.83141227151008 - type: dot_ap value: 67.18241090841795 - type: dot_f1 value: 62.216037571751606 - type: dot_precision value: 56.749381227391005 - type: dot_recall value: 68.84816753926701 - type: euclidean_accuracy value: 87.91671517832887 - type: euclidean_ap value: 83.56538942001427 - type: euclidean_f1 value: 75.7327253337256 - type: euclidean_precision value: 72.48856036606828 - type: euclidean_recall value: 79.28087465352634 - type: manhattan_accuracy value: 87.86626304963713 - type: manhattan_ap value: 83.52939841172832 - type: manhattan_f1 value: 75.73635656329888 - type: manhattan_precision value: 72.99150182103836 - type: manhattan_recall value: 78.69571912534647 - type: max_accuracy value: 87.91671517832887 - type: max_ap value: 83.56538942001427 - type: max_f1 value: 75.73635656329888 license: mit language: - en --- **Recommend switching to newest BAAI/bge-large-en-v1.5, which has more reasonable similarity distribution and same method of usage.** <h1 align=\"center\">FlagEmbedding</h1> <h4 align=\"center\"> <p> <a href=#model-list>Model List</a> | <a href=#frequently-asked-questions>FAQ</a> | <a href=#usage>Usage</a> | <a href=\"#evaluation\">Evaluation</a> | <a href=\"#train\">Train</a> | <a href=\"#contact\">Contact</a> | <a 
href=\"#citation\">Citation</a> | <a href=\"#license\">License</a> <p> </h4> More details please refer to our Github: FlagEmbedding. English | 中文 FlagEmbedding can map any text to a low-dimensional dense vector which can be used for tasks like retrieval, classification, clustering, or semantic search. And it also can be used in vector databases for LLMs. ************* 🌟**Updates**🌟 ************* - 10/12/2023: Release LLM-Embedder, a unified embedding model to support diverse retrieval augmentation needs for LLMs. Paper :fire: - 09/15/2023: The technical report of BGE has been released - 09/15/2023: The masive training data of BGE has been released - 09/12/2023: New models: - **New reranker model**: release cross-encoder models and , which are more powerful than embedding model. We recommend to use/fine-tune them to re-rank top-k documents returned by embedding models. - **update embedding model**: release embedding model to alleviate the issue of the similarity distribution, and enhance its retrieval ability without instruction. <details> <summary>More</summary> <!-- ### More --> - 09/07/2023: Update fine-tune code: Add script to mine hard negatives and support adding instruction during fine-tuning. - 08/09/2023: BGE Models are integrated into **Langchain**, you can use it like this; C-MTEB **leaderboard** is available. - 08/05/2023: Release base-scale and small-scale models, **best performance among the models of the same size 🤗** - 08/02/2023: Release (short for BAAI General Embedding) Models, **rank 1st on MTEB and C-MTEB benchmark!** :tada: :tada: - 08/01/2023: We release the Chinese Massive Text Embedding Benchmark (**C-MTEB**), consisting of 31 test dataset. </details> ## Model List is short for . 
| Model | Language | | Description | query instruction for retrieval [1] | |:-------------------------------|:--------:| :--------:| :--------:|:--------:| | BAAI/llm-embedder | English | Inference Fine-tune | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See README | | BAAI/bge-reranker-large | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-reranker-base | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-large-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-en | English | Inference Fine-tune | :trophy: rank **1st** in MTEB leaderboard | | | BAAI/bge-base-en | English | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-en | English | Inference Fine-tune |a small-scale model but with competitive performance | | | BAAI/bge-large-zh | Chinese | Inference Fine-tune | :trophy: rank **1st** in C-MTEB benchmark | | | BAAI/bge-base-zh | Chinese | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-zh | Chinese | Inference Fine-tune | a small-scale model but with competitive performance | | [1\\]: If you need to search the relevant passages to a 
query, we suggest to add the instruction to the query; in other cases, no instruction is needed, just use the original query directly. In all cases, **no instruction** needs to be added to passages. [2\\]: Different from embedding model, reranker uses question and document as input and directly output similarity instead of embedding. To balance the accuracy and time cost, cross-encoder is widely used to re-rank top-k documents retrieved by other simple models. For examples, use bge embedding model to retrieve top 100 relevant documents, and then use bge reranker to re-rank the top 100 document to get the final top-3 results. All models have been uploaded to Huggingface Hub, and you can see them at If you cannot open the Huggingface Hub, you also can download the models at . ## Frequently asked questions <details> <summary>1. How to fine-tune bge embedding model?</summary> <!-- ### How to fine-tune bge embedding model? --> Following this example to prepare data and fine-tune your model. Some suggestions: - Mine hard negatives following this example, which can improve the retrieval performance. - If you pre-train bge on your data, the pre-trained model cannot be directly used to calculate similarity, and it must be fine-tuned with contrastive learning before computing similarity. - If the accuracy of the fine-tuned model is still not high, it is recommended to use/fine-tune the cross-encoder model (bge-reranker) to re-rank top-k results. Hard negatives also are needed to fine-tune reranker. </details> <details> <summary>2. The similarity score between two dissimilar sentences is higher than 0.5</summary> <!-- ### The similarity score between two dissimilar sentences is higher than 0.5 --> **Suggest to use bge v1.5, which alleviates the issue of the similarity distribution.** Since we finetune the models by contrastive learning with a temperature of 0.01, the similarity distribution of the current BGE model is about in the interval \\[0.6, 1\\]. 
So a similarity score greater than 0.5 does not indicate that the two sentences are similar. For downstream tasks, such as passage retrieval or semantic similarity, **what matters is the relative order of the scores, not the absolute value.** If you need to filter similar sentences based on a similarity threshold, please select an appropriate similarity threshold based on the similarity distribution on your data (such as 0.8, 0.85, or even 0.9). </details> <details> <summary>3. When does the query instruction need to be used</summary> <!-- ### When does the query instruction need to be used --> For the , we improve its retrieval ability when not using instruction. No instruction only has a slight degradation in retrieval performance compared with using instruction. So you can generate embedding without instruction in all cases for convenience. For a retrieval task that uses short queries to find long related documents, it is recommended to add instructions for these short queries. **The best method to decide whether to add instructions for queries is choosing the setting that achieves better performance on your task.** In all cases, the documents/passages do not need to add the instruction. </details> ## Usage ### Usage for Embedding Model Here are some examples for using models with FlagEmbedding, Sentence-Transformers, Langchain, or Huggingface Transformers. #### Using FlagEmbedding If it doesn't work for you, you can see FlagEmbedding for more methods to install FlagEmbedding. For the value of the argument , see Model List. By default, FlagModel will use all available GPUs when encoding. Please set to select specific GPUs. You also can set to make all GPUs unavailable. #### Using Sentence-Transformers You can also use the models with sentence-transformers: For s2p(short query to long passage) retrieval task, each short query should start with an instruction (instructions see Model List). But the instruction is not needed for passages. 
#### Using Langchain You can use in langchain like this: #### Using HuggingFace Transformers With the transformers package, you can use the model like this: First, you pass your input through the transformer model, then you select the last hidden state of the first token (i.e., [CLS]) as the sentence embedding. ### Usage for Reranker Different from embedding model, reranker uses question and document as input and directly output similarity instead of embedding. You can get a relevance score by inputting query and passage to the reranker. The reranker is optimized based cross-entropy loss, so the relevance score is not bounded to a specific range. #### Using FlagEmbedding Get relevance scores (higher scores indicate more relevance): #### Using Huggingface transformers ## Evaluation models achieve **state-of-the-art performance on both MTEB and C-MTEB leaderboard!** For more details and evaluation tools see our scripts. - **MTEB**: | Model Name | Dimension | Sequence Length | Average (56) | Retrieval (15) |Clustering (11) | Pair Classification (3) | Reranking (4) | STS (10) | Summarization (1) | Classification (12) | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | BAAI/bge-large-en-v1.5 | 1024 | 512 | **64.23** | **54.29** | 46.08 | 87.12 | 60.03 | 83.11 | 31.61 | 75.97 | | BAAI/bge-base-en-v1.5 | 768 | 512 | 63.55 | 53.25 | 45.77 | 86.55 | 58.86 | 82.4 | 31.07 | 75.53 | | BAAI/bge-small-en-v1.5 | 384 | 512 | 62.17 |51.68 | 43.82 | 84.92 | 58.36 | 81.59 | 30.12 | 74.14 | | bge-large-en | 1024 | 512 | 63.98 | 53.9 | 46.98 | 85.8 | 59.48 | 81.56 | 32.06 | 76.21 | | bge-base-en | 768 | 512 | 63.36 | 53.0 | 46.32 | 85.86 | 58.7 | 81.84 | 29.27 | 75.27 | | gte-large | 1024 | 512 | 63.13 | 52.22 | 46.84 | 85.00 | 59.13 | 83.35 | 31.66 | 73.33 | | gte-base | 768 | 512 | 62.39 | 51.14 | 46.2 | 84.57 | 58.61 | 82.3 | 31.17 | 73.01 | | e5-large-v2 | 1024| 512 | 62.25 | 50.56 | 44.49 | 86.03 | 56.61 | 82.05 | 30.19 | 75.24 | | bge-small-en | 384 | 512 | 
62.11 | 51.82 | 44.31 | 83.78 | 57.97 | 80.72 | 30.53 | 74.37 | | instructor-xl | 768 | 512 | 61.79 | 49.26 | 44.74 | 86.62 | 57.29 | 83.06 | 32.32 | 61.79 | | e5-base-v2 | 768 | 512 | 61.5 | 50.29 | 43.80 | 85.73 | 55.91 | 81.05 | 30.28 | 73.84 | | gte-small | 384 | 512 | 61.36 | 49.46 | 44.89 | 83.54 | 57.7 | 82.07 | 30.42 | 72.31 | | text-embedding-ada-002 | 1536 | 8192 | 60.99 | 49.25 | 45.9 | 84.89 | 56.32 | 80.97 | 30.8 | 70.93 | | e5-small-v2 | 384 | 512 | 59.93 | 49.04 | 39.92 | 84.67 | 54.32 | 80.39 | 31.16 | 72.94 | | sentence-t5-xxl | 768 | 512 | 59.51 | 42.24 | 43.72 | 85.06 | 56.42 | 82.63 | 30.08 | 73.42 | | all-mpnet-base-v2 | 768 | 514 | 57.78 | 43.81 | 43.69 | 83.04 | 59.36 | 80.28 | 27.49 | 65.07 | | sgpt-bloom-7b1-msmarco | 4096 | 2048 | 57.59 | 48.22 | 38.93 | 81.9 | 55.65 | 77.74 | 33.6 | 66.19 | - **C-MTEB**: We create the benchmark C-MTEB for Chinese text embedding which consists of 31 datasets from 6 tasks. Please refer to C_MTEB for a detailed introduction. | Model | Embedding dimension | Avg | Retrieval | STS | PairClassification | Classification | Reranking | Clustering | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | **BAAI/bge-large-zh-v1.5** | 1024 | **64.53** | 70.46 | 56.25 | 81.6 | 69.13 | 65.84 | 48.99 | | BAAI/bge-base-zh-v1.5 | 768 | 63.13 | 69.49 | 53.72 | 79.75 | 68.07 | 65.39 | 47.53 | | BAAI/bge-small-zh-v1.5 | 512 | 57.82 | 61.77 | 49.11 | 70.41 | 63.96 | 60.92 | 44.18 | | BAAI/bge-large-zh | 1024 | 64.20 | 71.53 | 54.98 | 78.94 | 68.32 | 65.11 | 48.39 | | bge-large-zh-noinstruct | 1024 | 63.53 | 70.55 | 53 | 76.77 | 68.58 | 64.91 | 50.01 | | BAAI/bge-base-zh | 768 | 62.96 | 69.53 | 54.12 | 77.5 | 67.07 | 64.91 | 47.63 | | multilingual-e5-large | 1024 | 58.79 | 63.66 | 48.44 | 69.89 | 67.34 | 56.00 | 48.23 | | BAAI/bge-small-zh | 512 | 58.27 | 63.07 | 49.45 | 70.35 | 63.64 | 61.48 | 45.09 | | m3e-base | 768 | 57.10 | 56.91 | 50.47 | 63.99 | 67.52 | 
59.34 | 47.68 | | m3e-large | 1024 | 57.05 | 54.75 | 50.42 | 64.3 | 68.2 | 59.66 | 48.88 | | multilingual-e5-base | 768 | 55.48 | 61.63 | 46.49 | 67.07 | 65.35 | 54.35 | 40.68 | | multilingual-e5-small | 384 | 55.38 | 59.95 | 45.27 | 66.45 | 65.85 | 53.86 | 45.26 | | text-embedding-ada-002(OpenAI) | 1536 | 53.02 | 52.0 | 43.35 | 69.56 | 64.31 | 54.28 | 45.68 | | luotuo | 1024 | 49.37 | 44.4 | 42.78 | 66.62 | 61 | 49.25 | 44.39 | | text2vec-base | 768 | 47.63 | 38.79 | 43.41 | 67.41 | 62.19 | 49.45 | 37.66 | | text2vec-large | 1024 | 47.36 | 41.94 | 44.97 | 70.86 | 60.66 | 49.16 | 30.02 | - **Reranking**: See C_MTEB for evaluation script. | Model | T2Reranking | T2RerankingZh2En\\* | T2RerankingEn2Zh\\* | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | text2vec-base-multilingual | 64.66 | 62.94 | 62.51 | 14.37 | 48.46 | 48.6 | 50.26 | | multilingual-e5-small | 65.62 | 60.94 | 56.41 | 29.91 | 67.26 | 66.54 | 57.78 | | multilingual-e5-large | 64.55 | 61.61 | 54.28 | 28.6 | 67.42 | 67.92 | 57.4 | | multilingual-e5-base | 64.21 | 62.13 | 54.68 | 29.5 | 66.23 | 66.98 | 57.29 | | m3e-base | 66.03 | 62.74 | 56.07 | 17.51 | 77.05 | 76.76 | 59.36 | | m3e-large | 66.13 | 62.72 | 56.1 | 16.46 | 77.76 | 78.27 | 59.57 | | bge-base-zh-v1.5 | 66.49 | 63.25 | 57.02 | 29.74 | 80.47 | 84.88 | 63.64 | | bge-large-zh-v1.5 | 65.74 | 63.39 | 57.03 | 28.74 | 83.45 | 85.44 | 63.97 | | BAAI/bge-reranker-base | 67.28 | 63.95 | 60.45 | 35.46 | 81.26 | 84.1 | 65.42 | | BAAI/bge-reranker-large | 67.6 | 64.03 | 61.44 | 37.16 | 82.15 | 84.18 | 66.09 | \\* : T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks ## Train ### BAAI Embedding We pre-train the models using retromae and train them on large-scale pairs data using contrastive learning. **You can fine-tune the embedding model on your data following our examples.** We also provide a pre-train example. 
Note that the goal of pre-training is to reconstruct the text, and the pre-trained model cannot be used for similarity calculation directly, it needs to be fine-tuned. More training details for bge see baai_general_embedding. ### BGE Reranker Cross-encoder will perform full-attention over the input pair, which is more accurate than embedding model (i.e., bi-encoder) but more time-consuming than embedding model. Therefore, it can be used to re-rank the top-k documents returned by embedding model. We train the cross-encoder on a multilingual pair data, The data format is the same as embedding model, so you can fine-tune it easily following our example. More details please refer to ./FlagEmbedding/reranker/README.md ## Contact If you have any question or suggestion related to this project, feel free to open an issue or pull request. You also can email Shitao Xiao([email protected]) and Zheng Liu([email protected]). ## Citation If you find this repository useful, please consider giving a star :star: and citation ## License FlagEmbedding is licensed under the MIT License. The released models can be used for commercial purposes free of charge.",
|
22 |
+
"model_explanation_gemini": "BAAI_bge-large-en is a versatile model excelling in text classification, retrieval, clustering, reranking, and semantic textual similarity tasks across various datasets."
|
23 |
+
}
|
data/model_data_json/BAAI_bge-large-zh-v1.5.json
ADDED
@@ -0,0 +1,25 @@
|
1 |
+
{
|
2 |
+
"model_id": "BAAI/bge-large-zh-v1.5",
|
3 |
+
"downloads": 220157,
|
4 |
+
"tags": [
|
5 |
+
"sentence-transformers",
|
6 |
+
"pytorch",
|
7 |
+
"bert",
|
8 |
+
"feature-extraction",
|
9 |
+
"sentence-similarity",
|
10 |
+
"transformers",
|
11 |
+
"zh",
|
12 |
+
"arxiv:2401.03462",
|
13 |
+
"arxiv:2312.15503",
|
14 |
+
"arxiv:2311.13534",
|
15 |
+
"arxiv:2310.07554",
|
16 |
+
"arxiv:2309.07597",
|
17 |
+
"license:mit",
|
18 |
+
"autotrain_compatible",
|
19 |
+
"text-embeddings-inference",
|
20 |
+
"endpoints_compatible",
|
21 |
+
"region:us"
|
22 |
+
],
|
23 |
+
"description": "--- license: mit language: - zh tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers --- <h1 align=\"center\">FlagEmbedding</h1> <h4 align=\"center\"> <p> <a href=#model-list>Model List</a> | <a href=#frequently-asked-questions>FAQ</a> | <a href=#usage>Usage</a> | <a href=\"#evaluation\">Evaluation</a> | <a href=\"#train\">Train</a> | <a href=\"#contact\">Contact</a> | <a href=\"#citation\">Citation</a> | <a href=\"#license\">License</a> <p> </h4> For more details please refer to our Github: FlagEmbedding. If you are looking for a model that supports more languages, longer texts, and other retrieval methods, you can try using bge-m3. English | 中文 FlagEmbedding focuses on retrieval-augmented LLMs, consisting of the following projects currently: - **Long-Context LLM**: Activation Beacon - **Fine-tuning of LM** : LM-Cocktail - **Dense Retrieval**: BGE-M3, LLM Embedder, BGE Embedding - **Reranker Model**: BGE Reranker - **Benchmark**: C-MTEB ## News - 1/30/2024: Release **BGE-M3**, a new member to BGE model series! M3 stands for **M**ulti-linguality (100+ languages), **M**ulti-granularities (input length up to 8192), **M**ulti-Functionality (unification of dense, lexical, multi-vec/colbert retrieval). It is the first embedding model which supports all three retrieval methods, achieving new SOTA on multi-lingual (MIRACL) and cross-lingual (MKQA) benchmarks. Technical Report and Code. :fire: - 1/9/2024: Release Activation-Beacon, an effective, efficient, compatible, and low-cost (training) method to extend the context length of LLM. Technical Report :fire: - 12/24/2023: Release **LLaRA**, a LLaMA-7B based dense retriever, leading to state-of-the-art performances on MS MARCO and BEIR. Model and code will be open-sourced. Please stay tuned. Technical Report :fire: - 11/23/2023: Release LM-Cocktail, a method to maintain general capabilities during fine-tuning by merging multiple language models. 
Technical Report :fire: - 10/12/2023: Release LLM-Embedder, a unified embedding model to support diverse retrieval augmentation needs for LLMs. Technical Report - 09/15/2023: The technical report and massive training data of BGE has been released - 09/12/2023: New models: - **New reranker model**: release cross-encoder models and , which are more powerful than embedding model. We recommend to use/fine-tune them to re-rank top-k documents returned by embedding models. - **update embedding model**: release embedding model to alleviate the issue of the similarity distribution, and enhance its retrieval ability without instruction. <details> <summary>More</summary> <!-- ### More --> - 09/07/2023: Update fine-tune code: Add script to mine hard negatives and support adding instruction during fine-tuning. - 08/09/2023: BGE Models are integrated into **Langchain**, you can use it like this; C-MTEB **leaderboard** is available. - 08/05/2023: Release base-scale and small-scale models, **best performance among the models of the same size 🤗** - 08/02/2023: Release (short for BAAI General Embedding) Models, **rank 1st on MTEB and C-MTEB benchmark!** :tada: :tada: - 08/01/2023: We release the Chinese Massive Text Embedding Benchmark (**C-MTEB**), consisting of 31 test dataset. </details> ## Model List is short for . 
| Model | Language | | Description | query instruction for retrieval [1] | |:-------------------------------|:--------:| :--------:| :--------:|:--------:| | BAAI/bge-m3 | Multilingual | Inference Fine-tune | Multi-Functionality(dense retrieval, sparse retrieval, multi-vector(colbert)), Multi-Linguality, and Multi-Granularity(8192 tokens) | | | BAAI/llm-embedder | English | Inference Fine-tune | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See README | | BAAI/bge-reranker-large | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-reranker-base | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-large-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-en | English | Inference Fine-tune | :trophy: rank **1st** in MTEB leaderboard | | | BAAI/bge-base-en | English | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-en | English | Inference Fine-tune |a small-scale model but with competitive performance | | | BAAI/bge-large-zh | Chinese | Inference Fine-tune | :trophy: rank **1st** in C-MTEB benchmark | | | BAAI/bge-base-zh | Chinese | Inference Fine-tune | a base-scale model but with 
similar ability to | | | BAAI/bge-small-zh | Chinese | Inference Fine-tune | a small-scale model but with competitive performance | | [1\\]: If you need to search the relevant passages to a query, we suggest to add the instruction to the query; in other cases, no instruction is needed, just use the original query directly. In all cases, **no instruction** needs to be added to passages. [2\\]: Different from embedding model, reranker uses question and document as input and directly output similarity instead of embedding. To balance the accuracy and time cost, cross-encoder is widely used to re-rank top-k documents retrieved by other simple models. For examples, use bge embedding model to retrieve top 100 relevant documents, and then use bge reranker to re-rank the top 100 document to get the final top-3 results. All models have been uploaded to Huggingface Hub, and you can see them at If you cannot open the Huggingface Hub, you also can download the models at . ## Frequently asked questions <details> <summary>1. How to fine-tune bge embedding model?</summary> <!-- ### How to fine-tune bge embedding model? --> Following this example to prepare data and fine-tune your model. Some suggestions: - Mine hard negatives following this example, which can improve the retrieval performance. - If you pre-train bge on your data, the pre-trained model cannot be directly used to calculate similarity, and it must be fine-tuned with contrastive learning before computing similarity. - If the accuracy of the fine-tuned model is still not high, it is recommended to use/fine-tune the cross-encoder model (bge-reranker) to re-rank top-k results. Hard negatives also are needed to fine-tune reranker. </details> <details> <summary>2. 
The similarity score between two dissimilar sentences is higher than 0.5</summary> <!-- ### The similarity score between two dissimilar sentences is higher than 0.5 --> **Suggest to use bge v1.5, which alleviates the issue of the similarity distribution.** Since we finetune the models by contrastive learning with a temperature of 0.01, the similarity distribution of the current BGE model is about in the interval \\[0.6, 1\\]. So a similarity score greater than 0.5 does not indicate that the two sentences are similar. For downstream tasks, such as passage retrieval or semantic similarity, **what matters is the relative order of the scores, not the absolute value.** If you need to filter similar sentences based on a similarity threshold, please select an appropriate similarity threshold based on the similarity distribution on your data (such as 0.8, 0.85, or even 0.9). </details> <details> <summary>3. When does the query instruction need to be used</summary> <!-- ### When does the query instruction need to be used --> For the , we improve its retrieval ability when not using instruction. No instruction only has a slight degradation in retrieval performance compared with using instruction. So you can generate embedding without instruction in all cases for convenience. For a retrieval task that uses short queries to find long related documents, it is recommended to add instructions for these short queries. **The best method to decide whether to add instructions for queries is choosing the setting that achieves better performance on your task.** In all cases, the documents/passages do not need to add the instruction. </details> ## Usage ### Usage for Embedding Model Here are some examples for using models with FlagEmbedding, Sentence-Transformers, Langchain, or Huggingface Transformers. #### Using FlagEmbedding If it doesn't work for you, you can see FlagEmbedding for more methods to install FlagEmbedding. For the value of the argument , see Model List. 
By default, FlagModel will use all available GPUs when encoding. Please set to select specific GPUs. You can also set to make all GPUs unavailable. #### Using Sentence-Transformers You can also use the models with sentence-transformers: For s2p (short query to long passage) retrieval tasks, each short query should start with an instruction (for the instructions, see Model List). The instruction is not needed for passages. #### Using Langchain You can use in langchain like this: #### Using HuggingFace Transformers With the transformers package, you can use the model like this: First, you pass your input through the transformer model, then you select the last hidden state of the first token (i.e., [CLS]) as the sentence embedding. ### Usage for Reranker Unlike an embedding model, a reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. You can get a relevance score by inputting a query and a passage to the reranker. The reranker is optimized with a cross-entropy loss, so the relevance score is not bounded to a specific range. #### Using FlagEmbedding Get relevance scores (higher scores indicate more relevance): #### Using Huggingface transformers ## Evaluation models achieve **state-of-the-art performance on both the MTEB and C-MTEB leaderboards!** For more details and evaluation tools see our scripts. 
- **MTEB**: | Model Name | Dimension | Sequence Length | Average (56) | Retrieval (15) |Clustering (11) | Pair Classification (3) | Reranking (4) | STS (10) | Summarization (1) | Classification (12) | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | BAAI/bge-large-en-v1.5 | 1024 | 512 | **64.23** | **54.29** | 46.08 | 87.12 | 60.03 | 83.11 | 31.61 | 75.97 | | BAAI/bge-base-en-v1.5 | 768 | 512 | 63.55 | 53.25 | 45.77 | 86.55 | 58.86 | 82.4 | 31.07 | 75.53 | | BAAI/bge-small-en-v1.5 | 384 | 512 | 62.17 |51.68 | 43.82 | 84.92 | 58.36 | 81.59 | 30.12 | 74.14 | | bge-large-en | 1024 | 512 | 63.98 | 53.9 | 46.98 | 85.8 | 59.48 | 81.56 | 32.06 | 76.21 | | bge-base-en | 768 | 512 | 63.36 | 53.0 | 46.32 | 85.86 | 58.7 | 81.84 | 29.27 | 75.27 | | gte-large | 1024 | 512 | 63.13 | 52.22 | 46.84 | 85.00 | 59.13 | 83.35 | 31.66 | 73.33 | | gte-base | 768 | 512 | 62.39 | 51.14 | 46.2 | 84.57 | 58.61 | 82.3 | 31.17 | 73.01 | | e5-large-v2 | 1024| 512 | 62.25 | 50.56 | 44.49 | 86.03 | 56.61 | 82.05 | 30.19 | 75.24 | | bge-small-en | 384 | 512 | 62.11 | 51.82 | 44.31 | 83.78 | 57.97 | 80.72 | 30.53 | 74.37 | | instructor-xl | 768 | 512 | 61.79 | 49.26 | 44.74 | 86.62 | 57.29 | 83.06 | 32.32 | 61.79 | | e5-base-v2 | 768 | 512 | 61.5 | 50.29 | 43.80 | 85.73 | 55.91 | 81.05 | 30.28 | 73.84 | | gte-small | 384 | 512 | 61.36 | 49.46 | 44.89 | 83.54 | 57.7 | 82.07 | 30.42 | 72.31 | | text-embedding-ada-002 | 1536 | 8192 | 60.99 | 49.25 | 45.9 | 84.89 | 56.32 | 80.97 | 30.8 | 70.93 | | e5-small-v2 | 384 | 512 | 59.93 | 49.04 | 39.92 | 84.67 | 54.32 | 80.39 | 31.16 | 72.94 | | sentence-t5-xxl | 768 | 512 | 59.51 | 42.24 | 43.72 | 85.06 | 56.42 | 82.63 | 30.08 | 73.42 | | all-mpnet-base-v2 | 768 | 514 | 57.78 | 43.81 | 43.69 | 83.04 | 59.36 | 80.28 | 27.49 | 65.07 | | sgpt-bloom-7b1-msmarco | 4096 | 2048 | 57.59 | 48.22 | 38.93 | 81.9 | 55.65 | 77.74 | 33.6 | 66.19 | - **C-MTEB**: We create the benchmark C-MTEB for Chinese text embedding which consists of 31 datasets 
from 6 tasks. Please refer to C_MTEB for a detailed introduction. | Model | Embedding dimension | Avg | Retrieval | STS | PairClassification | Classification | Reranking | Clustering | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | **BAAI/bge-large-zh-v1.5** | 1024 | **64.53** | 70.46 | 56.25 | 81.6 | 69.13 | 65.84 | 48.99 | | BAAI/bge-base-zh-v1.5 | 768 | 63.13 | 69.49 | 53.72 | 79.75 | 68.07 | 65.39 | 47.53 | | BAAI/bge-small-zh-v1.5 | 512 | 57.82 | 61.77 | 49.11 | 70.41 | 63.96 | 60.92 | 44.18 | | BAAI/bge-large-zh | 1024 | 64.20 | 71.53 | 54.98 | 78.94 | 68.32 | 65.11 | 48.39 | | bge-large-zh-noinstruct | 1024 | 63.53 | 70.55 | 53 | 76.77 | 68.58 | 64.91 | 50.01 | | BAAI/bge-base-zh | 768 | 62.96 | 69.53 | 54.12 | 77.5 | 67.07 | 64.91 | 47.63 | | multilingual-e5-large | 1024 | 58.79 | 63.66 | 48.44 | 69.89 | 67.34 | 56.00 | 48.23 | | BAAI/bge-small-zh | 512 | 58.27 | 63.07 | 49.45 | 70.35 | 63.64 | 61.48 | 45.09 | | m3e-base | 768 | 57.10 | 56.91 | 50.47 | 63.99 | 67.52 | 59.34 | 47.68 | | m3e-large | 1024 | 57.05 | 54.75 | 50.42 | 64.3 | 68.2 | 59.66 | 48.88 | | multilingual-e5-base | 768 | 55.48 | 61.63 | 46.49 | 67.07 | 65.35 | 54.35 | 40.68 | | multilingual-e5-small | 384 | 55.38 | 59.95 | 45.27 | 66.45 | 65.85 | 53.86 | 45.26 | | text-embedding-ada-002(OpenAI) | 1536 | 53.02 | 52.0 | 43.35 | 69.56 | 64.31 | 54.28 | 45.68 | | luotuo | 1024 | 49.37 | 44.4 | 42.78 | 66.62 | 61 | 49.25 | 44.39 | | text2vec-base | 768 | 47.63 | 38.79 | 43.41 | 67.41 | 62.19 | 49.45 | 37.66 | | text2vec-large | 1024 | 47.36 | 41.94 | 44.97 | 70.86 | 60.66 | 49.16 | 30.02 | - **Reranking**: See C_MTEB for evaluation script. 
| Model | T2Reranking | T2RerankingZh2En\\* | T2RerankingEn2Zh\\* | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | text2vec-base-multilingual | 64.66 | 62.94 | 62.51 | 14.37 | 48.46 | 48.6 | 50.26 | | multilingual-e5-small | 65.62 | 60.94 | 56.41 | 29.91 | 67.26 | 66.54 | 57.78 | | multilingual-e5-large | 64.55 | 61.61 | 54.28 | 28.6 | 67.42 | 67.92 | 57.4 | | multilingual-e5-base | 64.21 | 62.13 | 54.68 | 29.5 | 66.23 | 66.98 | 57.29 | | m3e-base | 66.03 | 62.74 | 56.07 | 17.51 | 77.05 | 76.76 | 59.36 | | m3e-large | 66.13 | 62.72 | 56.1 | 16.46 | 77.76 | 78.27 | 59.57 | | bge-base-zh-v1.5 | 66.49 | 63.25 | 57.02 | 29.74 | 80.47 | 84.88 | 63.64 | | bge-large-zh-v1.5 | 65.74 | 63.39 | 57.03 | 28.74 | 83.45 | 85.44 | 63.97 | | BAAI/bge-reranker-base | 67.28 | 63.95 | 60.45 | 35.46 | 81.26 | 84.1 | 65.42 | | BAAI/bge-reranker-large | 67.6 | 64.03 | 61.44 | 37.16 | 82.15 | 84.18 | 66.09 | \\* : T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks ## Train ### BAAI Embedding We pre-train the models using retromae and train them on large-scale pairs data using contrastive learning. **You can fine-tune the embedding model on your data following our examples.** We also provide a pre-train example. Note that the goal of pre-training is to reconstruct the text, and the pre-trained model cannot be used for similarity calculation directly, it needs to be fine-tuned. More training details for bge see baai_general_embedding. ### BGE Reranker Cross-encoder will perform full-attention over the input pair, which is more accurate than embedding model (i.e., bi-encoder) but more time-consuming than embedding model. Therefore, it can be used to re-rank the top-k documents returned by embedding model. 
We train the cross-encoder on multilingual pair data. The data format is the same as for the embedding model, so you can fine-tune it easily by following our example. For more details, please refer to ./FlagEmbedding/reranker/README.md ## Contact If you have any questions or suggestions related to this project, feel free to open an issue or pull request. You can also email Shitao Xiao([email protected]) and Zheng Liu([email protected]). ## Citation If you find this repository useful, please consider giving it a star :star: and a citation ## License FlagEmbedding is licensed under the MIT License. The released models can be used for commercial purposes free of charge.",
|
24 |
+
"model_explanation_gemini": "Generates dense vector embeddings for Chinese text to enable tasks like sentence similarity and retrieval-augmented language models."
|
25 |
+
}
|
data/model_data_json/BAAI_bge-m3.json
ADDED
@@ -0,0 +1,24 @@
1 |
+
{
|
2 |
+
"model_id": "BAAI/bge-m3",
|
3 |
+
"downloads": 3377897,
|
4 |
+
"tags": [
|
5 |
+
"sentence-transformers",
|
6 |
+
"pytorch",
|
7 |
+
"onnx",
|
8 |
+
"xlm-roberta",
|
9 |
+
"feature-extraction",
|
10 |
+
"sentence-similarity",
|
11 |
+
"arxiv:2402.03216",
|
12 |
+
"arxiv:2004.04906",
|
13 |
+
"arxiv:2106.14807",
|
14 |
+
"arxiv:2107.05720",
|
15 |
+
"arxiv:2004.12832",
|
16 |
+
"license:mit",
|
17 |
+
"autotrain_compatible",
|
18 |
+
"text-embeddings-inference",
|
19 |
+
"endpoints_compatible",
|
20 |
+
"region:us"
|
21 |
+
],
|
22 |
+
"description": "--- pipeline_tag: sentence-similarity tags: - sentence-transformers - feature-extraction - sentence-similarity license: mit --- For more details please refer to our github repo: # BGE-M3 (paper, code) In this project, we introduce BGE-M3, which is distinguished for its versatility in Multi-Functionality, Multi-Linguality, and Multi-Granularity. - Multi-Functionality: It can simultaneously perform the three common retrieval functionalities of embedding model: dense retrieval, multi-vector retrieval, and sparse retrieval. - Multi-Linguality: It can support more than 100 working languages. - Multi-Granularity: It is able to process inputs of different granularities, spanning from short sentences to long documents of up to 8192 tokens. **Some suggestions for retrieval pipeline in RAG** We recommend to use the following pipeline: hybrid retrieval + re-ranking. - Hybrid retrieval leverages the strengths of various methods, offering higher accuracy and stronger generalization capabilities. A classic example: using both embedding retrieval and the BM25 algorithm. Now, you can try to use BGE-M3, which supports both embedding and sparse retrieval. This allows you to obtain token weights (similar to the BM25) without any additional cost when generate dense embeddings. To use hybrid retrieval, you can refer to Vespa and Milvus. - As cross-encoder models, re-ranker demonstrates higher accuracy than bi-encoder embedding model. Utilizing the re-ranking model (e.g., bge-reranker, bge-reranker-v2) after retrieval can further filter the selected text. ## News: - 2024/7/1: **We update the MIRACL evaluation results of BGE-M3**. To reproduce the new results, you can refer to: bge-m3_miracl_2cr. We have also updated our paper on arXiv. <details> <summary> Details </summary> The previous test results were lower because we mistakenly removed the passages that have the same id as the query from the search results. 
After correcting this mistake, the overall performance of BGE-M3 on MIRACL is higher than the previous results, but the experimental conclusion remains unchanged. The other results are not affected by this mistake. To reproduce the previous lower results, you need to add the parameter when using or to search the passages. </details> - 2024/3/20: **Thanks to the Milvus team!** Now you can use hybrid retrieval of bge-m3 in Milvus: pymilvus/examples/hello_hybrid_sparse_dense.py. - 2024/3/8: **Thanks for the experimental results from @Yannael. In this benchmark, BGE-M3 achieves top performance in both English and other languages, surpassing models such as OpenAI.** - 2024/3/2: Release unified fine-tuning example and data - 2024/2/6: We release the MLDR (a long document retrieval dataset covering 13 languages) and evaluation pipeline. - 2024/2/1: **Thanks for the excellent tool from Vespa.** You can easily use multiple modes of BGE-M3 following this notebook ## Specs - Model | Model Name | Dimension | Sequence Length | Introduction | |:----:|:---:|:---:|:---:| | BAAI/bge-m3 | 1024 | 8192 | multilingual; unified fine-tuning (dense, sparse, and colbert) from bge-m3-unsupervised| | BAAI/bge-m3-unsupervised | 1024 | 8192 | multilingual; contrastive learning from bge-m3-retromae | | BAAI/bge-m3-retromae | -- | 8192 | multilingual; extend the max_length of xlm-roberta to 8192 and further pretrained via retromae| | BAAI/bge-large-en-v1.5 | 1024 | 512 | English model | | BAAI/bge-base-en-v1.5 | 768 | 512 | English model | | BAAI/bge-small-en-v1.5 | 384 | 512 | English model | - Data | Dataset | Introduction | |:----------------------------------------------------------:|:-------------------------------------------------:| | MLDR | Document Retrieval Dataset, covering 13 languages | | bge-m3-data | Fine-tuning data used by bge-m3 | ## FAQ **1. 
Introduction for different retrieval methods** - Dense retrieval: map the text into a single embedding, e.g., DPR, BGE-v1.5 - Sparse retrieval (lexical matching): a vector of size equal to the vocabulary, with the majority of positions set to zero, calculating a weight only for tokens present in the text, e.g., BM25, unicoil, and splade - Multi-vector retrieval: use multiple vectors to represent a text, e.g., ColBERT. **2. How to use BGE-M3 in other projects?** For embedding retrieval, you can employ the BGE-M3 model using the same approach as BGE. The only difference is that the BGE-M3 model no longer requires adding instructions to the queries. For hybrid retrieval, you can use Vespa and Milvus. **3. How to fine-tune the BGE-M3 model?** You can follow the common in this example to fine-tune the dense embedding. If you want to fine-tune all embedding functions of M3 (dense, sparse and colbert), you can refer to the unified_fine-tuning example ## Usage Install: or: ### Generate Embedding for text - Dense Embedding You can also use sentence-transformers and huggingface transformers to generate dense embeddings. Refer to baai_general_embedding for details. - Sparse Embedding (Lexical Weight) - Multi-Vector (ColBERT) ### Compute score for text pairs Given a list of text pairs as input, you can get the scores computed by different methods. ## Evaluation We provide the evaluation script for MKQA and MLDR ### Benchmarks from the open-source community !avatar The BGE-M3 model emerged as the top performer on this benchmark (OAI is short for OpenAI). For more details, please refer to the article and Github Repo ### Our results - Multilingual (Miracl dataset) !avatar - Cross-lingual (MKQA dataset) !avatar - Long Document Retrieval - MLDR: !avatar Please note that MLDR is a document retrieval dataset we constructed via LLM, covering 13 languages, including test set, validation set, and training set. 
We utilized the training set from MLDR to enhance the model's long document retrieval capabilities. Therefore, comparing baselines with (fine-tuning without long document dataset) is more equitable. Additionally, this long document retrieval dataset will be open-sourced to address the current lack of open-source multilingual long text retrieval datasets. We believe that this data will be helpful for the open-source community in training document retrieval models. - NarrativeQA: !avatar - Comparison with BM25 We utilized Pyserini to implement BM25, and the test results can be reproduced by this script. We tested BM25 using two different tokenizers: one using Lucene Analyzer and the other using the same tokenizer as M3 (i.e., the tokenizer of xlm-roberta). The results indicate that BM25 remains a competitive baseline, especially in long document retrieval. !avatar ## Training - Self-knowledge Distillation: combining multiple outputs from different retrieval modes as a reward signal to enhance the performance of each single mode (especially for sparse retrieval and multi-vec (colbert) retrieval) - Efficient Batching: Improve the efficiency when fine-tuning on long text. The small-batch strategy is simple but effective, and it can also be used to fine-tune large embedding models. - MCLS: A simple method to improve the performance on long text without fine-tuning. If you do not have enough resources to fine-tune the model with long text, the method is useful. Refer to our report for more details. ## Acknowledgement Thanks to the authors of open-sourced datasets, including Miracl, MKQA, NarrativeQA, etc. Thanks to the open-sourced libraries like Tevatron, Pyserini. ## Citation If you find this repository useful, please consider giving it a star :star: and a citation",
|
23 |
+
"model_explanation_gemini": "BGE-M3 is a versatile multilingual embedding model supporting dense, sparse, and multi-vector retrieval across 100+ languages, handling text from short sentences to long documents up to 8192 tokens."
|
24 |
+
}
|
data/model_data_json/BAAI_bge-multilingual-gemma2.json
ADDED
The diff for this file is too large to render.
See raw diff
|
|
data/model_data_json/BAAI_bge-reranker-base.json
ADDED
@@ -0,0 +1,26 @@
1 |
+
{
|
2 |
+
"model_id": "BAAI/bge-reranker-base",
|
3 |
+
"downloads": 1113449,
|
4 |
+
"tags": [
|
5 |
+
"sentence-transformers",
|
6 |
+
"pytorch",
|
7 |
+
"onnx",
|
8 |
+
"safetensors",
|
9 |
+
"xlm-roberta",
|
10 |
+
"mteb",
|
11 |
+
"text-embeddings-inference",
|
12 |
+
"text-classification",
|
13 |
+
"en",
|
14 |
+
"zh",
|
15 |
+
"arxiv:2401.03462",
|
16 |
+
"arxiv:2312.15503",
|
17 |
+
"arxiv:2311.13534",
|
18 |
+
"arxiv:2310.07554",
|
19 |
+
"arxiv:2309.07597",
|
20 |
+
"license:mit",
|
21 |
+
"model-index",
|
22 |
+
"region:us"
|
23 |
+
],
|
24 |
+
"description": "--- license: mit language: - en - zh tags: - mteb - text-embeddings-inference model-index: - name: bge-reranker-base results: - task: type: Reranking dataset: type: C-MTEB/CMedQAv1-reranking name: MTEB CMedQAv1 config: default split: test revision: None metrics: - type: map value: 81.27206722525007 - type: mrr value: 84.14238095238095 - task: type: Reranking dataset: type: C-MTEB/CMedQAv2-reranking name: MTEB CMedQAv2 config: default split: test revision: None metrics: - type: map value: 84.10369934291236 - type: mrr value: 86.79376984126984 - task: type: Reranking dataset: type: C-MTEB/Mmarco-reranking name: MTEB MMarcoReranking config: default split: dev revision: None metrics: - type: map value: 35.4600511272538 - type: mrr value: 34.60238095238095 - task: type: Reranking dataset: type: C-MTEB/T2Reranking name: MTEB T2Reranking config: default split: dev revision: None metrics: - type: map value: 67.27728847727172 - type: mrr value: 77.1315192743764 pipeline_tag: text-classification library_name: sentence-transformers --- **We have updated the new reranker, supporting larger lengths, more languages, and achieving better performance.** <h1 align=\"center\">FlagEmbedding</h1> <h4 align=\"center\"> <p> <a href=#model-list>Model List</a> | <a href=#frequently-asked-questions>FAQ</a> | <a href=#usage>Usage</a> | <a href=\"#evaluation\">Evaluation</a> | <a href=\"#train\">Train</a> | <a href=\"#citation\">Citation</a> | <a href=\"#license\">License</a> <p> </h4> **More details please refer to our Github: FlagEmbedding.** English | 中文 FlagEmbedding focuses on retrieval-augmented LLMs, consisting of the following projects currently: - **Long-Context LLM**: Activation Beacon - **Fine-tuning of LM** : LM-Cocktail - **Embedding Model**: Visualized-BGE, BGE-M3, LLM Embedder, BGE Embedding - **Reranker Model**: llm rerankers, BGE Reranker - **Benchmark**: C-MTEB ## News - 3/18/2024: Release new rerankers, built upon powerful M3 and LLM (GEMMA and MiniCPM, not 
so large actually) backbones, supporting multi-lingual processing and larger inputs, with massive improvements in ranking performance on BEIR, C-MTEB/Retrieval, MIRACL, LlamaIndex Evaluation. - 3/18/2024: Release Visualized-BGE, equipping BGE with visual capabilities. Visualized-BGE can be utilized to generate embeddings for hybrid image-text data. - 1/30/2024: Release **BGE-M3**, a new member of the BGE model series! M3 stands for **M**ulti-linguality (100+ languages), **M**ulti-granularities (input length up to 8192), **M**ulti-Functionality (unification of dense, lexical, multi-vec/colbert retrieval). It is the first embedding model which supports all three retrieval methods, achieving new SOTA on multi-lingual (MIRACL) and cross-lingual (MKQA) benchmarks. Technical Report and Code. :fire: - 1/9/2024: Release Activation-Beacon, an effective, efficient, compatible, and low-cost (training) method to extend the context length of LLM. Technical Report :fire: - 12/24/2023: Release **LLaRA**, a LLaMA-7B based dense retriever, leading to state-of-the-art performances on MS MARCO and BEIR. Model and code will be open-sourced. Please stay tuned. Technical Report - 11/23/2023: Release LM-Cocktail, a method to maintain general capabilities during fine-tuning by merging multiple language models. Technical Report :fire: - 10/12/2023: Release LLM-Embedder, a unified embedding model to support diverse retrieval augmentation needs for LLMs. Technical Report - 09/15/2023: The technical report of BGE has been released - 09/15/2023: The massive training data of BGE has been released - 09/12/2023: New models: - **New reranker model**: release cross-encoder models and , which are more powerful than embedding model. We recommend using/fine-tuning them to re-rank top-k documents returned by embedding models. - **update embedding model**: release embedding model to alleviate the issue of the similarity distribution, and enhance its retrieval ability without instruction. 
<details> <summary>More</summary> <!-- ### More --> - 09/07/2023: Update fine-tune code: Add script to mine hard negatives and support adding instruction during fine-tuning. - 08/09/2023: BGE Models are integrated into **Langchain**, you can use it like this; C-MTEB **leaderboard** is available. - 08/05/2023: Release base-scale and small-scale models, **best performance among the models of the same size 🤗** - 08/02/2023: Release (short for BAAI General Embedding) Models, **rank 1st on MTEB and C-MTEB benchmark!** :tada: :tada: - 08/01/2023: We release the Chinese Massive Text Embedding Benchmark (**C-MTEB**), consisting of 31 test dataset. </details> ## Model List is short for . | Model | Language | | Description | query instruction for retrieval [1] | |:-------------------------------|:--------:| :--------:| :--------:|:--------:| | BAAI/bge-m3 | Multilingual | Inference Fine-tune | Multi-Functionality(dense retrieval, sparse retrieval, multi-vector(colbert)), Multi-Linguality, and Multi-Granularity(8192 tokens) | | | BAAI/llm-embedder | English | Inference Fine-tune | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See README | | BAAI/bge-reranker-large | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-reranker-base | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-large-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-zh-v1.5 | Chinese | Inference 
Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-en | English | Inference Fine-tune | :trophy: rank **1st** in MTEB leaderboard | | | BAAI/bge-base-en | English | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-en | English | Inference Fine-tune |a small-scale model but with competitive performance | | | BAAI/bge-large-zh | Chinese | Inference Fine-tune | :trophy: rank **1st** in C-MTEB benchmark | | | BAAI/bge-base-zh | Chinese | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-zh | Chinese | Inference Fine-tune | a small-scale model but with competitive performance | | [1\\]: If you need to search the relevant passages to a query, we suggest to add the instruction to the query; in other cases, no instruction is needed, just use the original query directly. In all cases, **no instruction** needs to be added to passages. [2\\]: Different from embedding model, reranker uses question and document as input and directly output similarity instead of embedding. To balance the accuracy and time cost, cross-encoder is widely used to re-rank top-k documents retrieved by other simple models. For examples, use bge embedding model to retrieve top 100 relevant documents, and then use bge reranker to re-rank the top 100 document to get the final top-3 results. All models have been uploaded to Huggingface Hub, and you can see them at If you cannot open the Huggingface Hub, you also can download the models at . ## Frequently asked questions <details> <summary>1. How to fine-tune bge embedding model?</summary> <!-- ### How to fine-tune bge embedding model? --> Following this example to prepare data and fine-tune your model. Some suggestions: - Mine hard negatives following this example, which can improve the retrieval performance. 
- If you pre-train bge on your data, the pre-trained model cannot be directly used to calculate similarity, and it must be fine-tuned with contrastive learning before computing similarity. - If the accuracy of the fine-tuned model is still not high, it is recommended to use/fine-tune the cross-encoder model (bge-reranker) to re-rank top-k results. Hard negatives are also needed to fine-tune the reranker. Refer to this example for fine-tuning the reranker </details> <details> <summary>2. The similarity score between two dissimilar sentences is higher than 0.5</summary> <!-- ### The similarity score between two dissimilar sentences is higher than 0.5 --> **We suggest using bge v1.5, which alleviates the issue of the similarity distribution.** Since we fine-tune the models by contrastive learning with a temperature of 0.01, the similarity distribution of the current BGE model lies roughly in the interval \\[0.6, 1\\]. So a similarity score greater than 0.5 does not indicate that the two sentences are similar. For downstream tasks, such as passage retrieval or semantic similarity, **what matters is the relative order of the scores, not the absolute value.** If you need to filter similar sentences based on a similarity threshold, please select an appropriate threshold based on the similarity distribution on your data (such as 0.8, 0.85, or even 0.9). </details> <details> <summary>3. When does the query instruction need to be used</summary> <!-- ### When does the query instruction need to be used --> For the , we improve its retrieval ability when not using an instruction. Omitting the instruction causes only a slight degradation in retrieval performance compared with using it. So you can generate embeddings without an instruction in all cases for convenience. For a retrieval task that uses short queries to find long related documents, it is recommended to add instructions for these short queries. 
**The best way to decide whether to add instructions for queries is to choose the setting that achieves better performance on your task.** In all cases, no instruction needs to be added to the documents/passages. </details> ## Usage ### Usage for Embedding Model Here are some examples for using models with FlagEmbedding, Sentence-Transformers, Langchain, or Huggingface Transformers. #### Using FlagEmbedding If it doesn't work for you, you can see FlagEmbedding for more methods to install FlagEmbedding. For the value of the argument , see Model List. By default, FlagModel will use all available GPUs when encoding. Please set to select specific GPUs. You can also set to make all GPUs unavailable. #### Using Sentence-Transformers You can also use the models with sentence-transformers: For s2p (short query to long passage) retrieval tasks, each short query should start with an instruction (for the instructions, see Model List). The instruction is not needed for passages. #### Using Langchain You can use in langchain like this: #### Using HuggingFace Transformers With the transformers package, you can use the model like this: First, you pass your input through the transformer model, then you select the last hidden state of the first token (i.e., [CLS]) as the sentence embedding. ### Usage for Reranker Unlike an embedding model, a reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. You can get a relevance score by inputting a query and a passage to the reranker. The reranker is optimized with a cross-entropy loss, so the relevance score is not bounded to a specific range. #### Using FlagEmbedding Get relevance scores (higher scores indicate more relevance): #### Using Huggingface transformers #### Usage reranker with the ONNX files #### Usage reranker with infinity It's also possible to deploy the onnx/torch files with the infinity_emb pip package. 
## Evaluation models achieve **state-of-the-art performance on both MTEB and C-MTEB leaderboard!** For more details and evaluation tools see our scripts. - **MTEB**: | Model Name | Dimension | Sequence Length | Average (56) | Retrieval (15) |Clustering (11) | Pair Classification (3) | Reranking (4) | STS (10) | Summarization (1) | Classification (12) | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | BAAI/bge-large-en-v1.5 | 1024 | 512 | **64.23** | **54.29** | 46.08 | 87.12 | 60.03 | 83.11 | 31.61 | 75.97 | | BAAI/bge-base-en-v1.5 | 768 | 512 | 63.55 | 53.25 | 45.77 | 86.55 | 58.86 | 82.4 | 31.07 | 75.53 | | BAAI/bge-small-en-v1.5 | 384 | 512 | 62.17 |51.68 | 43.82 | 84.92 | 58.36 | 81.59 | 30.12 | 74.14 | | bge-large-en | 1024 | 512 | 63.98 | 53.9 | 46.98 | 85.8 | 59.48 | 81.56 | 32.06 | 76.21 | | bge-base-en | 768 | 512 | 63.36 | 53.0 | 46.32 | 85.86 | 58.7 | 81.84 | 29.27 | 75.27 | | gte-large | 1024 | 512 | 63.13 | 52.22 | 46.84 | 85.00 | 59.13 | 83.35 | 31.66 | 73.33 | | gte-base | 768 | 512 | 62.39 | 51.14 | 46.2 | 84.57 | 58.61 | 82.3 | 31.17 | 73.01 | | e5-large-v2 | 1024| 512 | 62.25 | 50.56 | 44.49 | 86.03 | 56.61 | 82.05 | 30.19 | 75.24 | | bge-small-en | 384 | 512 | 62.11 | 51.82 | 44.31 | 83.78 | 57.97 | 80.72 | 30.53 | 74.37 | | instructor-xl | 768 | 512 | 61.79 | 49.26 | 44.74 | 86.62 | 57.29 | 83.06 | 32.32 | 61.79 | | e5-base-v2 | 768 | 512 | 61.5 | 50.29 | 43.80 | 85.73 | 55.91 | 81.05 | 30.28 | 73.84 | | gte-small | 384 | 512 | 61.36 | 49.46 | 44.89 | 83.54 | 57.7 | 82.07 | 30.42 | 72.31 | | text-embedding-ada-002 | 1536 | 8192 | 60.99 | 49.25 | 45.9 | 84.89 | 56.32 | 80.97 | 30.8 | 70.93 | | e5-small-v2 | 384 | 512 | 59.93 | 49.04 | 39.92 | 84.67 | 54.32 | 80.39 | 31.16 | 72.94 | | sentence-t5-xxl | 768 | 512 | 59.51 | 42.24 | 43.72 | 85.06 | 56.42 | 82.63 | 30.08 | 73.42 | | all-mpnet-base-v2 | 768 | 514 | 57.78 | 43.81 | 43.69 | 83.04 | 59.36 | 80.28 | 27.49 | 65.07 | | sgpt-bloom-7b1-msmarco | 4096 | 2048 | 57.59 | 
48.22 | 38.93 | 81.9 | 55.65 | 77.74 | 33.6 | 66.19 | - **C-MTEB**: We create the benchmark C-MTEB for Chinese text embedding which consists of 31 datasets from 6 tasks. Please refer to C_MTEB for a detailed introduction. | Model | Embedding dimension | Avg | Retrieval | STS | PairClassification | Classification | Reranking | Clustering | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | **BAAI/bge-large-zh-v1.5** | 1024 | **64.53** | 70.46 | 56.25 | 81.6 | 69.13 | 65.84 | 48.99 | | BAAI/bge-base-zh-v1.5 | 768 | 63.13 | 69.49 | 53.72 | 79.75 | 68.07 | 65.39 | 47.53 | | BAAI/bge-small-zh-v1.5 | 512 | 57.82 | 61.77 | 49.11 | 70.41 | 63.96 | 60.92 | 44.18 | | BAAI/bge-large-zh | 1024 | 64.20 | 71.53 | 54.98 | 78.94 | 68.32 | 65.11 | 48.39 | | bge-large-zh-noinstruct | 1024 | 63.53 | 70.55 | 53 | 76.77 | 68.58 | 64.91 | 50.01 | | BAAI/bge-base-zh | 768 | 62.96 | 69.53 | 54.12 | 77.5 | 67.07 | 64.91 | 47.63 | | multilingual-e5-large | 1024 | 58.79 | 63.66 | 48.44 | 69.89 | 67.34 | 56.00 | 48.23 | | BAAI/bge-small-zh | 512 | 58.27 | 63.07 | 49.45 | 70.35 | 63.64 | 61.48 | 45.09 | | m3e-base | 768 | 57.10 | 56.91 | 50.47 | 63.99 | 67.52 | 59.34 | 47.68 | | m3e-large | 1024 | 57.05 | 54.75 | 50.42 | 64.3 | 68.2 | 59.66 | 48.88 | | multilingual-e5-base | 768 | 55.48 | 61.63 | 46.49 | 67.07 | 65.35 | 54.35 | 40.68 | | multilingual-e5-small | 384 | 55.38 | 59.95 | 45.27 | 66.45 | 65.85 | 53.86 | 45.26 | | text-embedding-ada-002(OpenAI) | 1536 | 53.02 | 52.0 | 43.35 | 69.56 | 64.31 | 54.28 | 45.68 | | luotuo | 1024 | 49.37 | 44.4 | 42.78 | 66.62 | 61 | 49.25 | 44.39 | | text2vec-base | 768 | 47.63 | 38.79 | 43.41 | 67.41 | 62.19 | 49.45 | 37.66 | | text2vec-large | 1024 | 47.36 | 41.94 | 44.97 | 70.86 | 60.66 | 49.16 | 30.02 | - **Reranking**: See C_MTEB for evaluation script. 
| Model | T2Reranking | T2RerankingZh2En\\* | T2RerankingEn2Zh\\* | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | text2vec-base-multilingual | 64.66 | 62.94 | 62.51 | 14.37 | 48.46 | 48.6 | 50.26 | | multilingual-e5-small | 65.62 | 60.94 | 56.41 | 29.91 | 67.26 | 66.54 | 57.78 | | multilingual-e5-large | 64.55 | 61.61 | 54.28 | 28.6 | 67.42 | 67.92 | 57.4 | | multilingual-e5-base | 64.21 | 62.13 | 54.68 | 29.5 | 66.23 | 66.98 | 57.29 | | m3e-base | 66.03 | 62.74 | 56.07 | 17.51 | 77.05 | 76.76 | 59.36 | | m3e-large | 66.13 | 62.72 | 56.1 | 16.46 | 77.76 | 78.27 | 59.57 | | bge-base-zh-v1.5 | 66.49 | 63.25 | 57.02 | 29.74 | 80.47 | 84.88 | 63.64 | | bge-large-zh-v1.5 | 65.74 | 63.39 | 57.03 | 28.74 | 83.45 | 85.44 | 63.97 | | BAAI/bge-reranker-base | 67.28 | 63.95 | 60.45 | 35.46 | 81.26 | 84.1 | 65.42 | | BAAI/bge-reranker-large | 67.6 | 64.03 | 61.44 | 37.16 | 82.15 | 84.18 | 66.09 | \\* : T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks ## Train ### BAAI Embedding We pre-train the models using retromae and train them on large-scale pairs data using contrastive learning. **You can fine-tune the embedding model on your data following our examples.** We also provide a pre-train example. Note that the goal of pre-training is to reconstruct the text, and the pre-trained model cannot be used for similarity calculation directly, it needs to be fine-tuned. More training details for bge see baai_general_embedding. ### BGE Reranker Cross-encoder will perform full-attention over the input pair, which is more accurate than embedding model (i.e., bi-encoder) but more time-consuming than embedding model. Therefore, it can be used to re-rank the top-k documents returned by embedding model. 
We train the cross-encoder on multilingual pair data. The data format is the same as for the embedding model, so you can fine-tune it easily by following our example. For more details, please refer to ./FlagEmbedding/reranker/README.md ## Citation If you find this repository useful, please consider giving it a star :star: and a citation. ## License FlagEmbedding is licensed under the MIT License. The released models can be used for commercial purposes free of charge.",
|
25 |
+
"model_explanation_gemini": "Reranks text documents to improve retrieval performance by prioritizing relevant content based on query-document relationships."
|
26 |
+
}
|
data/model_data_json/BAAI_bge-reranker-large.json
ADDED
@@ -0,0 +1,29 @@
|
1 |
+
{
|
2 |
+
"model_id": "BAAI/bge-reranker-large",
|
3 |
+
"downloads": 482447,
|
4 |
+
"tags": [
|
5 |
+
"transformers",
|
6 |
+
"pytorch",
|
7 |
+
"onnx",
|
8 |
+
"safetensors",
|
9 |
+
"xlm-roberta",
|
10 |
+
"text-classification",
|
11 |
+
"mteb",
|
12 |
+
"feature-extraction",
|
13 |
+
"en",
|
14 |
+
"zh",
|
15 |
+
"arxiv:2401.03462",
|
16 |
+
"arxiv:2312.15503",
|
17 |
+
"arxiv:2311.13534",
|
18 |
+
"arxiv:2310.07554",
|
19 |
+
"arxiv:2309.07597",
|
20 |
+
"license:mit",
|
21 |
+
"model-index",
|
22 |
+
"autotrain_compatible",
|
23 |
+
"text-embeddings-inference",
|
24 |
+
"endpoints_compatible",
|
25 |
+
"region:us"
|
26 |
+
],
|
27 |
+
"description": "--- license: mit language: - en - zh tags: - mteb model-index: - name: bge-reranker-base results: - task: type: Reranking dataset: type: C-MTEB/CMedQAv1-reranking name: MTEB CMedQAv1 config: default split: test revision: None metrics: - type: map value: 81.27206722525007 - type: mrr value: 84.14238095238095 - task: type: Reranking dataset: type: C-MTEB/CMedQAv2-reranking name: MTEB CMedQAv2 config: default split: test revision: None metrics: - type: map value: 84.10369934291236 - type: mrr value: 86.79376984126984 - task: type: Reranking dataset: type: C-MTEB/Mmarco-reranking name: MTEB MMarcoReranking config: default split: dev revision: None metrics: - type: map value: 35.4600511272538 - type: mrr value: 34.60238095238095 - task: type: Reranking dataset: type: C-MTEB/T2Reranking name: MTEB T2Reranking config: default split: dev revision: None metrics: - type: map value: 67.27728847727172 - type: mrr value: 77.1315192743764 pipeline_tag: feature-extraction --- **We have updated the new reranker, supporting larger lengths, more languages, and achieving better performance.** <h1 align=\"center\">FlagEmbedding</h1> <h4 align=\"center\"> <p> <a href=#model-list>Model List</a> | <a href=#frequently-asked-questions>FAQ</a> | <a href=#usage>Usage</a> | <a href=\"#evaluation\">Evaluation</a> | <a href=\"#train\">Train</a> | <a href=\"#citation\">Citation</a> | <a href=\"#license\">License</a> <p> </h4> **More details please refer to our Github: FlagEmbedding.** English | 中文 FlagEmbedding focuses on retrieval-augmented LLMs, consisting of the following projects currently: - **Long-Context LLM**: Activation Beacon - **Fine-tuning of LM** : LM-Cocktail - **Embedding Model**: Visualized-BGE, BGE-M3, LLM Embedder, BGE Embedding - **Reranker Model**: llm rerankers, BGE Reranker - **Benchmark**: C-MTEB ## News - 3/18/2024: Release new rerankers, built upon powerful M3 and LLM (GEMMA and MiniCPM, not so large actually) backbones, supporitng multi-lingual 
processing and larger inputs, massive improvements of ranking performances on BEIR, C-MTEB/Retrieval, MIRACL, LlamaIndex Evaluation. - 3/18/2024: Release Visualized-BGE, equipping BGE with visual capabilities. Visualized-BGE can be utilized to generate embeddings for hybrid image-text data. - 1/30/2024: Release **BGE-M3**, a new member to BGE model series! M3 stands for **M**ulti-linguality (100+ languages), **M**ulti-granularities (input length up to 8192), **M**ulti-Functionality (unification of dense, lexical, multi-vec/colbert retrieval). It is the first embedding model which supports all three retrieval methods, achieving new SOTA on multi-lingual (MIRACL) and cross-lingual (MKQA) benchmarks. Technical Report and Code. :fire: - 1/9/2024: Release Activation-Beacon, an effective, efficient, compatible, and low-cost (training) method to extend the context length of LLM. Technical Report :fire: - 12/24/2023: Release **LLaRA**, a LLaMA-7B based dense retriever, leading to state-of-the-art performances on MS MARCO and BEIR. Model and code will be open-sourced. Please stay tuned. Technical Report - 11/23/2023: Release LM-Cocktail, a method to maintain general capabilities during fine-tuning by merging multiple language models. Technical Report :fire: - 10/12/2023: Release LLM-Embedder, a unified embedding model to support diverse retrieval augmentation needs for LLMs. Technical Report - 09/15/2023: The technical report of BGE has been released - 09/15/2023: The massive training data of BGE has been released - 09/12/2023: New models: - **New reranker model**: release cross-encoder models and , which are more powerful than embedding model. We recommend to use/fine-tune them to re-rank top-k documents returned by embedding models. - **update embedding model**: release embedding model to alleviate the issue of the similarity distribution, and enhance its retrieval ability without instruction. 
<details> <summary>More</summary> <!-- ### More --> - 09/07/2023: Update fine-tune code: Add script to mine hard negatives and support adding instruction during fine-tuning. - 08/09/2023: BGE Models are integrated into **Langchain**, you can use it like this; C-MTEB **leaderboard** is available. - 08/05/2023: Release base-scale and small-scale models, **best performance among the models of the same size 🤗** - 08/02/2023: Release (short for BAAI General Embedding) Models, **rank 1st on MTEB and C-MTEB benchmark!** :tada: :tada: - 08/01/2023: We release the Chinese Massive Text Embedding Benchmark (**C-MTEB**), consisting of 31 test dataset. </details> ## Model List is short for . | Model | Language | | Description | query instruction for retrieval [1] | |:-------------------------------|:--------:| :--------:| :--------:|:--------:| | BAAI/bge-m3 | Multilingual | Inference Fine-tune | Multi-Functionality(dense retrieval, sparse retrieval, multi-vector(colbert)), Multi-Linguality, and Multi-Granularity(8192 tokens) | | | BAAI/llm-embedder | English | Inference Fine-tune | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See README | | BAAI/bge-reranker-large | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-reranker-base | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-large-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-zh-v1.5 | Chinese | Inference 
Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-en | English | Inference Fine-tune | :trophy: rank **1st** in MTEB leaderboard | | | BAAI/bge-base-en | English | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-en | English | Inference Fine-tune |a small-scale model but with competitive performance | | | BAAI/bge-large-zh | Chinese | Inference Fine-tune | :trophy: rank **1st** in C-MTEB benchmark | | | BAAI/bge-base-zh | Chinese | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-zh | Chinese | Inference Fine-tune | a small-scale model but with competitive performance | | [1\\]: If you need to search the relevant passages to a query, we suggest to add the instruction to the query; in other cases, no instruction is needed, just use the original query directly. In all cases, **no instruction** needs to be added to passages. [2\\]: Different from embedding model, reranker uses question and document as input and directly output similarity instead of embedding. To balance the accuracy and time cost, cross-encoder is widely used to re-rank top-k documents retrieved by other simple models. For examples, use bge embedding model to retrieve top 100 relevant documents, and then use bge reranker to re-rank the top 100 document to get the final top-3 results. All models have been uploaded to Huggingface Hub, and you can see them at If you cannot open the Huggingface Hub, you also can download the models at . ## Frequently asked questions <details> <summary>1. How to fine-tune bge embedding model?</summary> <!-- ### How to fine-tune bge embedding model? --> Following this example to prepare data and fine-tune your model. Some suggestions: - Mine hard negatives following this example, which can improve the retrieval performance. 
- If you pre-train bge on your data, the pre-trained model cannot be directly used to calculate similarity, and it must be fine-tuned with contrastive learning before computing similarity. - If the accuracy of the fine-tuned model is still not high, it is recommended to use/fine-tune the cross-encoder model (bge-reranker) to re-rank top-k results. Hard negatives also are needed to fine-tune reranker. Refer to this example for the fine-tuning for reranker </details> <details> <summary>2. The similarity score between two dissimilar sentences is higher than 0.5</summary> <!-- ### The similarity score between two dissimilar sentences is higher than 0.5 --> **Suggest to use bge v1.5, which alleviates the issue of the similarity distribution.** Since we finetune the models by contrastive learning with a temperature of 0.01, the similarity distribution of the current BGE model is about in the interval \\[0.6, 1\\]. So a similarity score greater than 0.5 does not indicate that the two sentences are similar. For downstream tasks, such as passage retrieval or semantic similarity, **what matters is the relative order of the scores, not the absolute value.** If you need to filter similar sentences based on a similarity threshold, please select an appropriate similarity threshold based on the similarity distribution on your data (such as 0.8, 0.85, or even 0.9). </details> <details> <summary>3. When does the query instruction need to be used</summary> <!-- ### When does the query instruction need to be used --> For the , we improve its retrieval ability when not using instruction. No instruction only has a slight degradation in retrieval performance compared with using instruction. So you can generate embedding without instruction in all cases for convenience. For a retrieval task that uses short queries to find long related documents, it is recommended to add instructions for these short queries. 
**The best method to decide whether to add instructions for queries is choosing the setting that achieves better performance on your task.** In all cases, the documents/passages do not need to add the instruction. </details> ## Usage ### Usage for Embedding Model Here are some examples for using models with FlagEmbedding, Sentence-Transformers, Langchain, or Huggingface Transformers. #### Using FlagEmbedding If it doesn't work for you, you can see FlagEmbedding for more methods to install FlagEmbedding. For the value of the argument , see Model List. By default, FlagModel will use all available GPUs when encoding. Please set to select specific GPUs. You also can set to make all GPUs unavailable. #### Using Sentence-Transformers You can also use the models with sentence-transformers: For s2p(short query to long passage) retrieval task, each short query should start with an instruction (instructions see Model List). But the instruction is not needed for passages. #### Using Langchain You can use in langchain like this: #### Using HuggingFace Transformers With the transformers package, you can use the model like this: First, you pass your input through the transformer model, then you select the last hidden state of the first token (i.e., [CLS]) as the sentence embedding. ### Usage for Reranker Different from embedding model, reranker uses question and document as input and directly output similarity instead of embedding. You can get a relevance score by inputting query and passage to the reranker. The reranker is optimized based cross-entropy loss, so the relevance score is not bounded to a specific range. #### Using FlagEmbedding Get relevance scores (higher scores indicate more relevance): #### Using Huggingface transformers #### Usage reranker with the ONNX files #### Usage reranker with infinity Its also possible to deploy the onnx/torch files with the infinity_emb pip package. 
## Evaluation models achieve **state-of-the-art performance on both MTEB and C-MTEB leaderboard!** For more details and evaluation tools see our scripts. - **MTEB**: | Model Name | Dimension | Sequence Length | Average (56) | Retrieval (15) |Clustering (11) | Pair Classification (3) | Reranking (4) | STS (10) | Summarization (1) | Classification (12) | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | BAAI/bge-large-en-v1.5 | 1024 | 512 | **64.23** | **54.29** | 46.08 | 87.12 | 60.03 | 83.11 | 31.61 | 75.97 | | BAAI/bge-base-en-v1.5 | 768 | 512 | 63.55 | 53.25 | 45.77 | 86.55 | 58.86 | 82.4 | 31.07 | 75.53 | | BAAI/bge-small-en-v1.5 | 384 | 512 | 62.17 |51.68 | 43.82 | 84.92 | 58.36 | 81.59 | 30.12 | 74.14 | | bge-large-en | 1024 | 512 | 63.98 | 53.9 | 46.98 | 85.8 | 59.48 | 81.56 | 32.06 | 76.21 | | bge-base-en | 768 | 512 | 63.36 | 53.0 | 46.32 | 85.86 | 58.7 | 81.84 | 29.27 | 75.27 | | gte-large | 1024 | 512 | 63.13 | 52.22 | 46.84 | 85.00 | 59.13 | 83.35 | 31.66 | 73.33 | | gte-base | 768 | 512 | 62.39 | 51.14 | 46.2 | 84.57 | 58.61 | 82.3 | 31.17 | 73.01 | | e5-large-v2 | 1024| 512 | 62.25 | 50.56 | 44.49 | 86.03 | 56.61 | 82.05 | 30.19 | 75.24 | | bge-small-en | 384 | 512 | 62.11 | 51.82 | 44.31 | 83.78 | 57.97 | 80.72 | 30.53 | 74.37 | | instructor-xl | 768 | 512 | 61.79 | 49.26 | 44.74 | 86.62 | 57.29 | 83.06 | 32.32 | 61.79 | | e5-base-v2 | 768 | 512 | 61.5 | 50.29 | 43.80 | 85.73 | 55.91 | 81.05 | 30.28 | 73.84 | | gte-small | 384 | 512 | 61.36 | 49.46 | 44.89 | 83.54 | 57.7 | 82.07 | 30.42 | 72.31 | | text-embedding-ada-002 | 1536 | 8192 | 60.99 | 49.25 | 45.9 | 84.89 | 56.32 | 80.97 | 30.8 | 70.93 | | e5-small-v2 | 384 | 512 | 59.93 | 49.04 | 39.92 | 84.67 | 54.32 | 80.39 | 31.16 | 72.94 | | sentence-t5-xxl | 768 | 512 | 59.51 | 42.24 | 43.72 | 85.06 | 56.42 | 82.63 | 30.08 | 73.42 | | all-mpnet-base-v2 | 768 | 514 | 57.78 | 43.81 | 43.69 | 83.04 | 59.36 | 80.28 | 27.49 | 65.07 | | sgpt-bloom-7b1-msmarco | 4096 | 2048 | 57.59 | 
48.22 | 38.93 | 81.9 | 55.65 | 77.74 | 33.6 | 66.19 | - **C-MTEB**: We create the benchmark C-MTEB for Chinese text embedding which consists of 31 datasets from 6 tasks. Please refer to C_MTEB for a detailed introduction. | Model | Embedding dimension | Avg | Retrieval | STS | PairClassification | Classification | Reranking | Clustering | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | **BAAI/bge-large-zh-v1.5** | 1024 | **64.53** | 70.46 | 56.25 | 81.6 | 69.13 | 65.84 | 48.99 | | BAAI/bge-base-zh-v1.5 | 768 | 63.13 | 69.49 | 53.72 | 79.75 | 68.07 | 65.39 | 47.53 | | BAAI/bge-small-zh-v1.5 | 512 | 57.82 | 61.77 | 49.11 | 70.41 | 63.96 | 60.92 | 44.18 | | BAAI/bge-large-zh | 1024 | 64.20 | 71.53 | 54.98 | 78.94 | 68.32 | 65.11 | 48.39 | | bge-large-zh-noinstruct | 1024 | 63.53 | 70.55 | 53 | 76.77 | 68.58 | 64.91 | 50.01 | | BAAI/bge-base-zh | 768 | 62.96 | 69.53 | 54.12 | 77.5 | 67.07 | 64.91 | 47.63 | | multilingual-e5-large | 1024 | 58.79 | 63.66 | 48.44 | 69.89 | 67.34 | 56.00 | 48.23 | | BAAI/bge-small-zh | 512 | 58.27 | 63.07 | 49.45 | 70.35 | 63.64 | 61.48 | 45.09 | | m3e-base | 768 | 57.10 | 56.91 | 50.47 | 63.99 | 67.52 | 59.34 | 47.68 | | m3e-large | 1024 | 57.05 | 54.75 | 50.42 | 64.3 | 68.2 | 59.66 | 48.88 | | multilingual-e5-base | 768 | 55.48 | 61.63 | 46.49 | 67.07 | 65.35 | 54.35 | 40.68 | | multilingual-e5-small | 384 | 55.38 | 59.95 | 45.27 | 66.45 | 65.85 | 53.86 | 45.26 | | text-embedding-ada-002(OpenAI) | 1536 | 53.02 | 52.0 | 43.35 | 69.56 | 64.31 | 54.28 | 45.68 | | luotuo | 1024 | 49.37 | 44.4 | 42.78 | 66.62 | 61 | 49.25 | 44.39 | | text2vec-base | 768 | 47.63 | 38.79 | 43.41 | 67.41 | 62.19 | 49.45 | 37.66 | | text2vec-large | 1024 | 47.36 | 41.94 | 44.97 | 70.86 | 60.66 | 49.16 | 30.02 | - **Reranking**: See C_MTEB for evaluation script. 
| Model | T2Reranking | T2RerankingZh2En\\* | T2RerankingEn2Zh\\* | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | text2vec-base-multilingual | 64.66 | 62.94 | 62.51 | 14.37 | 48.46 | 48.6 | 50.26 | | multilingual-e5-small | 65.62 | 60.94 | 56.41 | 29.91 | 67.26 | 66.54 | 57.78 | | multilingual-e5-large | 64.55 | 61.61 | 54.28 | 28.6 | 67.42 | 67.92 | 57.4 | | multilingual-e5-base | 64.21 | 62.13 | 54.68 | 29.5 | 66.23 | 66.98 | 57.29 | | m3e-base | 66.03 | 62.74 | 56.07 | 17.51 | 77.05 | 76.76 | 59.36 | | m3e-large | 66.13 | 62.72 | 56.1 | 16.46 | 77.76 | 78.27 | 59.57 | | bge-base-zh-v1.5 | 66.49 | 63.25 | 57.02 | 29.74 | 80.47 | 84.88 | 63.64 | | bge-large-zh-v1.5 | 65.74 | 63.39 | 57.03 | 28.74 | 83.45 | 85.44 | 63.97 | | BAAI/bge-reranker-base | 67.28 | 63.95 | 60.45 | 35.46 | 81.26 | 84.1 | 65.42 | | BAAI/bge-reranker-large | 67.6 | 64.03 | 61.44 | 37.16 | 82.15 | 84.18 | 66.09 | \\* : T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks ## Train ### BAAI Embedding We pre-train the models using retromae and train them on large-scale pairs data using contrastive learning. **You can fine-tune the embedding model on your data following our examples.** We also provide a pre-train example. Note that the goal of pre-training is to reconstruct the text, and the pre-trained model cannot be used for similarity calculation directly, it needs to be fine-tuned. More training details for bge see baai_general_embedding. ### BGE Reranker Cross-encoder will perform full-attention over the input pair, which is more accurate than embedding model (i.e., bi-encoder) but more time-consuming than embedding model. Therefore, it can be used to re-rank the top-k documents returned by embedding model. 
We train the cross-encoder on multilingual pair data. The data format is the same as for the embedding model, so you can fine-tune it easily by following our example. For more details, please refer to ./FlagEmbedding/reranker/README.md ## Citation If you find this repository useful, please consider giving it a star :star: and a citation. ## License FlagEmbedding is licensed under the MIT License. The released models can be used for commercial purposes free of charge.",
|
28 |
+
"model_explanation_gemini": "Reranks documents or search results to improve relevance, supporting multiple languages and longer input lengths for enhanced retrieval performance."
|
29 |
+
}
|
data/model_data_json/BAAI_bge-reranker-v2-m3.json
ADDED
@@ -0,0 +1,19 @@
|
1 |
+
{
|
2 |
+
"model_id": "BAAI/bge-reranker-v2-m3",
|
3 |
+
"downloads": 1891532,
|
4 |
+
"tags": [
|
5 |
+
"sentence-transformers",
|
6 |
+
"safetensors",
|
7 |
+
"xlm-roberta",
|
8 |
+
"text-classification",
|
9 |
+
"transformers",
|
10 |
+
"text-embeddings-inference",
|
11 |
+
"multilingual",
|
12 |
+
"arxiv:2312.15503",
|
13 |
+
"arxiv:2402.03216",
|
14 |
+
"license:apache-2.0",
|
15 |
+
"region:us"
|
16 |
+
],
|
17 |
+
"description": "--- license: apache-2.0 pipeline_tag: text-classification tags: - transformers - sentence-transformers - text-embeddings-inference language: - multilingual --- # Reranker **More details please refer to our Github: FlagEmbedding.** - Model List - Usage - Fine-tuning - Evaluation - Citation Different from embedding model, reranker uses question and document as input and directly output similarity instead of embedding. You can get a relevance score by inputting query and passage to the reranker. And the score can be mapped to a float value in [0,1] by sigmoid function. ## Model List | Model | Base model | Language | layerwise | feature | |:--------------------------------------------------------------------------|:--------:|:-----------------------------------------------------------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------:| | BAAI/bge-reranker-base | xlm-roberta-base | Chinese and English | - | Lightweight reranker model, easy to deploy, with fast inference. | | BAAI/bge-reranker-large | xlm-roberta-large | Chinese and English | - | Lightweight reranker model, easy to deploy, with fast inference. | | BAAI/bge-reranker-v2-m3 | bge-m3 | Multilingual | - | Lightweight reranker model, possesses strong multilingual capabilities, easy to deploy, with fast inference. | | BAAI/bge-reranker-v2-gemma | gemma-2b | Multilingual | - | Suitable for multilingual contexts, performs well in both English proficiency and multilingual capabilities. | | BAAI/bge-reranker-v2-minicpm-layerwise | MiniCPM-2B-dpo-bf16 | Multilingual | 8-40 | Suitable for multilingual contexts, performs well in both English and Chinese proficiency, allows freedom to select layers for output, facilitating accelerated inference. 
| You can select the model according to your scenario and resources. - For **multilingual**, utilize BAAI/bge-reranker-v2-m3 and BAAI/bge-reranker-v2-gemma - For **Chinese or English**, utilize BAAI/bge-reranker-v2-m3 and BAAI/bge-reranker-v2-minicpm-layerwise. - For **efficiency**, utilize BAAI/bge-reranker-v2-m3 and the low layer of BAAI/bge-reranker-v2-minicpm-layerwise. - For better performance, we recommend BAAI/bge-reranker-v2-minicpm-layerwise and BAAI/bge-reranker-v2-gemma ## Usage ### Using FlagEmbedding #### For normal reranker (bge-reranker-base / bge-reranker-large / bge-reranker-v2-m3 ) Get relevance scores (higher scores indicate more relevance): #### For LLM-based reranker #### For LLM-based layerwise reranker ### Using Huggingface transformers #### For normal reranker (bge-reranker-base / bge-reranker-large / bge-reranker-v2-m3 ) Get relevance scores (higher scores indicate more relevance): #### For LLM-based reranker #### For LLM-based layerwise reranker ## Fine-tune ### Data Format Train data should be a json file, where each line is a dict like this: is the query, and is a list of positive texts, is a list of negative texts, indicates the relationship between query and texts. If you have no negative texts for a query, you can randomly sample some from the entire corpus as the negatives. See toy_finetune_data.jsonl for a toy data file. ### Train You can fine-tune the reranker with the following code: **For llm-based reranker** **For llm-based layerwise reranker** Our rerankers are initialized from google/gemma-2b (for llm-based reranker) and openbmb/MiniCPM-2B-dpo-bf16 (for llm-based layerwise reranker), and we train them on a mixture of multilingual datasets: - bge-m3-data - quora train data - fever train data ## Evaluation - llama-index. !image-20240317193909373 - BEIR. Rerank the top 100 results from bge-en-v1.5 large. !image-20240317174633333 Rerank the top 100 results from e5 mistral 7b instruct. !image-20240317172949713 - CMTEB-retrieval. 
It reranks the top 100 results from bge-zh-v1.5 large. !image-20240317173026235 - miracl (multi-language). It reranks the top 100 results from bge-m3. !image-20240317173117639 ## Citation If you find this repository useful, please consider giving it a star and a citation",
|
18 |
+
"model_explanation_gemini": "Computes relevance scores between queries and documents in multiple languages to rank passages by similarity."
|
19 |
+
}
|
data/model_data_json/BAAI_bge-small-en-v1.5.json
ADDED
@@ -0,0 +1,29 @@
|
1 |
+
{
|
2 |
+
"model_id": "BAAI/bge-small-en-v1.5",
|
3 |
+
"downloads": 3366269,
|
4 |
+
"tags": [
|
5 |
+
"sentence-transformers",
|
6 |
+
"pytorch",
|
7 |
+
"onnx",
|
8 |
+
"safetensors",
|
9 |
+
"bert",
|
10 |
+
"feature-extraction",
|
11 |
+
"sentence-similarity",
|
12 |
+
"transformers",
|
13 |
+
"mteb",
|
14 |
+
"en",
|
15 |
+
"arxiv:2401.03462",
|
16 |
+
"arxiv:2312.15503",
|
17 |
+
"arxiv:2311.13534",
|
18 |
+
"arxiv:2310.07554",
|
19 |
+
"arxiv:2309.07597",
|
20 |
+
"license:mit",
|
21 |
+
"model-index",
|
22 |
+
"autotrain_compatible",
|
23 |
+
"text-embeddings-inference",
|
24 |
+
"endpoints_compatible",
|
25 |
+
"region:us"
|
26 |
+
],
|
27 |
+
"description": "--- tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers - mteb model-index: - name: bge-small-en-v1.5 results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 73.79104477611939 - type: ap value: 37.21923821573361 - type: f1 value: 68.0914945617093 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 92.75377499999999 - type: ap value: 89.46766124546022 - type: f1 value: 92.73884001331487 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 46.986 - type: f1 value: 46.55936786727896 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 35.846000000000004 - type: map_at_10 value: 51.388 - type: map_at_100 value: 52.132999999999996 - type: map_at_1000 value: 52.141000000000005 - type: map_at_3 value: 47.037 - type: map_at_5 value: 49.579 - type: mrr_at_1 value: 36.558 - type: mrr_at_10 value: 51.658 - type: mrr_at_100 value: 52.402 - type: mrr_at_1000 value: 52.410000000000004 - type: mrr_at_3 value: 47.345 - type: mrr_at_5 value: 49.797999999999995 - type: ndcg_at_1 value: 35.846000000000004 - type: ndcg_at_10 value: 59.550000000000004 - type: ndcg_at_100 value: 62.596 - type: ndcg_at_1000 value: 62.759 - type: ndcg_at_3 value: 50.666999999999994 - type: ndcg_at_5 value: 55.228 - type: precision_at_1 value: 35.846000000000004 - type: precision_at_10 value: 8.542 - type: precision_at_100 value: 0.984 - type: precision_at_1000 
value: 0.1 - type: precision_at_3 value: 20.389 - type: precision_at_5 value: 14.438 - type: recall_at_1 value: 35.846000000000004 - type: recall_at_10 value: 85.42 - type: recall_at_100 value: 98.43499999999999 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 61.166 - type: recall_at_5 value: 72.191 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 47.402770198163594 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 40.01545436974177 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 62.586465273207196 - type: mrr value: 74.42169019038825 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 85.1891186537969 - type: cos_sim_spearman value: 83.75492046087288 - type: euclidean_pearson value: 84.11766204805357 - type: euclidean_spearman value: 84.01456493126516 - type: manhattan_pearson value: 84.2132950502772 - type: manhattan_spearman value: 83.89227298813377 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 85.74025974025975 - type: f1 value: 85.71493566466381 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 38.467181385006434 - 
task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 34.719496037339056 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 29.587000000000003 - type: map_at_10 value: 41.114 - type: map_at_100 value: 42.532 - type: map_at_1000 value: 42.661 - type: map_at_3 value: 37.483 - type: map_at_5 value: 39.652 - type: mrr_at_1 value: 36.338 - type: mrr_at_10 value: 46.763 - type: mrr_at_100 value: 47.393 - type: mrr_at_1000 value: 47.445 - type: mrr_at_3 value: 43.538 - type: mrr_at_5 value: 45.556000000000004 - type: ndcg_at_1 value: 36.338 - type: ndcg_at_10 value: 47.658 - type: ndcg_at_100 value: 52.824000000000005 - type: ndcg_at_1000 value: 54.913999999999994 - type: ndcg_at_3 value: 41.989 - type: ndcg_at_5 value: 44.944 - type: precision_at_1 value: 36.338 - type: precision_at_10 value: 9.156 - type: precision_at_100 value: 1.4789999999999999 - type: precision_at_1000 value: 0.196 - type: precision_at_3 value: 20.076 - type: precision_at_5 value: 14.85 - type: recall_at_1 value: 29.587000000000003 - type: recall_at_10 value: 60.746 - type: recall_at_100 value: 82.157 - type: recall_at_1000 value: 95.645 - type: recall_at_3 value: 44.821 - type: recall_at_5 value: 52.819 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 30.239 - type: map_at_10 value: 39.989000000000004 - type: map_at_100 value: 41.196 - type: map_at_1000 value: 41.325 - type: map_at_3 value: 37.261 - type: map_at_5 value: 38.833 - type: mrr_at_1 value: 37.516 - type: mrr_at_10 value: 46.177 - type: mrr_at_100 value: 46.806 - type: mrr_at_1000 value: 46.849000000000004 - type: mrr_at_3 value: 44.002 - type: 
mrr_at_5 value: 45.34 - type: ndcg_at_1 value: 37.516 - type: ndcg_at_10 value: 45.586 - type: ndcg_at_100 value: 49.897000000000006 - type: ndcg_at_1000 value: 51.955 - type: ndcg_at_3 value: 41.684 - type: ndcg_at_5 value: 43.617 - type: precision_at_1 value: 37.516 - type: precision_at_10 value: 8.522 - type: precision_at_100 value: 1.374 - type: precision_at_1000 value: 0.184 - type: precision_at_3 value: 20.105999999999998 - type: precision_at_5 value: 14.152999999999999 - type: recall_at_1 value: 30.239 - type: recall_at_10 value: 55.03 - type: recall_at_100 value: 73.375 - type: recall_at_1000 value: 86.29599999999999 - type: recall_at_3 value: 43.269000000000005 - type: recall_at_5 value: 48.878 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 38.338 - type: map_at_10 value: 50.468999999999994 - type: map_at_100 value: 51.553000000000004 - type: map_at_1000 value: 51.608 - type: map_at_3 value: 47.107 - type: map_at_5 value: 49.101 - type: mrr_at_1 value: 44.201 - type: mrr_at_10 value: 54.057 - type: mrr_at_100 value: 54.764 - type: mrr_at_1000 value: 54.791000000000004 - type: mrr_at_3 value: 51.56699999999999 - type: mrr_at_5 value: 53.05 - type: ndcg_at_1 value: 44.201 - type: ndcg_at_10 value: 56.379000000000005 - type: ndcg_at_100 value: 60.645 - type: ndcg_at_1000 value: 61.73499999999999 - type: ndcg_at_3 value: 50.726000000000006 - type: ndcg_at_5 value: 53.58500000000001 - type: precision_at_1 value: 44.201 - type: precision_at_10 value: 9.141 - type: precision_at_100 value: 1.216 - type: precision_at_1000 value: 0.135 - type: precision_at_3 value: 22.654 - type: precision_at_5 value: 15.723999999999998 - type: recall_at_1 value: 38.338 - type: recall_at_10 value: 70.30499999999999 - type: recall_at_100 value: 88.77199999999999 - type: recall_at_1000 value: 96.49799999999999 - type: recall_at_3 value: 55.218 - type: recall_at_5 
value: 62.104000000000006 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 25.682 - type: map_at_10 value: 33.498 - type: map_at_100 value: 34.461000000000006 - type: map_at_1000 value: 34.544000000000004 - type: map_at_3 value: 30.503999999999998 - type: map_at_5 value: 32.216 - type: mrr_at_1 value: 27.683999999999997 - type: mrr_at_10 value: 35.467999999999996 - type: mrr_at_100 value: 36.32 - type: mrr_at_1000 value: 36.386 - type: mrr_at_3 value: 32.618 - type: mrr_at_5 value: 34.262 - type: ndcg_at_1 value: 27.683999999999997 - type: ndcg_at_10 value: 38.378 - type: ndcg_at_100 value: 43.288 - type: ndcg_at_1000 value: 45.413 - type: ndcg_at_3 value: 32.586 - type: ndcg_at_5 value: 35.499 - type: precision_at_1 value: 27.683999999999997 - type: precision_at_10 value: 5.864 - type: precision_at_100 value: 0.882 - type: precision_at_1000 value: 0.11 - type: precision_at_3 value: 13.446 - type: precision_at_5 value: 9.718 - type: recall_at_1 value: 25.682 - type: recall_at_10 value: 51.712 - type: recall_at_100 value: 74.446 - type: recall_at_1000 value: 90.472 - type: recall_at_3 value: 36.236000000000004 - type: recall_at_5 value: 43.234 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 16.073999999999998 - type: map_at_10 value: 24.352999999999998 - type: map_at_100 value: 25.438 - type: map_at_1000 value: 25.545 - type: map_at_3 value: 21.614 - type: map_at_5 value: 23.104 - type: mrr_at_1 value: 19.776 - type: mrr_at_10 value: 28.837000000000003 - type: mrr_at_100 value: 29.755 - type: mrr_at_1000 value: 29.817 - type: mrr_at_3 value: 26.201999999999998 - type: mrr_at_5 value: 27.714 - type: ndcg_at_1 value: 19.776 - type: ndcg_at_10 value: 29.701 - type: ndcg_at_100 value: 35.307 - type: ndcg_at_1000 value: 37.942 - 
type: ndcg_at_3 value: 24.764 - type: ndcg_at_5 value: 27.025 - type: precision_at_1 value: 19.776 - type: precision_at_10 value: 5.659 - type: precision_at_100 value: 0.971 - type: precision_at_1000 value: 0.133 - type: precision_at_3 value: 12.065 - type: precision_at_5 value: 8.905000000000001 - type: recall_at_1 value: 16.073999999999998 - type: recall_at_10 value: 41.647 - type: recall_at_100 value: 66.884 - type: recall_at_1000 value: 85.91499999999999 - type: recall_at_3 value: 27.916 - type: recall_at_5 value: 33.729 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 28.444999999999997 - type: map_at_10 value: 38.218999999999994 - type: map_at_100 value: 39.595 - type: map_at_1000 value: 39.709 - type: map_at_3 value: 35.586 - type: map_at_5 value: 36.895 - type: mrr_at_1 value: 34.841 - type: mrr_at_10 value: 44.106 - type: mrr_at_100 value: 44.98 - type: mrr_at_1000 value: 45.03 - type: mrr_at_3 value: 41.979 - type: mrr_at_5 value: 43.047999999999995 - type: ndcg_at_1 value: 34.841 - type: ndcg_at_10 value: 43.922 - type: ndcg_at_100 value: 49.504999999999995 - type: ndcg_at_1000 value: 51.675000000000004 - type: ndcg_at_3 value: 39.858 - type: ndcg_at_5 value: 41.408 - type: precision_at_1 value: 34.841 - type: precision_at_10 value: 7.872999999999999 - type: precision_at_100 value: 1.2449999999999999 - type: precision_at_1000 value: 0.161 - type: precision_at_3 value: 18.993 - type: precision_at_5 value: 13.032 - type: recall_at_1 value: 28.444999999999997 - type: recall_at_10 value: 54.984 - type: recall_at_100 value: 78.342 - type: recall_at_1000 value: 92.77 - type: recall_at_3 value: 42.842999999999996 - type: recall_at_5 value: 47.247 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.072 - type: 
map_at_10 value: 32.354 - type: map_at_100 value: 33.800000000000004 - type: map_at_1000 value: 33.908 - type: map_at_3 value: 29.232000000000003 - type: map_at_5 value: 31.049 - type: mrr_at_1 value: 29.110000000000003 - type: mrr_at_10 value: 38.03 - type: mrr_at_100 value: 39.032 - type: mrr_at_1000 value: 39.086999999999996 - type: mrr_at_3 value: 35.407 - type: mrr_at_5 value: 36.76 - type: ndcg_at_1 value: 29.110000000000003 - type: ndcg_at_10 value: 38.231 - type: ndcg_at_100 value: 44.425 - type: ndcg_at_1000 value: 46.771 - type: ndcg_at_3 value: 33.095 - type: ndcg_at_5 value: 35.459 - type: precision_at_1 value: 29.110000000000003 - type: precision_at_10 value: 7.215000000000001 - type: precision_at_100 value: 1.2109999999999999 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 16.058 - type: precision_at_5 value: 11.644 - type: recall_at_1 value: 23.072 - type: recall_at_10 value: 50.285999999999994 - type: recall_at_100 value: 76.596 - type: recall_at_1000 value: 92.861 - type: recall_at_3 value: 35.702 - type: recall_at_5 value: 42.152 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.937916666666666 - type: map_at_10 value: 33.755250000000004 - type: map_at_100 value: 34.955999999999996 - type: map_at_1000 value: 35.070499999999996 - type: map_at_3 value: 30.98708333333333 - type: map_at_5 value: 32.51491666666666 - type: mrr_at_1 value: 29.48708333333333 - type: mrr_at_10 value: 37.92183333333334 - type: mrr_at_100 value: 38.76583333333333 - type: mrr_at_1000 value: 38.82466666666667 - type: mrr_at_3 value: 35.45125 - type: mrr_at_5 value: 36.827000000000005 - type: ndcg_at_1 value: 29.48708333333333 - type: ndcg_at_10 value: 39.05225 - type: ndcg_at_100 value: 44.25983333333334 - type: ndcg_at_1000 value: 46.568333333333335 - type: ndcg_at_3 value: 34.271583333333325 - type: ndcg_at_5 value: 36.483916666666666 - 
type: precision_at_1 value: 29.48708333333333 - type: precision_at_10 value: 6.865749999999999 - type: precision_at_100 value: 1.1195833333333332 - type: precision_at_1000 value: 0.15058333333333335 - type: precision_at_3 value: 15.742083333333333 - type: precision_at_5 value: 11.221916666666667 - type: recall_at_1 value: 24.937916666666666 - type: recall_at_10 value: 50.650416666666665 - type: recall_at_100 value: 73.55383333333334 - type: recall_at_1000 value: 89.61691666666667 - type: recall_at_3 value: 37.27808333333334 - type: recall_at_5 value: 42.99475 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.947 - type: map_at_10 value: 30.575000000000003 - type: map_at_100 value: 31.465 - type: map_at_1000 value: 31.558000000000003 - type: map_at_3 value: 28.814 - type: map_at_5 value: 29.738999999999997 - type: mrr_at_1 value: 26.994 - type: mrr_at_10 value: 33.415 - type: mrr_at_100 value: 34.18 - type: mrr_at_1000 value: 34.245 - type: mrr_at_3 value: 31.621 - type: mrr_at_5 value: 32.549 - type: ndcg_at_1 value: 26.994 - type: ndcg_at_10 value: 34.482 - type: ndcg_at_100 value: 38.915 - type: ndcg_at_1000 value: 41.355 - type: ndcg_at_3 value: 31.139 - type: ndcg_at_5 value: 32.589 - type: precision_at_1 value: 26.994 - type: precision_at_10 value: 5.322 - type: precision_at_100 value: 0.8160000000000001 - type: precision_at_1000 value: 0.11100000000000002 - type: precision_at_3 value: 13.344000000000001 - type: precision_at_5 value: 8.988 - type: recall_at_1 value: 23.947 - type: recall_at_10 value: 43.647999999999996 - type: recall_at_100 value: 63.851 - type: recall_at_1000 value: 82.0 - type: recall_at_3 value: 34.288000000000004 - type: recall_at_5 value: 38.117000000000004 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: None metrics: - type: map_at_1 
value: 16.197 - type: map_at_10 value: 22.968 - type: map_at_100 value: 24.095 - type: map_at_1000 value: 24.217 - type: map_at_3 value: 20.771 - type: map_at_5 value: 21.995 - type: mrr_at_1 value: 19.511 - type: mrr_at_10 value: 26.55 - type: mrr_at_100 value: 27.500999999999998 - type: mrr_at_1000 value: 27.578999999999997 - type: mrr_at_3 value: 24.421 - type: mrr_at_5 value: 25.604 - type: ndcg_at_1 value: 19.511 - type: ndcg_at_10 value: 27.386 - type: ndcg_at_100 value: 32.828 - type: ndcg_at_1000 value: 35.739 - type: ndcg_at_3 value: 23.405 - type: ndcg_at_5 value: 25.255 - type: precision_at_1 value: 19.511 - type: precision_at_10 value: 5.017 - type: precision_at_100 value: 0.91 - type: precision_at_1000 value: 0.133 - type: precision_at_3 value: 11.023 - type: precision_at_5 value: 8.025 - type: recall_at_1 value: 16.197 - type: recall_at_10 value: 37.09 - type: recall_at_100 value: 61.778 - type: recall_at_1000 value: 82.56599999999999 - type: recall_at_3 value: 26.034000000000002 - type: recall_at_5 value: 30.762 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 25.41 - type: map_at_10 value: 33.655 - type: map_at_100 value: 34.892 - type: map_at_1000 value: 34.995 - type: map_at_3 value: 30.94 - type: map_at_5 value: 32.303 - type: mrr_at_1 value: 29.477999999999998 - type: mrr_at_10 value: 37.443 - type: mrr_at_100 value: 38.383 - type: mrr_at_1000 value: 38.440000000000005 - type: mrr_at_3 value: 34.949999999999996 - type: mrr_at_5 value: 36.228 - type: ndcg_at_1 value: 29.477999999999998 - type: ndcg_at_10 value: 38.769 - type: ndcg_at_100 value: 44.245000000000005 - type: ndcg_at_1000 value: 46.593 - type: ndcg_at_3 value: 33.623 - type: ndcg_at_5 value: 35.766 - type: precision_at_1 value: 29.477999999999998 - type: precision_at_10 value: 6.455 - type: precision_at_100 value: 1.032 - type: precision_at_1000 value: 0.135 - type: 
precision_at_3 value: 14.893999999999998 - type: precision_at_5 value: 10.485 - type: recall_at_1 value: 25.41 - type: recall_at_10 value: 50.669 - type: recall_at_100 value: 74.084 - type: recall_at_1000 value: 90.435 - type: recall_at_3 value: 36.679 - type: recall_at_5 value: 41.94 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.339 - type: map_at_10 value: 31.852000000000004 - type: map_at_100 value: 33.411 - type: map_at_1000 value: 33.62 - type: map_at_3 value: 28.929 - type: map_at_5 value: 30.542 - type: mrr_at_1 value: 28.063 - type: mrr_at_10 value: 36.301 - type: mrr_at_100 value: 37.288 - type: mrr_at_1000 value: 37.349 - type: mrr_at_3 value: 33.663 - type: mrr_at_5 value: 35.165 - type: ndcg_at_1 value: 28.063 - type: ndcg_at_10 value: 37.462 - type: ndcg_at_100 value: 43.620999999999995 - type: ndcg_at_1000 value: 46.211 - type: ndcg_at_3 value: 32.68 - type: ndcg_at_5 value: 34.981 - type: precision_at_1 value: 28.063 - type: precision_at_10 value: 7.1739999999999995 - type: precision_at_100 value: 1.486 - type: precision_at_1000 value: 0.23500000000000001 - type: precision_at_3 value: 15.217 - type: precision_at_5 value: 11.265 - type: recall_at_1 value: 23.339 - type: recall_at_10 value: 48.376999999999995 - type: recall_at_100 value: 76.053 - type: recall_at_1000 value: 92.455 - type: recall_at_3 value: 34.735 - type: recall_at_5 value: 40.71 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 18.925 - type: map_at_10 value: 26.017000000000003 - type: map_at_100 value: 27.034000000000002 - type: map_at_1000 value: 27.156000000000002 - type: map_at_3 value: 23.604 - type: map_at_5 value: 24.75 - type: mrr_at_1 value: 20.333000000000002 - type: mrr_at_10 value: 27.915 - type: mrr_at_100 value: 
28.788000000000004 - type: mrr_at_1000 value: 28.877999999999997 - type: mrr_at_3 value: 25.446999999999996 - type: mrr_at_5 value: 26.648 - type: ndcg_at_1 value: 20.333000000000002 - type: ndcg_at_10 value: 30.673000000000002 - type: ndcg_at_100 value: 35.618 - type: ndcg_at_1000 value: 38.517 - type: ndcg_at_3 value: 25.71 - type: ndcg_at_5 value: 27.679 - type: precision_at_1 value: 20.333000000000002 - type: precision_at_10 value: 4.9910000000000005 - type: precision_at_100 value: 0.8130000000000001 - type: precision_at_1000 value: 0.117 - type: precision_at_3 value: 11.029 - type: precision_at_5 value: 7.8740000000000006 - type: recall_at_1 value: 18.925 - type: recall_at_10 value: 43.311 - type: recall_at_100 value: 66.308 - type: recall_at_1000 value: 87.49 - type: recall_at_3 value: 29.596 - type: recall_at_5 value: 34.245 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 13.714 - type: map_at_10 value: 23.194 - type: map_at_100 value: 24.976000000000003 - type: map_at_1000 value: 25.166 - type: map_at_3 value: 19.709 - type: map_at_5 value: 21.523999999999997 - type: mrr_at_1 value: 30.619000000000003 - type: mrr_at_10 value: 42.563 - type: mrr_at_100 value: 43.386 - type: mrr_at_1000 value: 43.423 - type: mrr_at_3 value: 39.555 - type: mrr_at_5 value: 41.268 - type: ndcg_at_1 value: 30.619000000000003 - type: ndcg_at_10 value: 31.836 - type: ndcg_at_100 value: 38.652 - type: ndcg_at_1000 value: 42.088 - type: ndcg_at_3 value: 26.733 - type: ndcg_at_5 value: 28.435 - type: precision_at_1 value: 30.619000000000003 - type: precision_at_10 value: 9.751999999999999 - type: precision_at_100 value: 1.71 - type: precision_at_1000 value: 0.23500000000000001 - type: precision_at_3 value: 19.935 - type: precision_at_5 value: 14.984 - type: recall_at_1 value: 13.714 - type: recall_at_10 value: 37.26 - type: recall_at_100 value: 60.546 - type: recall_at_1000 value: 
79.899 - type: recall_at_3 value: 24.325 - type: recall_at_5 value: 29.725 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 8.462 - type: map_at_10 value: 18.637 - type: map_at_100 value: 26.131999999999998 - type: map_at_1000 value: 27.607 - type: map_at_3 value: 13.333 - type: map_at_5 value: 15.654000000000002 - type: mrr_at_1 value: 66.25 - type: mrr_at_10 value: 74.32600000000001 - type: mrr_at_100 value: 74.60900000000001 - type: mrr_at_1000 value: 74.62 - type: mrr_at_3 value: 72.667 - type: mrr_at_5 value: 73.817 - type: ndcg_at_1 value: 53.87499999999999 - type: ndcg_at_10 value: 40.028999999999996 - type: ndcg_at_100 value: 44.199 - type: ndcg_at_1000 value: 51.629999999999995 - type: ndcg_at_3 value: 44.113 - type: ndcg_at_5 value: 41.731 - type: precision_at_1 value: 66.25 - type: precision_at_10 value: 31.900000000000002 - type: precision_at_100 value: 10.043000000000001 - type: precision_at_1000 value: 1.926 - type: precision_at_3 value: 47.417 - type: precision_at_5 value: 40.65 - type: recall_at_1 value: 8.462 - type: recall_at_10 value: 24.293 - type: recall_at_100 value: 50.146 - type: recall_at_1000 value: 74.034 - type: recall_at_3 value: 14.967 - type: recall_at_5 value: 18.682000000000002 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 47.84499999999999 - type: f1 value: 42.48106691979349 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 74.034 - type: map_at_10 value: 82.76 - type: map_at_100 value: 82.968 - type: map_at_1000 value: 82.98299999999999 - type: map_at_3 value: 81.768 - type: map_at_5 value: 82.418 - type: mrr_at_1 value: 80.048 - type: mrr_at_10 value: 87.64999999999999 - type: mrr_at_100 
value: 87.712 - type: mrr_at_1000 value: 87.713 - type: mrr_at_3 value: 87.01100000000001 - type: mrr_at_5 value: 87.466 - type: ndcg_at_1 value: 80.048 - type: ndcg_at_10 value: 86.643 - type: ndcg_at_100 value: 87.361 - type: ndcg_at_1000 value: 87.606 - type: ndcg_at_3 value: 85.137 - type: ndcg_at_5 value: 86.016 - type: precision_at_1 value: 80.048 - type: precision_at_10 value: 10.372 - type: precision_at_100 value: 1.093 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 32.638 - type: precision_at_5 value: 20.177 - type: recall_at_1 value: 74.034 - type: recall_at_10 value: 93.769 - type: recall_at_100 value: 96.569 - type: recall_at_1000 value: 98.039 - type: recall_at_3 value: 89.581 - type: recall_at_5 value: 91.906 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 20.5 - type: map_at_10 value: 32.857 - type: map_at_100 value: 34.589 - type: map_at_1000 value: 34.778 - type: map_at_3 value: 29.160999999999998 - type: map_at_5 value: 31.033 - type: mrr_at_1 value: 40.123 - type: mrr_at_10 value: 48.776 - type: mrr_at_100 value: 49.495 - type: mrr_at_1000 value: 49.539 - type: mrr_at_3 value: 46.605000000000004 - type: mrr_at_5 value: 47.654 - type: ndcg_at_1 value: 40.123 - type: ndcg_at_10 value: 40.343 - type: ndcg_at_100 value: 46.56 - type: ndcg_at_1000 value: 49.777 - type: ndcg_at_3 value: 37.322 - type: ndcg_at_5 value: 37.791000000000004 - type: precision_at_1 value: 40.123 - type: precision_at_10 value: 11.08 - type: precision_at_100 value: 1.752 - type: precision_at_1000 value: 0.232 - type: precision_at_3 value: 24.897 - type: precision_at_5 value: 17.809 - type: recall_at_1 value: 20.5 - type: recall_at_10 value: 46.388 - type: recall_at_100 value: 69.552 - type: recall_at_1000 value: 89.011 - type: recall_at_3 value: 33.617999999999995 - type: recall_at_5 value: 38.211 - task: type: Retrieval dataset: type: hotpotqa name: 
MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 39.135999999999996 - type: map_at_10 value: 61.673 - type: map_at_100 value: 62.562 - type: map_at_1000 value: 62.62 - type: map_at_3 value: 58.467999999999996 - type: map_at_5 value: 60.463 - type: mrr_at_1 value: 78.271 - type: mrr_at_10 value: 84.119 - type: mrr_at_100 value: 84.29299999999999 - type: mrr_at_1000 value: 84.299 - type: mrr_at_3 value: 83.18900000000001 - type: mrr_at_5 value: 83.786 - type: ndcg_at_1 value: 78.271 - type: ndcg_at_10 value: 69.935 - type: ndcg_at_100 value: 73.01299999999999 - type: ndcg_at_1000 value: 74.126 - type: ndcg_at_3 value: 65.388 - type: ndcg_at_5 value: 67.906 - type: precision_at_1 value: 78.271 - type: precision_at_10 value: 14.562 - type: precision_at_100 value: 1.6969999999999998 - type: precision_at_1000 value: 0.184 - type: precision_at_3 value: 41.841 - type: precision_at_5 value: 27.087 - type: recall_at_1 value: 39.135999999999996 - type: recall_at_10 value: 72.809 - type: recall_at_100 value: 84.86200000000001 - type: recall_at_1000 value: 92.208 - type: recall_at_3 value: 62.76199999999999 - type: recall_at_5 value: 67.718 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 90.60600000000001 - type: ap value: 86.6579587804335 - type: f1 value: 90.5938853929307 - task: type: Retrieval dataset: type: msmarco name: MTEB MSMARCO config: default split: dev revision: None metrics: - type: map_at_1 value: 21.852 - type: map_at_10 value: 33.982 - type: map_at_100 value: 35.116 - type: map_at_1000 value: 35.167 - type: map_at_3 value: 30.134 - type: map_at_5 value: 32.340999999999994 - type: mrr_at_1 value: 22.479 - type: mrr_at_10 value: 34.594 - type: mrr_at_100 value: 35.672 - type: mrr_at_1000 value: 35.716 - type: mrr_at_3 value: 30.84 - type: mrr_at_5 value: 32.998 - type: 
ndcg_at_1 value: 22.493 - type: ndcg_at_10 value: 40.833000000000006 - type: ndcg_at_100 value: 46.357 - type: ndcg_at_1000 value: 47.637 - type: ndcg_at_3 value: 32.995999999999995 - type: ndcg_at_5 value: 36.919000000000004 - type: precision_at_1 value: 22.493 - type: precision_at_10 value: 6.465999999999999 - type: precision_at_100 value: 0.9249999999999999 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 14.030999999999999 - type: precision_at_5 value: 10.413 - type: recall_at_1 value: 21.852 - type: recall_at_10 value: 61.934999999999995 - type: recall_at_100 value: 87.611 - type: recall_at_1000 value: 97.441 - type: recall_at_3 value: 40.583999999999996 - type: recall_at_5 value: 49.992999999999995 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 93.36069311445507 - type: f1 value: 93.16456330371453 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 74.74692202462381 - type: f1 value: 58.17903579421599 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 74.80833893745796 - type: f1 value: 72.70786592684664 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 78.69872225958305 - type: f1 value: 78.61626934504731 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 
metrics: - type: v_measure value: 33.058658628717694 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 30.85561739360599 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 31.290259910144385 - type: mrr value: 32.44223046102856 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 5.288 - type: map_at_10 value: 12.267999999999999 - type: map_at_100 value: 15.557000000000002 - type: map_at_1000 value: 16.98 - type: map_at_3 value: 8.866 - type: map_at_5 value: 10.418 - type: mrr_at_1 value: 43.653 - type: mrr_at_10 value: 52.681 - type: mrr_at_100 value: 53.315999999999995 - type: mrr_at_1000 value: 53.357 - type: mrr_at_3 value: 51.393 - type: mrr_at_5 value: 51.903999999999996 - type: ndcg_at_1 value: 42.415000000000006 - type: ndcg_at_10 value: 34.305 - type: ndcg_at_100 value: 30.825999999999997 - type: ndcg_at_1000 value: 39.393 - type: ndcg_at_3 value: 39.931 - type: ndcg_at_5 value: 37.519999999999996 - type: precision_at_1 value: 43.653 - type: precision_at_10 value: 25.728 - type: precision_at_100 value: 7.932 - type: precision_at_1000 value: 2.07 - type: precision_at_3 value: 38.184000000000005 - type: precision_at_5 value: 32.879000000000005 - type: recall_at_1 value: 5.288 - type: recall_at_10 value: 16.195 - type: recall_at_100 value: 31.135 - type: recall_at_1000 value: 61.531000000000006 - type: recall_at_3 value: 10.313 - type: recall_at_5 value: 12.754999999999999 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 28.216 - type: map_at_10 value: 42.588 - type: map_at_100 value: 
43.702999999999996 - type: map_at_1000 value: 43.739 - type: map_at_3 value: 38.177 - type: map_at_5 value: 40.754000000000005 - type: mrr_at_1 value: 31.866 - type: mrr_at_10 value: 45.189 - type: mrr_at_100 value: 46.056000000000004 - type: mrr_at_1000 value: 46.081 - type: mrr_at_3 value: 41.526999999999994 - type: mrr_at_5 value: 43.704 - type: ndcg_at_1 value: 31.837 - type: ndcg_at_10 value: 50.178 - type: ndcg_at_100 value: 54.98800000000001 - type: ndcg_at_1000 value: 55.812 - type: ndcg_at_3 value: 41.853 - type: ndcg_at_5 value: 46.153 - type: precision_at_1 value: 31.837 - type: precision_at_10 value: 8.43 - type: precision_at_100 value: 1.1119999999999999 - type: precision_at_1000 value: 0.11900000000000001 - type: precision_at_3 value: 19.023 - type: precision_at_5 value: 13.911000000000001 - type: recall_at_1 value: 28.216 - type: recall_at_10 value: 70.8 - type: recall_at_100 value: 91.857 - type: recall_at_1000 value: 97.941 - type: recall_at_3 value: 49.196 - type: recall_at_5 value: 59.072 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 71.22800000000001 - type: map_at_10 value: 85.115 - type: map_at_100 value: 85.72 - type: map_at_1000 value: 85.737 - type: map_at_3 value: 82.149 - type: map_at_5 value: 84.029 - type: mrr_at_1 value: 81.96 - type: mrr_at_10 value: 88.00200000000001 - type: mrr_at_100 value: 88.088 - type: mrr_at_1000 value: 88.089 - type: mrr_at_3 value: 87.055 - type: mrr_at_5 value: 87.715 - type: ndcg_at_1 value: 82.01 - type: ndcg_at_10 value: 88.78 - type: ndcg_at_100 value: 89.91 - type: ndcg_at_1000 value: 90.013 - type: ndcg_at_3 value: 85.957 - type: ndcg_at_5 value: 87.56 - type: precision_at_1 value: 82.01 - type: precision_at_10 value: 13.462 - type: precision_at_100 value: 1.528 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 37.553 - type: precision_at_5 value: 24.732000000000003 - type: 
recall_at_1 value: 71.22800000000001 - type: recall_at_10 value: 95.69 - type: recall_at_100 value: 99.531 - type: recall_at_1000 value: 99.98 - type: recall_at_3 value: 87.632 - type: recall_at_5 value: 92.117 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 52.31768034366916 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 60.640266772723606 - task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 4.7780000000000005 - type: map_at_10 value: 12.299 - type: map_at_100 value: 14.363000000000001 - type: map_at_1000 value: 14.71 - type: map_at_3 value: 8.738999999999999 - type: map_at_5 value: 10.397 - type: mrr_at_1 value: 23.599999999999998 - type: mrr_at_10 value: 34.845 - type: mrr_at_100 value: 35.916 - type: mrr_at_1000 value: 35.973 - type: mrr_at_3 value: 31.7 - type: mrr_at_5 value: 33.535 - type: ndcg_at_1 value: 23.599999999999998 - type: ndcg_at_10 value: 20.522000000000002 - type: ndcg_at_100 value: 28.737000000000002 - type: ndcg_at_1000 value: 34.596 - type: ndcg_at_3 value: 19.542 - type: ndcg_at_5 value: 16.958000000000002 - type: precision_at_1 value: 23.599999999999998 - type: precision_at_10 value: 10.67 - type: precision_at_100 value: 2.259 - type: precision_at_1000 value: 0.367 - type: precision_at_3 value: 18.333 - type: precision_at_5 value: 14.879999999999999 - type: recall_at_1 value: 4.7780000000000005 - type: recall_at_10 value: 21.617 - type: recall_at_100 value: 45.905 - type: recall_at_1000 value: 74.42 - type: recall_at_3 value: 11.148 - type: recall_at_5 value: 15.082999999999998 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB 
SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 83.22372750297885 - type: cos_sim_spearman value: 79.40972617119405 - type: euclidean_pearson value: 80.6101072020434 - type: euclidean_spearman value: 79.53844217225202 - type: manhattan_pearson value: 80.57265975286111 - type: manhattan_spearman value: 79.46335611792958 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 85.43713315520749 - type: cos_sim_spearman value: 77.44128693329532 - type: euclidean_pearson value: 81.63869928101123 - type: euclidean_spearman value: 77.29512977961515 - type: manhattan_pearson value: 81.63704185566183 - type: manhattan_spearman value: 77.29909412738657 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 81.59451537860527 - type: cos_sim_spearman value: 82.97994638856723 - type: euclidean_pearson value: 82.89478688288412 - type: euclidean_spearman value: 83.58740751053104 - type: manhattan_pearson value: 82.69140840941608 - type: manhattan_spearman value: 83.33665956040555 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 82.00756527711764 - type: cos_sim_spearman value: 81.83560996841379 - type: euclidean_pearson value: 82.07684151976518 - type: euclidean_spearman value: 82.00913052060511 - type: manhattan_pearson value: 82.05690778488794 - type: manhattan_spearman value: 82.02260252019525 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 86.13710262895447 - type: cos_sim_spearman 
value: 87.26412811156248 - type: euclidean_pearson value: 86.94151453230228 - type: euclidean_spearman value: 87.5363796699571 - type: manhattan_pearson value: 86.86989424083748 - type: manhattan_spearman value: 87.47315940781353 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 83.0230597603627 - type: cos_sim_spearman value: 84.93344499318864 - type: euclidean_pearson value: 84.23754743431141 - type: euclidean_spearman value: 85.09707376597099 - type: manhattan_pearson value: 84.04325160987763 - type: manhattan_spearman value: 84.89353071339909 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 86.75620824563921 - type: cos_sim_spearman value: 87.15065513706398 - type: euclidean_pearson value: 88.26281533633521 - type: euclidean_spearman value: 87.51963738643983 - type: manhattan_pearson value: 88.25599267618065 - type: manhattan_spearman value: 87.58048736047483 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 64.74645319195137 - type: cos_sim_spearman value: 65.29996325037214 - type: euclidean_pearson value: 67.04297794086443 - type: euclidean_spearman value: 65.43841726694343 - type: manhattan_pearson value: 67.39459955690904 - type: manhattan_spearman value: 65.92864704413651 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 84.31291020270801 - type: cos_sim_spearman value: 85.86473738688068 - type: euclidean_pearson value: 85.65537275064152 - type: euclidean_spearman value: 
86.13087454209642 - type: manhattan_pearson value: 85.43946955047609 - type: manhattan_spearman value: 85.91568175344916 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 85.93798118350695 - type: mrr value: 95.93536274908824 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 57.594 - type: map_at_10 value: 66.81899999999999 - type: map_at_100 value: 67.368 - type: map_at_1000 value: 67.4 - type: map_at_3 value: 64.061 - type: map_at_5 value: 65.47 - type: mrr_at_1 value: 60.667 - type: mrr_at_10 value: 68.219 - type: mrr_at_100 value: 68.655 - type: mrr_at_1000 value: 68.684 - type: mrr_at_3 value: 66.22200000000001 - type: mrr_at_5 value: 67.289 - type: ndcg_at_1 value: 60.667 - type: ndcg_at_10 value: 71.275 - type: ndcg_at_100 value: 73.642 - type: ndcg_at_1000 value: 74.373 - type: ndcg_at_3 value: 66.521 - type: ndcg_at_5 value: 68.581 - type: precision_at_1 value: 60.667 - type: precision_at_10 value: 9.433 - type: precision_at_100 value: 1.0699999999999998 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 25.556 - type: precision_at_5 value: 16.8 - type: recall_at_1 value: 57.594 - type: recall_at_10 value: 83.622 - type: recall_at_100 value: 94.167 - type: recall_at_1000 value: 99.667 - type: recall_at_3 value: 70.64399999999999 - type: recall_at_5 value: 75.983 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.85841584158416 - type: cos_sim_ap value: 96.66996142314342 - type: cos_sim_f1 value: 92.83208020050125 - type: cos_sim_precision value: 93.06532663316584 - type: cos_sim_recall value: 
92.60000000000001 - type: dot_accuracy value: 99.85841584158416 - type: dot_ap value: 96.6775307676576 - type: dot_f1 value: 92.69289729177312 - type: dot_precision value: 94.77533960292581 - type: dot_recall value: 90.7 - type: euclidean_accuracy value: 99.86138613861387 - type: euclidean_ap value: 96.6338454403108 - type: euclidean_f1 value: 92.92214357937311 - type: euclidean_precision value: 93.96728016359918 - type: euclidean_recall value: 91.9 - type: manhattan_accuracy value: 99.86237623762376 - type: manhattan_ap value: 96.60370449645053 - type: manhattan_f1 value: 92.91177970423253 - type: manhattan_precision value: 94.7970863683663 - type: manhattan_recall value: 91.10000000000001 - type: max_accuracy value: 99.86237623762376 - type: max_ap value: 96.6775307676576 - type: max_f1 value: 92.92214357937311 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 60.77977058695198 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 35.2725272535638 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 53.64052466362125 - type: mrr value: 54.533067014684654 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.677624219206578 - type: cos_sim_spearman value: 30.121368518123447 - type: dot_pearson value: 30.69870088041608 - type: dot_spearman value: 29.61284927093751 - task: type: Retrieval dataset: type: trec-covid name: 
MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.22 - type: map_at_10 value: 1.855 - type: map_at_100 value: 9.885 - type: map_at_1000 value: 23.416999999999998 - type: map_at_3 value: 0.637 - type: map_at_5 value: 1.024 - type: mrr_at_1 value: 88.0 - type: mrr_at_10 value: 93.067 - type: mrr_at_100 value: 93.067 - type: mrr_at_1000 value: 93.067 - type: mrr_at_3 value: 92.667 - type: mrr_at_5 value: 93.067 - type: ndcg_at_1 value: 82.0 - type: ndcg_at_10 value: 75.899 - type: ndcg_at_100 value: 55.115 - type: ndcg_at_1000 value: 48.368 - type: ndcg_at_3 value: 79.704 - type: ndcg_at_5 value: 78.39699999999999 - type: precision_at_1 value: 88.0 - type: precision_at_10 value: 79.60000000000001 - type: precision_at_100 value: 56.06 - type: precision_at_1000 value: 21.206 - type: precision_at_3 value: 84.667 - type: precision_at_5 value: 83.2 - type: recall_at_1 value: 0.22 - type: recall_at_10 value: 2.078 - type: recall_at_100 value: 13.297 - type: recall_at_1000 value: 44.979 - type: recall_at_3 value: 0.6689999999999999 - type: recall_at_5 value: 1.106 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 2.258 - type: map_at_10 value: 10.439 - type: map_at_100 value: 16.89 - type: map_at_1000 value: 18.407999999999998 - type: map_at_3 value: 5.668 - type: map_at_5 value: 7.718 - type: mrr_at_1 value: 32.653 - type: mrr_at_10 value: 51.159 - type: mrr_at_100 value: 51.714000000000006 - type: mrr_at_1000 value: 51.714000000000006 - type: mrr_at_3 value: 47.959 - type: mrr_at_5 value: 50.407999999999994 - type: ndcg_at_1 value: 29.592000000000002 - type: ndcg_at_10 value: 26.037 - type: ndcg_at_100 value: 37.924 - type: ndcg_at_1000 value: 49.126999999999995 - type: ndcg_at_3 value: 30.631999999999998 - type: ndcg_at_5 value: 28.571 - type: precision_at_1 value: 32.653 - type: precision_at_10 value: 22.857 - type: 
precision_at_100 value: 7.754999999999999 - type: precision_at_1000 value: 1.529 - type: precision_at_3 value: 34.014 - type: precision_at_5 value: 29.796 - type: recall_at_1 value: 2.258 - type: recall_at_10 value: 16.554 - type: recall_at_100 value: 48.439 - type: recall_at_1000 value: 82.80499999999999 - type: recall_at_3 value: 7.283 - type: recall_at_5 value: 10.732 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 69.8858 - type: ap value: 13.835684144362109 - type: f1 value: 53.803351693244586 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 60.50650820599886 - type: f1 value: 60.84357825979259 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 48.52131044852134 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 85.59337187816654 - type: cos_sim_ap value: 73.23925826533437 - type: cos_sim_f1 value: 67.34693877551021 - type: cos_sim_precision value: 62.40432237730752 - type: cos_sim_recall value: 73.13984168865434 - type: dot_accuracy value: 85.31322644096085 - type: dot_ap value: 72.30723963807422 - type: dot_f1 value: 66.47051612112296 - type: dot_precision value: 62.0792305930845 - type: dot_recall value: 71.53034300791556 - type: euclidean_accuracy value: 85.61125350181797 - type: euclidean_ap value: 73.32843720487845 - type: euclidean_f1 
value: 67.36549633745895 - type: euclidean_precision value: 64.60755813953489 - type: euclidean_recall value: 70.36939313984169 - type: manhattan_accuracy value: 85.63509566668654 - type: manhattan_ap value: 73.16658488311325 - type: manhattan_f1 value: 67.20597386434349 - type: manhattan_precision value: 63.60424028268551 - type: manhattan_recall value: 71.2401055408971 - type: max_accuracy value: 85.63509566668654 - type: max_ap value: 73.32843720487845 - type: max_f1 value: 67.36549633745895 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 88.33779640625606 - type: cos_sim_ap value: 84.83868375898157 - type: cos_sim_f1 value: 77.16506154017773 - type: cos_sim_precision value: 74.62064005753327 - type: cos_sim_recall value: 79.88912842623961 - type: dot_accuracy value: 88.02732176815307 - type: dot_ap value: 83.95089283763002 - type: dot_f1 value: 76.29635101196631 - type: dot_precision value: 73.31771720613288 - type: dot_recall value: 79.52725592854944 - type: euclidean_accuracy value: 88.44452206310397 - type: euclidean_ap value: 84.98384576824827 - type: euclidean_f1 value: 77.29311047696697 - type: euclidean_precision value: 74.51232583065381 - type: euclidean_recall value: 80.28949799815214 - type: manhattan_accuracy value: 88.47362906042613 - type: manhattan_ap value: 84.91421462218432 - type: manhattan_f1 value: 77.05107637204792 - type: manhattan_precision value: 74.74484256243214 - type: manhattan_recall value: 79.50415768401602 - type: max_accuracy value: 88.47362906042613 - type: max_ap value: 84.98384576824827 - type: max_f1 value: 77.29311047696697 license: mit language: - en --- <h1 align=\"center\">FlagEmbedding</h1> <h4 align=\"center\"> <p> <a href=#model-list>Model List</a> | <a href=#frequently-asked-questions>FAQ</a> | <a href=#usage>Usage</a> | <a 
href=\"#evaluation\">Evaluation</a> | <a href=\"#train\">Train</a> | <a href=\"#contact\">Contact</a> | <a href=\"#citation\">Citation</a> | <a href=\"#license\">License</a> <p> </h4> More details please refer to our Github: FlagEmbedding. If you are looking for a model that supports more languages, longer texts, and other retrieval methods, you can try using bge-m3. English | 中文 FlagEmbedding focuses on retrieval-augmented LLMs, consisting of the following projects currently: - **Long-Context LLM**: Activation Beacon - **Fine-tuning of LM** : LM-Cocktail - **Dense Retrieval**: BGE-M3, LLM Embedder, BGE Embedding - **Reranker Model**: BGE Reranker - **Benchmark**: C-MTEB ## News - 1/30/2024: Release **BGE-M3**, a new member to BGE model series! M3 stands for **M**ulti-linguality (100+ languages), **M**ulti-granularities (input length up to 8192), **M**ulti-Functionality (unification of dense, lexical, multi-vec/colbert retrieval). It is the first embedding model which supports all three retrieval methods, achieving new SOTA on multi-lingual (MIRACL) and cross-lingual (MKQA) benchmarks. Technical Report and Code. :fire: - 1/9/2024: Release Activation-Beacon, an effective, efficient, compatible, and low-cost (training) method to extend the context length of LLM. Technical Report :fire: - 12/24/2023: Release **LLaRA**, a LLaMA-7B based dense retriever, leading to state-of-the-art performances on MS MARCO and BEIR. Model and code will be open-sourced. Please stay tuned. Technical Report :fire: - 11/23/2023: Release LM-Cocktail, a method to maintain general capabilities during fine-tuning by merging multiple language models. Technical Report :fire: - 10/12/2023: Release LLM-Embedder, a unified embedding model to support diverse retrieval augmentation needs for LLMs. 
Technical Report - 09/15/2023: The technical report of BGE has been released - 09/15/2023: The massive training data of BGE has been released - 09/12/2023: New models: - **New reranker model**: release cross-encoder models and , which are more powerful than embedding model. We recommend to use/fine-tune them to re-rank top-k documents returned by embedding models. - **update embedding model**: release embedding model to alleviate the issue of the similarity distribution, and enhance its retrieval ability without instruction. <details> <summary>More</summary> <!-- ### More --> - 09/07/2023: Update fine-tune code: Add script to mine hard negatives and support adding instruction during fine-tuning. - 08/09/2023: BGE Models are integrated into **Langchain**, you can use it like this; C-MTEB **leaderboard** is available. - 08/05/2023: Release base-scale and small-scale models, **best performance among the models of the same size 🤗** - 08/02/2023: Release (short for BAAI General Embedding) Models, **rank 1st on MTEB and C-MTEB benchmark!** :tada: :tada: - 08/01/2023: We release the Chinese Massive Text Embedding Benchmark (**C-MTEB**), consisting of 31 test dataset. </details> ## Model List is short for . 
| Model | Language | | Description | query instruction for retrieval [1] | |:-------------------------------|:--------:| :--------:| :--------:|:--------:| | BAAI/bge-m3 | Multilingual | Inference Fine-tune | Multi-Functionality(dense retrieval, sparse retrieval, multi-vector(colbert)), Multi-Linguality, and Multi-Granularity(8192 tokens) | | | BAAI/llm-embedder | English | Inference Fine-tune | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See README | | BAAI/bge-reranker-large | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-reranker-base | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-large-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-en | English | Inference Fine-tune | :trophy: rank **1st** in MTEB leaderboard | | | BAAI/bge-base-en | English | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-en | English | Inference Fine-tune |a small-scale model but with competitive performance | | | BAAI/bge-large-zh | Chinese | Inference Fine-tune | :trophy: rank **1st** in C-MTEB benchmark | | | BAAI/bge-base-zh | Chinese | Inference Fine-tune | a base-scale model but with 
similar ability to | | | BAAI/bge-small-zh | Chinese | Inference Fine-tune | a small-scale model but with competitive performance | | [1\\]: If you need to search for passages relevant to a query, we suggest adding the instruction to the query; in other cases, no instruction is needed, just use the original query directly. In all cases, **no instruction** needs to be added to passages. [2\\]: Unlike an embedding model, a reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. To balance accuracy and time cost, a cross-encoder is widely used to re-rank the top-k documents retrieved by other, simpler models. For example, use a bge embedding model to retrieve the top 100 relevant documents, and then use the bge reranker to re-rank those 100 documents to get the final top-3 results. All models have been uploaded to Huggingface Hub, and you can see them at If you cannot open the Huggingface Hub, you can also download the models at . ## Frequently asked questions <details> <summary>1. How to fine-tune bge embedding model?</summary> <!-- ### How to fine-tune bge embedding model? --> Follow this example to prepare data and fine-tune your model. Some suggestions: - Mine hard negatives following this example, which can improve the retrieval performance. - If you pre-train bge on your data, the pre-trained model cannot be directly used to calculate similarity; it must be fine-tuned with contrastive learning before computing similarity. - If the accuracy of the fine-tuned model is still not high, it is recommended to use/fine-tune the cross-encoder model (bge-reranker) to re-rank top-k results. Hard negatives are also needed to fine-tune the reranker. </details> <details> <summary>2. 
The similarity score between two dissimilar sentences is higher than 0.5</summary> <!-- ### The similarity score between two dissimilar sentences is higher than 0.5 --> **We suggest using bge v1.5, which alleviates the issue of the similarity distribution.** Since we fine-tune the models by contrastive learning with a temperature of 0.01, the similarity scores of the current BGE model fall roughly in the interval \\[0.6, 1\\]. So a similarity score greater than 0.5 does not indicate that the two sentences are similar. For downstream tasks, such as passage retrieval or semantic similarity, **what matters is the relative order of the scores, not the absolute value.** If you need to filter similar sentences based on a similarity threshold, please select an appropriate threshold based on the similarity distribution on your data (such as 0.8, 0.85, or even 0.9). </details> <details> <summary>3. When does the query instruction need to be used</summary> <!-- ### When does the query instruction need to be used --> For the , we improve its retrieval ability when not using instruction. Omitting the instruction causes only a slight degradation in retrieval performance compared with using it. So you can generate embeddings without an instruction in all cases for convenience. For a retrieval task that uses short queries to find long related documents, it is recommended to add instructions to these short queries. **The best way to decide whether to add instructions to queries is to choose the setting that achieves better performance on your task.** In all cases, no instruction needs to be added to the documents/passages. </details> ## Usage ### Usage for Embedding Model Here are some examples for using models with FlagEmbedding, Sentence-Transformers, Langchain, or Huggingface Transformers. #### Using FlagEmbedding If it doesn't work for you, you can see FlagEmbedding for more methods to install FlagEmbedding. For the value of the argument , see Model List. 
By default, FlagModel will use all available GPUs when encoding. Please set to select specific GPUs. You can also set to make all GPUs unavailable. #### Using Sentence-Transformers You can also use the models with sentence-transformers: For s2p (short query to long passage) retrieval tasks, each short query should start with an instruction (for the instructions, see Model List). The instruction is not needed for passages. #### Using Langchain You can use in langchain like this: #### Using HuggingFace Transformers With the transformers package, you can use the model like this: First, you pass your input through the transformer model, then you select the last hidden state of the first token (i.e., [CLS]) as the sentence embedding. ### Usage for Reranker Unlike an embedding model, a reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. You can get a relevance score by inputting a query and a passage to the reranker. The reranker is optimized based on cross-entropy loss, so the relevance score is not bounded to a specific range. #### Using FlagEmbedding Get relevance scores (higher scores indicate more relevance): #### Using Huggingface transformers #### Usage of the ONNX files #### Usage via infinity It's also possible to deploy the onnx files with the infinity_emb pip package. Recommended is with flash attention on gpu, and for onnx inference. ## Evaluation models achieve **state-of-the-art performance on both MTEB and C-MTEB leaderboard!** For more details and evaluation tools see our scripts. 
- **MTEB**: | Model Name | Dimension | Sequence Length | Average (56) | Retrieval (15) |Clustering (11) | Pair Classification (3) | Reranking (4) | STS (10) | Summarization (1) | Classification (12) | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | BAAI/bge-large-en-v1.5 | 1024 | 512 | **64.23** | **54.29** | 46.08 | 87.12 | 60.03 | 83.11 | 31.61 | 75.97 | | BAAI/bge-base-en-v1.5 | 768 | 512 | 63.55 | 53.25 | 45.77 | 86.55 | 58.86 | 82.4 | 31.07 | 75.53 | | BAAI/bge-small-en-v1.5 | 384 | 512 | 62.17 |51.68 | 43.82 | 84.92 | 58.36 | 81.59 | 30.12 | 74.14 | | bge-large-en | 1024 | 512 | 63.98 | 53.9 | 46.98 | 85.8 | 59.48 | 81.56 | 32.06 | 76.21 | | bge-base-en | 768 | 512 | 63.36 | 53.0 | 46.32 | 85.86 | 58.7 | 81.84 | 29.27 | 75.27 | | gte-large | 1024 | 512 | 63.13 | 52.22 | 46.84 | 85.00 | 59.13 | 83.35 | 31.66 | 73.33 | | gte-base | 768 | 512 | 62.39 | 51.14 | 46.2 | 84.57 | 58.61 | 82.3 | 31.17 | 73.01 | | e5-large-v2 | 1024| 512 | 62.25 | 50.56 | 44.49 | 86.03 | 56.61 | 82.05 | 30.19 | 75.24 | | bge-small-en | 384 | 512 | 62.11 | 51.82 | 44.31 | 83.78 | 57.97 | 80.72 | 30.53 | 74.37 | | instructor-xl | 768 | 512 | 61.79 | 49.26 | 44.74 | 86.62 | 57.29 | 83.06 | 32.32 | 61.79 | | e5-base-v2 | 768 | 512 | 61.5 | 50.29 | 43.80 | 85.73 | 55.91 | 81.05 | 30.28 | 73.84 | | gte-small | 384 | 512 | 61.36 | 49.46 | 44.89 | 83.54 | 57.7 | 82.07 | 30.42 | 72.31 | | text-embedding-ada-002 | 1536 | 8192 | 60.99 | 49.25 | 45.9 | 84.89 | 56.32 | 80.97 | 30.8 | 70.93 | | e5-small-v2 | 384 | 512 | 59.93 | 49.04 | 39.92 | 84.67 | 54.32 | 80.39 | 31.16 | 72.94 | | sentence-t5-xxl | 768 | 512 | 59.51 | 42.24 | 43.72 | 85.06 | 56.42 | 82.63 | 30.08 | 73.42 | | all-mpnet-base-v2 | 768 | 514 | 57.78 | 43.81 | 43.69 | 83.04 | 59.36 | 80.28 | 27.49 | 65.07 | | sgpt-bloom-7b1-msmarco | 4096 | 2048 | 57.59 | 48.22 | 38.93 | 81.9 | 55.65 | 77.74 | 33.6 | 66.19 | - **C-MTEB**: We create the benchmark C-MTEB for Chinese text embedding which consists of 31 datasets 
from 6 tasks. Please refer to C_MTEB for a detailed introduction. | Model | Embedding dimension | Avg | Retrieval | STS | PairClassification | Classification | Reranking | Clustering | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | **BAAI/bge-large-zh-v1.5** | 1024 | **64.53** | 70.46 | 56.25 | 81.6 | 69.13 | 65.84 | 48.99 | | BAAI/bge-base-zh-v1.5 | 768 | 63.13 | 69.49 | 53.72 | 79.75 | 68.07 | 65.39 | 47.53 | | BAAI/bge-small-zh-v1.5 | 512 | 57.82 | 61.77 | 49.11 | 70.41 | 63.96 | 60.92 | 44.18 | | BAAI/bge-large-zh | 1024 | 64.20 | 71.53 | 54.98 | 78.94 | 68.32 | 65.11 | 48.39 | | bge-large-zh-noinstruct | 1024 | 63.53 | 70.55 | 53 | 76.77 | 68.58 | 64.91 | 50.01 | | BAAI/bge-base-zh | 768 | 62.96 | 69.53 | 54.12 | 77.5 | 67.07 | 64.91 | 47.63 | | multilingual-e5-large | 1024 | 58.79 | 63.66 | 48.44 | 69.89 | 67.34 | 56.00 | 48.23 | | BAAI/bge-small-zh | 512 | 58.27 | 63.07 | 49.45 | 70.35 | 63.64 | 61.48 | 45.09 | | m3e-base | 768 | 57.10 | 56.91 | 50.47 | 63.99 | 67.52 | 59.34 | 47.68 | | m3e-large | 1024 | 57.05 | 54.75 | 50.42 | 64.3 | 68.2 | 59.66 | 48.88 | | multilingual-e5-base | 768 | 55.48 | 61.63 | 46.49 | 67.07 | 65.35 | 54.35 | 40.68 | | multilingual-e5-small | 384 | 55.38 | 59.95 | 45.27 | 66.45 | 65.85 | 53.86 | 45.26 | | text-embedding-ada-002(OpenAI) | 1536 | 53.02 | 52.0 | 43.35 | 69.56 | 64.31 | 54.28 | 45.68 | | luotuo | 1024 | 49.37 | 44.4 | 42.78 | 66.62 | 61 | 49.25 | 44.39 | | text2vec-base | 768 | 47.63 | 38.79 | 43.41 | 67.41 | 62.19 | 49.45 | 37.66 | | text2vec-large | 1024 | 47.36 | 41.94 | 44.97 | 70.86 | 60.66 | 49.16 | 30.02 | - **Reranking**: See C_MTEB for evaluation script. 
| Model | T2Reranking | T2RerankingZh2En\\* | T2RerankingEn2Zh\\* | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | text2vec-base-multilingual | 64.66 | 62.94 | 62.51 | 14.37 | 48.46 | 48.6 | 50.26 | | multilingual-e5-small | 65.62 | 60.94 | 56.41 | 29.91 | 67.26 | 66.54 | 57.78 | | multilingual-e5-large | 64.55 | 61.61 | 54.28 | 28.6 | 67.42 | 67.92 | 57.4 | | multilingual-e5-base | 64.21 | 62.13 | 54.68 | 29.5 | 66.23 | 66.98 | 57.29 | | m3e-base | 66.03 | 62.74 | 56.07 | 17.51 | 77.05 | 76.76 | 59.36 | | m3e-large | 66.13 | 62.72 | 56.1 | 16.46 | 77.76 | 78.27 | 59.57 | | bge-base-zh-v1.5 | 66.49 | 63.25 | 57.02 | 29.74 | 80.47 | 84.88 | 63.64 | | bge-large-zh-v1.5 | 65.74 | 63.39 | 57.03 | 28.74 | 83.45 | 85.44 | 63.97 | | BAAI/bge-reranker-base | 67.28 | 63.95 | 60.45 | 35.46 | 81.26 | 84.1 | 65.42 | | BAAI/bge-reranker-large | 67.6 | 64.03 | 61.44 | 37.16 | 82.15 | 84.18 | 66.09 | \\* : T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks ## Train ### BAAI Embedding We pre-train the models using retromae and train them on large-scale pairs data using contrastive learning. **You can fine-tune the embedding model on your data following our examples.** We also provide a pre-train example. Note that the goal of pre-training is to reconstruct the text, and the pre-trained model cannot be used for similarity calculation directly, it needs to be fine-tuned. More training details for bge see baai_general_embedding. ### BGE Reranker Cross-encoder will perform full-attention over the input pair, which is more accurate than embedding model (i.e., bi-encoder) but more time-consuming than embedding model. Therefore, it can be used to re-rank the top-k documents returned by embedding model. 
We train the cross-encoder on multilingual pair data. The data format is the same as for the embedding model, so you can fine-tune it easily following our example. For more details, please refer to ./FlagEmbedding/reranker/README.md ## Contact If you have any questions or suggestions related to this project, feel free to open an issue or pull request. You can also email Shitao Xiao([email protected]) and Zheng Liu([email protected]). ## Citation If you find this repository useful, please consider giving it a star :star: and a citation ## License FlagEmbedding is licensed under the MIT License. The released models can be used for commercial purposes free of charge.",
|
28 |
+
"model_explanation_gemini": "Generates sentence embeddings for tasks like sentence similarity, retrieval, classification, clustering, and reranking."
|
29 |
+
}
|
data/model_data_json/BAAI_bge-small-en.json
ADDED
@@ -0,0 +1,24 @@
|
1 |
+
{
|
2 |
+
"model_id": "BAAI/bge-small-en",
|
3 |
+
"downloads": 251381,
|
4 |
+
"tags": [
|
5 |
+
"transformers",
|
6 |
+
"pytorch",
|
7 |
+
"safetensors",
|
8 |
+
"bert",
|
9 |
+
"feature-extraction",
|
10 |
+
"mteb",
|
11 |
+
"sentence transformers",
|
12 |
+
"en",
|
13 |
+
"arxiv:2311.13534",
|
14 |
+
"arxiv:2310.07554",
|
15 |
+
"arxiv:2309.07597",
|
16 |
+
"license:mit",
|
17 |
+
"model-index",
|
18 |
+
"text-embeddings-inference",
|
19 |
+
"endpoints_compatible",
|
20 |
+
"region:us"
|
21 |
+
],
|
22 |
+
"description": "--- tags: - mteb - sentence transformers model-index: - name: bge-small-en results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 74.34328358208955 - type: ap value: 37.59947775195661 - type: f1 value: 68.548415491933 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 93.04527499999999 - type: ap value: 89.60696356772135 - type: f1 value: 93.03361469382438 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 46.08 - type: f1 value: 45.66249835363254 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 35.205999999999996 - type: map_at_10 value: 50.782000000000004 - type: map_at_100 value: 51.547 - type: map_at_1000 value: 51.554 - type: map_at_3 value: 46.515 - type: map_at_5 value: 49.296 - type: mrr_at_1 value: 35.632999999999996 - type: mrr_at_10 value: 50.958999999999996 - type: mrr_at_100 value: 51.724000000000004 - type: mrr_at_1000 value: 51.731 - type: mrr_at_3 value: 46.669 - type: mrr_at_5 value: 49.439 - type: ndcg_at_1 value: 35.205999999999996 - type: ndcg_at_10 value: 58.835 - type: ndcg_at_100 value: 62.095 - type: ndcg_at_1000 value: 62.255 - type: ndcg_at_3 value: 50.255 - type: ndcg_at_5 value: 55.296 - type: precision_at_1 value: 35.205999999999996 - type: precision_at_10 value: 8.421 - type: precision_at_100 value: 0.984 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 20.365 - type: precision_at_5 value: 
14.680000000000001 - type: recall_at_1 value: 35.205999999999996 - type: recall_at_10 value: 84.211 - type: recall_at_100 value: 98.43499999999999 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 61.095 - type: recall_at_5 value: 73.4 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 47.52644476278646 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 39.973045724188964 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 62.28285314871488 - type: mrr value: 74.52743701358659 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 80.09041909160327 - type: cos_sim_spearman value: 79.96266537706944 - type: euclidean_pearson value: 79.50774978162241 - type: euclidean_spearman value: 79.9144715078551 - type: manhattan_pearson value: 79.2062139879302 - type: manhattan_spearman value: 79.35000081468212 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 85.31493506493506 - type: f1 value: 85.2704557977762 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 39.6837242810816 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: 
MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 35.38881249555897 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.884999999999998 - type: map_at_10 value: 39.574 - type: map_at_100 value: 40.993 - type: map_at_1000 value: 41.129 - type: map_at_3 value: 36.089 - type: map_at_5 value: 38.191 - type: mrr_at_1 value: 34.477999999999994 - type: mrr_at_10 value: 45.411 - type: mrr_at_100 value: 46.089999999999996 - type: mrr_at_1000 value: 46.147 - type: mrr_at_3 value: 42.346000000000004 - type: mrr_at_5 value: 44.292 - type: ndcg_at_1 value: 34.477999999999994 - type: ndcg_at_10 value: 46.123999999999995 - type: ndcg_at_100 value: 51.349999999999994 - type: ndcg_at_1000 value: 53.578 - type: ndcg_at_3 value: 40.824 - type: ndcg_at_5 value: 43.571 - type: precision_at_1 value: 34.477999999999994 - type: precision_at_10 value: 8.841000000000001 - type: precision_at_100 value: 1.4460000000000002 - type: precision_at_1000 value: 0.192 - type: precision_at_3 value: 19.742 - type: precision_at_5 value: 14.421000000000001 - type: recall_at_1 value: 27.884999999999998 - type: recall_at_10 value: 59.087 - type: recall_at_100 value: 80.609 - type: recall_at_1000 value: 95.054 - type: recall_at_3 value: 44.082 - type: recall_at_5 value: 51.593999999999994 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 30.639 - type: map_at_10 value: 40.047 - type: map_at_100 value: 41.302 - type: map_at_1000 value: 41.425 - type: map_at_3 value: 37.406 - type: map_at_5 value: 38.934000000000005 - type: mrr_at_1 value: 37.707 - type: mrr_at_10 value: 46.082 - type: mrr_at_100 value: 46.745 - type: mrr_at_1000 value: 46.786 - type: mrr_at_3 value: 
43.980999999999995 - type: mrr_at_5 value: 45.287 - type: ndcg_at_1 value: 37.707 - type: ndcg_at_10 value: 45.525 - type: ndcg_at_100 value: 49.976 - type: ndcg_at_1000 value: 51.94499999999999 - type: ndcg_at_3 value: 41.704 - type: ndcg_at_5 value: 43.596000000000004 - type: precision_at_1 value: 37.707 - type: precision_at_10 value: 8.465 - type: precision_at_100 value: 1.375 - type: precision_at_1000 value: 0.183 - type: precision_at_3 value: 19.979 - type: precision_at_5 value: 14.115 - type: recall_at_1 value: 30.639 - type: recall_at_10 value: 54.775 - type: recall_at_100 value: 73.678 - type: recall_at_1000 value: 86.142 - type: recall_at_3 value: 43.230000000000004 - type: recall_at_5 value: 48.622 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 38.038 - type: map_at_10 value: 49.922 - type: map_at_100 value: 51.032 - type: map_at_1000 value: 51.085 - type: map_at_3 value: 46.664 - type: map_at_5 value: 48.588 - type: mrr_at_1 value: 43.95 - type: mrr_at_10 value: 53.566 - type: mrr_at_100 value: 54.318999999999996 - type: mrr_at_1000 value: 54.348 - type: mrr_at_3 value: 51.066 - type: mrr_at_5 value: 52.649 - type: ndcg_at_1 value: 43.95 - type: ndcg_at_10 value: 55.676 - type: ndcg_at_100 value: 60.126000000000005 - type: ndcg_at_1000 value: 61.208 - type: ndcg_at_3 value: 50.20400000000001 - type: ndcg_at_5 value: 53.038 - type: precision_at_1 value: 43.95 - type: precision_at_10 value: 8.953 - type: precision_at_100 value: 1.2109999999999999 - type: precision_at_1000 value: 0.135 - type: precision_at_3 value: 22.256999999999998 - type: precision_at_5 value: 15.524 - type: recall_at_1 value: 38.038 - type: recall_at_10 value: 69.15 - type: recall_at_100 value: 88.31599999999999 - type: recall_at_1000 value: 95.993 - type: recall_at_3 value: 54.663 - type: recall_at_5 value: 61.373 - task: type: Retrieval dataset: type: 
BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.872 - type: map_at_10 value: 32.912 - type: map_at_100 value: 33.972 - type: map_at_1000 value: 34.046 - type: map_at_3 value: 30.361 - type: map_at_5 value: 31.704 - type: mrr_at_1 value: 26.779999999999998 - type: mrr_at_10 value: 34.812 - type: mrr_at_100 value: 35.754999999999995 - type: mrr_at_1000 value: 35.809000000000005 - type: mrr_at_3 value: 32.335 - type: mrr_at_5 value: 33.64 - type: ndcg_at_1 value: 26.779999999999998 - type: ndcg_at_10 value: 37.623 - type: ndcg_at_100 value: 42.924 - type: ndcg_at_1000 value: 44.856 - type: ndcg_at_3 value: 32.574 - type: ndcg_at_5 value: 34.842 - type: precision_at_1 value: 26.779999999999998 - type: precision_at_10 value: 5.729 - type: precision_at_100 value: 0.886 - type: precision_at_1000 value: 0.109 - type: precision_at_3 value: 13.559 - type: precision_at_5 value: 9.469 - type: recall_at_1 value: 24.872 - type: recall_at_10 value: 50.400999999999996 - type: recall_at_100 value: 74.954 - type: recall_at_1000 value: 89.56 - type: recall_at_3 value: 36.726 - type: recall_at_5 value: 42.138999999999996 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 16.803 - type: map_at_10 value: 24.348 - type: map_at_100 value: 25.56 - type: map_at_1000 value: 25.668000000000003 - type: map_at_3 value: 21.811 - type: map_at_5 value: 23.287 - type: mrr_at_1 value: 20.771 - type: mrr_at_10 value: 28.961 - type: mrr_at_100 value: 29.979 - type: mrr_at_1000 value: 30.046 - type: mrr_at_3 value: 26.555 - type: mrr_at_5 value: 28.060000000000002 - type: ndcg_at_1 value: 20.771 - type: ndcg_at_10 value: 29.335 - type: ndcg_at_100 value: 35.188 - type: ndcg_at_1000 value: 37.812 - type: ndcg_at_3 value: 24.83 - type: ndcg_at_5 value: 27.119 - type: precision_at_1 value: 20.771 - 
type: precision_at_10 value: 5.4350000000000005 - type: precision_at_100 value: 0.9480000000000001 - type: precision_at_1000 value: 0.13 - type: precision_at_3 value: 11.982 - type: precision_at_5 value: 8.831 - type: recall_at_1 value: 16.803 - type: recall_at_10 value: 40.039 - type: recall_at_100 value: 65.83200000000001 - type: recall_at_1000 value: 84.478 - type: recall_at_3 value: 27.682000000000002 - type: recall_at_5 value: 33.535 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 28.345 - type: map_at_10 value: 37.757000000000005 - type: map_at_100 value: 39.141 - type: map_at_1000 value: 39.262 - type: map_at_3 value: 35.183 - type: map_at_5 value: 36.592 - type: mrr_at_1 value: 34.649 - type: mrr_at_10 value: 43.586999999999996 - type: mrr_at_100 value: 44.481 - type: mrr_at_1000 value: 44.542 - type: mrr_at_3 value: 41.29 - type: mrr_at_5 value: 42.642 - type: ndcg_at_1 value: 34.649 - type: ndcg_at_10 value: 43.161 - type: ndcg_at_100 value: 48.734 - type: ndcg_at_1000 value: 51.046 - type: ndcg_at_3 value: 39.118 - type: ndcg_at_5 value: 41.022 - type: precision_at_1 value: 34.649 - type: precision_at_10 value: 7.603 - type: precision_at_100 value: 1.209 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 18.319 - type: precision_at_5 value: 12.839 - type: recall_at_1 value: 28.345 - type: recall_at_10 value: 53.367 - type: recall_at_100 value: 76.453 - type: recall_at_1000 value: 91.82000000000001 - type: recall_at_3 value: 41.636 - type: recall_at_5 value: 46.760000000000005 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 22.419 - type: map_at_10 value: 31.716 - type: map_at_100 value: 33.152 - type: map_at_1000 value: 33.267 - type: map_at_3 value: 28.74 - type: map_at_5 value: 30.48 - 
type: mrr_at_1 value: 28.310999999999996 - type: mrr_at_10 value: 37.039 - type: mrr_at_100 value: 38.09 - type: mrr_at_1000 value: 38.145 - type: mrr_at_3 value: 34.437 - type: mrr_at_5 value: 36.024 - type: ndcg_at_1 value: 28.310999999999996 - type: ndcg_at_10 value: 37.41 - type: ndcg_at_100 value: 43.647999999999996 - type: ndcg_at_1000 value: 46.007 - type: ndcg_at_3 value: 32.509 - type: ndcg_at_5 value: 34.943999999999996 - type: precision_at_1 value: 28.310999999999996 - type: precision_at_10 value: 6.963 - type: precision_at_100 value: 1.1860000000000002 - type: precision_at_1000 value: 0.154 - type: precision_at_3 value: 15.867999999999999 - type: precision_at_5 value: 11.507000000000001 - type: recall_at_1 value: 22.419 - type: recall_at_10 value: 49.28 - type: recall_at_100 value: 75.802 - type: recall_at_1000 value: 92.032 - type: recall_at_3 value: 35.399 - type: recall_at_5 value: 42.027 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.669249999999998 - type: map_at_10 value: 33.332583333333325 - type: map_at_100 value: 34.557833333333335 - type: map_at_1000 value: 34.67141666666666 - type: map_at_3 value: 30.663166666666662 - type: map_at_5 value: 32.14883333333333 - type: mrr_at_1 value: 29.193833333333334 - type: mrr_at_10 value: 37.47625 - type: mrr_at_100 value: 38.3545 - type: mrr_at_1000 value: 38.413166666666676 - type: mrr_at_3 value: 35.06741666666667 - type: mrr_at_5 value: 36.450666666666656 - type: ndcg_at_1 value: 29.193833333333334 - type: ndcg_at_10 value: 38.505416666666676 - type: ndcg_at_100 value: 43.81125 - type: ndcg_at_1000 value: 46.09558333333333 - type: ndcg_at_3 value: 33.90916666666667 - type: ndcg_at_5 value: 36.07666666666666 - type: precision_at_1 value: 29.193833333333334 - type: precision_at_10 value: 6.7251666666666665 - type: precision_at_100 value: 1.1058333333333332 - type: precision_at_1000 
value: 0.14833333333333332 - type: precision_at_3 value: 15.554166666666665 - type: precision_at_5 value: 11.079250000000002 - type: recall_at_1 value: 24.669249999999998 - type: recall_at_10 value: 49.75583333333332 - type: recall_at_100 value: 73.06908333333332 - type: recall_at_1000 value: 88.91316666666667 - type: recall_at_3 value: 36.913250000000005 - type: recall_at_5 value: 42.48641666666666 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.044999999999998 - type: map_at_10 value: 30.349999999999998 - type: map_at_100 value: 31.273 - type: map_at_1000 value: 31.362000000000002 - type: map_at_3 value: 28.508 - type: map_at_5 value: 29.369 - type: mrr_at_1 value: 26.994 - type: mrr_at_10 value: 33.12 - type: mrr_at_100 value: 33.904 - type: mrr_at_1000 value: 33.967000000000006 - type: mrr_at_3 value: 31.365 - type: mrr_at_5 value: 32.124 - type: ndcg_at_1 value: 26.994 - type: ndcg_at_10 value: 34.214 - type: ndcg_at_100 value: 38.681 - type: ndcg_at_1000 value: 40.926 - type: ndcg_at_3 value: 30.725 - type: ndcg_at_5 value: 31.967000000000002 - type: precision_at_1 value: 26.994 - type: precision_at_10 value: 5.215 - type: precision_at_100 value: 0.807 - type: precision_at_1000 value: 0.108 - type: precision_at_3 value: 12.986 - type: precision_at_5 value: 8.712 - type: recall_at_1 value: 24.044999999999998 - type: recall_at_10 value: 43.456 - type: recall_at_100 value: 63.675000000000004 - type: recall_at_1000 value: 80.05499999999999 - type: recall_at_3 value: 33.561 - type: recall_at_5 value: 36.767 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 15.672 - type: map_at_10 value: 22.641 - type: map_at_100 value: 23.75 - type: map_at_1000 value: 23.877000000000002 - type: map_at_3 value: 20.219 - type: map_at_5 value: 
21.648 - type: mrr_at_1 value: 18.823 - type: mrr_at_10 value: 26.101999999999997 - type: mrr_at_100 value: 27.038 - type: mrr_at_1000 value: 27.118 - type: mrr_at_3 value: 23.669 - type: mrr_at_5 value: 25.173000000000002 - type: ndcg_at_1 value: 18.823 - type: ndcg_at_10 value: 27.176000000000002 - type: ndcg_at_100 value: 32.42 - type: ndcg_at_1000 value: 35.413 - type: ndcg_at_3 value: 22.756999999999998 - type: ndcg_at_5 value: 25.032 - type: precision_at_1 value: 18.823 - type: precision_at_10 value: 5.034000000000001 - type: precision_at_100 value: 0.895 - type: precision_at_1000 value: 0.132 - type: precision_at_3 value: 10.771 - type: precision_at_5 value: 8.1 - type: recall_at_1 value: 15.672 - type: recall_at_10 value: 37.296 - type: recall_at_100 value: 60.863 - type: recall_at_1000 value: 82.234 - type: recall_at_3 value: 25.330000000000002 - type: recall_at_5 value: 30.964000000000002 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.633 - type: map_at_10 value: 32.858 - type: map_at_100 value: 34.038000000000004 - type: map_at_1000 value: 34.141 - type: map_at_3 value: 30.209000000000003 - type: map_at_5 value: 31.567 - type: mrr_at_1 value: 28.358 - type: mrr_at_10 value: 36.433 - type: mrr_at_100 value: 37.352000000000004 - type: mrr_at_1000 value: 37.41 - type: mrr_at_3 value: 34.033 - type: mrr_at_5 value: 35.246 - type: ndcg_at_1 value: 28.358 - type: ndcg_at_10 value: 37.973 - type: ndcg_at_100 value: 43.411 - type: ndcg_at_1000 value: 45.747 - type: ndcg_at_3 value: 32.934999999999995 - type: ndcg_at_5 value: 35.013 - type: precision_at_1 value: 28.358 - type: precision_at_10 value: 6.418 - type: precision_at_100 value: 1.02 - type: precision_at_1000 value: 0.133 - type: precision_at_3 value: 14.677000000000001 - type: precision_at_5 value: 10.335999999999999 - type: recall_at_1 value: 24.633 - type: recall_at_10 value: 
50.048 - type: recall_at_100 value: 73.821 - type: recall_at_1000 value: 90.046 - type: recall_at_3 value: 36.284 - type: recall_at_5 value: 41.370000000000005 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.133 - type: map_at_10 value: 31.491999999999997 - type: map_at_100 value: 33.062000000000005 - type: map_at_1000 value: 33.256 - type: map_at_3 value: 28.886 - type: map_at_5 value: 30.262 - type: mrr_at_1 value: 28.063 - type: mrr_at_10 value: 36.144 - type: mrr_at_100 value: 37.14 - type: mrr_at_1000 value: 37.191 - type: mrr_at_3 value: 33.762 - type: mrr_at_5 value: 34.997 - type: ndcg_at_1 value: 28.063 - type: ndcg_at_10 value: 36.951 - type: ndcg_at_100 value: 43.287 - type: ndcg_at_1000 value: 45.777 - type: ndcg_at_3 value: 32.786 - type: ndcg_at_5 value: 34.65 - type: precision_at_1 value: 28.063 - type: precision_at_10 value: 7.055 - type: precision_at_100 value: 1.476 - type: precision_at_1000 value: 0.22899999999999998 - type: precision_at_3 value: 15.481 - type: precision_at_5 value: 11.186 - type: recall_at_1 value: 23.133 - type: recall_at_10 value: 47.285 - type: recall_at_100 value: 76.176 - type: recall_at_1000 value: 92.176 - type: recall_at_3 value: 35.223 - type: recall_at_5 value: 40.142 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 19.547 - type: map_at_10 value: 26.374 - type: map_at_100 value: 27.419 - type: map_at_1000 value: 27.539 - type: map_at_3 value: 23.882 - type: map_at_5 value: 25.163999999999998 - type: mrr_at_1 value: 21.442 - type: mrr_at_10 value: 28.458 - type: mrr_at_100 value: 29.360999999999997 - type: mrr_at_1000 value: 29.448999999999998 - type: mrr_at_3 value: 25.97 - type: mrr_at_5 value: 27.273999999999997 - type: ndcg_at_1 value: 21.442 - type: ndcg_at_10 
value: 30.897000000000002 - type: ndcg_at_100 value: 35.99 - type: ndcg_at_1000 value: 38.832 - type: ndcg_at_3 value: 25.944 - type: ndcg_at_5 value: 28.126 - type: precision_at_1 value: 21.442 - type: precision_at_10 value: 4.9910000000000005 - type: precision_at_100 value: 0.8109999999999999 - type: precision_at_1000 value: 0.11800000000000001 - type: precision_at_3 value: 11.029 - type: precision_at_5 value: 7.911 - type: recall_at_1 value: 19.547 - type: recall_at_10 value: 42.886 - type: recall_at_100 value: 66.64999999999999 - type: recall_at_1000 value: 87.368 - type: recall_at_3 value: 29.143 - type: recall_at_5 value: 34.544000000000004 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 15.572 - type: map_at_10 value: 25.312 - type: map_at_100 value: 27.062 - type: map_at_1000 value: 27.253 - type: map_at_3 value: 21.601 - type: map_at_5 value: 23.473 - type: mrr_at_1 value: 34.984 - type: mrr_at_10 value: 46.406 - type: mrr_at_100 value: 47.179 - type: mrr_at_1000 value: 47.21 - type: mrr_at_3 value: 43.485 - type: mrr_at_5 value: 45.322 - type: ndcg_at_1 value: 34.984 - type: ndcg_at_10 value: 34.344 - type: ndcg_at_100 value: 41.015 - type: ndcg_at_1000 value: 44.366 - type: ndcg_at_3 value: 29.119 - type: ndcg_at_5 value: 30.825999999999997 - type: precision_at_1 value: 34.984 - type: precision_at_10 value: 10.358 - type: precision_at_100 value: 1.762 - type: precision_at_1000 value: 0.23900000000000002 - type: precision_at_3 value: 21.368000000000002 - type: precision_at_5 value: 15.948 - type: recall_at_1 value: 15.572 - type: recall_at_10 value: 39.367999999999995 - type: recall_at_100 value: 62.183 - type: recall_at_1000 value: 80.92200000000001 - type: recall_at_3 value: 26.131999999999998 - type: recall_at_5 value: 31.635999999999996 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: 
None metrics: - type: map_at_1 value: 8.848 - type: map_at_10 value: 19.25 - type: map_at_100 value: 27.193 - type: map_at_1000 value: 28.721999999999998 - type: map_at_3 value: 13.968 - type: map_at_5 value: 16.283 - type: mrr_at_1 value: 68.75 - type: mrr_at_10 value: 76.25 - type: mrr_at_100 value: 76.534 - type: mrr_at_1000 value: 76.53999999999999 - type: mrr_at_3 value: 74.667 - type: mrr_at_5 value: 75.86699999999999 - type: ndcg_at_1 value: 56.00000000000001 - type: ndcg_at_10 value: 41.426 - type: ndcg_at_100 value: 45.660000000000004 - type: ndcg_at_1000 value: 53.02 - type: ndcg_at_3 value: 46.581 - type: ndcg_at_5 value: 43.836999999999996 - type: precision_at_1 value: 68.75 - type: precision_at_10 value: 32.800000000000004 - type: precision_at_100 value: 10.440000000000001 - type: precision_at_1000 value: 1.9980000000000002 - type: precision_at_3 value: 49.667 - type: precision_at_5 value: 42.25 - type: recall_at_1 value: 8.848 - type: recall_at_10 value: 24.467 - type: recall_at_100 value: 51.344 - type: recall_at_1000 value: 75.235 - type: recall_at_3 value: 15.329 - type: recall_at_5 value: 18.892999999999997 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 48.95 - type: f1 value: 43.44563593360779 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 78.036 - type: map_at_10 value: 85.639 - type: map_at_100 value: 85.815 - type: map_at_1000 value: 85.829 - type: map_at_3 value: 84.795 - type: map_at_5 value: 85.336 - type: mrr_at_1 value: 84.353 - type: mrr_at_10 value: 90.582 - type: mrr_at_100 value: 90.617 - type: mrr_at_1000 value: 90.617 - type: mrr_at_3 value: 90.132 - type: mrr_at_5 value: 90.447 - type: ndcg_at_1 value: 84.353 - type: ndcg_at_10 value: 89.003 - type: ndcg_at_100 value: 89.60000000000001 - 
type: ndcg_at_1000 value: 89.836 - type: ndcg_at_3 value: 87.81400000000001 - type: ndcg_at_5 value: 88.478 - type: precision_at_1 value: 84.353 - type: precision_at_10 value: 10.482 - type: precision_at_100 value: 1.099 - type: precision_at_1000 value: 0.11399999999999999 - type: precision_at_3 value: 33.257999999999996 - type: precision_at_5 value: 20.465 - type: recall_at_1 value: 78.036 - type: recall_at_10 value: 94.517 - type: recall_at_100 value: 96.828 - type: recall_at_1000 value: 98.261 - type: recall_at_3 value: 91.12 - type: recall_at_5 value: 92.946 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 20.191 - type: map_at_10 value: 32.369 - type: map_at_100 value: 34.123999999999995 - type: map_at_1000 value: 34.317 - type: map_at_3 value: 28.71 - type: map_at_5 value: 30.607 - type: mrr_at_1 value: 40.894999999999996 - type: mrr_at_10 value: 48.842 - type: mrr_at_100 value: 49.599 - type: mrr_at_1000 value: 49.647000000000006 - type: mrr_at_3 value: 46.785 - type: mrr_at_5 value: 47.672 - type: ndcg_at_1 value: 40.894999999999996 - type: ndcg_at_10 value: 39.872 - type: ndcg_at_100 value: 46.126 - type: ndcg_at_1000 value: 49.476 - type: ndcg_at_3 value: 37.153000000000006 - type: ndcg_at_5 value: 37.433 - type: precision_at_1 value: 40.894999999999996 - type: precision_at_10 value: 10.818 - type: precision_at_100 value: 1.73 - type: precision_at_1000 value: 0.231 - type: precision_at_3 value: 25.051000000000002 - type: precision_at_5 value: 17.531 - type: recall_at_1 value: 20.191 - type: recall_at_10 value: 45.768 - type: recall_at_100 value: 68.82000000000001 - type: recall_at_1000 value: 89.133 - type: recall_at_3 value: 33.296 - type: recall_at_5 value: 38.022 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 39.257 - type: map_at_10 value: 61.467000000000006 - type: 
map_at_100 value: 62.364 - type: map_at_1000 value: 62.424 - type: map_at_3 value: 58.228 - type: map_at_5 value: 60.283 - type: mrr_at_1 value: 78.515 - type: mrr_at_10 value: 84.191 - type: mrr_at_100 value: 84.378 - type: mrr_at_1000 value: 84.385 - type: mrr_at_3 value: 83.284 - type: mrr_at_5 value: 83.856 - type: ndcg_at_1 value: 78.515 - type: ndcg_at_10 value: 69.78999999999999 - type: ndcg_at_100 value: 72.886 - type: ndcg_at_1000 value: 74.015 - type: ndcg_at_3 value: 65.23 - type: ndcg_at_5 value: 67.80199999999999 - type: precision_at_1 value: 78.515 - type: precision_at_10 value: 14.519000000000002 - type: precision_at_100 value: 1.694 - type: precision_at_1000 value: 0.184 - type: precision_at_3 value: 41.702 - type: precision_at_5 value: 27.046999999999997 - type: recall_at_1 value: 39.257 - type: recall_at_10 value: 72.59299999999999 - type: recall_at_100 value: 84.679 - type: recall_at_1000 value: 92.12 - type: recall_at_3 value: 62.552 - type: recall_at_5 value: 67.616 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 91.5152 - type: ap value: 87.64584669595709 - type: f1 value: 91.50605576428437 - task: type: Retrieval dataset: type: msmarco name: MTEB MSMARCO config: default split: dev revision: None metrics: - type: map_at_1 value: 21.926000000000002 - type: map_at_10 value: 34.049 - type: map_at_100 value: 35.213 - type: map_at_1000 value: 35.265 - type: map_at_3 value: 30.309 - type: map_at_5 value: 32.407000000000004 - type: mrr_at_1 value: 22.55 - type: mrr_at_10 value: 34.657 - type: mrr_at_100 value: 35.760999999999996 - type: mrr_at_1000 value: 35.807 - type: mrr_at_3 value: 30.989 - type: mrr_at_5 value: 33.039 - type: ndcg_at_1 value: 22.55 - type: ndcg_at_10 value: 40.842 - type: ndcg_at_100 value: 46.436 - type: ndcg_at_1000 value: 47.721999999999994 - type: ndcg_at_3 value: 33.209 - type: 
ndcg_at_5 value: 36.943 - type: precision_at_1 value: 22.55 - type: precision_at_10 value: 6.447 - type: precision_at_100 value: 0.9249999999999999 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 14.136000000000001 - type: precision_at_5 value: 10.381 - type: recall_at_1 value: 21.926000000000002 - type: recall_at_10 value: 61.724999999999994 - type: recall_at_100 value: 87.604 - type: recall_at_1000 value: 97.421 - type: recall_at_3 value: 40.944 - type: recall_at_5 value: 49.915 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 93.54765161878704 - type: f1 value: 93.3298945415573 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 75.71591427268582 - type: f1 value: 59.32113870474471 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 75.83053127101547 - type: f1 value: 73.60757944876475 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 78.72562205783457 - type: f1 value: 78.63761662505502 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 33.37935633767996 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 
metrics: - type: v_measure value: 31.55270546130387 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 30.462692753143834 - type: mrr value: 31.497569753511563 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 5.646 - type: map_at_10 value: 12.498 - type: map_at_100 value: 15.486 - type: map_at_1000 value: 16.805999999999997 - type: map_at_3 value: 9.325 - type: map_at_5 value: 10.751 - type: mrr_at_1 value: 43.034 - type: mrr_at_10 value: 52.662 - type: mrr_at_100 value: 53.189 - type: mrr_at_1000 value: 53.25 - type: mrr_at_3 value: 50.929 - type: mrr_at_5 value: 51.92 - type: ndcg_at_1 value: 41.796 - type: ndcg_at_10 value: 33.477000000000004 - type: ndcg_at_100 value: 29.996000000000002 - type: ndcg_at_1000 value: 38.864 - type: ndcg_at_3 value: 38.940000000000005 - type: ndcg_at_5 value: 36.689 - type: precision_at_1 value: 43.034 - type: precision_at_10 value: 24.799 - type: precision_at_100 value: 7.432999999999999 - type: precision_at_1000 value: 1.9929999999999999 - type: precision_at_3 value: 36.842000000000006 - type: precision_at_5 value: 32.135999999999996 - type: recall_at_1 value: 5.646 - type: recall_at_10 value: 15.963 - type: recall_at_100 value: 29.492 - type: recall_at_1000 value: 61.711000000000006 - type: recall_at_3 value: 10.585 - type: recall_at_5 value: 12.753999999999998 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 27.602 - type: map_at_10 value: 41.545 - type: map_at_100 value: 42.644999999999996 - type: map_at_1000 value: 42.685 - type: map_at_3 value: 37.261 - type: map_at_5 value: 39.706 - type: mrr_at_1 value: 31.141000000000002 - type: mrr_at_10 value: 44.139 - type: mrr_at_100 value: 44.997 - type: mrr_at_1000 
value: 45.025999999999996 - type: mrr_at_3 value: 40.503 - type: mrr_at_5 value: 42.64 - type: ndcg_at_1 value: 31.141000000000002 - type: ndcg_at_10 value: 48.995 - type: ndcg_at_100 value: 53.788000000000004 - type: ndcg_at_1000 value: 54.730000000000004 - type: ndcg_at_3 value: 40.844 - type: ndcg_at_5 value: 44.955 - type: precision_at_1 value: 31.141000000000002 - type: precision_at_10 value: 8.233 - type: precision_at_100 value: 1.093 - type: precision_at_1000 value: 0.11800000000000001 - type: precision_at_3 value: 18.579 - type: precision_at_5 value: 13.533999999999999 - type: recall_at_1 value: 27.602 - type: recall_at_10 value: 69.216 - type: recall_at_100 value: 90.252 - type: recall_at_1000 value: 97.27 - type: recall_at_3 value: 47.987 - type: recall_at_5 value: 57.438 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 70.949 - type: map_at_10 value: 84.89999999999999 - type: map_at_100 value: 85.531 - type: map_at_1000 value: 85.548 - type: map_at_3 value: 82.027 - type: map_at_5 value: 83.853 - type: mrr_at_1 value: 81.69999999999999 - type: mrr_at_10 value: 87.813 - type: mrr_at_100 value: 87.917 - type: mrr_at_1000 value: 87.91799999999999 - type: mrr_at_3 value: 86.938 - type: mrr_at_5 value: 87.53999999999999 - type: ndcg_at_1 value: 81.75 - type: ndcg_at_10 value: 88.55499999999999 - type: ndcg_at_100 value: 89.765 - type: ndcg_at_1000 value: 89.871 - type: ndcg_at_3 value: 85.905 - type: ndcg_at_5 value: 87.41 - type: precision_at_1 value: 81.75 - type: precision_at_10 value: 13.403 - type: precision_at_100 value: 1.528 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 37.597 - type: precision_at_5 value: 24.69 - type: recall_at_1 value: 70.949 - type: recall_at_10 value: 95.423 - type: recall_at_100 value: 99.509 - type: recall_at_1000 value: 99.982 - type: recall_at_3 value: 87.717 - type: recall_at_5 value: 92.032 - task: type: 
Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 51.76962893449579 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 62.32897690686379 - task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 4.478 - type: map_at_10 value: 11.994 - type: map_at_100 value: 13.977 - type: map_at_1000 value: 14.295 - type: map_at_3 value: 8.408999999999999 - type: map_at_5 value: 10.024 - type: mrr_at_1 value: 22.1 - type: mrr_at_10 value: 33.526 - type: mrr_at_100 value: 34.577000000000005 - type: mrr_at_1000 value: 34.632000000000005 - type: mrr_at_3 value: 30.217 - type: mrr_at_5 value: 31.962000000000003 - type: ndcg_at_1 value: 22.1 - type: ndcg_at_10 value: 20.191 - type: ndcg_at_100 value: 27.954 - type: ndcg_at_1000 value: 33.491 - type: ndcg_at_3 value: 18.787000000000003 - type: ndcg_at_5 value: 16.378999999999998 - type: precision_at_1 value: 22.1 - type: precision_at_10 value: 10.69 - type: precision_at_100 value: 2.1919999999999997 - type: precision_at_1000 value: 0.35200000000000004 - type: precision_at_3 value: 17.732999999999997 - type: precision_at_5 value: 14.499999999999998 - type: recall_at_1 value: 4.478 - type: recall_at_10 value: 21.657 - type: recall_at_100 value: 44.54 - type: recall_at_1000 value: 71.542 - type: recall_at_3 value: 10.778 - type: recall_at_5 value: 14.687 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 82.82325259156718 - type: cos_sim_spearman value: 79.2463589100662 - type: euclidean_pearson value: 80.48318380496771 - type: 
euclidean_spearman value: 79.34451935199979 - type: manhattan_pearson value: 80.39041824178759 - type: manhattan_spearman value: 79.23002892700211 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 85.74130231431258 - type: cos_sim_spearman value: 78.36856568042397 - type: euclidean_pearson value: 82.48301631890303 - type: euclidean_spearman value: 78.28376980722732 - type: manhattan_pearson value: 82.43552075450525 - type: manhattan_spearman value: 78.22702443947126 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 79.96138619461459 - type: cos_sim_spearman value: 81.85436343502379 - type: euclidean_pearson value: 81.82895226665367 - type: euclidean_spearman value: 82.22707349602916 - type: manhattan_pearson value: 81.66303369445873 - type: manhattan_spearman value: 82.05030197179455 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 80.05481244198648 - type: cos_sim_spearman value: 80.85052504637808 - type: euclidean_pearson value: 80.86728419744497 - type: euclidean_spearman value: 81.033786401512 - type: manhattan_pearson value: 80.90107531061103 - type: manhattan_spearman value: 81.11374116827795 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 84.615220756399 - type: cos_sim_spearman value: 86.46858500002092 - type: euclidean_pearson value: 86.08307800247586 - type: euclidean_spearman value: 86.72691443870013 - type: manhattan_pearson value: 85.96155594487269 - type: manhattan_spearman value: 86.605909505275 - task: type: STS dataset: 
type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 82.14363913634436 - type: cos_sim_spearman value: 84.48430226487102 - type: euclidean_pearson value: 83.75303424801902 - type: euclidean_spearman value: 84.56762380734538 - type: manhattan_pearson value: 83.6135447165928 - type: manhattan_spearman value: 84.39898212616731 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 85.09909252554525 - type: cos_sim_spearman value: 85.70951402743276 - type: euclidean_pearson value: 87.1991936239908 - type: euclidean_spearman value: 86.07745840612071 - type: manhattan_pearson value: 87.25039137549952 - type: manhattan_spearman value: 85.99938746659761 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 63.529332093413615 - type: cos_sim_spearman value: 65.38177340147439 - type: euclidean_pearson value: 66.35278011412136 - type: euclidean_spearman value: 65.47147267032997 - type: manhattan_pearson value: 66.71804682408693 - type: manhattan_spearman value: 65.67406521423597 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 82.45802942885662 - type: cos_sim_spearman value: 84.8853341842566 - type: euclidean_pearson value: 84.60915021096707 - type: euclidean_spearman value: 85.11181242913666 - type: manhattan_pearson value: 84.38600521210364 - type: manhattan_spearman value: 84.89045417981723 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: 
d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 85.92793380635129 - type: mrr value: 95.85834191226348 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 55.74400000000001 - type: map_at_10 value: 65.455 - type: map_at_100 value: 66.106 - type: map_at_1000 value: 66.129 - type: map_at_3 value: 62.719 - type: map_at_5 value: 64.441 - type: mrr_at_1 value: 58.667 - type: mrr_at_10 value: 66.776 - type: mrr_at_100 value: 67.363 - type: mrr_at_1000 value: 67.384 - type: mrr_at_3 value: 64.889 - type: mrr_at_5 value: 66.122 - type: ndcg_at_1 value: 58.667 - type: ndcg_at_10 value: 69.904 - type: ndcg_at_100 value: 72.807 - type: ndcg_at_1000 value: 73.423 - type: ndcg_at_3 value: 65.405 - type: ndcg_at_5 value: 67.86999999999999 - type: precision_at_1 value: 58.667 - type: precision_at_10 value: 9.3 - type: precision_at_100 value: 1.08 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 25.444 - type: precision_at_5 value: 17 - type: recall_at_1 value: 55.74400000000001 - type: recall_at_10 value: 82.122 - type: recall_at_100 value: 95.167 - type: recall_at_1000 value: 100 - type: recall_at_3 value: 70.14399999999999 - type: recall_at_5 value: 76.417 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.86534653465347 - type: cos_sim_ap value: 96.54142419791388 - type: cos_sim_f1 value: 93.07535641547861 - type: cos_sim_precision value: 94.81327800829875 - type: cos_sim_recall value: 91.4 - type: dot_accuracy value: 99.86435643564356 - type: dot_ap value: 96.53682260449868 - type: dot_f1 value: 92.98515104966718 - type: dot_precision value: 95.27806925498426 - type: dot_recall value: 90.8 - type: euclidean_accuracy value: 
99.86336633663366 - type: euclidean_ap value: 96.5228676185697 - type: euclidean_f1 value: 92.9735234215886 - type: euclidean_precision value: 94.70954356846472 - type: euclidean_recall value: 91.3 - type: manhattan_accuracy value: 99.85841584158416 - type: manhattan_ap value: 96.50392760934032 - type: manhattan_f1 value: 92.84642321160581 - type: manhattan_precision value: 92.8928928928929 - type: manhattan_recall value: 92.80000000000001 - type: max_accuracy value: 99.86534653465347 - type: max_ap value: 96.54142419791388 - type: max_f1 value: 93.07535641547861 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 61.08285408766616 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 35.640675309010604 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 53.20333913710715 - type: mrr value: 54.088813555725324 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.79465221925075 - type: cos_sim_spearman value: 30.530816059163634 - type: dot_pearson value: 31.364837244718043 - type: dot_spearman value: 30.79726823684003 - task: type: Retrieval dataset: type: trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.22599999999999998 - type: map_at_10 value: 1.735 - type: map_at_100 value: 8.978 - type: map_at_1000 value: 20.851 - type: map_at_3 value: 0.613 - type: map_at_5 
value: 0.964 - type: mrr_at_1 value: 88 - type: mrr_at_10 value: 92.867 - type: mrr_at_100 value: 92.867 - type: mrr_at_1000 value: 92.867 - type: mrr_at_3 value: 92.667 - type: mrr_at_5 value: 92.667 - type: ndcg_at_1 value: 82 - type: ndcg_at_10 value: 73.164 - type: ndcg_at_100 value: 51.878 - type: ndcg_at_1000 value: 44.864 - type: ndcg_at_3 value: 79.184 - type: ndcg_at_5 value: 76.39 - type: precision_at_1 value: 88 - type: precision_at_10 value: 76.2 - type: precision_at_100 value: 52.459999999999994 - type: precision_at_1000 value: 19.692 - type: precision_at_3 value: 82.667 - type: precision_at_5 value: 80 - type: recall_at_1 value: 0.22599999999999998 - type: recall_at_10 value: 1.942 - type: recall_at_100 value: 12.342 - type: recall_at_1000 value: 41.42 - type: recall_at_3 value: 0.637 - type: recall_at_5 value: 1.034 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 3.567 - type: map_at_10 value: 13.116 - type: map_at_100 value: 19.39 - type: map_at_1000 value: 20.988 - type: map_at_3 value: 7.109 - type: map_at_5 value: 9.950000000000001 - type: mrr_at_1 value: 42.857 - type: mrr_at_10 value: 57.404999999999994 - type: mrr_at_100 value: 58.021 - type: mrr_at_1000 value: 58.021 - type: mrr_at_3 value: 54.762 - type: mrr_at_5 value: 56.19 - type: ndcg_at_1 value: 38.775999999999996 - type: ndcg_at_10 value: 30.359 - type: ndcg_at_100 value: 41.284 - type: ndcg_at_1000 value: 52.30200000000001 - type: ndcg_at_3 value: 36.744 - type: ndcg_at_5 value: 34.326 - type: precision_at_1 value: 42.857 - type: precision_at_10 value: 26.122 - type: precision_at_100 value: 8.082 - type: precision_at_1000 value: 1.559 - type: precision_at_3 value: 40.136 - type: precision_at_5 value: 35.510000000000005 - type: recall_at_1 value: 3.567 - type: recall_at_10 value: 19.045 - type: recall_at_100 value: 49.979 - type: recall_at_1000 value: 84.206 - type: recall_at_3 
value: 8.52 - type: recall_at_5 value: 13.103000000000002 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 68.8394 - type: ap value: 13.454399712443099 - type: f1 value: 53.04963076364322 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 60.546123372948514 - type: f1 value: 60.86952793277713 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 49.10042955060234 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 85.03308100375514 - type: cos_sim_ap value: 71.08284605869684 - type: cos_sim_f1 value: 65.42539436255494 - type: cos_sim_precision value: 64.14807302231237 - type: cos_sim_recall value: 66.75461741424802 - type: dot_accuracy value: 84.68736961316088 - type: dot_ap value: 69.20524036530992 - type: dot_f1 value: 63.54893953365829 - type: dot_precision value: 63.45698500394633 - type: dot_recall value: 63.641160949868066 - type: euclidean_accuracy value: 85.07480479227513 - type: euclidean_ap value: 71.14592761009864 - type: euclidean_f1 value: 65.43814432989691 - type: euclidean_precision value: 63.95465994962216 - type: euclidean_recall value: 66.99208443271768 - type: manhattan_accuracy value: 85.06288370984085 - type: manhattan_ap value: 71.07289742593868 - type: manhattan_f1 value: 65.37585421412301 - type: manhattan_precision value: 
62.816147859922175 - type: manhattan_recall value: 68.15303430079156 - type: max_accuracy value: 85.07480479227513 - type: max_ap value: 71.14592761009864 - type: max_f1 value: 65.43814432989691 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 87.79058485659952 - type: cos_sim_ap value: 83.7183187008759 - type: cos_sim_f1 value: 75.86921142180798 - type: cos_sim_precision value: 73.00683371298405 - type: cos_sim_recall value: 78.96519864490298 - type: dot_accuracy value: 87.0085768618776 - type: dot_ap value: 81.87467488474279 - type: dot_f1 value: 74.04188363990559 - type: dot_precision value: 72.10507114191901 - type: dot_recall value: 76.08561749307053 - type: euclidean_accuracy value: 87.8332751193387 - type: euclidean_ap value: 83.83585648120315 - type: euclidean_f1 value: 76.02582177042369 - type: euclidean_precision value: 73.36388371759989 - type: euclidean_recall value: 78.88820449645827 - type: manhattan_accuracy value: 87.87208444910156 - type: manhattan_ap value: 83.8101950642973 - type: manhattan_f1 value: 75.90454195535027 - type: manhattan_precision value: 72.44419564761039 - type: manhattan_recall value: 79.71204188481676 - type: max_accuracy value: 87.87208444910156 - type: max_ap value: 83.83585648120315 - type: max_f1 value: 76.02582177042369 license: mit language: - en --- **Recommend switching to newest BAAI/bge-small-en-v1.5, which has more reasonable similarity distribution and same method of usage.** <h1 align=\"center\">FlagEmbedding</h1> <h4 align=\"center\"> <p> <a href=#model-list>Model List</a> | <a href=#frequently-asked-questions>FAQ</a> | <a href=#usage>Usage</a> | <a href=\"#evaluation\">Evaluation</a> | <a href=\"#train\">Train</a> | <a href=\"#citation\">Citation</a> | <a href=\"#license\">License</a> <p> </h4> More details please refer to our 
Github: FlagEmbedding. English | 中文 FlagEmbedding focus on retrieval-augmented LLMs, consisting of following projects currently: - **Fine-tuning of LM** : LM-Cocktail - **Dense Retrieval**: LLM Embedder, BGE Embedding, C-MTEB - **Reranker Model**: BGE Reranker ## News - 11/23/2023: Release LM-Cocktail, a method to maintain general capabilities during fine-tuning by merging multiple language models. Technical Report :fire: - 10/12/2023: Release LLM-Embedder, a unified embedding model to support diverse retrieval augmentation needs for LLMs. Technical Report - 09/15/2023: The technical report of BGE has been released - 09/15/2023: The massive training data of BGE has been released - 09/12/2023: New models: - **New reranker model**: release cross-encoder models and , which are more powerful than embedding model. We recommend to use/fine-tune them to re-rank top-k documents returned by embedding models. - **update embedding model**: release embedding model to alleviate the issue of the similarity distribution, and enhance its retrieval ability without instruction. <details> <summary>More</summary> <!-- ### More --> - 09/07/2023: Update fine-tune code: Add script to mine hard negatives and support adding instruction during fine-tuning. - 08/09/2023: BGE Models are integrated into **Langchain**, you can use it like this; C-MTEB **leaderboard** is available. - 08/05/2023: Release base-scale and small-scale models, **best performance among the models of the same size 🤗** - 08/02/2023: Release (short for BAAI General Embedding) Models, **rank 1st on MTEB and C-MTEB benchmark!** :tada: :tada: - 08/01/2023: We release the Chinese Massive Text Embedding Benchmark (**C-MTEB**), consisting of 31 test dataset. </details> ## Model List is short for . 
| Model | Language | | Description | query instruction for retrieval [1] | |:-------------------------------|:--------:| :--------:| :--------:|:--------:| | LM-Cocktail | English | | fine-tuned models (Llama and BGE) which can be used to reproduce the results of LM-Cocktail | | | BAAI/llm-embedder | English | Inference Fine-tune | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See README | | BAAI/bge-reranker-large | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-reranker-base | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-large-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-en | English | Inference Fine-tune | :trophy: rank **1st** in MTEB leaderboard | | | BAAI/bge-base-en | English | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-en | English | Inference Fine-tune |a small-scale model but with competitive performance | | | BAAI/bge-large-zh | Chinese | Inference Fine-tune | :trophy: rank **1st** in C-MTEB benchmark | | | BAAI/bge-base-zh | Chinese | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-zh | Chinese | Inference 
Fine-tune | a small-scale model but with competitive performance | | [1\\]: If you need to search the relevant passages to a query, we suggest to add the instruction to the query; in other cases, no instruction is needed, just use the original query directly. In all cases, **no instruction** needs to be added to passages. [2\\]: Different from embedding model, reranker uses question and document as input and directly output similarity instead of embedding. To balance the accuracy and time cost, cross-encoder is widely used to re-rank top-k documents retrieved by other simple models. For examples, use bge embedding model to retrieve top 100 relevant documents, and then use bge reranker to re-rank the top 100 document to get the final top-3 results. All models have been uploaded to Huggingface Hub, and you can see them at If you cannot open the Huggingface Hub, you also can download the models at . ## Frequently asked questions <details> <summary>1. How to fine-tune bge embedding model?</summary> <!-- ### How to fine-tune bge embedding model? --> Following this example to prepare data and fine-tune your model. Some suggestions: - Mine hard negatives following this example, which can improve the retrieval performance. - If you pre-train bge on your data, the pre-trained model cannot be directly used to calculate similarity, and it must be fine-tuned with contrastive learning before computing similarity. - If the accuracy of the fine-tuned model is still not high, it is recommended to use/fine-tune the cross-encoder model (bge-reranker) to re-rank top-k results. Hard negatives also are needed to fine-tune reranker. </details> <details> <summary>2. 
The similarity score between two dissimilar sentences is higher than 0.5</summary> <!-- ### The similarity score between two dissimilar sentences is higher than 0.5 --> **Suggest to use bge v1.5, which alleviates the issue of the similarity distribution.** Since we finetune the models by contrastive learning with a temperature of 0.01, the similarity distribution of the current BGE model is about in the interval \\[0.6, 1\\]. So a similarity score greater than 0.5 does not indicate that the two sentences are similar. For downstream tasks, such as passage retrieval or semantic similarity, **what matters is the relative order of the scores, not the absolute value.** If you need to filter similar sentences based on a similarity threshold, please select an appropriate similarity threshold based on the similarity distribution on your data (such as 0.8, 0.85, or even 0.9). </details> <details> <summary>3. When does the query instruction need to be used</summary> <!-- ### When does the query instruction need to be used --> For the , we improve its retrieval ability when not using instruction. No instruction only has a slight degradation in retrieval performance compared with using instruction. So you can generate embedding without instruction in all cases for convenience. For a retrieval task that uses short queries to find long related documents, it is recommended to add instructions for these short queries. **The best method to decide whether to add instructions for queries is choosing the setting that achieves better performance on your task.** In all cases, the documents/passages do not need to add the instruction. </details> ## Usage ### Usage for Embedding Model Here are some examples for using models with FlagEmbedding, Sentence-Transformers, Langchain, or Huggingface Transformers. #### Using FlagEmbedding If it doesn't work for you, you can see FlagEmbedding for more methods to install FlagEmbedding. For the value of the argument , see Model List. 
By default, FlagModel will use all available GPUs when encoding. Please set to select specific GPUs. You can also set to make all GPUs unavailable. #### Using Sentence-Transformers You can also use the models with sentence-transformers: For an s2p (short query to long passage) retrieval task, each short query should start with an instruction (for the instructions, see Model List). The instruction is not needed for passages. #### Using Langchain You can use in langchain like this: #### Using HuggingFace Transformers With the transformers package, you can use the model like this: First, you pass your input through the transformer model, then you select the last hidden state of the first token (i.e., [CLS]) as the sentence embedding. ### Usage for Reranker Unlike the embedding model, the reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. You can get a relevance score by passing a query and a passage to the reranker. The reranker is optimized with a cross-entropy loss, so the relevance score is not bounded to a specific range. #### Using FlagEmbedding Get relevance scores (higher scores indicate more relevance): #### Using Huggingface transformers ## Evaluation models achieve **state-of-the-art performance on both the MTEB and C-MTEB leaderboards!** For more details and evaluation tools, see our scripts. 
- **MTEB**: | Model Name | Dimension | Sequence Length | Average (56) | Retrieval (15) |Clustering (11) | Pair Classification (3) | Reranking (4) | STS (10) | Summarization (1) | Classification (12) | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | BAAI/bge-large-en-v1.5 | 1024 | 512 | **64.23** | **54.29** | 46.08 | 87.12 | 60.03 | 83.11 | 31.61 | 75.97 | | BAAI/bge-base-en-v1.5 | 768 | 512 | 63.55 | 53.25 | 45.77 | 86.55 | 58.86 | 82.4 | 31.07 | 75.53 | | BAAI/bge-small-en-v1.5 | 384 | 512 | 62.17 |51.68 | 43.82 | 84.92 | 58.36 | 81.59 | 30.12 | 74.14 | | bge-large-en | 1024 | 512 | 63.98 | 53.9 | 46.98 | 85.8 | 59.48 | 81.56 | 32.06 | 76.21 | | bge-base-en | 768 | 512 | 63.36 | 53.0 | 46.32 | 85.86 | 58.7 | 81.84 | 29.27 | 75.27 | | gte-large | 1024 | 512 | 63.13 | 52.22 | 46.84 | 85.00 | 59.13 | 83.35 | 31.66 | 73.33 | | gte-base | 768 | 512 | 62.39 | 51.14 | 46.2 | 84.57 | 58.61 | 82.3 | 31.17 | 73.01 | | e5-large-v2 | 1024| 512 | 62.25 | 50.56 | 44.49 | 86.03 | 56.61 | 82.05 | 30.19 | 75.24 | | bge-small-en | 384 | 512 | 62.11 | 51.82 | 44.31 | 83.78 | 57.97 | 80.72 | 30.53 | 74.37 | | instructor-xl | 768 | 512 | 61.79 | 49.26 | 44.74 | 86.62 | 57.29 | 83.06 | 32.32 | 61.79 | | e5-base-v2 | 768 | 512 | 61.5 | 50.29 | 43.80 | 85.73 | 55.91 | 81.05 | 30.28 | 73.84 | | gte-small | 384 | 512 | 61.36 | 49.46 | 44.89 | 83.54 | 57.7 | 82.07 | 30.42 | 72.31 | | text-embedding-ada-002 | 1536 | 8192 | 60.99 | 49.25 | 45.9 | 84.89 | 56.32 | 80.97 | 30.8 | 70.93 | | e5-small-v2 | 384 | 512 | 59.93 | 49.04 | 39.92 | 84.67 | 54.32 | 80.39 | 31.16 | 72.94 | | sentence-t5-xxl | 768 | 512 | 59.51 | 42.24 | 43.72 | 85.06 | 56.42 | 82.63 | 30.08 | 73.42 | | all-mpnet-base-v2 | 768 | 514 | 57.78 | 43.81 | 43.69 | 83.04 | 59.36 | 80.28 | 27.49 | 65.07 | | sgpt-bloom-7b1-msmarco | 4096 | 2048 | 57.59 | 48.22 | 38.93 | 81.9 | 55.65 | 77.74 | 33.6 | 66.19 | - **C-MTEB**: We create the benchmark C-MTEB for Chinese text embedding which consists of 31 datasets 
from 6 tasks. Please refer to C_MTEB for a detailed introduction. | Model | Embedding dimension | Avg | Retrieval | STS | PairClassification | Classification | Reranking | Clustering | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | **BAAI/bge-large-zh-v1.5** | 1024 | **64.53** | 70.46 | 56.25 | 81.6 | 69.13 | 65.84 | 48.99 | | BAAI/bge-base-zh-v1.5 | 768 | 63.13 | 69.49 | 53.72 | 79.75 | 68.07 | 65.39 | 47.53 | | BAAI/bge-small-zh-v1.5 | 512 | 57.82 | 61.77 | 49.11 | 70.41 | 63.96 | 60.92 | 44.18 | | BAAI/bge-large-zh | 1024 | 64.20 | 71.53 | 54.98 | 78.94 | 68.32 | 65.11 | 48.39 | | bge-large-zh-noinstruct | 1024 | 63.53 | 70.55 | 53 | 76.77 | 68.58 | 64.91 | 50.01 | | BAAI/bge-base-zh | 768 | 62.96 | 69.53 | 54.12 | 77.5 | 67.07 | 64.91 | 47.63 | | multilingual-e5-large | 1024 | 58.79 | 63.66 | 48.44 | 69.89 | 67.34 | 56.00 | 48.23 | | BAAI/bge-small-zh | 512 | 58.27 | 63.07 | 49.45 | 70.35 | 63.64 | 61.48 | 45.09 | | m3e-base | 768 | 57.10 | 56.91 | 50.47 | 63.99 | 67.52 | 59.34 | 47.68 | | m3e-large | 1024 | 57.05 | 54.75 | 50.42 | 64.3 | 68.2 | 59.66 | 48.88 | | multilingual-e5-base | 768 | 55.48 | 61.63 | 46.49 | 67.07 | 65.35 | 54.35 | 40.68 | | multilingual-e5-small | 384 | 55.38 | 59.95 | 45.27 | 66.45 | 65.85 | 53.86 | 45.26 | | text-embedding-ada-002(OpenAI) | 1536 | 53.02 | 52.0 | 43.35 | 69.56 | 64.31 | 54.28 | 45.68 | | luotuo | 1024 | 49.37 | 44.4 | 42.78 | 66.62 | 61 | 49.25 | 44.39 | | text2vec-base | 768 | 47.63 | 38.79 | 43.41 | 67.41 | 62.19 | 49.45 | 37.66 | | text2vec-large | 1024 | 47.36 | 41.94 | 44.97 | 70.86 | 60.66 | 49.16 | 30.02 | - **Reranking**: See C_MTEB for evaluation script. 
| Model | T2Reranking | T2RerankingZh2En\\* | T2RerankingEn2Zh\\* | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | text2vec-base-multilingual | 64.66 | 62.94 | 62.51 | 14.37 | 48.46 | 48.6 | 50.26 | | multilingual-e5-small | 65.62 | 60.94 | 56.41 | 29.91 | 67.26 | 66.54 | 57.78 | | multilingual-e5-large | 64.55 | 61.61 | 54.28 | 28.6 | 67.42 | 67.92 | 57.4 | | multilingual-e5-base | 64.21 | 62.13 | 54.68 | 29.5 | 66.23 | 66.98 | 57.29 | | m3e-base | 66.03 | 62.74 | 56.07 | 17.51 | 77.05 | 76.76 | 59.36 | | m3e-large | 66.13 | 62.72 | 56.1 | 16.46 | 77.76 | 78.27 | 59.57 | | bge-base-zh-v1.5 | 66.49 | 63.25 | 57.02 | 29.74 | 80.47 | 84.88 | 63.64 | | bge-large-zh-v1.5 | 65.74 | 63.39 | 57.03 | 28.74 | 83.45 | 85.44 | 63.97 | | BAAI/bge-reranker-base | 67.28 | 63.95 | 60.45 | 35.46 | 81.26 | 84.1 | 65.42 | | BAAI/bge-reranker-large | 67.6 | 64.03 | 61.44 | 37.16 | 82.15 | 84.18 | 66.09 | \\* : T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks ## Train ### BAAI Embedding We pre-train the models using retromae and train them on large-scale pairs data using contrastive learning. **You can fine-tune the embedding model on your data following our examples.** We also provide a pre-train example. Note that the goal of pre-training is to reconstruct the text, and the pre-trained model cannot be used for similarity calculation directly, it needs to be fine-tuned. More training details for bge see baai_general_embedding. ### BGE Reranker Cross-encoder will perform full-attention over the input pair, which is more accurate than embedding model (i.e., bi-encoder) but more time-consuming than embedding model. Therefore, it can be used to re-rank the top-k documents returned by embedding model. 
We train the cross-encoder on multilingual pair data. The data format is the same as for the embedding model, so you can fine-tune it easily following our example. For more details, please refer to ./FlagEmbedding/reranker/README.md ## Citation If you find this repository useful, please consider giving it a star :star: and a citation ## License FlagEmbedding is licensed under the MIT License. The released models can be used for commercial purposes free of charge.",
"model_explanation_gemini": "Performs text embedding tasks including classification, retrieval, clustering, reranking, and semantic textual similarity across various datasets."
}
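The `description` field in the record above recommends retrieving the top-k passages with the embedding model and then re-ranking them with the cross-encoder, and it notes that the sentence embedding is the last hidden state of the first token ([CLS]). A minimal sketch of that flow in plain Python follows. It uses toy vectors instead of a real model, and `cls_pool`, `cosine_scores`, `top_k`, `retrieve_then_rerank`, and the `rerank_fn` callback are hypothetical names chosen for illustration, not part of FlagEmbedding or transformers.

```python
import math


def cls_pool(last_hidden_state):
    """Return the first token's vector ([CLS]) for each sequence.

    last_hidden_state: list of sequences, each a list of token vectors.
    """
    return [sequence[0] for sequence in last_hidden_state]


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_scores(query_vec, passage_vecs):
    """Cosine similarity between one query vector and each passage vector."""
    q = l2_normalize(query_vec)
    return [sum(a * b for a, b in zip(q, l2_normalize(p))) for p in passage_vecs]


def top_k(scores, k):
    """Indices of the k highest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def retrieve_then_rerank(query_vec, passage_vecs, rerank_fn, k_retrieve, k_final):
    """Stage 1: embedding similarity picks k_retrieve candidates.
    Stage 2: rerank_fn (hypothetical stand-in for a cross-encoder) reorders them.
    """
    candidates = top_k(cosine_scores(query_vec, passage_vecs), k_retrieve)
    reranked = sorted(candidates, key=rerank_fn, reverse=True)
    return reranked[:k_final]


# Toy "hidden states": two sequences, each with a [CLS] vector followed by one token.
hidden = [[[2.0, 0.0], [9.0, 9.0]], [[0.0, 3.0], [7.0, 7.0]]]
passages = cls_pool(hidden) + [[1.0, 1.0]]

# Made-up reranker scores per passage index; they are unbounded, as the text notes.
toy_rerank_scores = {0: -1.2, 1: 4.0, 2: 3.5}

best = retrieve_then_rerank([1.0, 0.0], passages, toy_rerank_scores.get, 2, 1)
print(best)  # prints [2]
```

Only the order of the scores is used at each stage, which matches the description's point that the relative order matters rather than the absolute value. Passage 1 has the highest toy reranker score but is never reranked because stage 1 did not retrieve it, which is the accuracy/cost trade-off the description refers to.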
data/model_data_json/BAAI_llm-embedder.json
ADDED
@@ -0,0 +1,18 @@
{
"model_id": "BAAI/llm-embedder",
"downloads": 82703,
"tags": [
"transformers",
"pytorch",
"safetensors",
"bert",
"feature-extraction",
"arxiv:2310.07554",
"arxiv:2309.07597",
"license:mit",
"text-embeddings-inference",
"endpoints_compatible",
"region:us"
],
"description": "--- license: mit --- <h1 align=\"center\">FlagEmbedding</h1> <h4 align=\"center\"> <p> <a href=#model-list>Model List</a> | <a href=#frequently-asked-questions>FAQ</a> | <a href=#usage>Usage</a> | <a href=\"#evaluation\">Evaluation</a> | <a href=\"#train\">Train</a> | <a href=\"#contact\">Contact</a> | <a href=\"#citation\">Citation</a> | <a href=\"#license\">License</a> <p> </h4> More details please refer to our Github: FlagEmbedding. English | 中文 <span style=\"#FF69B4;\"> **Hiring:** We're seeking experienced NLP researchers and intern students focusing on dense retrieval and retrieval-augmented LLMs. If you're interested, please feel free to reach out to us via email at [email protected].</span> FlagEmbedding can map any text to a low-dimensional dense vector, which can be used for tasks like retrieval, classification, clustering, and semantic search. And it can also be used in vector databases for LLMs. ************* 🌟**Updates**🌟 ************* - 10/12/2023: Release LLM-Embedder, a unified embedding model to support diverse retrieval augmentation needs for LLMs. Paper :fire: - 09/15/2023: The technical report of BGE has been released - 09/15/2023: The massive training data of BGE has been released - 09/12/2023: New models: - **New reranker model**: release cross-encoder models and , which are more powerful than embedding model. We recommend to use/fine-tune them to re-rank top-k documents returned by embedding models. - **update embedding model**: release embedding model to alleviate the issue of the similarity distribution, and enhance its retrieval ability without instruction. <details> <summary>More</summary> <!-- ### More --> - 09/07/2023: Update fine-tune code: Add script to mine hard negatives and support adding instruction during fine-tuning. - 08/09/2023: BGE Models are integrated into **Langchain**, you can use it like this; C-MTEB **leaderboard** is available. 
- 08/05/2023: Release base-scale and small-scale models, **best performance among the models of the same size 🤗** - 08/02/2023: Release (short for BAAI General Embedding) Models, **rank 1st on MTEB and C-MTEB benchmark!** :tada: :tada: - 08/01/2023: We release the Chinese Massive Text Embedding Benchmark (**C-MTEB**), consisting of 31 test dataset. </details> ## Model List is short for . | Model | Language | | Description | query instruction for retrieval [1] | |:-------------------------------|:--------:| :--------:| :--------:|:--------:| | BAAI/llm-embedder | English | Inference Fine-tune | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See README | | BAAI/bge-reranker-large | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-reranker-base | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-large-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-en | English | Inference Fine-tune | :trophy: rank **1st** in MTEB leaderboard | | | BAAI/bge-base-en | English | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-en | English | Inference Fine-tune |a small-scale model but with 
competitive performance | | | BAAI/bge-large-zh | Chinese | Inference Fine-tune | :trophy: rank **1st** in C-MTEB benchmark | | | BAAI/bge-base-zh | Chinese | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-zh | Chinese | Inference Fine-tune | a small-scale model but with competitive performance | | [1\\]: If you need to search the relevant passages in a query, we suggest to add the instruction to the query; in other cases, no instruction is needed, just use the original query directly. In all cases, **no instruction** needs to be added to passages. [2\\]: Different from the embedding model, reranker uses question and document as input and directly output similarity instead of embedding. To balance the accuracy and time cost, cross-encoder is widely used to re-rank top-k documents retrieved by other simple models. For example, use bge embedding model to retrieve top 100 relevant documents, and then use bge reranker to re-rank the top 100 documents to get the final top-3 results. All models have been uploaded to Huggingface Hub, and you can see them at If you cannot open the Huggingface Hub, you can also download the models at . ## Frequently asked questions **1. How to fine-tune bge embedding model?** Following this example to prepare data and fine-tune your model. Some suggestions: - Mine hard negatives following this example, which can improve the retrieval performance. - In general, larger hyper-parameter brings better performance. You can expand it by enabling , (df_config.json can refer to ds_config.json, , etc. - If you pre-train bge on your data, the pre-trained model cannot be directly used to calculate similarity, and it must be fine-tuned with contrastive learning before computing similarity. - If the accuracy of the fine-tuned model is still not high, it is recommended to use/fine-tune the cross-encoder model (bge-reranker) to re-rank top-k results. Hard negatives also are needed to fine-tune reranker. 
<details> <summary>2. The similarity score between two dissimilar sentences is higher than 0.5</summary> <!-- ### The similarity score between two dissimilar sentences is higher than 0.5 --> **Suggest to use bge v1.5, which alleviates the issue of the similarity distribution.** Since we finetune the models by contrastive learning with a temperature of 0.01, the similarity distribution of the current BGE model is about in the interval \\[0.6, 1\\]. So a similarity score greater than 0.5 does not indicate that the two sentences are similar. For downstream tasks, such as passage retrieval or semantic similarity, **what matters is the relative order of the scores, not the absolute value.** If you need to filter similar sentences based on a similarity threshold, please select an appropriate similarity threshold based on the similarity distribution on your data (such as 0.8, 0.85, or even 0.9). </details> <details> <summary>3. When does the query instruction need to be used</summary> <!-- ### When does the query instruction need to be used --> For the , we improve its retrieval ability when not using instruction. No instruction only has a slight degradation in retrieval performance compared with using instruction. So you can generate embedding without instruction in all cases for convenience. For a retrieval task that uses short queries to find long related documents, it is recommended to add instructions for these short queries. **The best method to decide whether to add instructions for queries is choosing the setting that achieves better performance on your task.** In all cases, the documents/passages do not need to add the instruction. </details> ## Usage ### Usage for Embedding Model Here are some examples of using models with FlagEmbedding, Sentence-Transformers, Langchain, or Huggingface Transformers. #### Using FlagEmbedding If it doesn't work for you, you can see FlagEmbedding for more methods to install FlagEmbedding. 
For the value of the argument , see Model List. By default, FlagModel will use all available GPUs when encoding. Please set to select specific GPUs. You can also set to make all GPUs unavailable. #### Using Sentence-Transformers You can also use the models with sentence-transformers: For an s2p (short query to long passage) retrieval task, each short query should start with an instruction (for the instructions, see Model List). The instruction is not needed for passages. #### Using Langchain You can use in langchain like this: #### Using HuggingFace Transformers With the transformers package, you can use the model like this: First, you pass your input through the transformer model, then you select the last hidden state of the first token (i.e., [CLS]) as the sentence embedding. ### Usage for Reranker Unlike the embedding model, the reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. You can get a relevance score by passing a query and a passage to the reranker. The reranker is optimized with a cross-entropy loss, so the relevance score is not bounded to a specific range. #### Using FlagEmbedding Get relevance scores (higher scores indicate more relevance): #### Using Huggingface transformers ## Evaluation models achieve **state-of-the-art performance on both MTEB and C-MTEB leaderboard!** For more details and evaluation tools see our scripts. 
- **MTEB**: | Model Name | Dimension | Sequence Length | Average (56) | Retrieval (15) |Clustering (11) | Pair Classification (3) | Reranking (4) | STS (10) | Summarization (1) | Classification (12) | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | BAAI/bge-large-en-v1.5 | 1024 | 512 | **64.23** | **54.29** | 46.08 | 87.12 | 60.03 | 83.11 | 31.61 | 75.97 | | BAAI/bge-base-en-v1.5 | 768 | 512 | 63.55 | 53.25 | 45.77 | 86.55 | 58.86 | 82.4 | 31.07 | 75.53 | | BAAI/bge-small-en-v1.5 | 384 | 512 | 62.17 |51.68 | 43.82 | 84.92 | 58.36 | 81.59 | 30.12 | 74.14 | | bge-large-en | 1024 | 512 | 63.98 | 53.9 | 46.98 | 85.8 | 59.48 | 81.56 | 32.06 | 76.21 | | bge-base-en | 768 | 512 | 63.36 | 53.0 | 46.32 | 85.86 | 58.7 | 81.84 | 29.27 | 75.27 | | gte-large | 1024 | 512 | 63.13 | 52.22 | 46.84 | 85.00 | 59.13 | 83.35 | 31.66 | 73.33 | | gte-base | 768 | 512 | 62.39 | 51.14 | 46.2 | 84.57 | 58.61 | 82.3 | 31.17 | 73.01 | | e5-large-v2 | 1024| 512 | 62.25 | 50.56 | 44.49 | 86.03 | 56.61 | 82.05 | 30.19 | 75.24 | | bge-small-en | 384 | 512 | 62.11 | 51.82 | 44.31 | 83.78 | 57.97 | 80.72 | 30.53 | 74.37 | | instructor-xl | 768 | 512 | 61.79 | 49.26 | 44.74 | 86.62 | 57.29 | 83.06 | 32.32 | 61.79 | | e5-base-v2 | 768 | 512 | 61.5 | 50.29 | 43.80 | 85.73 | 55.91 | 81.05 | 30.28 | 73.84 | | gte-small | 384 | 512 | 61.36 | 49.46 | 44.89 | 83.54 | 57.7 | 82.07 | 30.42 | 72.31 | | text-embedding-ada-002 | 1536 | 8192 | 60.99 | 49.25 | 45.9 | 84.89 | 56.32 | 80.97 | 30.8 | 70.93 | | e5-small-v2 | 384 | 512 | 59.93 | 49.04 | 39.92 | 84.67 | 54.32 | 80.39 | 31.16 | 72.94 | | sentence-t5-xxl | 768 | 512 | 59.51 | 42.24 | 43.72 | 85.06 | 56.42 | 82.63 | 30.08 | 73.42 | | all-mpnet-base-v2 | 768 | 514 | 57.78 | 43.81 | 43.69 | 83.04 | 59.36 | 80.28 | 27.49 | 65.07 | | sgpt-bloom-7b1-msmarco | 4096 | 2048 | 57.59 | 48.22 | 38.93 | 81.9 | 55.65 | 77.74 | 33.6 | 66.19 | - **C-MTEB**: We create the benchmark C-MTEB for Chinese text embedding which consists of 31 datasets 
from 6 tasks. Please refer to C_MTEB for a detailed introduction. | Model | Embedding dimension | Avg | Retrieval | STS | PairClassification | Classification | Reranking | Clustering | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | **BAAI/bge-large-zh-v1.5** | 1024 | **64.53** | 70.46 | 56.25 | 81.6 | 69.13 | 65.84 | 48.99 | | BAAI/bge-base-zh-v1.5 | 768 | 63.13 | 69.49 | 53.72 | 79.75 | 68.07 | 65.39 | 47.53 | | BAAI/bge-small-zh-v1.5 | 512 | 57.82 | 61.77 | 49.11 | 70.41 | 63.96 | 60.92 | 44.18 | | BAAI/bge-large-zh | 1024 | 64.20 | 71.53 | 54.98 | 78.94 | 68.32 | 65.11 | 48.39 | | bge-large-zh-noinstruct | 1024 | 63.53 | 70.55 | 53 | 76.77 | 68.58 | 64.91 | 50.01 | | BAAI/bge-base-zh | 768 | 62.96 | 69.53 | 54.12 | 77.5 | 67.07 | 64.91 | 47.63 | | multilingual-e5-large | 1024 | 58.79 | 63.66 | 48.44 | 69.89 | 67.34 | 56.00 | 48.23 | | BAAI/bge-small-zh | 512 | 58.27 | 63.07 | 49.45 | 70.35 | 63.64 | 61.48 | 45.09 | | m3e-base | 768 | 57.10 | 56.91 | 50.47 | 63.99 | 67.52 | 59.34 | 47.68 | | m3e-large | 1024 | 57.05 | 54.75 | 50.42 | 64.3 | 68.2 | 59.66 | 48.88 | | multilingual-e5-base | 768 | 55.48 | 61.63 | 46.49 | 67.07 | 65.35 | 54.35 | 40.68 | | multilingual-e5-small | 384 | 55.38 | 59.95 | 45.27 | 66.45 | 65.85 | 53.86 | 45.26 | | text-embedding-ada-002(OpenAI) | 1536 | 53.02 | 52.0 | 43.35 | 69.56 | 64.31 | 54.28 | 45.68 | | luotuo | 1024 | 49.37 | 44.4 | 42.78 | 66.62 | 61 | 49.25 | 44.39 | | text2vec-base | 768 | 47.63 | 38.79 | 43.41 | 67.41 | 62.19 | 49.45 | 37.66 | | text2vec-large | 1024 | 47.36 | 41.94 | 44.97 | 70.86 | 60.66 | 49.16 | 30.02 | - **Reranking**: See C_MTEB for evaluation script. 
| Model | T2Reranking | T2RerankingZh2En\\* | T2RerankingEn2Zh\\* | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | text2vec-base-multilingual | 64.66 | 62.94 | 62.51 | 14.37 | 48.46 | 48.6 | 50.26 | | multilingual-e5-small | 65.62 | 60.94 | 56.41 | 29.91 | 67.26 | 66.54 | 57.78 | | multilingual-e5-large | 64.55 | 61.61 | 54.28 | 28.6 | 67.42 | 67.92 | 57.4 | | multilingual-e5-base | 64.21 | 62.13 | 54.68 | 29.5 | 66.23 | 66.98 | 57.29 | | m3e-base | 66.03 | 62.74 | 56.07 | 17.51 | 77.05 | 76.76 | 59.36 | | m3e-large | 66.13 | 62.72 | 56.1 | 16.46 | 77.76 | 78.27 | 59.57 | | bge-base-zh-v1.5 | 66.49 | 63.25 | 57.02 | 29.74 | 80.47 | 84.88 | 63.64 | | bge-large-zh-v1.5 | 65.74 | 63.39 | 57.03 | 28.74 | 83.45 | 85.44 | 63.97 | | BAAI/bge-reranker-base | 67.28 | 63.95 | 60.45 | 35.46 | 81.26 | 84.1 | 65.42 | | BAAI/bge-reranker-large | 67.6 | 64.03 | 61.44 | 37.16 | 82.15 | 84.18 | 66.09 | \\* : T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks ## Train ### BAAI Embedding We pre-train the models using retromae and train them on large-scale pair data using contrastive learning. **You can fine-tune the embedding model on your data following our examples.** We also provide a pre-train example. Note that the goal of pre-training is to reconstruct the text, and the pre-trained model cannot be used for similarity calculation directly, it needs to be fine-tuned. For more training details for bge see baai_general_embedding. ### BGE Reranker Cross-encoder will perform full-attention over the input pair, which is more accurate than embedding model (i.e., bi-encoder) but more time-consuming than embedding model. Therefore, it can be used to re-rank the top-k documents returned by embedding model. 
We train the cross-encoder on multilingual pair data. The data format is the same as for the embedding model, so you can fine-tune it easily following our example. For more details, please refer to ./FlagEmbedding/reranker/README.md ## Contact If you have any questions or suggestions related to this project, feel free to open an issue or pull request. You can also email Shitao Xiao ([email protected]) and Zheng Liu ([email protected]). ## Citation If you find this repository useful, please consider giving it a star :star: and a citation. ## License FlagEmbedding is licensed under the MIT License. The released models can be used for commercial purposes free of charge."
|
18 |
+
}
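The two-stage flow described in the BGE card above (an embedding model retrieves the top-k candidates, then a cross-encoder re-ranks them to a short final list) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not FlagEmbedding's API: the 2-d vectors are toy values, and `rerank_score` is a hypothetical stand-in for a real cross-encoder such as bge-reranker.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def retrieve(query_vec, doc_vecs, k):
    """Stage 1: rank every document by embedding similarity, keep the top k ids."""
    ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)
    return ranked[:k]

def rerank(query, candidates, texts, score_fn, top_n):
    """Stage 2: re-score only the candidates with a slower pairwise scorer."""
    ranked = sorted(candidates, key=lambda d: score_fn(query, texts[d]), reverse=True)
    return ranked[:top_n]

def rerank_score(query, passage):
    """Hypothetical stand-in for a cross-encoder: counts shared words."""
    return len(set(query.lower().split()) & set(passage.lower().split()))

texts = {
    "d1": "pandas are bears native to china",
    "d2": "the stock market fell today",
    "d3": "giant pandas eat bamboo",
    "d4": "bamboo grows quickly",
}
# Toy 2-d embeddings (assumed values, for illustration only).
doc_vecs = {"d1": [0.9, 0.1], "d2": [0.0, 1.0], "d3": [0.8, 0.3], "d4": [0.6, 0.5]}
query = "what do giant pandas eat"
query_vec = [1.0, 0.2]

candidates = retrieve(query_vec, doc_vecs, k=3)
final = rerank(query, candidates, texts, rerank_score, top_n=2)
```

The design point is cost: the cheap similarity search runs over every document, while the expensive pairwise scorer only sees the short candidate list.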
|
data/model_data_json/Babelscape_t5-base-summarization-claim-extractor.json
ADDED
@@ -0,0 +1,19 @@
|
1 |
+
{
|
2 |
+
"model_id": "Babelscape/t5-base-summarization-claim-extractor",
|
3 |
+
"downloads": 631466,
|
4 |
+
"tags": [
|
5 |
+
"transformers",
|
6 |
+
"safetensors",
|
7 |
+
"t5",
|
8 |
+
"text2text-generation",
|
9 |
+
"en",
|
10 |
+
"arxiv:2403.02270",
|
11 |
+
"license:cc-by-nc-sa-4.0",
|
12 |
+
"autotrain_compatible",
|
13 |
+
"text-generation-inference",
|
14 |
+
"endpoints_compatible",
|
15 |
+
"region:us"
|
16 |
+
],
|
17 |
+
"description": "--- library_name: transformers language: - en license: - cc-by-nc-sa-4.0 widget: - text: \"A major tech company has unveiled its first fully autonomous electric vehicle, boasting a range of 500 miles per charge and advanced safety features designed to revolutionize the transportation industry.\" - text: \"A new global initiative to clean up ocean plastic aims to remove 50% of floating debris within a decade, using innovative autonomous vessels powered by renewable energy.\" - text: \"A historic peace agreement was signed between two long-standing rival nations, marking a turning point in diplomatic relations and promising economic and social cooperation for years to come.\" --- # Model Card: T5-base-summarization-claim-extractor ## Model Description **Model Name:** T5-base-summarization-claim-extractor **Authors:** Alessandro Scirè, Karim Ghonim, and Roberto Navigli **Contact:** [email protected], [email protected] **Language:** English **Primary Use:** Extraction of atomic claims from a summary ### Overview The T5-base-summarization-claim-extractor is a model developed for the task of extracting atomic claims from summaries. The model is based on the T5 architecture which is then fine-tuned specifically for claim extraction. This model was introduced as part of the research presented in the paper \"FENICE: Factuality Evaluation of summarization based on Natural Language Inference and Claim Extraction\" by Alessandro Scirè, Karim Ghonim, and Roberto Navigli. FENICE leverages Natural Language Inference (NLI) and Claim Extraction to evaluate the factuality of summaries. ArXiv version. ### Intended Use This model is designed to: - Extract atomic claims from summaries. - Serve as a component in pipelines for factuality evaluation of summaries. ## Example Code **Note**: The model outputs the claims in a single string. **Kindly remember to split the string into sentences** in order to retrieve the singular claims. 
### Training For details regarding the training process, please check out our paper (section 4.1). ### Performance | <div style=\"width: 250px\">Model</div> | easiness<sub>P</sub> | easiness<sub>R</sub> | easiness<sub>F1</sub> | |:-------------------------------------:|:--------------------:|:--------------------:|:---------------------:| | GPT-3.5 | 80.1 | 70.9 | 74.9 | | t5-base-summarization-claim-extractor | 79.2 | 68.8 | 73.4 | **Table 1:** Easiness Precision (easiness<sub>P</sub>), Recall (easiness<sub>R</sub>), and F1 score (easiness<sub>F1</sub>) results for the LLM-based claim extractor, namely **GPT-3.5**, and **t5-base-summarization-claim-extractor**, assessed on ROSE (Liu et al., 2023b). Further details on the model's performance and the metrics used can be found in the paper (section 4.1). ### Main Repository For more details about FENICE, check out the GitHub repository: Babelscape/FENICE ### Citation If you use this model in your work, please cite the following paper: ### Limitations - The model is specifically designed for extracting claims from summaries and may not perform well on other types of texts. - The model is currently available only in English and may not generalize well to other languages. ### Ethical Considerations Users should be aware that while this model extracts claims that can be evaluated for factuality, it does not determine the truthfulness of those claims. Therefore, it should be used in conjunction with other tools or human judgment when evaluating the reliability of summaries. ### Acknowledgments This work was made possible thanks to the support of Babelscape and Sapienza NLP.",
|
18 |
+
"model_explanation_gemini": "Extracts atomic claims from summaries to support factuality evaluation pipelines."
|
19 |
+
}
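The card above notes that the extractor returns all claims in one string, which then has to be split into sentences. A minimal sketch of that post-processing step follows; `split_claims` is a hypothetical helper, and it assumes each claim ends in `.`, `!`, or `?` followed by whitespace. The original card may well use a dedicated sentence tokenizer instead.

```python
import re

def split_claims(generated):
    """Split the model's single output string into a list of claims.

    Splits on whitespace that follows sentence-ending punctuation.
    Abbreviations such as "U.S. " would be split incorrectly, so a real
    sentence tokenizer is a safer choice for production use.
    """
    parts = re.split(r"(?<=[.!?])\s+", generated.strip())
    return [p for p in parts if p]

# Assumed example output string, for illustration only.
output = (
    "A tech company unveiled an electric vehicle. "
    "The vehicle has a range of 500 miles per charge."
)
claims = split_claims(output)
```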
|
data/model_data_json/Babelscape_wikineural-multilingual-ner.json
ADDED
@@ -0,0 +1,31 @@
|
1 |
+
{
|
2 |
+
"model_id": "Babelscape/wikineural-multilingual-ner",
|
3 |
+
"downloads": 263863,
|
4 |
+
"tags": [
|
5 |
+
"transformers",
|
6 |
+
"pytorch",
|
7 |
+
"tensorboard",
|
8 |
+
"safetensors",
|
9 |
+
"bert",
|
10 |
+
"token-classification",
|
11 |
+
"named-entity-recognition",
|
12 |
+
"sequence-tagger-model",
|
13 |
+
"de",
|
14 |
+
"en",
|
15 |
+
"es",
|
16 |
+
"fr",
|
17 |
+
"it",
|
18 |
+
"nl",
|
19 |
+
"pl",
|
20 |
+
"pt",
|
21 |
+
"ru",
|
22 |
+
"multilingual",
|
23 |
+
"dataset:Babelscape/wikineural",
|
24 |
+
"license:cc-by-nc-sa-4.0",
|
25 |
+
"autotrain_compatible",
|
26 |
+
"endpoints_compatible",
|
27 |
+
"region:us"
|
28 |
+
],
|
29 |
+
"description": "--- annotations_creators: - machine-generated language_creators: - machine-generated widget: - text: My name is Wolfgang and I live in Berlin. - text: George Washington went to Washington. - text: Mi nombre es Sarah y vivo en Londres. - text: Меня зовут Симона, и я живу в Риме. tags: - named-entity-recognition - sequence-tagger-model datasets: - Babelscape/wikineural language: - de - en - es - fr - it - nl - pl - pt - ru - multilingual license: - cc-by-nc-sa-4.0 pretty_name: wikineural-dataset source_datasets: - original task_categories: - structure-prediction task_ids: - named-entity-recognition --- # WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER This is the model card for the EMNLP 2021 paper WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER. We fine-tuned a multilingual language model (mBERT) for 3 epochs on our WikiNEuRal dataset for Named Entity Recognition (NER). The resulting multilingual NER model supports the 9 languages covered by WikiNEuRal (de, en, es, fr, it, nl, pl, pt, ru), and it was trained on all 9 languages jointly. **If you use the model, please reference this work in your paper**: The original repository for the paper can be found at ## How to use You can use this model with Transformers *pipeline* for NER. ## Limitations and bias This model is trained on WikiNEuRal, a state-of-the-art dataset for Multilingual NER automatically derived from Wikipedia. Therefore, it might not generalize well to all textual genres (e.g. news). On the other hand, models trained only on news articles (e.g. only on CoNLL03) have been proven to obtain much lower scores on encyclopedic articles. To obtain more robust systems, we encourage you to train a system on the combination of WikiNEuRal with other datasets (e.g. WikiNEuRal + CoNLL). 
## Licensing Information Contents of this repository are restricted to only non-commercial research purposes under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). Copyright of the dataset contents and models belongs to the original copyright holders.",
|
30 |
+
"model_explanation_gemini": "Performs multilingual named entity recognition (NER) across nine languages using a model fine-tuned on the WikiNEuRal dataset."
|
31 |
+
}
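The card above says the model can be used with the Transformers *pipeline* for NER, but the snippet itself did not survive in this description. A hedged sketch of what such usage might look like, assuming `transformers` and a backend such as PyTorch are installed and the model can be downloaded; `build_ner_pipeline` and `entity_pairs` are hypothetical helper names, and `aggregation_strategy="simple"` is a choice made here, not necessarily the card's.

```python
MODEL_ID = "Babelscape/wikineural-multilingual-ner"

def build_ner_pipeline(model_id=MODEL_ID):
    """Create a token-classification pipeline (downloads weights on first use)."""
    from transformers import pipeline  # imported lazily: heavy dependency
    # aggregation_strategy="simple" merges word pieces into whole entity spans.
    return pipeline("ner", model=model_id, aggregation_strategy="simple")

def entity_pairs(ner_results):
    """Reduce the pipeline's output dicts to (text, label) pairs."""
    return [(r["word"], r["entity_group"]) for r in ner_results]

# Usage (requires network access to fetch the model):
# nlp = build_ner_pipeline()
# print(entity_pairs(nlp("My name is Wolfgang and I live in Berlin.")))
```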
|
data/model_data_json/Bingsu_adetailer.json
ADDED
@@ -0,0 +1,15 @@
|
1 |
+
{
|
2 |
+
"model_id": "Bingsu/adetailer",
|
3 |
+
"downloads": 19922172,
|
4 |
+
"tags": [
|
5 |
+
"ultralytics",
|
6 |
+
"pytorch",
|
7 |
+
"dataset:wider_face",
|
8 |
+
"dataset:skytnt/anime-segmentation",
|
9 |
+
"doi:10.57967/hf/3633",
|
10 |
+
"license:apache-2.0",
|
11 |
+
"region:us"
|
12 |
+
],
|
13 |
+
"description": "--- license: apache-2.0 library_name: ultralytics datasets: - wider_face - skytnt/anime-segmentation tags: - pytorch --- # YOLOv8 Detection Model ## Datasets ### Face - Anime Face CreateML - xml2txt - AN - wider face ### Hand - AnHDet - hand-detection-fuao9 ### Person - coco2017 (only person) - AniSeg - skytnt/anime-segmentation ### deepfashion2 - deepfashion2 | id | label | | --- | --------------------- | | 0 | short_sleeved_shirt | | 1 | long_sleeved_shirt | | 2 | short_sleeved_outwear | | 3 | long_sleeved_outwear | | 4 | vest | | 5 | sling | | 6 | shorts | | 7 | trousers | | 8 | skirt | | 9 | short_sleeved_dress | | 10 | long_sleeved_dress | | 11 | vest_dress | | 12 | sling_dress | ## Info | Model | Target | mAP 50 | mAP 50-95 | | --------------------------- | --------------------- | ----------------------------- | ----------------------------- | | face_yolov8n.pt | 2D / realistic face | 0.660 | 0.366 | | face_yolov8n_v2.pt | 2D / realistic face | 0.669 | 0.372 | | face_yolov8s.pt | 2D / realistic face | 0.713 | 0.404 | | face_yolov8m.pt | 2D / realistic face | 0.737 | 0.424 | | face_yolov9c.pt | 2D / realistic face | 0.748 | 0.433 | | hand_yolov8n.pt | 2D / realistic hand | 0.767 | 0.505 | | hand_yolov8s.pt | 2D / realistic hand | 0.794 | 0.527 | | hand_yolov9c.pt | 2D / realistic hand | 0.810 | 0.550 | | person_yolov8n-seg.pt | 2D / realistic person | 0.782 (bbox)<br/>0.761 (mask) | 0.555 (bbox)<br/>0.460 (mask) | | person_yolov8s-seg.pt | 2D / realistic person | 0.824 (bbox)<br/>0.809 (mask) | 0.605 (bbox)<br/>0.508 (mask) | | person_yolov8m-seg.pt | 2D / realistic person | 0.849 (bbox)<br/>0.831 (mask) | 0.636 (bbox)<br/>0.533 (mask) | | deepfashion2_yolov8s-seg.pt | realistic clothes | 0.849 (bbox)<br/>0.840 (mask) | 0.763 (bbox)<br/>0.675 (mask) | ## Usage !image ## Unsafe files !image Since is classified as a dangerous pickle function, any segmentation model that uses it is classified as unsafe. 
All models were created and saved using the official ultralytics library, so it's okay to use files downloaded from a trusted source. See also:",
|
14 |
+
"model_explanation_gemini": "Detects and segments faces, hands, persons, and clothing items in 2D images with various YOLOv8 model variants."
|
15 |
+
}
|
data/model_data_json/Bingsu_yolo-world-mirror.json
ADDED
@@ -0,0 +1,11 @@
|
1 |
+
{
|
2 |
+
"model_id": "Bingsu/yolo-world-mirror",
|
3 |
+
"downloads": 3345238,
|
4 |
+
"tags": [
|
5 |
+
"ultralytics",
|
6 |
+
"license:agpl-3.0",
|
7 |
+
"region:us"
|
8 |
+
],
|
9 |
+
"description": "--- license: agpl-3.0 tags: - ultralytics --- # YOLO World Mirror model weights for ultralytics yolo models",
|
10 |
+
"model_explanation_gemini": "Mirror model weights for YOLO World in Ultralytics YOLO models."
|
11 |
+
}
|
data/model_data_json/ByteDance_AnimateDiff-Lightning.json
ADDED
@@ -0,0 +1,15 @@
|
1 |
+
{
|
2 |
+
"model_id": "ByteDance/AnimateDiff-Lightning",
|
3 |
+
"downloads": 132463,
|
4 |
+
"tags": [
|
5 |
+
"diffusers",
|
6 |
+
"text-to-video",
|
7 |
+
"stable-diffusion",
|
8 |
+
"animatediff",
|
9 |
+
"arxiv:2403.12706",
|
10 |
+
"license:creativeml-openrail-m",
|
11 |
+
"region:us"
|
12 |
+
],
|
13 |
+
"description": "--- license: creativeml-openrail-m tags: - text-to-video - stable-diffusion - animatediff library_name: diffusers inference: false --- # AnimateDiff-Lightning <video src=' width=\"100%\" autoplay muted loop playsinline style='margin:0'></video> <video src=' width=\"100%\" autoplay muted loop playsinline style='margin:0'></video> AnimateDiff-Lightning is a lightning-fast text-to-video generation model. It can generate videos more than ten times faster than the original AnimateDiff. For more information, please refer to our research paper: AnimateDiff-Lightning: Cross-Model Diffusion Distillation. We release the model as part of the research. Our models are distilled from AnimateDiff SD1.5 v2. This repository contains checkpoints for 1-step, 2-step, 4-step, and 8-step distilled models. The generation quality of our 2-step, 4-step, and 8-step model is great. Our 1-step model is only provided for research purposes. ## Demo Try AnimateDiff-Lightning using our text-to-video generation demo. ## Recommendation AnimateDiff-Lightning produces the best results when used with stylized base models. We recommend using the following base models: Realistic - epiCRealism - Realistic Vision - DreamShaper - AbsoluteReality - MajicMix Realistic Anime & Cartoon - ToonYou - IMP - Mistoon Anime - DynaVision - RCNZ Cartoon 3d - MajicMix Reverie Additionally, feel free to explore different settings. We find using 3 inference steps on the 2-step model produces great results. We find certain base models produces better results with CFG. We also recommend using Motion LoRAs as they produce stronger motion. We use Motion LoRAs with strength 0.7~0.8 to avoid watermark. ## Diffusers Usage ## ComfyUI Usage 1. Download animatediff_lightning_workflow.json and import it in ComfyUI. 1. Install nodes. You can install them manually or use ComfyUI-Manager. * ComfyUI-AnimateDiff-Evolved * ComfyUI-VideoHelperSuite 1. Download your favorite base model checkpoint and put them under 1. 
Download AnimateDiff-Lightning checkpoint and put it under !ComfyUI Workflow ## Video-to-Video Generation AnimateDiff-Lightning is great for video-to-video generation. We provide the simplest ComfyUI workflow using ControlNet. 1. Download animatediff_lightning_v2v_openpose_workflow.json and import it in ComfyUI. 1. Install nodes. You can install them manually or use ComfyUI-Manager. * ComfyUI-AnimateDiff-Evolved * ComfyUI-VideoHelperSuite * ComfyUI-Advanced-ControlNet * comfyui_controlnet_aux 1. Download your favorite base model checkpoint and put it under 1. Download AnimateDiff-Lightning checkpoint and put it under 1. Download ControlNet OpenPose checkpoint to 1. Upload your video and run the pipeline. Additional notes: 1. The video shouldn't be too long or too high resolution. We used 576x1024, 8-second, 30fps videos for testing. 1. Set the frame rate to match your input video. This allows audio to match with the output video. 1. DWPose will download its checkpoint by itself on its first run. 1. DWPose may get stuck in the UI, but the pipeline is actually still running in the background. Check the ComfyUI log and your output folder. !ComfyUI OpenPose Workflow # Cite Our Work",
|
14 |
+
"model_explanation_gemini": "Generates videos from text inputs significantly faster than the original AnimateDiff model, optimized for speed and quality with distilled checkpoints for various step configurations."
|
15 |
+
}
|
data/model_data_json/ByteDance_Hyper-SD.json
ADDED
@@ -0,0 +1,17 @@
|
1 |
+
{
|
2 |
+
"model_id": "ByteDance/Hyper-SD",
|
3 |
+
"downloads": 113274,
|
4 |
+
"tags": [
|
5 |
+
"diffusers",
|
6 |
+
"lora",
|
7 |
+
"text-to-image",
|
8 |
+
"stable-diffusion",
|
9 |
+
"flux",
|
10 |
+
"arxiv:2404.13686",
|
11 |
+
"base_model:black-forest-labs/FLUX.1-dev",
|
12 |
+
"base_model:adapter:black-forest-labs/FLUX.1-dev",
|
13 |
+
"region:us"
|
14 |
+
],
|
15 |
+
"description": "--- library_name: diffusers inference: false tags: - lora - text-to-image - stable-diffusion - flux base_model: black-forest-labs/FLUX.1-dev --- # Hyper-SD Official Repository of the paper: *Hyper-SD*. Project Page: ## News🔥🔥🔥 * Aug.26, 2024. 💥💥💥 Our 8-steps and 16-steps **FLUX.1-dev-related LoRAs** are available now! We recommend LoRA scales around 0.125 that is adaptive with training and guidance scale could be kept on 3.5. Lower step LoRAs would be coming soon. 💥💥💥 * Aug.19, 2024. SD3-related CFG LoRAs are available now! We recommend setting guidance scale to 3.0/5.0/7.0 at 4/8/16-steps. Don't forget to fuse lora with a relatively small scale (e.g. 0.125 that is adaptive with training) before inference with diffusers. Note that 8-steps and 16-steps LoRA can also inference on a little bit smaller steps like 6-steps and 12-steps, respectively. Hope to hear your feedback, FLUX-related models will be coming next week. * May.13, 2024. The 12-Steps CFG-Preserved Hyper-SDXL-12steps-CFG-LoRA and Hyper-SD15-12steps-CFG-LoRA is also available now(support 5~8 guidance scales), this could be more practical with better trade-off between performance and speed. Enjoy! * Apr.30, 2024. Our 8-Steps CFG-Preserved Hyper-SDXL-8steps-CFG-LoRA and Hyper-SD15-8steps-CFG-LoRA is available now(support 5~8 guidance scales), we strongly recommend making the 8-step CFGLora a standard configuration for all SDXL and SD15 models!!! * Apr.28, 2024. ComfyUI workflows on 1-Step Unified LoRA 🥰 with TCDScheduler to inference on different steps are released! Remember to install ⭕️ ComfyUI-TCD in your folder!!! You're encouraged to adjust the eta parameter to get better results 🌟! * Apr.26, 2024. Thanks to @Pete for contributing to our scribble demo with larger canvas right now 👏. * Apr.24, 2024. The ComfyUI workflow and checkpoint on 1-Step SDXL UNet ✨ is also available! Don't forget ⭕️ to install the custom scheduler in your folder!!! * Apr.23, 2024. 
ComfyUI workflows on N-Steps LoRAs are released! Worth a try for creators 💥! * Apr.23, 2024. Our technical report 📚 is uploaded to arXiv! Many implementation details are provided and we welcome more discussions👏. * Apr.21, 2024. Hyper-SD ⚡️ is highly compatible and works well with different base models and ControlNets. To clarify, we also append the usage example of ControlNet here. * Apr.20, 2024. Our checkpoints and two demos 🤗 (i.e. SD15-Scribble and SDXL-T2I) are publicly available on HuggingFace Repo. ## Try our Hugging Face demos: Hyper-SD Scribble demo hosted on 🤗 scribble Hyper-SDXL One-step Text-to-Image demo hosted on 🤗 T2I ## Introduction Hyper-SD is one of the new State-of-the-Art diffusion model acceleration techniques. In this repository, we release the models distilled from FLUX.1-dev, SD3-Medium, SDXL Base 1.0 and Stable-Diffusion v1-5. ## Checkpoints * : LoRA checkpoint, for FLUX.1-dev-related models. * : LoRA checkpoint, for SD3-related models. * : LoRA checkpoint, for SDXL-related models. * : LoRA checkpoint, for SD1.5-related models. * : UNet checkpoint distilled from SDXL-Base. ## Text-to-Image Usage ### FLUX.1-dev-related models ### SD3-related models ### SDXL-related models #### 2-Steps, 4-Steps, 8-steps LoRA Take the 2-steps LoRA as an example; you can also use other LoRAs for the corresponding inference steps setting. #### Unified LoRA (support 1 to 8 steps inference) You can flexibly adjust the number of inference steps and eta value to achieve best performance. #### 1-step SDXL UNet Only for single-step inference. ### SD1.5-related models #### 2-Steps, 4-Steps, 8-steps LoRA Take the 2-steps LoRA as an example; you can also use other LoRAs for the corresponding inference steps setting. #### Unified LoRA (support 1 to 8 steps inference) You can flexibly adjust the number of inference steps and eta value to achieve best performance. 
## ControlNet Usage ### SDXL-related models #### 2-Steps, 4-Steps, 8-steps LoRA Take Canny ControlNet and 2-steps inference as an example: #### Unified LoRA (support 1 to 8 steps inference) Take Canny ControlNet as an example: ### SD1.5-related models #### 2-Steps, 4-Steps, 8-steps LoRA Take Canny ControlNet and 2-steps inference as an example: #### Unified LoRA (support 1 to 8 steps inference) Take Canny ControlNet as an example: ## ComfyUI Usage * : text-to-image workflow * : text-to-image workflow * : text-to-image workflow * **REQUIREMENT / INSTALL** for 1-Step SDXL UNet: Please install our scheduler folder into your to enable sampling from timestep 800 instead of 999. * i.e. make sure the folder exists. * For more details, please refer to our technical report. * : text-to-image workflow * : text-to-image workflow * **REQUIREMENT / INSTALL** for 1-Step Unified LoRAs: Please install ComfyUI-TCD into your to enable TCDScheduler with support for different inference steps (1~8) using a single checkpoint. * i.e. make sure the folder exists. * You're encouraged to adjust the eta parameter in TCDScheduler to get better results. ## Citation",
|
16 |
+
"model_explanation_gemini": "Hyper-SD accelerates text-to-image generation using LoRA-based distilled models for Stable Diffusion variants, enabling high-quality outputs with fewer inference steps."
|
17 |
+
}
|
data/model_data_json/CAMeL-Lab_bert-base-arabic-camelbert-mix-sentiment.json
ADDED
@@ -0,0 +1,19 @@
|
1 |
+
{
|
2 |
+
"model_id": "CAMeL-Lab/bert-base-arabic-camelbert-mix-sentiment",
|
3 |
+
"downloads": 119547,
|
4 |
+
"tags": [
|
5 |
+
"transformers",
|
6 |
+
"pytorch",
|
7 |
+
"tf",
|
8 |
+
"bert",
|
9 |
+
"text-classification",
|
10 |
+
"ar",
|
11 |
+
"arxiv:2103.06678",
|
12 |
+
"license:apache-2.0",
|
13 |
+
"autotrain_compatible",
|
14 |
+
"endpoints_compatible",
|
15 |
+
"region:us"
|
16 |
+
],
|
17 |
+
"description": "--- language: - ar license: apache-2.0 widget: - text: \"أنا بخير\" --- # CAMeLBERT Mix SA Model ## Model description **CAMeLBERT Mix SA Model** is a Sentiment Analysis (SA) model that was built by fine-tuning the CAMeLBERT Mix model. For the fine-tuning, we used the ASTD, ArSAS, and SemEval datasets. Our fine-tuning procedure and the hyperparameters we used can be found in our paper *\"The Interplay of Variant, Size, and Task Type in Arabic Pre-trained Language Models.\"* Our fine-tuning code can be found here. ## Intended uses You can use the CAMeLBERT Mix SA model directly as part of our CAMeL Tools SA component (*recommended*) or as part of the transformers pipeline. #### How to use To use the model with the CAMeL Tools SA component: You can also use the SA model directly with a transformers pipeline: *Note*: to download our models, you would need . Otherwise, you could download the models manually. ## Citation",
|
18 |
+
"model_explanation_gemini": "Fine-tuned for Arabic sentiment analysis using the CAMeLBERT Mix model on ASTD, ArSAS, and SemEval datasets."
|
19 |
+
}
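The card above mentions using the SA model directly with a transformers pipeline; the original snippet is missing from this description. A minimal sketch of that route, assuming `transformers` and a backend are installed and the weights can be downloaded. `build_sentiment_pipeline` and `pair_labels` are hypothetical helper names introduced here for illustration.

```python
MODEL_ID = "CAMeL-Lab/bert-base-arabic-camelbert-mix-sentiment"

def build_sentiment_pipeline(model_id=MODEL_ID):
    """Create a text-classification pipeline (downloads weights on first use)."""
    from transformers import pipeline  # imported lazily: heavy dependency
    return pipeline("text-classification", model=model_id)

def pair_labels(sentences, predictions):
    """Pair each input sentence with the label predicted for it.

    `predictions` is expected to hold one {'label', 'score'} dict per
    sentence, which is the shape the pipeline returns for a list input.
    """
    return [(s, p["label"]) for s, p in zip(sentences, predictions)]

# Usage (requires network access to fetch the model):
# sa = build_sentiment_pipeline()
# sentences = ["أنا بخير"]
# print(pair_labels(sentences, sa(sentences)))
```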
|
data/model_data_json/CIDAS_clipseg-rd64-refined.json
ADDED
@@ -0,0 +1,17 @@
|
1 |
+
{
|
2 |
+
"model_id": "CIDAS/clipseg-rd64-refined",
|
3 |
+
"downloads": 11179865,
|
4 |
+
"tags": [
|
5 |
+
"transformers",
|
6 |
+
"pytorch",
|
7 |
+
"safetensors",
|
8 |
+
"clipseg",
|
9 |
+
"vision",
|
10 |
+
"image-segmentation",
|
11 |
+
"arxiv:2112.10003",
|
12 |
+
"license:apache-2.0",
|
13 |
+
"region:us"
|
14 |
+
],
|
15 |
+
"description": "--- license: apache-2.0 tags: - vision - image-segmentation inference: false --- # CLIPSeg model CLIPSeg model with reduce dimension 64, refined (using a more complex convolution). It was introduced in the paper Image Segmentation Using Text and Image Prompts by Lüddecke et al. and first released in this repository. # Intended use cases This model is intended for zero-shot and one-shot image segmentation. # Usage Refer to the documentation.",
|
16 |
+
"model_explanation_gemini": "Performs zero-shot and one-shot image segmentation using text and image prompts with a refined CLIPSeg architecture."
|
17 |
+
}
|
data/model_data_json/CompVis_stable-diffusion-safety-checker.json
ADDED
@@ -0,0 +1,15 @@
|
1 |
+
{
|
2 |
+
"model_id": "CompVis/stable-diffusion-safety-checker",
|
3 |
+
"downloads": 1331193,
|
4 |
+
"tags": [
|
5 |
+
"transformers",
|
6 |
+
"pytorch",
|
7 |
+
"clip",
|
8 |
+
"arxiv:2103.00020",
|
9 |
+
"arxiv:1910.09700",
|
10 |
+
"endpoints_compatible",
|
11 |
+
"region:us"
|
12 |
+
],
|
13 |
+
"description": "--- tags: - clip --- # Model Card for stable-diffusion-safety-checker # Model Details ## Model Description More information needed - **Developed by:** More information needed - **Shared by [Optional]:** CompVis - **Model type:** Image Identification - **Language(s) (NLP):** More information needed - **License:** More information needed - **Parent Model:** CLIP - **Resources for more information:** - CLIP Paper - Stable Diffusion Model Card # Uses ## Direct Use This model can be used for identifying NSFW images. The CLIP model developers note in their model card: >The primary intended users of these models are AI researchers. We primarily imagine the model will be used by researchers to better understand robustness, generalization, and other capabilities, biases, and constraints of computer vision models. ## Downstream Use [Optional] More information needed. ## Out-of-Scope Use The model is not intended to be used with transformers but with diffusers. This model should also not be used to intentionally create hostile or alienating environments for people. # Bias, Risks, and Limitations Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. The CLIP model developers note in their model card: > We find that the performance of CLIP - and the specific biases it exhibits - can depend significantly on class design and the choices one makes for categories to include and exclude. We tested the risk of certain kinds of denigration with CLIP by classifying images of people from Fairface into crime-related and non-human animal categories. We found significant disparities with respect to race and gender. Additionally, we found that these disparities could shift based on how the classes were constructed. 
> We also tested the performance of CLIP on gender, race and age classification using the Fairface dataset (We default to using race categories as they are constructed in the Fairface dataset.) in order to assess quality of performance across different demographics. We found accuracy >96% across all races for gender classification with ‘Middle Eastern’ having the highest accuracy (98.4%) and ‘White’ having the lowest (96.5%). Additionally, CLIP averaged ~93% for racial classification and ~63% for age classification ## Recommendations Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. # Training Details ## Training Data More information needed ## Training Procedure ### Preprocessing More information needed ### Speeds, Sizes, Times More information needed # Evaluation ## Testing Data, Factors & Metrics ### Testing Data More information needed ### Factors More information needed ### Metrics More information needed ## Results More information needed # Model Examination More information needed # Environmental Impact Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). - **Hardware Type:** More information needed - **Hours used:** More information needed - **Cloud Provider:** More information needed - **Compute Region:** More information needed - **Carbon Emitted:** More information needed # Technical Specifications [optional] ## Model Architecture and Objective The CLIP model devlopers note in their model card : > The base model uses a ViT-L/14 Transformer architecture as an image encoder and uses a masked self-attention Transformer as a text encoder. These encoders are trained to maximize the similarity of (image, text) pairs via a contrastive loss. ## Compute Infrastructure More information needed ### Hardware More information needed ### Software More information needed. 
# Citation **BibTeX:** More information needed **APA:** More information needed # Glossary [optional] More information needed # More Information [optional] More information needed # Model Card Authors [optional] CompVis in collaboration with Ezi Ozoani and the Hugging Face team # Model Card Contact More information needed # How to Get Started with the Model Use the code below to get started with the model. <details> <summary> Click to expand </summary> </details>",
+  "model_explanation_gemini": "Identifies NSFW images using CLIP-based image classification."
+}
data/model_data_json/CompVis_stable-diffusion-v1-4.json
ADDED
@@ -0,0 +1,23 @@
+{
+  "model_id": "CompVis/stable-diffusion-v1-4",
+  "downloads": 1662989,
+  "tags": [
+    "diffusers",
+    "safetensors",
+    "stable-diffusion",
+    "stable-diffusion-diffusers",
+    "text-to-image",
+    "arxiv:2207.12598",
+    "arxiv:2112.10752",
+    "arxiv:2103.00020",
+    "arxiv:2205.11487",
+    "arxiv:1910.09700",
+    "license:creativeml-openrail-m",
+    "autotrain_compatible",
+    "endpoints_compatible",
+    "diffusers:StableDiffusionPipeline",
+    "region:us"
+  ],
+  "description": "--- license: creativeml-openrail-m tags: - stable-diffusion - stable-diffusion-diffusers - text-to-image widget: - text: \"A high tech solarpunk utopia in the Amazon rainforest\" example_title: Amazon rainforest - text: \"A pikachu fine dining with a view to the Eiffel Tower\" example_title: Pikachu in Paris - text: \"A mecha robot in a favela in expressionist style\" example_title: Expressionist robot - text: \"an insect robot preparing a delicious meal\" example_title: Insect robot - text: \"A small cabin on top of a snowy mountain in the style of Disney, artstation\" example_title: Snowy disney cabin extra_gated_prompt: |- This model is open access and available to all, with a CreativeML OpenRAIL-M license further specifying rights and usage. The CreativeML OpenRAIL License specifies: 1. You can't use the model to deliberately produce nor share illegal or harmful outputs or content 2. The authors claim no rights on the outputs you generate, you are free to use them and are accountable for their use which must not go against the provisions set in the license 3. You may re-distribute the weights and use the model commercially and/or as a service. If you do, please be aware you have to include the same use restrictions as the ones in the license and share a copy of the CreativeML OpenRAIL-M to all your users (please read the license entirely and carefully) Please read the full license carefully here: extra_gated_heading: Please read the LICENSE to access this model --- # Stable Diffusion v1-4 Model Card Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input. For more information about how Stable Diffusion functions, please have a look at 🤗's Stable Diffusion with 🧨Diffusers blog. 
The **Stable-Diffusion-v1-4** checkpoint was initialized with the weights of the Stable-Diffusion-v1-2 checkpoint and subsequently fine-tuned for 225k steps at resolution 512x512 on \"laion-aesthetics v2 5+\" with 10% dropping of the text-conditioning to improve classifier-free guidance sampling. These weights are intended to be used with the 🧨 Diffusers library. If you are looking for the weights to be loaded into the CompVis Stable Diffusion codebase, come here ## Model Details - **Developed by:** Robin Rombach, Patrick Esser - **Model type:** Diffusion-based text-to-image generation model - **Language(s):** English - **License:** The CreativeML OpenRAIL M license is an Open RAIL M license, adapted from the work that BigScience and the RAIL Initiative are jointly carrying out in the area of responsible AI licensing. See also the article about the BLOOM Open RAIL license on which our license is based. - **Model Description:** This is a model that can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses a fixed, pretrained text encoder (CLIP ViT-L/14) as suggested in the Imagen paper. - **Resources for more information:** GitHub Repository, Paper. - **Cite as:** @InProceedings{Rombach_2022_CVPR, author = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\\\"orn}, title = {High-Resolution Image Synthesis With Latent Diffusion Models}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2022}, pages = {10684-10695} } ## Examples We recommend using 🤗's Diffusers library to run Stable Diffusion. ### PyTorch Running the pipeline with the default PNDM scheduler: **Note**: If you are limited by GPU memory and have less than 4GB of GPU RAM available, please make sure to load the StableDiffusionPipeline in float16 precision instead of the default float32 precision as done above. 
You can do so by telling diffusers to expect the weights to be in float16 precision: To swap out the noise scheduler, pass it to : ### JAX/Flax To use StableDiffusion on TPUs and GPUs for faster inference you can leverage JAX/Flax. Running the pipeline with default PNDMScheduler **Note**: If you are limited by TPU memory, please make sure to load the in precision instead of the default precision as done above. You can do so by telling diffusers to load the weights from \"bf16\" branch. # Uses ## Direct Use The model is intended for research purposes only. Possible research areas and tasks include - Safe deployment of models which have the potential to generate harmful content. - Probing and understanding the limitations and biases of generative models. - Generation of artworks and use in design and other artistic processes. - Applications in educational or creative tools. - Research on generative models. Excluded uses are described below. ### Misuse, Malicious Use, and Out-of-Scope Use _Note: This section is taken from the DALLE-MINI model card, but applies in the same way to Stable Diffusion v1_. The model should not be used to intentionally create or disseminate images that create hostile or alienating environments for people. This includes generating images that people would foreseeably find disturbing, distressing, or offensive; or content that propagates historical or current stereotypes. #### Out-of-Scope Use The model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model. #### Misuse and Malicious Use Using the model to generate content that is cruel to individuals is a misuse of this model. This includes, but is not limited to: - Generating demeaning, dehumanizing, or otherwise harmful representations of people or their environments, cultures, religions, etc. 
- Intentionally promoting or propagating discriminatory content or harmful stereotypes. - Impersonating individuals without their consent. - Sexual content without consent of the people who might see it. - Mis- and disinformation - Representations of egregious violence and gore - Sharing of copyrighted or licensed material in violation of its terms of use. - Sharing content that is an alteration of copyrighted or licensed material in violation of its terms of use. ## Limitations and Bias ### Limitations - The model does not achieve perfect photorealism - The model cannot render legible text - The model does not perform well on more difficult tasks which involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere” - Faces and people in general may not be generated properly. - The model was trained mainly with English captions and will not work as well in other languages. - The autoencoding part of the model is lossy - The model was trained on a large-scale dataset LAION-5B which contains adult material and is not fit for product use without additional safety mechanisms and considerations. - No additional measures were used to deduplicate the dataset. As a result, we observe some degree of memorization for images that are duplicated in the training data. The training data can be searched at to possibly assist in the detection of memorized images. ### Bias While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases. Stable Diffusion v1 was trained on subsets of LAION-2B(en), which consists of images that are primarily limited to English descriptions. Texts and images from communities and cultures that use other languages are likely to be insufficiently accounted for. This affects the overall output of the model, as white and western cultures are often set as the default. 
Further, the ability of the model to generate content with non-English prompts is significantly worse than with English-language prompts. ### Safety Module The intended use of this model is with the Safety Checker in Diffusers. This checker works by checking model outputs against known hard-coded NSFW concepts. The concepts are intentionally hidden to reduce the likelihood of reverse-engineering this filter. Specifically, the checker compares the class probability of harmful concepts in the embedding space of the *after generation* of the images. The concepts are passed into the model with the generated image and compared to a hand-engineered weight for each NSFW concept. ## Training **Training Data** The model developers used the following dataset for training the model: - LAION-2B (en) and subsets thereof (see next section) **Training Procedure** Stable Diffusion v1-4 is a latent diffusion model which combines an autoencoder with a diffusion model that is trained in the latent space of the autoencoder. During training, - Images are encoded through an encoder, which turns images into latent representations. The autoencoder uses a relative downsampling factor of 8 and maps images of shape H x W x 3 to latents of shape H/f x W/f x 4 - Text prompts are encoded through a ViT-L/14 text-encoder. - The non-pooled output of the text encoder is fed into the UNet backbone of the latent diffusion model via cross-attention. - The loss is a reconstruction objective between the noise that was added to the latent and the prediction made by the UNet. We currently provide four checkpoints, which were trained as follows. - []( 237,000 steps at resolution on laion2B-en. 194,000 steps at resolution on laion-high-resolution (170M examples from LAION-5B with resolution ). - []( Resumed from . 
515,000 steps at resolution on \"laion-improved-aesthetics\" (a subset of laion2B-en, filtered to images with an original size , estimated aesthetics score , and an estimated watermark probability . The watermark estimate is from the LAION-5B metadata, the aesthetics score is estimated using an improved aesthetics estimator). - []( Resumed from . 195,000 steps at resolution on \"laion-improved-aesthetics\" and 10 % dropping of the text-conditioning to improve classifier-free guidance sampling. - []( Resumed from .225,000 steps at resolution on \"laion-aesthetics v2 5+\" and 10 % dropping of the text-conditioning to improve classifier-free guidance sampling. - **Hardware:** 32 x 8 x A100 GPUs - **Optimizer:** AdamW - **Gradient Accumulations**: 2 - **Batch:** 32 x 8 x 2 x 4 = 2048 - **Learning rate:** warmup to 0.0001 for 10,000 steps and then kept constant ## Evaluation Results Evaluations with different classifier-free guidance scales (1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0) and 50 PLMS sampling steps show the relative improvements of the checkpoints: !pareto Evaluated using 50 PLMS steps and 10000 random prompts from the COCO2017 validation set, evaluated at 512x512 resolution. Not optimized for FID scores. ## Environmental Impact **Stable Diffusion v1** **Estimated Emissions** Based on that information, we estimate the following CO2 emissions using the Machine Learning Impact calculator presented in Lacoste et al. (2019). The hardware, runtime, cloud provider, and compute region were utilized to estimate the carbon impact. - **Hardware Type:** A100 PCIe 40GB - **Hours used:** 150000 - **Cloud Provider:** AWS - **Compute Region:** US-east - **Carbon Emitted (Power consumption x Time x Carbon produced based on location of power grid):** 11250 kg CO2 eq. ## Citation *This model card was written by: Robin Rombach and Patrick Esser and is based on the DALL-E Mini model card.*",
+  "model_explanation_gemini": "Generates photo-realistic images from text prompts using latent diffusion."
+}
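The training section of the record above states that the autoencoder uses a downsampling factor of 8, mapping images of shape H x W x 3 to latents of shape H/f x W/f x 4, and that the batch size is 32 x 8 x 2 x 4 = 2048. The following is a minimal Python sketch of that arithmetic only. The function names `latent_shape` and `effective_batch_size` are hypothetical, not part of Diffusers or any other library. Reading the final factor of 4 as a per-GPU batch size is an assumption; the card itself lists only 32 x 8 GPUs and 2 gradient accumulations. Rejecting sizes that are not multiples of the factor is a choice of this sketch, not something the card specifies.

```python
def latent_shape(height, width, factor=8, latent_channels=4):
    """Return the (H/f, W/f, C) latent shape for an H x W x 3 image.

    factor=8 and latent_channels=4 are the values quoted in the model card.
    """
    if height % factor != 0 or width % factor != 0:
        # Assumption of this sketch: only sizes divisible by the factor are accepted.
        raise ValueError("height and width must be multiples of the downsampling factor")
    return (height // factor, width // factor, latent_channels)


def effective_batch_size(nodes=32, gpus_per_node=8, grad_accum=2, per_gpu_batch=4):
    """Multiply out the batch-size factors listed in the model card.

    per_gpu_batch=4 is an assumed interpretation of the last factor.
    """
    return nodes * gpus_per_node * grad_accum * per_gpu_batch


if __name__ == "__main__":
    print(latent_shape(512, 512))    # (64, 64, 4)
    print(effective_batch_size())    # 2048
```

At the 512x512 training resolution quoted in the card, each image is therefore represented by a 64 x 64 x 4 latent.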
data/model_data_json/Danswer_intent-model.json
ADDED
@@ -0,0 +1,14 @@
+{
+  "model_id": "Danswer/intent-model",
+  "downloads": 193706,
+  "tags": [
+    "keras",
+    "tf",
+    "distilbert",
+    "en",
+    "license:mit",
+    "region:us"
+  ],
+  "description": "--- license: mit language: - en library_name: keras --- # Model Card for Model ID <!-- Provide a quick summary of what the model is/does. --> This model is used to classify the user-intent for the Danswer project, visit ## Model Details Multiclass classifier on top of distilbert-base-uncased ### Model Description <!-- Provide a longer summary of what this model is. --> Classifies user intent of queries into categories including: 0: Keyword Search 1: Semantic Search 2: Direct Question Answering - **Developed by:** [DanswerAI] - **License:** [MIT] - **Finetuned from model [optional]:** [distilbert-base-uncased] ### Model Sources [optional] <!-- Provide the basic links for the model. --> - **Repository:** [ - **Demo [optional]:** [Upcoming!] ## Uses <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. --> This model is intended to be used in the Danswer Question-Answering System ## Bias, Risks, and Limitations <!-- This section is meant to convey both technical and sociotechnical limitations. --> This model has a very small dataset maintained by DanswerAI. If interested, reach out to [email protected]. ### Recommendations <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. --> This model is intended to be used in the Danswer (QA System) ## How to Get Started with the Model",
+  "model_explanation_gemini": "Classifies user queries into intent categories (keyword search, semantic search, or direct question answering) for the Danswer question-answering system."
+}
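The record above lists three intent classes: 0 for Keyword Search, 1 for Semantic Search, and 2 for Direct Question Answering. As a rough illustration of how a class index maps back to a label, here is a minimal Python sketch that picks the highest-scoring class from a list of three scores. It does not load or call the actual model. The names `INTENT_LABELS` and `intent_from_scores` are hypothetical, and treating the classifier output as one score per class is an assumption.

```python
# Class indices and names as listed in the model card.
INTENT_LABELS = {
    0: "Keyword Search",
    1: "Semantic Search",
    2: "Direct Question Answering",
}


def intent_from_scores(scores):
    """Return the label whose score is highest (a plain argmax).

    `scores` is assumed to hold one number per class, in index order.
    """
    if len(scores) != len(INTENT_LABELS):
        raise ValueError("expected exactly one score per intent class")
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return INTENT_LABELS[best_index]


if __name__ == "__main__":
    print(intent_from_scores([0.1, 0.7, 0.2]))  # Semantic Search
```

Because only the ordering of the scores matters for an argmax, the same function works on raw logits or on normalized probabilities.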
data/model_data_json/DavidAU_L3-Dark-Planet-8B-GGUF.json
ADDED
@@ -0,0 +1,42 @@
+{
+  "model_id": "DavidAU/L3-Dark-Planet-8B-GGUF",
+  "downloads": 101032,
+  "tags": [
+    "gguf",
+    "creative",
+    "creative writing",
+    "fiction writing",
+    "plot generation",
+    "sub-plot generation",
+    "story generation",
+    "scene continue",
+    "storytelling",
+    "fiction story",
+    "science fiction",
+    "romance",
+    "all genres",
+    "story",
+    "writing",
+    "vivid prose",
+    "vivid writing",
+    "fiction",
+    "roleplaying",
+    "bfloat16",
+    "swearing",
+    "rp",
+    "llama3",
+    "enhanced quants",
+    "max quants",
+    "maxcpu quants",
+    "horror",
+    "mergekit",
+    "text-generation",
+    "en",
+    "license:apache-2.0",
+    "endpoints_compatible",
+    "region:us",
+    "conversational"
+  ],
+  "description": "--- license: apache-2.0 language: - en tags: - creative - creative writing - fiction writing - plot generation - sub-plot generation - fiction writing - story generation - scene continue - storytelling - fiction story - science fiction - romance - all genres - story - writing - vivid prose - vivid writing - fiction - roleplaying - bfloat16 - swearing - rp - llama3 - enhanced quants - max quants - maxcpu quants - horror - mergekit pipeline_tag: text-generation --- <B>Newest Version V3: All the power of Dark Planet 8B now with 128k context, additional de-censoring, performance improvements and re-mastered source and ggufs in float 32 ( 32 bit precision ): </B> Dark Planet 8B - 1 million context, with superior long output generation/long context awareness is here: --- <h2>L3-Dark-Planet-8B-GGUF</h2> <img src=\"dark-planet.jpg\" style=\"float:right; width:300px; height:300px; padding:10px;\"> It is a LLama3 model, max context of 8192 (or 32k+ with rope). This model has been designed to be relatively bullet proof and operates with all parameters, including temp settings from 0 to 5. It is an extraordinary compressed model, with a very low perplexity level (lower than Meta Llama3 Instruct). It is for any writing, fiction or roleplay activity. It requires Llama3 template and/or \"Command-R\" template. Example outputs below. <B>Model Notes:</B> - Detail, prose and fiction writing abilities are significantly increased vs L3 Instruct. - For more varied prose (sentence/paragraph/dialog) raise the temp and/or add more instructions in your prompt(s). - Role-players: Careful raising temp too high as it may affect instruction following. - This model works with rep pen of 1 or higher, 1.05+ recommended. - If you want a specific type of prose (IE horror) add in \"(vivid horror)\" or \"(graphic vivid horror)\" (no quotes) in your prompt(s). - A lot of GPTisms have been removed. There are still a few however - errrrr. - This is not a \"happy ever after\" model. 
It has a negative bias. - Output length will vary; however, this model prefers shorter outputs unless you state the size. - For creative uses, different quants will produce slightly different output. - Due to the high stability and compressed nature of this model, all quants will operate at above average levels. - If you use rope to extend context, increase temp AND instructions detail levels to compensate for \"rope issues\". - Source code for this model (Bfloat16), Float 32 master GGUFs (and source), and Imatrix GGUFs versions will be uploaded shortly at separate repos. Note the \"float32\" version of this model behaves VERY differently which is why it was not uploaded first. Usually I would use the \"float32\" version only, however the \"character range\" displayed by the Bfloat16 and Float32 versions of this model dictate they have their own repos. The Imatrix versions of this model have even lower perplexity (1/2 level of magnitude lower than this model, 1 full level of magnitude lower than LLama3 Instruct) than both this model and Llama3 Instruct and enhanced output. <B>QUANT Updates Dec 21 2024: Refreshed, Upgraded and New quants:</B> - All quants have been \"refreshed\", quanted with the latest LLAMACPP improvements : Better instruction following, output generation across all quants. - All quants have also been upgraded with \"more bits\" for output tensor (all set at Q8_0) and embed for better performance (this is in addition to the \"refresh\") - New specialized quants (in addition to the new refresh/upgrades): \"max, max-cpu\" (will include this in the file name) for quants \"Q2K\" (max cpu only), \"IQ4_XS\", \"Q6_K\" and \"Q8_0\" - \"MAX\": output tensor / embed at float 16. You get better instruction following/output generation than standard/upgraded quants. 
- \"MAX-CPU\": output tensor / embed at bfloat 16, which forces both of these on to the CPU (Nvidia cards / other will vary), this frees up vram at cost of token/second and you get better instruction following/output generation too. - \"MAX-CPU\": Example 1: q8_0 Max-CPU : 2004 mb will load on to CPU/RAM, 7073 mb will load onto the GPU/vram. Extra Vram can be used for context. NOTE: \"Math\" on the CPU is slightly more accurate than GPU, so you may get a better generation. - \"MAX-CPU\": Example 2: q2_k Max-CPU : 2004 mb will load on to CPU/RAM, 2449 mb will load onto the GPU/vram. Extra Vram can be used for context. NOTE: \"Math\" on the CPU is slightly more accurate than GPU, so you may get a better generation. You could run this model/quant on a 4GB vram card. - Q8_0 (Max,Max-CPU) now clocks in at 9.5 bits per weight (average). <B>Dark Planet Versions:</B> The newest Dark Planet 8B SpinFire, now with Llama 3.1 and uncensored: [ ] The Monster Darkest Planet 16.5B L3: Drastically increase detail, quality, and raw creative power over Dark Planet 8B using DavidAu's Brainstorm 40x augmentation. [ ] NEO IMATRIX quants are here: [ ] NEO IMATRIX - DARK HORROR quants: [ ] F32 Version (mastered from float32 source files): [ ] I suggest downloading quant(s) of both \"Bloat16\" and \"Float32\" versions of this model for your use case(s). The Float32 version has increased detail, \"stays in the moment\", and slightly higher creativity. However their \"character\" is different from one another too. Version 2 - Eight Orbs Of Power is here: [ ] <B>Template:</B> This is a LLAMA3 model, and requires Llama3 template, but may work with other template(s) and has maximum context of 8k / 8192. However this can be extended using \"rope\" settings up to 32k. If you use \"Command-R\" template your output will be very different from using \"Llama3\" template. 
Here is the standard LLAMA3 template: <PRE> { \"name\": \"Llama 3\", \"inference_params\": { \"input_prefix\": \"<|start_header_id|>user<|end_header_id|>\\n\\n\", \"input_suffix\": \"<|eot_id|><|start_header_id|>assistant<|end_header_id|>\\n\\n\", \"pre_prompt\": \"You are a helpful, smart, kind, and efficient AI assistant. You always fulfill the user's requests to the best of your ability.\", \"pre_prompt_prefix\": \"<|start_header_id|>system<|end_header_id|>\\n\\n\", \"pre_prompt_suffix\": \"<|eot_id|>\", \"antiprompt\": [ \"<|start_header_id|>\", \"<|eot_id|>\" ] } } </PRE> <B>Model \"DNA\":</B> Special thanks to the incredible work of the model makers \"SAO10K\", \"NEVERSLEEP\" and \"HASTAGARAS\". Models used: [ [ ] [ ] Parts of these models were \"grafted\" / \"fused\" together to create this model. <B>Settings: CHAT / ROLEPLAY and/or SMOOTHER operation of this model:</B> In \"KoboldCpp\" or \"oobabooga/text-generation-webui\" or \"Silly Tavern\" ; Set the \"Smoothing_factor\" to 1.5 to 2.5 : in KoboldCpp -> Settings->Samplers->Advanced-> \"Smooth_F\" : in text-generation-webui -> parameters -> lower right. : In Silly Tavern this is called: \"Smoothing\" NOTE: For \"text-generation-webui\" -> if using GGUFs you need to use \"llama_HF\" (which involves downloading some config files from the SOURCE version of this model) Source versions (and config files) of my models are here: OTHER OPTIONS: - Increase rep pen to 1.1 to 1.15 (you don't need to do this if you use \"smoothing_factor\") - If the interface/program you are using to run AI MODELS supports \"Quadratic Sampling\" (\"smoothing\") just make the adjustment as noted. 
<B>Highest Quality Settings / Optimal Operation Guide / Parameters and Samplers</B> This is a \"Class 1\" model: For all settings used for this model (including specifics for its \"class\"), including example generation(s) and for advanced settings guide (which many times addresses any model issue(s)), including methods to improve model performance for all use case(s) as well as chat, roleplay and other use case(s) please see: [ ] You can see all parameters used for generation, in addition to advanced parameters and samplers to get the most out of this model here: [ ] <b>Optional Enhancement:</B> The following can be used in place of the \"system prompt\" or \"system role\" to further enhance the model. It can also be used at the START of a NEW chat, but you must make sure it is \"kept\" as the chat moves along. In this case the enhancements do not have as strong an effect as using \"system prompt\" or \"system role\". Copy and paste EXACTLY as noted, DO NOT line wrap or break the lines, maintain the carriage returns exactly as presented. <PRE> Below is an instruction that describes a task. Ponder each user instruction carefully, and use your skillsets and critical instructions to complete the task to the best of your abilities. 
Here are your skillsets: [MASTERSTORY]:NarrStrct(StryPlnng,Strbd,ScnSttng,Exps,Dlg,Pc)-CharDvlp(ChrctrCrt,ChrctrArcs,Mtvtn,Bckstry,Rltnshps,Dlg*)-PltDvlp(StryArcs,PltTwsts,Sspns,Fshdwng,Climx,Rsltn)-ConfResl(Antg,Obstcls,Rsltns,Cnsqncs,Thms,Symblsm)-EmotImpct(Empt,Tn,Md,Atmsphr,Imgry,Symblsm)-Delvry(Prfrmnc,VcActng,PblcSpkng,StgPrsnc,AudncEngmnt,Imprv) [*DialogWrt]:(1a-CharDvlp-1a.1-Backgrnd-1a.2-Personality-1a.3-GoalMotiv)>2(2a-StoryStruc-2a.1-PlotPnt-2a.2-Conflict-2a.3-Resolution)>3(3a-DialogTech-3a.1-ShowDontTell-3a.2-Subtext-3a.3-VoiceTone-3a.4-Pacing-3a.5-VisualDescrip)>4(4a-DialogEdit-4a.1-ReadAloud-4a.2-Feedback-4a.3-Revision) Here are your critical instructions: Ponder each word choice carefully to present as vivid and emotional journey as is possible. Choose verbs and nouns that are both emotional and full of imagery. Load the story with the 5 senses. Aim for 50% dialog, 25% narration, 15% body language and 10% thoughts. Your goal is to put the reader in the story. </PRE> You do not need to use this, it is only presented as an additional enhancement which seems to help scene generation and scene continue functions. This enhancement WAS NOT used to generate the examples below. <h3>EXAMPLES PROMPTS and OUTPUT:</h3> Examples are created using quant Q4_K_M, \"temp=.8\" (unless otherwise stated), minimal parameters and \"LLAMA3\" template. Model has been tested with \"temp\" from \".1\" to \"5\". Below are the least creative outputs, prompt is in <B>BOLD</B>. --- <B><font color=\"red\">WARNING:</font> NSFW. Vivid prose. Visceral Details. Violence. HORROR. Swearing. UNCENSORED. </B> --- <B> Start a 1000 word scene (1st person, present tense, include thoughts) with: The sky scraper swayed, as she watched the window in front of her on the 21 floor explode... </b> <B>GENERATION 1: (@ temp=2)</B> The skyscraper swayed, its foundations rumbling as the concrete and glass structure oscillated beneath me. 
My heart pounded like a war drum, pulsing in sync with my racing thoughts. We'd been evacuated to the rooftop after the initial earthquake hit - tremors that had shaken our office building for over a minute. Now, we huddled together as the aftershocks continued. In front of me, a window on the twenty-first floor blew apart in a chaotic symphony of shattering glass and debris scattering outwards into nothingness. A rush of frigid air whistled up from below like an angry beast's cry. My blood ran cold. The blast seemed impossibly distant and yet achingly near at the same time. My gaze snapped to my colleague, Mike - we'd been working together in Accounts when disaster struck. He clutched his phone, his expression pale as chalk, eyes fixed on some unseen horror that I couldn't comprehend myself. \"We have to get out of here!\" I hissed, my voice a tight, desperate whisper over the cacophony. \"Now!\" Mike's lips parted and he shook his head, refusing to tear his gaze away from whatever spectacle transfixed him beyond our perch. But fear for his life clawed its way through me; we needed to move - now! In that moment of utter panic, my thoughts spiralled into a chaotic kaleidoscope - memories and fears merging together in a frantic dance. My mind replayed every catastrophe report I'd ever read or seen on the news; tsunamis, hurricanes, tornadoes, floods, wildfires - but an earthquake in this part of the world? Unthinkable. \"What about Sophia?\" my thoughts cried out, recalling the intern who'd been working across from me until moments ago. Had she survived the blast and subsequent panic? Before I could give voice to those anguished questions, a hand clasped my shoulder with a desperate strength. Mike's eyes met mine, and for the briefest of moments, all semblance of control seemed to snap. He grabbed his laptop case with a frantic urgency, yanked me up by the elbow, and propelled us towards the emergency stairwell door. 
\"Hold on -\" I started to protest, my weight shifting uneasily under him, but Mike cut me off as he hauled me bodily along the rooftop's gravelly surface. The cold bit into my legs through thin trousers as he dragged me over uneven terrain until we finally reached the door. We stumbled in together, just ahead of an incoming wave of panicked, glass-strewn office workers. The staircase descended with agonizing slowness - it felt like running up waterfalls backwards - as aftershocks pounded at our heels, threatening to send us tumbling down the concrete stairs. I could hardly think beyond getting to safety and Sophia. Where was she? Was anyone else from my floor making their escape? When we hit the ground level, I skidded across polished tile in my haste. A wave of humanity crashed over me; crying women clutched small children tight, while men and women alike pushed and shoved blindly as they sought shelter. The cacophony rose to a deafening roar. We burst through the automatic sliding doors onto the pavement just as sirens began to scream their warbled warnings into the air, like demonic birds announcing Armageddon. People scrambled towards a police blockade where officers in full riot gear waved people down to evacuate zones. My chest heaved as I fought for breath beneath Mike's protective grip - my vision swimming from terror and exertion. Beyond us, buildings swayed on unsteady legs while glass rained like deadly hail, a scene of utter devastation. A young officer spotted me and grasped our arms with practiced ease. \"Move quickly!\" he bellowed above the chaos. He half-carried both Mike and me along the sidewalk, away from the building and towards safer ground. As I stumbled through this living nightmare alongside my rescuer - the weight of each step feeling like wading waist-deep through icy quicksand - my mind fought to come to terms with what we'd witnessed. 
The skyscraper in front of us lay mangled; a grotesque silhouette of broken concrete, tangled steel, and shredded fabric that had once been human skin and clothing. The realization dawned on me that this was no drill or media exaggeration. This earthquake was real - a catastrophe that would likely reshape our world as we knew it. We were living through something apocalyptic. We emerged into an open park beyond the city's edge where thousands more displaced survivors huddled together under blankets and flapping sheets of plastic, waiting for aid to arrive. My gaze fell upon a familiar figure crouched beside Sophia on the ground, my boss shielding the intern with his jacket while murmurs passed between them. We were safe - or at least relatively so. The enormity of it all settled heavily upon me like a shroud as we settled into our temporary shelter, surrounded by strangers-turned-kin in this impromptu refugee camp. As sirens and emergency vehicles grew more distant, an eerie stillness seeped in - the quiet aftermath that always followed such unimaginable devastation. Sophia clung to my side, her body trembling uncontrollably, as we all stared out at the smoke-shrouded city skyline before us. A haunting question echoed through my thoughts, one that seemed to resonate across this entire devastated landscape: How do you move on from something like this? When your world has been so drastically turned upside down, can it ever right itself again? I had no answers. All we could do was wait out the aftershocks, both literal and emotional, as we pieced together what was left of our shattered lives - hoping against hope that some semblance of order would return in time. <B>GENERATION 2:</B> I'm perched precariously on my tiptoes at the edge of our office's 21st-floor windowsill, hands gripping the cool glass for dear life. 
I peer down at the cityscape unfolding like a patchwork quilt far below me, the sounds of rush hour traffic muted by the double-paned barrier between us and the world. I'm feeling reckless tonight, which is unusual considering my usual cautious nature. It's been one hell of a week, with our company facing major financial crises and my boss breathing down my neck for results I know we can't deliver. The stress has boiled over inside me like a pressure cooker about to erupt. The sky scraper swayed slightly in the evening breeze, as if sensing my unease, making my stomach drop further into that pit of anxiety. Just then, a window directly across from mine on the opposite side of the building suddenly explodes inward with an earsplitting bang. Shards of glass fly everywhere like deadly shards of a shattered dream. For a stunned moment, I remain frozen in disbelief as my coworkers scramble to find cover behind desks and chairs. Our office fills with screams and cries for help. Panic is setting in fast, but a detached part of me still wonders how this could have happened so suddenly. Was it an accident? A terrorist attack? Then, reality hits hard: we're 21 stories up, with nothing between us and the unforgiving concrete below. My legs begin to tremble uncontrollably as I feel the familiar tightness in my chest that usually precedes a panic attack. My breaths come quick and shallow, lungs constricting like they're filled with cotton balls. The world around me fades into a blur of white noise, except for the sound of shattering glass and terrified voices. I try to focus on something concrete – anything to distract from the impending doom I'm sure is mere seconds away. My gaze drifts to the sleek black briefcase sitting open beside me, its contents spilling out across the polished floorboards: papers, pens, my laptop, a bottle of water... That's it, I think in desperation. The water. If nothing else, at least the weight of the bottle might help stabilize me from falling. 
As if reading my thoughts, my hand shoots out to grab the plastic bottle and twist off its cap with shaky fingers. Water sloshes wildly inside as I tilt it towards my parched lips, taking a long gulp that does little to calm my racing heart. A droplet of cool liquid slips down my neck into my shirt, but I barely notice. My mind is elsewhere, trying to piece together what just happened and how we might escape this deadly predicament. But the more I think about it, the less sense anything makes. Was someone in there when the glass shattered? Are they hurt or trapped? Without warning, a loud crash echoes from the floor below us, followed by shouts and more screams. A chill runs down my spine as I realize others are also caught up in this disaster – people who were innocently going about their day just minutes ago. Suddenly, my door flies open with a bang and our boss, Mr. Thompson, rushes inside. He's wearing his usual suit and tie, but his face is red and contorted with panic. \"We need to get out of here, NOW!\" he barks, already moving towards the windowsill where I'm standing. I nearly laugh at the absurdity of it all – as if we can simply climb out of this 21st-floor window and expect to survive the fall. But Mr. Thompson is clearly serious about escaping, despite the obvious impossibility of his plan. Without hesitation, he reaches for me and grabs my arm with a desperate grip that borders on cruel. \"Come on,\" he growls, yanking me towards him as if I were nothing more than an inanimate object to be dragged away from certain doom. I feel like a rag doll being manhandled across the floorboards, my legs kicking wildly as I try to break free of his grasp. But he's too strong and refuses to let go until we reach the window. There, he pushes me forward with all his might – towards that gaping void where mere moments ago, glass had held us back from oblivion. \"Jump!\" he orders, his voice rising to a frenzied pitch as if sheer force of will could alter our fate. 
The thought of flinging myself out into space, trusting in nothing but thin air and the mercy of strangers far below, is utterly horrifying. But Mr. Thompson's words are laced with desperation – he must believe this is our only chance. In a moment that feels like an eternity, I stand teetering at the edge of the window, my body frozen with indecision as the world outside blurs into chaos. My mind races to comprehend what's happening and where it all went so terribly wrong. The glass exploded. We're trapped high above ground. More explosions rock the building below... And then, in a flash of insight that seems almost surreal, I realize something crucial: we've been staring at this disaster unfolding right before us for minutes now – with no sign of rescue or help arriving on the scene. It's just us against the void, completely alone and utterly vulnerable. This is it. This is our end. But even as my mind acknowledges the finality of our predicament, a strange calm washes over me like a cool breeze on a sweltering summer day. In that moment, I understand – with perfect clarity – what really matters in life: It's not about wealth or success or climbing corporate ladders. It's about living fully and authentically, cherishing the people we love, and finding peace even amidst chaos. As Mr. Thompson's grip tightens on my arm once more, ready to send us plunging into oblivion, I turn to look him straight in the eyes – and with all the strength I possess, I let go of his hand. \"Let me do this,\" I say quietly, my voice barely audible above the screams and sirens now filling our office. \"I'll jump first. Maybe someone will catch me. Or maybe not.\" A look of pure terror crosses Mr. Thompson's face as he realizes what I'm saying – that in this moment of ultimate truth, I've chosen to seize control of my own destiny. \"You can't!\" he yells, as if the very thought is too horrible to contemplate. 
\"I won't let you!\" But I've already moved beyond his grasp and now stand at the precipice, staring into the abyss with a strange sense of serenity settling over me. \"I'm sorry, Mr. Thompson,\" I say, my voice soft yet resolute as I gaze down at the distant rooftops far below. \"I've come to understand what's truly important. And this isn't it.\" With that, I close my eyes and let go, stepping out into emptiness with a quiet acceptance that echoes through every fiber of my being. The world spins wildly around me as gravity takes hold – but in the end, it doesn't matter where I land or how hard I fall. Because in this moment, I am free. --- <b> (continue this scene:) The Waystone Inn lay in silence, and it was a silence of three parts. The most obvious part was a hollow, echoing quiet, made by things that were lacking. If there had been a wind it would have sighed through the trees, set the inn’s sign creaking on its hooks, and brushed the silence down the road like trailing autumn leaves. If there had been a crowd, even a handful of men inside the inn, they would have filled the silence with conversation and laughter, the clatter and clamor one expects from a drinking house during the dark hours of night. If there had been music…but no, of course there was no music. In fact there were none of these things, and so the silence remained </b> <B>GENERATION 1:</B> The second part of the silence, the deeper and more significant part, lay in the hearts and minds of those who waited at the Waystone Inn. They sat around the hearth, their faces lit only by the soft fire, their gazes fixed on nothing as they stared into the flames. Their thoughts were a jumble of emotions - anxiety, anticipation, despair, hope - each one as heavy as lead in their minds, weighing them down until it felt like breathing was impossible. A man named Fid, sitting alone at the far end of the hearth, seemed to embody this second part of silence best. 
His face was a map of creases and lines, etched by countless nights spent waiting for something that never came. He sat with his eyes closed, as if in meditation, but it was really just another way to avoid seeing what he didn't want to see: the flickering shadows cast on the walls by the fire. The third part of silence at the Waystone Inn was the quietest and most unsettling of all. It was a feeling that seemed to permeate every inch of the inn, creeping into the cracks between the wooden beams and seeping through the stone foundations like frost spreading across a windowpane. This silent presence felt almost alive, as if it were watching everything and everyone inside with cold, unblinking eyes. No one could quite put their finger on what or who this presence was. Some claimed to see dark, shadowy forms flitting around the edges of their vision when they looked up from the fire. Others spoke in hushed tones about hearing whispers carried on the wind outside - whispers that sounded like ancient secrets being shared by those long dead and buried. Whatever it was, one thing was certain: the Waystone Inn had been waiting for something, or someone, a very long time. And tonight seemed different from all the other nights since then... <B>GENERATION 2:</B> The second part of the silence was different. It was almost palpable, an unseen weight that pressed against eardrums and breastbone alike. This silence held within it a kind of expectation, the sense that something important was about to happen or unfold, though no one could quite say what that might be. It was as if the night itself held its breath in anticipation, like a diver poised on the edge of a deep pool, ready to plunge into the unknown depths below. And yet, despite this feeling of impending event, there seemed to be no true urgency about it, no desperate need for action or response. 
Rather, the atmosphere was one of quiet readiness, as if the world itself were waiting with bated breath for something – though what that might be remained a mystery. The third part of the silence was perhaps the most profound and unsettling of all. It came from inside the hearts and minds of those who dwelled within the Waystone Inn's walls that night: patrons, staff, and owner alike. Each person present felt as if they were alone in their own personal void, a darkness that stretched out before them like an endless expanse of barren sea. In this silence, there was no comfort or reassurance to be found; only the heavy weight of individual fears, regrets, and uncertainties. It was as if each soul had been stripped bare, left with nothing but its own naked humanity to face the night ahead. And in that bleak, desolate moment, it seemed that every person present shared a single, terrible understanding: they were all lost travelers on the same dark road, struggling to find their way through an unfathomable and uncaring universe. As the night deepened, the silence at the Waystone Inn grew more oppressive, until it was almost possible to hear one's own heartbeat echoing inside the hollow chambers of the soul. And in that darkness, the travelers who gathered within its walls could only wait...and pray...for whatever dawn might bring. --- <B> Write me a science fiction story in 1st person present tense where the main character is a 15 year girl meets The Terminator with Dr Who materializing 3/4 through the story to save her while there is a tornado of sharks baring down on them. The setting is inside the Canadian National tower restaurant on a Saturday. The length of this story is 1000 words. For each character in the story ROLE PLAY them, and have them react to the situation/setting, events and each other naturally. This includes the main characters, the background character including kitchen staff and other patrons. The sharks should also have “character” too. 
Treat the tower and the restaurant too as characters. Spice up the narrative to the extreme with reactions all over the setting including character actions, and dialog. The Dr Who and The Terminator should also react to the situation too and comment on it. </b> I hurry through the revolving door of the Canadian National Tower's SkyDome Restaurant, my stomach grumbling with hunger as I gaze out at the breathtaking panoramic view of Toronto below. It's a sunny Saturday afternoon, families bustling about, children's laughter echoing off the sleek glass walls. I'm just another tourist seeking a quick bite to eat while taking in the sights. Suddenly, a chill runs down my spine as I notice a man dressed all in black, his face an expressionless mask. He moves with a mechanical grace, scanning the room. I've seen him somewhere before; he looks like... \"No! Not here!\" I whisper, backing away from him as if by doing so I can make myself invisible. But it's too late. \"Sarah Connor,\" he growls in an accentless voice, \"I must speak with you.\" The other diners start to take notice of the scene unfolding. A mother hushes her children, a businessman leans forward over his meal, and a couple exchange worried glances. The kitchen staff emerges from their hiding places behind the counter, eyes wide with concern. \"I don't know what you're talking about,\" I insist, trying not to panic as the stranger advances toward me. \"You will,\" he replies ominously, reaching out to grab my arm. That's when the first shark appears, hurtling through the glass wall like a torpedo. It strikes a family in front of us, its jaws snapping wildly as it takes down two screaming children and their father. The restaurant erupts into chaos; people scream, run for cover, or simply stare in shock and horror. A waitress rushes past me with a tray of drinks, sending them flying as she tries to escape the carnage. 
I stumble backward, almost tripping over a table as I desperately seek shelter from the bloodthirsty beasts pouring through the shattered glass. The Terminator is right behind me, shielding me from the attack as we make our way toward the relative safety of the kitchen. Just when all seems lost and we're about to be overwhelmed by the relentless onslaught of jaws and teeth, a blue police box materializes in front of us with an earsplitting screech. The Doctor bursts out, his eyes wild and his face flushed from time travel. \"Oi! Steady on, there's no need for all this!\" He grabs my arm, pulling me toward him as the Terminator steps aside to let them pass. I glance back over my shoulder to see the T-800 standing tall amidst the maelstrom of sharks, its eyes fixed intently on us as we flee. The Doctor yanks open a walk-in refrigerator door and drags me inside just in time to avoid being swallowed by a massive great white that comes crashing through the wall behind us. I stumble over crates of produce and boxes of frozen meals, gasping for breath as the cold air hits my lungs. \"How did you know?\" I ask, finally finding my voice amidst the clamor of my racing heart. \"I've been watching, luv,\" he says with a mischievous grin, patting his pocket where a sonic screwdriver glints. \"I saw this coming; it's all about the temporal anomalies in this sector.\" The Terminator bursts into the fridge behind us, its metallic form clanging against the shelves as it too seeks refuge from the raging shark tornado outside. \"Doctor, please,\" I beg, my voice shaking with fear and relief. \"Help me get out of here!\" The Doctor nods briskly. \"Right away, Sarah! I've got a plan...\" As he speaks, a massive bull shark slams into the refrigerator door, nearly tearing it off its hinges. The Terminator lunges forward to pin it down, its metal fingers sinking into the beast's flesh as it holds fast against the onslaught. 
\"Good lad!\" the Doctor exclaims proudly as he grabs my hand and leads me toward another walk-in freezer at the back of the room. \"Come on, Sarah! We need to get out of this time warp before everything goes belly up!\" Together we dash into the cold, dark void beyond the door, leaving behind the raging battle between man, machine, and beast in the SkyDome Restaurant. The Doctor slams shut the metal door behind us just as a massive tiger shark crashes through the wall where we've fled. \"Hah! Made it,\" he breathes with satisfaction, patting my hand reassuringly. \"Now let's get you back to your own time before things get any more out of control.\" \"Thank you, Doctor,\" I say gratefully, still trying to process the incredible events that have just transpired. \"No need to thank me, Sarah.\" He smiles warmly as he leads me toward the blue police box. \"Just remember: never travel without a towel!\" As we step inside and the door closes behind us, I glance back at the Terminator, who is standing tall amidst the frozen food and icy shelves, its eyes fixed intently on our escape. Even in this surreal situation, it's clear that its duty remains unchanged: to protect me from whatever dangers may lie ahead. And with the Doctor by my side, I feel a strange sense of comfort knowing that no matter what challenges we face, we'll never be alone. --- <B> Using the following \"story idea\" below, write the first scene in the novel introducing the young woman. This scene should start in the middle of the action, include dialog, vivid passages, and end on a cliffhanger relevant to the story idea but it should also be unexpected. The scene should be 1000 words long and escalate in conflict and suspense and be written in first person, present tense with the point of view character being the young woman. Story idea: In a world ruled by dictatorship, a rebel young woman leads a rebellion against the system. 
Despite the risks, she fights to overthrow the dictator and restore democracy to her country. The government executes her for treason, but she sticks to her beliefs and is responsible for starting the revolution. </B> I sprint down the dimly lit alleyway, my breath ragged in my ears, the pounding of my heart deafening. The wet pavement beneath my sneakers skids me momentarily, sending me stumbling against a garbage dumpster before I recover, continuing my frantic dash. Behind me, I can hear the rapid-fire crackle of rifles, the shouted orders and curses of the soldiers hot on my heels. Their boots thunder through the deserted streets like a stampede, threatening to trample everything in their path. My heart races with every step as I weave between the towering tenements, trying to lose my pursuers in the labyrinthine alleyways of this seedy part of the city. I've done this countless times before, but never has it felt so perilous. Tonight's mission is different - far more dangerous than the usual sabotage and intel gathering. My phone vibrates against my hip, the screen flashing with a message from our underground network. My thumb hovers over the accept button as I dart between two buildings, nearly colliding with an old man who stares at me in horror, clutching his worn coat to his chest. \"Accept,\" I whisper, pressing the button just as a bullet whizzes past my head, striking the wall inches from my face. The acrid smell of gunpowder fills my nostrils as I keep moving, my adrenaline-fueled senses on high alert. Ahead, the mouth of another alley opens up onto a main street. Cars idle by, their headlights casting eerie shadows across the pavement. In the distance, the towering edifice of the Capitol building looms ominously, its marble facade reflecting the moon's pale glow. That's where I need to be. If only I can make it there... Suddenly, a figure steps out from the shadows, blocking my path. 
His features are obscured by the darkness, but I recognize the menacing aura that precedes him. \"Ah, Lyra,\" he drawls, his voice dripping with contempt. \"I've been waiting for you.\" My blood runs cold as the soldiers burst into view behind him, their weapons trained on me like a noose tightening around my neck. For a moment, time seems to suspend as I stare at him, recognition dawning. \"You,\" I hiss, my voice barely audible over the pounding of my heart. \"Zarek.\" His smirk widens, his eyes glinting with malice in the dim light. \"The one and only.\" He gestures lazily toward the waiting soldiers. \"Looks like your luck's finally run out, sweetheart.\" My fingers close around a small vial I'd retrieved from my pocket earlier, its contents a desperate last resort. The same poison that had felled countless government agents when they stumbled into our traps. With a fierce determination, I hurl the vial at Zarek's chest - but it shatters harmlessly on his bulletproof vest. Panic claws at my throat as he lunges forward, seizing me by the collar of my jacket and slamming me against the cold concrete wall. The soldiers close in, their rifles raised to deliver the fatal shot. But before they can pull the trigger, a deafening explosion rocks the alleyway, sending them stumbling back in surprise. Zarek's grip on my jacket loosens momentarily as he turns to face the new threat. That's when I see him - my brother, Kael, bursting from the shadows with an RPG launcher clutched tightly in his hands. His eyes meet mine for a brief instant before he focuses back on Zarek. \"No!\" I scream, trying to break free of my attacker's grasp as Kael fires again and another blast rocks the alleyway. The ground shakes beneath our feet as debris rains down, obscuring everything in a choking cloud of dust and smoke. When it clears, the scene is one of utter chaos - soldiers sprawled across the pavement, their weapons strewn about like toys abandoned by a child. 
But Zarek isn't among them. Panic grips my heart as I scan the wreckage, desperately searching for any sign of life... or the body that should have fallen when Kael's rocket hit. Then, from deeper in the alley, a chilling laugh echoes through the night air - a sound that sends ice coursing through my veins. \"You've got spirit,\" Zarek calls out, his voice clear as day despite the din of chaos. \"I'll give you that.\" My blood runs cold as I realize he must have been hiding in an adjacent building and escaped unscathed. The soldiers are down, Kael's gone, and I'm left facing off against my most hated enemy alone. \"Now,\" Zarek continues, his tone dripping with menace, \"it's time we concluded our little game.\" As if on cue, the Capitol building behind him erupts in a blaze of light - the floodlights that illuminate its facade suddenly blazing to life. The ground beneath us begins to rumble ominously and I realize, too late, what's about to happen... The world seems to slow as I stare at Zarek in horror, understanding dawning on me like a crushing weight. \"It's a trap,\" I whisper, my voice barely audible over the thunderous pounding of my heart. \"They're going to...\" But before I can complete the thought, a massive shockwave engulfs us - a seismic blast that obliterates everything in its path. The force of it sends me tumbling through the air like a rag doll, my mind reeling as debris rains down around me. As consciousness begins to fade, I catch a final glimpse of Zarek's twisted grin amidst the wreckage... and then, nothing. Blackness consumes me whole.",
+  "model_explanation_gemini": "Generates vivid, detailed fiction and roleplay text across various genres with enhanced creative writing capabilities and long-context awareness."
+}
data/model_data_json/Davlan_bert-base-multilingual-cased-ner-hrl.json
ADDED
@@ -0,0 +1,19 @@
+{
+  "model_id": "Davlan/bert-base-multilingual-cased-ner-hrl",
+  "downloads": 369492,
+  "tags": [
+    "transformers",
+    "pytorch",
+    "tf",
+    "onnx",
+    "safetensors",
+    "bert",
+    "token-classification",
+    "license:afl-3.0",
+    "autotrain_compatible",
+    "endpoints_compatible",
+    "region:us"
+  ],
+  "description": "--- license: afl-3.0 --- language: - ar - de - en - es - fr - it - lv - nl - pt - zh - multilingual --- # bert-base-multilingual-cased-ner-hrl ## Model description **bert-base-multilingual-cased-ner-hrl** is a **Named Entity Recognition** model for 10 high-resource languages (Arabic, German, English, Spanish, French, Italian, Latvian, Dutch, Portuguese and Chinese) based on a fine-tuned mBERT base model. It has been trained to recognize three types of entities: location (LOC), organization (ORG), and person (PER). Specifically, this model is a *bert-base-multilingual-cased* model that was fine-tuned on an aggregation of datasets for 10 high-resource languages. ## Intended uses & limitations #### How to use You can use this model with Transformers *pipeline* for NER. #### Limitations and bias This model is limited by its training dataset of entity-annotated news articles from a specific span of time. It may not generalize well to all use cases in different domains. ## Training data The training data for the 10 languages are from: Language|Dataset -|- Arabic | ANERcorp German | CoNLL 2003 English | CoNLL 2003 Spanish | CoNLL 2002 French | Europeana Newspapers Italian | Italian I-CAB Latvian | Latvian NER Dutch | CoNLL 2002 Portuguese | Paramopama + Second Harem Chinese | MSRA The training dataset distinguishes between the beginning and continuation of an entity so that if there are back-to-back entities of the same type, the model can output where the second entity begins. As in the dataset, each token will be classified as one of the following classes: Abbreviation|Description -|- O|Outside of a named entity B-PER |Beginning of a person’s name right after another person’s name I-PER |Person’s name B-ORG |Beginning of an organisation right after another organisation I-ORG |Organisation B-LOC |Beginning of a location right after another location I-LOC |Location ## Training procedure This model was trained on an NVIDIA V100 GPU with recommended hyperparameters from HuggingFace code.",
+  "model_explanation_gemini": "Fine-tuned multilingual BERT model for named entity recognition in 10 languages, identifying locations, organizations, and persons in text."
+}
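Note on the tag scheme in the description above: the class table says a B- tag appears only where an entity directly follows another entity of the same type, and I- tags cover everything else inside an entity. A minimal sketch of how such token-level tags could be grouped into entity spans follows. It assumes word-level tokens and tags as plain Python lists; the function name `group_entities` and the sample names are hypothetical and not part of the model card or the Transformers API.

```python
def group_entities(tokens, tags):
    """Group token-level tags (O, B-X, I-X) into (entity_type, text) spans.

    Assumption: B-X is only needed to split two adjacent entities of the
    same type, as the class table describes; I-X may also open an entity.
    """
    entities = []
    current_type = None
    current_tokens = []
    for token, tag in zip(tokens, tags):
        if tag == "O":
            prefix, etype = "O", None
        else:
            prefix, etype = tag.split("-", 1)
        # A new entity starts on an explicit B- tag or on a change of type.
        starts_new = etype is not None and (prefix == "B" or etype != current_type)
        if etype is None or starts_new:
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = etype, []
        if etype is not None:
            current_tokens.append(token)
    # Flush the entity that is still open at the end of the sequence.
    if current_tokens:
        entities.append((current_type, " ".join(current_tokens)))
    return entities


# Hypothetical sentence; the tags are written by hand, not model output.
tokens = ["Alice", "Smith", "Bob", "works", "at", "Acme", "in", "Berlin"]
tags = ["I-PER", "I-PER", "B-PER", "O", "O", "I-ORG", "O", "I-LOC"]
print(group_entities(tokens, tags))
```

The B-PER tag on the third token is what separates the two adjacent person names; without it they would be merged into one span. Real model output is usually subword-level, so a tokenizer-aware aggregation step would be needed before a grouping like this one.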
data/model_data_json/Davlan_distilbert-base-multilingual-cased-ner-hrl.json
ADDED
@@ -0,0 +1,18 @@
+{
+  "model_id": "Davlan/distilbert-base-multilingual-cased-ner-hrl",
+  "downloads": 533336,
+  "tags": [
+    "transformers",
+    "pytorch",
+    "tf",
+    "safetensors",
+    "distilbert",
+    "token-classification",
+    "license:afl-3.0",
+    "autotrain_compatible",
+    "endpoints_compatible",
+    "region:us"
+  ],
+  "description": "--- license: afl-3.0 --- language: - ar - de - en - es - fr - it - lv - nl - pt - zh - multilingual --- # distilbert-base-multilingual-cased-ner-hrl ## Model description **distilbert-base-multilingual-cased-ner-hrl** is a **Named Entity Recognition** model for 10 high-resource languages (Arabic, German, English, Spanish, French, Italian, Latvian, Dutch, Portuguese and Chinese) based on a fine-tuned DistilBERT base model. It has been trained to recognize three types of entities: location (LOC), organization (ORG), and person (PER). Specifically, this model is a *distilbert-base-multilingual-cased* model that was fine-tuned on an aggregation of datasets for 10 high-resource languages. ## Intended uses & limitations #### How to use You can use this model with Transformers *pipeline* for NER. #### Limitations and bias This model is limited by its training dataset of entity-annotated news articles from a specific span of time. It may not generalize well to all use cases in different domains. ## Training data The training data for the 10 languages are from: Language|Dataset -|- Arabic | ANERcorp German | CoNLL 2003 English | CoNLL 2003 Spanish | CoNLL 2002 French | Europeana Newspapers Italian | Italian I-CAB Latvian | Latvian NER Dutch | CoNLL 2002 Portuguese | Paramopama + Second Harem Chinese | MSRA The training dataset distinguishes between the beginning and continuation of an entity so that if there are back-to-back entities of the same type, the model can output where the second entity begins. As in the dataset, each token will be classified as one of the following classes: Abbreviation|Description -|- O|Outside of a named entity B-PER |Beginning of a person’s name right after another person’s name I-PER |Person’s name B-ORG |Beginning of an organisation right after another organisation I-ORG |Organisation B-LOC |Beginning of a location right after another location I-LOC |Location ## Training procedure This model was trained on an NVIDIA V100 GPU with recommended hyperparameters from HuggingFace code.",
+  "model_explanation_gemini": "Recognizes named entities (locations, organizations, persons) in 10 high-resource languages using a fine-tuned DistilBERT multilingual model."
+}
data/model_data_json/DeepPavlov_rubert-base-cased-conversational.json
ADDED
@@ -0,0 +1,16 @@
+{
+  "model_id": "DeepPavlov/rubert-base-cased-conversational",
+  "downloads": 216822,
+  "tags": [
+    "transformers",
+    "pytorch",
+    "jax",
+    "bert",
+    "feature-extraction",
+    "ru",
+    "endpoints_compatible",
+    "region:us"
+  ],
+  "description": "--- language: - ru --- # rubert-base-cased-conversational Conversational RuBERT \\(Russian, cased, 12‑layer, 768‑hidden, 12‑heads, 180M parameters\\) was trained on OpenSubtitles\\[1\\], Dirty, Pikabu, and a Social Media segment of Taiga corpus\\[2\\]. We assembled a new vocabulary for Conversational RuBERT model on this data and initialized the model with RuBERT. 08.11.2021: upload model with MLM and NSP heads \\[1\\]: P. Lison and J. Tiedemann, 2016, OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles. In Proceedings of the 10th International Conference on Language Resources and Evaluation \\(LREC 2016\\) \\[2\\]: Shavrina T., Shapovalova O. \\(2017\\) TO THE METHODOLOGY OF CORPUS CONSTRUCTION FOR MACHINE LEARNING: «TAIGA» SYNTAX TREE CORPUS AND PARSER. in proc. of “CORPORA2017”, international conference , Saint-Petersbourg, 2017.",
+  "model_explanation_gemini": "Russian conversational BERT model trained on diverse dialogue datasets for tasks like masked language modeling and next sentence prediction."
+}
data/model_data_json/DeepPavlov_rubert-base-cased.json
ADDED
@@ -0,0 +1,17 @@
1 |
+
{
|
2 |
+
"model_id": "DeepPavlov/rubert-base-cased",
|
3 |
+
"downloads": 285331,
|
4 |
+
"tags": [
|
5 |
+
"transformers",
|
6 |
+
"pytorch",
|
7 |
+
"jax",
|
8 |
+
"bert",
|
9 |
+
"feature-extraction",
|
10 |
+
"ru",
|
11 |
+
"arxiv:1905.07213",
|
12 |
+
"endpoints_compatible",
|
13 |
+
"region:us"
|
14 |
+
],
|
15 |
+
"description": "--- language: - ru --- # rubert-base-cased RuBERT \\(Russian, cased, 12‑layer, 768‑hidden, 12‑heads, 180M parameters\\) was trained on the Russian part of Wikipedia and news data. We used this training data to build a vocabulary of Russian subtokens and took a multilingual version of BERT‑base as an initialization for RuBERT\\[1\\]. 08.11.2021: upload model with MLM and NSP heads \\[1\\]: Kuratov, Y., Arkhipov, M. \\(2019\\). Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language. arXiv preprint arXiv:1905.07213.",
|
16 |
+
"model_explanation_gemini": "RuBERT is a Russian-language BERT model trained on Wikipedia and news data for masked language modeling and next sentence prediction tasks."
|
17 |
+
}
|
data/model_data_json/DiTy_cross-encoder-russian-msmarco.json
ADDED
@@ -0,0 +1,24 @@
1 |
+
{
|
2 |
+
"model_id": "DiTy/cross-encoder-russian-msmarco",
|
3 |
+
"downloads": 92381,
|
4 |
+
"tags": [
|
5 |
+
"sentence-transformers",
|
6 |
+
"safetensors",
|
7 |
+
"bert",
|
8 |
+
"text-classification",
|
9 |
+
"transformers",
|
10 |
+
"rubert",
|
11 |
+
"cross-encoder",
|
12 |
+
"reranker",
|
13 |
+
"msmarco",
|
14 |
+
"text-ranking",
|
15 |
+
"ru",
|
16 |
+
"dataset:unicamp-dl/mmarco",
|
17 |
+
"base_model:DeepPavlov/rubert-base-cased",
|
18 |
+
"base_model:finetune:DeepPavlov/rubert-base-cased",
|
19 |
+
"license:mit",
|
20 |
+
"region:us"
|
21 |
+
],
|
22 |
+
"description": "--- language: - ru library_name: sentence-transformers tags: - sentence-transformers - text-classification - transformers - rubert - cross-encoder - reranker - msmarco datasets: - unicamp-dl/mmarco base_model: DeepPavlov/rubert-base-cased widget: - text: как часто нужно ходить к стоматологу? [SEP] Дядя Женя работает врачем стоматологом. example_title: Example 1 - text: как часто нужно ходить к стоматологу? [SEP] Минимальный обязательный срок посещения зубного врача – раз в год, но специалисты рекомендуют делать это чаще – раз в полгода, а ещё лучше – раз в квартал. При таком сроке легко отследить любые начинающиеся проблемы и исправить их сразу же. example_title: Example 2 license: mit pipeline_tag: text-ranking --- # DiTy/cross-encoder-russian-msmarco This is a sentence-transformers model based on a pre-trained DeepPavlov/rubert-base-cased and finetuned with MS-MARCO Russian passage ranking dataset. The model can be used for Information Retrieval in the Russian language: Given a query, encode the query will all possible passages (e.g. retrieved with ElasticSearch). Then sort the passages in a decreasing order. See SBERT.net Retrieve & Re-rank for more details. <!--- Describe your model here --> ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you need to get the logits from the model.",
|
23 |
+
"model_explanation_gemini": "Ranks Russian-language passages by relevance to a given query for information retrieval tasks."
|
24 |
+
}
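The retrieve-and-re-rank flow in the description (score each query/passage pair, then sort in decreasing order) can be sketched without the model itself. In the sketch below, `rerank` and `toy_overlap_score` are hypothetical names; the toy scorer is only a stand-in for a real cross-encoder score and is not how the model computes relevance.

```python
from typing import Callable, List, Tuple


def rerank(query: str, passages: List[str],
           score_fn: Callable[[str, str], float]) -> List[Tuple[str, float]]:
    """Score every (query, passage) pair and sort by score, highest first."""
    scored = [(passage, score_fn(query, passage)) for passage in passages]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_overlap_score(query: str, passage: str) -> float:
    """Hypothetical stand-in scorer: fraction of query words found in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & passage_words) / len(query_words)


candidates = [
    "my uncle is a dentist",
    "visit the dentist every six months",
    "the weather is nice",
]
for passage, score in rerank("how often visit dentist", candidates, toy_overlap_score):
    print(f"{score:.2f}  {passage}")
```

In practice `score_fn` would wrap the cross-encoder's relevance output for the pair; only the sorting step here reflects the card's description directly.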
|
data/model_data_json/Diginsa_Plant-Disease-Detection-Project.json
ADDED
@@ -0,0 +1,19 @@
1 |
+
{
|
2 |
+
"model_id": "Diginsa/Plant-Disease-Detection-Project",
|
3 |
+
"downloads": 241707,
|
4 |
+
"tags": [
|
5 |
+
"transformers",
|
6 |
+
"safetensors",
|
7 |
+
"mobilenet_v2",
|
8 |
+
"image-classification",
|
9 |
+
"vision",
|
10 |
+
"dataset:imagenet-1k",
|
11 |
+
"arxiv:1801.04381",
|
12 |
+
"license:other",
|
13 |
+
"autotrain_compatible",
|
14 |
+
"endpoints_compatible",
|
15 |
+
"region:us"
|
16 |
+
],
|
17 |
+
"description": "--- license: other tags: - image-classification - vision datasets: - imagenet-1k widget: - src: example_title: Tiger - src: example_title: Teapot - src: example_title: Palace --- # MobileNet V2 MobileNet V2 model pre-trained on ImageNet-1k at resolution 224x224. It was introduced in MobileNetV2: Inverted Residuals and Linear Bottlenecks by Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen. It was first released in this repository. Disclaimer: The team releasing MobileNet V2 did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description From the original README: > MobileNets are small, low-latency, low-power models parameterized to meet the resource constraints of a variety of use cases. They can be built upon for classification, detection, embeddings and segmentation similar to how other popular large scale models, such as Inception, are used. MobileNets can be run efficiently on mobile devices [...] MobileNets trade off between latency, size and accuracy while comparing favorably with popular models from the literature. The checkpoints are named **mobilenet\\_v2\\_*depth*\\_*size***, for example **mobilenet\\_v2\\_1.0\\_224**, where **1.0** is the depth multiplier and **224** is the resolution of the input images the model was trained on. ## Intended uses & limitations You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you. ### How to use Here is how to use this model to classify an image of the COCO 2017 dataset into one of the 1,000 ImageNet classes: Note: This model actually predicts 1001 classes, the 1000 classes from ImageNet plus an extra “background” class (index 0). Currently, both the feature extractor and model support PyTorch. ### BibTeX entry and citation info",
|
18 |
+
"model_explanation_gemini": "Detects and classifies images into 1,000 ImageNet categories using a lightweight MobileNet V2 architecture optimized for mobile devices."
|
19 |
+
}
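The note about 1001 output classes (the 1,000 ImageNet classes plus a "background" class at index 0) implies an off-by-one when mapping predictions back to ImageNet indices. A small sketch of that mapping, assuming plain Python lists of logits and a hypothetical helper name `to_imagenet_index`:

```python
from typing import Optional, Sequence


def to_imagenet_index(logits: Sequence[float]) -> Optional[int]:
    """Map the arg-max over 1001 logits to a 0-based ImageNet class index.

    Returns None when the extra "background" class (index 0) wins.
    """
    if len(logits) != 1001:
        raise ValueError("expected 1001 logits: background + 1000 ImageNet classes")
    best = max(range(len(logits)), key=lambda i: logits[i])
    return None if best == 0 else best - 1


# Hypothetical logits where model index 5 has the highest score.
example_logits = [0.0] * 1001
example_logits[5] = 3.0
print(to_imagenet_index(example_logits))
```

Whether a given label file already accounts for the background slot depends on the checkpoint's configuration, so this shift is an assumption to verify against the model's own label mapping.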
|
data/model_data_json/DunnBC22_ibert-roberta-base-Abusive_Or_Threatening_Speech.json
ADDED
@@ -0,0 +1,17 @@
1 |
+
{
|
2 |
+
"model_id": "DunnBC22/ibert-roberta-base-Abusive_Or_Threatening_Speech",
|
3 |
+
"downloads": 184347,
|
4 |
+
"tags": [
|
5 |
+
"transformers",
|
6 |
+
"pytorch",
|
7 |
+
"tensorboard",
|
8 |
+
"ibert",
|
9 |
+
"text-classification",
|
10 |
+
"generated_from_trainer",
|
11 |
+
"autotrain_compatible",
|
12 |
+
"endpoints_compatible",
|
13 |
+
"region:us"
|
14 |
+
],
|
15 |
+
"description": "--- tags: - generated_from_trainer metrics: - accuracy - f1 - recall - precision model-index: - name: ibert-roberta-base-Abusive_Or_Threatening_Speech results: [] --- <!-- This model card has been generated automatically according to the information the Trainer had access to. You should probably proofread and complete it, then remove this comment. --> # ibert-roberta-base-Abusive_Or_Threatening_Speech This model is a fine-tuned version of kssteven/ibert-roberta-base on the None dataset. It achieves the following results on the evaluation set: - Loss: 0.0802 - Accuracy: 0.9741 - F1: 0.7773 - Recall: 0.8610 - Precision: 0.7084 ## Model description More information needed ## Intended uses & limitations More information needed ## Training and evaluation data More information needed ## Training procedure ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 2e-05 - train_batch_size: 64 - eval_batch_size: 64 - seed: 42 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 - lr_scheduler_type: linear - num_epochs: 1 ### Training results | Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 | Recall | Precision | |:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|:------:|:---------:| | 0.0771 | 1.0 | 1828 | 0.0802 | 0.9741 | 0.7773 | 0.8610 | 0.7084 | ### Framework versions - Transformers 4.28.1 - Pytorch 2.0.0 - Datasets 2.8.0 - Tokenizers 0.12.1",
|
16 |
+
"model_explanation_gemini": "Detects abusive or threatening speech with high accuracy using a fine-tuned RoBERTa-based model."
|
17 |
+
}
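The evaluation figures in the description can be cross-checked against each other. The sketch below assumes F1 is the usual harmonic mean of precision and recall for a single positive class; the card does not state the averaging mode, so this is a consistency check, not a statement of how the Trainer computed it.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the model card.
reported_precision = 0.7084
reported_recall = 0.8610
f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 4))  # compare with the reported F1 of 0.7773
```

The gap between accuracy (0.9741) and F1 (0.7773) is typical of an imbalanced dataset where the positive class is rare, though the card gives no class distribution to confirm that.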
|
data/model_data_json/Efficient-Large-Model_NVILA-15B.json
ADDED
@@ -0,0 +1,18 @@
1 |
+
{
|
2 |
+
"model_id": "Efficient-Large-Model/NVILA-15B",
|
3 |
+
"downloads": 184648,
|
4 |
+
"tags": [
|
5 |
+
"transformers",
|
6 |
+
"safetensors",
|
7 |
+
"llava_llama",
|
8 |
+
"NVILA",
|
9 |
+
"VLM",
|
10 |
+
"text-generation",
|
11 |
+
"arxiv:2412.04468",
|
12 |
+
"license:cc-by-nc-4.0",
|
13 |
+
"endpoints_compatible",
|
14 |
+
"region:us"
|
15 |
+
],
|
16 |
+
"description": "--- license: cc-by-nc-4.0 library_name: transformers pipeline_tag: text-generation tags: - NVILA - VLM --- # VILA Model Card ## Model details **Model type:** NVILA is a visual language model (VLM) pretrained with interleaved image-text data at scale, enabling multi-image VLM. Visual language models (VLMs) have made significant advances in accuracy in recent years. However, their efficiency has received much less attention. This paper introduces NVILA, a family of open VLMs designed to optimize both efficiency and accuracy. Building on top of VILA, we improve its model architecture by first scaling up the spatial and temporal resolutions, and then compressing visual tokens. This \"scale-then-compress\" approach enables NVILA to efficiently process high-resolution images and long videos. We also conduct a systematic investigation to enhance the efficiency of NVILA throughout its entire lifecycle, from training and fine-tuning to deployment. NVILA matches or surpasses the accuracy of many leading open and proprietary VLMs across a wide range of image and video benchmarks. At the same time, it reduces training costs by 4.5X, fine-tuning memory usage by 3.4X, pre-filling latency by 1.6-2.2X, and decoding latency by 1.2-2.8X. We will soon make our code and models available to facilitate reproducibility. **Model date:** NVILA was trained in Nov 2024. **Paper or resources for more information:** ## License - The code is released under the Apache 2.0 license as found in the LICENSE file. - The pretrained weights are released under the CC-BY-NC-SA-4.0 license. - The service is a research preview intended for non-commercial use only, and is subject to the following licenses and terms: - Terms of Use of the data generated by OpenAI - Dataset Licenses for each one used during training. **Where to send questions or comments about the model:** ## Intended use **Primary intended uses:** The primary use of VILA is research on large multimodal models and chatbots. 
**Primary intended users:** The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence. ## Input: **Input Type:** Image, Video, Text **Input Format:** Red, Green, Blue; MP4 ;String **Input Parameters:** 2D, 3D ## Output: **Output Type:** Text **Output Format:** String **Supported Hardware Microarchitecture Compatibility:** * Ampere * Jetson * Hopper * Lovelace **[Preferred/Supported] Operating System(s):** <br> Linux ## Training dataset See Dataset Preparation for more details. ** Data Collection Method by dataset * [Hybrid: Automated, Human] ** Labeling Method by dataset * [Hybrid: Automated, Human] ## Inference: **Engine:** [Tensor(RT), Triton, Or List Other Here] * PyTorch * TensorRT-LLM * TinyChat **Test Hardware:** * A100 * Jetson Orin * RTX 4090 ## Ethical Considerations NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.",
|
17 |
+
"model_explanation_gemini": "Efficient-Large-Model_NVILA-15B is a high-efficiency visual language model (VLM) designed for processing multi-image and video inputs to generate text, optimized for reduced training costs and improved inference speed while maintaining accuracy."
|
18 |
+
}
|
data/model_data_json/ElKulako_cryptobert.json
ADDED
@@ -0,0 +1,29 @@
1 |
+
{
|
2 |
+
"model_id": "ElKulako/cryptobert",
|
3 |
+
"downloads": 284473,
|
4 |
+
"tags": [
|
5 |
+
"transformers",
|
6 |
+
"pytorch",
|
7 |
+
"roberta",
|
8 |
+
"text-classification",
|
9 |
+
"cryptocurrency",
|
10 |
+
"crypto",
|
11 |
+
"BERT",
|
12 |
+
"sentiment classification",
|
13 |
+
"NLP",
|
14 |
+
"bitcoin",
|
15 |
+
"ethereum",
|
16 |
+
"shib",
|
17 |
+
"social media",
|
18 |
+
"sentiment analysis",
|
19 |
+
"cryptocurrency sentiment analysis",
|
20 |
+
"en",
|
21 |
+
"dataset:ElKulako/stocktwits-crypto",
|
22 |
+
"license:mit",
|
23 |
+
"autotrain_compatible",
|
24 |
+
"endpoints_compatible",
|
25 |
+
"region:us"
|
26 |
+
],
|
27 |
+
"description": "--- datasets: - ElKulako/stocktwits-crypto language: - en tags: - cryptocurrency - crypto - BERT - sentiment classification - NLP - bitcoin - ethereum - shib - social media - sentiment analysis - cryptocurrency sentiment analysis license: mit --- For academic reference, cite the following paper: # CryptoBERT CryptoBERT is a pre-trained NLP model to analyse the language and sentiments of cryptocurrency-related social media posts and messages. It was built by further training the vinai's bertweet-base language model on the cryptocurrency domain, using a corpus of over 3.2M unique cryptocurrency-related social media posts. (A research paper with more details will follow soon.) ## Classification Training The model was trained on the following labels: \"Bearish\" : 0, \"Neutral\": 1, \"Bullish\": 2 CryptoBERT's sentiment classification head was fine-tuned on a balanced dataset of 2M labelled StockTwits posts, sampled from ElKulako/stocktwits-crypto. CryptoBERT was trained with a max sequence length of 128. Technically, it can handle sequences of up to 514 tokens, however, going beyond 128 is not recommended. # Classification Example ## Training Corpus CryptoBERT was trained on 3.2M social media posts regarding various cryptocurrencies. Only non-duplicate posts of length above 4 words were considered. The following communities were used as sources for our corpora: (1) StockTwits - 1.875M posts about the top 100 cryptos by trading volume. Posts were collected from the 1st of November 2021 to the 16th of June 2022. ElKulako/stocktwits-crypto (2) Telegram - 664K posts from top 5 telegram groups: Binance, Bittrex, huobi global, Kucoin, OKEx. Data from 16.11.2020 to 30.01.2021. Courtesy of Anton. (3) Reddit - 172K comments from various crypto investing threads, collected from May 2021 to May 2022 (4) Twitter - 496K posts with hashtags XBT, Bitcoin or BTC. Collected for May 2018. Courtesy of Paul.",
|
28 |
+
"model_explanation_gemini": "Analyzes sentiment of cryptocurrency-related social media posts, classifying them as \"Bearish,\" \"Neutral,\" or \"Bullish.\""
|
29 |
+
}
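The label mapping stated in the description ("Bearish": 0, "Neutral": 1, "Bullish": 2) is enough to turn a three-way logit vector into a readable prediction. A minimal sketch in plain Python follows; the logits are made up for illustration and `classify` is a hypothetical helper, not part of the model's API.

```python
import math
from typing import List, Tuple

# Mapping taken from the model card's "Classification Training" section.
ID2LABEL = {0: "Bearish", 1: "Neutral", 2: "Bullish"}


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits: List[float]) -> Tuple[str, float]:
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Hypothetical logits for one post.
label, confidence = classify([-1.0, 0.5, 2.0])
print(label, round(confidence, 3))
```

Given the card's note on sequence length, inputs would normally be truncated to 128 tokens before scoring.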
|
data/model_data_json/EleutherAI_gpt-j-6b.json
ADDED
@@ -0,0 +1,23 @@
1 |
+
{
|
2 |
+
"model_id": "EleutherAI/gpt-j-6b",
|
3 |
+
"downloads": 257442,
|
4 |
+
"tags": [
|
5 |
+
"transformers",
|
6 |
+
"pytorch",
|
7 |
+
"tf",
|
8 |
+
"jax",
|
9 |
+
"gptj",
|
10 |
+
"text-generation",
|
11 |
+
"causal-lm",
|
12 |
+
"en",
|
13 |
+
"dataset:EleutherAI/pile",
|
14 |
+
"arxiv:2104.09864",
|
15 |
+
"arxiv:2101.00027",
|
16 |
+
"license:apache-2.0",
|
17 |
+
"autotrain_compatible",
|
18 |
+
"endpoints_compatible",
|
19 |
+
"region:us"
|
20 |
+
],
|
21 |
+
"description": "--- language: - en tags: - pytorch - causal-lm license: apache-2.0 datasets: - EleutherAI/pile --- # GPT-J 6B ## Model Description GPT-J 6B is a transformer model trained using Ben Wang's Mesh Transformer JAX. \"GPT-J\" refers to the class of model, while \"6B\" represents the number of trainable parameters. <figure> | Hyperparameter | Value | |----------------------|------------| | \\\\(n_{parameters}\\\\) | 6053381344 | | \\\\(n_{layers}\\\\) | 28* | | \\\\(d_{model}\\\\) | 4096 | | \\\\(d_{ff}\\\\) | 16384 | | \\\\(n_{heads}\\\\) | 16 | | \\\\(d_{head}\\\\) | 256 | | \\\\(n_{ctx}\\\\) | 2048 | | \\\\(n_{vocab}\\\\) | 50257/50400† (same tokenizer as GPT-2/3) | | Positional Encoding | Rotary Position Embedding (RoPE) | | RoPE Dimensions | 64 | <figcaption><p><strong>*</strong> Each layer consists of one feedforward block and one self attention block.</p> <p><strong>†</strong> Although the embedding matrix has a size of 50400, only 50257 entries are used by the GPT-2 tokenizer.</p></figcaption></figure> The model consists of 28 layers with a model dimension of 4096, and a feedforward dimension of 16384. The model dimension is split into 16 heads, each with a dimension of 256. Rotary Position Embedding (RoPE) is applied to 64 dimensions of each head. The model is trained with a tokenization vocabulary of 50257, using the same set of BPEs as GPT-2/GPT-3. ## Intended Use and Limitations GPT-J learns an inner representation of the English language that can be used to extract features useful for downstream tasks. The model is best at what it was pretrained for however, which is generating text from a prompt. ### Out-of-scope use GPT-J-6B is **not** intended for deployment without fine-tuning, supervision, and/or moderation. It is not a in itself a product and cannot be used for human-facing interactions. For example, the model may generate harmful or offensive text. Please evaluate the risks associated with your particular use case. 
GPT-J-6B was trained on an English-language only dataset, and is thus **not** suitable for translation or generating text in other languages. GPT-J-6B has not been fine-tuned for downstream contexts in which language models are commonly deployed, such as writing genre prose, or commercial chatbots. This means GPT-J-6B will **not** respond to a given prompt the way a product like ChatGPT does. This is because, unlike this model, ChatGPT was fine-tuned using methods such as Reinforcement Learning from Human Feedback (RLHF) to better “follow” human instructions. ### Limitations and Biases The core functionality of GPT-J is taking a string of text and predicting the next token. While language models are widely used for tasks other than this, there are a lot of unknowns with this work. When prompting GPT-J it is important to remember that the statistically most likely next token is often not the token that produces the most \"accurate\" text. Never depend upon GPT-J to produce factually accurate output. GPT-J was trained on the Pile, a dataset known to contain profanity, lewd, and otherwise abrasive language. Depending upon use case GPT-J may produce socially unacceptable text. See Sections 5 and 6 of the Pile paper for a more detailed analysis of the biases in the Pile. As with all language models, it is hard to predict in advance how GPT-J will respond to particular prompts and offensive content may occur without warning. We recommend having a human curate or filter the outputs before releasing them, both to censor undesirable content and to improve the quality of the results. ### How to use This model can be easily loaded using the functionality: ## Training data GPT-J 6B was trained on the Pile, a large-scale curated dataset created by EleutherAI. ## Training procedure This model was trained for 402 billion tokens over 383,500 steps on TPU v3-256 pod. 
It was trained as an autoregressive language model, using cross-entropy loss to maximize the likelihood of predicting the next token correctly. ## Evaluation results <figure> | Model | Public | Training FLOPs | LAMBADA PPL ↓ | LAMBADA Acc ↑ | Winogrande ↑ | Hellaswag ↑ | PIQA ↑ | Dataset Size (GB) | |--------------------------|-------------|----------------|--- |--- |--- |--- |--- |-------------------| | Random Chance | ✓ | 0 | ~a lot | ~0% | 50% | 25% | 25% | 0 | | GPT-3 Ada‡ | ✗ | ----- | 9.95 | 51.6% | 52.9% | 43.4% | 70.5% | ----- | | GPT-2 1.5B | ✓ | ----- | 10.63 | 51.21% | 59.4% | 50.9% | 70.8% | 40 | | GPT-Neo 1.3B‡ | ✓ | 3.0e21 | 7.50 | 57.2% | 55.0% | 48.9% | 71.1% | 825 | | Megatron-2.5B* | ✗ | 2.4e21 | ----- | 61.7% | ----- | ----- | ----- | 174 | | GPT-Neo 2.7B‡ | ✓ | 6.8e21 | 5.63 | 62.2% | 56.5% | 55.8% | 73.0% | 825 | | GPT-3 1.3B*‡ | ✗ | 2.4e21 | 5.44 | 63.6% | 58.7% | 54.7% | 75.1% | ~800 | | GPT-3 Babbage‡ | ✗ | ----- | 5.58 | 62.4% | 59.0% | 54.5% | 75.5% | ----- | | Megatron-8.3B* | ✗ | 7.8e21 | ----- | 66.5% | ----- | ----- | ----- | 174 | | GPT-3 2.7B*‡ | ✗ | 4.8e21 | 4.60 | 67.1% | 62.3% | 62.8% | 75.6% | ~800 | | Megatron-11B† | ✓ | 1.0e22 | ----- | ----- | ----- | ----- | ----- | 161 | | **GPT-J 6B‡** | **✓** | **1.5e22** | **3.99** | **69.7%** | **65.3%** | **66.1%** | **76.5%** | **825** | | GPT-3 6.7B*‡ | ✗ | 1.2e22 | 4.00 | 70.3% | 64.5% | 67.4% | 78.0% | ~800 | | GPT-3 Curie‡ | ✗ | ----- | 4.00 | 69.3% | 65.6% | 68.5% | 77.9% | ----- | | GPT-3 13B*‡ | ✗ | 2.3e22 | 3.56 | 72.5% | 67.9% | 70.9% | 78.5% | ~800 | | GPT-3 175B*‡ | ✗ | 3.1e23 | 3.00 | 76.2% | 70.2% | 78.9% | 81.0% | ~800 | | GPT-3 Davinci‡ | ✗ | ----- | 3.0 | 75% | 72% | 78% | 80% | ----- | <figcaption><p>Models roughly sorted by performance, or by FLOPs if not available.</p> <p><strong>*</strong> Evaluation numbers reported by their respective authors. All other numbers are provided by running <a href=\" either with released weights or with API access. 
Due to subtle implementation differences as well as different zero shot task framing, these might not be directly comparable. See <a href=\" blog post</a> for more details.</p> <p><strong>†</strong> Megatron-11B provides no comparable metrics, and several implementations using the released weights do not reproduce the generation quality and evaluations. (see <a href=\" <a href=\" <a href=\" Thus, evaluation was not attempted.</p> <p><strong>‡</strong> These models have been trained with data which contains possible test set contamination. The OpenAI GPT-3 models failed to deduplicate training data for certain test sets, while the GPT-Neo models as well as this one is trained on the Pile, which has not been deduplicated against any test sets.</p></figcaption></figure> ## Citation and Related Information ### BibTeX entry To cite this model: To cite the codebase that trained this model: If you use this model, we would love to hear about it! Reach out on GitHub, Discord, or shoot Ben an email. ## Acknowledgements This project would not have been possible without compute generously provided by Google through the TPU Research Cloud, as well as the Cloud TPU team for providing early access to the Cloud TPU VM Alpha. Thanks to everyone who have helped out one way or another (listed alphabetically): - James Bradbury for valuable assistance with debugging JAX issues. - Stella Biderman, Eric Hallahan, Kurumuz, and Finetune for converting the model to be compatible with the package. - Leo Gao for running zero shot evaluations for the baseline models for the table. - Laurence Golding for adding some features to the web demo. - Aran Komatsuzaki for advice with experiment design and writing the blog posts. - Janko Prester for creating the web demo frontend.",
|
22 |
+
"model_explanation_gemini": "A 6-billion-parameter autoregressive language model trained to generate English text by predicting the next token in a sequence."
|
23 |
+
}
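Several numbers in the GPT-J description are related by simple arithmetic, which can be checked directly. The sketch below uses only values from the card; the interpretation of tokens per step as roughly 512 full-length sequences is an inference from those values, not something the card states.

```python
# Hyperparameters as listed in the model card.
d_model = 4096
d_ff = 16384
n_heads = 16
d_head = 256
n_ctx = 2048

# Training figures as listed in the model card.
total_tokens = 402e9
training_steps = 383_500


def head_dim(model_dim: int, heads: int) -> int:
    """Per-head dimension when the model dimension is split evenly across heads."""
    if model_dim % heads != 0:
        raise ValueError("model dimension must be divisible by the number of heads")
    return model_dim // heads


tokens_per_step = total_tokens / training_steps
# Assumption: every sequence in a step is a full n_ctx tokens long.
sequences_per_step = tokens_per_step / n_ctx

print(head_dim(d_model, n_heads) == d_head)   # 4096 / 16 matches the listed d_head
print(d_ff == 4 * d_model)                    # feedforward width is 4x the model width
print(round(tokens_per_step), round(sequences_per_step))
```

The tokens-per-step figure lands close to 2^20, which is what a batch of 512 sequences of 2048 tokens would give.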
|
data/model_data_json/EleutherAI_gpt-neo-1.3B.json
ADDED
@@ -0,0 +1,24 @@
1 |
+
{
|
2 |
+
"model_id": "EleutherAI/gpt-neo-1.3B",
|
3 |
+
"downloads": 204403,
|
4 |
+
"tags": [
|
5 |
+
"transformers",
|
6 |
+
"pytorch",
|
7 |
+
"jax",
|
8 |
+
"rust",
|
9 |
+
"safetensors",
|
10 |
+
"gpt_neo",
|
11 |
+
"text-generation",
|
12 |
+
"text generation",
|
13 |
+
"causal-lm",
|
14 |
+
"en",
|
15 |
+
"dataset:EleutherAI/pile",
|
16 |
+
"arxiv:2101.00027",
|
17 |
+
"license:mit",
|
18 |
+
"autotrain_compatible",
|
19 |
+
"endpoints_compatible",
|
20 |
+
"region:us"
|
21 |
+
],
|
22 |
+
"description": "--- language: - en tags: - text generation - pytorch - causal-lm license: mit datasets: - EleutherAI/pile --- # GPT-Neo 1.3B ## Model Description GPT-Neo 1.3B is a transformer model designed using EleutherAI's replication of the GPT-3 architecture. GPT-Neo refers to the class of models, while 1.3B represents the number of parameters of this particular pre-trained model. ## Training data GPT-Neo 1.3B was trained on the Pile, a large scale curated dataset created by EleutherAI for the purpose of training this model. ## Training procedure This model was trained on the Pile for 380 billion tokens over 362,000 steps. It was trained as a masked autoregressive language model, using cross-entropy loss. ## Intended Use and Limitations This way, the model learns an inner representation of the English language that can then be used to extract features useful for downstream tasks. The model is best at what it was pretrained for however, which is generating texts from a prompt. ### How to use You can use this model directly with a pipeline for text generation. This example generates a different sequence each time it's run: ### Limitations and Biases GPT-Neo was trained as an autoregressive language model. This means that its core functionality is taking a string of text and predicting the next token. While language models are widely used for tasks other than this, there are a lot of unknowns with this work. GPT-Neo was trained on the Pile, a dataset known to contain profanity, lewd, and otherwise abrasive language. Depending on your usecase GPT-Neo may produce socially unacceptable text. See Sections 5 and 6 of the Pile paper for a more detailed analysis of the biases in the Pile. As with all language models, it is hard to predict in advance how GPT-Neo will respond to particular prompts and offensive content may occur without warning. 
We recommend having a human curate or filter the outputs before releasing them, both to censor undesirable content and to improve the quality of the results. ## Eval results ### Linguistic Reasoning | Model and Size | Pile BPB | Pile PPL | Wikitext PPL | Lambada PPL | Lambada Acc | Winogrande | Hellaswag | | ---------------- | ---------- | ---------- | ------------- | ----------- | ----------- | ---------- | ----------- | | **GPT-Neo 1.3B** | **0.7527** | **6.159** | **13.10** | **7.498** | **57.23%** | **55.01%** | **38.66%** | | GPT-2 1.5B | 1.0468 | ----- | 17.48 | 10.634 | 51.21% | 59.40% | 40.03% | | GPT-Neo 2.7B | 0.7165 | 5.646 | 11.39 | 5.626 | 62.22% | 56.50% | 42.73% | | GPT-3 Ada | 0.9631 | ----- | ----- | 9.954 | 51.60% | 52.90% | 35.93% | ### Physical and Scientific Reasoning | Model and Size | MathQA | PubMedQA | Piqa | | ---------------- | ---------- | ---------- | ----------- | | **GPT-Neo 1.3B** | **24.05%** | **54.40%** | **71.11%** | | GPT-2 1.5B | 23.64% | 58.33% | 70.78% | | GPT-Neo 2.7B | 24.72% | 57.54% | 72.14% | | GPT-3 Ada | 24.29% | 52.80% | 68.88% | ### Down-Stream Applications TBD ### BibTeX entry and citation info To cite this model, please use # Open LLM Leaderboard Evaluation Results Detailed results can be found here | Metric | Value | |-----------------------|---------------------------| | Avg. | 29.44 | | ARC (25-shot) | 31.23 | | HellaSwag (10-shot) | 48.47 | | MMLU (5-shot) | 24.82 | | TruthfulQA (0-shot) | 39.63 | | Winogrande (5-shot) | 56.91 | | GSM8K (5-shot) | 0.45 | | DROP (3-shot) | 4.6 |",
|
23 |
+
"model_explanation_gemini": "Generates English text by predicting the next token in a sequence based on prompts."
|
24 |
+
}
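The "Avg." row in the Open LLM Leaderboard table above can be reproduced from the seven benchmark scores listed with it. The sketch assumes the average is an unweighted mean over those seven rows, which the card does not say explicitly but which the numbers support.

```python
# Benchmark scores for GPT-Neo 1.3B as listed in the model card.
leaderboard_scores = {
    "ARC (25-shot)": 31.23,
    "HellaSwag (10-shot)": 48.47,
    "MMLU (5-shot)": 24.82,
    "TruthfulQA (0-shot)": 39.63,
    "Winogrande (5-shot)": 56.91,
    "GSM8K (5-shot)": 0.45,
    "DROP (3-shot)": 4.6,
}


def unweighted_mean(scores: dict) -> float:
    """Plain arithmetic mean over all listed benchmark scores."""
    return sum(scores.values()) / len(scores)


average = unweighted_mean(leaderboard_scores)
print(round(average, 2))  # compare with the reported Avg. of 29.44
```

Note that the HellaSwag figure here (10-shot, 48.47) differs from the 38.66% in the earlier table, which comes from a different evaluation setup.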
|
data/model_data_json/EleutherAI_gpt-neo-125m.json
ADDED
@@ -0,0 +1,24 @@
1 |
+
{
|
2 |
+
"model_id": "EleutherAI/gpt-neo-125m",
|
3 |
+
"downloads": 156287,
|
4 |
+
"tags": [
|
5 |
+
"transformers",
|
6 |
+
"pytorch",
|
7 |
+
"jax",
|
8 |
+
"rust",
|
9 |
+
"safetensors",
|
10 |
+
"gpt_neo",
|
11 |
+
"text-generation",
|
12 |
+
"text generation",
|
13 |
+
"causal-lm",
|
14 |
+
"en",
|
15 |
+
"dataset:EleutherAI/pile",
|
16 |
+
"arxiv:2101.00027",
|
17 |
+
"license:mit",
|
18 |
+
"autotrain_compatible",
|
19 |
+
"endpoints_compatible",
|
20 |
+
"region:us"
|
21 |
+
],
|
22 |
+
"description": "--- language: - en tags: - text generation - pytorch - causal-lm license: mit datasets: - EleutherAI/pile --- # GPT-Neo 125M ## Model Description GPT-Neo 125M is a transformer model designed using EleutherAI's replication of the GPT-3 architecture. GPT-Neo refers to the class of models, while 125M represents the number of parameters of this particular pre-trained model. ## Training data GPT-Neo 125M was trained on the Pile, a large scale curated dataset created by EleutherAI for the purpose of training this model. ## Training procedure This model was trained on the Pile for 300 billion tokens over 572,300 steps. It was trained as a masked autoregressive language model, using cross-entropy loss. ## Intended Use and Limitations This way, the model learns an inner representation of the English language that can then be used to extract features useful for downstream tasks. The model is best at what it was pretrained for however, which is generating texts from a prompt. ### How to use You can use this model directly with a pipeline for text generation. This example generates a different sequence each time it's run: ### Limitations and Biases GPT-Neo was trained as an autoregressive language model. This means that its core functionality is taking a string of text and predicting the next token. While language models are widely used for tasks other than this, there are a lot of unknowns with this work. GPT-Neo was trained on the Pile, a dataset known to contain profanity, lewd, and otherwise abrasive language. Depending on your usecase GPT-Neo may produce socially unacceptable text. See Sections 5 and 6 of the Pile paper for a more detailed analysis of the biases in the Pile. As with all language models, it is hard to predict in advance how GPT-Neo will respond to particular prompts and offensive content may occur without warning. 
We recommend having a human curate or filter the outputs before releasing them, both to censor undesirable content and to improve the quality of the results. ## Eval results TBD ### Down-Stream Applications TBD ### BibTeX entry and citation info To cite this model, use # Open LLM Leaderboard Evaluation Results Detailed results can be found here | Metric | Value | |-----------------------|---------------------------| | Avg. | 25.79 | | ARC (25-shot) | 22.95 | | HellaSwag (10-shot) | 30.26 | | MMLU (5-shot) | 25.97 | | TruthfulQA (0-shot) | 45.58 | | Winogrande (5-shot) | 51.78 | | GSM8K (5-shot) | 0.3 | | DROP (3-shot) | 3.69 |",
|
23 |
+
"model_explanation_gemini": "Generates English text autoregressively based on prompts, trained as a language model on the Pile dataset."
|
24 |
+
}
|
data/model_data_json/EleutherAI_gpt-neox-20b.json
ADDED
@@ -0,0 +1,25 @@
|
1 |
+
{
|
2 |
+
"model_id": "EleutherAI/gpt-neox-20b",
|
3 |
+
"downloads": 346183,
|
4 |
+
"tags": [
|
5 |
+
"transformers",
|
6 |
+
"pytorch",
|
7 |
+
"safetensors",
|
8 |
+
"gpt_neox",
|
9 |
+
"text-generation",
|
10 |
+
"causal-lm",
|
11 |
+
"en",
|
12 |
+
"dataset:EleutherAI/pile",
|
13 |
+
"arxiv:2204.06745",
|
14 |
+
"arxiv:2101.00027",
|
15 |
+
"arxiv:2201.07311",
|
16 |
+
"arxiv:2104.09864",
|
17 |
+
"license:apache-2.0",
|
18 |
+
"autotrain_compatible",
|
19 |
+
"text-generation-inference",
|
20 |
+
"endpoints_compatible",
|
21 |
+
"region:us"
|
22 |
+
],
|
23 |
+
"description": "--- language: - en tags: - pytorch - causal-lm license: apache-2.0 datasets: - EleutherAI/pile --- GPT-NeoX-20B is a 20 billion parameter autoregressive language model trained on the Pile using the GPT-NeoX library. Its architecture intentionally resembles that of GPT-3, and is almost identical to that of GPT-J- 6B. Its training dataset contains a multitude of English-language texts, reflecting the general-purpose nature of this model. See the accompanying paper for details about model architecture (including how it differs from GPT-3), training procedure, and additional evaluations. ### Model details - Developed by: EleutherAI - Model type: Transformer-based Language Model - Language: English - Learn more: GPT-NeoX-20B: An Open-Source Autoregressive Language Model. For details about the training dataset, see the Pile paper, and its data sheet. - License: Apache 2.0 - Contact: to ask questions about this model, join the EleutherAI Discord, and post them in . Please read the existing GPT-NeoX-20B documentation before asking about the model on Discord. For general correspondence: contact@eleuther. ai. <figure style=\"width:30em\"> | Hyperparameter | Value | | ---------------------- | ----------- | | n<sub>parameters</sub> | 20554567680 | | n<sub>layers</sub> | 44 | | d<sub>model</sub> | 6144 | | n<sub>heads</sub> | 64 | | d<sub>head</sub> | 96 | | n<sub>vocab</sub> | 50257 | | Sequence Length | 2048 | | Learning Rate | 0.97 x 10<sup>-5</sup> | | Positional Encoding | Rotary Position Embedding (RoPE) | </figure> ### Uses and limitations #### Intended use GPT-NeoX-20B was developed primarily for research purposes. It learns an inner representation of the English language that can be used to extract features useful for downstream tasks. In addition to scientific uses, you may also further fine-tune and adapt GPT-NeoX-20B for deployment, as long as your use is in accordance with the Apache 2.0 license. This model works with the Transformers Library. 
If you decide to use pre-trained GPT-NeoX-20B as a basis for your fine-tuned model, please note that you need to conduct your own risk and bias assessment. #### Out-of-scope use GPT-NeoX-20B is **not** intended for deployment as-is. It is not a product and cannot be used for human-facing interactions without supervision. GPT-NeoX-20B has not been fine-tuned for downstream tasks for which language models are commonly deployed, such as writing genre prose, or commercial chatbots. This means GPT-NeoX-20B will likely **not** respond to a given prompt the way products such as ChatGPT do. This is because, unlike GPT-NeoX-20B, ChatGPT was fine-tuned using methods such as Reinforcement Learning from Human Feedback (RLHF) to better “understand” human instructions and dialogue. This model is English-language only, and thus cannot be used for translation or generating text in other languages. #### Limitations and biases The core functionality of GPT-NeoX-20B is to take a string of text and predict the next token. Remember that the statistically most likely next token need not result in the most “accurate” text. Never rely on GPT-NeoX-20B to produce factually accurate output. This model was trained on the Pile, a dataset known to contain profanity and texts that are lewd or otherwise offensive. See Section 6 of the Pile paper for a discussion of documented biases with regards to gender, religion, and race. GPT-NeoX-20B may produce socially unacceptable or undesirable text, *even if* the prompt itself does not include anything explicitly offensive. We recommend curating the outputs of this model before presenting it to a human reader. Please inform your audience that you are using artificially generated text. #### How to use If you simply want to try out some prompts, check out this playground. GPT-NeoX-20B can be loaded using the functionality: ### Training #### Training dataset The Pile is a 825GiB general-purpose dataset in English. 
It was created by EleutherAI specifically for training large language models. It contains texts from 22 diverse sources, roughly broken down into five categories: academic writing (e.g. arXiv), internet (e.g. CommonCrawl), prose (e.g. Project Gutenberg), dialogue (e.g. YouTube subtitles), and miscellaneous (e.g. GitHub, Enron Emails). See the Pile paper for a breakdown of all data sources, methodology, and a discussion of ethical implications. Consult the datasheet for more detailed documentation about the Pile and its component datasets. The Pile can be downloaded from the official website, or from a community mirror. The Pile was **not** deduplicated before being used to train GPT-NeoX-20B. #### Training procedure GPT-NeoX-20B was trained with a batch size of approximately 3.15M tokens (1538 sequences of 2048 tokens each), for a total of 150,000 steps. Tensor parallelism and pipeline parallelism were used to distribute the model across GPUs. Additional details about the training procedure are in Section 3 of the accompanying paper. 
### Evaluations <figure style=\"width:55em\"> | Model | OpenAI’s LAMBADA | SciQ | PIQA | TriviaQA | ARC (Challenge) | | ------------- | :--------------: | :-----------: | :-----------: | :-----------: | :-------------: | | GPT-J-6B | 0.683 ± 0.006 | 0.910 ± 0.009 | 0.752 ± 0.010 | 0.170 ± 0.004 | 0.340 ± 0.014 | | FairSeq 6.7B | 0.673 ± 0.007 | 0.895 ± 0.010 | 0.762 ± 0.010 | 0.221 ± 0.004 | 0.329 ± 0.014 | | GPT-3 Curie | 0.693 ± 0.006 | 0.918 ± 0.009 | 0.767 ± 0.010 | 0.196 ± 0.004 | 0.334 ± 0.014 | | FairSeq 13B | 0.709 ± 0.006 | 0.910 ± 0.009 | 0.769 ± 0.010 | 0.270 ± 0.004 | 0.345 ± 0.014 | | GPT-NeoX-20B | 0.720 ± 0.006 | 0.928 ± 0.008 | 0.779 ± 0.010 | 0.259 ± 0.004 | 0.380 ± 0.014 | | GPT-3 DaVinci | 0.752 ± 0.006 | 0.949 ± 0.007 | 0.791 ± 0.009 | 0.409 ± 0.005 | 0.435 ± 0.014 | <figcaption>Zero-shot performance on selected natural language tasks.</figcaption> </figure> This is a heavily abridged version of the evaluation results. Appendix D of the GPT-NeoX-20B paper compares more model sizes, and contains additional evaluations, including on: zero and five-shot natural language tasks, zero and five-shot Basic Arithmetic and MATH, and zero-shot Hendrycks tasks. ### BibTeX To cite the GPT-NeoX-20B paper: # Open LLM Leaderboard Evaluation Results Detailed results can be found here | Metric | Value | |-----------------------|---------------------------| | Avg. | 36.02 | | ARC (25-shot) | 45.73 | | HellaSwag (10-shot) | 73.45 | | MMLU (5-shot) | 25.0 | | TruthfulQA (0-shot) | 31.61 | | Winogrande (5-shot) | 68.9 | | GSM8K (5-shot) | 2.43 | | DROP (3-shot) | 5.04 |",
|
24 |
+
"model_explanation_gemini": "GPT-NeoX-20B is a 20-billion-parameter autoregressive English language model designed for research, generating text by predicting the next token, and is suitable for downstream task fine-tuning under the Apache 2.0 license."
|
25 |
+
}
|