shayan5422 commited on
Commit
21cad66
·
verified ·
1 Parent(s): e09709f

Upload 3710 files

Browse files
This view is limited to 50 files because it contains too many changes.   See raw diff
Files changed (50) hide show
  1. index.faiss +1 -1
  2. index_to_metadata.pkl +2 -2
  3. model_data_json/01-ai_Yi-1.5-34B-Chat.json +7 -1
  4. model_data_json/01-ai_Yi-1.5-6B-Chat.json +7 -1
  5. model_data_json/01-ai_Yi-1.5-9B-Chat.json +7 -1
  6. model_data_json/01-ai_Yi-34B-200K.json +7 -1
  7. model_data_json/01-ai_Yi-6B.json +7 -1
  8. model_data_json/1-800-BAD-CODE_xlm-roberta_punctuation_fullstop_truecase.json +7 -1
  9. model_data_json/1aurent_swin_tiny_patch4_window7_224.CTransPath.json +6 -1
  10. model_data_json/1bitLLM_bitnet_b1_58-large.json +7 -1
  11. model_data_json/3loi_SER-Odyssey-Baseline-WavLM-Multi-Attributes.json +6 -1
  12. model_data_json/51la5_roberta-large-NER.json +7 -1
  13. model_data_json/AI-Growth-Lab_PatentSBERTa.json +7 -1
  14. model_data_json/AI-MO_Kimina-Prover-Preview-Distill-1.5B.json +7 -1
  15. model_data_json/AI-MO_Kimina-Prover-Preview-Distill-7B.json +7 -1
  16. model_data_json/AIDC-AI_Ovis2-4B.json +6 -1
  17. model_data_json/AIDC-AI_Ovis2-8B.json +6 -1
  18. model_data_json/AITeamVN_Vietnamese_Embedding.json +7 -1
  19. model_data_json/AdamCodd_vit-base-nsfw-detector.json +6 -1
  20. model_data_json/AhmedTaha012_finance-ner-v0.0.8-finetuned-ner.json +7 -1
  21. model_data_json/Alibaba-NLP_gme-Qwen2-VL-2B-Instruct.json +7 -1
  22. model_data_json/Alibaba-NLP_gte-Qwen2-1.5B-instruct.json +0 -0
  23. model_data_json/Alibaba-NLP_gte-Qwen2-7B-instruct.json +0 -0
  24. model_data_json/Alibaba-NLP_gte-base-en-v1.5.json +6 -1
  25. model_data_json/Alibaba-NLP_gte-large-en-v1.5.json +6 -1
  26. model_data_json/Alibaba-NLP_gte-modernbert-base.json +7 -1
  27. model_data_json/Alibaba-NLP_gte-multilingual-base.json +6 -1
  28. model_data_json/Alibaba-NLP_gte-multilingual-reranker-base.json +6 -1
  29. model_data_json/Alibaba-NLP_gte-reranker-modernbert-base.json +7 -1
  30. model_data_json/Alpha-VLLM_Lumina-Image-2.0.json +6 -1
  31. model_data_json/Alpha-VLLM_Lumina-mGPT-7B-768.json +7 -1
  32. model_data_json/Alvenir_bert-punct-restoration-da.json +7 -1
  33. model_data_json/Amrrs_wav2vec2-large-xlsr-53-tamil.json +7 -1
  34. model_data_json/Angelakeke_RaTE-NER-Deberta.json +7 -1
  35. model_data_json/Anonymous-Pineapple_roberta-base.json +7 -1
  36. model_data_json/Anzhc_Anzhcs_YOLOs.json +6 -1
  37. model_data_json/Arki05_Grok-1-GGUF.json +6 -1
  38. model_data_json/ArthurZ_Ilama-3.2-1B.json +7 -1
  39. model_data_json/Aryn_deformable-detr-DocLayNet.json +6 -1
  40. model_data_json/AskUI_PTA-1.json +6 -1
  41. model_data_json/AstraMindAI_xtts2-gpt.json +7 -1
  42. model_data_json/AstraMindAI_xttsv2.json +6 -1
  43. model_data_json/AutonLab_MOMENT-1-large.json +6 -1
  44. model_data_json/AutonLab_MOMENT-1-small.json +6 -1
  45. model_data_json/BAAI_Emu3-Chat-hf.json +6 -1
  46. model_data_json/BAAI_Emu3-VisionTokenizer.json +6 -1
  47. model_data_json/BAAI_bge-base-en-v1.5.json +7 -1
  48. model_data_json/BAAI_bge-base-en.json +7 -1
  49. model_data_json/BAAI_bge-base-zh-v1.5.json +7 -1
  50. model_data_json/BAAI_bge-base-zh.json +7 -1
index.faiss CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:4b0169bb8310e9b947e7ee3e32ef4e537cca521443386702a996442cd361b13c
3
  size 11378733
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6ebb492bfcce3d9b18e87c67817c9ea837ab5e716d83a80d763910176afceb73
3
  size 11378733
index_to_metadata.pkl CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:5e58b1ff1a29667afe7ae10faa2aa6b7078f92d6700212b6e6a6eb9dd247e5fc
3
- size 2680700
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:916c3f3b72febd784f86af1f42a935d6e640355fa3455dfb27521a7f97c6f149
3
+ size 2804190
model_data_json/01-ai_Yi-1.5-34B-Chat.json CHANGED
@@ -15,5 +15,11 @@
15
  "region:us"
16
  ],
17
  "description": "--- license: apache-2.0 --- <div align=\"center\"> <picture> <img src=\" width=\"150px\"> </picture> </div> <p align=\"center\"> <a href=\" GitHub</a> • <a href=\" Discord</a> • <a href=\" Twitter</a> • <a href=\" WeChat</a> <br/> <a href=\" Paper</a> • <a href=\" Tech Blog</a> • <a href=\" FAQ</a> • <a href=\" Learning Hub</a> </p> # Intro Yi-1.5 is an upgraded version of Yi. It is continuously pre-trained on Yi with a high-quality corpus of 500B tokens and fine-tuned on 3M diverse fine-tuning samples. Compared with Yi, Yi-1.5 delivers stronger performance in coding, math, reasoning, and instruction-following capability, while still maintaining excellent capabilities in language understanding, commonsense reasoning, and reading comprehension. <div align=\"center\"> Model | Context Length | Pre-trained Tokens | :------------: | :------------: | :------------: | | Yi-1.5 | 4K, 16K, 32K | 3.6T </div> # Models - Chat models <div align=\"center\"> | Name | Download | | --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | Yi-1.5-34B-Chat | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel| | Yi-1.5-34B-Chat-16K | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-9B-Chat | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-9B-Chat-16K | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-6B-Chat | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | </div> - Base models <div align=\"center\"> | Name | Download | | ---------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | Yi-1.5-34B | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-34B-32K | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-9B | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-9B-32K | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-6B | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | </div> # Benchmarks - Chat models Yi-1.5-34B-Chat is on par with or excels beyond larger models in most benchmarks. !image/png Yi-1.5-9B-Chat is the top performer among similarly sized open-source models. !image/png - Base models Yi-1.5-34B is on par with or excels beyond larger models in some benchmarks. !image/png Yi-1.5-9B is the top performer among similarly sized open-source models. !image/png # Quick Start For getting up and running with Yi-1.5 models quickly, see README.",
18
- "model_explanation_gemini": "A chat-focused AI model fine-tuned for enhanced coding, math, reasoning, and instruction-following while retaining strong language understanding and comprehension. \n\n**Features**: \n- Upgraded from Yi with 500B-token pretraining and 3M fine-tuning samples \n- Supports context lengths of 4K, 16K, and 32K tokens \n- Outperforms similarly sized open-source models in benchmarks \n\n**Comparison**: \nYi-1.5-34B-C"
 
 
 
 
 
 
19
  }
 
15
  "region:us"
16
  ],
17
  "description": "--- license: apache-2.0 --- <div align=\"center\"> <picture> <img src=\" width=\"150px\"> </picture> </div> <p align=\"center\"> <a href=\" GitHub</a> • <a href=\" Discord</a> • <a href=\" Twitter</a> • <a href=\" WeChat</a> <br/> <a href=\" Paper</a> • <a href=\" Tech Blog</a> • <a href=\" FAQ</a> • <a href=\" Learning Hub</a> </p> # Intro Yi-1.5 is an upgraded version of Yi. It is continuously pre-trained on Yi with a high-quality corpus of 500B tokens and fine-tuned on 3M diverse fine-tuning samples. Compared with Yi, Yi-1.5 delivers stronger performance in coding, math, reasoning, and instruction-following capability, while still maintaining excellent capabilities in language understanding, commonsense reasoning, and reading comprehension. <div align=\"center\"> Model | Context Length | Pre-trained Tokens | :------------: | :------------: | :------------: | | Yi-1.5 | 4K, 16K, 32K | 3.6T </div> # Models - Chat models <div align=\"center\"> | Name | Download | | --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | Yi-1.5-34B-Chat | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel| | Yi-1.5-34B-Chat-16K | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-9B-Chat | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-9B-Chat-16K | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-6B-Chat | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | </div> - Base models <div align=\"center\"> | Name | Download | | ---------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | Yi-1.5-34B | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-34B-32K | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-9B | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-9B-32K | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-6B | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | </div> # Benchmarks - Chat models Yi-1.5-34B-Chat is on par with or excels beyond larger models in most benchmarks. !image/png Yi-1.5-9B-Chat is the top performer among similarly sized open-source models. !image/png - Base models Yi-1.5-34B is on par with or excels beyond larger models in some benchmarks. !image/png Yi-1.5-9B is the top performer among similarly sized open-source models. !image/png # Quick Start For getting up and running with Yi-1.5 models quickly, see README.",
18
+ "model_explanation_gemini": "A chat-focused AI model fine-tuned for enhanced coding, math, reasoning, and instruction-following while retaining strong language understanding and comprehension. \n\n**Features**: \n- Upgraded from Yi with 500B-token pretraining and 3M fine-tuning samples \n- Supports context lengths of 4K, 16K, and 32K tokens \n- Outperforms similarly sized open-source models in benchmarks \n\n**Comparison**: \nYi-1.5-34B-C",
19
+ "release_year": "2024",
20
+ "parameter_count": "34B",
21
+ "is_fine_tuned": false,
22
+ "category": "Large Language Model",
23
+ "model_family": "LLaMA",
24
+ "api_enhanced": true
25
  }
model_data_json/01-ai_Yi-1.5-6B-Chat.json CHANGED
@@ -15,5 +15,11 @@
15
  "region:us"
16
  ],
17
  "description": "--- license: apache-2.0 --- <div align=\"center\"> <picture> <img src=\" width=\"150px\"> </picture> </div> <p align=\"center\"> <a href=\" GitHub</a> • <a href=\" Discord</a> • <a href=\" Twitter</a> • <a href=\" WeChat</a> <br/> <a href=\" Paper</a> • <a href=\" Tech Blog</a> • <a href=\" FAQ</a> • <a href=\" Learning Hub</a> </p> # Intro Yi-1.5 is an upgraded version of Yi. It is continuously pre-trained on Yi with a high-quality corpus of 500B tokens and fine-tuned on 3M diverse fine-tuning samples. Compared with Yi, Yi-1.5 delivers stronger performance in coding, math, reasoning, and instruction-following capability, while still maintaining excellent capabilities in language understanding, commonsense reasoning, and reading comprehension. <div align=\"center\"> Model | Context Length | Pre-trained Tokens | :------------: | :------------: | :------------: | | Yi-1.5 | 4K, 16K, 32K | 3.6T </div> # Models - Chat models <div align=\"center\"> | Name | Download | | --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | Yi-1.5-34B-Chat | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel| | Yi-1.5-34B-Chat-16K | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-9B-Chat | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-9B-Chat-16K | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-6B-Chat | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | </div> - Base models <div align=\"center\"> | Name | Download | | ---------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | Yi-1.5-34B | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-34B-32K | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-9B | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-9B-32K | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-6B | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | </div> # Benchmarks - Chat models Yi-1.5-34B-Chat is on par with or excels beyond larger models in most benchmarks. !image/png Yi-1.5-9B-Chat is the top performer among similarly sized open-source models. !image/png - Base models Yi-1.5-34B is on par with or excels beyond larger models in some benchmarks. !image/png Yi-1.5-9B is the top performer among similarly sized open-source models. !image/png # Quick Start For getting up and running with Yi-1.5 models quickly, see README.",
18
- "model_explanation_gemini": "A chat-focused language model fine-tuned for enhanced coding, math, reasoning, and instruction-following while retaining strong language understanding and comprehension. \n\n**Features**: \n- Upgraded from Yi with 500B-token pretraining and 3M fine-tuning samples \n- Strengths in coding, math, reasoning, and instruction execution \n- Maintains language understanding, commonsense reasoning, and reading comprehension \n- Supports context lengths of 4K, 16K, or 32"
 
 
 
 
 
 
19
  }
 
15
  "region:us"
16
  ],
17
  "description": "--- license: apache-2.0 --- <div align=\"center\"> <picture> <img src=\" width=\"150px\"> </picture> </div> <p align=\"center\"> <a href=\" GitHub</a> • <a href=\" Discord</a> • <a href=\" Twitter</a> • <a href=\" WeChat</a> <br/> <a href=\" Paper</a> • <a href=\" Tech Blog</a> • <a href=\" FAQ</a> • <a href=\" Learning Hub</a> </p> # Intro Yi-1.5 is an upgraded version of Yi. It is continuously pre-trained on Yi with a high-quality corpus of 500B tokens and fine-tuned on 3M diverse fine-tuning samples. Compared with Yi, Yi-1.5 delivers stronger performance in coding, math, reasoning, and instruction-following capability, while still maintaining excellent capabilities in language understanding, commonsense reasoning, and reading comprehension. <div align=\"center\"> Model | Context Length | Pre-trained Tokens | :------------: | :------------: | :------------: | | Yi-1.5 | 4K, 16K, 32K | 3.6T </div> # Models - Chat models <div align=\"center\"> | Name | Download | | --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | Yi-1.5-34B-Chat | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel| | Yi-1.5-34B-Chat-16K | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-9B-Chat | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-9B-Chat-16K | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-6B-Chat | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | </div> - Base models <div align=\"center\"> | Name | Download | | ---------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | Yi-1.5-34B | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-34B-32K | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-9B | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-9B-32K | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-6B | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | </div> # Benchmarks - Chat models Yi-1.5-34B-Chat is on par with or excels beyond larger models in most benchmarks. !image/png Yi-1.5-9B-Chat is the top performer among similarly sized open-source models. !image/png - Base models Yi-1.5-34B is on par with or excels beyond larger models in some benchmarks. !image/png Yi-1.5-9B is the top performer among similarly sized open-source models. !image/png # Quick Start For getting up and running with Yi-1.5 models quickly, see README.",
18
+ "model_explanation_gemini": "A chat-focused language model fine-tuned for enhanced coding, math, reasoning, and instruction-following while retaining strong language understanding and comprehension. \n\n**Features**: \n- Upgraded from Yi with 500B-token pretraining and 3M fine-tuning samples \n- Strengths in coding, math, reasoning, and instruction execution \n- Maintains language understanding, commonsense reasoning, and reading comprehension \n- Supports context lengths of 4K, 16K, or 32",
19
+ "release_year": "2024",
20
+ "parameter_count": "6B",
21
+ "is_fine_tuned": false,
22
+ "category": "Large Language Model",
23
+ "model_family": "LLaMA",
24
+ "api_enhanced": true
25
  }
model_data_json/01-ai_Yi-1.5-9B-Chat.json CHANGED
@@ -15,5 +15,11 @@
15
  "region:us"
16
  ],
17
  "description": "--- license: apache-2.0 --- <div align=\"center\"> <picture> <img src=\" width=\"150px\"> </picture> </div> <p align=\"center\"> <a href=\" GitHub</a> • <a href=\" Discord</a> • <a href=\" Twitter</a> • <a href=\" WeChat</a> <br/> <a href=\" Paper</a> • <a href=\" Tech Blog</a> • <a href=\" FAQ</a> • <a href=\" Learning Hub</a> </p> # Intro Yi-1.5 is an upgraded version of Yi. It is continuously pre-trained on Yi with a high-quality corpus of 500B tokens and fine-tuned on 3M diverse fine-tuning samples. Compared with Yi, Yi-1.5 delivers stronger performance in coding, math, reasoning, and instruction-following capability, while still maintaining excellent capabilities in language understanding, commonsense reasoning, and reading comprehension. <div align=\"center\"> Model | Context Length | Pre-trained Tokens | :------------: | :------------: | :------------: | | Yi-1.5 | 4K, 16K, 32K | 3.6T </div> # Models - Chat models <div align=\"center\"> | Name | Download | | --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | Yi-1.5-34B-Chat | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel| | Yi-1.5-34B-Chat-16K | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-9B-Chat | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-9B-Chat-16K | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-6B-Chat | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | </div> - Base models <div align=\"center\"> | Name | Download | | ---------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | Yi-1.5-34B | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-34B-32K | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-9B | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-9B-32K | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-6B | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | </div> # Benchmarks - Chat models Yi-1.5-34B-Chat is on par with or excels beyond larger models in most benchmarks. !image/png Yi-1.5-9B-Chat is the top performer among similarly sized open-source models. !image/png - Base models Yi-1.5-34B is on par with or excels beyond larger models in some benchmarks. !image/png Yi-1.5-9B is the top performer among similarly sized open-source models. !image/png # Quick Start For getting up and running with Yi-1.5 models quickly, see README.",
18
- "model_explanation_gemini": "\"Yi-1.5-9B-Chat is a fine-tuned chat model optimized for coding, math, reasoning, and instruction-following while maintaining strong language understanding and comprehension.\"\n\n**Features:** \n- Upgraded version of Yi with 500B-token pretraining and 3M fine-tuning samples \n- Enhanced performance in coding, math, reasoning, and instruction-following \n- Maintains strong language understanding and commonsense reasoning \n- Available in multiple context lengths ("
 
 
 
 
 
 
19
  }
 
15
  "region:us"
16
  ],
17
  "description": "--- license: apache-2.0 --- <div align=\"center\"> <picture> <img src=\" width=\"150px\"> </picture> </div> <p align=\"center\"> <a href=\" GitHub</a> • <a href=\" Discord</a> • <a href=\" Twitter</a> • <a href=\" WeChat</a> <br/> <a href=\" Paper</a> • <a href=\" Tech Blog</a> • <a href=\" FAQ</a> • <a href=\" Learning Hub</a> </p> # Intro Yi-1.5 is an upgraded version of Yi. It is continuously pre-trained on Yi with a high-quality corpus of 500B tokens and fine-tuned on 3M diverse fine-tuning samples. Compared with Yi, Yi-1.5 delivers stronger performance in coding, math, reasoning, and instruction-following capability, while still maintaining excellent capabilities in language understanding, commonsense reasoning, and reading comprehension. <div align=\"center\"> Model | Context Length | Pre-trained Tokens | :------------: | :------------: | :------------: | | Yi-1.5 | 4K, 16K, 32K | 3.6T </div> # Models - Chat models <div align=\"center\"> | Name | Download | | --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | Yi-1.5-34B-Chat | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel| | Yi-1.5-34B-Chat-16K | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-9B-Chat | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-9B-Chat-16K | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-6B-Chat | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | </div> - Base models <div align=\"center\"> | Name | Download | | ---------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | Yi-1.5-34B | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-34B-32K | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-9B | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-9B-32K | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | | Yi-1.5-6B | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | </div> # Benchmarks - Chat models Yi-1.5-34B-Chat is on par with or excels beyond larger models in most benchmarks. !image/png Yi-1.5-9B-Chat is the top performer among similarly sized open-source models. !image/png - Base models Yi-1.5-34B is on par with or excels beyond larger models in some benchmarks. !image/png Yi-1.5-9B is the top performer among similarly sized open-source models. !image/png # Quick Start For getting up and running with Yi-1.5 models quickly, see README.",
18
+ "model_explanation_gemini": "\"Yi-1.5-9B-Chat is a fine-tuned chat model optimized for coding, math, reasoning, and instruction-following while maintaining strong language understanding and comprehension.\"\n\n**Features:** \n- Upgraded version of Yi with 500B-token pretraining and 3M fine-tuning samples \n- Enhanced performance in coding, math, reasoning, and instruction-following \n- Maintains strong language understanding and commonsense reasoning \n- Available in multiple context lengths (",
19
+ "release_year": "2024",
20
+ "parameter_count": "9B",
21
+ "is_fine_tuned": false,
22
+ "category": "Large Language Model",
23
+ "model_family": "LLaMA",
24
+ "api_enhanced": true
25
  }
model_data_json/01-ai_Yi-34B-200K.json CHANGED
@@ -17,5 +17,11 @@
17
  "region:us"
18
  ],
19
  "description": "--- license: apache-2.0 widget: - example_title: \"Yi-34B-Chat\" text: \"hi\" output: text: \" Hello! How can I assist you today?\" - example_title: \"Yi-34B\" text: \"There's a place where time stands still. A place of breath taking wonder, but also\" output: text: \" an eerie sense that something is just not right…\\nBetween the two worlds lies The Forgotten Kingdom - home to creatures long since thought extinct and ancient magic so strong it defies belief! Only here can you find what has been lost for centuries: An Elixir Of Life which will restore youth and vitality if only those who seek its power are brave enough to face up against all manner of dangers lurking in this mysterious land! But beware; some say there may even exist powerful entities beyond our comprehension whose intentions towards humanity remain unclear at best ---- they might want nothing more than destruction itself rather then anything else from their quest after immortality (and maybe someone should tell them about modern medicine)? In any event though – one thing remains true regardless : whether or not success comes easy depends entirely upon how much effort we put into conquering whatever challenges lie ahead along with having faith deep down inside ourselves too ;) So let’s get started now shall We?\" pipeline_tag: text-generation --- <div align=\"center\"> <picture> <source media=\"(prefers-color-scheme: dark)\" srcset=\" width=\"200px\"> <source media=\"(prefers-color-scheme: light)\" srcset=\" width=\"200px\"> <img alt=\"specify theme context for images\" src=\" </picture> </br> </br> <div style=\"display: inline-block;\"> <a href=\" <img src=\" </a> </div> <div style=\"display: inline-block;\"> <a href=\"mailto:[email protected]\"> <img src=\" </a> </div> </div> <div align=\"center\"> <h3 align=\"center\">Building the Next Generation of Open-Source and Bilingual LLMs</h3> </div> <p align=\"center\"> 🤗 <a href=\" target=\"_blank\">Hugging Face</a> • 🤖 <a href=\" target=\"_blank\">ModelScope</a> • ✡️ <a href=\" target=\"_blank\">WiseModel</a> </p> <p align=\"center\"> 👩‍🚀 Ask questions or discuss ideas on <a href=\" target=\"_blank\"> GitHub </a> </p> <p align=\"center\"> 👋 Join us on <a href=\" target=\"_blank\"> 👾 Discord </a> or <a href=\"有官方的微信群嘛 · Issue #43 · 01-ai/Yi\" target=\"_blank\"> 💬 WeChat </a> </p> <p align=\"center\"> 📝 Check out <a href=\" Yi Tech Report </a> </p> <p align=\"center\"> 📚 Grow at <a href=\"#learning-hub\"> Yi Learning Hub </a> </p> <!-- DO NOT REMOVE ME --> <hr> <details open> <summary></b>📕 Table of Contents</b></summary> - What is Yi? - Introduction - Models - Chat models - Base models - Model info - News - How to use Yi? - Quick start - Choose your path - pip - docker - llama.cpp - conda-lock - Web demo - Fine-tuning - Quantization - Deployment - FAQ - Learning hub - Why Yi? - Ecosystem - Upstream - Downstream - Serving - Quantization - Fine-tuning - API - Benchmarks - Base model performance - Chat model performance - Tech report - Citation - Who can use Yi? - Misc. - Acknowledgements - Disclaimer - License </details> <hr> # What is Yi? ## Introduction - 🤖 The Yi series models are the next generation of open-source large language models trained from scratch by 01.AI. - 🙌 Targeted as a bilingual language model and trained on 3T multilingual corpus, the Yi series models become one of the strongest LLM worldwide, showing promise in language understanding, commonsense reasoning, reading comprehension, and more. For example, - Yi-34B-Chat model **landed in second place (following GPT-4 Turbo)**, outperforming other LLMs (such as GPT-4, Mixtral, Claude) on the AlpacaEval Leaderboard (based on data available up to January 2024). - Yi-34B model **ranked first among all existing open-source models** (such as Falcon-180B, Llama-70B, Claude) in **both English and Chinese** on various benchmarks, including Hugging Face Open LLM Leaderboard (pre-trained) and C-Eval (based on data available up to November 2023). - 🙏 (Credits to Llama) Thanks to the Transformer and Llama open-source communities, as they reduce the efforts required to build from scratch and enable the utilization of the same tools within the AI ecosystem. <details style=\"display: inline;\"><summary> If you're interested in Yi's adoption of Llama architecture and license usage policy, see <span style=\"color: green;\">Yi's relation with Llama.</span> ⬇️</summary> <ul> <br> > 💡 TL;DR > > The Yi series models adopt the same model architecture as Llama but are **NOT** derivatives of Llama. - Both Yi and Llama are based on the Transformer structure, which has been the standard architecture for large language models since 2018. - Grounded in the Transformer architecture, Llama has become a new cornerstone for the majority of state-of-the-art open-source models due to its excellent stability, reliable convergence, and robust compatibility. This positions Llama as the recognized foundational framework for models including Yi. - Thanks to the Transformer and Llama architectures, other models can leverage their power, reducing the effort required to build from scratch and enabling the utilization of the same tools within their ecosystems. - However, the Yi series models are NOT derivatives of Llama, as they do not use Llama's weights. - As Llama's structure is employed by the majority of open-source models, the key factors of determining model performance are training datasets, training pipelines, and training infrastructure. - Developing in a unique and proprietary way, Yi has independently created its own high-quality training datasets, efficient training pipelines, and robust training infrastructure entirely from the ground up. This effort has led to excellent performance with Yi series models ranking just behind GPT4 and surpassing Llama on the Alpaca Leaderboard in Dec 2023. </ul> </details> <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ## News <details> <summary>🔥 <b>2024-07-29</b>: The <a href=\" Cookbook 1.0 </a> is released, featuring tutorials and examples in both Chinese and English.</summary> </details> <details> <summary>🎯 <b>2024-05-13</b>: The <a href=\" series models </a> are open-sourced, further improving coding, math, reasoning, and instruction-following abilities.</summary> </details> <details> <summary>🎯 <b>2024-03-16</b>: The <code>Yi-9B-200K</code> is open-sourced and available to the public.</summary> </details> <details> <summary>🎯 <b>2024-03-08</b>: <a href=\" Tech Report</a> is published! </summary> </details> <details open> <summary>🔔 <b>2024-03-07</b>: The long text capability of the Yi-34B-200K has been enhanced. </summary> <br> In the \"Needle-in-a-Haystack\" test, the Yi-34B-200K's performance is improved by 10.5%, rising from 89.3% to an impressive 99.8%. We continue to pre-train the model on 5B tokens long-context data mixture and demonstrate a near-all-green performance. </details> <details open> <summary>🎯 <b>2024-03-06</b>: The <code>Yi-9B</code> is open-sourced and available to the public.</summary> <br> <code>Yi-9B</code> stands out as the top performer among a range of similar-sized open-source models (including Mistral-7B, SOLAR-10.7B, Gemma-7B, DeepSeek-Coder-7B-Base-v1.5 and more), particularly excelling in code, math, common-sense reasoning, and reading comprehension. </details> <details open> <summary>🎯 <b>2024-01-23</b>: The Yi-VL models, <code><a href=\" and <code><a href=\" are open-sourced and available to the public.</summary> <br> <code><a href=\" has ranked <strong>first</strong> among all existing open-source models in the latest benchmarks, including <a href=\" and <a href=\" (based on data available up to January 2024).</li> </details> <details> <summary>🎯 <b>2023-11-23</b>: <a href=\"#chat-models\">Chat models</a> are open-sourced and available to the public.</summary> <br>This release contains two chat models based on previously released base models, two 8-bit models quantized by GPTQ, and two 4-bit models quantized by AWQ. - - - - - - You can try some of them interactively at: - Hugging Face - Replicate </details> <details> <summary>🔔 <b>2023-11-23</b>: The Yi Series Models Community License Agreement is updated to <a href=\" </details> <details> <summary>🔥 <b>2023-11-08</b>: Invited test of Yi-34B chat model.</summary> <br>Application form: - English - Chinese </details> <details> <summary>🎯 <b>2023-11-05</b>: <a href=\"#base-models\">The base models, </a><code>Yi-6B-200K</code> and <code>Yi-34B-200K</code>, are open-sourced and available to the public.</summary> <br>This release contains two base models with the same parameter sizes as the previous release, except that the context window is extended to 200K. </details> <details> <summary>🎯 <b>2023-11-02</b>: <a href=\"#base-models\">The base models, </a><code>Yi-6B</code> and <code>Yi-34B</code>, are open-sourced and available to the public.</summary> <br>The first public release contains two bilingual (English/Chinese) base models with the parameter sizes of 6B and 34B. Both of them are trained with 4K sequence length and can be extended to 32K during inference time. </details> <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ## Models Yi models come in multiple sizes and cater to different use cases. You can also fine-tune Yi models to meet your specific requirements. If you want to deploy Yi models, make sure you meet the software and hardware requirements. ### Chat models | Model | Download | |---|---| |Yi-34B-Chat | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | |Yi-34B-Chat-4bits | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | |Yi-34B-Chat-8bits | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | |Yi-6B-Chat| • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | |Yi-6B-Chat-4bits | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | |Yi-6B-Chat-8bits | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | <sub><sup> - 4-bit series models are quantized by AWQ. <br> - 8-bit series models are quantized by GPTQ <br> - All quantized models have a low barrier to use since they can be deployed on consumer-grade GPUs (e.g., 3090, 4090). </sup></sub> ### Base models | Model | Download | |---|---| |Yi-34B| • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | |Yi-34B-200K|• 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel| |Yi-9B|• 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel| |Yi-9B-200K | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | |Yi-6B| • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | |Yi-6B-200K | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | <sub><sup> - 200k is roughly equivalent to 400,000 Chinese characters. <br> - If you want to use the previous version of the Yi-34B-200K (released on Nov 5, 2023), run to download the weight. </sup></sub> ### Model info - For chat and base models <table> <thead> <tr> <th>Model</th> <th>Intro</th> <th>Default context window</th> <th>Pretrained tokens</th> <th>Training Data Date</th> </tr> </thead> <tbody><tr> <td>6B series models</td> <td>They are suitable for personal and academic use.</td> <td rowspan=\"3\">4K</td> <td>3T</td> <td rowspan=\"3\">Up to June 2023</td> </tr> <tr> <td>9B series models</td> <td>It is the best at coding and math in the Yi series models.</td> <td>Yi-9B is continuously trained based on Yi-6B, using 0.8T tokens.</td> </tr> <tr> <td>34B series models</td> <td>They are suitable for personal, academic, and commercial (particularly for small and medium-sized enterprises) purposes. It&#39;s a cost-effective solution that&#39;s affordable and equipped with emergent ability.</td> <td>3T</td> </tr> </tbody></table> - For chat models <details style=\"display: inline;\"><summary>For chat model limitations, see the explanations below. ⬇️</summary> <ul> <br>The released chat model has undergone exclusive training using Supervised Fine-Tuning (SFT). Compared to other standard chat models, our model produces more diverse responses, making it suitable for various downstream tasks, such as creative scenarios. Furthermore, this diversity is expected to enhance the likelihood of generating higher quality responses, which will be advantageous for subsequent Reinforcement Learning (RL) training. <br>However, this higher diversity might amplify certain existing issues, including: <li>Hallucination: This refers to the model generating factually incorrect or nonsensical information. With the model's responses being more varied, there's a higher chance of hallucination that are not based on accurate data or logical reasoning.</li> <li>Non-determinism in re-generation: When attempting to regenerate or sample responses, inconsistencies in the outcomes may occur. The increased diversity can lead to varying results even under similar input conditions.</li> <li>Cumulative Error: This occurs when errors in the model's responses compound over time. As the model generates more diverse responses, the likelihood of small inaccuracies building up into larger errors increases, especially in complex tasks like extended reasoning, mathematical problem-solving, etc.</li> <li>To achieve more coherent and consistent responses, it is advisable to adjust generation configuration parameters such as temperature, top_p, or top_k. These adjustments can help in the balance between creativity and coherence in the model's outputs.</li> </ul> </details> <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> # How to use Yi? - Quick start - Choose your path - pip - docker - conda-lock - llama.cpp - Web demo - Fine-tuning - Quantization - Deployment - FAQ - Learning hub ## Quick start > **💡 Tip**: If you want to get started with the Yi model and explore different methods for inference, check out the Yi Cookbook. ### Choose your path Select one of the following paths to begin your journey with Yi! !Quick start - Choose your path #### 🎯 Deploy Yi locally If you prefer to deploy Yi models locally, - 🙋‍♀️ and you have **sufficient** resources (for example, NVIDIA A800 80GB), you can choose one of the following methods: - pip - Docker - conda-lock - 🙋‍♀️ and you have **limited** resources (for example, a MacBook Pro), you can use llama.cpp. #### 🎯 Not to deploy Yi locally If you prefer not to deploy Yi models locally, you can explore Yi's capabilities using any of the following options. ##### 🙋‍♀️ Run Yi with APIs If you want to explore more features of Yi, you can adopt one of these methods: - Yi APIs (Yi official) - Early access has been granted to some applicants. Stay tuned for the next round of access! - Yi APIs (Replicate) ##### 🙋‍♀️ Run Yi in playground If you want to chat with Yi with more customizable options (e.g., system prompt, temperature, repetition penalty, etc.), you can try one of the following options: - Yi-34B-Chat-Playground (Yi official) - Access is available through a whitelist. Welcome to apply (fill out a form in English or Chinese). - Yi-34B-Chat-Playground (Replicate) ##### 🙋‍♀️ Chat with Yi If you want to chat with Yi, you can use one of these online services, which offer a similar user experience: - Yi-34B-Chat (Yi official on Hugging Face) - No registration is required. - Yi-34B-Chat (Yi official beta) - Access is available through a whitelist. Welcome to apply (fill out a form in English or Chinese). <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### Quick start - pip This tutorial guides you through every step of running **Yi-34B-Chat locally on an A800 (80G)** and then performing inference. #### Step 0: Prerequisites - Make sure Python 3.10 or a later version is installed. - If you want to run other Yi models, see software and hardware requirements. #### Step 1: Prepare your environment To set up the environment and install the required packages, execute the following command. #### Step 2: Download the Yi model You can download the weights and tokenizer of Yi models from the following sources: - Hugging Face - ModelScope - WiseModel #### Step 3: Perform inference You can perform inference with Yi chat or base models as below. ##### Perform inference with Yi chat model 1. Create a file named and copy the following content to it. 2. Run . Then you can see an output similar to the one below. 🥳 ##### Perform inference with Yi base model - Yi-34B The steps are similar to pip - Perform inference with Yi chat model. You can use the existing file []( Then you can see an output similar to the one below. 🥳 <details> <summary>Output. ⬇️ </summary> <br> **Prompt**: Let me tell you an interesting story about cat Tom and mouse Jerry, **Generation**: Let me tell you an interesting story about cat Tom and mouse Jerry, which happened in my childhood. My father had a big house with two cats living inside it to kill mice. One day when I was playing at home alone, I found one of the tomcats lying on his back near our kitchen door, looking very much like he wanted something from us but couldn’t get up because there were too many people around him! He kept trying for several minutes before finally giving up... </details> - Yi-9B Input Output <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### Quick start - Docker <details> <summary> Run Yi-34B-chat locally with Docker: a step-by-step guide. ⬇️</summary> <br>This tutorial guides you through every step of running <strong>Yi-34B-Chat on an A800 GPU</strong> or <strong>4*4090</strong> locally and then performing inference. <h4>Step 0: Prerequisites</h4> <p>Make sure you've installed <a href=\" and <a href=\" <h4> Step 1: Start Docker </h4> <pre><code>docker run -it --gpus all \\ -v &lt;your-model-path&gt;: /models ghcr.io/01-ai/yi:latest </code></pre> <p>Alternatively, you can pull the Yi Docker image from <code>registry.lingyiwanwu.com/ci/01-ai/yi:latest</code>.</p> <h4>Step 2: Perform inference</h4> <p>You can perform inference with Yi chat or base models as below.</p> <h5>Perform inference with Yi chat model</h5> <p>The steps are similar to <a href=\"#perform-inference-with-yi-chat-model\">pip - Perform inference with Yi chat model</a>.</p> <p><strong>Note</strong> that the only difference is to set <code>model_path = '&lt;your-model-mount-path&gt;'</code> instead of <code>model_path = '&lt;your-model-path&gt;'</code>.</p> <h5>Perform inference with Yi base model</h5> <p>The steps are similar to <a href=\"#perform-inference-with-yi-base-model\">pip - Perform inference with Yi base model</a>.</p> <p><strong>Note</strong> that the only difference is to set <code>--model &lt;your-model-mount-path&gt;'</code> instead of <code>model &lt;your-model-path&gt;</code>.</p> </details> ### Quick start - conda-lock <details> <summary>You can use <code><a href=\" to generate fully reproducible lock files for conda environments. ⬇️</summary> <br> You can refer to <a href=\" for the exact versions of the dependencies. Additionally, you can utilize <code><a href=\" for installing these dependencies. <br> To install the dependencies, follow these steps: 1. Install micromamba by following the instructions available <a href=\" 2. Execute <code>micromamba install -y -n yi -f conda-lock.yml</code> to create a conda environment named <code>yi</code> and install the necessary dependencies. </details> ### Quick start - llama.cpp <a href=\" following tutorial </a> will guide you through every step of running a quantized model (<a href=\" locally and then performing inference. <details> <summary> Run Yi-chat-6B-2bits locally with llama.cpp: a step-by-step guide. ⬇️</summary> <br><a href=\" tutorial</a> guides you through every step of running a quantized model (<a href=\" locally and then performing inference.</p> - Step 0: Prerequisites - Step 1: Download llama.cpp - Step 2: Download Yi model - Step 3: Perform inference #### Step 0: Prerequisites - This tutorial assumes you use a MacBook Pro with 16GB of memory and an Apple M2 Pro chip. - Make sure []( is installed on your machine. #### Step 1: Download To clone the []( repository, run the following command. #### Step 2: Download Yi model 2.1 To clone XeIaso/yi-chat-6B-GGUF with just pointers, run the following command. 2.2 To download a quantized Yi model (yi-chat-6b.Q2_K.gguf), run the following command. #### Step 3: Perform inference To perform inference with the Yi model, you can use one of the following methods. - Method 1: Perform inference in terminal - Method 2: Perform inference in web ##### Method 1: Perform inference in terminal To compile using 4 threads and then conduct inference, navigate to the directory, and run the following command. > ##### Tips > > - Replace with the actual path of your model. > > - By default, the model operates in completion mode. > > - For additional output customization options (for example, system prompt, temperature, repetition penalty, etc.), run to check detailed descriptions and usage. Now you have successfully asked a question to the Yi model and got an answer! 🥳 ##### Method 2: Perform inference in web 1. To initialize a lightweight and swift chatbot, run the following command. Then you can get an output like this: 2. To access the chatbot interface, open your web browser and enter into the address bar. !Yi model chatbot interface - llama.cpp 3. Enter a question, such as \"How do you feed your pet fox? Please answer this question in 6 simple steps\" into the prompt window, and you will receive a corresponding answer. !Ask a question to Yi model - llama.cpp </ul> </details> <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### Web demo You can build a web UI demo for Yi **chat** models (note that Yi base models are not supported in this senario). Step 1: Prepare your environment. Step 2: Download the Yi model. Step 3. To start a web service locally, run the following command. You can access the web UI by entering the address provided in the console into your browser. !Quick start - web demo <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### Fine-tuning Once finished, you can compare the finetuned model and the base model with the following command: <details style=\"display: inline;\"><summary>For advanced usage (like fine-tuning based on your custom data), see the explanations below. ⬇️ </summary> <ul> ### Finetune code for Yi 6B and 34B #### Preparation ##### From Image By default, we use a small dataset from BAAI/COIG to finetune the base model. You can also prepare your customized dataset in the following format: And then mount them in the container to replace the default ones: ##### From Local Server Make sure you have conda. If not, use Then, create a conda env: #### Hardware Setup For the Yi-6B model, a node with 4 GPUs, each with GPU memory larger than 60GB, is recommended. For the Yi-34B model, because the usage of the zero-offload technique consumes a lot of CPU memory, please be careful to limit the number of GPUs in the 34B finetune training. Please use CUDA_VISIBLE_DEVICES to limit the number of GPUs (as shown in scripts/run_sft_Yi_34b.sh). A typical hardware setup for finetuning the 34B model is a node with 8 GPUs (limited to 4 in running by CUDA_VISIBLE_DEVICES=0,1,2,3), each with GPU memory larger than 80GB, and total CPU memory larger than 900GB. #### Quick Start Download a LLM-base model to MODEL_PATH (6B and 34B). A typical folder of models is like: Download a dataset from huggingface to local storage DATA_PATH, e.g. Dahoas/rm-static. has example datasets, which are modified from BAAI/COIG into the scripts folder, copy and paste the script, and run. For example: For the Yi-6B base model, setting training_debug_steps=20 and num_train_epochs=4 can output a chat model, which takes about 20 minutes. For the Yi-34B base model, it takes a relatively long time for initialization. Please be patient. #### Evaluation Then you'll see the answer from both the base model and the finetuned model. </ul> </details> <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### Quantization #### GPT-Q Once finished, you can then evaluate the resulting model as follows: <details style=\"display: inline;\"><summary>For details, see the explanations below. ⬇️</summary> <ul> #### GPT-Q quantization GPT-Q is a PTQ (Post-Training Quantization) method. It saves memory and provides potential speedups while retaining the accuracy of the model. Yi models can be GPT-Q quantized without a lot of efforts. We provide a step-by-step tutorial below. To run GPT-Q, we will use AutoGPTQ and exllama. And the huggingface transformers has integrated optimum and auto-gptq to perform GPTQ quantization on language models. ##### Do Quantization The script is provided for you to perform GPT-Q quantization: ##### Run Quantized Model You can run a quantized model using the : </ul> </details> #### AWQ Once finished, you can then evaluate the resulting model as follows: <details style=\"display: inline;\"><summary>For details, see the explanations below. ⬇️</summary> <ul> #### AWQ quantization AWQ is a PTQ (Post-Training Quantization) method. It's an efficient and accurate low-bit weight quantization (INT3/4) for LLMs. Yi models can be AWQ quantized without a lot of efforts. We provide a step-by-step tutorial below. To run AWQ, we will use AutoAWQ. ##### Do Quantization The script is provided for you to perform AWQ quantization: ##### Run Quantized Model You can run a quantized model using the : </ul> </details> <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### Deployment If you want to deploy Yi models, make sure you meet the software and hardware requirements. #### Software requirements Before using Yi quantized models, make sure you've installed the correct software listed below. | Model | Software |---|--- Yi 4-bit quantized models | AWQ and CUDA Yi 8-bit quantized models | GPTQ and CUDA #### Hardware requirements Before deploying Yi in your environment, make sure your hardware meets the following requirements. ##### Chat models | Model | Minimum VRAM | Recommended GPU Example | |:----------------------|:--------------|:-------------------------------------:| | Yi-6B-Chat | 15 GB | 1 x RTX 3090 (24 GB) <br> 1 x RTX 4090 (24 GB) <br> 1 x A10 (24 GB) <br> 1 x A30 (24 GB) | | Yi-6B-Chat-4bits | 4 GB | 1 x RTX 3060 (12 GB)<br> 1 x RTX 4060 (8 GB) | | Yi-6B-Chat-8bits | 8 GB | 1 x RTX 3070 (8 GB) <br> 1 x RTX 4060 (8 GB) | | Yi-34B-Chat | 72 GB | 4 x RTX 4090 (24 GB)<br> 1 x A800 (80GB) | | Yi-34B-Chat-4bits | 20 GB | 1 x RTX 3090 (24 GB) <br> 1 x RTX 4090 (24 GB) <br> 1 x A10 (24 GB) <br> 1 x A30 (24 GB) <br> 1 x A100 (40 GB) | | Yi-34B-Chat-8bits | 38 GB | 2 x RTX 3090 (24 GB) <br> 2 x RTX 4090 (24 GB)<br> 1 x A800 (40 GB) | Below are detailed minimum VRAM requirements under different batch use cases. | Model | batch=1 | batch=4 | batch=16 | batch=32 | | ----------------------- | ------- | ------- | -------- | -------- | | Yi-6B-Chat | 12 GB | 13 GB | 15 GB | 18 GB | | Yi-6B-Chat-4bits | 4 GB | 5 GB | 7 GB | 10 GB | | Yi-6B-Chat-8bits | 7 GB | 8 GB | 10 GB | 14 GB | | Yi-34B-Chat | 65 GB | 68 GB | 76 GB | > 80 GB | | Yi-34B-Chat-4bits | 19 GB | 20 GB | 30 GB | 40 GB | | Yi-34B-Chat-8bits | 35 GB | 37 GB | 46 GB | 58 GB | ##### Base models | Model | Minimum VRAM | Recommended GPU Example | |----------------------|--------------|:-------------------------------------:| | Yi-6B | 15 GB | 1 x RTX 3090 (24 GB) <br> 1 x RTX 4090 (24 GB) <br> 1 x A10 (24 GB) <br> 1 x A30 (24 GB) | | Yi-6B-200K | 50 GB | 1 x A800 (80 GB) | | Yi-9B | 20 GB | 1 x RTX 4090 (24 GB) | | Yi-34B | 72 GB | 4 x RTX 4090 (24 GB) <br> 1 x A800 (80 GB) | | Yi-34B-200K | 200 GB | 4 x A800 (80 GB) | <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### FAQ <details> <summary> If you have any questions while using the Yi series models, the answers provided below could serve as a helpful reference for you. ⬇️</summary> <br> #### 💡Fine-tuning - <strong>Base model or Chat model - which to fine-tune?</strong> <br>The choice of pre-trained language model for fine-tuning hinges on the computational resources you have at your disposal and the particular demands of your task. - If you are working with a substantial volume of fine-tuning data (say, over 10,000 samples), the Base model could be your go-to choice. - On the other hand, if your fine-tuning data is not quite as extensive, opting for the Chat model might be a more fitting choice. - It is generally advisable to fine-tune both the Base and Chat models, compare their performance, and then pick the model that best aligns with your specific requirements. - <strong>Yi-34B versus Yi-34B-Chat for full-scale fine-tuning - what is the difference?</strong> <br> The key distinction between full-scale fine-tuning on and comes down to the fine-tuning approach and outcomes. - Yi-34B-Chat employs a Special Fine-Tuning (SFT) method, resulting in responses that mirror human conversation style more closely. - The Base model's fine-tuning is more versatile, with a relatively high performance potential. - If you are confident in the quality of your data, fine-tuning with could be your go-to. - If you are aiming for model-generated responses that better mimic human conversational style, or if you have doubts about your data quality, might be your best bet. #### 💡Quantization - <strong>Quantized model versus original model - what is the performance gap?</strong> - The performance variance is largely contingent on the quantization method employed and the specific use cases of these models. For instance, when it comes to models provided by the AWQ official, from a Benchmark standpoint, quantization might result in a minor performance drop of a few percentage points. - Subjectively speaking, in situations like logical reasoning, even a 1% performance shift could impact the accuracy of the output results. #### 💡General - <strong>Where can I source fine-tuning question answering datasets?</strong> - You can find fine-tuning question answering datasets on platforms like Hugging Face, with datasets like m-a-p/COIG-CQIA readily available. - Additionally, Github offers fine-tuning frameworks, such as hiyouga/LLaMA-Factory, which integrates pre-made datasets. - <strong>What is the GPU memory requirement for fine-tuning Yi-34B FP16?</strong> <br> The GPU memory needed for fine-tuning 34B FP16 hinges on the specific fine-tuning method employed. For full parameter fine-tuning, you'll need 8 GPUs each with 80 GB; however, more economical solutions like Lora require less. For more details, check out hiyouga/LLaMA-Factory. Also, consider using BF16 instead of FP16 for fine-tuning to optimize performance. - <strong>Are there any third-party platforms that support chat functionality for the Yi-34b-200k model?</strong> <br> If you're looking for third-party Chats, options include fireworks.ai. </details> ### Learning hub <details> <summary> If you want to learn Yi, you can find a wealth of helpful educational resources here. ⬇️</summary> <br> Welcome to the Yi learning hub! Whether you're a seasoned developer or a newcomer, you can find a wealth of helpful educational resources to enhance your understanding and skills with Yi models, including insightful blog posts, comprehensive video tutorials, hands-on guides, and more. The content you find here has been generously contributed by knowledgeable Yi experts and passionate enthusiasts. We extend our heartfelt gratitude for your invaluable contributions! At the same time, we also warmly invite you to join our collaborative effort by contributing to Yi. If you have already made contributions to Yi, please don't hesitate to showcase your remarkable work in the table below. With all these resources at your fingertips, you're ready to start your exciting journey with Yi. Happy learning! 🥳 #### Tutorials ##### Blog tutorials | Deliverable | Date | Author | | ------------------------------------------------------------ | ---------- | ------------------------------------------------------------ | | 使用 Dify、Meilisearch、零一万物模型实现最简单的 RAG 应用(三):AI 电影推荐 | 2024-05-20 | 苏洋 | | 使用autodl服务器,在A40显卡上运行, Yi-34B-Chat-int4模型,并使用vllm优化加速,显存占用42G,速度18 words-s | 2024-05-20 | fly-iot | | Yi-VL 最佳实践 | 2024-05-20 | ModelScope | | 一键运行零一万物新鲜出炉Yi-1.5-9B-Chat大模型 | 2024-05-13 | Second State | | 零一万物开源Yi-1.5系列大模型 | 2024-05-13 | 刘聪 | | 零一万物Yi-1.5系列模型发布并开源! 34B-9B-6B 多尺寸,魔搭社区推理微调最佳实践教程来啦! | 2024-05-13 | ModelScope | | Yi-34B 本地部署简单测试 | 2024-05-13 | 漆妮妮 | | 驾辰龙跨Llama持Wasm,玩转Yi模型迎新春过大年(上) | 2024-05-13 | Words worth | | 驾辰龙跨Llama持Wasm,玩转Yi模型迎新春过大年(下篇) | 2024-05-13 | Words worth | | Ollama新增两个命令,开始支持零一万物Yi-1.5系列模型 | 2024-05-13 | AI工程师笔记 | | 使用零一万物 200K 模型和 Dify 快速搭建模型应用 | 2024-05-13 | 苏洋 | | (持更) 零一万物模型折腾笔记:社区 Yi-34B 微调模型使用 | 2024-05-13 | 苏洋 | | Python+ERNIE-4.0-8K-Yi-34B-Chat大模型初探 | 2024-05-11 | 江湖评谈 | | 技术布道 Vue及Python调用零一万物模型和Prompt模板(通过百度千帆大模型平台) | 2024-05-11 | MumuLab | | 多模态大模型Yi-VL-plus体验 效果很棒 | 2024-04-27 | 大家好我是爱因 | | 使用autodl服务器,两个3090显卡上运行, Yi-34B-Chat-int4模型,并使用vllm优化加速,显存占用42G,速度23 words-s | 2024-04-27 | fly-iot | | Getting Started with Yi-1.5-9B-Chat | 2024-04-27 | Second State | | 基于零一万物yi-vl-plus大模型简单几步就能批量生成Anki图片笔记 | 2024-04-24 | 正经人王同学 | | 【AI开发:语言】一、Yi-34B超大模型本地部署CPU和GPU版 | 2024-04-21 | My的梦想已实现 | | 【Yi-34B-Chat-Int4】使用4个2080Ti显卡11G版本,运行Yi-34B模型,5年前老显卡是支持的,可以正常运行,速度 21 words-s,vllm要求算力在7以上的显卡就可以 | 2024-03-22 | fly-iot | | 零一万物大模型部署+微调总结 | 2024-03-22 | v_wus | | 零一万物Yi大模型vllm推理时Yi-34B或Yi-6bchat重复输出的解决方案 | 2024-03-02 | 郝铠锋 | | Yi-34B微调训练 | 2024-03-02 | lsjlnd | | 实测零一万物Yi-VL多模态语言模型:能准确“识图吃瓜” | 2024-02-02 | 苏洋 | | 零一万物开源Yi-VL多模态大模型,魔搭社区推理&微调最佳实践来啦! | 2024-01-26 | ModelScope | | 单卡 3 小时训练 Yi-6B 大模型 Agent:基于 Llama Factory 实战 | 2024-01-22 | 郑耀威 | | 零一科技Yi-34B Chat大模型环境搭建&推理 | 2024-01-15 | 要养家的程序员 | | 基于LLaMA Factory,单卡3小时训练专属大模型 Agent | 2024-01-15 | 机器学习社区 | | 双卡 3080ti 部署 Yi-34B 大模型 - Gradio + vLLM 踩坑全记录 | 2024-01-02 | 漆妮妮 | | 【大模型部署实践-3】3个能在3090上跑起来的4bits量化Chat模型(baichuan2-13b、InternLM-20b、Yi-34b) | 2024-01-02 | aq_Seabiscuit | | 只需 24G 显存,用 vllm 跑起来 Yi-34B 中英双语大模型 | 2023-12-28 | 漆妮妮 | | 零一万物模型官方 Yi-34B 模型本地离线运行部署使用笔记(物理机和docker两种部署方式),200K 超长文本内容,34B 干翻一众 70B 模型,打榜分数那么高,这模型到底行不行? | 2023-12-28 | 代码讲故事 | | LLM - 大模型速递之 Yi-34B 入门与 LoRA 微调 | 2023-12-18 | BIT_666 | | 通过vllm框架进行大模型推理 | 2023-12-18 | 土山炮 | | CPU 混合推理,非常见大模型量化方案:“二三五六” 位量化方案 | 2023-12-12 | 苏洋 | | 零一万物模型折腾笔记:官方 Yi-34B 模型基础使用 | 2023-12-10 | 苏洋 | | Running Yi-34B-Chat locally using LlamaEdge | 2023-11-30 | Second State | | 本地运行零一万物 34B 大模型,使用 Llama.cpp & 21G 显存 | 2023-11-26 | 苏洋 | ##### GitHub Project | Deliverable | Date | Author | | ------------------------------------------------------------ | ---------- | ------------------------------------------- | | yi-openai-proxy | 2024-05-11 | 苏洋 | | 基于零一万物 Yi 模型和 B 站构建大语言模型高质量训练数据集 | 2024-04-29 | 正经人王同学 | | 基于视频网站和零一万物大模型构建大语言模型高质量训练数据集 | 2024-04-25 | 正经人王同学 | | 基于零一万物yi-34b-chat-200k输入任意文章地址,点击按钮即可生成无广告或推广内容的简要笔记,并生成分享图给好友 | 2024-04-24 | 正经人王同学 | | Food-GPT-Yi-model | 2024-04-21 | Hubert S | ##### Video tutorials | Deliverable | Date | Author | | ------------------------------------------------------------ | ---------- | ------------------------------------------------------------ | | Run dolphin-2.2-yi-34b on IoT Devices | 2023-11-30 | Second State | | 只需 24G 显存,用 vllm 跑起来 Yi-34B 中英双语大模型 | 2023-12-28 | 漆妮妮 | | Install Yi 34B Locally - Chinese English Bilingual LLM | 2023-11-05 | Fahd Mirza | | Dolphin Yi 34b - Brand New Foundational Model TESTED | 2023-11-27 | Matthew Berman | | Yi-VL-34B 多模态大模型 - 用两张 A40 显卡跑起来 | 2024-01-28 | 漆妮妮 | | 4060Ti 16G显卡安装零一万物最新开源的Yi-1.5版大语言模型 | 2024-05-14 | titan909 | | Yi-1.5: True Apache 2.0 Competitor to LLAMA-3 | 2024-05-13 | Prompt Engineering | | Install Yi-1.5 Model Locally - Beats Llama 3 in Various Benchmarks | 2024-05-13 | Fahd Mirza | | how to install Ollama and run Yi 6B | 2024-05-13 | Ridaa Davids | | 地表最强混合智能AI助手:llama3_70B+Yi_34B+Qwen1.5_110B | 2024-05-04 | 朱扎特 | | ChatDoc学术论文辅助--基于Yi-34B和langchain进行PDF知识库问答 | 2024-05-03 | 朱扎特 | | 基于Yi-34B的领域知识问答项目演示 | 2024-05-02 | 朱扎特 | | 使用RTX4090+GaLore算法 全参微调Yi-6B大模型 | 2024-03-24 | 小工蚂创始人 | | 无内容审查NSFW大语言模型Yi-34B-Chat蒸馏版测试,RolePlay,《天龙八部》马夫人康敏,本地GPU,CPU运行 | 2024-03-20 | 刘悦的技术博客 | | 无内容审查NSFW大语言模型整合包,Yi-34B-Chat,本地CPU运行,角色扮演潘金莲 | 2024-03-16 | 刘悦的技术博客 | | 量化 Yi-34B-Chat 并在单卡 RTX 4090 使用 vLLM 部署 | 2024-03-05 | 白鸽巢 | | Yi-VL-34B(5):使用3个3090显卡24G版本,运行Yi-VL-34B模型,支持命令行和web界面方式,理解图片的内容转换成文字 | 2024-02-27 | fly-iot | | Win环境KoboldCpp本地部署大语言模型进行各种角色扮演游戏 | 2024-02-25 | 魚蟲蟲 | | 无需显卡本地部署Yi-34B-Chat进行角色扮演游戏 P2 | 2024-02-23 | 魚蟲蟲 | | 【wails】(2):使用go-llama.cpp 运行 yi-01-6b大模型,使用本地CPU运行,速度还可以,等待下一版本更新 | 2024-02-20 | fly-iot | | 【xinference】(6):在autodl上,使用xinference部署yi-vl-chat和qwen-vl-chat模型,可以使用openai调用成功 | 2024-02-06 | fly-iot | | 无需显卡本地部署Yi-34B-Chat进行角色扮演游戏 P1 | 2024-02-05 | 魚蟲蟲 | | 2080Ti部署YI-34B大模型 xinference-oneapi-fastGPT本地知识库使用指南 | 2024-01-30 | 小饭护法要转码 | | Best Story Writing AI Model - Install Yi 6B 200K Locally on Windows | 2024-01-22 | Fahd Mirza | | Mac 本地运行大语言模型方法与常见问题指南(Yi 34B 模型+32 GB 内存测试) | 2024-01-21 | 小吴苹果机器人 | | 【Dify知识库】(11):Dify0.4.9改造支持MySQL,成功接入yi-6b 做对话,本地使用fastchat启动,占8G显存,完成知识库配置 | 2024-01-21 | fly-iot | | 这位LLM先生有点暴躁,用的是YI-6B的某个量化版,#LLM #大语言模型 #暴躁老哥 | 2024-01-20 | 晓漫吧 | | 大模型推理 NvLink 桥接器有用吗|双卡 A6000 测试一下 | 2024-01-17 | 漆妮妮 | | 大模型推理 A40 vs A6000 谁更强 - 对比 Yi-34B 的单、双卡推理性能 | 2024-01-15 | 漆妮妮 | | C-Eval 大语言模型评测基准- 用 LM Evaluation Harness + vLLM 跑起来 | 2024-01-11 | 漆妮妮 | | 双显卡部署 Yi-34B 大模型 - vLLM + Gradio 踩坑记录 | 2024-01-01 | 漆妮妮 | | 手把手教学!使用 vLLM 快速部署 Yi-34B-Chat | 2023-12-26 | 白鸽巢 | | 如何训练企业自己的大语言模型?Yi-6B LORA微调演示 #小工蚁 | 2023-12-21 | 小工蚂创始人 | | Yi-34B(4):使用4个2080Ti显卡11G版本,运行Yi-34B模型,5年前老显卡是支持的,可以正常运行,速度 21 words/s | 2023-12-02 | fly-iot | | 使用autodl服务器,RTX 3090 * 3 显卡上运行, Yi-34B-Chat模型,显存占用60G | 2023-12-01 | fly-iot | | 使用autodl服务器,两个3090显卡上运行, Yi-34B-Chat-int4模型,用vllm优化,增加 --num-gpu 2,速度23 words/s | 2023-12-01 | fly-iot | | Yi大模型一键本地部署 技术小白玩转AI | 2023-12-01 | 技术小白玩转AI | | 01.AI's Yi-6B: Overview and Fine-Tuning | 2023-11-28 | AI Makerspace | | Yi 34B Chat LLM outperforms Llama 70B | 2023-11-27 | DLExplorer | | How to run open source models on mac Yi 34b on m3 Max | 2023-11-26 | TECHNO PREMIUM | | Yi-34B - 200K - The BEST & NEW CONTEXT WINDOW KING | 2023-11-24 | Prompt Engineering | | Yi 34B : The Rise of Powerful Mid-Sized Models - Base,200k & Chat | 2023-11-24 | Sam Witteveen | | 在IoT设备运行破解版李开复大模型dolphin-2.2-yi-34b(还可作为私有OpenAI API服务器) | 2023-11-15 | Second State | | Run dolphin-2.2-yi-34b on IoT Devices (Also works as a Private OpenAI API Server) | 2023-11-14 | Second State | | How to Install Yi 34B 200K Llamafied on Windows Laptop | 2023-11-11 | Fahd Mirza | </details> # Why Yi? - Ecosystem - Upstream - Downstream - Serving - Quantization - Fine-tuning - API - Benchmarks - Chat model performance - Base model performance - Yi-34B and Yi-34B-200K - Yi-9B ## Ecosystem Yi has a comprehensive ecosystem, offering a range of tools, services, and models to enrich your experiences and maximize productivity. - Upstream - Downstream - Serving - Quantization - Fine-tuning - API ### Upstream The Yi series models follow the same model architecture as Llama. By choosing Yi, you can leverage existing tools, libraries, and resources within the Llama ecosystem, eliminating the need to create new tools and enhancing development efficiency. For example, the Yi series models are saved in the format of the Llama model. You can directly use and to load the model. For more information, see Use the chat model. <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### Downstream > 💡 Tip > > - Feel free to create a PR and share the fantastic work you've built using the Yi series models. > > - To help others quickly understand your work, it is recommended to use the format of . #### Serving If you want to get up with Yi in a few minutes, you can use the following services built upon Yi. - Yi-34B-Chat: you can chat with Yi using one of the following platforms: - Yi-34B-Chat | Hugging Face - Yi-34B-Chat | Yi Platform: **Note** that currently it's available through a whitelist. Welcome to apply (fill out a form in English or Chinese) and experience it firsthand! - Yi-6B-Chat (Replicate): you can use this model with more options by setting additional parameters and calling APIs. - ScaleLLM: you can use this service to run Yi models locally with added flexibility and customization. #### Quantization If you have limited computational capabilities, you can use Yi's quantized models as follows. These quantized models have reduced precision but offer increased efficiency, such as faster inference speed and smaller RAM usage. - TheBloke/Yi-34B-GPTQ - TheBloke/Yi-34B-GGUF - TheBloke/Yi-34B-AWQ #### Fine-tuning If you're seeking to explore the diverse capabilities within Yi's thriving family, you can delve into Yi's fine-tuned models as below. - TheBloke Models: this site hosts numerous fine-tuned models derived from various LLMs including Yi. This is not an exhaustive list for Yi, but to name a few sorted on downloads: - TheBloke/dolphin-2_2-yi-34b-AWQ - TheBloke/Yi-34B-Chat-AWQ - TheBloke/Yi-34B-Chat-GPTQ - SUSTech/SUS-Chat-34B: this model ranked first among all models below 70B and outperformed the twice larger deepseek-llm-67b-chat. You can check the result on the Open LLM Leaderboard. - OrionStarAI/OrionStar-Yi-34B-Chat-Llama: this model excelled beyond other models (such as GPT-4, Qwen-14B-Chat, Baichuan2-13B-Chat) in C-Eval and CMMLU evaluations on the OpenCompass LLM Leaderboard. - NousResearch/Nous-Capybara-34B: this model is trained with 200K context length and 3 epochs on the Capybara dataset. #### API - amazing-openai-api: this tool converts Yi model APIs into the OpenAI API format out of the box. - LlamaEdge: this tool builds an OpenAI-compatible API server for Yi-34B-Chat using a portable Wasm (WebAssembly) file, powered by Rust. <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ## Tech report For detailed capabilities of the Yi series model, see Yi: Open Foundation Models by 01.AI. ### Citation ## Benchmarks - Chat model performance - Base model performance ### Chat model performance Yi-34B-Chat model demonstrates exceptional performance, ranking first among all existing open-source models in the benchmarks including MMLU, CMMLU, BBH, GSM8k, and more. !Chat model performance <details> <summary> Evaluation methods and challenges. ⬇️ </summary> - **Evaluation methods**: we evaluated various benchmarks using both zero-shot and few-shot methods, except for TruthfulQA. - **Zero-shot vs. few-shot**: in chat models, the zero-shot approach is more commonly employed. - **Evaluation strategy**: our evaluation strategy involves generating responses while following instructions explicitly or implicitly (such as using few-shot examples). We then isolate relevant answers from the generated text. - **Challenges faced**: some models are not well-suited to produce output in the specific format required by instructions in few datasets, which leads to suboptimal results. <strong>*</strong>: C-Eval results are evaluated on the validation datasets </details> ### Base model performance #### Yi-34B and Yi-34B-200K The Yi-34B and Yi-34B-200K models stand out as the top performers among open-source models, especially excelling in MMLU, CMMLU, common-sense reasoning, reading comprehension, and more. !Base model performance <details> <summary> Evaluation methods. ⬇️</summary> - **Disparity in results**: while benchmarking open-source models, a disparity has been noted between results from our pipeline and those reported by public sources like OpenCompass. - **Investigation findings**: a deeper investigation reveals that variations in prompts, post-processing strategies, and sampling techniques across models may lead to significant outcome differences. - **Uniform benchmarking process**: our methodology aligns with the original benchmarks—consistent prompts and post-processing strategies are used, and greedy decoding is applied during evaluations without any post-processing for the generated content. - **Efforts to retrieve unreported scores**: for scores that were not reported by the original authors (including scores reported with different settings), we try to get results with our pipeline. - **Extensive model evaluation**: to evaluate the model’s capability extensively, we adopted the methodology outlined in Llama2. Specifically, we included PIQA, SIQA, HellaSwag, WinoGrande, ARC, OBQA, and CSQA to assess common sense reasoning. SquAD, QuAC, and BoolQ were incorporated to evaluate reading comprehension. - **Special configurations**: CSQA was exclusively tested using a 7-shot setup, while all other tests were conducted with a 0-shot configuration. Additionally, we introduced GSM8K (8-shot@1), MATH (4-shot@1), HumanEval (0-shot@1), and MBPP (3-shot@1) under the category \"Math & Code\". - **Falcon-180B caveat**: Falcon-180B was not tested on QuAC and OBQA due to technical constraints. Its performance score is an average from other tasks, and considering the generally lower scores of these two tasks, Falcon-180B's capabilities are likely not underestimated. </details> #### Yi-9B Yi-9B is almost the best among a range of similar-sized open-source models (including Mistral-7B, SOLAR-10.7B, Gemma-7B, DeepSeek-Coder-7B-Base-v1.5 and more), particularly excelling in code, math, common-sense reasoning, and reading comprehension. !Yi-9B benchmark - details - In terms of **overall** ability (Mean-All), Yi-9B performs the best among similarly sized open-source models, surpassing DeepSeek-Coder, DeepSeek-Math, Mistral-7B, SOLAR-10.7B, and Gemma-7B. !Yi-9B benchmark - overall - In terms of **coding** ability (Mean-Code), Yi-9B's performance is second only to DeepSeek-Coder-7B, surpassing Yi-34B, SOLAR-10.7B, Mistral-7B, and Gemma-7B. !Yi-9B benchmark - code - In terms of **math** ability (Mean-Math), Yi-9B's performance is second only to DeepSeek-Math-7B, surpassing SOLAR-10.7B, Mistral-7B, and Gemma-7B. !Yi-9B benchmark - math - In terms of **common sense and reasoning** ability (Mean-Text), Yi-9B's performance is on par with Mistral-7B, SOLAR-10.7B, and Gemma-7B. !Yi-9B benchmark - text <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> # Who can use Yi? Everyone! 🙌 ✅ The code and weights of the Yi series models are distributed under the Apache 2.0 license, which means the Yi series models are free for personal usage, academic purposes, and commercial use. <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> # Misc. ### Acknowledgments A heartfelt thank you to each of you who have made contributions to the Yi community! You have helped Yi not just a project, but a vibrant, growing home for innovation. ![yi contributors]( <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### Disclaimer We use data compliance checking algorithms during the training process, to ensure the compliance of the trained model to the best of our ability. Due to complex data and the diversity of language model usage scenarios, we cannot guarantee that the model will generate correct, and reasonable output in all scenarios. Please be aware that there is still a risk of the model producing problematic outputs. We will not be responsible for any risks and issues resulting from misuse, misguidance, illegal usage, and related misinformation, as well as any associated data security concerns. <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### License The code and weights of the Yi-1.5 series models are distributed under the Apache 2.0 license. If you create derivative works based on this model, please include the following attribution in your derivative works: This work is a derivative of [The Yi Series Model You Base On] by 01.AI, used under the Apache 2.0 License. <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p>",
20
- "model_explanation_gemini": "A bilingual (English/Chinese) open-source large language model optimized for text generation, ranking highly in benchmarks for language understanding and reasoning.\n\n**Features:** \n- Text generation capability \n- Bilingual (English/Chinese) support \n- Strong performance in language understanding, reasoning, and comprehension \n- Open-source (Apache 2.0 license) \n\n**Comparison:** \nOutperforms other open-source models like Falcon-180B and Llama-70B in benchmarks, ranking second only to GPT"
 
 
 
 
 
 
21
  }
 
17
  "region:us"
18
  ],
19
  "description": "--- license: apache-2.0 widget: - example_title: \"Yi-34B-Chat\" text: \"hi\" output: text: \" Hello! How can I assist you today?\" - example_title: \"Yi-34B\" text: \"There's a place where time stands still. A place of breath taking wonder, but also\" output: text: \" an eerie sense that something is just not right…\\nBetween the two worlds lies The Forgotten Kingdom - home to creatures long since thought extinct and ancient magic so strong it defies belief! Only here can you find what has been lost for centuries: An Elixir Of Life which will restore youth and vitality if only those who seek its power are brave enough to face up against all manner of dangers lurking in this mysterious land! But beware; some say there may even exist powerful entities beyond our comprehension whose intentions towards humanity remain unclear at best ---- they might want nothing more than destruction itself rather then anything else from their quest after immortality (and maybe someone should tell them about modern medicine)? In any event though – one thing remains true regardless : whether or not success comes easy depends entirely upon how much effort we put into conquering whatever challenges lie ahead along with having faith deep down inside ourselves too ;) So let’s get started now shall We?\" pipeline_tag: text-generation --- <div align=\"center\"> <picture> <source media=\"(prefers-color-scheme: dark)\" srcset=\" width=\"200px\"> <source media=\"(prefers-color-scheme: light)\" srcset=\" width=\"200px\"> <img alt=\"specify theme context for images\" src=\" </picture> </br> </br> <div style=\"display: inline-block;\"> <a href=\" <img src=\" </a> </div> <div style=\"display: inline-block;\"> <a href=\"mailto:[email protected]\"> <img src=\" </a> </div> </div> <div align=\"center\"> <h3 align=\"center\">Building the Next Generation of Open-Source and Bilingual LLMs</h3> </div> <p align=\"center\"> 🤗 <a href=\" target=\"_blank\">Hugging Face</a> • 🤖 <a href=\" target=\"_blank\">ModelScope</a> • ✡️ <a href=\" target=\"_blank\">WiseModel</a> </p> <p align=\"center\"> 👩‍🚀 Ask questions or discuss ideas on <a href=\" target=\"_blank\"> GitHub </a> </p> <p align=\"center\"> 👋 Join us on <a href=\" target=\"_blank\"> 👾 Discord </a> or <a href=\"有官方的微信群嘛 · Issue #43 · 01-ai/Yi\" target=\"_blank\"> 💬 WeChat </a> </p> <p align=\"center\"> 📝 Check out <a href=\" Yi Tech Report </a> </p> <p align=\"center\"> 📚 Grow at <a href=\"#learning-hub\"> Yi Learning Hub </a> </p> <!-- DO NOT REMOVE ME --> <hr> <details open> <summary></b>📕 Table of Contents</b></summary> - What is Yi? - Introduction - Models - Chat models - Base models - Model info - News - How to use Yi? - Quick start - Choose your path - pip - docker - llama.cpp - conda-lock - Web demo - Fine-tuning - Quantization - Deployment - FAQ - Learning hub - Why Yi? - Ecosystem - Upstream - Downstream - Serving - Quantization - Fine-tuning - API - Benchmarks - Base model performance - Chat model performance - Tech report - Citation - Who can use Yi? - Misc. - Acknowledgements - Disclaimer - License </details> <hr> # What is Yi? ## Introduction - 🤖 The Yi series models are the next generation of open-source large language models trained from scratch by 01.AI. - 🙌 Targeted as a bilingual language model and trained on 3T multilingual corpus, the Yi series models become one of the strongest LLM worldwide, showing promise in language understanding, commonsense reasoning, reading comprehension, and more. For example, - Yi-34B-Chat model **landed in second place (following GPT-4 Turbo)**, outperforming other LLMs (such as GPT-4, Mixtral, Claude) on the AlpacaEval Leaderboard (based on data available up to January 2024). - Yi-34B model **ranked first among all existing open-source models** (such as Falcon-180B, Llama-70B, Claude) in **both English and Chinese** on various benchmarks, including Hugging Face Open LLM Leaderboard (pre-trained) and C-Eval (based on data available up to November 2023). - 🙏 (Credits to Llama) Thanks to the Transformer and Llama open-source communities, as they reduce the efforts required to build from scratch and enable the utilization of the same tools within the AI ecosystem. <details style=\"display: inline;\"><summary> If you're interested in Yi's adoption of Llama architecture and license usage policy, see <span style=\"color: green;\">Yi's relation with Llama.</span> ⬇️</summary> <ul> <br> > 💡 TL;DR > > The Yi series models adopt the same model architecture as Llama but are **NOT** derivatives of Llama. - Both Yi and Llama are based on the Transformer structure, which has been the standard architecture for large language models since 2018. - Grounded in the Transformer architecture, Llama has become a new cornerstone for the majority of state-of-the-art open-source models due to its excellent stability, reliable convergence, and robust compatibility. This positions Llama as the recognized foundational framework for models including Yi. - Thanks to the Transformer and Llama architectures, other models can leverage their power, reducing the effort required to build from scratch and enabling the utilization of the same tools within their ecosystems. - However, the Yi series models are NOT derivatives of Llama, as they do not use Llama's weights. - As Llama's structure is employed by the majority of open-source models, the key factors of determining model performance are training datasets, training pipelines, and training infrastructure. - Developing in a unique and proprietary way, Yi has independently created its own high-quality training datasets, efficient training pipelines, and robust training infrastructure entirely from the ground up. This effort has led to excellent performance with Yi series models ranking just behind GPT4 and surpassing Llama on the Alpaca Leaderboard in Dec 2023. </ul> </details> <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ## News <details> <summary>🔥 <b>2024-07-29</b>: The <a href=\" Cookbook 1.0 </a> is released, featuring tutorials and examples in both Chinese and English.</summary> </details> <details> <summary>🎯 <b>2024-05-13</b>: The <a href=\" series models </a> are open-sourced, further improving coding, math, reasoning, and instruction-following abilities.</summary> </details> <details> <summary>🎯 <b>2024-03-16</b>: The <code>Yi-9B-200K</code> is open-sourced and available to the public.</summary> </details> <details> <summary>🎯 <b>2024-03-08</b>: <a href=\" Tech Report</a> is published! </summary> </details> <details open> <summary>🔔 <b>2024-03-07</b>: The long text capability of the Yi-34B-200K has been enhanced. </summary> <br> In the \"Needle-in-a-Haystack\" test, the Yi-34B-200K's performance is improved by 10.5%, rising from 89.3% to an impressive 99.8%. We continue to pre-train the model on 5B tokens long-context data mixture and demonstrate a near-all-green performance. </details> <details open> <summary>🎯 <b>2024-03-06</b>: The <code>Yi-9B</code> is open-sourced and available to the public.</summary> <br> <code>Yi-9B</code> stands out as the top performer among a range of similar-sized open-source models (including Mistral-7B, SOLAR-10.7B, Gemma-7B, DeepSeek-Coder-7B-Base-v1.5 and more), particularly excelling in code, math, common-sense reasoning, and reading comprehension. </details> <details open> <summary>🎯 <b>2024-01-23</b>: The Yi-VL models, <code><a href=\" and <code><a href=\" are open-sourced and available to the public.</summary> <br> <code><a href=\" has ranked <strong>first</strong> among all existing open-source models in the latest benchmarks, including <a href=\" and <a href=\" (based on data available up to January 2024).</li> </details> <details> <summary>🎯 <b>2023-11-23</b>: <a href=\"#chat-models\">Chat models</a> are open-sourced and available to the public.</summary> <br>This release contains two chat models based on previously released base models, two 8-bit models quantized by GPTQ, and two 4-bit models quantized by AWQ. - - - - - - You can try some of them interactively at: - Hugging Face - Replicate </details> <details> <summary>🔔 <b>2023-11-23</b>: The Yi Series Models Community License Agreement is updated to <a href=\" </details> <details> <summary>🔥 <b>2023-11-08</b>: Invited test of Yi-34B chat model.</summary> <br>Application form: - English - Chinese </details> <details> <summary>🎯 <b>2023-11-05</b>: <a href=\"#base-models\">The base models, </a><code>Yi-6B-200K</code> and <code>Yi-34B-200K</code>, are open-sourced and available to the public.</summary> <br>This release contains two base models with the same parameter sizes as the previous release, except that the context window is extended to 200K. </details> <details> <summary>🎯 <b>2023-11-02</b>: <a href=\"#base-models\">The base models, </a><code>Yi-6B</code> and <code>Yi-34B</code>, are open-sourced and available to the public.</summary> <br>The first public release contains two bilingual (English/Chinese) base models with the parameter sizes of 6B and 34B. Both of them are trained with 4K sequence length and can be extended to 32K during inference time. </details> <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ## Models Yi models come in multiple sizes and cater to different use cases. You can also fine-tune Yi models to meet your specific requirements. If you want to deploy Yi models, make sure you meet the software and hardware requirements. ### Chat models | Model | Download | |---|---| |Yi-34B-Chat | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | |Yi-34B-Chat-4bits | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | |Yi-34B-Chat-8bits | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | |Yi-6B-Chat| • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | |Yi-6B-Chat-4bits | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | |Yi-6B-Chat-8bits | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | <sub><sup> - 4-bit series models are quantized by AWQ. <br> - 8-bit series models are quantized by GPTQ <br> - All quantized models have a low barrier to use since they can be deployed on consumer-grade GPUs (e.g., 3090, 4090). </sup></sub> ### Base models | Model | Download | |---|---| |Yi-34B| • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | |Yi-34B-200K|• 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel| |Yi-9B|• 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel| |Yi-9B-200K | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | |Yi-6B| • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | |Yi-6B-200K | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | <sub><sup> - 200k is roughly equivalent to 400,000 Chinese characters. <br> - If you want to use the previous version of the Yi-34B-200K (released on Nov 5, 2023), run to download the weight. </sup></sub> ### Model info - For chat and base models <table> <thead> <tr> <th>Model</th> <th>Intro</th> <th>Default context window</th> <th>Pretrained tokens</th> <th>Training Data Date</th> </tr> </thead> <tbody><tr> <td>6B series models</td> <td>They are suitable for personal and academic use.</td> <td rowspan=\"3\">4K</td> <td>3T</td> <td rowspan=\"3\">Up to June 2023</td> </tr> <tr> <td>9B series models</td> <td>It is the best at coding and math in the Yi series models.</td> <td>Yi-9B is continuously trained based on Yi-6B, using 0.8T tokens.</td> </tr> <tr> <td>34B series models</td> <td>They are suitable for personal, academic, and commercial (particularly for small and medium-sized enterprises) purposes. It&#39;s a cost-effective solution that&#39;s affordable and equipped with emergent ability.</td> <td>3T</td> </tr> </tbody></table> - For chat models <details style=\"display: inline;\"><summary>For chat model limitations, see the explanations below. ⬇️</summary> <ul> <br>The released chat model has undergone exclusive training using Supervised Fine-Tuning (SFT). Compared to other standard chat models, our model produces more diverse responses, making it suitable for various downstream tasks, such as creative scenarios. Furthermore, this diversity is expected to enhance the likelihood of generating higher quality responses, which will be advantageous for subsequent Reinforcement Learning (RL) training. <br>However, this higher diversity might amplify certain existing issues, including: <li>Hallucination: This refers to the model generating factually incorrect or nonsensical information. With the model's responses being more varied, there's a higher chance of hallucination that are not based on accurate data or logical reasoning.</li> <li>Non-determinism in re-generation: When attempting to regenerate or sample responses, inconsistencies in the outcomes may occur. The increased diversity can lead to varying results even under similar input conditions.</li> <li>Cumulative Error: This occurs when errors in the model's responses compound over time. As the model generates more diverse responses, the likelihood of small inaccuracies building up into larger errors increases, especially in complex tasks like extended reasoning, mathematical problem-solving, etc.</li> <li>To achieve more coherent and consistent responses, it is advisable to adjust generation configuration parameters such as temperature, top_p, or top_k. These adjustments can help in the balance between creativity and coherence in the model's outputs.</li> </ul> </details> <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> # How to use Yi? - Quick start - Choose your path - pip - docker - conda-lock - llama.cpp - Web demo - Fine-tuning - Quantization - Deployment - FAQ - Learning hub ## Quick start > **💡 Tip**: If you want to get started with the Yi model and explore different methods for inference, check out the Yi Cookbook. ### Choose your path Select one of the following paths to begin your journey with Yi! !Quick start - Choose your path #### 🎯 Deploy Yi locally If you prefer to deploy Yi models locally, - 🙋‍♀️ and you have **sufficient** resources (for example, NVIDIA A800 80GB), you can choose one of the following methods: - pip - Docker - conda-lock - 🙋‍♀️ and you have **limited** resources (for example, a MacBook Pro), you can use llama.cpp. #### 🎯 Not to deploy Yi locally If you prefer not to deploy Yi models locally, you can explore Yi's capabilities using any of the following options. ##### 🙋‍♀️ Run Yi with APIs If you want to explore more features of Yi, you can adopt one of these methods: - Yi APIs (Yi official) - Early access has been granted to some applicants. Stay tuned for the next round of access! - Yi APIs (Replicate) ##### 🙋‍♀️ Run Yi in playground If you want to chat with Yi with more customizable options (e.g., system prompt, temperature, repetition penalty, etc.), you can try one of the following options: - Yi-34B-Chat-Playground (Yi official) - Access is available through a whitelist. Welcome to apply (fill out a form in English or Chinese). - Yi-34B-Chat-Playground (Replicate) ##### 🙋‍♀️ Chat with Yi If you want to chat with Yi, you can use one of these online services, which offer a similar user experience: - Yi-34B-Chat (Yi official on Hugging Face) - No registration is required. - Yi-34B-Chat (Yi official beta) - Access is available through a whitelist. Welcome to apply (fill out a form in English or Chinese). <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### Quick start - pip This tutorial guides you through every step of running **Yi-34B-Chat locally on an A800 (80G)** and then performing inference. #### Step 0: Prerequisites - Make sure Python 3.10 or a later version is installed. - If you want to run other Yi models, see software and hardware requirements. #### Step 1: Prepare your environment To set up the environment and install the required packages, execute the following command. #### Step 2: Download the Yi model You can download the weights and tokenizer of Yi models from the following sources: - Hugging Face - ModelScope - WiseModel #### Step 3: Perform inference You can perform inference with Yi chat or base models as below. ##### Perform inference with Yi chat model 1. Create a file named and copy the following content to it. 2. Run . Then you can see an output similar to the one below. 🥳 ##### Perform inference with Yi base model - Yi-34B The steps are similar to pip - Perform inference with Yi chat model. You can use the existing file []( Then you can see an output similar to the one below. 🥳 <details> <summary>Output. ⬇️ </summary> <br> **Prompt**: Let me tell you an interesting story about cat Tom and mouse Jerry, **Generation**: Let me tell you an interesting story about cat Tom and mouse Jerry, which happened in my childhood. My father had a big house with two cats living inside it to kill mice. One day when I was playing at home alone, I found one of the tomcats lying on his back near our kitchen door, looking very much like he wanted something from us but couldn’t get up because there were too many people around him! He kept trying for several minutes before finally giving up... </details> - Yi-9B Input Output <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### Quick start - Docker <details> <summary> Run Yi-34B-chat locally with Docker: a step-by-step guide. ⬇️</summary> <br>This tutorial guides you through every step of running <strong>Yi-34B-Chat on an A800 GPU</strong> or <strong>4*4090</strong> locally and then performing inference. <h4>Step 0: Prerequisites</h4> <p>Make sure you've installed <a href=\" and <a href=\" <h4> Step 1: Start Docker </h4> <pre><code>docker run -it --gpus all \\ -v &lt;your-model-path&gt;: /models ghcr.io/01-ai/yi:latest </code></pre> <p>Alternatively, you can pull the Yi Docker image from <code>registry.lingyiwanwu.com/ci/01-ai/yi:latest</code>.</p> <h4>Step 2: Perform inference</h4> <p>You can perform inference with Yi chat or base models as below.</p> <h5>Perform inference with Yi chat model</h5> <p>The steps are similar to <a href=\"#perform-inference-with-yi-chat-model\">pip - Perform inference with Yi chat model</a>.</p> <p><strong>Note</strong> that the only difference is to set <code>model_path = '&lt;your-model-mount-path&gt;'</code> instead of <code>model_path = '&lt;your-model-path&gt;'</code>.</p> <h5>Perform inference with Yi base model</h5> <p>The steps are similar to <a href=\"#perform-inference-with-yi-base-model\">pip - Perform inference with Yi base model</a>.</p> <p><strong>Note</strong> that the only difference is to set <code>--model &lt;your-model-mount-path&gt;'</code> instead of <code>model &lt;your-model-path&gt;</code>.</p> </details> ### Quick start - conda-lock <details> <summary>You can use <code><a href=\" to generate fully reproducible lock files for conda environments. ⬇️</summary> <br> You can refer to <a href=\" for the exact versions of the dependencies. Additionally, you can utilize <code><a href=\" for installing these dependencies. <br> To install the dependencies, follow these steps: 1. Install micromamba by following the instructions available <a href=\" 2. Execute <code>micromamba install -y -n yi -f conda-lock.yml</code> to create a conda environment named <code>yi</code> and install the necessary dependencies. </details> ### Quick start - llama.cpp <a href=\" following tutorial </a> will guide you through every step of running a quantized model (<a href=\" locally and then performing inference. <details> <summary> Run Yi-chat-6B-2bits locally with llama.cpp: a step-by-step guide. ⬇️</summary> <br><a href=\" tutorial</a> guides you through every step of running a quantized model (<a href=\" locally and then performing inference.</p> - Step 0: Prerequisites - Step 1: Download llama.cpp - Step 2: Download Yi model - Step 3: Perform inference #### Step 0: Prerequisites - This tutorial assumes you use a MacBook Pro with 16GB of memory and an Apple M2 Pro chip. - Make sure []( is installed on your machine. #### Step 1: Download To clone the []( repository, run the following command. #### Step 2: Download Yi model 2.1 To clone XeIaso/yi-chat-6B-GGUF with just pointers, run the following command. 2.2 To download a quantized Yi model (yi-chat-6b.Q2_K.gguf), run the following command. #### Step 3: Perform inference To perform inference with the Yi model, you can use one of the following methods. - Method 1: Perform inference in terminal - Method 2: Perform inference in web ##### Method 1: Perform inference in terminal To compile using 4 threads and then conduct inference, navigate to the directory, and run the following command. > ##### Tips > > - Replace with the actual path of your model. > > - By default, the model operates in completion mode. > > - For additional output customization options (for example, system prompt, temperature, repetition penalty, etc.), run to check detailed descriptions and usage. Now you have successfully asked a question to the Yi model and got an answer! 🥳 ##### Method 2: Perform inference in web 1. To initialize a lightweight and swift chatbot, run the following command. Then you can get an output like this: 2. To access the chatbot interface, open your web browser and enter into the address bar. !Yi model chatbot interface - llama.cpp 3. Enter a question, such as \"How do you feed your pet fox? Please answer this question in 6 simple steps\" into the prompt window, and you will receive a corresponding answer. !Ask a question to Yi model - llama.cpp </ul> </details> <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### Web demo You can build a web UI demo for Yi **chat** models (note that Yi base models are not supported in this senario). Step 1: Prepare your environment. Step 2: Download the Yi model. Step 3. To start a web service locally, run the following command. You can access the web UI by entering the address provided in the console into your browser. !Quick start - web demo <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### Fine-tuning Once finished, you can compare the finetuned model and the base model with the following command: <details style=\"display: inline;\"><summary>For advanced usage (like fine-tuning based on your custom data), see the explanations below. ⬇️ </summary> <ul> ### Finetune code for Yi 6B and 34B #### Preparation ##### From Image By default, we use a small dataset from BAAI/COIG to finetune the base model. You can also prepare your customized dataset in the following format: And then mount them in the container to replace the default ones: ##### From Local Server Make sure you have conda. If not, use Then, create a conda env: #### Hardware Setup For the Yi-6B model, a node with 4 GPUs, each with GPU memory larger than 60GB, is recommended. For the Yi-34B model, because the usage of the zero-offload technique consumes a lot of CPU memory, please be careful to limit the number of GPUs in the 34B finetune training. Please use CUDA_VISIBLE_DEVICES to limit the number of GPUs (as shown in scripts/run_sft_Yi_34b.sh). A typical hardware setup for finetuning the 34B model is a node with 8 GPUs (limited to 4 in running by CUDA_VISIBLE_DEVICES=0,1,2,3), each with GPU memory larger than 80GB, and total CPU memory larger than 900GB. #### Quick Start Download a LLM-base model to MODEL_PATH (6B and 34B). A typical folder of models is like: Download a dataset from huggingface to local storage DATA_PATH, e.g. Dahoas/rm-static. has example datasets, which are modified from BAAI/COIG into the scripts folder, copy and paste the script, and run. For example: For the Yi-6B base model, setting training_debug_steps=20 and num_train_epochs=4 can output a chat model, which takes about 20 minutes. For the Yi-34B base model, it takes a relatively long time for initialization. Please be patient. #### Evaluation Then you'll see the answer from both the base model and the finetuned model. </ul> </details> <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### Quantization #### GPT-Q Once finished, you can then evaluate the resulting model as follows: <details style=\"display: inline;\"><summary>For details, see the explanations below. ⬇️</summary> <ul> #### GPT-Q quantization GPT-Q is a PTQ (Post-Training Quantization) method. It saves memory and provides potential speedups while retaining the accuracy of the model. Yi models can be GPT-Q quantized without a lot of efforts. We provide a step-by-step tutorial below. To run GPT-Q, we will use AutoGPTQ and exllama. And the huggingface transformers has integrated optimum and auto-gptq to perform GPTQ quantization on language models. ##### Do Quantization The script is provided for you to perform GPT-Q quantization: ##### Run Quantized Model You can run a quantized model using the : </ul> </details> #### AWQ Once finished, you can then evaluate the resulting model as follows: <details style=\"display: inline;\"><summary>For details, see the explanations below. ⬇️</summary> <ul> #### AWQ quantization AWQ is a PTQ (Post-Training Quantization) method. It's an efficient and accurate low-bit weight quantization (INT3/4) for LLMs. Yi models can be AWQ quantized without a lot of efforts. We provide a step-by-step tutorial below. To run AWQ, we will use AutoAWQ. ##### Do Quantization The script is provided for you to perform AWQ quantization: ##### Run Quantized Model You can run a quantized model using the : </ul> </details> <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### Deployment If you want to deploy Yi models, make sure you meet the software and hardware requirements. #### Software requirements Before using Yi quantized models, make sure you've installed the correct software listed below. | Model | Software |---|--- Yi 4-bit quantized models | AWQ and CUDA Yi 8-bit quantized models | GPTQ and CUDA #### Hardware requirements Before deploying Yi in your environment, make sure your hardware meets the following requirements. ##### Chat models | Model | Minimum VRAM | Recommended GPU Example | |:----------------------|:--------------|:-------------------------------------:| | Yi-6B-Chat | 15 GB | 1 x RTX 3090 (24 GB) <br> 1 x RTX 4090 (24 GB) <br> 1 x A10 (24 GB) <br> 1 x A30 (24 GB) | | Yi-6B-Chat-4bits | 4 GB | 1 x RTX 3060 (12 GB)<br> 1 x RTX 4060 (8 GB) | | Yi-6B-Chat-8bits | 8 GB | 1 x RTX 3070 (8 GB) <br> 1 x RTX 4060 (8 GB) | | Yi-34B-Chat | 72 GB | 4 x RTX 4090 (24 GB)<br> 1 x A800 (80GB) | | Yi-34B-Chat-4bits | 20 GB | 1 x RTX 3090 (24 GB) <br> 1 x RTX 4090 (24 GB) <br> 1 x A10 (24 GB) <br> 1 x A30 (24 GB) <br> 1 x A100 (40 GB) | | Yi-34B-Chat-8bits | 38 GB | 2 x RTX 3090 (24 GB) <br> 2 x RTX 4090 (24 GB)<br> 1 x A800 (40 GB) | Below are detailed minimum VRAM requirements under different batch use cases. | Model | batch=1 | batch=4 | batch=16 | batch=32 | | ----------------------- | ------- | ------- | -------- | -------- | | Yi-6B-Chat | 12 GB | 13 GB | 15 GB | 18 GB | | Yi-6B-Chat-4bits | 4 GB | 5 GB | 7 GB | 10 GB | | Yi-6B-Chat-8bits | 7 GB | 8 GB | 10 GB | 14 GB | | Yi-34B-Chat | 65 GB | 68 GB | 76 GB | > 80 GB | | Yi-34B-Chat-4bits | 19 GB | 20 GB | 30 GB | 40 GB | | Yi-34B-Chat-8bits | 35 GB | 37 GB | 46 GB | 58 GB | ##### Base models | Model | Minimum VRAM | Recommended GPU Example | |----------------------|--------------|:-------------------------------------:| | Yi-6B | 15 GB | 1 x RTX 3090 (24 GB) <br> 1 x RTX 4090 (24 GB) <br> 1 x A10 (24 GB) <br> 1 x A30 (24 GB) | | Yi-6B-200K | 50 GB | 1 x A800 (80 GB) | | Yi-9B | 20 GB | 1 x RTX 4090 (24 GB) | | Yi-34B | 72 GB | 4 x RTX 4090 (24 GB) <br> 1 x A800 (80 GB) | | Yi-34B-200K | 200 GB | 4 x A800 (80 GB) | <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### FAQ <details> <summary> If you have any questions while using the Yi series models, the answers provided below could serve as a helpful reference for you. ⬇️</summary> <br> #### 💡Fine-tuning - <strong>Base model or Chat model - which to fine-tune?</strong> <br>The choice of pre-trained language model for fine-tuning hinges on the computational resources you have at your disposal and the particular demands of your task. - If you are working with a substantial volume of fine-tuning data (say, over 10,000 samples), the Base model could be your go-to choice. - On the other hand, if your fine-tuning data is not quite as extensive, opting for the Chat model might be a more fitting choice. - It is generally advisable to fine-tune both the Base and Chat models, compare their performance, and then pick the model that best aligns with your specific requirements. - <strong>Yi-34B versus Yi-34B-Chat for full-scale fine-tuning - what is the difference?</strong> <br> The key distinction between full-scale fine-tuning on and comes down to the fine-tuning approach and outcomes. - Yi-34B-Chat employs a Special Fine-Tuning (SFT) method, resulting in responses that mirror human conversation style more closely. - The Base model's fine-tuning is more versatile, with a relatively high performance potential. - If you are confident in the quality of your data, fine-tuning with could be your go-to. - If you are aiming for model-generated responses that better mimic human conversational style, or if you have doubts about your data quality, might be your best bet. #### 💡Quantization - <strong>Quantized model versus original model - what is the performance gap?</strong> - The performance variance is largely contingent on the quantization method employed and the specific use cases of these models. For instance, when it comes to models provided by the AWQ official, from a Benchmark standpoint, quantization might result in a minor performance drop of a few percentage points. - Subjectively speaking, in situations like logical reasoning, even a 1% performance shift could impact the accuracy of the output results. #### 💡General - <strong>Where can I source fine-tuning question answering datasets?</strong> - You can find fine-tuning question answering datasets on platforms like Hugging Face, with datasets like m-a-p/COIG-CQIA readily available. - Additionally, Github offers fine-tuning frameworks, such as hiyouga/LLaMA-Factory, which integrates pre-made datasets. - <strong>What is the GPU memory requirement for fine-tuning Yi-34B FP16?</strong> <br> The GPU memory needed for fine-tuning 34B FP16 hinges on the specific fine-tuning method employed. For full parameter fine-tuning, you'll need 8 GPUs each with 80 GB; however, more economical solutions like Lora require less. For more details, check out hiyouga/LLaMA-Factory. Also, consider using BF16 instead of FP16 for fine-tuning to optimize performance. - <strong>Are there any third-party platforms that support chat functionality for the Yi-34b-200k model?</strong> <br> If you're looking for third-party Chats, options include fireworks.ai. </details> ### Learning hub <details> <summary> If you want to learn Yi, you can find a wealth of helpful educational resources here. ⬇️</summary> <br> Welcome to the Yi learning hub! Whether you're a seasoned developer or a newcomer, you can find a wealth of helpful educational resources to enhance your understanding and skills with Yi models, including insightful blog posts, comprehensive video tutorials, hands-on guides, and more. The content you find here has been generously contributed by knowledgeable Yi experts and passionate enthusiasts. We extend our heartfelt gratitude for your invaluable contributions! At the same time, we also warmly invite you to join our collaborative effort by contributing to Yi. If you have already made contributions to Yi, please don't hesitate to showcase your remarkable work in the table below. With all these resources at your fingertips, you're ready to start your exciting journey with Yi. Happy learning! 🥳 #### Tutorials ##### Blog tutorials | Deliverable | Date | Author | | ------------------------------------------------------------ | ---------- | ------------------------------------------------------------ | | 使用 Dify、Meilisearch、零一万物模型实现最简单的 RAG 应用(三):AI 电影推荐 | 2024-05-20 | 苏洋 | | 使用autodl服务器,在A40显卡上运行, Yi-34B-Chat-int4模型,并使用vllm优化加速,显存占用42G,速度18 words-s | 2024-05-20 | fly-iot | | Yi-VL 最佳实践 | 2024-05-20 | ModelScope | | 一键运行零一万物新鲜出炉Yi-1.5-9B-Chat大模型 | 2024-05-13 | Second State | | 零一万物开源Yi-1.5系列大模型 | 2024-05-13 | 刘聪 | | 零一万物Yi-1.5系列模型发布并开源! 34B-9B-6B 多尺寸,魔搭社区推理微调最佳实践教程来啦! | 2024-05-13 | ModelScope | | Yi-34B 本地部署简单测试 | 2024-05-13 | 漆妮妮 | | 驾辰龙跨Llama持Wasm,玩转Yi模型迎新春过大年(上) | 2024-05-13 | Words worth | | 驾辰龙跨Llama持Wasm,玩转Yi模型迎新春过大年(下篇) | 2024-05-13 | Words worth | | Ollama新增两个命令,开始支持零一万物Yi-1.5系列模型 | 2024-05-13 | AI工程师笔记 | | 使用零一万物 200K 模型和 Dify 快速搭建模型应用 | 2024-05-13 | 苏洋 | | (持更) 零一万物模型折腾笔记:社区 Yi-34B 微调模型使用 | 2024-05-13 | 苏洋 | | Python+ERNIE-4.0-8K-Yi-34B-Chat大模型初探 | 2024-05-11 | 江湖评谈 | | 技术布道 Vue及Python调用零一万物模型和Prompt模板(通过百度千帆大模型平台) | 2024-05-11 | MumuLab | | 多模态大模型Yi-VL-plus体验 效果很棒 | 2024-04-27 | 大家好我是爱因 | | 使用autodl服务器,两个3090显卡上运行, Yi-34B-Chat-int4模型,并使用vllm优化加速,显存占用42G,速度23 words-s | 2024-04-27 | fly-iot | | Getting Started with Yi-1.5-9B-Chat | 2024-04-27 | Second State | | 基于零一万物yi-vl-plus大模型简单几步就能批量生成Anki图片笔记 | 2024-04-24 | 正经人王同学 | | 【AI开发:语言】一、Yi-34B超大模型本地部署CPU和GPU版 | 2024-04-21 | My的梦想已实现 | | 【Yi-34B-Chat-Int4】使用4个2080Ti显卡11G版本,运行Yi-34B模型,5年前老显卡是支持的,可以正常运行,速度 21 words-s,vllm要求算力在7以上的显卡就可以 | 2024-03-22 | fly-iot | | 零一万物大模型部署+微调总结 | 2024-03-22 | v_wus | | 零一万物Yi大模型vllm推理时Yi-34B或Yi-6bchat重复输出的解决方案 | 2024-03-02 | 郝铠锋 | | Yi-34B微调训练 | 2024-03-02 | lsjlnd | | 实测零一万物Yi-VL多模态语言模型:能准确“识图吃瓜” | 2024-02-02 | 苏洋 | | 零一万物开源Yi-VL多模态大模型,魔搭社区推理&微调最佳实践来啦! | 2024-01-26 | ModelScope | | 单卡 3 小时训练 Yi-6B 大模型 Agent:基于 Llama Factory 实战 | 2024-01-22 | 郑耀威 | | 零一科技Yi-34B Chat大模型环境搭建&推理 | 2024-01-15 | 要养家的程序员 | | 基于LLaMA Factory,单卡3小时训练专属大模型 Agent | 2024-01-15 | 机器学习社区 | | 双卡 3080ti 部署 Yi-34B 大模型 - Gradio + vLLM 踩坑全记录 | 2024-01-02 | 漆妮妮 | | 【大模型部署实践-3】3个能在3090上跑起来的4bits量化Chat模型(baichuan2-13b、InternLM-20b、Yi-34b) | 2024-01-02 | aq_Seabiscuit | | 只需 24G 显存,用 vllm 跑起来 Yi-34B 中英双语大模型 | 2023-12-28 | 漆妮妮 | | 零一万物模型官方 Yi-34B 模型本地离线运行部署使用笔记(物理机和docker两种部署方式),200K 超长文本内容,34B 干翻一众 70B 模型,打榜分数那么高,这模型到底行不行? | 2023-12-28 | 代码讲故事 | | LLM - 大模型速递之 Yi-34B 入门与 LoRA 微调 | 2023-12-18 | BIT_666 | | 通过vllm框架进行大模型推理 | 2023-12-18 | 土山炮 | | CPU 混合推理,非常见大模型量化方案:“二三五六” 位量化方案 | 2023-12-12 | 苏洋 | | 零一万物模型折腾笔记:官方 Yi-34B 模型基础使用 | 2023-12-10 | 苏洋 | | Running Yi-34B-Chat locally using LlamaEdge | 2023-11-30 | Second State | | 本地运行零一万物 34B 大模型,使用 Llama.cpp & 21G 显存 | 2023-11-26 | 苏洋 | ##### GitHub Project | Deliverable | Date | Author | | ------------------------------------------------------------ | ---------- | ------------------------------------------- | | yi-openai-proxy | 2024-05-11 | 苏洋 | | 基于零一万物 Yi 模型和 B 站构建大语言模型高质量训练数据集 | 2024-04-29 | 正经人王同学 | | 基于视频网站和零一万物大模型构建大语言模型高质量训练数据集 | 2024-04-25 | 正经人王同学 | | 基于零一万物yi-34b-chat-200k输入任意文章地址,点击按钮即可生成无广告或推广内容的简要笔记,并生成分享图给好友 | 2024-04-24 | 正经人王同学 | | Food-GPT-Yi-model | 2024-04-21 | Hubert S | ##### Video tutorials | Deliverable | Date | Author | | ------------------------------------------------------------ | ---------- | ------------------------------------------------------------ | | Run dolphin-2.2-yi-34b on IoT Devices | 2023-11-30 | Second State | | 只需 24G 显存,用 vllm 跑起来 Yi-34B 中英双语大模型 | 2023-12-28 | 漆妮妮 | | Install Yi 34B Locally - Chinese English Bilingual LLM | 2023-11-05 | Fahd Mirza | | Dolphin Yi 34b - Brand New Foundational Model TESTED | 2023-11-27 | Matthew Berman | | Yi-VL-34B 多模态大模型 - 用两张 A40 显卡跑起来 | 2024-01-28 | 漆妮妮 | | 4060Ti 16G显卡安装零一万物最新开源的Yi-1.5版大语言模型 | 2024-05-14 | titan909 | | Yi-1.5: True Apache 2.0 Competitor to LLAMA-3 | 2024-05-13 | Prompt Engineering | | Install Yi-1.5 Model Locally - Beats Llama 3 in Various Benchmarks | 2024-05-13 | Fahd Mirza | | how to install Ollama and run Yi 6B | 2024-05-13 | Ridaa Davids | | 地表最强混合智能AI助手:llama3_70B+Yi_34B+Qwen1.5_110B | 2024-05-04 | 朱扎特 | | ChatDoc学术论文辅助--基于Yi-34B和langchain进行PDF知识库问答 | 2024-05-03 | 朱扎特 | | 基于Yi-34B的领域知识问答项目演示 | 2024-05-02 | 朱扎特 | | 使用RTX4090+GaLore算法 全参微调Yi-6B大模型 | 2024-03-24 | 小工蚂创始人 | | 无内容审查NSFW大语言模型Yi-34B-Chat蒸馏版测试,RolePlay,《天龙八部》马夫人康敏,本地GPU,CPU运行 | 2024-03-20 | 刘悦的技术博客 | | 无内容审查NSFW大语言模型整合包,Yi-34B-Chat,本地CPU运行,角色扮演潘金莲 | 2024-03-16 | 刘悦的技术博客 | | 量化 Yi-34B-Chat 并在单卡 RTX 4090 使用 vLLM 部署 | 2024-03-05 | 白鸽巢 | | Yi-VL-34B(5):使用3个3090显卡24G版本,运行Yi-VL-34B模型,支持命令行和web界面方式,理解图片的内容转换成文字 | 2024-02-27 | fly-iot | | Win环境KoboldCpp本地部署大语言模型进行各种角色扮演游戏 | 2024-02-25 | 魚蟲蟲 | | 无需显卡本地部署Yi-34B-Chat进行角色扮演游戏 P2 | 2024-02-23 | 魚蟲蟲 | | 【wails】(2):使用go-llama.cpp 运行 yi-01-6b大模型,使用本地CPU运行,速度还可以,等待下一版本更新 | 2024-02-20 | fly-iot | | 【xinference】(6):在autodl上,使用xinference部署yi-vl-chat和qwen-vl-chat模型,可以使用openai调用成功 | 2024-02-06 | fly-iot | | 无需显卡本地部署Yi-34B-Chat进行角色扮演游戏 P1 | 2024-02-05 | 魚蟲蟲 | | 2080Ti部署YI-34B大模型 xinference-oneapi-fastGPT本地知识库使用指南 | 2024-01-30 | 小饭护法要转码 | | Best Story Writing AI Model - Install Yi 6B 200K Locally on Windows | 2024-01-22 | Fahd Mirza | | Mac 本地运行大语言模型方法与常见问题指南(Yi 34B 模型+32 GB 内存测试) | 2024-01-21 | 小吴苹果机器人 | | 【Dify知识库】(11):Dify0.4.9改造支持MySQL,成功接入yi-6b 做对话,本地使用fastchat启动,占8G显存,完成知识库配置 | 2024-01-21 | fly-iot | | 这位LLM先生有点暴躁,用的是YI-6B的某个量化版,#LLM #大语言模型 #暴躁老哥 | 2024-01-20 | 晓漫吧 | | 大模型推理 NvLink 桥接器有用吗|双卡 A6000 测试一下 | 2024-01-17 | 漆妮妮 | | 大模型推理 A40 vs A6000 谁更强 - 对比 Yi-34B 的单、双卡推理性能 | 2024-01-15 | 漆妮妮 | | C-Eval 大语言模型评测基准- 用 LM Evaluation Harness + vLLM 跑起来 | 2024-01-11 | 漆妮妮 | | 双显卡部署 Yi-34B 大模型 - vLLM + Gradio 踩坑记录 | 2024-01-01 | 漆妮妮 | | 手把手教学!使用 vLLM 快速部署 Yi-34B-Chat | 2023-12-26 | 白鸽巢 | | 如何训练企业自己的大语言模型?Yi-6B LORA微调演示 #小工蚁 | 2023-12-21 | 小工蚂创始人 | | Yi-34B(4):使用4个2080Ti显卡11G版本,运行Yi-34B模型,5年前老显卡是支持的,可以正常运行,速度 21 words/s | 2023-12-02 | fly-iot | | 使用autodl服务器,RTX 3090 * 3 显卡上运行, Yi-34B-Chat模型,显存占用60G | 2023-12-01 | fly-iot | | 使用autodl服务器,两个3090显卡上运行, Yi-34B-Chat-int4模型,用vllm优化,增加 --num-gpu 2,速度23 words/s | 2023-12-01 | fly-iot | | Yi大模型一键本地部署 技术小白玩转AI | 2023-12-01 | 技术小白玩转AI | | 01.AI's Yi-6B: Overview and Fine-Tuning | 2023-11-28 | AI Makerspace | | Yi 34B Chat LLM outperforms Llama 70B | 2023-11-27 | DLExplorer | | How to run open source models on mac Yi 34b on m3 Max | 2023-11-26 | TECHNO PREMIUM | | Yi-34B - 200K - The BEST & NEW CONTEXT WINDOW KING | 2023-11-24 | Prompt Engineering | | Yi 34B : The Rise of Powerful Mid-Sized Models - Base,200k & Chat | 2023-11-24 | Sam Witteveen | | 在IoT设备运行破解版李开复大模型dolphin-2.2-yi-34b(还可作为私有OpenAI API服务器) | 2023-11-15 | Second State | | Run dolphin-2.2-yi-34b on IoT Devices (Also works as a Private OpenAI API Server) | 2023-11-14 | Second State | | How to Install Yi 34B 200K Llamafied on Windows Laptop | 2023-11-11 | Fahd Mirza | </details> # Why Yi? - Ecosystem - Upstream - Downstream - Serving - Quantization - Fine-tuning - API - Benchmarks - Chat model performance - Base model performance - Yi-34B and Yi-34B-200K - Yi-9B ## Ecosystem Yi has a comprehensive ecosystem, offering a range of tools, services, and models to enrich your experiences and maximize productivity. - Upstream - Downstream - Serving - Quantization - Fine-tuning - API ### Upstream The Yi series models follow the same model architecture as Llama. By choosing Yi, you can leverage existing tools, libraries, and resources within the Llama ecosystem, eliminating the need to create new tools and enhancing development efficiency. For example, the Yi series models are saved in the format of the Llama model. You can directly use and to load the model. For more information, see Use the chat model. <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### Downstream > 💡 Tip > > - Feel free to create a PR and share the fantastic work you've built using the Yi series models. > > - To help others quickly understand your work, it is recommended to use the format of . #### Serving If you want to get up with Yi in a few minutes, you can use the following services built upon Yi. - Yi-34B-Chat: you can chat with Yi using one of the following platforms: - Yi-34B-Chat | Hugging Face - Yi-34B-Chat | Yi Platform: **Note** that currently it's available through a whitelist. Welcome to apply (fill out a form in English or Chinese) and experience it firsthand! - Yi-6B-Chat (Replicate): you can use this model with more options by setting additional parameters and calling APIs. - ScaleLLM: you can use this service to run Yi models locally with added flexibility and customization. #### Quantization If you have limited computational capabilities, you can use Yi's quantized models as follows. These quantized models have reduced precision but offer increased efficiency, such as faster inference speed and smaller RAM usage. - TheBloke/Yi-34B-GPTQ - TheBloke/Yi-34B-GGUF - TheBloke/Yi-34B-AWQ #### Fine-tuning If you're seeking to explore the diverse capabilities within Yi's thriving family, you can delve into Yi's fine-tuned models as below. - TheBloke Models: this site hosts numerous fine-tuned models derived from various LLMs including Yi. This is not an exhaustive list for Yi, but to name a few sorted on downloads: - TheBloke/dolphin-2_2-yi-34b-AWQ - TheBloke/Yi-34B-Chat-AWQ - TheBloke/Yi-34B-Chat-GPTQ - SUSTech/SUS-Chat-34B: this model ranked first among all models below 70B and outperformed the twice larger deepseek-llm-67b-chat. You can check the result on the Open LLM Leaderboard. - OrionStarAI/OrionStar-Yi-34B-Chat-Llama: this model excelled beyond other models (such as GPT-4, Qwen-14B-Chat, Baichuan2-13B-Chat) in C-Eval and CMMLU evaluations on the OpenCompass LLM Leaderboard. - NousResearch/Nous-Capybara-34B: this model is trained with 200K context length and 3 epochs on the Capybara dataset. #### API - amazing-openai-api: this tool converts Yi model APIs into the OpenAI API format out of the box. - LlamaEdge: this tool builds an OpenAI-compatible API server for Yi-34B-Chat using a portable Wasm (WebAssembly) file, powered by Rust. <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ## Tech report For detailed capabilities of the Yi series model, see Yi: Open Foundation Models by 01.AI. ### Citation ## Benchmarks - Chat model performance - Base model performance ### Chat model performance Yi-34B-Chat model demonstrates exceptional performance, ranking first among all existing open-source models in the benchmarks including MMLU, CMMLU, BBH, GSM8k, and more. !Chat model performance <details> <summary> Evaluation methods and challenges. ⬇️ </summary> - **Evaluation methods**: we evaluated various benchmarks using both zero-shot and few-shot methods, except for TruthfulQA. - **Zero-shot vs. few-shot**: in chat models, the zero-shot approach is more commonly employed. - **Evaluation strategy**: our evaluation strategy involves generating responses while following instructions explicitly or implicitly (such as using few-shot examples). We then isolate relevant answers from the generated text. - **Challenges faced**: some models are not well-suited to produce output in the specific format required by instructions in few datasets, which leads to suboptimal results. <strong>*</strong>: C-Eval results are evaluated on the validation datasets </details> ### Base model performance #### Yi-34B and Yi-34B-200K The Yi-34B and Yi-34B-200K models stand out as the top performers among open-source models, especially excelling in MMLU, CMMLU, common-sense reasoning, reading comprehension, and more. !Base model performance <details> <summary> Evaluation methods. ⬇️</summary> - **Disparity in results**: while benchmarking open-source models, a disparity has been noted between results from our pipeline and those reported by public sources like OpenCompass. - **Investigation findings**: a deeper investigation reveals that variations in prompts, post-processing strategies, and sampling techniques across models may lead to significant outcome differences. - **Uniform benchmarking process**: our methodology aligns with the original benchmarks—consistent prompts and post-processing strategies are used, and greedy decoding is applied during evaluations without any post-processing for the generated content. - **Efforts to retrieve unreported scores**: for scores that were not reported by the original authors (including scores reported with different settings), we try to get results with our pipeline. - **Extensive model evaluation**: to evaluate the model’s capability extensively, we adopted the methodology outlined in Llama2. Specifically, we included PIQA, SIQA, HellaSwag, WinoGrande, ARC, OBQA, and CSQA to assess common sense reasoning. SquAD, QuAC, and BoolQ were incorporated to evaluate reading comprehension. - **Special configurations**: CSQA was exclusively tested using a 7-shot setup, while all other tests were conducted with a 0-shot configuration. Additionally, we introduced GSM8K (8-shot@1), MATH (4-shot@1), HumanEval (0-shot@1), and MBPP (3-shot@1) under the category \"Math & Code\". - **Falcon-180B caveat**: Falcon-180B was not tested on QuAC and OBQA due to technical constraints. Its performance score is an average from other tasks, and considering the generally lower scores of these two tasks, Falcon-180B's capabilities are likely not underestimated. </details> #### Yi-9B Yi-9B is almost the best among a range of similar-sized open-source models (including Mistral-7B, SOLAR-10.7B, Gemma-7B, DeepSeek-Coder-7B-Base-v1.5 and more), particularly excelling in code, math, common-sense reasoning, and reading comprehension. !Yi-9B benchmark - details - In terms of **overall** ability (Mean-All), Yi-9B performs the best among similarly sized open-source models, surpassing DeepSeek-Coder, DeepSeek-Math, Mistral-7B, SOLAR-10.7B, and Gemma-7B. !Yi-9B benchmark - overall - In terms of **coding** ability (Mean-Code), Yi-9B's performance is second only to DeepSeek-Coder-7B, surpassing Yi-34B, SOLAR-10.7B, Mistral-7B, and Gemma-7B. !Yi-9B benchmark - code - In terms of **math** ability (Mean-Math), Yi-9B's performance is second only to DeepSeek-Math-7B, surpassing SOLAR-10.7B, Mistral-7B, and Gemma-7B. !Yi-9B benchmark - math - In terms of **common sense and reasoning** ability (Mean-Text), Yi-9B's performance is on par with Mistral-7B, SOLAR-10.7B, and Gemma-7B. !Yi-9B benchmark - text <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> # Who can use Yi? Everyone! 🙌 ✅ The code and weights of the Yi series models are distributed under the Apache 2.0 license, which means the Yi series models are free for personal usage, academic purposes, and commercial use. <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> # Misc. ### Acknowledgments A heartfelt thank you to each of you who have made contributions to the Yi community! You have helped Yi not just a project, but a vibrant, growing home for innovation. ![yi contributors]( <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### Disclaimer We use data compliance checking algorithms during the training process, to ensure the compliance of the trained model to the best of our ability. Due to complex data and the diversity of language model usage scenarios, we cannot guarantee that the model will generate correct, and reasonable output in all scenarios. Please be aware that there is still a risk of the model producing problematic outputs. We will not be responsible for any risks and issues resulting from misuse, misguidance, illegal usage, and related misinformation, as well as any associated data security concerns. <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### License The code and weights of the Yi-1.5 series models are distributed under the Apache 2.0 license. If you create derivative works based on this model, please include the following attribution in your derivative works: This work is a derivative of [The Yi Series Model You Base On] by 01.AI, used under the Apache 2.0 License. <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p>",
20
+ "model_explanation_gemini": "A bilingual (English/Chinese) open-source large language model optimized for text generation, ranking highly in benchmarks for language understanding and reasoning.\n\n**Features:** \n- Text generation capability \n- Bilingual (English/Chinese) support \n- Strong performance in language understanding, reasoning, and comprehension \n- Open-source (Apache 2.0 license) \n\n**Comparison:** \nOutperforms other open-source models like Falcon-180B and Llama-70B in benchmarks, ranking second only to GPT",
21
+ "release_year": "2024",
22
+ "parameter_count": "34B",
23
+ "is_fine_tuned": false,
24
+ "category": "Large Language Model",
25
+ "model_family": "LLaMA",
26
+ "api_enhanced": true
27
  }
model_data_json/01-ai_Yi-6B.json CHANGED
@@ -17,5 +17,11 @@
17
  "region:us"
18
  ],
19
  "description": "--- license: apache-2.0 widget: - example_title: \"Yi-34B-Chat\" text: \"hi\" output: text: \" Hello! How can I assist you today?\" - example_title: \"Yi-34B\" text: \"There's a place where time stands still. A place of breath taking wonder, but also\" output: text: \" an eerie sense that something is just not right…\\nBetween the two worlds lies The Forgotten Kingdom - home to creatures long since thought extinct and ancient magic so strong it defies belief! Only here can you find what has been lost for centuries: An Elixir Of Life which will restore youth and vitality if only those who seek its power are brave enough to face up against all manner of dangers lurking in this mysterious land! But beware; some say there may even exist powerful entities beyond our comprehension whose intentions towards humanity remain unclear at best ---- they might want nothing more than destruction itself rather then anything else from their quest after immortality (and maybe someone should tell them about modern medicine)? In any event though – one thing remains true regardless : whether or not success comes easy depends entirely upon how much effort we put into conquering whatever challenges lie ahead along with having faith deep down inside ourselves too ;) So let’s get started now shall We?\" pipeline_tag: text-generation --- <div align=\"center\"> <picture> <source media=\"(prefers-color-scheme: dark)\" srcset=\" width=\"200px\"> <source media=\"(prefers-color-scheme: light)\" srcset=\" width=\"200px\"> <img alt=\"specify theme context for images\" src=\" </picture> </br> </br> <div style=\"display: inline-block;\"> <a href=\" <img src=\" </a> </div> <div style=\"display: inline-block;\"> <a href=\"mailto:[email protected]\"> <img src=\" </a> </div> </div> <div align=\"center\"> <h3 align=\"center\">Building the Next Generation of Open-Source and Bilingual LLMs</h3> </div> <p align=\"center\"> 🤗 <a href=\" target=\"_blank\">Hugging Face</a> • 🤖 <a href=\" target=\"_blank\">ModelScope</a> • ✡️ <a href=\" target=\"_blank\">WiseModel</a> </p> <p align=\"center\"> 👩‍🚀 Ask questions or discuss ideas on <a href=\" target=\"_blank\"> GitHub </a> </p> <p align=\"center\"> 👋 Join us on <a href=\" target=\"_blank\"> 👾 Discord </a> or <a href=\"有官方的微信群嘛 · Issue #43 · 01-ai/Yi\" target=\"_blank\"> 💬 WeChat </a> </p> <p align=\"center\"> 📝 Check out <a href=\" Yi Tech Report </a> </p> <p align=\"center\"> 📚 Grow at <a href=\"#learning-hub\"> Yi Learning Hub </a> </p> <!-- DO NOT REMOVE ME --> <hr> <details open> <summary></b>📕 Table of Contents</b></summary> - What is Yi? - Introduction - Models - Chat models - Base models - Model info - News - How to use Yi? - Quick start - Choose your path - pip - docker - llama.cpp - conda-lock - Web demo - Fine-tuning - Quantization - Deployment - FAQ - Learning hub - Why Yi? - Ecosystem - Upstream - Downstream - Serving - Quantization - Fine-tuning - API - Benchmarks - Base model performance - Chat model performance - Tech report - Citation - Who can use Yi? - Misc. - Acknowledgements - Disclaimer - License </details> <hr> # What is Yi? ## Introduction - 🤖 The Yi series models are the next generation of open-source large language models trained from scratch by 01.AI. - 🙌 Targeted as a bilingual language model and trained on 3T multilingual corpus, the Yi series models become one of the strongest LLM worldwide, showing promise in language understanding, commonsense reasoning, reading comprehension, and more. For example, - Yi-34B-Chat model **landed in second place (following GPT-4 Turbo)**, outperforming other LLMs (such as GPT-4, Mixtral, Claude) on the AlpacaEval Leaderboard (based on data available up to January 2024). - Yi-34B model **ranked first among all existing open-source models** (such as Falcon-180B, Llama-70B, Claude) in **both English and Chinese** on various benchmarks, including Hugging Face Open LLM Leaderboard (pre-trained) and C-Eval (based on data available up to November 2023). - 🙏 (Credits to Llama) Thanks to the Transformer and Llama open-source communities, as they reduce the efforts required to build from scratch and enable the utilization of the same tools within the AI ecosystem. <details style=\"display: inline;\"><summary> If you're interested in Yi's adoption of Llama architecture and license usage policy, see <span style=\"color: green;\">Yi's relation with Llama.</span> ⬇️</summary> <ul> <br> > 💡 TL;DR > > The Yi series models adopt the same model architecture as Llama but are **NOT** derivatives of Llama. - Both Yi and Llama are based on the Transformer structure, which has been the standard architecture for large language models since 2018. - Grounded in the Transformer architecture, Llama has become a new cornerstone for the majority of state-of-the-art open-source models due to its excellent stability, reliable convergence, and robust compatibility. This positions Llama as the recognized foundational framework for models including Yi. - Thanks to the Transformer and Llama architectures, other models can leverage their power, reducing the effort required to build from scratch and enabling the utilization of the same tools within their ecosystems. - However, the Yi series models are NOT derivatives of Llama, as they do not use Llama's weights. - As Llama's structure is employed by the majority of open-source models, the key factors of determining model performance are training datasets, training pipelines, and training infrastructure. - Developing in a unique and proprietary way, Yi has independently created its own high-quality training datasets, efficient training pipelines, and robust training infrastructure entirely from the ground up. This effort has led to excellent performance with Yi series models ranking just behind GPT4 and surpassing Llama on the Alpaca Leaderboard in Dec 2023. </ul> </details> <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ## News <details> <summary>🔥 <b>2024-07-29</b>: The <a href=\" Cookbook 1.0 </a> is released, featuring tutorials and examples in both Chinese and English.</summary> </details> <details> <summary>🎯 <b>2024-05-13</b>: The <a href=\" series models </a> are open-sourced, further improving coding, math, reasoning, and instruction-following abilities.</summary> </details> <details> <summary>🎯 <b>2024-03-16</b>: The <code>Yi-9B-200K</code> is open-sourced and available to the public.</summary> </details> <details> <summary>🎯 <b>2024-03-08</b>: <a href=\" Tech Report</a> is published! </summary> </details> <details open> <summary>🔔 <b>2024-03-07</b>: The long text capability of the Yi-34B-200K has been enhanced. </summary> <br> In the \"Needle-in-a-Haystack\" test, the Yi-34B-200K's performance is improved by 10.5%, rising from 89.3% to an impressive 99.8%. We continue to pre-train the model on 5B tokens long-context data mixture and demonstrate a near-all-green performance. </details> <details open> <summary>🎯 <b>2024-03-06</b>: The <code>Yi-9B</code> is open-sourced and available to the public.</summary> <br> <code>Yi-9B</code> stands out as the top performer among a range of similar-sized open-source models (including Mistral-7B, SOLAR-10.7B, Gemma-7B, DeepSeek-Coder-7B-Base-v1.5 and more), particularly excelling in code, math, common-sense reasoning, and reading comprehension. </details> <details open> <summary>🎯 <b>2024-01-23</b>: The Yi-VL models, <code><a href=\" and <code><a href=\" are open-sourced and available to the public.</summary> <br> <code><a href=\" has ranked <strong>first</strong> among all existing open-source models in the latest benchmarks, including <a href=\" and <a href=\" (based on data available up to January 2024).</li> </details> <details> <summary>🎯 <b>2023-11-23</b>: <a href=\"#chat-models\">Chat models</a> are open-sourced and available to the public.</summary> <br>This release contains two chat models based on previously released base models, two 8-bit models quantized by GPTQ, and two 4-bit models quantized by AWQ. - - - - - - You can try some of them interactively at: - Hugging Face - Replicate </details> <details> <summary>🔔 <b>2023-11-23</b>: The Yi Series Models Community License Agreement is updated to <a href=\" </details> <details> <summary>🔥 <b>2023-11-08</b>: Invited test of Yi-34B chat model.</summary> <br>Application form: - English - Chinese </details> <details> <summary>🎯 <b>2023-11-05</b>: <a href=\"#base-models\">The base models, </a><code>Yi-6B-200K</code> and <code>Yi-34B-200K</code>, are open-sourced and available to the public.</summary> <br>This release contains two base models with the same parameter sizes as the previous release, except that the context window is extended to 200K. </details> <details> <summary>🎯 <b>2023-11-02</b>: <a href=\"#base-models\">The base models, </a><code>Yi-6B</code> and <code>Yi-34B</code>, are open-sourced and available to the public.</summary> <br>The first public release contains two bilingual (English/Chinese) base models with the parameter sizes of 6B and 34B. Both of them are trained with 4K sequence length and can be extended to 32K during inference time. </details> <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ## Models Yi models come in multiple sizes and cater to different use cases. You can also fine-tune Yi models to meet your specific requirements. If you want to deploy Yi models, make sure you meet the software and hardware requirements. ### Chat models | Model | Download | |---|---| |Yi-34B-Chat | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | |Yi-34B-Chat-4bits | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | |Yi-34B-Chat-8bits | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | |Yi-6B-Chat| • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | |Yi-6B-Chat-4bits | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | |Yi-6B-Chat-8bits | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | <sub><sup> - 4-bit series models are quantized by AWQ. <br> - 8-bit series models are quantized by GPTQ <br> - All quantized models have a low barrier to use since they can be deployed on consumer-grade GPUs (e.g., 3090, 4090). </sup></sub> ### Base models | Model | Download | |---|---| |Yi-34B| • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | |Yi-34B-200K|• 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel| |Yi-9B|• 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel| |Yi-9B-200K | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | |Yi-6B| • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | |Yi-6B-200K | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | <sub><sup> - 200k is roughly equivalent to 400,000 Chinese characters. <br> - If you want to use the previous version of the Yi-34B-200K (released on Nov 5, 2023), run to download the weight. </sup></sub> ### Model info - For chat and base models <table> <thead> <tr> <th>Model</th> <th>Intro</th> <th>Default context window</th> <th>Pretrained tokens</th> <th>Training Data Date</th> </tr> </thead> <tbody><tr> <td>6B series models</td> <td>They are suitable for personal and academic use.</td> <td rowspan=\"3\">4K</td> <td>3T</td> <td rowspan=\"3\">Up to June 2023</td> </tr> <tr> <td>9B series models</td> <td>It is the best at coding and math in the Yi series models.</td> <td>Yi-9B is continuously trained based on Yi-6B, using 0.8T tokens.</td> </tr> <tr> <td>34B series models</td> <td>They are suitable for personal, academic, and commercial (particularly for small and medium-sized enterprises) purposes. It&#39;s a cost-effective solution that&#39;s affordable and equipped with emergent ability.</td> <td>3T</td> </tr> </tbody></table> - For chat models <details style=\"display: inline;\"><summary>For chat model limitations, see the explanations below. ⬇️</summary> <ul> <br>The released chat model has undergone exclusive training using Supervised Fine-Tuning (SFT). Compared to other standard chat models, our model produces more diverse responses, making it suitable for various downstream tasks, such as creative scenarios. Furthermore, this diversity is expected to enhance the likelihood of generating higher quality responses, which will be advantageous for subsequent Reinforcement Learning (RL) training. <br>However, this higher diversity might amplify certain existing issues, including: <li>Hallucination: This refers to the model generating factually incorrect or nonsensical information. With the model's responses being more varied, there's a higher chance of hallucination that are not based on accurate data or logical reasoning.</li> <li>Non-determinism in re-generation: When attempting to regenerate or sample responses, inconsistencies in the outcomes may occur. The increased diversity can lead to varying results even under similar input conditions.</li> <li>Cumulative Error: This occurs when errors in the model's responses compound over time. As the model generates more diverse responses, the likelihood of small inaccuracies building up into larger errors increases, especially in complex tasks like extended reasoning, mathematical problem-solving, etc.</li> <li>To achieve more coherent and consistent responses, it is advisable to adjust generation configuration parameters such as temperature, top_p, or top_k. These adjustments can help in the balance between creativity and coherence in the model's outputs.</li> </ul> </details> <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> # How to use Yi? - Quick start - Choose your path - pip - docker - conda-lock - llama.cpp - Web demo - Fine-tuning - Quantization - Deployment - FAQ - Learning hub ## Quick start > **💡 Tip**: If you want to get started with the Yi model and explore different methods for inference, check out the Yi Cookbook. ### Choose your path Select one of the following paths to begin your journey with Yi! !Quick start - Choose your path #### 🎯 Deploy Yi locally If you prefer to deploy Yi models locally, - 🙋‍♀️ and you have **sufficient** resources (for example, NVIDIA A800 80GB), you can choose one of the following methods: - pip - Docker - conda-lock - 🙋‍♀️ and you have **limited** resources (for example, a MacBook Pro), you can use llama.cpp. #### 🎯 Not to deploy Yi locally If you prefer not to deploy Yi models locally, you can explore Yi's capabilities using any of the following options. ##### 🙋‍♀️ Run Yi with APIs If you want to explore more features of Yi, you can adopt one of these methods: - Yi APIs (Yi official) - Early access has been granted to some applicants. Stay tuned for the next round of access! - Yi APIs (Replicate) ##### 🙋‍♀️ Run Yi in playground If you want to chat with Yi with more customizable options (e.g., system prompt, temperature, repetition penalty, etc.), you can try one of the following options: - Yi-34B-Chat-Playground (Yi official) - Access is available through a whitelist. Welcome to apply (fill out a form in English or Chinese). - Yi-34B-Chat-Playground (Replicate) ##### 🙋‍♀️ Chat with Yi If you want to chat with Yi, you can use one of these online services, which offer a similar user experience: - Yi-34B-Chat (Yi official on Hugging Face) - No registration is required. - Yi-34B-Chat (Yi official beta) - Access is available through a whitelist. Welcome to apply (fill out a form in English or Chinese). <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### Quick start - pip This tutorial guides you through every step of running **Yi-34B-Chat locally on an A800 (80G)** and then performing inference. #### Step 0: Prerequisites - Make sure Python 3.10 or a later version is installed. - If you want to run other Yi models, see software and hardware requirements. #### Step 1: Prepare your environment To set up the environment and install the required packages, execute the following command. #### Step 2: Download the Yi model You can download the weights and tokenizer of Yi models from the following sources: - Hugging Face - ModelScope - WiseModel #### Step 3: Perform inference You can perform inference with Yi chat or base models as below. ##### Perform inference with Yi chat model 1. Create a file named and copy the following content to it. 2. Run . Then you can see an output similar to the one below. 🥳 ##### Perform inference with Yi base model - Yi-34B The steps are similar to pip - Perform inference with Yi chat model. You can use the existing file []( Then you can see an output similar to the one below. 🥳 <details> <summary>Output. ⬇️ </summary> <br> **Prompt**: Let me tell you an interesting story about cat Tom and mouse Jerry, **Generation**: Let me tell you an interesting story about cat Tom and mouse Jerry, which happened in my childhood. My father had a big house with two cats living inside it to kill mice. One day when I was playing at home alone, I found one of the tomcats lying on his back near our kitchen door, looking very much like he wanted something from us but couldn’t get up because there were too many people around him! He kept trying for several minutes before finally giving up... </details> - Yi-9B Input Output <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### Quick start - Docker <details> <summary> Run Yi-34B-chat locally with Docker: a step-by-step guide. ⬇️</summary> <br>This tutorial guides you through every step of running <strong>Yi-34B-Chat on an A800 GPU</strong> or <strong>4*4090</strong> locally and then performing inference. <h4>Step 0: Prerequisites</h4> <p>Make sure you've installed <a href=\" and <a href=\" <h4> Step 1: Start Docker </h4> <pre><code>docker run -it --gpus all \\ -v &lt;your-model-path&gt;: /models ghcr.io/01-ai/yi:latest </code></pre> <p>Alternatively, you can pull the Yi Docker image from <code>registry.lingyiwanwu.com/ci/01-ai/yi:latest</code>.</p> <h4>Step 2: Perform inference</h4> <p>You can perform inference with Yi chat or base models as below.</p> <h5>Perform inference with Yi chat model</h5> <p>The steps are similar to <a href=\"#perform-inference-with-yi-chat-model\">pip - Perform inference with Yi chat model</a>.</p> <p><strong>Note</strong> that the only difference is to set <code>model_path = '&lt;your-model-mount-path&gt;'</code> instead of <code>model_path = '&lt;your-model-path&gt;'</code>.</p> <h5>Perform inference with Yi base model</h5> <p>The steps are similar to <a href=\"#perform-inference-with-yi-base-model\">pip - Perform inference with Yi base model</a>.</p> <p><strong>Note</strong> that the only difference is to set <code>--model &lt;your-model-mount-path&gt;'</code> instead of <code>model &lt;your-model-path&gt;</code>.</p> </details> ### Quick start - conda-lock <details> <summary>You can use <code><a href=\" to generate fully reproducible lock files for conda environments. ⬇️</summary> <br> You can refer to <a href=\" for the exact versions of the dependencies. Additionally, you can utilize <code><a href=\" for installing these dependencies. <br> To install the dependencies, follow these steps: 1. Install micromamba by following the instructions available <a href=\" 2. Execute <code>micromamba install -y -n yi -f conda-lock.yml</code> to create a conda environment named <code>yi</code> and install the necessary dependencies. </details> ### Quick start - llama.cpp <a href=\" following tutorial </a> will guide you through every step of running a quantized model (<a href=\" locally and then performing inference. <details> <summary> Run Yi-chat-6B-2bits locally with llama.cpp: a step-by-step guide. ⬇️</summary> <br><a href=\" tutorial</a> guides you through every step of running a quantized model (<a href=\" locally and then performing inference.</p> - Step 0: Prerequisites - Step 1: Download llama.cpp - Step 2: Download Yi model - Step 3: Perform inference #### Step 0: Prerequisites - This tutorial assumes you use a MacBook Pro with 16GB of memory and an Apple M2 Pro chip. - Make sure []( is installed on your machine. #### Step 1: Download To clone the []( repository, run the following command. #### Step 2: Download Yi model 2.1 To clone XeIaso/yi-chat-6B-GGUF with just pointers, run the following command. 2.2 To download a quantized Yi model (yi-chat-6b.Q2_K.gguf), run the following command. #### Step 3: Perform inference To perform inference with the Yi model, you can use one of the following methods. - Method 1: Perform inference in terminal - Method 2: Perform inference in web ##### Method 1: Perform inference in terminal To compile using 4 threads and then conduct inference, navigate to the directory, and run the following command. > ##### Tips > > - Replace with the actual path of your model. > > - By default, the model operates in completion mode. > > - For additional output customization options (for example, system prompt, temperature, repetition penalty, etc.), run to check detailed descriptions and usage. Now you have successfully asked a question to the Yi model and got an answer! 🥳 ##### Method 2: Perform inference in web 1. To initialize a lightweight and swift chatbot, run the following command. Then you can get an output like this: 2. To access the chatbot interface, open your web browser and enter into the address bar. !Yi model chatbot interface - llama.cpp 3. Enter a question, such as \"How do you feed your pet fox? Please answer this question in 6 simple steps\" into the prompt window, and you will receive a corresponding answer. !Ask a question to Yi model - llama.cpp </ul> </details> <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### Web demo You can build a web UI demo for Yi **chat** models (note that Yi base models are not supported in this senario). Step 1: Prepare your environment. Step 2: Download the Yi model. Step 3. To start a web service locally, run the following command. You can access the web UI by entering the address provided in the console into your browser. !Quick start - web demo <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### Fine-tuning Once finished, you can compare the finetuned model and the base model with the following command: <details style=\"display: inline;\"><summary>For advanced usage (like fine-tuning based on your custom data), see the explanations below. ⬇️ </summary> <ul> ### Finetune code for Yi 6B and 34B #### Preparation ##### From Image By default, we use a small dataset from BAAI/COIG to finetune the base model. You can also prepare your customized dataset in the following format: And then mount them in the container to replace the default ones: ##### From Local Server Make sure you have conda. If not, use Then, create a conda env: #### Hardware Setup For the Yi-6B model, a node with 4 GPUs, each with GPU memory larger than 60GB, is recommended. For the Yi-34B model, because the usage of the zero-offload technique consumes a lot of CPU memory, please be careful to limit the number of GPUs in the 34B finetune training. Please use CUDA_VISIBLE_DEVICES to limit the number of GPUs (as shown in scripts/run_sft_Yi_34b.sh). A typical hardware setup for finetuning the 34B model is a node with 8 GPUs (limited to 4 in running by CUDA_VISIBLE_DEVICES=0,1,2,3), each with GPU memory larger than 80GB, and total CPU memory larger than 900GB. #### Quick Start Download a LLM-base model to MODEL_PATH (6B and 34B). A typical folder of models is like: Download a dataset from huggingface to local storage DATA_PATH, e.g. Dahoas/rm-static. has example datasets, which are modified from BAAI/COIG into the scripts folder, copy and paste the script, and run. For example: For the Yi-6B base model, setting training_debug_steps=20 and num_train_epochs=4 can output a chat model, which takes about 20 minutes. For the Yi-34B base model, it takes a relatively long time for initialization. Please be patient. #### Evaluation Then you'll see the answer from both the base model and the finetuned model. </ul> </details> <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### Quantization #### GPT-Q Once finished, you can then evaluate the resulting model as follows: <details style=\"display: inline;\"><summary>For details, see the explanations below. ⬇️</summary> <ul> #### GPT-Q quantization GPT-Q is a PTQ (Post-Training Quantization) method. It saves memory and provides potential speedups while retaining the accuracy of the model. Yi models can be GPT-Q quantized without a lot of efforts. We provide a step-by-step tutorial below. To run GPT-Q, we will use AutoGPTQ and exllama. And the huggingface transformers has integrated optimum and auto-gptq to perform GPTQ quantization on language models. ##### Do Quantization The script is provided for you to perform GPT-Q quantization: ##### Run Quantized Model You can run a quantized model using the : </ul> </details> #### AWQ Once finished, you can then evaluate the resulting model as follows: <details style=\"display: inline;\"><summary>For details, see the explanations below. ⬇️</summary> <ul> #### AWQ quantization AWQ is a PTQ (Post-Training Quantization) method. It's an efficient and accurate low-bit weight quantization (INT3/4) for LLMs. Yi models can be AWQ quantized without a lot of efforts. We provide a step-by-step tutorial below. To run AWQ, we will use AutoAWQ. ##### Do Quantization The script is provided for you to perform AWQ quantization: ##### Run Quantized Model You can run a quantized model using the : </ul> </details> <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### Deployment If you want to deploy Yi models, make sure you meet the software and hardware requirements. #### Software requirements Before using Yi quantized models, make sure you've installed the correct software listed below. | Model | Software |---|--- Yi 4-bit quantized models | AWQ and CUDA Yi 8-bit quantized models | GPTQ and CUDA #### Hardware requirements Before deploying Yi in your environment, make sure your hardware meets the following requirements. ##### Chat models | Model | Minimum VRAM | Recommended GPU Example | |:----------------------|:--------------|:-------------------------------------:| | Yi-6B-Chat | 15 GB | 1 x RTX 3090 (24 GB) <br> 1 x RTX 4090 (24 GB) <br> 1 x A10 (24 GB) <br> 1 x A30 (24 GB) | | Yi-6B-Chat-4bits | 4 GB | 1 x RTX 3060 (12 GB)<br> 1 x RTX 4060 (8 GB) | | Yi-6B-Chat-8bits | 8 GB | 1 x RTX 3070 (8 GB) <br> 1 x RTX 4060 (8 GB) | | Yi-34B-Chat | 72 GB | 4 x RTX 4090 (24 GB)<br> 1 x A800 (80GB) | | Yi-34B-Chat-4bits | 20 GB | 1 x RTX 3090 (24 GB) <br> 1 x RTX 4090 (24 GB) <br> 1 x A10 (24 GB) <br> 1 x A30 (24 GB) <br> 1 x A100 (40 GB) | | Yi-34B-Chat-8bits | 38 GB | 2 x RTX 3090 (24 GB) <br> 2 x RTX 4090 (24 GB)<br> 1 x A800 (40 GB) | Below are detailed minimum VRAM requirements under different batch use cases. | Model | batch=1 | batch=4 | batch=16 | batch=32 | | ----------------------- | ------- | ------- | -------- | -------- | | Yi-6B-Chat | 12 GB | 13 GB | 15 GB | 18 GB | | Yi-6B-Chat-4bits | 4 GB | 5 GB | 7 GB | 10 GB | | Yi-6B-Chat-8bits | 7 GB | 8 GB | 10 GB | 14 GB | | Yi-34B-Chat | 65 GB | 68 GB | 76 GB | > 80 GB | | Yi-34B-Chat-4bits | 19 GB | 20 GB | 30 GB | 40 GB | | Yi-34B-Chat-8bits | 35 GB | 37 GB | 46 GB | 58 GB | ##### Base models | Model | Minimum VRAM | Recommended GPU Example | |----------------------|--------------|:-------------------------------------:| | Yi-6B | 15 GB | 1 x RTX 3090 (24 GB) <br> 1 x RTX 4090 (24 GB) <br> 1 x A10 (24 GB) <br> 1 x A30 (24 GB) | | Yi-6B-200K | 50 GB | 1 x A800 (80 GB) | | Yi-9B | 20 GB | 1 x RTX 4090 (24 GB) | | Yi-34B | 72 GB | 4 x RTX 4090 (24 GB) <br> 1 x A800 (80 GB) | | Yi-34B-200K | 200 GB | 4 x A800 (80 GB) | <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### FAQ <details> <summary> If you have any questions while using the Yi series models, the answers provided below could serve as a helpful reference for you. ⬇️</summary> <br> #### 💡Fine-tuning - <strong>Base model or Chat model - which to fine-tune?</strong> <br>The choice of pre-trained language model for fine-tuning hinges on the computational resources you have at your disposal and the particular demands of your task. - If you are working with a substantial volume of fine-tuning data (say, over 10,000 samples), the Base model could be your go-to choice. - On the other hand, if your fine-tuning data is not quite as extensive, opting for the Chat model might be a more fitting choice. - It is generally advisable to fine-tune both the Base and Chat models, compare their performance, and then pick the model that best aligns with your specific requirements. - <strong>Yi-34B versus Yi-34B-Chat for full-scale fine-tuning - what is the difference?</strong> <br> The key distinction between full-scale fine-tuning on and comes down to the fine-tuning approach and outcomes. - Yi-34B-Chat employs a Special Fine-Tuning (SFT) method, resulting in responses that mirror human conversation style more closely. - The Base model's fine-tuning is more versatile, with a relatively high performance potential. - If you are confident in the quality of your data, fine-tuning with could be your go-to. - If you are aiming for model-generated responses that better mimic human conversational style, or if you have doubts about your data quality, might be your best bet. #### 💡Quantization - <strong>Quantized model versus original model - what is the performance gap?</strong> - The performance variance is largely contingent on the quantization method employed and the specific use cases of these models. For instance, when it comes to models provided by the AWQ official, from a Benchmark standpoint, quantization might result in a minor performance drop of a few percentage points. - Subjectively speaking, in situations like logical reasoning, even a 1% performance shift could impact the accuracy of the output results. #### 💡General - <strong>Where can I source fine-tuning question answering datasets?</strong> - You can find fine-tuning question answering datasets on platforms like Hugging Face, with datasets like m-a-p/COIG-CQIA readily available. - Additionally, Github offers fine-tuning frameworks, such as hiyouga/LLaMA-Factory, which integrates pre-made datasets. - <strong>What is the GPU memory requirement for fine-tuning Yi-34B FP16?</strong> <br> The GPU memory needed for fine-tuning 34B FP16 hinges on the specific fine-tuning method employed. For full parameter fine-tuning, you'll need 8 GPUs each with 80 GB; however, more economical solutions like Lora require less. For more details, check out hiyouga/LLaMA-Factory. Also, consider using BF16 instead of FP16 for fine-tuning to optimize performance. - <strong>Are there any third-party platforms that support chat functionality for the Yi-34b-200k model?</strong> <br> If you're looking for third-party Chats, options include fireworks.ai. </details> ### Learning hub <details> <summary> If you want to learn Yi, you can find a wealth of helpful educational resources here. ⬇️</summary> <br> Welcome to the Yi learning hub! Whether you're a seasoned developer or a newcomer, you can find a wealth of helpful educational resources to enhance your understanding and skills with Yi models, including insightful blog posts, comprehensive video tutorials, hands-on guides, and more. The content you find here has been generously contributed by knowledgeable Yi experts and passionate enthusiasts. We extend our heartfelt gratitude for your invaluable contributions! At the same time, we also warmly invite you to join our collaborative effort by contributing to Yi. If you have already made contributions to Yi, please don't hesitate to showcase your remarkable work in the table below. With all these resources at your fingertips, you're ready to start your exciting journey with Yi. Happy learning! 🥳 #### Tutorials ##### Blog tutorials | Deliverable | Date | Author | | ------------------------------------------------------------ | ---------- | ------------------------------------------------------------ | | 使用 Dify、Meilisearch、零一万物模型实现最简单的 RAG 应用(三):AI 电影推荐 | 2024-05-20 | 苏洋 | | 使用autodl服务器,在A40显卡上运行, Yi-34B-Chat-int4模型,并使用vllm优化加速,显存占用42G,速度18 words-s | 2024-05-20 | fly-iot | | Yi-VL 最佳实践 | 2024-05-20 | ModelScope | | 一键运行零一万物新鲜出炉Yi-1.5-9B-Chat大模型 | 2024-05-13 | Second State | | 零一万物开源Yi-1.5系列大模型 | 2024-05-13 | 刘聪 | | 零一万物Yi-1.5系列模型发布并开源! 34B-9B-6B 多尺寸,魔搭社区推理微调最佳实践教程来啦! | 2024-05-13 | ModelScope | | Yi-34B 本地部署简单测试 | 2024-05-13 | 漆妮妮 | | 驾辰龙跨Llama持Wasm,玩转Yi模型迎新春过大年(上) | 2024-05-13 | Words worth | | 驾辰龙跨Llama持Wasm,玩转Yi模型迎新春过大年(下篇) | 2024-05-13 | Words worth | | Ollama新增两个命令,开始支持零一万物Yi-1.5系列模型 | 2024-05-13 | AI工程师笔记 | | 使用零一万物 200K 模型和 Dify 快速搭建模型应用 | 2024-05-13 | 苏洋 | | (持更) 零一万物模型折腾笔记:社区 Yi-34B 微调模型使用 | 2024-05-13 | 苏洋 | | Python+ERNIE-4.0-8K-Yi-34B-Chat大模型初探 | 2024-05-11 | 江湖评谈 | | 技术布道 Vue及Python调用零一万物模型和Prompt模板(通过百度千帆大模型平台) | 2024-05-11 | MumuLab | | 多模态大模型Yi-VL-plus体验 效果很棒 | 2024-04-27 | 大家好我是爱因 | | 使用autodl服务器,两个3090显卡上运行, Yi-34B-Chat-int4模型,并使用vllm优化加速,显存占用42G,速度23 words-s | 2024-04-27 | fly-iot | | Getting Started with Yi-1.5-9B-Chat | 2024-04-27 | Second State | | 基于零一万物yi-vl-plus大模型简单几步就能批量生成Anki图片笔记 | 2024-04-24 | 正经人王同学 | | 【AI开发:语言】一、Yi-34B超大模型本地部署CPU和GPU版 | 2024-04-21 | My的梦想已实现 | | 【Yi-34B-Chat-Int4】使用4个2080Ti显卡11G版本,运行Yi-34B模型,5年前老显卡是支持的,可以正常运行,速度 21 words-s,vllm要求算力在7以上的显卡就可以 | 2024-03-22 | fly-iot | | 零一万物大模型部署+微调总结 | 2024-03-22 | v_wus | | 零一万物Yi大模型vllm推理时Yi-34B或Yi-6bchat重复输出的解决方案 | 2024-03-02 | 郝铠锋 | | Yi-34B微调训练 | 2024-03-02 | lsjlnd | | 实测零一万物Yi-VL多模态语言模型:能准确“识图吃瓜” | 2024-02-02 | 苏洋 | | 零一万物开源Yi-VL多模态大模型,魔搭社区推理&微调最佳实践来啦! | 2024-01-26 | ModelScope | | 单卡 3 小时训练 Yi-6B 大模型 Agent:基于 Llama Factory 实战 | 2024-01-22 | 郑耀威 | | 零一科技Yi-34B Chat大模型环境搭建&推理 | 2024-01-15 | 要养家的程序员 | | 基于LLaMA Factory,单卡3小时训练专属大模型 Agent | 2024-01-15 | 机器学习社区 | | 双卡 3080ti 部署 Yi-34B 大模型 - Gradio + vLLM 踩坑全记录 | 2024-01-02 | 漆妮妮 | | 【大模型部署实践-3】3个能在3090上跑起来的4bits量化Chat模型(baichuan2-13b、InternLM-20b、Yi-34b) | 2024-01-02 | aq_Seabiscuit | | 只需 24G 显存,用 vllm 跑起来 Yi-34B 中英双语大模型 | 2023-12-28 | 漆妮妮 | | 零一万物模型官方 Yi-34B 模型本地离线运行部署使用笔记(物理机和docker两种部署方式),200K 超长文本内容,34B 干翻一众 70B 模型,打榜分数那么高,这模型到底行不行? | 2023-12-28 | 代码讲故事 | | LLM - 大模型速递之 Yi-34B 入门与 LoRA 微调 | 2023-12-18 | BIT_666 | | 通过vllm框架进行大模型推理 | 2023-12-18 | 土山炮 | | CPU 混合推理,非常见大模型量化方案:“二三五六” 位量化方案 | 2023-12-12 | 苏洋 | | 零一万物模型折腾笔记:官方 Yi-34B 模型基础使用 | 2023-12-10 | 苏洋 | | Running Yi-34B-Chat locally using LlamaEdge | 2023-11-30 | Second State | | 本地运行零一万物 34B 大模型,使用 Llama.cpp & 21G 显存 | 2023-11-26 | 苏洋 | ##### GitHub Project | Deliverable | Date | Author | | ------------------------------------------------------------ | ---------- | ------------------------------------------- | | yi-openai-proxy | 2024-05-11 | 苏洋 | | 基于零一万物 Yi 模型和 B 站构建大语言模型高质量训练数据集 | 2024-04-29 | 正经人王同学 | | 基于视频网站和零一万物大模型构建大语言模型高质量训练数据集 | 2024-04-25 | 正经人王同学 | | 基于零一万物yi-34b-chat-200k输入任意文章地址,点击按钮即可生成无广告或推广内容的简要笔记,并生成分享图给好友 | 2024-04-24 | 正经人王同学 | | Food-GPT-Yi-model | 2024-04-21 | Hubert S | ##### Video tutorials | Deliverable | Date | Author | | ------------------------------------------------------------ | ---------- | ------------------------------------------------------------ | | Run dolphin-2.2-yi-34b on IoT Devices | 2023-11-30 | Second State | | 只需 24G 显存,用 vllm 跑起来 Yi-34B 中英双语大模型 | 2023-12-28 | 漆妮妮 | | Install Yi 34B Locally - Chinese English Bilingual LLM | 2023-11-05 | Fahd Mirza | | Dolphin Yi 34b - Brand New Foundational Model TESTED | 2023-11-27 | Matthew Berman | | Yi-VL-34B 多模态大模型 - 用两张 A40 显卡跑起来 | 2024-01-28 | 漆妮妮 | | 4060Ti 16G显卡安装零一万物最新开源的Yi-1.5版大语言模型 | 2024-05-14 | titan909 | | Yi-1.5: True Apache 2.0 Competitor to LLAMA-3 | 2024-05-13 | Prompt Engineering | | Install Yi-1.5 Model Locally - Beats Llama 3 in Various Benchmarks | 2024-05-13 | Fahd Mirza | | how to install Ollama and run Yi 6B | 2024-05-13 | Ridaa Davids | | 地表最强混合智能AI助手:llama3_70B+Yi_34B+Qwen1.5_110B | 2024-05-04 | 朱扎特 | | ChatDoc学术论文辅助--基于Yi-34B和langchain进行PDF知识库问答 | 2024-05-03 | 朱扎特 | | 基于Yi-34B的领域知识问答项目演示 | 2024-05-02 | 朱扎特 | | 使用RTX4090+GaLore算法 全参微调Yi-6B大模型 | 2024-03-24 | 小工蚂创始人 | | 无内容审查NSFW大语言模型Yi-34B-Chat蒸馏版测试,RolePlay,《天龙八部》马夫人康敏,本地GPU,CPU运行 | 2024-03-20 | 刘悦的技术博客 | | 无内容审查NSFW大语言模型整合包,Yi-34B-Chat,本地CPU运行,角色扮演潘金莲 | 2024-03-16 | 刘悦的技术博客 | | 量化 Yi-34B-Chat 并在单卡 RTX 4090 使用 vLLM 部署 | 2024-03-05 | 白鸽巢 | | Yi-VL-34B(5):使用3个3090显卡24G版本,运行Yi-VL-34B模型,支持命令行和web界面方式,理解图片的内容转换成文字 | 2024-02-27 | fly-iot | | Win环境KoboldCpp本地部署大语言模型进行各种角色扮演游戏 | 2024-02-25 | 魚蟲蟲 | | 无需显卡本地部署Yi-34B-Chat进行角色扮演游戏 P2 | 2024-02-23 | 魚蟲蟲 | | 【wails】(2):使用go-llama.cpp 运行 yi-01-6b大模型,使用本地CPU运行,速度还可以,等待下一版本更新 | 2024-02-20 | fly-iot | | 【xinference】(6):在autodl上,使用xinference部署yi-vl-chat和qwen-vl-chat模型,可以使用openai调用成功 | 2024-02-06 | fly-iot | | 无需显卡本地部署Yi-34B-Chat进行角色扮演游戏 P1 | 2024-02-05 | 魚蟲蟲 | | 2080Ti部署YI-34B大模型 xinference-oneapi-fastGPT本地知识库使用指南 | 2024-01-30 | 小饭护法要转码 | | Best Story Writing AI Model - Install Yi 6B 200K Locally on Windows | 2024-01-22 | Fahd Mirza | | Mac 本地运行大语言模型方法与常见问题指南(Yi 34B 模型+32 GB 内存测试) | 2024-01-21 | 小吴苹果机器人 | | 【Dify知识库】(11):Dify0.4.9改造支持MySQL,成功接入yi-6b 做对话,本地使用fastchat启动,占8G显存,完成知识库配置 | 2024-01-21 | fly-iot | | 这位LLM先生有点暴躁,用的是YI-6B的某个量化版,#LLM #大语言模型 #暴躁老哥 | 2024-01-20 | 晓漫吧 | | 大模型推理 NvLink 桥接器有用吗|双卡 A6000 测试一下 | 2024-01-17 | 漆妮妮 | | 大模型推理 A40 vs A6000 谁更强 - 对比 Yi-34B 的单、双卡推理性能 | 2024-01-15 | 漆妮妮 | | C-Eval 大语言模型评测基准- 用 LM Evaluation Harness + vLLM 跑起来 | 2024-01-11 | 漆妮妮 | | 双显卡部署 Yi-34B 大模型 - vLLM + Gradio 踩坑记录 | 2024-01-01 | 漆妮妮 | | 手把手教学!使用 vLLM 快速部署 Yi-34B-Chat | 2023-12-26 | 白鸽巢 | | 如何训练企业自己的大语言模型?Yi-6B LORA微调演示 #小工蚁 | 2023-12-21 | 小工蚂创始人 | | Yi-34B(4):使用4个2080Ti显卡11G版本,运行Yi-34B模型,5年前老显卡是支持的,可以正常运行,速度 21 words/s | 2023-12-02 | fly-iot | | 使用autodl服务器,RTX 3090 * 3 显卡上运行, Yi-34B-Chat模型,显存占用60G | 2023-12-01 | fly-iot | | 使用autodl服务器,两个3090显卡上运行, Yi-34B-Chat-int4模型,用vllm优化,增加 --num-gpu 2,速度23 words/s | 2023-12-01 | fly-iot | | Yi大模型一键本地部署 技术小白玩转AI | 2023-12-01 | 技术小白玩转AI | | 01.AI's Yi-6B: Overview and Fine-Tuning | 2023-11-28 | AI Makerspace | | Yi 34B Chat LLM outperforms Llama 70B | 2023-11-27 | DLExplorer | | How to run open source models on mac Yi 34b on m3 Max | 2023-11-26 | TECHNO PREMIUM | | Yi-34B - 200K - The BEST & NEW CONTEXT WINDOW KING | 2023-11-24 | Prompt Engineering | | Yi 34B : The Rise of Powerful Mid-Sized Models - Base,200k & Chat | 2023-11-24 | Sam Witteveen | | 在IoT设备运行破解版李开复大模型dolphin-2.2-yi-34b(还可作为私有OpenAI API服务器) | 2023-11-15 | Second State | | Run dolphin-2.2-yi-34b on IoT Devices (Also works as a Private OpenAI API Server) | 2023-11-14 | Second State | | How to Install Yi 34B 200K Llamafied on Windows Laptop | 2023-11-11 | Fahd Mirza | </details> # Why Yi? - Ecosystem - Upstream - Downstream - Serving - Quantization - Fine-tuning - API - Benchmarks - Chat model performance - Base model performance - Yi-34B and Yi-34B-200K - Yi-9B ## Ecosystem Yi has a comprehensive ecosystem, offering a range of tools, services, and models to enrich your experiences and maximize productivity. - Upstream - Downstream - Serving - Quantization - Fine-tuning - API ### Upstream The Yi series models follow the same model architecture as Llama. By choosing Yi, you can leverage existing tools, libraries, and resources within the Llama ecosystem, eliminating the need to create new tools and enhancing development efficiency. For example, the Yi series models are saved in the format of the Llama model. You can directly use and to load the model. For more information, see Use the chat model. <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### Downstream > 💡 Tip > > - Feel free to create a PR and share the fantastic work you've built using the Yi series models. > > - To help others quickly understand your work, it is recommended to use the format of . #### Serving If you want to get up with Yi in a few minutes, you can use the following services built upon Yi. - Yi-34B-Chat: you can chat with Yi using one of the following platforms: - Yi-34B-Chat | Hugging Face - Yi-34B-Chat | Yi Platform: **Note** that currently it's available through a whitelist. Welcome to apply (fill out a form in English or Chinese) and experience it firsthand! - Yi-6B-Chat (Replicate): you can use this model with more options by setting additional parameters and calling APIs. - ScaleLLM: you can use this service to run Yi models locally with added flexibility and customization. #### Quantization If you have limited computational capabilities, you can use Yi's quantized models as follows. These quantized models have reduced precision but offer increased efficiency, such as faster inference speed and smaller RAM usage. - TheBloke/Yi-34B-GPTQ - TheBloke/Yi-34B-GGUF - TheBloke/Yi-34B-AWQ #### Fine-tuning If you're seeking to explore the diverse capabilities within Yi's thriving family, you can delve into Yi's fine-tuned models as below. - TheBloke Models: this site hosts numerous fine-tuned models derived from various LLMs including Yi. This is not an exhaustive list for Yi, but to name a few sorted on downloads: - TheBloke/dolphin-2_2-yi-34b-AWQ - TheBloke/Yi-34B-Chat-AWQ - TheBloke/Yi-34B-Chat-GPTQ - SUSTech/SUS-Chat-34B: this model ranked first among all models below 70B and outperformed the twice larger deepseek-llm-67b-chat. You can check the result on the Open LLM Leaderboard. - OrionStarAI/OrionStar-Yi-34B-Chat-Llama: this model excelled beyond other models (such as GPT-4, Qwen-14B-Chat, Baichuan2-13B-Chat) in C-Eval and CMMLU evaluations on the OpenCompass LLM Leaderboard. - NousResearch/Nous-Capybara-34B: this model is trained with 200K context length and 3 epochs on the Capybara dataset. #### API - amazing-openai-api: this tool converts Yi model APIs into the OpenAI API format out of the box. - LlamaEdge: this tool builds an OpenAI-compatible API server for Yi-34B-Chat using a portable Wasm (WebAssembly) file, powered by Rust. <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ## Tech report For detailed capabilities of the Yi series model, see Yi: Open Foundation Models by 01.AI. ### Citation ## Benchmarks - Chat model performance - Base model performance ### Chat model performance Yi-34B-Chat model demonstrates exceptional performance, ranking first among all existing open-source models in the benchmarks including MMLU, CMMLU, BBH, GSM8k, and more. !Chat model performance <details> <summary> Evaluation methods and challenges. ⬇️ </summary> - **Evaluation methods**: we evaluated various benchmarks using both zero-shot and few-shot methods, except for TruthfulQA. - **Zero-shot vs. few-shot**: in chat models, the zero-shot approach is more commonly employed. - **Evaluation strategy**: our evaluation strategy involves generating responses while following instructions explicitly or implicitly (such as using few-shot examples). We then isolate relevant answers from the generated text. - **Challenges faced**: some models are not well-suited to produce output in the specific format required by instructions in few datasets, which leads to suboptimal results. <strong>*</strong>: C-Eval results are evaluated on the validation datasets </details> ### Base model performance #### Yi-34B and Yi-34B-200K The Yi-34B and Yi-34B-200K models stand out as the top performers among open-source models, especially excelling in MMLU, CMMLU, common-sense reasoning, reading comprehension, and more. !Base model performance <details> <summary> Evaluation methods. ⬇️</summary> - **Disparity in results**: while benchmarking open-source models, a disparity has been noted between results from our pipeline and those reported by public sources like OpenCompass. - **Investigation findings**: a deeper investigation reveals that variations in prompts, post-processing strategies, and sampling techniques across models may lead to significant outcome differences. - **Uniform benchmarking process**: our methodology aligns with the original benchmarks—consistent prompts and post-processing strategies are used, and greedy decoding is applied during evaluations without any post-processing for the generated content. - **Efforts to retrieve unreported scores**: for scores that were not reported by the original authors (including scores reported with different settings), we try to get results with our pipeline. - **Extensive model evaluation**: to evaluate the model’s capability extensively, we adopted the methodology outlined in Llama2. Specifically, we included PIQA, SIQA, HellaSwag, WinoGrande, ARC, OBQA, and CSQA to assess common sense reasoning. SquAD, QuAC, and BoolQ were incorporated to evaluate reading comprehension. - **Special configurations**: CSQA was exclusively tested using a 7-shot setup, while all other tests were conducted with a 0-shot configuration. Additionally, we introduced GSM8K (8-shot@1), MATH (4-shot@1), HumanEval (0-shot@1), and MBPP (3-shot@1) under the category \"Math & Code\". - **Falcon-180B caveat**: Falcon-180B was not tested on QuAC and OBQA due to technical constraints. Its performance score is an average from other tasks, and considering the generally lower scores of these two tasks, Falcon-180B's capabilities are likely not underestimated. </details> #### Yi-9B Yi-9B is almost the best among a range of similar-sized open-source models (including Mistral-7B, SOLAR-10.7B, Gemma-7B, DeepSeek-Coder-7B-Base-v1.5 and more), particularly excelling in code, math, common-sense reasoning, and reading comprehension. !Yi-9B benchmark - details - In terms of **overall** ability (Mean-All), Yi-9B performs the best among similarly sized open-source models, surpassing DeepSeek-Coder, DeepSeek-Math, Mistral-7B, SOLAR-10.7B, and Gemma-7B. !Yi-9B benchmark - overall - In terms of **coding** ability (Mean-Code), Yi-9B's performance is second only to DeepSeek-Coder-7B, surpassing Yi-34B, SOLAR-10.7B, Mistral-7B, and Gemma-7B. !Yi-9B benchmark - code - In terms of **math** ability (Mean-Math), Yi-9B's performance is second only to DeepSeek-Math-7B, surpassing SOLAR-10.7B, Mistral-7B, and Gemma-7B. !Yi-9B benchmark - math - In terms of **common sense and reasoning** ability (Mean-Text), Yi-9B's performance is on par with Mistral-7B, SOLAR-10.7B, and Gemma-7B. !Yi-9B benchmark - text <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> # Who can use Yi? Everyone! 🙌 ✅ The code and weights of the Yi series models are distributed under the Apache 2.0 license, which means the Yi series models are free for personal usage, academic purposes, and commercial use. <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> # Misc. ### Acknowledgments A heartfelt thank you to each of you who have made contributions to the Yi community! You have helped Yi not just a project, but a vibrant, growing home for innovation. ![yi contributors]( <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### Disclaimer We use data compliance checking algorithms during the training process, to ensure the compliance of the trained model to the best of our ability. Due to complex data and the diversity of language model usage scenarios, we cannot guarantee that the model will generate correct, and reasonable output in all scenarios. Please be aware that there is still a risk of the model producing problematic outputs. We will not be responsible for any risks and issues resulting from misuse, misguidance, illegal usage, and related misinformation, as well as any associated data security concerns. <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### License The code and weights of the Yi-1.5 series models are distributed under the Apache 2.0 license. If you create derivative works based on this model, please include the following attribution in your derivative works: This work is a derivative of [The Yi Series Model You Base On] by 01.AI, used under the Apache 2.0 License. <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p>",
20
- "model_explanation_gemini": "A bilingual, open-source large language model (Yi-6B) designed for text generation, excelling in language understanding, reasoning, and comprehension in both English and Chinese. \n\n**Features**: \n- Text generation \n- Bilingual (English/Chinese) support \n- Strong performance in reasoning and comprehension \n- Open-source (Apache-2.0 license) \n\n**Comparison**: \nOutperforms many open-source models (e.g., Falcon-180B, Llama-70B)"
 
 
 
 
 
 
21
  }
 
17
  "region:us"
18
  ],
19
  "description": "--- license: apache-2.0 widget: - example_title: \"Yi-34B-Chat\" text: \"hi\" output: text: \" Hello! How can I assist you today?\" - example_title: \"Yi-34B\" text: \"There's a place where time stands still. A place of breath taking wonder, but also\" output: text: \" an eerie sense that something is just not right…\\nBetween the two worlds lies The Forgotten Kingdom - home to creatures long since thought extinct and ancient magic so strong it defies belief! Only here can you find what has been lost for centuries: An Elixir Of Life which will restore youth and vitality if only those who seek its power are brave enough to face up against all manner of dangers lurking in this mysterious land! But beware; some say there may even exist powerful entities beyond our comprehension whose intentions towards humanity remain unclear at best ---- they might want nothing more than destruction itself rather then anything else from their quest after immortality (and maybe someone should tell them about modern medicine)? In any event though – one thing remains true regardless : whether or not success comes easy depends entirely upon how much effort we put into conquering whatever challenges lie ahead along with having faith deep down inside ourselves too ;) So let’s get started now shall We?\" pipeline_tag: text-generation --- <div align=\"center\"> <picture> <source media=\"(prefers-color-scheme: dark)\" srcset=\" width=\"200px\"> <source media=\"(prefers-color-scheme: light)\" srcset=\" width=\"200px\"> <img alt=\"specify theme context for images\" src=\" </picture> </br> </br> <div style=\"display: inline-block;\"> <a href=\" <img src=\" </a> </div> <div style=\"display: inline-block;\"> <a href=\"mailto:[email protected]\"> <img src=\" </a> </div> </div> <div align=\"center\"> <h3 align=\"center\">Building the Next Generation of Open-Source and Bilingual LLMs</h3> </div> <p align=\"center\"> 🤗 <a href=\" target=\"_blank\">Hugging Face</a> • 🤖 <a href=\" target=\"_blank\">ModelScope</a> • ✡️ <a href=\" target=\"_blank\">WiseModel</a> </p> <p align=\"center\"> 👩‍🚀 Ask questions or discuss ideas on <a href=\" target=\"_blank\"> GitHub </a> </p> <p align=\"center\"> 👋 Join us on <a href=\" target=\"_blank\"> 👾 Discord </a> or <a href=\"有官方的微信群嘛 · Issue #43 · 01-ai/Yi\" target=\"_blank\"> 💬 WeChat </a> </p> <p align=\"center\"> 📝 Check out <a href=\" Yi Tech Report </a> </p> <p align=\"center\"> 📚 Grow at <a href=\"#learning-hub\"> Yi Learning Hub </a> </p> <!-- DO NOT REMOVE ME --> <hr> <details open> <summary></b>📕 Table of Contents</b></summary> - What is Yi? - Introduction - Models - Chat models - Base models - Model info - News - How to use Yi? - Quick start - Choose your path - pip - docker - llama.cpp - conda-lock - Web demo - Fine-tuning - Quantization - Deployment - FAQ - Learning hub - Why Yi? - Ecosystem - Upstream - Downstream - Serving - Quantization - Fine-tuning - API - Benchmarks - Base model performance - Chat model performance - Tech report - Citation - Who can use Yi? - Misc. - Acknowledgements - Disclaimer - License </details> <hr> # What is Yi? ## Introduction - 🤖 The Yi series models are the next generation of open-source large language models trained from scratch by 01.AI. - 🙌 Targeted as a bilingual language model and trained on 3T multilingual corpus, the Yi series models become one of the strongest LLM worldwide, showing promise in language understanding, commonsense reasoning, reading comprehension, and more. For example, - Yi-34B-Chat model **landed in second place (following GPT-4 Turbo)**, outperforming other LLMs (such as GPT-4, Mixtral, Claude) on the AlpacaEval Leaderboard (based on data available up to January 2024). - Yi-34B model **ranked first among all existing open-source models** (such as Falcon-180B, Llama-70B, Claude) in **both English and Chinese** on various benchmarks, including Hugging Face Open LLM Leaderboard (pre-trained) and C-Eval (based on data available up to November 2023). - 🙏 (Credits to Llama) Thanks to the Transformer and Llama open-source communities, as they reduce the efforts required to build from scratch and enable the utilization of the same tools within the AI ecosystem. <details style=\"display: inline;\"><summary> If you're interested in Yi's adoption of Llama architecture and license usage policy, see <span style=\"color: green;\">Yi's relation with Llama.</span> ⬇️</summary> <ul> <br> > 💡 TL;DR > > The Yi series models adopt the same model architecture as Llama but are **NOT** derivatives of Llama. - Both Yi and Llama are based on the Transformer structure, which has been the standard architecture for large language models since 2018. - Grounded in the Transformer architecture, Llama has become a new cornerstone for the majority of state-of-the-art open-source models due to its excellent stability, reliable convergence, and robust compatibility. This positions Llama as the recognized foundational framework for models including Yi. - Thanks to the Transformer and Llama architectures, other models can leverage their power, reducing the effort required to build from scratch and enabling the utilization of the same tools within their ecosystems. - However, the Yi series models are NOT derivatives of Llama, as they do not use Llama's weights. - As Llama's structure is employed by the majority of open-source models, the key factors of determining model performance are training datasets, training pipelines, and training infrastructure. - Developing in a unique and proprietary way, Yi has independently created its own high-quality training datasets, efficient training pipelines, and robust training infrastructure entirely from the ground up. This effort has led to excellent performance with Yi series models ranking just behind GPT4 and surpassing Llama on the Alpaca Leaderboard in Dec 2023. </ul> </details> <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ## News <details> <summary>🔥 <b>2024-07-29</b>: The <a href=\" Cookbook 1.0 </a> is released, featuring tutorials and examples in both Chinese and English.</summary> </details> <details> <summary>🎯 <b>2024-05-13</b>: The <a href=\" series models </a> are open-sourced, further improving coding, math, reasoning, and instruction-following abilities.</summary> </details> <details> <summary>🎯 <b>2024-03-16</b>: The <code>Yi-9B-200K</code> is open-sourced and available to the public.</summary> </details> <details> <summary>🎯 <b>2024-03-08</b>: <a href=\" Tech Report</a> is published! </summary> </details> <details open> <summary>🔔 <b>2024-03-07</b>: The long text capability of the Yi-34B-200K has been enhanced. </summary> <br> In the \"Needle-in-a-Haystack\" test, the Yi-34B-200K's performance is improved by 10.5%, rising from 89.3% to an impressive 99.8%. We continue to pre-train the model on 5B tokens long-context data mixture and demonstrate a near-all-green performance. </details> <details open> <summary>🎯 <b>2024-03-06</b>: The <code>Yi-9B</code> is open-sourced and available to the public.</summary> <br> <code>Yi-9B</code> stands out as the top performer among a range of similar-sized open-source models (including Mistral-7B, SOLAR-10.7B, Gemma-7B, DeepSeek-Coder-7B-Base-v1.5 and more), particularly excelling in code, math, common-sense reasoning, and reading comprehension. </details> <details open> <summary>🎯 <b>2024-01-23</b>: The Yi-VL models, <code><a href=\" and <code><a href=\" are open-sourced and available to the public.</summary> <br> <code><a href=\" has ranked <strong>first</strong> among all existing open-source models in the latest benchmarks, including <a href=\" and <a href=\" (based on data available up to January 2024).</li> </details> <details> <summary>🎯 <b>2023-11-23</b>: <a href=\"#chat-models\">Chat models</a> are open-sourced and available to the public.</summary> <br>This release contains two chat models based on previously released base models, two 8-bit models quantized by GPTQ, and two 4-bit models quantized by AWQ. - - - - - - You can try some of them interactively at: - Hugging Face - Replicate </details> <details> <summary>🔔 <b>2023-11-23</b>: The Yi Series Models Community License Agreement is updated to <a href=\" </details> <details> <summary>🔥 <b>2023-11-08</b>: Invited test of Yi-34B chat model.</summary> <br>Application form: - English - Chinese </details> <details> <summary>🎯 <b>2023-11-05</b>: <a href=\"#base-models\">The base models, </a><code>Yi-6B-200K</code> and <code>Yi-34B-200K</code>, are open-sourced and available to the public.</summary> <br>This release contains two base models with the same parameter sizes as the previous release, except that the context window is extended to 200K. </details> <details> <summary>🎯 <b>2023-11-02</b>: <a href=\"#base-models\">The base models, </a><code>Yi-6B</code> and <code>Yi-34B</code>, are open-sourced and available to the public.</summary> <br>The first public release contains two bilingual (English/Chinese) base models with the parameter sizes of 6B and 34B. Both of them are trained with 4K sequence length and can be extended to 32K during inference time. </details> <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ## Models Yi models come in multiple sizes and cater to different use cases. You can also fine-tune Yi models to meet your specific requirements. If you want to deploy Yi models, make sure you meet the software and hardware requirements. ### Chat models | Model | Download | |---|---| |Yi-34B-Chat | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | |Yi-34B-Chat-4bits | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | |Yi-34B-Chat-8bits | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | |Yi-6B-Chat| • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | |Yi-6B-Chat-4bits | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | |Yi-6B-Chat-8bits | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | <sub><sup> - 4-bit series models are quantized by AWQ. <br> - 8-bit series models are quantized by GPTQ <br> - All quantized models have a low barrier to use since they can be deployed on consumer-grade GPUs (e.g., 3090, 4090). </sup></sub> ### Base models | Model | Download | |---|---| |Yi-34B| • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | |Yi-34B-200K|• 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel| |Yi-9B|• 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel| |Yi-9B-200K | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | |Yi-6B| • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | |Yi-6B-200K | • 🤗 Hugging Face • 🤖 ModelScope • 🟣 wisemodel | <sub><sup> - 200k is roughly equivalent to 400,000 Chinese characters. <br> - If you want to use the previous version of the Yi-34B-200K (released on Nov 5, 2023), run to download the weight. </sup></sub> ### Model info - For chat and base models <table> <thead> <tr> <th>Model</th> <th>Intro</th> <th>Default context window</th> <th>Pretrained tokens</th> <th>Training Data Date</th> </tr> </thead> <tbody><tr> <td>6B series models</td> <td>They are suitable for personal and academic use.</td> <td rowspan=\"3\">4K</td> <td>3T</td> <td rowspan=\"3\">Up to June 2023</td> </tr> <tr> <td>9B series models</td> <td>It is the best at coding and math in the Yi series models.</td> <td>Yi-9B is continuously trained based on Yi-6B, using 0.8T tokens.</td> </tr> <tr> <td>34B series models</td> <td>They are suitable for personal, academic, and commercial (particularly for small and medium-sized enterprises) purposes. It&#39;s a cost-effective solution that&#39;s affordable and equipped with emergent ability.</td> <td>3T</td> </tr> </tbody></table> - For chat models <details style=\"display: inline;\"><summary>For chat model limitations, see the explanations below. ⬇️</summary> <ul> <br>The released chat model has undergone exclusive training using Supervised Fine-Tuning (SFT). Compared to other standard chat models, our model produces more diverse responses, making it suitable for various downstream tasks, such as creative scenarios. Furthermore, this diversity is expected to enhance the likelihood of generating higher quality responses, which will be advantageous for subsequent Reinforcement Learning (RL) training. <br>However, this higher diversity might amplify certain existing issues, including: <li>Hallucination: This refers to the model generating factually incorrect or nonsensical information. With the model's responses being more varied, there's a higher chance of hallucination that are not based on accurate data or logical reasoning.</li> <li>Non-determinism in re-generation: When attempting to regenerate or sample responses, inconsistencies in the outcomes may occur. The increased diversity can lead to varying results even under similar input conditions.</li> <li>Cumulative Error: This occurs when errors in the model's responses compound over time. As the model generates more diverse responses, the likelihood of small inaccuracies building up into larger errors increases, especially in complex tasks like extended reasoning, mathematical problem-solving, etc.</li> <li>To achieve more coherent and consistent responses, it is advisable to adjust generation configuration parameters such as temperature, top_p, or top_k. These adjustments can help in the balance between creativity and coherence in the model's outputs.</li> </ul> </details> <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> # How to use Yi? - Quick start - Choose your path - pip - docker - conda-lock - llama.cpp - Web demo - Fine-tuning - Quantization - Deployment - FAQ - Learning hub ## Quick start > **💡 Tip**: If you want to get started with the Yi model and explore different methods for inference, check out the Yi Cookbook. ### Choose your path Select one of the following paths to begin your journey with Yi! !Quick start - Choose your path #### 🎯 Deploy Yi locally If you prefer to deploy Yi models locally, - 🙋‍♀️ and you have **sufficient** resources (for example, NVIDIA A800 80GB), you can choose one of the following methods: - pip - Docker - conda-lock - 🙋‍♀️ and you have **limited** resources (for example, a MacBook Pro), you can use llama.cpp. #### 🎯 Not to deploy Yi locally If you prefer not to deploy Yi models locally, you can explore Yi's capabilities using any of the following options. ##### 🙋‍♀️ Run Yi with APIs If you want to explore more features of Yi, you can adopt one of these methods: - Yi APIs (Yi official) - Early access has been granted to some applicants. Stay tuned for the next round of access! - Yi APIs (Replicate) ##### 🙋‍♀️ Run Yi in playground If you want to chat with Yi with more customizable options (e.g., system prompt, temperature, repetition penalty, etc.), you can try one of the following options: - Yi-34B-Chat-Playground (Yi official) - Access is available through a whitelist. Welcome to apply (fill out a form in English or Chinese). - Yi-34B-Chat-Playground (Replicate) ##### 🙋‍♀️ Chat with Yi If you want to chat with Yi, you can use one of these online services, which offer a similar user experience: - Yi-34B-Chat (Yi official on Hugging Face) - No registration is required. - Yi-34B-Chat (Yi official beta) - Access is available through a whitelist. Welcome to apply (fill out a form in English or Chinese). <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### Quick start - pip This tutorial guides you through every step of running **Yi-34B-Chat locally on an A800 (80G)** and then performing inference. #### Step 0: Prerequisites - Make sure Python 3.10 or a later version is installed. - If you want to run other Yi models, see software and hardware requirements. #### Step 1: Prepare your environment To set up the environment and install the required packages, execute the following command. #### Step 2: Download the Yi model You can download the weights and tokenizer of Yi models from the following sources: - Hugging Face - ModelScope - WiseModel #### Step 3: Perform inference You can perform inference with Yi chat or base models as below. ##### Perform inference with Yi chat model 1. Create a file named and copy the following content to it. 2. Run . Then you can see an output similar to the one below. 🥳 ##### Perform inference with Yi base model - Yi-34B The steps are similar to pip - Perform inference with Yi chat model. You can use the existing file []( Then you can see an output similar to the one below. 🥳 <details> <summary>Output. ⬇️ </summary> <br> **Prompt**: Let me tell you an interesting story about cat Tom and mouse Jerry, **Generation**: Let me tell you an interesting story about cat Tom and mouse Jerry, which happened in my childhood. My father had a big house with two cats living inside it to kill mice. One day when I was playing at home alone, I found one of the tomcats lying on his back near our kitchen door, looking very much like he wanted something from us but couldn’t get up because there were too many people around him! He kept trying for several minutes before finally giving up... </details> - Yi-9B Input Output <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### Quick start - Docker <details> <summary> Run Yi-34B-chat locally with Docker: a step-by-step guide. ⬇️</summary> <br>This tutorial guides you through every step of running <strong>Yi-34B-Chat on an A800 GPU</strong> or <strong>4*4090</strong> locally and then performing inference. <h4>Step 0: Prerequisites</h4> <p>Make sure you've installed <a href=\" and <a href=\" <h4> Step 1: Start Docker </h4> <pre><code>docker run -it --gpus all \\ -v &lt;your-model-path&gt;: /models ghcr.io/01-ai/yi:latest </code></pre> <p>Alternatively, you can pull the Yi Docker image from <code>registry.lingyiwanwu.com/ci/01-ai/yi:latest</code>.</p> <h4>Step 2: Perform inference</h4> <p>You can perform inference with Yi chat or base models as below.</p> <h5>Perform inference with Yi chat model</h5> <p>The steps are similar to <a href=\"#perform-inference-with-yi-chat-model\">pip - Perform inference with Yi chat model</a>.</p> <p><strong>Note</strong> that the only difference is to set <code>model_path = '&lt;your-model-mount-path&gt;'</code> instead of <code>model_path = '&lt;your-model-path&gt;'</code>.</p> <h5>Perform inference with Yi base model</h5> <p>The steps are similar to <a href=\"#perform-inference-with-yi-base-model\">pip - Perform inference with Yi base model</a>.</p> <p><strong>Note</strong> that the only difference is to set <code>--model &lt;your-model-mount-path&gt;'</code> instead of <code>model &lt;your-model-path&gt;</code>.</p> </details> ### Quick start - conda-lock <details> <summary>You can use <code><a href=\" to generate fully reproducible lock files for conda environments. ⬇️</summary> <br> You can refer to <a href=\" for the exact versions of the dependencies. Additionally, you can utilize <code><a href=\" for installing these dependencies. <br> To install the dependencies, follow these steps: 1. Install micromamba by following the instructions available <a href=\" 2. Execute <code>micromamba install -y -n yi -f conda-lock.yml</code> to create a conda environment named <code>yi</code> and install the necessary dependencies. </details> ### Quick start - llama.cpp <a href=\" following tutorial </a> will guide you through every step of running a quantized model (<a href=\" locally and then performing inference. <details> <summary> Run Yi-chat-6B-2bits locally with llama.cpp: a step-by-step guide. ⬇️</summary> <br><a href=\" tutorial</a> guides you through every step of running a quantized model (<a href=\" locally and then performing inference.</p> - Step 0: Prerequisites - Step 1: Download llama.cpp - Step 2: Download Yi model - Step 3: Perform inference #### Step 0: Prerequisites - This tutorial assumes you use a MacBook Pro with 16GB of memory and an Apple M2 Pro chip. - Make sure []( is installed on your machine. #### Step 1: Download To clone the []( repository, run the following command. #### Step 2: Download Yi model 2.1 To clone XeIaso/yi-chat-6B-GGUF with just pointers, run the following command. 2.2 To download a quantized Yi model (yi-chat-6b.Q2_K.gguf), run the following command. #### Step 3: Perform inference To perform inference with the Yi model, you can use one of the following methods. - Method 1: Perform inference in terminal - Method 2: Perform inference in web ##### Method 1: Perform inference in terminal To compile using 4 threads and then conduct inference, navigate to the directory, and run the following command. > ##### Tips > > - Replace with the actual path of your model. > > - By default, the model operates in completion mode. > > - For additional output customization options (for example, system prompt, temperature, repetition penalty, etc.), run to check detailed descriptions and usage. Now you have successfully asked a question to the Yi model and got an answer! 🥳 ##### Method 2: Perform inference in web 1. To initialize a lightweight and swift chatbot, run the following command. Then you can get an output like this: 2. To access the chatbot interface, open your web browser and enter into the address bar. !Yi model chatbot interface - llama.cpp 3. Enter a question, such as \"How do you feed your pet fox? Please answer this question in 6 simple steps\" into the prompt window, and you will receive a corresponding answer. !Ask a question to Yi model - llama.cpp </ul> </details> <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### Web demo You can build a web UI demo for Yi **chat** models (note that Yi base models are not supported in this senario). Step 1: Prepare your environment. Step 2: Download the Yi model. Step 3. To start a web service locally, run the following command. You can access the web UI by entering the address provided in the console into your browser. !Quick start - web demo <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### Fine-tuning Once finished, you can compare the finetuned model and the base model with the following command: <details style=\"display: inline;\"><summary>For advanced usage (like fine-tuning based on your custom data), see the explanations below. ⬇️ </summary> <ul> ### Finetune code for Yi 6B and 34B #### Preparation ##### From Image By default, we use a small dataset from BAAI/COIG to finetune the base model. You can also prepare your customized dataset in the following format: And then mount them in the container to replace the default ones: ##### From Local Server Make sure you have conda. If not, use Then, create a conda env: #### Hardware Setup For the Yi-6B model, a node with 4 GPUs, each with GPU memory larger than 60GB, is recommended. For the Yi-34B model, because the usage of the zero-offload technique consumes a lot of CPU memory, please be careful to limit the number of GPUs in the 34B finetune training. Please use CUDA_VISIBLE_DEVICES to limit the number of GPUs (as shown in scripts/run_sft_Yi_34b.sh). A typical hardware setup for finetuning the 34B model is a node with 8 GPUs (limited to 4 in running by CUDA_VISIBLE_DEVICES=0,1,2,3), each with GPU memory larger than 80GB, and total CPU memory larger than 900GB. #### Quick Start Download a LLM-base model to MODEL_PATH (6B and 34B). A typical folder of models is like: Download a dataset from huggingface to local storage DATA_PATH, e.g. Dahoas/rm-static. has example datasets, which are modified from BAAI/COIG into the scripts folder, copy and paste the script, and run. For example: For the Yi-6B base model, setting training_debug_steps=20 and num_train_epochs=4 can output a chat model, which takes about 20 minutes. For the Yi-34B base model, it takes a relatively long time for initialization. Please be patient. #### Evaluation Then you'll see the answer from both the base model and the finetuned model. </ul> </details> <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### Quantization #### GPT-Q Once finished, you can then evaluate the resulting model as follows: <details style=\"display: inline;\"><summary>For details, see the explanations below. ⬇️</summary> <ul> #### GPT-Q quantization GPT-Q is a PTQ (Post-Training Quantization) method. It saves memory and provides potential speedups while retaining the accuracy of the model. Yi models can be GPT-Q quantized without a lot of efforts. We provide a step-by-step tutorial below. To run GPT-Q, we will use AutoGPTQ and exllama. And the huggingface transformers has integrated optimum and auto-gptq to perform GPTQ quantization on language models. ##### Do Quantization The script is provided for you to perform GPT-Q quantization: ##### Run Quantized Model You can run a quantized model using the : </ul> </details> #### AWQ Once finished, you can then evaluate the resulting model as follows: <details style=\"display: inline;\"><summary>For details, see the explanations below. ⬇️</summary> <ul> #### AWQ quantization AWQ is a PTQ (Post-Training Quantization) method. It's an efficient and accurate low-bit weight quantization (INT3/4) for LLMs. Yi models can be AWQ quantized without a lot of efforts. We provide a step-by-step tutorial below. To run AWQ, we will use AutoAWQ. ##### Do Quantization The script is provided for you to perform AWQ quantization: ##### Run Quantized Model You can run a quantized model using the : </ul> </details> <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### Deployment If you want to deploy Yi models, make sure you meet the software and hardware requirements. #### Software requirements Before using Yi quantized models, make sure you've installed the correct software listed below. | Model | Software |---|--- Yi 4-bit quantized models | AWQ and CUDA Yi 8-bit quantized models | GPTQ and CUDA #### Hardware requirements Before deploying Yi in your environment, make sure your hardware meets the following requirements. ##### Chat models | Model | Minimum VRAM | Recommended GPU Example | |:----------------------|:--------------|:-------------------------------------:| | Yi-6B-Chat | 15 GB | 1 x RTX 3090 (24 GB) <br> 1 x RTX 4090 (24 GB) <br> 1 x A10 (24 GB) <br> 1 x A30 (24 GB) | | Yi-6B-Chat-4bits | 4 GB | 1 x RTX 3060 (12 GB)<br> 1 x RTX 4060 (8 GB) | | Yi-6B-Chat-8bits | 8 GB | 1 x RTX 3070 (8 GB) <br> 1 x RTX 4060 (8 GB) | | Yi-34B-Chat | 72 GB | 4 x RTX 4090 (24 GB)<br> 1 x A800 (80GB) | | Yi-34B-Chat-4bits | 20 GB | 1 x RTX 3090 (24 GB) <br> 1 x RTX 4090 (24 GB) <br> 1 x A10 (24 GB) <br> 1 x A30 (24 GB) <br> 1 x A100 (40 GB) | | Yi-34B-Chat-8bits | 38 GB | 2 x RTX 3090 (24 GB) <br> 2 x RTX 4090 (24 GB)<br> 1 x A800 (40 GB) | Below are detailed minimum VRAM requirements under different batch use cases. | Model | batch=1 | batch=4 | batch=16 | batch=32 | | ----------------------- | ------- | ------- | -------- | -------- | | Yi-6B-Chat | 12 GB | 13 GB | 15 GB | 18 GB | | Yi-6B-Chat-4bits | 4 GB | 5 GB | 7 GB | 10 GB | | Yi-6B-Chat-8bits | 7 GB | 8 GB | 10 GB | 14 GB | | Yi-34B-Chat | 65 GB | 68 GB | 76 GB | > 80 GB | | Yi-34B-Chat-4bits | 19 GB | 20 GB | 30 GB | 40 GB | | Yi-34B-Chat-8bits | 35 GB | 37 GB | 46 GB | 58 GB | ##### Base models | Model | Minimum VRAM | Recommended GPU Example | |----------------------|--------------|:-------------------------------------:| | Yi-6B | 15 GB | 1 x RTX 3090 (24 GB) <br> 1 x RTX 4090 (24 GB) <br> 1 x A10 (24 GB) <br> 1 x A30 (24 GB) | | Yi-6B-200K | 50 GB | 1 x A800 (80 GB) | | Yi-9B | 20 GB | 1 x RTX 4090 (24 GB) | | Yi-34B | 72 GB | 4 x RTX 4090 (24 GB) <br> 1 x A800 (80 GB) | | Yi-34B-200K | 200 GB | 4 x A800 (80 GB) | <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### FAQ <details> <summary> If you have any questions while using the Yi series models, the answers provided below could serve as a helpful reference for you. ⬇️</summary> <br> #### 💡Fine-tuning - <strong>Base model or Chat model - which to fine-tune?</strong> <br>The choice of pre-trained language model for fine-tuning hinges on the computational resources you have at your disposal and the particular demands of your task. - If you are working with a substantial volume of fine-tuning data (say, over 10,000 samples), the Base model could be your go-to choice. - On the other hand, if your fine-tuning data is not quite as extensive, opting for the Chat model might be a more fitting choice. - It is generally advisable to fine-tune both the Base and Chat models, compare their performance, and then pick the model that best aligns with your specific requirements. - <strong>Yi-34B versus Yi-34B-Chat for full-scale fine-tuning - what is the difference?</strong> <br> The key distinction between full-scale fine-tuning on and comes down to the fine-tuning approach and outcomes. - Yi-34B-Chat employs a Special Fine-Tuning (SFT) method, resulting in responses that mirror human conversation style more closely. - The Base model's fine-tuning is more versatile, with a relatively high performance potential. - If you are confident in the quality of your data, fine-tuning with could be your go-to. - If you are aiming for model-generated responses that better mimic human conversational style, or if you have doubts about your data quality, might be your best bet. #### 💡Quantization - <strong>Quantized model versus original model - what is the performance gap?</strong> - The performance variance is largely contingent on the quantization method employed and the specific use cases of these models. For instance, when it comes to models provided by the AWQ official, from a Benchmark standpoint, quantization might result in a minor performance drop of a few percentage points. - Subjectively speaking, in situations like logical reasoning, even a 1% performance shift could impact the accuracy of the output results. #### 💡General - <strong>Where can I source fine-tuning question answering datasets?</strong> - You can find fine-tuning question answering datasets on platforms like Hugging Face, with datasets like m-a-p/COIG-CQIA readily available. - Additionally, Github offers fine-tuning frameworks, such as hiyouga/LLaMA-Factory, which integrates pre-made datasets. - <strong>What is the GPU memory requirement for fine-tuning Yi-34B FP16?</strong> <br> The GPU memory needed for fine-tuning 34B FP16 hinges on the specific fine-tuning method employed. For full parameter fine-tuning, you'll need 8 GPUs each with 80 GB; however, more economical solutions like Lora require less. For more details, check out hiyouga/LLaMA-Factory. Also, consider using BF16 instead of FP16 for fine-tuning to optimize performance. - <strong>Are there any third-party platforms that support chat functionality for the Yi-34b-200k model?</strong> <br> If you're looking for third-party Chats, options include fireworks.ai. </details> ### Learning hub <details> <summary> If you want to learn Yi, you can find a wealth of helpful educational resources here. ⬇️</summary> <br> Welcome to the Yi learning hub! Whether you're a seasoned developer or a newcomer, you can find a wealth of helpful educational resources to enhance your understanding and skills with Yi models, including insightful blog posts, comprehensive video tutorials, hands-on guides, and more. The content you find here has been generously contributed by knowledgeable Yi experts and passionate enthusiasts. We extend our heartfelt gratitude for your invaluable contributions! At the same time, we also warmly invite you to join our collaborative effort by contributing to Yi. If you have already made contributions to Yi, please don't hesitate to showcase your remarkable work in the table below. With all these resources at your fingertips, you're ready to start your exciting journey with Yi. Happy learning! 🥳 #### Tutorials ##### Blog tutorials | Deliverable | Date | Author | | ------------------------------------------------------------ | ---------- | ------------------------------------------------------------ | | 使用 Dify、Meilisearch、零一万物模型实现最简单的 RAG 应用(三):AI 电影推荐 | 2024-05-20 | 苏洋 | | 使用autodl服务器,在A40显卡上运行, Yi-34B-Chat-int4模型,并使用vllm优化加速,显存占用42G,速度18 words-s | 2024-05-20 | fly-iot | | Yi-VL 最佳实践 | 2024-05-20 | ModelScope | | 一键运行零一万物新鲜出炉Yi-1.5-9B-Chat大模型 | 2024-05-13 | Second State | | 零一万物开源Yi-1.5系列大模型 | 2024-05-13 | 刘聪 | | 零一万物Yi-1.5系列模型发布并开源! 34B-9B-6B 多尺寸,魔搭社区推理微调最佳实践教程来啦! | 2024-05-13 | ModelScope | | Yi-34B 本地部署简单测试 | 2024-05-13 | 漆妮妮 | | 驾辰龙跨Llama持Wasm,玩转Yi模型迎新春过大年(上) | 2024-05-13 | Words worth | | 驾辰龙跨Llama持Wasm,玩转Yi模型迎新春过大年(下篇) | 2024-05-13 | Words worth | | Ollama新增两个命令,开始支持零一万物Yi-1.5系列模型 | 2024-05-13 | AI工程师笔记 | | 使用零一万物 200K 模型和 Dify 快速搭建模型应用 | 2024-05-13 | 苏洋 | | (持更) 零一万物模型折腾笔记:社区 Yi-34B 微调模型使用 | 2024-05-13 | 苏洋 | | Python+ERNIE-4.0-8K-Yi-34B-Chat大模型初探 | 2024-05-11 | 江湖评谈 | | 技术布道 Vue及Python调用零一万物模型和Prompt模板(通过百度千帆大模型平台) | 2024-05-11 | MumuLab | | 多模态大模型Yi-VL-plus体验 效果很棒 | 2024-04-27 | 大家好我是爱因 | | 使用autodl服务器,两个3090显卡上运行, Yi-34B-Chat-int4模型,并使用vllm优化加速,显存占用42G,速度23 words-s | 2024-04-27 | fly-iot | | Getting Started with Yi-1.5-9B-Chat | 2024-04-27 | Second State | | 基于零一万物yi-vl-plus大模型简单几步就能批量生成Anki图片笔记 | 2024-04-24 | 正经人王同学 | | 【AI开发:语言】一、Yi-34B超大模型本地部署CPU和GPU版 | 2024-04-21 | My的梦想已实现 | | 【Yi-34B-Chat-Int4】使用4个2080Ti显卡11G版本,运行Yi-34B模型,5年前老显卡是支持的,可以正常运行,速度 21 words-s,vllm要求算力在7以上的显卡就可以 | 2024-03-22 | fly-iot | | 零一万物大模型部署+微调总结 | 2024-03-22 | v_wus | | 零一万物Yi大模型vllm推理时Yi-34B或Yi-6bchat重复输出的解决方案 | 2024-03-02 | 郝铠锋 | | Yi-34B微调训练 | 2024-03-02 | lsjlnd | | 实测零一万物Yi-VL多模态语言模型:能准确“识图吃瓜” | 2024-02-02 | 苏洋 | | 零一万物开源Yi-VL多模态大模型,魔搭社区推理&微调最佳实践来啦! | 2024-01-26 | ModelScope | | 单卡 3 小时训练 Yi-6B 大模型 Agent:基于 Llama Factory 实战 | 2024-01-22 | 郑耀威 | | 零一科技Yi-34B Chat大模型环境搭建&推理 | 2024-01-15 | 要养家的程序员 | | 基于LLaMA Factory,单卡3小时训练专属大模型 Agent | 2024-01-15 | 机器学习社区 | | 双卡 3080ti 部署 Yi-34B 大模型 - Gradio + vLLM 踩坑全记录 | 2024-01-02 | 漆妮妮 | | 【大模型部署实践-3】3个能在3090上跑起来的4bits量化Chat模型(baichuan2-13b、InternLM-20b、Yi-34b) | 2024-01-02 | aq_Seabiscuit | | 只需 24G 显存,用 vllm 跑起来 Yi-34B 中英双语大模型 | 2023-12-28 | 漆妮妮 | | 零一万物模型官方 Yi-34B 模型本地离线运行部署使用笔记(物理机和docker两种部署方式),200K 超长文本内容,34B 干翻一众 70B 模型,打榜分数那么高,这模型到底行不行? | 2023-12-28 | 代码讲故事 | | LLM - 大模型速递之 Yi-34B 入门与 LoRA 微调 | 2023-12-18 | BIT_666 | | 通过vllm框架进行大模型推理 | 2023-12-18 | 土山炮 | | CPU 混合推理,非常见大模型量化方案:“二三五六” 位量化方案 | 2023-12-12 | 苏洋 | | 零一万物模型折腾笔记:官方 Yi-34B 模型基础使用 | 2023-12-10 | 苏洋 | | Running Yi-34B-Chat locally using LlamaEdge | 2023-11-30 | Second State | | 本地运行零一万物 34B 大模型,使用 Llama.cpp & 21G 显存 | 2023-11-26 | 苏洋 | ##### GitHub Project | Deliverable | Date | Author | | ------------------------------------------------------------ | ---------- | ------------------------------------------- | | yi-openai-proxy | 2024-05-11 | 苏洋 | | 基于零一万物 Yi 模型和 B 站构建大语言模型高质量训练数据集 | 2024-04-29 | 正经人王同学 | | 基于视频网站和零一万物大模型构建大语言模型高质量训练数据集 | 2024-04-25 | 正经人王同学 | | 基于零一万物yi-34b-chat-200k输入任意文章地址,点击按钮即可生成无广告或推广内容的简要笔记,并生成分享图给好友 | 2024-04-24 | 正经人王同学 | | Food-GPT-Yi-model | 2024-04-21 | Hubert S | ##### Video tutorials | Deliverable | Date | Author | | ------------------------------------------------------------ | ---------- | ------------------------------------------------------------ | | Run dolphin-2.2-yi-34b on IoT Devices | 2023-11-30 | Second State | | 只需 24G 显存,用 vllm 跑起来 Yi-34B 中英双语大模型 | 2023-12-28 | 漆妮妮 | | Install Yi 34B Locally - Chinese English Bilingual LLM | 2023-11-05 | Fahd Mirza | | Dolphin Yi 34b - Brand New Foundational Model TESTED | 2023-11-27 | Matthew Berman | | Yi-VL-34B 多模态大模型 - 用两张 A40 显卡跑起来 | 2024-01-28 | 漆妮妮 | | 4060Ti 16G显卡安装零一万物最新开源的Yi-1.5版大语言模型 | 2024-05-14 | titan909 | | Yi-1.5: True Apache 2.0 Competitor to LLAMA-3 | 2024-05-13 | Prompt Engineering | | Install Yi-1.5 Model Locally - Beats Llama 3 in Various Benchmarks | 2024-05-13 | Fahd Mirza | | how to install Ollama and run Yi 6B | 2024-05-13 | Ridaa Davids | | 地表最强混合智能AI助手:llama3_70B+Yi_34B+Qwen1.5_110B | 2024-05-04 | 朱扎特 | | ChatDoc学术论文辅助--基于Yi-34B和langchain进行PDF知识库问答 | 2024-05-03 | 朱扎特 | | 基于Yi-34B的领域知识问答项目演示 | 2024-05-02 | 朱扎特 | | 使用RTX4090+GaLore算法 全参微调Yi-6B大模型 | 2024-03-24 | 小工蚂创始人 | | 无内容审查NSFW大语言模型Yi-34B-Chat蒸馏版测试,RolePlay,《天龙八部》马夫人康敏,本地GPU,CPU运行 | 2024-03-20 | 刘悦的技术博客 | | 无内容审查NSFW大语言模型整合包,Yi-34B-Chat,本地CPU运行,角色扮演潘金莲 | 2024-03-16 | 刘悦的技术博客 | | 量化 Yi-34B-Chat 并在单卡 RTX 4090 使用 vLLM 部署 | 2024-03-05 | 白鸽巢 | | Yi-VL-34B(5):使用3个3090显卡24G版本,运行Yi-VL-34B模型,支持命令行和web界面方式,理解图片的内容转换成文字 | 2024-02-27 | fly-iot | | Win环境KoboldCpp本地部署大语言模型进行各种角色扮演游戏 | 2024-02-25 | 魚蟲蟲 | | 无需显卡本地部署Yi-34B-Chat进行角色扮演游戏 P2 | 2024-02-23 | 魚蟲蟲 | | 【wails】(2):使用go-llama.cpp 运行 yi-01-6b大模型,使用本地CPU运行,速度还可以,等待下一版本更新 | 2024-02-20 | fly-iot | | 【xinference】(6):在autodl上,使用xinference部署yi-vl-chat和qwen-vl-chat模型,可以使用openai调用成功 | 2024-02-06 | fly-iot | | 无需显卡本地部署Yi-34B-Chat进行角色扮演游戏 P1 | 2024-02-05 | 魚蟲蟲 | | 2080Ti部署YI-34B大模型 xinference-oneapi-fastGPT本地知识库使用指南 | 2024-01-30 | 小饭护法要转码 | | Best Story Writing AI Model - Install Yi 6B 200K Locally on Windows | 2024-01-22 | Fahd Mirza | | Mac 本地运行大语言模型方法与常见问题指南(Yi 34B 模型+32 GB 内存测试) | 2024-01-21 | 小吴苹果机器人 | | 【Dify知识库】(11):Dify0.4.9改造支持MySQL,成功接入yi-6b 做对话,本地使用fastchat启动,占8G显存,完成知识库配置 | 2024-01-21 | fly-iot | | 这位LLM先生有点暴躁,用的是YI-6B的某个量化版,#LLM #大语言模型 #暴躁老哥 | 2024-01-20 | 晓漫吧 | | 大模型推理 NvLink 桥接器有用吗|双卡 A6000 测试一下 | 2024-01-17 | 漆妮妮 | | 大模型推理 A40 vs A6000 谁更强 - 对比 Yi-34B 的单、双卡推理性能 | 2024-01-15 | 漆妮妮 | | C-Eval 大语言模型评测基准- 用 LM Evaluation Harness + vLLM 跑起来 | 2024-01-11 | 漆妮妮 | | 双显卡部署 Yi-34B 大模型 - vLLM + Gradio 踩坑记录 | 2024-01-01 | 漆妮妮 | | 手把手教学!使用 vLLM 快速部署 Yi-34B-Chat | 2023-12-26 | 白鸽巢 | | 如何训练企业自己的大语言模型?Yi-6B LORA微调演示 #小工蚁 | 2023-12-21 | 小工蚂创始人 | | Yi-34B(4):使用4个2080Ti显卡11G版本,运行Yi-34B模型,5年前老显卡是支持的,可以正常运行,速度 21 words/s | 2023-12-02 | fly-iot | | 使用autodl服务器,RTX 3090 * 3 显卡上运行, Yi-34B-Chat模型,显存占用60G | 2023-12-01 | fly-iot | | 使用autodl服务器,两个3090显卡上运行, Yi-34B-Chat-int4模型,用vllm优化,增加 --num-gpu 2,速度23 words/s | 2023-12-01 | fly-iot | | Yi大模型一键本地部署 技术小白玩转AI | 2023-12-01 | 技术小白玩转AI | | 01.AI's Yi-6B: Overview and Fine-Tuning | 2023-11-28 | AI Makerspace | | Yi 34B Chat LLM outperforms Llama 70B | 2023-11-27 | DLExplorer | | How to run open source models on mac Yi 34b on m3 Max | 2023-11-26 | TECHNO PREMIUM | | Yi-34B - 200K - The BEST & NEW CONTEXT WINDOW KING | 2023-11-24 | Prompt Engineering | | Yi 34B : The Rise of Powerful Mid-Sized Models - Base,200k & Chat | 2023-11-24 | Sam Witteveen | | 在IoT设备运行破解版李开复大模型dolphin-2.2-yi-34b(还可作为私有OpenAI API服务器) | 2023-11-15 | Second State | | Run dolphin-2.2-yi-34b on IoT Devices (Also works as a Private OpenAI API Server) | 2023-11-14 | Second State | | How to Install Yi 34B 200K Llamafied on Windows Laptop | 2023-11-11 | Fahd Mirza | </details> # Why Yi? - Ecosystem - Upstream - Downstream - Serving - Quantization - Fine-tuning - API - Benchmarks - Chat model performance - Base model performance - Yi-34B and Yi-34B-200K - Yi-9B ## Ecosystem Yi has a comprehensive ecosystem, offering a range of tools, services, and models to enrich your experiences and maximize productivity. - Upstream - Downstream - Serving - Quantization - Fine-tuning - API ### Upstream The Yi series models follow the same model architecture as Llama. By choosing Yi, you can leverage existing tools, libraries, and resources within the Llama ecosystem, eliminating the need to create new tools and enhancing development efficiency. For example, the Yi series models are saved in the format of the Llama model. You can directly use and to load the model. For more information, see Use the chat model. <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### Downstream > 💡 Tip > > - Feel free to create a PR and share the fantastic work you've built using the Yi series models. > > - To help others quickly understand your work, it is recommended to use the format of . #### Serving If you want to get up with Yi in a few minutes, you can use the following services built upon Yi. - Yi-34B-Chat: you can chat with Yi using one of the following platforms: - Yi-34B-Chat | Hugging Face - Yi-34B-Chat | Yi Platform: **Note** that currently it's available through a whitelist. Welcome to apply (fill out a form in English or Chinese) and experience it firsthand! - Yi-6B-Chat (Replicate): you can use this model with more options by setting additional parameters and calling APIs. - ScaleLLM: you can use this service to run Yi models locally with added flexibility and customization. #### Quantization If you have limited computational capabilities, you can use Yi's quantized models as follows. These quantized models have reduced precision but offer increased efficiency, such as faster inference speed and smaller RAM usage. - TheBloke/Yi-34B-GPTQ - TheBloke/Yi-34B-GGUF - TheBloke/Yi-34B-AWQ #### Fine-tuning If you're seeking to explore the diverse capabilities within Yi's thriving family, you can delve into Yi's fine-tuned models as below. - TheBloke Models: this site hosts numerous fine-tuned models derived from various LLMs including Yi. This is not an exhaustive list for Yi, but to name a few sorted on downloads: - TheBloke/dolphin-2_2-yi-34b-AWQ - TheBloke/Yi-34B-Chat-AWQ - TheBloke/Yi-34B-Chat-GPTQ - SUSTech/SUS-Chat-34B: this model ranked first among all models below 70B and outperformed the twice larger deepseek-llm-67b-chat. You can check the result on the Open LLM Leaderboard. - OrionStarAI/OrionStar-Yi-34B-Chat-Llama: this model excelled beyond other models (such as GPT-4, Qwen-14B-Chat, Baichuan2-13B-Chat) in C-Eval and CMMLU evaluations on the OpenCompass LLM Leaderboard. - NousResearch/Nous-Capybara-34B: this model is trained with 200K context length and 3 epochs on the Capybara dataset. #### API - amazing-openai-api: this tool converts Yi model APIs into the OpenAI API format out of the box. - LlamaEdge: this tool builds an OpenAI-compatible API server for Yi-34B-Chat using a portable Wasm (WebAssembly) file, powered by Rust. <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ## Tech report For detailed capabilities of the Yi series model, see Yi: Open Foundation Models by 01.AI. ### Citation ## Benchmarks - Chat model performance - Base model performance ### Chat model performance Yi-34B-Chat model demonstrates exceptional performance, ranking first among all existing open-source models in the benchmarks including MMLU, CMMLU, BBH, GSM8k, and more. !Chat model performance <details> <summary> Evaluation methods and challenges. ⬇️ </summary> - **Evaluation methods**: we evaluated various benchmarks using both zero-shot and few-shot methods, except for TruthfulQA. - **Zero-shot vs. few-shot**: in chat models, the zero-shot approach is more commonly employed. - **Evaluation strategy**: our evaluation strategy involves generating responses while following instructions explicitly or implicitly (such as using few-shot examples). We then isolate relevant answers from the generated text. - **Challenges faced**: some models are not well-suited to produce output in the specific format required by instructions in few datasets, which leads to suboptimal results. <strong>*</strong>: C-Eval results are evaluated on the validation datasets </details> ### Base model performance #### Yi-34B and Yi-34B-200K The Yi-34B and Yi-34B-200K models stand out as the top performers among open-source models, especially excelling in MMLU, CMMLU, common-sense reasoning, reading comprehension, and more. !Base model performance <details> <summary> Evaluation methods. ⬇️</summary> - **Disparity in results**: while benchmarking open-source models, a disparity has been noted between results from our pipeline and those reported by public sources like OpenCompass. - **Investigation findings**: a deeper investigation reveals that variations in prompts, post-processing strategies, and sampling techniques across models may lead to significant outcome differences. - **Uniform benchmarking process**: our methodology aligns with the original benchmarks—consistent prompts and post-processing strategies are used, and greedy decoding is applied during evaluations without any post-processing for the generated content. - **Efforts to retrieve unreported scores**: for scores that were not reported by the original authors (including scores reported with different settings), we try to get results with our pipeline. - **Extensive model evaluation**: to evaluate the model’s capability extensively, we adopted the methodology outlined in Llama2. Specifically, we included PIQA, SIQA, HellaSwag, WinoGrande, ARC, OBQA, and CSQA to assess common sense reasoning. SquAD, QuAC, and BoolQ were incorporated to evaluate reading comprehension. - **Special configurations**: CSQA was exclusively tested using a 7-shot setup, while all other tests were conducted with a 0-shot configuration. Additionally, we introduced GSM8K (8-shot@1), MATH (4-shot@1), HumanEval (0-shot@1), and MBPP (3-shot@1) under the category \"Math & Code\". - **Falcon-180B caveat**: Falcon-180B was not tested on QuAC and OBQA due to technical constraints. Its performance score is an average from other tasks, and considering the generally lower scores of these two tasks, Falcon-180B's capabilities are likely not underestimated. </details> #### Yi-9B Yi-9B is almost the best among a range of similar-sized open-source models (including Mistral-7B, SOLAR-10.7B, Gemma-7B, DeepSeek-Coder-7B-Base-v1.5 and more), particularly excelling in code, math, common-sense reasoning, and reading comprehension. !Yi-9B benchmark - details - In terms of **overall** ability (Mean-All), Yi-9B performs the best among similarly sized open-source models, surpassing DeepSeek-Coder, DeepSeek-Math, Mistral-7B, SOLAR-10.7B, and Gemma-7B. !Yi-9B benchmark - overall - In terms of **coding** ability (Mean-Code), Yi-9B's performance is second only to DeepSeek-Coder-7B, surpassing Yi-34B, SOLAR-10.7B, Mistral-7B, and Gemma-7B. !Yi-9B benchmark - code - In terms of **math** ability (Mean-Math), Yi-9B's performance is second only to DeepSeek-Math-7B, surpassing SOLAR-10.7B, Mistral-7B, and Gemma-7B. !Yi-9B benchmark - math - In terms of **common sense and reasoning** ability (Mean-Text), Yi-9B's performance is on par with Mistral-7B, SOLAR-10.7B, and Gemma-7B. !Yi-9B benchmark - text <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> # Who can use Yi? Everyone! 🙌 ✅ The code and weights of the Yi series models are distributed under the Apache 2.0 license, which means the Yi series models are free for personal usage, academic purposes, and commercial use. <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> # Misc. ### Acknowledgments A heartfelt thank you to each of you who have made contributions to the Yi community! You have helped Yi not just a project, but a vibrant, growing home for innovation. ![yi contributors]( <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### Disclaimer We use data compliance checking algorithms during the training process, to ensure the compliance of the trained model to the best of our ability. Due to complex data and the diversity of language model usage scenarios, we cannot guarantee that the model will generate correct, and reasonable output in all scenarios. Please be aware that there is still a risk of the model producing problematic outputs. We will not be responsible for any risks and issues resulting from misuse, misguidance, illegal usage, and related misinformation, as well as any associated data security concerns. <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p> ### License The code and weights of the Yi-1.5 series models are distributed under the Apache 2.0 license. If you create derivative works based on this model, please include the following attribution in your derivative works: This work is a derivative of [The Yi Series Model You Base On] by 01.AI, used under the Apache 2.0 License. <p align=\"right\"> [ <a href=\"#top\">Back to top ⬆️ </a> ] </p>",
20
+ "model_explanation_gemini": "A bilingual, open-source large language model (Yi-6B) designed for text generation, excelling in language understanding, reasoning, and comprehension in both English and Chinese. \n\n**Features**: \n- Text generation \n- Bilingual (English/Chinese) support \n- Strong performance in reasoning and comprehension \n- Open-source (Apache-2.0 license) \n\n**Comparison**: \nOutperforms many open-source models (e.g., Falcon-180B, Llama-70B)",
21
+ "release_year": "2024",
22
+ "parameter_count": "6B",
23
+ "is_fine_tuned": false,
24
+ "category": "Large Language Model",
25
+ "model_family": "LLaMA",
26
+ "api_enhanced": true
27
  }
model_data_json/1-800-BAD-CODE_xlm-roberta_punctuation_fullstop_truecase.json CHANGED
@@ -61,5 +61,11 @@
61
  "region:us"
62
  ],
63
  "description": "--- license: apache-2.0 library_name: generic tags: - text2text-generation - punctuation - sentence-boundary-detection - truecasing - true-casing language: - af - am - ar - bg - bn - de - el - en - es - et - fa - fi - fr - gu - hi - hr - hu - id - is - it - ja - kk - kn - ko - ky - lt - lv - mk - ml - mr - nl - or - pa - pl - ps - pt - ro - ru - rw - so - sr - sw - ta - te - tr - uk - zh widget: - text: \"hola amigo cómo estás es un día lluvioso hoy\" - text: \"please rsvp for the party asap preferably before 8 pm tonight\" - text: \"este modelo fue entrenado en un gpu a100 en realidad no se que dice esta frase lo traduje con nmt\" - text: \"此模型向文本添加标点符号它支持47种语言并在a100gpu上接受过训练它可以在每种语言上运行而无需每种语言的特殊路径\" - text: \"यह मॉडल 47 भाषाओं में विराम चिह्न जोड़ता है यह भाषा विशिष्ट पथ के बिना काम करता है यह प्रत्येक भाषा के लिए विशेष पथों के बिना प्रत्येक भाषा पर कार्य कर सकता है\" --- # Model Overview This is an fine-tuned to restore punctuation, true-case (capitalize), and detect sentence boundaries (full stops) in 47 languages. # Usage If you want to just play with the model, the widget on this page will suffice. To use the model offline, the following snippets show how to use the model both with a wrapper (that I wrote, available from ) and manual usuage (using the ONNX and SentencePiece models in this repo). ## Usage via package <details> <summary>Click to see usage with wrappers</summary> The easiest way to use this model is to install punctuators: But this is just an ONNX and SentencePiece model, so you may run it as you wish. The input to the API is a list (batch) of strings. Each string will be punctuated, true-cased, and segmented on predicted full stops. The output will therefore be a list of list of strings: one list of segmented sentences per input text. To disable full stops, use . The output will then be a list of strings: one punctuated, true-cased string per input text. <details open> <summary>Example Usage</summary> </details> <details open> <summary>Expected output</summary> </details> </details> ## Manual Usage If you want to use the ONNX and SP models without wrappers, see the following example. <details> <summary>Click to see manual usage</summary> Expected output: </details> &nbsp; # Model Architecture This model implements the following graph, which allows punctuation, true-casing, and fullstop prediction in every language without language-specific behavior: !graph.png <details> <summary>Click to see graph explanations</summary> We start by tokenizing the text and encoding it with XLM-Roberta, which is the pre-trained portion of this graph. Then we predict punctuation before and after every subtoken. Predicting before each token allows for Spanish inverted question marks. Predicting after every token allows for all other punctuation, including punctuation within continuous-script languages and acronyms. We use embeddings to represent the predicted punctuation tokens to inform the sentence boundary head of the punctuation that'll be inserted into the text. This allows proper full stop prediction, since certain punctuation tokens (periods, questions marks, etc.) are strongly correlated with sentence boundaries. We then shift full stop predictions to the right by one, to inform the true-casing head of where the beginning of each new sentence is. This is important since true-casing is strongly correlated to sentence boundaries. For true-casing, we predict predictions per subtoken, where is the number of characters in the subtoken. In practice, is the maximum subtoken length and extra predictions are ignored. Essentially, true-casing is modeled as a multi-label problem. This allows for upper-casing arbitrary characters, e.g., \"NATO\", \"MacDonald\", \"mRNA\", etc. Applying all these predictions to the input text, we can punctuate, true-case, and split sentences in any language. </details> ## Tokenizer <details> <summary>Click to see how the XLM-Roberta tokenizer was un-hacked</summary> Instead of the hacky wrapper used by FairSeq and strangely ported (not fixed) by HuggingFace, the SentencePiece model was adjusted to correctly encode the text. Per HF's comments, The SP model was un-hacked with the following snippet (SentencePiece experts, let me know if there is a problem here): Now we can use just the SP model without a wrapper. </details> ## Post-Punctuation Tokens This model predicts the following set of punctuation tokens after each subtoken: | Token | Description | Relevant Languages | | ---: | :---------- | :----------- | | \\<NULL\\> | No punctuation | All | | \\<ACRONYM\\> | Every character in this subword is followed by a period | Primarily English, some European | | . | Latin full stop | Many | | , | Latin comma | Many | | ? | Latin question mark | Many | | ? | Full-width question mark | Chinese, Japanese | | , | Full-width comma | Chinese, Japanese | | 。 | Full-width full stop | Chinese, Japanese | | 、 | Ideographic comma | Chinese, Japanese | | ・ | Middle dot | Japanese | | । | Danda | Hindi, Bengali, Oriya | | ؟ | Arabic question mark | Arabic | | ; | Greek question mark | Greek | | ። | Ethiopic full stop | Amharic | | ፣ | Ethiopic comma | Amharic | | ፧ | Ethiopic question mark | Amharic | ## Pre-Punctuation Tokens This model predicts the following set of punctuation tokens before each subword: | Token | Description | Relevant Languages | | ---: | :---------- | :----------- | | \\<NULL\\> | No punctuation | All | | ¿ | Inverted question mark | Spanish | # Training Details This model was trained in the NeMo framework on an A100 for approximately 7 hours. You may view the log on tensorboard.dev. This model was trained with News Crawl data from WMT. 1M lines of text for each language was used, except for a few low-resource languages which may have used less. Languages were chosen based on whether the News Crawl corpus contained enough reliable-quality data as judged by the author. # Limitations This model was trained on news data, and may not perform well on conversational or informal data. This model is unlikely to be of production quality. It was trained with \"only\" 1M lines per language, and the dev sets may have been noisy due to the nature of web-scraped news data. This model over-predicts Spanish question marks, especially the inverted question mark (see metrics below). Since is a rare token, especially in the context of a 47-language model, Spanish questions were over-sampled by selecting more of these sentences from additional training data that was not used. However, this seems to have \"over-corrected\" the problem and a lot of Spanish question marks are predicted. The model may also over-predict commas. If you find any general limitations not mentioned here, let me know so all limitations can be addressed in the next fine-tuning. # Evaluation In these metrics, keep in mind that 1. The data is noisy 2. Sentence boundaries and true-casing are conditioned on predicted punctuation, which is the most difficult task and sometimes incorrect. When conditioning on reference punctuation, true-casing and SBD is practically 100% for most languages. 4. Punctuation can be subjective. E.g., or When the sentences are longer and more practical, these ambiguities abound and affect all 3 analytics. ## Test Data and Example Generation Each test example was generated using the following procedure: 1. Concatenate 11 random sentences (1 + 10 for each sentence in the test set) 2. Lower-case the concatenated sentence 3. Remove all punctuation Targets are generated as we lower-case letters and remove punctuation. The data is a held-out portion of News Crawl, which has been deduplicated. 3,000 lines of data per language was used, generating 3,000 unique examples of 11 sentences each. We generate 3,000 examples, where example begins with sentence and is followed by 10 random sentences selected from the 3,000 sentence test set. For measuring true-casing and sentence boundary detection, reference punctuation tokens were used for conditioning (see graph above). If we use predicted punctuation instead, then incorrect punctuation will result in true-casing and SBD targets not aligning correctly and these metrics will be artificially low. ## Selected Language Evaluation Reports For now, metrics for a few selected languages are shown below. Given the amount of work required to collect and pretty-print metrics in 47 languages, I'll add more eventually. Expand any of the following tabs to see metrics for that language. <details> <summary>English</summary> </details> <details> <summary>Spanish</summary> </details> <details> <summary>Amharic</summary> </details> <details> <summary>Chinese</summary> </details> <details> <summary>Japanese</summary> </details> <details> <summary>Hindi</summary> </details> <details> <summary>Arabic</summary> </details> &nbsp; # Extra Stuff ## Acronyms, abbreviations, and bi-capitalized words This section briefly demonstrates the models behavior when presented with the following: 1. Acronyms: \"NATO\" 2. Fake acronyms: \"NHTG\" in place of \"NATO\" 3. Ambigous term which could be an acronym or proper noun: \"Tuny\" 3. Bi-capitalized words: \"McDavid\" 4. Intialisms: \"p.m.\" <details open> <summary>Acronyms, etc. inputs</summary> </details> <details open> <summary>Expected output</summary> </details>",
64
- "model_explanation_gemini": "Restores punctuation, true-casing, and sentence boundaries in text across 47 languages.\n\nFeatures: \n- Punctuation restoration \n- True-casing (capitalization) \n- Sentence boundary detection (full stops) \n- Supports 47 languages \n- Uses XLM-Roberta architecture \n- ONNX and SentencePiece models \n\nComparison: Unlike language-specific punctuation models, this handles multiple languages without requiring separate language paths, leveraging XLM-Roberta's multilingual capabilities."
 
 
 
 
 
 
65
  }
 
61
  "region:us"
62
  ],
63
  "description": "--- license: apache-2.0 library_name: generic tags: - text2text-generation - punctuation - sentence-boundary-detection - truecasing - true-casing language: - af - am - ar - bg - bn - de - el - en - es - et - fa - fi - fr - gu - hi - hr - hu - id - is - it - ja - kk - kn - ko - ky - lt - lv - mk - ml - mr - nl - or - pa - pl - ps - pt - ro - ru - rw - so - sr - sw - ta - te - tr - uk - zh widget: - text: \"hola amigo cómo estás es un día lluvioso hoy\" - text: \"please rsvp for the party asap preferably before 8 pm tonight\" - text: \"este modelo fue entrenado en un gpu a100 en realidad no se que dice esta frase lo traduje con nmt\" - text: \"此模型向文本添加标点符号它支持47种语言并在a100gpu上接受过训练它可以在每种语言上运行而无需每种语言的特殊路径\" - text: \"यह मॉडल 47 भाषाओं में विराम चिह्न जोड़ता है यह भाषा विशिष्ट पथ के बिना काम करता है यह प्रत्येक भाषा के लिए विशेष पथों के बिना प्रत्येक भाषा पर कार्य कर सकता है\" --- # Model Overview This is an fine-tuned to restore punctuation, true-case (capitalize), and detect sentence boundaries (full stops) in 47 languages. # Usage If you want to just play with the model, the widget on this page will suffice. To use the model offline, the following snippets show how to use the model both with a wrapper (that I wrote, available from ) and manual usuage (using the ONNX and SentencePiece models in this repo). ## Usage via package <details> <summary>Click to see usage with wrappers</summary> The easiest way to use this model is to install punctuators: But this is just an ONNX and SentencePiece model, so you may run it as you wish. The input to the API is a list (batch) of strings. Each string will be punctuated, true-cased, and segmented on predicted full stops. The output will therefore be a list of list of strings: one list of segmented sentences per input text. To disable full stops, use . The output will then be a list of strings: one punctuated, true-cased string per input text. <details open> <summary>Example Usage</summary> </details> <details open> <summary>Expected output</summary> </details> </details> ## Manual Usage If you want to use the ONNX and SP models without wrappers, see the following example. <details> <summary>Click to see manual usage</summary> Expected output: </details> &nbsp; # Model Architecture This model implements the following graph, which allows punctuation, true-casing, and fullstop prediction in every language without language-specific behavior: !graph.png <details> <summary>Click to see graph explanations</summary> We start by tokenizing the text and encoding it with XLM-Roberta, which is the pre-trained portion of this graph. Then we predict punctuation before and after every subtoken. Predicting before each token allows for Spanish inverted question marks. Predicting after every token allows for all other punctuation, including punctuation within continuous-script languages and acronyms. We use embeddings to represent the predicted punctuation tokens to inform the sentence boundary head of the punctuation that'll be inserted into the text. This allows proper full stop prediction, since certain punctuation tokens (periods, questions marks, etc.) are strongly correlated with sentence boundaries. We then shift full stop predictions to the right by one, to inform the true-casing head of where the beginning of each new sentence is. This is important since true-casing is strongly correlated to sentence boundaries. For true-casing, we predict predictions per subtoken, where is the number of characters in the subtoken. In practice, is the maximum subtoken length and extra predictions are ignored. Essentially, true-casing is modeled as a multi-label problem. This allows for upper-casing arbitrary characters, e.g., \"NATO\", \"MacDonald\", \"mRNA\", etc. Applying all these predictions to the input text, we can punctuate, true-case, and split sentences in any language. </details> ## Tokenizer <details> <summary>Click to see how the XLM-Roberta tokenizer was un-hacked</summary> Instead of the hacky wrapper used by FairSeq and strangely ported (not fixed) by HuggingFace, the SentencePiece model was adjusted to correctly encode the text. Per HF's comments, The SP model was un-hacked with the following snippet (SentencePiece experts, let me know if there is a problem here): Now we can use just the SP model without a wrapper. </details> ## Post-Punctuation Tokens This model predicts the following set of punctuation tokens after each subtoken: | Token | Description | Relevant Languages | | ---: | :---------- | :----------- | | \\<NULL\\> | No punctuation | All | | \\<ACRONYM\\> | Every character in this subword is followed by a period | Primarily English, some European | | . | Latin full stop | Many | | , | Latin comma | Many | | ? | Latin question mark | Many | | ? | Full-width question mark | Chinese, Japanese | | , | Full-width comma | Chinese, Japanese | | 。 | Full-width full stop | Chinese, Japanese | | 、 | Ideographic comma | Chinese, Japanese | | ・ | Middle dot | Japanese | | । | Danda | Hindi, Bengali, Oriya | | ؟ | Arabic question mark | Arabic | | ; | Greek question mark | Greek | | ። | Ethiopic full stop | Amharic | | ፣ | Ethiopic comma | Amharic | | ፧ | Ethiopic question mark | Amharic | ## Pre-Punctuation Tokens This model predicts the following set of punctuation tokens before each subword: | Token | Description | Relevant Languages | | ---: | :---------- | :----------- | | \\<NULL\\> | No punctuation | All | | ¿ | Inverted question mark | Spanish | # Training Details This model was trained in the NeMo framework on an A100 for approximately 7 hours. You may view the log on tensorboard.dev. This model was trained with News Crawl data from WMT. 1M lines of text for each language was used, except for a few low-resource languages which may have used less. Languages were chosen based on whether the News Crawl corpus contained enough reliable-quality data as judged by the author. # Limitations This model was trained on news data, and may not perform well on conversational or informal data. This model is unlikely to be of production quality. It was trained with \"only\" 1M lines per language, and the dev sets may have been noisy due to the nature of web-scraped news data. This model over-predicts Spanish question marks, especially the inverted question mark (see metrics below). Since is a rare token, especially in the context of a 47-language model, Spanish questions were over-sampled by selecting more of these sentences from additional training data that was not used. However, this seems to have \"over-corrected\" the problem and a lot of Spanish question marks are predicted. The model may also over-predict commas. If you find any general limitations not mentioned here, let me know so all limitations can be addressed in the next fine-tuning. # Evaluation In these metrics, keep in mind that 1. The data is noisy 2. Sentence boundaries and true-casing are conditioned on predicted punctuation, which is the most difficult task and sometimes incorrect. When conditioning on reference punctuation, true-casing and SBD is practically 100% for most languages. 4. Punctuation can be subjective. E.g., or When the sentences are longer and more practical, these ambiguities abound and affect all 3 analytics. ## Test Data and Example Generation Each test example was generated using the following procedure: 1. Concatenate 11 random sentences (1 + 10 for each sentence in the test set) 2. Lower-case the concatenated sentence 3. Remove all punctuation Targets are generated as we lower-case letters and remove punctuation. The data is a held-out portion of News Crawl, which has been deduplicated. 3,000 lines of data per language was used, generating 3,000 unique examples of 11 sentences each. We generate 3,000 examples, where example begins with sentence and is followed by 10 random sentences selected from the 3,000 sentence test set. For measuring true-casing and sentence boundary detection, reference punctuation tokens were used for conditioning (see graph above). If we use predicted punctuation instead, then incorrect punctuation will result in true-casing and SBD targets not aligning correctly and these metrics will be artificially low. ## Selected Language Evaluation Reports For now, metrics for a few selected languages are shown below. Given the amount of work required to collect and pretty-print metrics in 47 languages, I'll add more eventually. Expand any of the following tabs to see metrics for that language. <details> <summary>English</summary> </details> <details> <summary>Spanish</summary> </details> <details> <summary>Amharic</summary> </details> <details> <summary>Chinese</summary> </details> <details> <summary>Japanese</summary> </details> <details> <summary>Hindi</summary> </details> <details> <summary>Arabic</summary> </details> &nbsp; # Extra Stuff ## Acronyms, abbreviations, and bi-capitalized words This section briefly demonstrates the models behavior when presented with the following: 1. Acronyms: \"NATO\" 2. Fake acronyms: \"NHTG\" in place of \"NATO\" 3. Ambigous term which could be an acronym or proper noun: \"Tuny\" 3. Bi-capitalized words: \"McDavid\" 4. Intialisms: \"p.m.\" <details open> <summary>Acronyms, etc. inputs</summary> </details> <details open> <summary>Expected output</summary> </details>",
64
+ "model_explanation_gemini": "Restores punctuation, true-casing, and sentence boundaries in text across 47 languages.\n\nFeatures: \n- Punctuation restoration \n- True-casing (capitalization) \n- Sentence boundary detection (full stops) \n- Supports 47 languages \n- Uses XLM-Roberta architecture \n- ONNX and SentencePiece models \n\nComparison: Unlike language-specific punctuation models, this handles multiple languages without requiring separate language paths, leveraging XLM-Roberta's multilingual capabilities.",
65
+ "release_year": null,
66
+ "parameter_count": null,
67
+ "is_fine_tuned": false,
68
+ "category": "Large Language Model",
69
+ "model_family": "BERT",
70
+ "api_enhanced": true
71
  }
model_data_json/1aurent_swin_tiny_patch4_window7_224.CTransPath.json CHANGED
@@ -14,5 +14,10 @@
14
  "region:us"
15
  ],
16
  "description": "--- tags: - feature-extraction - image-classification - timm - biology - cancer - histology library_name: timm model-index: - name: ctranspath results: - task: type: image-classification name: Image Classification dataset: name: Camelyon16[Meta] type: image-classification metrics: - type: accuracy value: 96.3 ± 2.6 name: ROC AUC verified: false - task: type: image-classification name: Image Classification dataset: name: TCGA-BRCA[Hist] type: image-classification metrics: - type: accuracy value: 95.8 ± 0.5 name: ROC AUC verified: false - task: type: image-classification name: Image Classification dataset: name: TCGA-BRCA[HRD] type: image-classification metrics: - type: accuracy value: 77.1 ± 2.5 name: ROC AUC verified: false - task: type: image-classification name: Image Classification dataset: name: TCGA-BRCA[Mol] type: image-classification metrics: - type: accuracy value: 80.8 ± 1.7 name: ROC AUC verified: false - task: type: image-classification name: Image Classification dataset: name: TCGA-BRCA[OS] type: image-classification metrics: - type: accuracy value: 65.0 ± 6.0 name: ROC AUC verified: false - task: type: image-classification name: Image Classification dataset: name: TCGA-CRC[MSI] type: image-classification metrics: - type: accuracy value: 88.5 ± 2.3 name: ROC AUC verified: false - task: type: image-classification name: Image Classification dataset: name: TCGA-COAD[OS] type: image-classification metrics: - type: accuracy value: 64.3 ± 5.4 name: ROC AUC verified: false - task: type: image-classification name: Image Classification dataset: name: TCGA-NSCLC[CType] type: image-classification metrics: - type: accuracy value: 97.3 ± 0.4 name: ROC AUC verified: false - task: type: image-classification name: Image Classification dataset: name: TCGA-LUAD[OS] type: image-classification metrics: - type: accuracy value: 59.1 ± 4.5 name: ROC AUC verified: false - task: type: image-classification name: Image Classification dataset: name: TCGA-LUSC[OS] type: image-classification metrics: - type: accuracy value: 61.5 ± 2.9 name: ROC AUC verified: false - task: type: image-classification name: Image Classification dataset: name: TCGA-OV[HRD] type: image-classification metrics: - type: accuracy value: 69.5 ± 7.0 name: ROC AUC verified: false - task: type: image-classification name: Image Classification dataset: name: TCGA-RCC[CType] type: image-classification metrics: - type: accuracy value: 98.9 ± 0.2 name: ROC AUC verified: false - task: type: image-classification name: Image Classification dataset: name: TCGA-STAD[MSI] type: image-classification metrics: - type: accuracy value: 83.2 ± 8.1 name: ROC AUC verified: false - task: type: image-classification name: Image Classification dataset: name: TCGA-PAAD[OS] type: image-classification metrics: - type: accuracy value: 59.0 ± 4.2 name: ROC AUC verified: false license: gpl-3.0 pipeline_tag: feature-extraction inference: false metrics: - accuracy --- # Model card for swin_tiny_patch4_window7_224.CTransPath A Swin Transformer image classification model. \\ Trained on 15M histology patches from PAIP and TCGA. : 27.5 - Image size: 224 x 224 x 3 - **Papers:** - Transformer-based unsupervised contrastive learning for histopathological image classification: - **Dataset:** TCGA: - **Original:** - **License:** GPLv3 ## Model Usage ### Custom Patch Embed Layer Definition ### Image Embeddings ## Citation",
17
- "model_explanation_gemini": "A Swin Transformer-based model for histopathological image classification and feature extraction, trained on 15M histology patches from PAIP and TCGA datasets. \n\n**Features:** \n- Architecture: Swin Tiny (patch size 4, window size 7, input resolution 224x224x3) \n- Tasks: Image classification, feature extraction \n- Domains: Biology, cancer histology \n- Datasets: Camelyon16, TCGA (BRCA, CRC, NSCLC"
 
 
 
 
 
18
  }
 
14
  "region:us"
15
  ],
16
  "description": "--- tags: - feature-extraction - image-classification - timm - biology - cancer - histology library_name: timm model-index: - name: ctranspath results: - task: type: image-classification name: Image Classification dataset: name: Camelyon16[Meta] type: image-classification metrics: - type: accuracy value: 96.3 ± 2.6 name: ROC AUC verified: false - task: type: image-classification name: Image Classification dataset: name: TCGA-BRCA[Hist] type: image-classification metrics: - type: accuracy value: 95.8 ± 0.5 name: ROC AUC verified: false - task: type: image-classification name: Image Classification dataset: name: TCGA-BRCA[HRD] type: image-classification metrics: - type: accuracy value: 77.1 ± 2.5 name: ROC AUC verified: false - task: type: image-classification name: Image Classification dataset: name: TCGA-BRCA[Mol] type: image-classification metrics: - type: accuracy value: 80.8 ± 1.7 name: ROC AUC verified: false - task: type: image-classification name: Image Classification dataset: name: TCGA-BRCA[OS] type: image-classification metrics: - type: accuracy value: 65.0 ± 6.0 name: ROC AUC verified: false - task: type: image-classification name: Image Classification dataset: name: TCGA-CRC[MSI] type: image-classification metrics: - type: accuracy value: 88.5 ± 2.3 name: ROC AUC verified: false - task: type: image-classification name: Image Classification dataset: name: TCGA-COAD[OS] type: image-classification metrics: - type: accuracy value: 64.3 ± 5.4 name: ROC AUC verified: false - task: type: image-classification name: Image Classification dataset: name: TCGA-NSCLC[CType] type: image-classification metrics: - type: accuracy value: 97.3 ± 0.4 name: ROC AUC verified: false - task: type: image-classification name: Image Classification dataset: name: TCGA-LUAD[OS] type: image-classification metrics: - type: accuracy value: 59.1 ± 4.5 name: ROC AUC verified: false - task: type: image-classification name: Image Classification dataset: name: TCGA-LUSC[OS] type: image-classification metrics: - type: accuracy value: 61.5 ± 2.9 name: ROC AUC verified: false - task: type: image-classification name: Image Classification dataset: name: TCGA-OV[HRD] type: image-classification metrics: - type: accuracy value: 69.5 ± 7.0 name: ROC AUC verified: false - task: type: image-classification name: Image Classification dataset: name: TCGA-RCC[CType] type: image-classification metrics: - type: accuracy value: 98.9 ± 0.2 name: ROC AUC verified: false - task: type: image-classification name: Image Classification dataset: name: TCGA-STAD[MSI] type: image-classification metrics: - type: accuracy value: 83.2 ± 8.1 name: ROC AUC verified: false - task: type: image-classification name: Image Classification dataset: name: TCGA-PAAD[OS] type: image-classification metrics: - type: accuracy value: 59.0 ± 4.2 name: ROC AUC verified: false license: gpl-3.0 pipeline_tag: feature-extraction inference: false metrics: - accuracy --- # Model card for swin_tiny_patch4_window7_224.CTransPath A Swin Transformer image classification model. \\ Trained on 15M histology patches from PAIP and TCGA. : 27.5 - Image size: 224 x 224 x 3 - **Papers:** - Transformer-based unsupervised contrastive learning for histopathological image classification: - **Dataset:** TCGA: - **Original:** - **License:** GPLv3 ## Model Usage ### Custom Patch Embed Layer Definition ### Image Embeddings ## Citation",
17
+ "model_explanation_gemini": "A Swin Transformer-based model for histopathological image classification and feature extraction, trained on 15M histology patches from PAIP and TCGA datasets. \n\n**Features:** \n- Architecture: Swin Tiny (patch size 4, window size 7, input resolution 224x224x3) \n- Tasks: Image classification, feature extraction \n- Domains: Biology, cancer histology \n- Datasets: Camelyon16, TCGA (BRCA, CRC, NSCLC",
18
+ "release_year": null,
19
+ "parameter_count": null,
20
+ "is_fine_tuned": false,
21
+ "category": "Embedding",
22
+ "api_enhanced": true
23
  }
model_data_json/1bitLLM_bitnet_b1_58-large.json CHANGED
@@ -14,5 +14,11 @@
14
  "region:us"
15
  ],
16
  "description": "--- license: mit --- This is a reproduction of the <a href=\" BitNet b1.58</a> paper. The models are trained with <a href=\" dataset</a> for 100B tokens. The hypers, as well as two-stage LR and weight decay, are implemented as suggested in their following <a href=\" All models are open-source in the <a href=\" We will train larger models and/or more tokens when resource is available. ## Results PPL and zero-shot accuracy: | Models | PPL| ARCe| ARCc| HS | BQ | OQ | PQ | WGe | Avg |-------|-------|-------|-------|-------|-------|-------|-------|-------|-------| | FP16 700M (reported) | 12.33 | 54.7 | 23.0 | 37.0 | 60.0 | 20.2 | 68.9 | 54.8 | 45.5 | | BitNet b1.58 700M (reported) | 12.87 | 51.8 | 21.4 | 35.1 | 58.2 | 20.0 | 68.1 | 55.2 | 44.3 | | BitNet b1.58 700M (reproduced) | 12.78 | 51.4 | 21.8 | 35.0 | 59.6 | 20.6 | 67.5 | 55.4 | 44.5 | | FP16 1.3B (reported) | 11.25 | 56.9 | 23.5 | 38.5 | 59.1 | 21.6 | 70.0 | 53.9 | 46.2 | BitNet b1.58 1.3B (reported) | 11.29 | 54.9 | 24.2 | 37.7 | 56.7 | 19.6 | 68.8 | 55.8 | 45.4 | | BitNet b1.58 1.3B (reproduced) | 11.19 | 55.8 | 23.7 | 37.6 | 59.0 | 20.2 | 69.2 | 56.0 | 45.9 | FP16 3B (reported) | 10.04 | 62.1 | 25.6 | 43.3 | 61.8 | 24.6 | 72.1 | 58.2 | 49.7 | BitNet b1.58 3B (reported) | 9.91 | 61.4 | 28.3 | 42.9 | 61.5 | 26.6 | 71.5 | 59.3 | 50.2 | BitNet b1.58 3B (reproduced) | 9.88 | 60.9 | 28.0 | 42.3 | 58.3 | 26.0 | 71.4 | 60.3 | 49.6 | The differences between the reported numbers and the reproduced results are possibly variances from the training data processing, seeds, or other random factors. ## Evaluation The evaluation pipelines are from the paper authors. Here is the commands to run the evaluation:",
17
- "model_explanation_gemini": "\"Reproduces BitNet b1.58, a 1-bit LLM trained on 100B tokens, achieving competitive perplexity (PPL) and zero-shot accuracy compared to FP16 models across various benchmarks.\"\n\n**Model Features**: \n- 1-bit quantization (BitNet b1.58 architecture) \n- Trained on 100B tokens \n- Implements two-stage LR and weight decay \n- Open-source \n- Evaluated on PPL and zero-shot tasks (ARC"
 
 
 
 
 
 
18
  }
 
14
  "region:us"
15
  ],
16
  "description": "--- license: mit --- This is a reproduction of the <a href=\" BitNet b1.58</a> paper. The models are trained with <a href=\" dataset</a> for 100B tokens. The hypers, as well as two-stage LR and weight decay, are implemented as suggested in their following <a href=\" All models are open-source in the <a href=\" We will train larger models and/or more tokens when resource is available. ## Results PPL and zero-shot accuracy: | Models | PPL| ARCe| ARCc| HS | BQ | OQ | PQ | WGe | Avg |-------|-------|-------|-------|-------|-------|-------|-------|-------|-------| | FP16 700M (reported) | 12.33 | 54.7 | 23.0 | 37.0 | 60.0 | 20.2 | 68.9 | 54.8 | 45.5 | | BitNet b1.58 700M (reported) | 12.87 | 51.8 | 21.4 | 35.1 | 58.2 | 20.0 | 68.1 | 55.2 | 44.3 | | BitNet b1.58 700M (reproduced) | 12.78 | 51.4 | 21.8 | 35.0 | 59.6 | 20.6 | 67.5 | 55.4 | 44.5 | | FP16 1.3B (reported) | 11.25 | 56.9 | 23.5 | 38.5 | 59.1 | 21.6 | 70.0 | 53.9 | 46.2 | BitNet b1.58 1.3B (reported) | 11.29 | 54.9 | 24.2 | 37.7 | 56.7 | 19.6 | 68.8 | 55.8 | 45.4 | | BitNet b1.58 1.3B (reproduced) | 11.19 | 55.8 | 23.7 | 37.6 | 59.0 | 20.2 | 69.2 | 56.0 | 45.9 | FP16 3B (reported) | 10.04 | 62.1 | 25.6 | 43.3 | 61.8 | 24.6 | 72.1 | 58.2 | 49.7 | BitNet b1.58 3B (reported) | 9.91 | 61.4 | 28.3 | 42.9 | 61.5 | 26.6 | 71.5 | 59.3 | 50.2 | BitNet b1.58 3B (reproduced) | 9.88 | 60.9 | 28.0 | 42.3 | 58.3 | 26.0 | 71.4 | 60.3 | 49.6 | The differences between the reported numbers and the reproduced results are possibly variances from the training data processing, seeds, or other random factors. ## Evaluation The evaluation pipelines are from the paper authors. Here is the commands to run the evaluation:",
17
+ "model_explanation_gemini": "\"Reproduces BitNet b1.58, a 1-bit LLM trained on 100B tokens, achieving competitive perplexity (PPL) and zero-shot accuracy compared to FP16 models across various benchmarks.\"\n\n**Model Features**: \n- 1-bit quantization (BitNet b1.58 architecture) \n- Trained on 100B tokens \n- Implements two-stage LR and weight decay \n- Open-source \n- Evaluated on PPL and zero-shot tasks (ARC",
18
+ "release_year": "2024",
19
+ "parameter_count": null,
20
+ "is_fine_tuned": false,
21
+ "category": "Large Language Model",
22
+ "model_family": "LLaMA",
23
+ "api_enhanced": true
24
  }
model_data_json/3loi_SER-Odyssey-Baseline-WavLM-Multi-Attributes.json CHANGED
@@ -23,5 +23,10 @@
23
  "region:us"
24
  ],
25
  "description": "--- license: mit language: - en pipeline_tag: audio-classification tags: - wavlm - msp-podcast - emotion-recognition - audio - speech - valence - arousal - dominance - lucas - speech-emotion-recognition --- The model was trained on MSP-Podcast for the Odyssey 2024 Emotion Recognition competition baseline<br> This particular model is the multi-attributed based model which predict arousal, dominance and valence in a range of approximately 0...1. # Benchmarks CCC based on Test3 and Development sets of the Odyssey Competition <table style=\"width:500px\"> <tr><th colspan=6 align=\"center\" >Multi-Task Setup</th></tr> <tr><th colspan=3 align=\"center\">Test 3</th><th colspan=3 align=\"center\">Development</th></tr> <tr> <td>Val</td> <td>Dom</td> <td>Aro</td> <td>Val</td> <td>Dom</td> <td>Aro</td> </tr> <tr> <td> 0.577</td> <td>0.577</td> <td>0.405</td> <td>0.652</td> <td>0.688</td> <td>0.579</td> </tr> </table> For more details: demo, paper, and GitHub. # Usage",
26
- "model_explanation_gemini": "Predicts arousal, dominance, and valence scores (0-1 range) from speech for emotion recognition, trained on MSP-Podcast as an Odyssey 2024 competition baseline. \n\n**Features:** \n- Audio classification for speech emotion recognition \n- Multi-attribute prediction (arousal, dominance, valence) \n- Trained on MSP-Podcast dataset \n- WavLM-based architecture \n- Benchmark metrics (CCC scores) provided for Test3 and Development sets \n\n**Comparison (if inferred"
 
 
 
 
 
27
  }
 
23
  "region:us"
24
  ],
25
  "description": "--- license: mit language: - en pipeline_tag: audio-classification tags: - wavlm - msp-podcast - emotion-recognition - audio - speech - valence - arousal - dominance - lucas - speech-emotion-recognition --- The model was trained on MSP-Podcast for the Odyssey 2024 Emotion Recognition competition baseline<br> This particular model is the multi-attributed based model which predict arousal, dominance and valence in a range of approximately 0...1. # Benchmarks CCC based on Test3 and Development sets of the Odyssey Competition <table style=\"width:500px\"> <tr><th colspan=6 align=\"center\" >Multi-Task Setup</th></tr> <tr><th colspan=3 align=\"center\">Test 3</th><th colspan=3 align=\"center\">Development</th></tr> <tr> <td>Val</td> <td>Dom</td> <td>Aro</td> <td>Val</td> <td>Dom</td> <td>Aro</td> </tr> <tr> <td> 0.577</td> <td>0.577</td> <td>0.405</td> <td>0.652</td> <td>0.688</td> <td>0.579</td> </tr> </table> For more details: demo, paper, and GitHub. # Usage",
26
+ "model_explanation_gemini": "Predicts arousal, dominance, and valence scores (0-1 range) from speech for emotion recognition, trained on MSP-Podcast as an Odyssey 2024 competition baseline. \n\n**Features:** \n- Audio classification for speech emotion recognition \n- Multi-attribute prediction (arousal, dominance, valence) \n- Trained on MSP-Podcast dataset \n- WavLM-based architecture \n- Benchmark metrics (CCC scores) provided for Test3 and Development sets \n\n**Comparison (if inferred",
27
+ "release_year": null,
28
+ "parameter_count": null,
29
+ "is_fine_tuned": false,
30
+ "category": "Audio",
31
+ "api_enhanced": true
32
  }
model_data_json/51la5_roberta-large-NER.json CHANGED
@@ -109,5 +109,11 @@
109
  "region:us"
110
  ],
111
  "description": "--- language: - multilingual - af - am - ar - as - az - be - bg - bn - br - bs - ca - cs - cy - da - de - el - en - eo - es - et - eu - fa - fi - fr - fy - ga - gd - gl - gu - ha - he - hi - hr - hu - hy - id - is - it - ja - jv - ka - kk - km - kn - ko - ku - ky - la - lo - lt - lv - mg - mk - ml - mn - mr - ms - my - ne - nl - no - om - or - pa - pl - ps - pt - ro - ru - sa - sd - si - sk - sl - so - sq - sr - su - sv - sw - ta - te - th - tl - tr - ug - uk - ur - uz - vi - xh - yi - zh --- # xlm-roberta-large-finetuned-conll03-english # Table of Contents 1. Model Details 2. Uses 3. Bias, Risks, and Limitations 4. Training 5. Evaluation 6. Environmental Impact 7. Technical Specifications 8. Citation 9. Model Card Authors 10. How To Get Started With the Model # Model Details ## Model Description The XLM-RoBERTa model was proposed in Unsupervised Cross-lingual Representation Learning at Scale by Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov. It is based on Facebook's RoBERTa model released in 2019. It is a large multi-lingual language model, trained on 2.5TB of filtered CommonCrawl data. This model is XLM-RoBERTa-large fine-tuned with the conll2003 dataset in English. - **Developed by:** See associated paper - **Model type:** Multi-lingual language model - **Language(s) (NLP) or Countries (images):** XLM-RoBERTa is a multilingual model trained on 100 different languages; see GitHub Repo for full list; model is fine-tuned on a dataset in English - **License:** More information needed - **Related Models:** RoBERTa, XLM - **Parent Model:** XLM-RoBERTa-large - **Resources for more information:** -GitHub Repo -Associated Paper # Uses ## Direct Use The model is a language model. The model can be used for token classification, a natural language understanding task in which a label is assigned to some tokens in a text. ## Downstream Use Potential downstream use cases include Named Entity Recognition (NER) and Part-of-Speech (PoS) tagging. To learn more about token classification and other potential downstream use cases, see the Hugging Face token classification docs. ## Out-of-Scope Use The model should not be used to intentionally create hostile or alienating environments for people. # Bias, Risks, and Limitations **CONTENT WARNING: Readers should be made aware that language generated by this model may be disturbing or offensive to some and may propagate historical and current stereotypes.** Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). In the context of tasks relevant to this model, Mishra et al. (2020) explore social biases in NER systems for English and find that there is systematic bias in existing NER systems in that they fail to identify named entities from different demographic groups (though this paper did not look at BERT). For example, using a sample sentence from Mishra et al. (2020): ## Recommendations Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. # Training See the following resources for training data and training procedure details: - XLM-RoBERTa-large model card - CoNLL-2003 data card - Associated paper # Evaluation See the associated paper for evaluation details. # Environmental Impact Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). - **Hardware Type:** 500 32GB Nvidia V100 GPUs (from the associated paper) - **Hours used:** More information needed - **Cloud Provider:** More information needed - **Compute Region:** More information needed - **Carbon Emitted:** More information needed # Technical Specifications See the associated paper for further details. # Citation **BibTeX:** **APA:** - Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzmán, F., ... & Stoyanov, V. (2019). Unsupervised cross-lingual representation learning at scale. arXiv preprint arXiv:1911.02116. # Model Card Authors This model card was written by the team at Hugging Face. # How to Get Started with the Model Use the code below to get started with the model. You can use this model directly within a pipeline for NER. <details> <summary> Click to expand </summary> </details>",
112
- "model_explanation_gemini": "A multilingual XLM-RoBERTa-large model fine-tuned for English Named Entity Recognition (NER) using the CoNLL-2003 dataset. \n\n**Features:** \n- Multilingual base (supports 100 languages) \n- Fine-tuned specifically for English NER \n- Token classification capability (labels assigned to text tokens) \n- Derived from XLM-RoBERTa-large architecture \n\n**Comparison:** \nWhile the base model (XLM-RoBERTa-large) is"
 
 
 
 
 
 
113
  }
 
109
  "region:us"
110
  ],
111
  "description": "--- language: - multilingual - af - am - ar - as - az - be - bg - bn - br - bs - ca - cs - cy - da - de - el - en - eo - es - et - eu - fa - fi - fr - fy - ga - gd - gl - gu - ha - he - hi - hr - hu - hy - id - is - it - ja - jv - ka - kk - km - kn - ko - ku - ky - la - lo - lt - lv - mg - mk - ml - mn - mr - ms - my - ne - nl - no - om - or - pa - pl - ps - pt - ro - ru - sa - sd - si - sk - sl - so - sq - sr - su - sv - sw - ta - te - th - tl - tr - ug - uk - ur - uz - vi - xh - yi - zh --- # xlm-roberta-large-finetuned-conll03-english # Table of Contents 1. Model Details 2. Uses 3. Bias, Risks, and Limitations 4. Training 5. Evaluation 6. Environmental Impact 7. Technical Specifications 8. Citation 9. Model Card Authors 10. How To Get Started With the Model # Model Details ## Model Description The XLM-RoBERTa model was proposed in Unsupervised Cross-lingual Representation Learning at Scale by Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov. It is based on Facebook's RoBERTa model released in 2019. It is a large multi-lingual language model, trained on 2.5TB of filtered CommonCrawl data. This model is XLM-RoBERTa-large fine-tuned with the conll2003 dataset in English. - **Developed by:** See associated paper - **Model type:** Multi-lingual language model - **Language(s) (NLP) or Countries (images):** XLM-RoBERTa is a multilingual model trained on 100 different languages; see GitHub Repo for full list; model is fine-tuned on a dataset in English - **License:** More information needed - **Related Models:** RoBERTa, XLM - **Parent Model:** XLM-RoBERTa-large - **Resources for more information:** -GitHub Repo -Associated Paper # Uses ## Direct Use The model is a language model. The model can be used for token classification, a natural language understanding task in which a label is assigned to some tokens in a text. ## Downstream Use Potential downstream use cases include Named Entity Recognition (NER) and Part-of-Speech (PoS) tagging. To learn more about token classification and other potential downstream use cases, see the Hugging Face token classification docs. ## Out-of-Scope Use The model should not be used to intentionally create hostile or alienating environments for people. # Bias, Risks, and Limitations **CONTENT WARNING: Readers should be made aware that language generated by this model may be disturbing or offensive to some and may propagate historical and current stereotypes.** Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). In the context of tasks relevant to this model, Mishra et al. (2020) explore social biases in NER systems for English and find that there is systematic bias in existing NER systems in that they fail to identify named entities from different demographic groups (though this paper did not look at BERT). For example, using a sample sentence from Mishra et al. (2020): ## Recommendations Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. # Training See the following resources for training data and training procedure details: - XLM-RoBERTa-large model card - CoNLL-2003 data card - Associated paper # Evaluation See the associated paper for evaluation details. # Environmental Impact Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). - **Hardware Type:** 500 32GB Nvidia V100 GPUs (from the associated paper) - **Hours used:** More information needed - **Cloud Provider:** More information needed - **Compute Region:** More information needed - **Carbon Emitted:** More information needed # Technical Specifications See the associated paper for further details. # Citation **BibTeX:** **APA:** - Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzmán, F., ... & Stoyanov, V. (2019). Unsupervised cross-lingual representation learning at scale. arXiv preprint arXiv:1911.02116. # Model Card Authors This model card was written by the team at Hugging Face. # How to Get Started with the Model Use the code below to get started with the model. You can use this model directly within a pipeline for NER. <details> <summary> Click to expand </summary> </details>",
112
+ "model_explanation_gemini": "A multilingual XLM-RoBERTa-large model fine-tuned for English Named Entity Recognition (NER) using the CoNLL-2003 dataset. \n\n**Features:** \n- Multilingual base (supports 100 languages) \n- Fine-tuned specifically for English NER \n- Token classification capability (labels assigned to text tokens) \n- Derived from XLM-RoBERTa-large architecture \n\n**Comparison:** \nWhile the base model (XLM-RoBERTa-large) is",
113
+ "release_year": "2019",
114
+ "parameter_count": null,
115
+ "is_fine_tuned": false,
116
+ "category": "Named Entity Recognition",
117
+ "model_family": "BERT",
118
+ "api_enhanced": true
119
  }
model_data_json/AI-Growth-Lab_PatentSBERTa.json CHANGED
@@ -14,5 +14,11 @@
14
  "region:us"
15
  ],
16
  "description": "--- pipeline_tag: sentence-similarity tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers --- # PatentSBERTa ## PatentSBERTa: A Deep NLP based Hybrid Model for Patent Distance and Classification using Augmented SBERT ### Aalborg University Business School, AI: Growth-Lab This is a sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search. <!--- Describe your model here --> ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word embeddings. ## Evaluation Results <!--- Describe how your model was evaluated --> For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: ## Training The model was trained with the parameters: **DataLoader**: of length 5 with parameters: **Loss**: Parameters of the fit()-Method: ## Full Model Architecture ## Citing & Authors",
17
- "model_explanation_gemini": "\"PatentSBERTa is a sentence-transformers model that maps text to 768-dimensional vectors for tasks like clustering and semantic search, specifically designed for patent distance and classification.\"\n\n### Model Features: \n- **Task**: Sentence similarity, feature extraction \n- **Output**: 768-dimensional dense vector embeddings \n- **Use Cases**: Clustering, semantic search, patent analysis \n- **Architecture**: Based on augmented SBERT (Sentence-BERT) \n\n### Comparative Note: \nUnlike generic sentence-transform"
 
 
 
 
 
 
18
  }
 
14
  "region:us"
15
  ],
16
  "description": "--- pipeline_tag: sentence-similarity tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers --- # PatentSBERTa ## PatentSBERTa: A Deep NLP based Hybrid Model for Patent Distance and Classification using Augmented SBERT ### Aalborg University Business School, AI: Growth-Lab This is a sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search. <!--- Describe your model here --> ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word embeddings. ## Evaluation Results <!--- Describe how your model was evaluated --> For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: ## Training The model was trained with the parameters: **DataLoader**: of length 5 with parameters: **Loss**: Parameters of the fit()-Method: ## Full Model Architecture ## Citing & Authors",
17
+ "model_explanation_gemini": "\"PatentSBERTa is a sentence-transformers model that maps text to 768-dimensional vectors for tasks like clustering and semantic search, specifically designed for patent distance and classification.\"\n\n### Model Features: \n- **Task**: Sentence similarity, feature extraction \n- **Output**: 768-dimensional dense vector embeddings \n- **Use Cases**: Clustering, semantic search, patent analysis \n- **Architecture**: Based on augmented SBERT (Sentence-BERT) \n\n### Comparative Note: \nUnlike generic sentence-transform",
18
+ "release_year": "2021",
19
+ "parameter_count": null,
20
+ "is_fine_tuned": false,
21
+ "category": "Embedding",
22
+ "model_family": "BERT",
23
+ "api_enhanced": true
24
  }
model_data_json/AI-MO_Kimina-Prover-Preview-Distill-1.5B.json CHANGED
@@ -19,5 +19,11 @@
19
  "region:us"
20
  ],
21
  "description": "--- language: - en base_model: Qwen/Qwen2.5-Math-1.5B-Instruct tags: - chat library_name: transformers license: apache-2.0 --- # Kimina-Prover-Preview-Distill-1.5B !image/png **Kimina-Prover-Preview-Distill-1.5B** is a theorem proving model developed by Project Numina and Kimi teams, focusing on competition style problem solving capabilities in Lean 4. It is a distillation of **Kimina-Prover-Preview**, a model trained via large scale reinforcement learning. It achieves SOTA results on MiniF2F-test in its model size and compute budget at the time of release. # Quick Start with vLLM You can easily do inference using vLLM: lean4\\n{formal_statement}\\n # Citation If you find our work helpful, you can cite our paper:",
22
- "model_explanation_gemini": "\"Kimina-Prover-Preview-Distill-1.5B is a distilled 1.5B-parameter theorem-proving model specialized in Lean 4 for competition-style problem solving, achieving state-of-the-art results on MiniF2F-test within its compute class.\"\n\nFeatures: \n- Distilled version of Kimina-Prover-Preview \n- Specialized in Lean 4 theorem proving \n- Optimized for competition-style problems \n- Achieves SOTA on MiniF2F"
 
 
 
 
 
 
23
  }
 
19
  "region:us"
20
  ],
21
  "description": "--- language: - en base_model: Qwen/Qwen2.5-Math-1.5B-Instruct tags: - chat library_name: transformers license: apache-2.0 --- # Kimina-Prover-Preview-Distill-1.5B !image/png **Kimina-Prover-Preview-Distill-1.5B** is a theorem proving model developed by Project Numina and Kimi teams, focusing on competition style problem solving capabilities in Lean 4. It is a distillation of **Kimina-Prover-Preview**, a model trained via large scale reinforcement learning. It achieves SOTA results on MiniF2F-test in its model size and compute budget at the time of release. # Quick Start with vLLM You can easily do inference using vLLM: lean4\\n{formal_statement}\\n # Citation If you find our work helpful, you can cite our paper:",
22
+ "model_explanation_gemini": "\"Kimina-Prover-Preview-Distill-1.5B is a distilled 1.5B-parameter theorem-proving model specialized in Lean 4 for competition-style problem solving, achieving state-of-the-art results on MiniF2F-test within its compute class.\"\n\nFeatures: \n- Distilled version of Kimina-Prover-Preview \n- Specialized in Lean 4 theorem proving \n- Optimized for competition-style problems \n- Achieves SOTA on MiniF2F",
23
+ "release_year": "2024",
24
+ "parameter_count": "1.5B",
25
+ "is_fine_tuned": true,
26
+ "category": "Large Language Model",
27
+ "model_family": "Qwen",
28
+ "api_enhanced": true
29
  }
model_data_json/AI-MO_Kimina-Prover-Preview-Distill-7B.json CHANGED
@@ -19,5 +19,11 @@
19
  "region:us"
20
  ],
21
  "description": "--- language: - en base_model: Qwen/Qwen2.5-Math-7B-Instruct tags: - chat library_name: transformers license: apache-2.0 --- # Kimina-Prover-Preview-Distill-7B !image/png **Kimina-Prover-Preview-Distill-7B** is a theorem proving model developed by Project Numina and Kimi teams, focusing on competition style problem solving capabilities in Lean 4. It is a distillation of **Kimina-Prover-Preview**, a model trained via large scale reinforcement learning. It achieves SOTA results on MiniF2F-test in its model size and compute budget and ranks #1 in PutnamBench Leaderboard with a small compute budget at the time of release. # Quick Start with vLLM You can easily do inference using vLLM: lean4\\n{formal_statement}\\n # Citation If you find our work helpful, you can cite our paper:",
22
- "model_explanation_gemini": "\"Kimina-Prover-Preview-Distill-7B is a distilled theorem-proving model specialized in Lean 4 for competition-style problem-solving, achieving state-of-the-art results on MiniF2F-test and leading the PutnamBench Leaderboard with efficient compute usage.\"\n\nFeatures: \n- Specialized in Lean 4 theorem proving \n- Optimized for competition-style problems \n- Distilled from a reinforcement learning-trained model (Kimina-Prover-Preview) \n- State-of-the-art performance"
 
 
 
 
 
 
23
  }
 
19
  "region:us"
20
  ],
21
  "description": "--- language: - en base_model: Qwen/Qwen2.5-Math-7B-Instruct tags: - chat library_name: transformers license: apache-2.0 --- # Kimina-Prover-Preview-Distill-7B !image/png **Kimina-Prover-Preview-Distill-7B** is a theorem proving model developed by Project Numina and Kimi teams, focusing on competition style problem solving capabilities in Lean 4. It is a distillation of **Kimina-Prover-Preview**, a model trained via large scale reinforcement learning. It achieves SOTA results on MiniF2F-test in its model size and compute budget and ranks #1 in PutnamBench Leaderboard with a small compute budget at the time of release. # Quick Start with vLLM You can easily do inference using vLLM: lean4\\n{formal_statement}\\n # Citation If you find our work helpful, you can cite our paper:",
22
+ "model_explanation_gemini": "\"Kimina-Prover-Preview-Distill-7B is a distilled theorem-proving model specialized in Lean 4 for competition-style problem-solving, achieving state-of-the-art results on MiniF2F-test and leading the PutnamBench Leaderboard with efficient compute usage.\"\n\nFeatures: \n- Specialized in Lean 4 theorem proving \n- Optimized for competition-style problems \n- Distilled from a reinforcement learning-trained model (Kimina-Prover-Preview) \n- State-of-the-art performance",
23
+ "release_year": "2024",
24
+ "parameter_count": "7B",
25
+ "is_fine_tuned": true,
26
+ "category": "Large Language Model",
27
+ "model_family": "Qwen",
28
+ "api_enhanced": true
29
  }
model_data_json/AIDC-AI_Ovis2-4B.json CHANGED
@@ -19,5 +19,10 @@
19
  "region:us"
20
  ],
21
  "description": "--- license: apache-2.0 datasets: - AIDC-AI/Ovis-dataset library_name: transformers tags: - MLLM pipeline_tag: image-text-to-text language: - en - zh --- # Ovis2-4B <div align=\"center\"> <img src= width=\"30%\"/> </div> ## Introduction GitHub | Paper We are pleased to announce the release of **Ovis2**, our latest advancement in multi-modal large language models (MLLMs). Ovis2 inherits the innovative architectural design of the Ovis series, aimed at structurally aligning visual and textual embeddings. As the successor to Ovis1.6, Ovis2 incorporates significant improvements in both dataset curation and training methodologies. **Key Features**: - **Small Model Performance**: Optimized training strategies enable small-scale models to achieve higher capability density, demonstrating cross-tier leading advantages. - **Enhanced Reasoning Capabilities**: Significantly strengthens Chain-of-Thought (CoT) reasoning abilities through the combination of instruction tuning and preference learning. - **Video and Multi-Image Processing**: Video and multi-image data are incorporated into training to enhance the ability to handle complex visual information across frames and images. - **Multilingual Support and OCR**: Enhances multilingual OCR beyond English and Chinese and improves structured data extraction from complex visual elements like tables and charts. <div align=\"center\"> <img src=\" width=\"100%\" /> </div> ## Model Zoo | Ovis MLLMs | ViT | LLM | Model Weights | Demo | |:-----------|:-----------------------:|:---------------------:|:-------------------------------------------------------:|:--------------------------------------------------------:| | Ovis2-1B | aimv2-large-patch14-448 | Qwen2.5-0.5B-Instruct | Huggingface | Space | | Ovis2-2B | aimv2-large-patch14-448 | Qwen2.5-1.5B-Instruct | Huggingface | Space | | Ovis2-4B | aimv2-huge-patch14-448 | Qwen2.5-3B-Instruct | Huggingface | Space | | Ovis2-8B | aimv2-huge-patch14-448 | Qwen2.5-7B-Instruct | Huggingface | Space | | Ovis2-16B | aimv2-huge-patch14-448 | Qwen2.5-14B-Instruct | Huggingface | Space | | Ovis2-34B | aimv2-1B-patch14-448 | Qwen2.5-32B-Instruct | Huggingface | - | ## Performance We use VLMEvalKit, as employed in the OpenCompass multimodal and reasoning leaderboard, to evaluate Ovis2. !image/png ### Image Benchmark | Benchmark | Qwen2.5-VL-7B | InternVL2.5-8B-MPO | MiniCPM-o-2.6 | Ovis1.6-9B | InternVL2.5-4B-MPO | Ovis2-4B | Ovis2-8B | |:-----------------------------|:---------------:|:--------------------:|:---------------:|:------------:|:--------------------:|:----------:|:----------:| | MMBench-V1.1<sub>test</sub> | 82.6 | 82.0 | 80.6 | 80.5 | 77.8 | 81.4 | **83.6** | | MMStar | 64.1 | **65.2** | 63.3 | 62.9 | 61 | 61.9 | 64.6 | | MMMU<sub>val</sub> | 56.2 | 54.8 | 50.9 | 55 | 51.8 | 49.0 | **57.4** | | MathVista<sub>testmini</sub> | 65.8 | 67.9 | **73.3** | 67.3 | 64.1 | 69.6 | 71.8 | | HallusionBench | **56.3** | 51.7 | 51.1 | 52.2 | 47.5 | 53.8 | **56.3** | | AI2D | 84.1 | 84.5 | 86.1 | 84.4 | 81.5 | 85.7 | **86.6** | | OCRBench | 87.7 | 88.2 | 88.9 | 83 | 87.9 | **91.1** | 89.1 | | MMVet | 66.6 | **68.1** | 67.2 | 65 | 66 | 65.5 | 65.1 | | MMBench<sub>test</sub> | 83.4 | 83.2 | 83.2 | 82.7 | 79.6 | 83.2 | **84.9** | | MMT-Bench<sub>val</sub> | 62.7 | 62.5 | 62.3 | 64.9 | 61.6 | 65.2 | **66.6** | | RealWorldQA | 68.8 | 71.1 | 68.0 | 70.7 | 64.4 | 71.1 | **72.5** | | BLINK | 56.1 | **56.6** | 53.9 | 48.5 | 50.6 | 53.0 | 54.3 | | QBench | 77.9 | 73.8 | 78.7 | 76.7 | 71.5 | 78.1 | **78.9** | | ABench | 75.6 | 77.0 | **77.5** | 74.4 | 75.9 | **77.5** | 76.4 | | MTVQA | 28.5 | 27.2 | 23.1 | 19.2 | 28 | 29.4 | **29.7** | ### Video Benchmark | Benchmark | Qwen2.5-VL-7B | InternVL2.5-8B | LLaVA-OV-7B | InternVL2.5-4B | Ovis2-4B | Ovis2-8B | |:--------------------|:-------------:|:--------------:|:------------------:|:--------------:|:---------:|:-------------:| | VideoMME(wo/w-subs) | 65.1/71.6 | 64.2 / 66.9 | 58.2/61.5 | 62.3 / 63.6 | 64.0/66.3 | **68.0/71.6** | | MVBench | 69.6 | **72.0** | 56.7 | 71.6 | 68.45 | 68.15 | | MLVU(M-Avg/G-Avg) | 70.2/- | 68.9/- | 64.7/- | 68.3/- | 70.8/4.23 | **76.4**/4.25 | | MMBench-Video | 1.79 | 1.68 | - | 1.73 | 1.69 | **1.85** | | TempCompass | **71.7** | - | - | - | 67.02 | 69.28 | ## Usage Below is a code snippet demonstrating how to run Ovis with various input types. For additional usage instructions, including inference wrapper and Gradio UI, please refer to Ovis GitHub. <details> <summary>Batch Inference</summary> </details> ## Citation If you find Ovis useful, please consider citing the paper ## License This project is licensed under the Apache License, Version 2.0 (SPDX-License-Identifier: Apache-2.0). ## Disclaimer We used compliance-checking algorithms during the training process, to ensure the compliance of the trained model to the best of our ability. Due to the complexity of the data and the diversity of language model usage scenarios, we cannot guarantee that the model is completely free of copyright issues or improper content. If you believe anything infringes on your rights or generates improper content, please contact us, and we will promptly address the matter.",
22
- "model_explanation_gemini": "\"AIDC-AI_Ovis2-4B is a multimodal large language model (MLLM) designed for image-text-to-text tasks, featuring enhanced reasoning, multilingual OCR, and video/multi-image processing capabilities.\"\n\n**Model Features**: \n- Optimized small-scale performance with high capability density \n- Strengthened Chain-of-Thought (CoT) reasoning \n- Video and multi-image processing support \n- Multilingual OCR (English/Chinese) and structured data extraction \n\n**Comparison**:"
 
 
 
 
 
23
  }
 
19
  "region:us"
20
  ],
21
  "description": "--- license: apache-2.0 datasets: - AIDC-AI/Ovis-dataset library_name: transformers tags: - MLLM pipeline_tag: image-text-to-text language: - en - zh --- # Ovis2-4B <div align=\"center\"> <img src= width=\"30%\"/> </div> ## Introduction GitHub | Paper We are pleased to announce the release of **Ovis2**, our latest advancement in multi-modal large language models (MLLMs). Ovis2 inherits the innovative architectural design of the Ovis series, aimed at structurally aligning visual and textual embeddings. As the successor to Ovis1.6, Ovis2 incorporates significant improvements in both dataset curation and training methodologies. **Key Features**: - **Small Model Performance**: Optimized training strategies enable small-scale models to achieve higher capability density, demonstrating cross-tier leading advantages. - **Enhanced Reasoning Capabilities**: Significantly strengthens Chain-of-Thought (CoT) reasoning abilities through the combination of instruction tuning and preference learning. - **Video and Multi-Image Processing**: Video and multi-image data are incorporated into training to enhance the ability to handle complex visual information across frames and images. - **Multilingual Support and OCR**: Enhances multilingual OCR beyond English and Chinese and improves structured data extraction from complex visual elements like tables and charts. <div align=\"center\"> <img src=\" width=\"100%\" /> </div> ## Model Zoo | Ovis MLLMs | ViT | LLM | Model Weights | Demo | |:-----------|:-----------------------:|:---------------------:|:-------------------------------------------------------:|:--------------------------------------------------------:| | Ovis2-1B | aimv2-large-patch14-448 | Qwen2.5-0.5B-Instruct | Huggingface | Space | | Ovis2-2B | aimv2-large-patch14-448 | Qwen2.5-1.5B-Instruct | Huggingface | Space | | Ovis2-4B | aimv2-huge-patch14-448 | Qwen2.5-3B-Instruct | Huggingface | Space | | Ovis2-8B | aimv2-huge-patch14-448 | Qwen2.5-7B-Instruct | Huggingface | Space | | Ovis2-16B | aimv2-huge-patch14-448 | Qwen2.5-14B-Instruct | Huggingface | Space | | Ovis2-34B | aimv2-1B-patch14-448 | Qwen2.5-32B-Instruct | Huggingface | - | ## Performance We use VLMEvalKit, as employed in the OpenCompass multimodal and reasoning leaderboard, to evaluate Ovis2. !image/png ### Image Benchmark | Benchmark | Qwen2.5-VL-7B | InternVL2.5-8B-MPO | MiniCPM-o-2.6 | Ovis1.6-9B | InternVL2.5-4B-MPO | Ovis2-4B | Ovis2-8B | |:-----------------------------|:---------------:|:--------------------:|:---------------:|:------------:|:--------------------:|:----------:|:----------:| | MMBench-V1.1<sub>test</sub> | 82.6 | 82.0 | 80.6 | 80.5 | 77.8 | 81.4 | **83.6** | | MMStar | 64.1 | **65.2** | 63.3 | 62.9 | 61 | 61.9 | 64.6 | | MMMU<sub>val</sub> | 56.2 | 54.8 | 50.9 | 55 | 51.8 | 49.0 | **57.4** | | MathVista<sub>testmini</sub> | 65.8 | 67.9 | **73.3** | 67.3 | 64.1 | 69.6 | 71.8 | | HallusionBench | **56.3** | 51.7 | 51.1 | 52.2 | 47.5 | 53.8 | **56.3** | | AI2D | 84.1 | 84.5 | 86.1 | 84.4 | 81.5 | 85.7 | **86.6** | | OCRBench | 87.7 | 88.2 | 88.9 | 83 | 87.9 | **91.1** | 89.1 | | MMVet | 66.6 | **68.1** | 67.2 | 65 | 66 | 65.5 | 65.1 | | MMBench<sub>test</sub> | 83.4 | 83.2 | 83.2 | 82.7 | 79.6 | 83.2 | **84.9** | | MMT-Bench<sub>val</sub> | 62.7 | 62.5 | 62.3 | 64.9 | 61.6 | 65.2 | **66.6** | | RealWorldQA | 68.8 | 71.1 | 68.0 | 70.7 | 64.4 | 71.1 | **72.5** | | BLINK | 56.1 | **56.6** | 53.9 | 48.5 | 50.6 | 53.0 | 54.3 | | QBench | 77.9 | 73.8 | 78.7 | 76.7 | 71.5 | 78.1 | **78.9** | | ABench | 75.6 | 77.0 | **77.5** | 74.4 | 75.9 | **77.5** | 76.4 | | MTVQA | 28.5 | 27.2 | 23.1 | 19.2 | 28 | 29.4 | **29.7** | ### Video Benchmark | Benchmark | Qwen2.5-VL-7B | InternVL2.5-8B | LLaVA-OV-7B | InternVL2.5-4B | Ovis2-4B | Ovis2-8B | |:--------------------|:-------------:|:--------------:|:------------------:|:--------------:|:---------:|:-------------:| | VideoMME(wo/w-subs) | 65.1/71.6 | 64.2 / 66.9 | 58.2/61.5 | 62.3 / 63.6 | 64.0/66.3 | **68.0/71.6** | | MVBench | 69.6 | **72.0** | 56.7 | 71.6 | 68.45 | 68.15 | | MLVU(M-Avg/G-Avg) | 70.2/- | 68.9/- | 64.7/- | 68.3/- | 70.8/4.23 | **76.4**/4.25 | | MMBench-Video | 1.79 | 1.68 | - | 1.73 | 1.69 | **1.85** | | TempCompass | **71.7** | - | - | - | 67.02 | 69.28 | ## Usage Below is a code snippet demonstrating how to run Ovis with various input types. For additional usage instructions, including inference wrapper and Gradio UI, please refer to Ovis GitHub. <details> <summary>Batch Inference</summary> </details> ## Citation If you find Ovis useful, please consider citing the paper ## License This project is licensed under the Apache License, Version 2.0 (SPDX-License-Identifier: Apache-2.0). ## Disclaimer We used compliance-checking algorithms during the training process, to ensure the compliance of the trained model to the best of our ability. Due to the complexity of the data and the diversity of language model usage scenarios, we cannot guarantee that the model is completely free of copyright issues or improper content. If you believe anything infringes on your rights or generates improper content, please contact us, and we will promptly address the matter.",
22
+ "model_explanation_gemini": "\"AIDC-AI_Ovis2-4B is a multimodal large language model (MLLM) designed for image-text-to-text tasks, featuring enhanced reasoning, multilingual OCR, and video/multi-image processing capabilities.\"\n\n**Model Features**: \n- Optimized small-scale performance with high capability density \n- Strengthened Chain-of-Thought (CoT) reasoning \n- Video and multi-image processing support \n- Multilingual OCR (English/Chinese) and structured data extraction \n\n**Comparison**:",
23
+ "release_year": "2024",
24
+ "parameter_count": "4B",
25
+ "is_fine_tuned": false,
26
+ "category": "Multimodal",
27
+ "api_enhanced": true
28
  }
model_data_json/AIDC-AI_Ovis2-8B.json CHANGED
@@ -19,5 +19,10 @@
19
  "region:us"
20
  ],
21
  "description": "--- license: apache-2.0 datasets: - AIDC-AI/Ovis-dataset library_name: transformers tags: - MLLM pipeline_tag: image-text-to-text language: - en - zh --- # Ovis2-8B <div align=\"center\"> <img src= width=\"30%\"/> </div> ## Introduction GitHub | Paper We are pleased to announce the release of **Ovis2**, our latest advancement in multi-modal large language models (MLLMs). Ovis2 inherits the innovative architectural design of the Ovis series, aimed at structurally aligning visual and textual embeddings. As the successor to Ovis1.6, Ovis2 incorporates significant improvements in both dataset curation and training methodologies. **Key Features**: - **Small Model Performance**: Optimized training strategies enable small-scale models to achieve higher capability density, demonstrating cross-tier leading advantages. - **Enhanced Reasoning Capabilities**: Significantly strengthens Chain-of-Thought (CoT) reasoning abilities through the combination of instruction tuning and preference learning. - **Video and Multi-Image Processing**: Video and multi-image data are incorporated into training to enhance the ability to handle complex visual information across frames and images. - **Multilingual Support and OCR**: Enhances multilingual OCR beyond English and Chinese and improves structured data extraction from complex visual elements like tables and charts. <div align=\"center\"> <img src=\" width=\"100%\" /> </div> ## Model Zoo | Ovis MLLMs | ViT | LLM | Model Weights | Demo | |:-----------|:-----------------------:|:---------------------:|:-------------------------------------------------------:|:--------------------------------------------------------:| | Ovis2-1B | aimv2-large-patch14-448 | Qwen2.5-0.5B-Instruct | Huggingface | Space | | Ovis2-2B | aimv2-large-patch14-448 | Qwen2.5-1.5B-Instruct | Huggingface | Space | | Ovis2-4B | aimv2-huge-patch14-448 | Qwen2.5-3B-Instruct | Huggingface | Space | | Ovis2-8B | aimv2-huge-patch14-448 | Qwen2.5-7B-Instruct | Huggingface | Space | | Ovis2-16B | aimv2-huge-patch14-448 | Qwen2.5-14B-Instruct | Huggingface | Space | | Ovis2-34B | aimv2-1B-patch14-448 | Qwen2.5-32B-Instruct | Huggingface | - | ## Performance We use VLMEvalKit, as employed in the OpenCompass multimodal and reasoning leaderboard, to evaluate Ovis2. !image/png ### Image Benchmark | Benchmark | Qwen2.5-VL-7B | InternVL2.5-8B-MPO | MiniCPM-o-2.6 | Ovis1.6-9B | InternVL2.5-4B-MPO | Ovis2-4B | Ovis2-8B | |:-----------------------------|:---------------:|:--------------------:|:---------------:|:------------:|:--------------------:|:----------:|:----------:| | MMBench-V1.1<sub>test</sub> | 82.6 | 82.0 | 80.6 | 80.5 | 77.8 | 81.4 | **83.6** | | MMStar | 64.1 | **65.2** | 63.3 | 62.9 | 61 | 61.9 | 64.6 | | MMMU<sub>val</sub> | 56.2 | 54.8 | 50.9 | 55 | 51.8 | 49.0 | **57.4** | | MathVista<sub>testmini</sub> | 65.8 | 67.9 | **73.3** | 67.3 | 64.1 | 69.6 | 71.8 | | HallusionBench | **56.3** | 51.7 | 51.1 | 52.2 | 47.5 | 53.8 | **56.3** | | AI2D | 84.1 | 84.5 | 86.1 | 84.4 | 81.5 | 85.7 | **86.6** | | OCRBench | 87.7 | 88.2 | 88.9 | 83 | 87.9 | **91.1** | 89.1 | | MMVet | 66.6 | **68.1** | 67.2 | 65 | 66 | 65.5 | 65.1 | | MMBench<sub>test</sub> | 83.4 | 83.2 | 83.2 | 82.7 | 79.6 | 83.2 | **84.9** | | MMT-Bench<sub>val</sub> | 62.7 | 62.5 | 62.3 | 64.9 | 61.6 | 65.2 | **66.6** | | RealWorldQA | 68.8 | 71.1 | 68.0 | 70.7 | 64.4 | 71.1 | **72.5** | | BLINK | 56.1 | **56.6** | 53.9 | 48.5 | 50.6 | 53.0 | 54.3 | | QBench | 77.9 | 73.8 | 78.7 | 76.7 | 71.5 | 78.1 | **78.9** | | ABench | 75.6 | 77.0 | **77.5** | 74.4 | 75.9 | **77.5** | 76.4 | | MTVQA | 28.5 | 27.2 | 23.1 | 19.2 | 28 | 29.4 | **29.7** | ### Video Benchmark | Benchmark | Qwen2.5-VL-7B | InternVL2.5-8B | LLaVA-OV-7B | InternVL2.5-4B | Ovis2-4B | Ovis2-8B | |:--------------------|:-------------:|:--------------:|:------------------:|:--------------:|:---------:|:-------------:| | VideoMME(wo/w-subs) | 65.1/71.6 | 64.2 / 66.9 | 58.2/61.5 | 62.3 / 63.6 | 64.0/66.3 | **68.0/71.6** | | MVBench | 69.6 | **72.0** | 56.7 | 71.6 | 68.45 | 68.15 | | MLVU(M-Avg/G-Avg) | 70.2/- | 68.9/- | 64.7/- | 68.3/- | 70.8/4.23 | **76.4**/4.25 | | MMBench-Video | 1.79 | 1.68 | - | 1.73 | 1.69 | **1.85** | | TempCompass | **71.7** | - | - | - | 67.02 | 69.28 | ### Reasoning Benchmark !image/png ## Usage Below is a code snippet demonstrating how to run Ovis with various input types. For additional usage instructions, including inference wrapper and Gradio UI, please refer to Ovis GitHub. <details> <summary>Batch Inference</summary> </details> ## Citation If you find Ovis useful, please consider citing the paper ## License This project is licensed under the Apache License, Version 2.0 (SPDX-License-Identifier: Apache-2.0). ## Disclaimer We used compliance-checking algorithms during the training process, to ensure the compliance of the trained model to the best of our ability. Due to the complexity of the data and the diversity of language model usage scenarios, we cannot guarantee that the model is completely free of copyright issues or improper content. If you believe anything infringes on your rights or generates improper content, please contact us, and we will promptly address the matter.",
22
- "model_explanation_gemini": "\"Ovis2-8B is a multilingual multimodal large language model (MLLM) designed for image-text-to-text tasks, featuring enhanced reasoning, video/multi-image processing, and OCR capabilities, outperforming comparable models in benchmarks like MMBench-V1.1 and MMMU.\"\n\n**Model Features**: \n- Optimized small-model performance with high capability density. \n- Strengthened Chain-of-Thought (CoT) reasoning via instruction tuning and preference learning. \n- Supports"
 
 
 
 
 
23
  }
 
19
  "region:us"
20
  ],
21
  "description": "--- license: apache-2.0 datasets: - AIDC-AI/Ovis-dataset library_name: transformers tags: - MLLM pipeline_tag: image-text-to-text language: - en - zh --- # Ovis2-8B <div align=\"center\"> <img src= width=\"30%\"/> </div> ## Introduction GitHub | Paper We are pleased to announce the release of **Ovis2**, our latest advancement in multi-modal large language models (MLLMs). Ovis2 inherits the innovative architectural design of the Ovis series, aimed at structurally aligning visual and textual embeddings. As the successor to Ovis1.6, Ovis2 incorporates significant improvements in both dataset curation and training methodologies. **Key Features**: - **Small Model Performance**: Optimized training strategies enable small-scale models to achieve higher capability density, demonstrating cross-tier leading advantages. - **Enhanced Reasoning Capabilities**: Significantly strengthens Chain-of-Thought (CoT) reasoning abilities through the combination of instruction tuning and preference learning. - **Video and Multi-Image Processing**: Video and multi-image data are incorporated into training to enhance the ability to handle complex visual information across frames and images. - **Multilingual Support and OCR**: Enhances multilingual OCR beyond English and Chinese and improves structured data extraction from complex visual elements like tables and charts. <div align=\"center\"> <img src=\" width=\"100%\" /> </div> ## Model Zoo | Ovis MLLMs | ViT | LLM | Model Weights | Demo | |:-----------|:-----------------------:|:---------------------:|:-------------------------------------------------------:|:--------------------------------------------------------:| | Ovis2-1B | aimv2-large-patch14-448 | Qwen2.5-0.5B-Instruct | Huggingface | Space | | Ovis2-2B | aimv2-large-patch14-448 | Qwen2.5-1.5B-Instruct | Huggingface | Space | | Ovis2-4B | aimv2-huge-patch14-448 | Qwen2.5-3B-Instruct | Huggingface | Space | | Ovis2-8B | aimv2-huge-patch14-448 | Qwen2.5-7B-Instruct | Huggingface | Space | | Ovis2-16B | aimv2-huge-patch14-448 | Qwen2.5-14B-Instruct | Huggingface | Space | | Ovis2-34B | aimv2-1B-patch14-448 | Qwen2.5-32B-Instruct | Huggingface | - | ## Performance We use VLMEvalKit, as employed in the OpenCompass multimodal and reasoning leaderboard, to evaluate Ovis2. !image/png ### Image Benchmark | Benchmark | Qwen2.5-VL-7B | InternVL2.5-8B-MPO | MiniCPM-o-2.6 | Ovis1.6-9B | InternVL2.5-4B-MPO | Ovis2-4B | Ovis2-8B | |:-----------------------------|:---------------:|:--------------------:|:---------------:|:------------:|:--------------------:|:----------:|:----------:| | MMBench-V1.1<sub>test</sub> | 82.6 | 82.0 | 80.6 | 80.5 | 77.8 | 81.4 | **83.6** | | MMStar | 64.1 | **65.2** | 63.3 | 62.9 | 61 | 61.9 | 64.6 | | MMMU<sub>val</sub> | 56.2 | 54.8 | 50.9 | 55 | 51.8 | 49.0 | **57.4** | | MathVista<sub>testmini</sub> | 65.8 | 67.9 | **73.3** | 67.3 | 64.1 | 69.6 | 71.8 | | HallusionBench | **56.3** | 51.7 | 51.1 | 52.2 | 47.5 | 53.8 | **56.3** | | AI2D | 84.1 | 84.5 | 86.1 | 84.4 | 81.5 | 85.7 | **86.6** | | OCRBench | 87.7 | 88.2 | 88.9 | 83 | 87.9 | **91.1** | 89.1 | | MMVet | 66.6 | **68.1** | 67.2 | 65 | 66 | 65.5 | 65.1 | | MMBench<sub>test</sub> | 83.4 | 83.2 | 83.2 | 82.7 | 79.6 | 83.2 | **84.9** | | MMT-Bench<sub>val</sub> | 62.7 | 62.5 | 62.3 | 64.9 | 61.6 | 65.2 | **66.6** | | RealWorldQA | 68.8 | 71.1 | 68.0 | 70.7 | 64.4 | 71.1 | **72.5** | | BLINK | 56.1 | **56.6** | 53.9 | 48.5 | 50.6 | 53.0 | 54.3 | | QBench | 77.9 | 73.8 | 78.7 | 76.7 | 71.5 | 78.1 | **78.9** | | ABench | 75.6 | 77.0 | **77.5** | 74.4 | 75.9 | **77.5** | 76.4 | | MTVQA | 28.5 | 27.2 | 23.1 | 19.2 | 28 | 29.4 | **29.7** | ### Video Benchmark | Benchmark | Qwen2.5-VL-7B | InternVL2.5-8B | LLaVA-OV-7B | InternVL2.5-4B | Ovis2-4B | Ovis2-8B | |:--------------------|:-------------:|:--------------:|:------------------:|:--------------:|:---------:|:-------------:| | VideoMME(wo/w-subs) | 65.1/71.6 | 64.2 / 66.9 | 58.2/61.5 | 62.3 / 63.6 | 64.0/66.3 | **68.0/71.6** | | MVBench | 69.6 | **72.0** | 56.7 | 71.6 | 68.45 | 68.15 | | MLVU(M-Avg/G-Avg) | 70.2/- | 68.9/- | 64.7/- | 68.3/- | 70.8/4.23 | **76.4**/4.25 | | MMBench-Video | 1.79 | 1.68 | - | 1.73 | 1.69 | **1.85** | | TempCompass | **71.7** | - | - | - | 67.02 | 69.28 | ### Reasoning Benchmark !image/png ## Usage Below is a code snippet demonstrating how to run Ovis with various input types. For additional usage instructions, including inference wrapper and Gradio UI, please refer to Ovis GitHub. <details> <summary>Batch Inference</summary> </details> ## Citation If you find Ovis useful, please consider citing the paper ## License This project is licensed under the Apache License, Version 2.0 (SPDX-License-Identifier: Apache-2.0). ## Disclaimer We used compliance-checking algorithms during the training process, to ensure the compliance of the trained model to the best of our ability. Due to the complexity of the data and the diversity of language model usage scenarios, we cannot guarantee that the model is completely free of copyright issues or improper content. If you believe anything infringes on your rights or generates improper content, please contact us, and we will promptly address the matter.",
22
+ "model_explanation_gemini": "\"Ovis2-8B is a multilingual multimodal large language model (MLLM) designed for image-text-to-text tasks, featuring enhanced reasoning, video/multi-image processing, and OCR capabilities, outperforming comparable models in benchmarks like MMBench-V1.1 and MMMU.\"\n\n**Model Features**: \n- Optimized small-model performance with high capability density. \n- Strengthened Chain-of-Thought (CoT) reasoning via instruction tuning and preference learning. \n- Supports",
23
+ "release_year": "2024",
24
+ "parameter_count": "8B",
25
+ "is_fine_tuned": false,
26
+ "category": "Multimodal",
27
+ "api_enhanced": true
28
  }
model_data_json/AITeamVN_Vietnamese_Embedding.json CHANGED
@@ -17,5 +17,11 @@
17
  "region:us"
18
  ],
19
  "description": "--- license: apache-2.0 language: - vi base_model: - BAAI/bge-m3 pipeline_tag: sentence-similarity library_name: sentence-transformers tags: - Embedding --- ## Model Card: Vietnamese_Embedding Vietnamese_Embedding is an embedding model fine-tuned from the BGE-M3 model ( to enhance retrieval capabilities for Vietnamese. * The model was trained on approximately 300,000 triplets of queries, positive documents, and negative documents for Vietnamese. * The model was trained with a maximum sequence length of 2048. ## Model Details ### Model Description - **Model Type:** Sentence Transformer - **Base model:** BAAI/bge-m3 - **Maximum Sequence Length:** 2048 tokens - **Output Dimensionality:** 1024 dimensions - **Similarity Function:** Dot product Similarity - **Language:** Vietnamese - **Licence:** Apache 2.0 ## Usage ### Evaluation: - Dataset: Entire training dataset of Legal Zalo 2021. Our model was not trained on this dataset. | Model | Accuracy@1 | Accuracy@3 | Accuracy@5 | Accuracy@10 | MRR@10 | |----------------------|------------|------------|------------|-------------|--------------| | Vietnamese_Reranker | 0.7944 | 0.9324 | 0.9537 | 0.9740 | 0.8672 | | Vietnamese_Embedding_v2 | 0.7262 | 0.8927 | 0.9268 | 0.9578 | 0.8149 | | Vietnamese_Embedding (public) | 0.7274 | 0.8992 | 0.9305 | 0.9568 | 0.8181 | | Vietnamese-bi-encoder (BKAI) | 0.7109 | 0.8680 | 0.9014 | 0.9299 | 0.7951 | | BGE-M3 | 0.5682 | 0.7728 | 0.8382 | 0.8921 | 0.6822 | Vietnamese_Reranker and Vietnamese_Embedding_v2 was trained on 1100000 triplets. Although the score on the legal domain drops a bit on Vietnamese_Embedding_v2, since this phase data is much larger, it is very good for other domains. You can access 2 model via link: Vietnamese_Embedding_v2, Vietnamese_Reranker You can reproduce the evaluation result by running code python evaluation_model.py (data downloaded from Kaggle). ## Contact Email: [email protected] **Developer** Member: Nguyễn Nho Trung, Nguyễn Nhật Quang ## Citation",
20
- "model_explanation_gemini": "\"Fine-tuned from BGE-M3 to generate Vietnamese sentence embeddings for improved retrieval tasks, with a 1024-dimensional output and 2048-token sequence length.\"\n\n### Model Features: \n- **Base Model:** BAAI/bge-m3 \n- **Task:** Sentence similarity (dot product) \n- **Language:** Vietnamese \n- **Sequence Length:** 2048 tokens \n- **Output Dimension:** 1024 \n- **Training Data:** ~300K Vietnamese query-d"
 
 
 
 
 
 
21
  }
 
17
  "region:us"
18
  ],
19
  "description": "--- license: apache-2.0 language: - vi base_model: - BAAI/bge-m3 pipeline_tag: sentence-similarity library_name: sentence-transformers tags: - Embedding --- ## Model Card: Vietnamese_Embedding Vietnamese_Embedding is an embedding model fine-tuned from the BGE-M3 model ( to enhance retrieval capabilities for Vietnamese. * The model was trained on approximately 300,000 triplets of queries, positive documents, and negative documents for Vietnamese. * The model was trained with a maximum sequence length of 2048. ## Model Details ### Model Description - **Model Type:** Sentence Transformer - **Base model:** BAAI/bge-m3 - **Maximum Sequence Length:** 2048 tokens - **Output Dimensionality:** 1024 dimensions - **Similarity Function:** Dot product Similarity - **Language:** Vietnamese - **Licence:** Apache 2.0 ## Usage ### Evaluation: - Dataset: Entire training dataset of Legal Zalo 2021. Our model was not trained on this dataset. | Model | Accuracy@1 | Accuracy@3 | Accuracy@5 | Accuracy@10 | MRR@10 | |----------------------|------------|------------|------------|-------------|--------------| | Vietnamese_Reranker | 0.7944 | 0.9324 | 0.9537 | 0.9740 | 0.8672 | | Vietnamese_Embedding_v2 | 0.7262 | 0.8927 | 0.9268 | 0.9578 | 0.8149 | | Vietnamese_Embedding (public) | 0.7274 | 0.8992 | 0.9305 | 0.9568 | 0.8181 | | Vietnamese-bi-encoder (BKAI) | 0.7109 | 0.8680 | 0.9014 | 0.9299 | 0.7951 | | BGE-M3 | 0.5682 | 0.7728 | 0.8382 | 0.8921 | 0.6822 | Vietnamese_Reranker and Vietnamese_Embedding_v2 was trained on 1100000 triplets. Although the score on the legal domain drops a bit on Vietnamese_Embedding_v2, since this phase data is much larger, it is very good for other domains. You can access 2 model via link: Vietnamese_Embedding_v2, Vietnamese_Reranker You can reproduce the evaluation result by running code python evaluation_model.py (data downloaded from Kaggle). ## Contact Email: [email protected] **Developer** Member: Nguyễn Nho Trung, Nguyễn Nhật Quang ## Citation",
20
+ "model_explanation_gemini": "\"Fine-tuned from BGE-M3 to generate Vietnamese sentence embeddings for improved retrieval tasks, with a 1024-dimensional output and 2048-token sequence length.\"\n\n### Model Features: \n- **Base Model:** BAAI/bge-m3 \n- **Task:** Sentence similarity (dot product) \n- **Language:** Vietnamese \n- **Sequence Length:** 2048 tokens \n- **Output Dimension:** 1024 \n- **Training Data:** ~300K Vietnamese query-d",
21
+ "release_year": null,
22
+ "parameter_count": null,
23
+ "is_fine_tuned": true,
24
+ "category": "Embedding",
25
+ "model_family": "BERT",
26
+ "api_enhanced": true
27
  }
model_data_json/AdamCodd_vit-base-nsfw-detector.json CHANGED
@@ -16,5 +16,10 @@
16
  "region:us"
17
  ],
18
  "description": "--- metrics: - accuracy pipeline_tag: image-classification base_model: google/vit-base-patch16-384 model-index: - name: AdamCodd/vit-base-nsfw-detector results: - task: type: image-classification name: Image Classification metrics: - type: accuracy value: 0.9654 name: Accuracy - type: AUC value: 0.9948 - type: loss value: 0.0937 name: Loss license: apache-2.0 tags: - transformers.js - transformers - nlp --- # vit-base-nsfw-detector This model is a fine-tuned version of vit-base-patch16-384 on around 25_000 images (drawings, photos...). It achieves the following results on the evaluation set: - Loss: 0.0937 - Accuracy: 0.9654 **<u>New [07/30]</u>**: I created a new ViT model specifically to detect NSFW/SFW images for stable diffusion usage (read the disclaimer below for the reason): **AdamCodd/vit-nsfw-stable-diffusion**. **Disclaimer**: This model wasn't made with generative images in mind! There is no generated image in the dataset used here, and it performs significantly worse on generative images, which will require another ViT model specifically trained on generative images. Here are the model's actual scores for generative images to give you an idea: - Loss: 0.3682 (↑ 292.95%) - Accuracy: 0.8600 (↓ 10.91%) - F1: 0.8654 - AUC: 0.9376 (↓ 5.75%) - Precision: 0.8350 - Recall: 0.8980 ## Model description The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a supervised fashion, namely ImageNet-21k, at a resolution of 224x224 pixels. Next, the model was fine-tuned on ImageNet (also referred to as ILSVRC2012), a dataset comprising 1 million images and 1,000 classes, at a higher resolution of 384x384. ## Intended uses & limitations There are two classes: SFW and NSFW. The model has been trained to be restrictive and therefore classify \"sexy\" images as NSFW. That is, if the image shows cleavage or too much skin, it will be classified as NSFW. This is normal. Usage for a local image: Usage for a distant image: Usage with Transformers.js (Vanilla JS): The model has been trained on a variety of images (realistic, 3D, drawings), yet it is not perfect and some images may be wrongly classified as NSFW when they are not. Additionally, please note that using the quantized ONNX model within the transformers.js pipeline will slightly reduce the model's accuracy. You can find a toy implementation of this model with Transformers.js here. ## Training and evaluation data More information needed ## Training procedure ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 3e-05 - train_batch_size: 32 - eval_batch_size: 32 - seed: 42 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 - num_epochs: 1 ### Training results - Validation Loss: 0.0937 - Accuracy: 0.9654, - AUC: 0.9948 Confusion matrix (eval): [1076 37] [ 60 1627] ### Framework versions - Transformers 4.36.2 - Evaluate 0.4.1 If you want to support me, you can here.",
19
- "model_explanation_gemini": "Classifies images as SFW or NSFW with high accuracy but performs worse on AI-generated images. \n\n**Features:** \n- Fine-tuned ViT model (google/vit-base-patch16-384) \n- 96.54% accuracy on evaluation set \n- Two classes: SFW and NSFW (includes \"sexy\" content) \n- Trained on ~25k diverse images (drawings, photos) \n- Not optimized for generative/AI-created images"
 
 
 
 
 
20
  }
 
16
  "region:us"
17
  ],
18
  "description": "--- metrics: - accuracy pipeline_tag: image-classification base_model: google/vit-base-patch16-384 model-index: - name: AdamCodd/vit-base-nsfw-detector results: - task: type: image-classification name: Image Classification metrics: - type: accuracy value: 0.9654 name: Accuracy - type: AUC value: 0.9948 - type: loss value: 0.0937 name: Loss license: apache-2.0 tags: - transformers.js - transformers - nlp --- # vit-base-nsfw-detector This model is a fine-tuned version of vit-base-patch16-384 on around 25_000 images (drawings, photos...). It achieves the following results on the evaluation set: - Loss: 0.0937 - Accuracy: 0.9654 **<u>New [07/30]</u>**: I created a new ViT model specifically to detect NSFW/SFW images for stable diffusion usage (read the disclaimer below for the reason): **AdamCodd/vit-nsfw-stable-diffusion**. **Disclaimer**: This model wasn't made with generative images in mind! There is no generated image in the dataset used here, and it performs significantly worse on generative images, which will require another ViT model specifically trained on generative images. Here are the model's actual scores for generative images to give you an idea: - Loss: 0.3682 (↑ 292.95%) - Accuracy: 0.8600 (↓ 10.91%) - F1: 0.8654 - AUC: 0.9376 (↓ 5.75%) - Precision: 0.8350 - Recall: 0.8980 ## Model description The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a supervised fashion, namely ImageNet-21k, at a resolution of 224x224 pixels. Next, the model was fine-tuned on ImageNet (also referred to as ILSVRC2012), a dataset comprising 1 million images and 1,000 classes, at a higher resolution of 384x384. ## Intended uses & limitations There are two classes: SFW and NSFW. The model has been trained to be restrictive and therefore classify \"sexy\" images as NSFW. That is, if the image shows cleavage or too much skin, it will be classified as NSFW. This is normal. Usage for a local image: Usage for a distant image: Usage with Transformers.js (Vanilla JS): The model has been trained on a variety of images (realistic, 3D, drawings), yet it is not perfect and some images may be wrongly classified as NSFW when they are not. Additionally, please note that using the quantized ONNX model within the transformers.js pipeline will slightly reduce the model's accuracy. You can find a toy implementation of this model with Transformers.js here. ## Training and evaluation data More information needed ## Training procedure ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 3e-05 - train_batch_size: 32 - eval_batch_size: 32 - seed: 42 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 - num_epochs: 1 ### Training results - Validation Loss: 0.0937 - Accuracy: 0.9654, - AUC: 0.9948 Confusion matrix (eval): [1076 37] [ 60 1627] ### Framework versions - Transformers 4.36.2 - Evaluate 0.4.1 If you want to support me, you can here.",
19
+ "model_explanation_gemini": "Classifies images as SFW or NSFW with high accuracy but performs worse on AI-generated images. \n\n**Features:** \n- Fine-tuned ViT model (google/vit-base-patch16-384) \n- 96.54% accuracy on evaluation set \n- Two classes: SFW and NSFW (includes \"sexy\" content) \n- Trained on ~25k diverse images (drawings, photos) \n- Not optimized for generative/AI-created images",
20
+ "release_year": null,
21
+ "parameter_count": null,
22
+ "is_fine_tuned": true,
23
+ "category": "Vision",
24
+ "api_enhanced": true
25
  }
model_data_json/AhmedTaha012_finance-ner-v0.0.8-finetuned-ner.json CHANGED
@@ -14,5 +14,11 @@
14
  "region:us"
15
  ],
16
  "description": "--- license: mit tags: - generated_from_trainer metrics: - precision - recall - f1 - accuracy model-index: - name: finance-ner-v0.0.8-finetuned-ner results: [] --- <!-- This model card has been generated automatically according to the information the Trainer had access to. You should probably proofread and complete it, then remove this comment. --> # finance-ner-v0.0.8-finetuned-ner This model is a fine-tuned version of dslim/bert-base-NER on the None dataset. It achieves the following results on the evaluation set: - Loss: 0.0001 - Precision: 0.9994 - Recall: 0.9997 - F1: 0.9995 - Accuracy: 1.0000 ## Model description More information needed ## Intended uses & limitations More information needed ## Training and evaluation data More information needed ## Training procedure ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 2e-05 - train_batch_size: 8 - eval_batch_size: 8 - seed: 42 - gradient_accumulation_steps: 2 - total_train_batch_size: 16 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 - lr_scheduler_type: linear - num_epochs: 1 ### Training results | Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy | |:-------------:|:-----:|:----:|:---------------:|:---------:|:------:|:------:|:--------:| | 0.0 | 1.0 | 4176 | 0.0001 | 0.9993 | 0.9998 | 0.9996 | 1.0000 | ### Framework versions - Transformers 4.30.2 - Pytorch 2.0.0 - Datasets 2.1.0 - Tokenizers 0.13.3",
17
- "model_explanation_gemini": "Detects financial entities in text by fine-tuning BERT-base-NER, achieving near-perfect precision, recall, and accuracy. \n\n**Features:** \n- Fine-tuned from `dslim/bert-base-NER` \n- Metrics: 99.94% precision, 99.97% recall, 99.95% F1, 100% accuracy \n- Training hyperparameters: 2e-5 learning rate, Adam optimizer, 1 epoch \n\n**Comparison:**"
 
 
 
 
 
 
18
  }
 
14
  "region:us"
15
  ],
16
  "description": "--- license: mit tags: - generated_from_trainer metrics: - precision - recall - f1 - accuracy model-index: - name: finance-ner-v0.0.8-finetuned-ner results: [] --- <!-- This model card has been generated automatically according to the information the Trainer had access to. You should probably proofread and complete it, then remove this comment. --> # finance-ner-v0.0.8-finetuned-ner This model is a fine-tuned version of dslim/bert-base-NER on the None dataset. It achieves the following results on the evaluation set: - Loss: 0.0001 - Precision: 0.9994 - Recall: 0.9997 - F1: 0.9995 - Accuracy: 1.0000 ## Model description More information needed ## Intended uses & limitations More information needed ## Training and evaluation data More information needed ## Training procedure ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 2e-05 - train_batch_size: 8 - eval_batch_size: 8 - seed: 42 - gradient_accumulation_steps: 2 - total_train_batch_size: 16 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 - lr_scheduler_type: linear - num_epochs: 1 ### Training results | Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy | |:-------------:|:-----:|:----:|:---------------:|:---------:|:------:|:------:|:--------:| | 0.0 | 1.0 | 4176 | 0.0001 | 0.9993 | 0.9998 | 0.9996 | 1.0000 | ### Framework versions - Transformers 4.30.2 - Pytorch 2.0.0 - Datasets 2.1.0 - Tokenizers 0.13.3",
17
+ "model_explanation_gemini": "Detects financial entities in text by fine-tuning BERT-base-NER, achieving near-perfect precision, recall, and accuracy. \n\n**Features:** \n- Fine-tuned from `dslim/bert-base-NER` \n- Metrics: 99.94% precision, 99.97% recall, 99.95% F1, 100% accuracy \n- Training hyperparameters: 2e-5 learning rate, Adam optimizer, 1 epoch \n\n**Comparison:**",
18
+ "release_year": null,
19
+ "parameter_count": null,
20
+ "is_fine_tuned": true,
21
+ "category": "Named Entity Recognition",
22
+ "model_family": "BERT",
23
+ "api_enhanced": true
24
  }
model_data_json/Alibaba-NLP_gme-Qwen2-VL-2B-Instruct.json CHANGED
@@ -24,5 +24,11 @@
24
  "region:us"
25
  ],
26
  "description": "--- license: apache-2.0 base_model: - Qwen/Qwen2-VL-2B-Instruct language: - en - zh tags: - mteb - sentence-transformers - transformers - Qwen2-VL - sentence-similarity - vidore model-index: - name: external results: - task: type: STS dataset: type: C-MTEB/AFQMC name: MTEB AFQMC config: default split: validation revision: b44c3b011063adb25877c13823db83bb193913c4 metrics: - type: cos_sim_pearson value: 61.03190209456061 - type: cos_sim_spearman value: 67.54853383020948 - type: euclidean_pearson value: 65.38958681599493 - type: euclidean_spearman value: 67.54853383020948 - type: manhattan_pearson value: 65.25341659273157 - type: manhattan_spearman value: 67.34190190683134 - task: type: STS dataset: type: C-MTEB/ATEC name: MTEB ATEC config: default split: test revision: 0f319b1142f28d00e055a6770f3f726ae9b7d865 metrics: - type: cos_sim_pearson value: 50.83794357648487 - type: cos_sim_spearman value: 54.03230997664373 - type: euclidean_pearson value: 55.2072028123375 - type: euclidean_spearman value: 54.032311102613264 - type: manhattan_pearson value: 55.05163232251946 - type: manhattan_spearman value: 53.81272176804127 - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 72.55223880597015 - type: ap value: 35.01515316721116 - type: f1 value: 66.44086070814382 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 96.75819999999999 - type: ap value: 95.51009242092881 - type: f1 value: 96.75713119357414 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 61.971999999999994 - type: f1 value: 60.50745575187704 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (zh) config: zh split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 53.49 - type: f1 value: 51.576550662258434 - task: type: Retrieval dataset: type: mteb/arguana name: MTEB ArguAna config: default split: test revision: c22ab2a51041ffd869aaddef7af8d8215647e41a metrics: - type: map_at_1 value: 36.272999999999996 - type: map_at_10 value: 52.782 - type: map_at_100 value: 53.339999999999996 - type: map_at_1000 value: 53.342999999999996 - type: map_at_3 value: 48.4 - type: map_at_5 value: 50.882000000000005 - type: mrr_at_1 value: 36.984 - type: mrr_at_10 value: 53.052 - type: mrr_at_100 value: 53.604 - type: mrr_at_1000 value: 53.607000000000006 - type: mrr_at_3 value: 48.613 - type: mrr_at_5 value: 51.159 - type: ndcg_at_1 value: 36.272999999999996 - type: ndcg_at_10 value: 61.524 - type: ndcg_at_100 value: 63.796 - type: ndcg_at_1000 value: 63.869 - type: ndcg_at_3 value: 52.456 - type: ndcg_at_5 value: 56.964000000000006 - type: precision_at_1 value: 36.272999999999996 - type: precision_at_10 value: 8.926 - type: precision_at_100 value: 0.989 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 21.407999999999998 - type: precision_at_5 value: 15.049999999999999 - type: recall_at_1 value: 36.272999999999996 - type: recall_at_10 value: 89.25999999999999 - type: recall_at_100 value: 98.933 - type: recall_at_1000 value: 99.502 - type: recall_at_3 value: 64.225 - type: recall_at_5 value: 75.249 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 52.45236368396085 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 46.83781937870832 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 60.653430349851746 - type: mrr value: 74.28736314470387 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 89.18568151905953 - type: cos_sim_spearman value: 86.47666922475281 - type: euclidean_pearson value: 87.25416218056225 - type: euclidean_spearman value: 86.47666922475281 - type: manhattan_pearson value: 87.04960508086356 - type: manhattan_spearman value: 86.73992823533615 - task: type: STS dataset: type: C-MTEB/BQ name: MTEB BQ config: default split: test revision: e3dda5e115e487b39ec7e618c0c6a29137052a55 metrics: - type: cos_sim_pearson value: 75.7464284612374 - type: cos_sim_spearman value: 77.71894224189296 - type: euclidean_pearson value: 77.63454068918787 - type: euclidean_spearman value: 77.71894224189296 - type: manhattan_pearson value: 77.58744810404339 - type: manhattan_spearman value: 77.63293552726073 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 80.2435064935065 - type: f1 value: 79.44078343737895 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 44.68220155432257 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 40.666150477589284 - task: type: Clustering dataset: type: C-MTEB/CLSClusteringP2P name: MTEB CLSClusteringP2P config: default split: test revision: 4b6227591c6c1a73bc76b1055f3b7f3588e72476 metrics: - type: v_measure value: 44.23533333311907 - task: type: Clustering dataset: type: C-MTEB/CLSClusteringS2S name: MTEB CLSClusteringS2S config: default split: test revision: e458b3f5414b62b7f9f83499ac1f5497ae2e869f metrics: - type: v_measure value: 43.01114481307774 - task: type: Reranking dataset: type: C-MTEB/CMedQAv1-reranking name: MTEB CMedQAv1 config: default split: test revision: 8d7f1e942507dac42dc58017c1a001c3717da7df metrics: - type: map value: 86.4349853821696 - type: mrr value: 88.80150793650795 - task: type: Reranking dataset: type: C-MTEB/CMedQAv2-reranking name: MTEB CMedQAv2 config: default split: test revision: 23d186750531a14a0357ca22cd92d712fd512ea0 metrics: - type: map value: 87.56417400982208 - type: mrr value: 89.85813492063491 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: f46a197baaae43b4f621051089b82a364682dfeb metrics: - type: map_at_1 value: 30.623 - type: map_at_10 value: 40.482 - type: map_at_100 value: 41.997 - type: map_at_1000 value: 42.135 - type: map_at_3 value: 37.754 - type: map_at_5 value: 39.031 - type: mrr_at_1 value: 37.482 - type: mrr_at_10 value: 46.311 - type: mrr_at_100 value: 47.211999999999996 - type: mrr_at_1000 value: 47.27 - type: mrr_at_3 value: 44.157999999999994 - type: mrr_at_5 value: 45.145 - type: ndcg_at_1 value: 37.482 - type: ndcg_at_10 value: 46.142 - type: ndcg_at_100 value: 51.834 - type: ndcg_at_1000 value: 54.164 - type: ndcg_at_3 value: 42.309000000000005 - type: ndcg_at_5 value: 43.485 - type: precision_at_1 value: 37.482 - type: precision_at_10 value: 8.455 - type: precision_at_100 value: 1.3780000000000001 - type: precision_at_1000 value: 0.188 - type: precision_at_3 value: 20.172 - type: precision_at_5 value: 13.705 - type: recall_at_1 value: 30.623 - type: recall_at_10 value: 56.77100000000001 - type: recall_at_100 value: 80.034 - type: recall_at_1000 value: 94.62899999999999 - type: recall_at_3 value: 44.663000000000004 - type: recall_at_5 value: 48.692 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: ad9991cb51e31e31e430383c75ffb2885547b5f0 metrics: - type: map_at_1 value: 27.941 - type: map_at_10 value: 38.437 - type: map_at_100 value: 39.625 - type: map_at_1000 value: 39.753 - type: map_at_3 value: 35.388999999999996 - type: map_at_5 value: 37.113 - type: mrr_at_1 value: 34.522000000000006 - type: mrr_at_10 value: 43.864999999999995 - type: mrr_at_100 value: 44.533 - type: mrr_at_1000 value: 44.580999999999996 - type: mrr_at_3 value: 41.55 - type: mrr_at_5 value: 42.942 - type: ndcg_at_1 value: 34.522000000000006 - type: ndcg_at_10 value: 44.330000000000005 - type: ndcg_at_100 value: 48.61 - type: ndcg_at_1000 value: 50.712999999999994 - type: ndcg_at_3 value: 39.834 - type: ndcg_at_5 value: 42.016 - type: precision_at_1 value: 34.522000000000006 - type: precision_at_10 value: 8.471 - type: precision_at_100 value: 1.3379999999999999 - type: precision_at_1000 value: 0.182 - type: precision_at_3 value: 19.363 - type: precision_at_5 value: 13.898 - type: recall_at_1 value: 27.941 - type: recall_at_10 value: 55.336 - type: recall_at_100 value: 73.51100000000001 - type: recall_at_1000 value: 86.636 - type: recall_at_3 value: 42.54 - type: recall_at_5 value: 48.392 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: 4885aa143210c98657558c04aaf3dc47cfb54340 metrics: - type: map_at_1 value: 32.681 - type: map_at_10 value: 45.48 - type: map_at_100 value: 46.542 - type: map_at_1000 value: 46.604 - type: map_at_3 value: 42.076 - type: map_at_5 value: 44.076 - type: mrr_at_1 value: 37.492 - type: mrr_at_10 value: 48.746 - type: mrr_at_100 value: 49.485 - type: mrr_at_1000 value: 49.517 - type: mrr_at_3 value: 45.998 - type: mrr_at_5 value: 47.681000000000004 - type: ndcg_at_1 value: 37.492 - type: ndcg_at_10 value: 51.778999999999996 - type: ndcg_at_100 value: 56.294 - type: ndcg_at_1000 value: 57.58 - type: ndcg_at_3 value: 45.856 - type: ndcg_at_5 value: 48.968 - type: precision_at_1 value: 37.492 - type: precision_at_10 value: 8.620999999999999 - type: precision_at_100 value: 1.189 - type: precision_at_1000 value: 0.135 - type: precision_at_3 value: 20.773 - type: precision_at_5 value: 14.596 - type: recall_at_1 value: 32.681 - type: recall_at_10 value: 67.196 - type: recall_at_100 value: 87.027 - type: recall_at_1000 value: 96.146 - type: recall_at_3 value: 51.565000000000005 - type: recall_at_5 value: 59.123999999999995 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: 5003b3064772da1887988e05400cf3806fe491f2 metrics: - type: map_at_1 value: 22.421 - type: map_at_10 value: 30.127 - type: map_at_100 value: 31.253999999999998 - type: map_at_1000 value: 31.344 - type: map_at_3 value: 27.673 - type: map_at_5 value: 29.182000000000002 - type: mrr_at_1 value: 24.068 - type: mrr_at_10 value: 31.857000000000003 - type: mrr_at_100 value: 32.808 - type: mrr_at_1000 value: 32.881 - type: mrr_at_3 value: 29.397000000000002 - type: mrr_at_5 value: 30.883 - type: ndcg_at_1 value: 24.068 - type: ndcg_at_10 value: 34.642 - type: ndcg_at_100 value: 40.327 - type: ndcg_at_1000 value: 42.55 - type: ndcg_at_3 value: 29.868 - type: ndcg_at_5 value: 32.461 - type: precision_at_1 value: 24.068 - type: precision_at_10 value: 5.390000000000001 - type: precision_at_100 value: 0.873 - type: precision_at_1000 value: 0.109 - type: precision_at_3 value: 12.692999999999998 - type: precision_at_5 value: 9.107 - type: recall_at_1 value: 22.421 - type: recall_at_10 value: 46.846 - type: recall_at_100 value: 73.409 - type: recall_at_1000 value: 90.06 - type: recall_at_3 value: 34.198 - type: recall_at_5 value: 40.437 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: 90fceea13679c63fe563ded68f3b6f06e50061de metrics: - type: map_at_1 value: 16.494 - type: map_at_10 value: 24.4 - type: map_at_100 value: 25.718999999999998 - type: map_at_1000 value: 25.840000000000003 - type: map_at_3 value: 21.731 - type: map_at_5 value: 23.247999999999998 - type: mrr_at_1 value: 20.274 - type: mrr_at_10 value: 28.866000000000003 - type: mrr_at_100 value: 29.889 - type: mrr_at_1000 value: 29.957 - type: mrr_at_3 value: 26.284999999999997 - type: mrr_at_5 value: 27.79 - type: ndcg_at_1 value: 20.274 - type: ndcg_at_10 value: 29.666999999999998 - type: ndcg_at_100 value: 36.095 - type: ndcg_at_1000 value: 38.87 - type: ndcg_at_3 value: 24.672 - type: ndcg_at_5 value: 27.106 - type: precision_at_1 value: 20.274 - type: precision_at_10 value: 5.5969999999999995 - type: precision_at_100 value: 1.04 - type: precision_at_1000 value: 0.14100000000000001 - type: precision_at_3 value: 12.023 - type: precision_at_5 value: 8.98 - type: recall_at_1 value: 16.494 - type: recall_at_10 value: 41.400999999999996 - type: recall_at_100 value: 69.811 - type: recall_at_1000 value: 89.422 - type: recall_at_3 value: 27.834999999999997 - type: recall_at_5 value: 33.774 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: 79531abbd1fb92d06c6d6315a0cbbbf5bb247ea4 metrics: - type: map_at_1 value: 26.150000000000002 - type: map_at_10 value: 36.012 - type: map_at_100 value: 37.377 - type: map_at_1000 value: 37.497 - type: map_at_3 value: 32.712 - type: map_at_5 value: 34.475 - type: mrr_at_1 value: 32.05 - type: mrr_at_10 value: 41.556 - type: mrr_at_100 value: 42.451 - type: mrr_at_1000 value: 42.498000000000005 - type: mrr_at_3 value: 38.659 - type: mrr_at_5 value: 40.314 - type: ndcg_at_1 value: 32.05 - type: ndcg_at_10 value: 42.132 - type: ndcg_at_100 value: 48.028999999999996 - type: ndcg_at_1000 value: 50.229 - type: ndcg_at_3 value: 36.622 - type: ndcg_at_5 value: 39.062000000000005 - type: precision_at_1 value: 32.05 - type: precision_at_10 value: 7.767 - type: precision_at_100 value: 1.269 - type: precision_at_1000 value: 0.164 - type: precision_at_3 value: 17.355999999999998 - type: precision_at_5 value: 12.474 - type: recall_at_1 value: 26.150000000000002 - type: recall_at_10 value: 55.205000000000005 - type: recall_at_100 value: 80.2 - type: recall_at_1000 value: 94.524 - type: recall_at_3 value: 39.322 - type: recall_at_5 value: 45.761 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: 6184bc1440d2dbc7612be22b50686b8826d22b32 metrics: - type: map_at_1 value: 23.741 - type: map_at_10 value: 33.51 - type: map_at_100 value: 34.882999999999996 - type: map_at_1000 value: 34.995 - type: map_at_3 value: 30.514000000000003 - type: map_at_5 value: 32.085 - type: mrr_at_1 value: 28.653000000000002 - type: mrr_at_10 value: 38.059 - type: mrr_at_100 value: 39.050000000000004 - type: mrr_at_1000 value: 39.107 - type: mrr_at_3 value: 35.445 - type: mrr_at_5 value: 36.849 - type: ndcg_at_1 value: 28.653000000000002 - type: ndcg_at_10 value: 39.186 - type: ndcg_at_100 value: 45.301 - type: ndcg_at_1000 value: 47.547 - type: ndcg_at_3 value: 34.103 - type: ndcg_at_5 value: 36.239 - type: precision_at_1 value: 28.653000000000002 - type: precision_at_10 value: 7.295 - type: precision_at_100 value: 1.2189999999999999 - type: precision_at_1000 value: 0.159 - type: precision_at_3 value: 16.438 - type: precision_at_5 value: 11.804 - type: recall_at_1 value: 23.741 - type: recall_at_10 value: 51.675000000000004 - type: recall_at_100 value: 78.13799999999999 - type: recall_at_1000 value: 93.12700000000001 - type: recall_at_3 value: 37.033 - type: recall_at_5 value: 42.793 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: map_at_1 value: 25.281666666666663 - type: map_at_10 value: 34.080666666666666 - type: map_at_100 value: 35.278749999999995 - type: map_at_1000 value: 35.40183333333333 - type: map_at_3 value: 31.45316666666667 - type: map_at_5 value: 32.92716666666667 - type: mrr_at_1 value: 29.78783333333333 - type: mrr_at_10 value: 38.077333333333335 - type: mrr_at_100 value: 38.936499999999995 - type: mrr_at_1000 value: 39.000249999999994 - type: mrr_at_3 value: 35.7735 - type: mrr_at_5 value: 37.07683333333334 - type: ndcg_at_1 value: 29.78783333333333 - type: ndcg_at_10 value: 39.18300000000001 - type: ndcg_at_100 value: 44.444750000000006 - type: ndcg_at_1000 value: 46.90316666666667 - type: ndcg_at_3 value: 34.69308333333333 - type: ndcg_at_5 value: 36.80316666666666 - type: precision_at_1 value: 29.78783333333333 - type: precision_at_10 value: 6.820749999999999 - type: precision_at_100 value: 1.1224166666666666 - type: precision_at_1000 value: 0.1525 - type: precision_at_3 value: 15.936333333333335 - type: precision_at_5 value: 11.282333333333334 - type: recall_at_1 value: 25.281666666666663 - type: recall_at_10 value: 50.282 - type: recall_at_100 value: 73.54558333333334 - type: recall_at_1000 value: 90.64241666666666 - type: recall_at_3 value: 37.800999999999995 - type: recall_at_5 value: 43.223000000000006 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: 65ac3a16b8e91f9cee4c9828cc7c335575432a2a metrics: - type: map_at_1 value: 23.452 - type: map_at_10 value: 30.231 - type: map_at_100 value: 31.227 - type: map_at_1000 value: 31.338 - type: map_at_3 value: 28.083000000000002 - type: map_at_5 value: 29.125 - type: mrr_at_1 value: 25.613000000000003 - type: mrr_at_10 value: 32.62 - type: mrr_at_100 value: 33.469 - type: mrr_at_1000 value: 33.554 - type: mrr_at_3 value: 30.368000000000002 - type: mrr_at_5 value: 31.502999999999997 - type: ndcg_at_1 value: 25.613000000000003 - type: ndcg_at_10 value: 34.441 - type: ndcg_at_100 value: 39.253 - type: ndcg_at_1000 value: 42.105 - type: ndcg_at_3 value: 30.183 - type: ndcg_at_5 value: 31.917 - type: precision_at_1 value: 25.613000000000003 - type: precision_at_10 value: 5.367999999999999 - type: precision_at_100 value: 0.848 - type: precision_at_1000 value: 0.117 - type: precision_at_3 value: 12.73 - type: precision_at_5 value: 8.773 - type: recall_at_1 value: 23.452 - type: recall_at_10 value: 45.021 - type: recall_at_100 value: 66.563 - type: recall_at_1000 value: 87.713 - type: recall_at_3 value: 33.433 - type: recall_at_5 value: 37.637 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: 46989137a86843e03a6195de44b09deda022eec7 metrics: - type: map_at_1 value: 16.11 - type: map_at_10 value: 22.832 - type: map_at_100 value: 23.829 - type: map_at_1000 value: 23.959 - type: map_at_3 value: 20.66 - type: map_at_5 value: 21.851000000000003 - type: mrr_at_1 value: 19.408 - type: mrr_at_10 value: 26.354 - type: mrr_at_100 value: 27.237000000000002 - type: mrr_at_1000 value: 27.32 - type: mrr_at_3 value: 24.243000000000002 - type: mrr_at_5 value: 25.430000000000003 - type: ndcg_at_1 value: 19.408 - type: ndcg_at_10 value: 27.239 - type: ndcg_at_100 value: 32.286 - type: ndcg_at_1000 value: 35.498000000000005 - type: ndcg_at_3 value: 23.244 - type: ndcg_at_5 value: 25.080999999999996 - type: precision_at_1 value: 19.408 - type: precision_at_10 value: 4.917 - type: precision_at_100 value: 0.874 - type: precision_at_1000 value: 0.133 - type: precision_at_3 value: 10.863 - type: precision_at_5 value: 7.887 - type: recall_at_1 value: 16.11 - type: recall_at_10 value: 37.075 - type: recall_at_100 value: 60.251999999999995 - type: recall_at_1000 value: 83.38600000000001 - type: recall_at_3 value: 25.901999999999997 - type: recall_at_5 value: 30.612000000000002 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: 6c6430d3a6d36f8d2a829195bc5dc94d7e063e53 metrics: - type: map_at_1 value: 25.941 - type: map_at_10 value: 33.711999999999996 - type: map_at_100 value: 34.926 - type: map_at_1000 value: 35.05 - type: map_at_3 value: 31.075000000000003 - type: map_at_5 value: 32.611000000000004 - type: mrr_at_1 value: 30.784 - type: mrr_at_10 value: 38.079 - type: mrr_at_100 value: 39.018 - type: mrr_at_1000 value: 39.09 - type: mrr_at_3 value: 35.603 - type: mrr_at_5 value: 36.988 - type: ndcg_at_1 value: 30.784 - type: ndcg_at_10 value: 38.586 - type: ndcg_at_100 value: 44.205 - type: ndcg_at_1000 value: 46.916000000000004 - type: ndcg_at_3 value: 33.899 - type: ndcg_at_5 value: 36.11 - type: precision_at_1 value: 30.784 - type: precision_at_10 value: 6.409 - type: precision_at_100 value: 1.034 - type: precision_at_1000 value: 0.13799999999999998 - type: precision_at_3 value: 15.112 - type: precision_at_5 value: 10.728 - type: recall_at_1 value: 25.941 - type: recall_at_10 value: 49.242999999999995 - type: recall_at_100 value: 73.85000000000001 - type: recall_at_1000 value: 92.782 - type: recall_at_3 value: 36.204 - type: recall_at_5 value: 41.908 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: 160c094312a0e1facb97e55eeddb698c0abe3571 metrics: - type: map_at_1 value: 24.401999999999997 - type: map_at_10 value: 33.195 - type: map_at_100 value: 34.699999999999996 - type: map_at_1000 value: 34.946 - type: map_at_3 value: 30.570999999999998 - type: map_at_5 value: 32.0 - type: mrr_at_1 value: 28.656 - type: mrr_at_10 value: 37.039 - type: mrr_at_100 value: 38.049 - type: mrr_at_1000 value: 38.108 - type: mrr_at_3 value: 34.717 - type: mrr_at_5 value: 36.07 - type: ndcg_at_1 value: 28.656 - type: ndcg_at_10 value: 38.557 - type: ndcg_at_100 value: 44.511 - type: ndcg_at_1000 value: 47.346 - type: ndcg_at_3 value: 34.235 - type: ndcg_at_5 value: 36.260999999999996 - type: precision_at_1 value: 28.656 - type: precision_at_10 value: 7.312 - type: precision_at_100 value: 1.451 - type: precision_at_1000 value: 0.242 - type: precision_at_3 value: 15.942 - type: precision_at_5 value: 11.66 - type: recall_at_1 value: 24.401999999999997 - type: recall_at_10 value: 48.791000000000004 - type: recall_at_100 value: 76.211 - type: recall_at_1000 value: 93.92 - type: recall_at_3 value: 36.975 - type: recall_at_5 value: 42.01 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: map_at_1 value: 19.07 - type: map_at_10 value: 26.608999999999998 - type: map_at_100 value: 27.625 - type: map_at_1000 value: 27.743000000000002 - type: map_at_3 value: 24.532999999999998 - type: map_at_5 value: 25.671 - type: mrr_at_1 value: 20.518 - type: mrr_at_10 value: 28.541 - type: mrr_at_100 value: 29.453000000000003 - type: mrr_at_1000 value: 29.536 - type: mrr_at_3 value: 26.71 - type: mrr_at_5 value: 27.708 - type: ndcg_at_1 value: 20.518 - type: ndcg_at_10 value: 30.855 - type: ndcg_at_100 value: 35.973 - type: ndcg_at_1000 value: 38.827 - type: ndcg_at_3 value: 26.868 - type: ndcg_at_5 value: 28.74 - type: precision_at_1 value: 20.518 - type: precision_at_10 value: 4.843 - type: precision_at_100 value: 0.799 - type: precision_at_1000 value: 0.116 - type: precision_at_3 value: 11.645 - type: precision_at_5 value: 8.133 - type: recall_at_1 value: 19.07 - type: recall_at_10 value: 41.925000000000004 - type: recall_at_100 value: 65.68 - type: recall_at_1000 value: 86.713 - type: recall_at_3 value: 31.251 - type: recall_at_5 value: 35.653 - task: type: Retrieval dataset: type: mteb/climate-fever name: MTEB ClimateFEVER config: default split: test revision: 47f2ac6acb640fc46020b02a5b59fdda04d39380 metrics: - type: map_at_1 value: 18.762 - type: map_at_10 value: 32.412 - type: map_at_100 value: 34.506 - type: map_at_1000 value: 34.678 - type: map_at_3 value: 27.594 - type: map_at_5 value: 30.128 - type: mrr_at_1 value: 42.345 - type: mrr_at_10 value: 54.443 - type: mrr_at_100 value: 55.05799999999999 - type: mrr_at_1000 value: 55.076 - type: mrr_at_3 value: 51.553000000000004 - type: mrr_at_5 value: 53.269 - type: ndcg_at_1 value: 42.345 - type: ndcg_at_10 value: 42.304 - type: ndcg_at_100 value: 49.425000000000004 - type: ndcg_at_1000 value: 52.123 - type: ndcg_at_3 value: 36.271 - type: ndcg_at_5 value: 38.216 - type: precision_at_1 value: 42.345 - type: precision_at_10 value: 12.808 - type: precision_at_100 value: 2.062 - type: precision_at_1000 value: 0.258 - type: precision_at_3 value: 26.840000000000003 - type: precision_at_5 value: 20.052 - type: recall_at_1 value: 18.762 - type: recall_at_10 value: 47.976 - type: recall_at_100 value: 71.86 - type: recall_at_1000 value: 86.61999999999999 - type: recall_at_3 value: 32.708999999999996 - type: recall_at_5 value: 39.151 - task: type: Retrieval dataset: type: C-MTEB/CmedqaRetrieval name: MTEB CmedqaRetrieval config: default split: dev revision: cd540c506dae1cf9e9a59c3e06f42030d54e7301 metrics: - type: map_at_1 value: 24.871 - type: map_at_10 value: 37.208999999999996 - type: map_at_100 value: 38.993 - type: map_at_1000 value: 39.122 - type: map_at_3 value: 33.2 - type: map_at_5 value: 35.33 - type: mrr_at_1 value: 37.884 - type: mrr_at_10 value: 46.189 - type: mrr_at_100 value: 47.147 - type: mrr_at_1000 value: 47.195 - type: mrr_at_3 value: 43.728 - type: mrr_at_5 value: 44.994 - type: ndcg_at_1 value: 37.884 - type: ndcg_at_10 value: 43.878 - type: ndcg_at_100 value: 51.002 - type: ndcg_at_1000 value: 53.161 - type: ndcg_at_3 value: 38.729 - type: ndcg_at_5 value: 40.628 - type: precision_at_1 value: 37.884 - type: precision_at_10 value: 9.75 - type: precision_at_100 value: 1.558 - type: precision_at_1000 value: 0.183 - type: precision_at_3 value: 21.964 - type: precision_at_5 value: 15.719 - type: recall_at_1 value: 24.871 - type: recall_at_10 value: 54.615 - type: recall_at_100 value: 84.276 - type: recall_at_1000 value: 98.578 - type: recall_at_3 value: 38.936 - type: recall_at_5 value: 45.061 - task: type: PairClassification dataset: type: C-MTEB/CMNLI name: MTEB Cmnli config: default split: validation revision: 41bc36f332156f7adc9e38f53777c959b2ae9766 metrics: - type: cos_sim_accuracy value: 76.12748045700542 - type: cos_sim_ap value: 84.47948419710998 - type: cos_sim_f1 value: 77.88108108108108 - type: cos_sim_precision value: 72.43112809169516 - type: cos_sim_recall value: 84.21790974982464 - type: dot_accuracy value: 76.12748045700542 - type: dot_ap value: 84.4933237839786 - type: dot_f1 value: 77.88108108108108 - type: dot_precision value: 72.43112809169516 - type: dot_recall value: 84.21790974982464 - type: euclidean_accuracy value: 76.12748045700542 - type: euclidean_ap value: 84.47947997540409 - type: euclidean_f1 value: 77.88108108108108 - type: euclidean_precision value: 72.43112809169516 - type: euclidean_recall value: 84.21790974982464 - type: manhattan_accuracy value: 75.40589296452195 - type: manhattan_ap value: 83.74383956930585 - type: manhattan_f1 value: 77.0983342289092 - type: manhattan_precision value: 71.34049323786795 - type: manhattan_recall value: 83.86719663315408 - type: max_accuracy value: 76.12748045700542 - type: max_ap value: 84.4933237839786 - type: max_f1 value: 77.88108108108108 - task: type: Retrieval dataset: type: C-MTEB/CovidRetrieval name: MTEB CovidRetrieval config: default split: dev revision: 1271c7809071a13532e05f25fb53511ffce77117 metrics: - type: map_at_1 value: 66.781 - type: map_at_10 value: 74.539 - type: map_at_100 value: 74.914 - type: map_at_1000 value: 74.921 - type: map_at_3 value: 72.734 - type: map_at_5 value: 73.788 - type: mrr_at_1 value: 66.913 - type: mrr_at_10 value: 74.543 - type: mrr_at_100 value: 74.914 - type: mrr_at_1000 value: 74.921 - type: mrr_at_3 value: 72.831 - type: mrr_at_5 value: 73.76899999999999 - type: ndcg_at_1 value: 67.018 - type: ndcg_at_10 value: 78.34299999999999 - type: ndcg_at_100 value: 80.138 - type: ndcg_at_1000 value: 80.322 - type: ndcg_at_3 value: 74.667 - type: ndcg_at_5 value: 76.518 - type: precision_at_1 value: 67.018 - type: precision_at_10 value: 9.115 - type: precision_at_100 value: 0.996 - type: precision_at_1000 value: 0.101 - type: precision_at_3 value: 26.906000000000002 - type: precision_at_5 value: 17.092 - type: recall_at_1 value: 66.781 - type: recall_at_10 value: 90.253 - type: recall_at_100 value: 98.52499999999999 - type: recall_at_1000 value: 100.0 - type: recall_at_3 value: 80.05799999999999 - type: recall_at_5 value: 84.615 - task: type: Retrieval dataset: type: mteb/dbpedia name: MTEB DBPedia config: default split: test revision: c0f706b76e590d620bd6618b3ca8efdd34e2d659 metrics: - type: map_at_1 value: 9.685 - type: map_at_10 value: 21.65 - type: map_at_100 value: 30.952 - type: map_at_1000 value: 33.049 - type: map_at_3 value: 14.953 - type: map_at_5 value: 17.592 - type: mrr_at_1 value: 72.0 - type: mrr_at_10 value: 78.054 - type: mrr_at_100 value: 78.41900000000001 - type: mrr_at_1000 value: 78.425 - type: mrr_at_3 value: 76.5 - type: mrr_at_5 value: 77.28699999999999 - type: ndcg_at_1 value: 61.25000000000001 - type: ndcg_at_10 value: 46.306000000000004 - type: ndcg_at_100 value: 50.867 - type: ndcg_at_1000 value: 58.533 - type: ndcg_at_3 value: 50.857 - type: ndcg_at_5 value: 48.283 - type: precision_at_1 value: 72.0 - type: precision_at_10 value: 37.3 - type: precision_at_100 value: 11.95 - type: precision_at_1000 value: 2.528 - type: precision_at_3 value: 53.583000000000006 - type: precision_at_5 value: 46.6 - type: recall_at_1 value: 9.685 - type: recall_at_10 value: 27.474999999999998 - type: recall_at_100 value: 56.825 - type: recall_at_1000 value: 81.792 - type: recall_at_3 value: 15.939 - type: recall_at_5 value: 19.853 - task: type: Retrieval dataset: type: C-MTEB/DuRetrieval name: MTEB DuRetrieval config: default split: dev revision: a1a333e290fe30b10f3f56498e3a0d911a693ced metrics: - type: map_at_1 value: 24.528 - type: map_at_10 value: 76.304 - type: map_at_100 value: 79.327 - type: map_at_1000 value: 79.373 - type: map_at_3 value: 52.035 - type: map_at_5 value: 66.074 - type: mrr_at_1 value: 86.05000000000001 - type: mrr_at_10 value: 90.74 - type: mrr_at_100 value: 90.809 - type: mrr_at_1000 value: 90.81099999999999 - type: mrr_at_3 value: 90.30799999999999 - type: mrr_at_5 value: 90.601 - type: ndcg_at_1 value: 86.05000000000001 - type: ndcg_at_10 value: 84.518 - type: ndcg_at_100 value: 87.779 - type: ndcg_at_1000 value: 88.184 - type: ndcg_at_3 value: 82.339 - type: ndcg_at_5 value: 81.613 - type: precision_at_1 value: 86.05000000000001 - type: precision_at_10 value: 40.945 - type: precision_at_100 value: 4.787 - type: precision_at_1000 value: 0.48900000000000005 - type: precision_at_3 value: 74.117 - type: precision_at_5 value: 62.86000000000001 - type: recall_at_1 value: 24.528 - type: recall_at_10 value: 86.78 - type: recall_at_100 value: 97.198 - type: recall_at_1000 value: 99.227 - type: recall_at_3 value: 54.94799999999999 - type: recall_at_5 value: 72.053 - task: type: Retrieval dataset: type: C-MTEB/EcomRetrieval name: MTEB EcomRetrieval config: default split: dev revision: 687de13dc7294d6fd9be10c6945f9e8fec8166b9 metrics: - type: map_at_1 value: 52.1 - type: map_at_10 value: 62.502 - type: map_at_100 value: 63.026 - type: map_at_1000 value: 63.04 - type: map_at_3 value: 59.782999999999994 - type: map_at_5 value: 61.443000000000005 - type: mrr_at_1 value: 52.1 - type: mrr_at_10 value: 62.502 - type: mrr_at_100 value: 63.026 - type: mrr_at_1000 value: 63.04 - type: mrr_at_3 value: 59.782999999999994 - type: mrr_at_5 value: 61.443000000000005 - type: ndcg_at_1 value: 52.1 - type: ndcg_at_10 value: 67.75999999999999 - type: ndcg_at_100 value: 70.072 - type: ndcg_at_1000 value: 70.441 - type: ndcg_at_3 value: 62.28 - type: ndcg_at_5 value: 65.25800000000001 - type: precision_at_1 value: 52.1 - type: precision_at_10 value: 8.43 - type: precision_at_100 value: 0.946 - type: precision_at_1000 value: 0.098 - type: precision_at_3 value: 23.166999999999998 - type: precision_at_5 value: 15.340000000000002 - type: recall_at_1 value: 52.1 - type: recall_at_10 value: 84.3 - type: recall_at_100 value: 94.6 - type: recall_at_1000 value: 97.5 - type: recall_at_3 value: 69.5 - type: recall_at_5 value: 76.7 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 62.805000000000014 - type: f1 value: 56.401757250989384 - task: type: Retrieval dataset: type: mteb/fever name: MTEB FEVER config: default split: test revision: bea83ef9e8fb933d90a2f1d5515737465d613e12 metrics: - type: map_at_1 value: 83.734 - type: map_at_10 value: 90.089 - type: map_at_100 value: 90.274 - type: map_at_1000 value: 90.286 - type: map_at_3 value: 89.281 - type: map_at_5 value: 89.774 - type: mrr_at_1 value: 90.039 - type: mrr_at_10 value: 94.218 - type: mrr_at_100 value: 94.24 - type: mrr_at_1000 value: 94.24 - type: mrr_at_3 value: 93.979 - type: mrr_at_5 value: 94.137 - type: ndcg_at_1 value: 90.039 - type: ndcg_at_10 value: 92.597 - type: ndcg_at_100 value: 93.147 - type: ndcg_at_1000 value: 93.325 - type: ndcg_at_3 value: 91.64999999999999 - type: ndcg_at_5 value: 92.137 - type: precision_at_1 value: 90.039 - type: precision_at_10 value: 10.809000000000001 - type: precision_at_100 value: 1.133 - type: precision_at_1000 value: 0.116 - type: precision_at_3 value: 34.338 - type: precision_at_5 value: 21.089 - type: recall_at_1 value: 83.734 - type: recall_at_10 value: 96.161 - type: recall_at_100 value: 98.137 - type: recall_at_1000 value: 99.182 - type: recall_at_3 value: 93.551 - type: recall_at_5 value: 94.878 - task: type: Retrieval dataset: type: mteb/fiqa name: MTEB FiQA2018 config: default split: test revision: 27a168819829fe9bcd655c2df245fb19452e8e06 metrics: - type: map_at_1 value: 24.529999999999998 - type: map_at_10 value: 37.229 - type: map_at_100 value: 39.333 - type: map_at_1000 value: 39.491 - type: map_at_3 value: 32.177 - type: map_at_5 value: 35.077999999999996 - type: mrr_at_1 value: 45.678999999999995 - type: mrr_at_10 value: 53.952 - type: mrr_at_100 value: 54.727000000000004 - type: mrr_at_1000 value: 54.761 - type: mrr_at_3 value: 51.568999999999996 - type: mrr_at_5 value: 52.973000000000006 - type: ndcg_at_1 value: 45.678999999999995 - type: ndcg_at_10 value: 45.297 - type: ndcg_at_100 value: 52.516 - type: ndcg_at_1000 value: 55.16 - type: ndcg_at_3 value: 40.569 - type: ndcg_at_5 value: 42.49 - type: precision_at_1 value: 45.678999999999995 - type: precision_at_10 value: 12.269 - type: precision_at_100 value: 1.9709999999999999 - type: precision_at_1000 value: 0.244 - type: precision_at_3 value: 25.72 - type: precision_at_5 value: 19.66 - type: recall_at_1 value: 24.529999999999998 - type: recall_at_10 value: 51.983999999999995 - type: recall_at_100 value: 78.217 - type: recall_at_1000 value: 94.104 - type: recall_at_3 value: 36.449999999999996 - type: recall_at_5 value: 43.336999999999996 - task: type: Retrieval dataset: type: mteb/hotpotqa name: MTEB HotpotQA config: default split: test revision: ab518f4d6fcca38d87c25209f94beba119d02014 metrics: - type: map_at_1 value: 41.519 - type: map_at_10 value: 64.705 - type: map_at_100 value: 65.554 - type: map_at_1000 value: 65.613 - type: map_at_3 value: 61.478 - type: map_at_5 value: 63.55800000000001 - type: mrr_at_1 value: 83.038 - type: mrr_at_10 value: 87.82900000000001 - type: mrr_at_100 value: 87.96000000000001 - type: mrr_at_1000 value: 87.96300000000001 - type: mrr_at_3 value: 87.047 - type: mrr_at_5 value: 87.546 - type: ndcg_at_1 value: 83.038 - type: ndcg_at_10 value: 72.928 - type: ndcg_at_100 value: 75.778 - type: ndcg_at_1000 value: 76.866 - type: ndcg_at_3 value: 68.46600000000001 - type: ndcg_at_5 value: 71.036 - type: precision_at_1 value: 83.038 - type: precision_at_10 value: 15.040999999999999 - type: precision_at_100 value: 1.7260000000000002 - type: precision_at_1000 value: 0.187 - type: precision_at_3 value: 43.597 - type: precision_at_5 value: 28.188999999999997 - type: recall_at_1 value: 41.519 - type: recall_at_10 value: 75.20599999999999 - type: recall_at_100 value: 86.3 - type: recall_at_1000 value: 93.437 - type: recall_at_3 value: 65.39500000000001 - type: recall_at_5 value: 70.473 - task: type: Classification dataset: type: C-MTEB/IFlyTek-classification name: MTEB IFlyTek config: default split: validation revision: 421605374b29664c5fc098418fe20ada9bd55f8a metrics: - type: accuracy value: 52.04309349749903 - type: f1 value: 39.91893257315586 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 96.0428 - type: ap value: 94.48278082595033 - type: f1 value: 96.0409595432081 - task: type: Classification dataset: type: C-MTEB/JDReview-classification name: MTEB JDReview config: default split: test revision: b7c64bd89eb87f8ded463478346f76731f07bf8b metrics: - type: accuracy value: 85.60975609756099 - type: ap value: 54.30148799475452 - type: f1 value: 80.55899583002706 - task: type: STS dataset: type: C-MTEB/LCQMC name: MTEB LCQMC config: default split: test revision: 17f9b096f80380fce5ed12a9be8be7784b337daf metrics: - type: cos_sim_pearson value: 66.44418108776416 - type: cos_sim_spearman value: 72.79912770347306 - type: euclidean_pearson value: 71.11194894579198 - type: euclidean_spearman value: 72.79912104971427 - type: manhattan_pearson value: 70.96800061808604 - type: manhattan_spearman value: 72.63525186107175 - task: type: Reranking dataset: type: C-MTEB/Mmarco-reranking name: MTEB MMarcoReranking config: default split: dev revision: 8e0c766dbe9e16e1d221116a3f36795fbade07f6 metrics: - type: map value: 27.9616280919871 - type: mrr value: 26.544047619047618 - task: type: Retrieval dataset: type: C-MTEB/MMarcoRetrieval name: MTEB MMarcoRetrieval config: default split: dev revision: 539bbde593d947e2a124ba72651aafc09eb33fc2 metrics: - type: map_at_1 value: 68.32300000000001 - type: map_at_10 value: 77.187 - type: map_at_100 value: 77.496 - type: map_at_1000 value: 77.503 - type: map_at_3 value: 75.405 - type: map_at_5 value: 76.539 - type: mrr_at_1 value: 70.616 - type: mrr_at_10 value: 77.703 - type: mrr_at_100 value: 77.97699999999999 - type: mrr_at_1000 value: 77.984 - type: mrr_at_3 value: 76.139 - type: mrr_at_5 value: 77.125 - type: ndcg_at_1 value: 70.616 - type: ndcg_at_10 value: 80.741 - type: ndcg_at_100 value: 82.123 - type: ndcg_at_1000 value: 82.32300000000001 - type: ndcg_at_3 value: 77.35600000000001 - type: ndcg_at_5 value: 79.274 - type: precision_at_1 value: 70.616 - type: precision_at_10 value: 9.696 - type: precision_at_100 value: 1.038 - type: precision_at_1000 value: 0.106 - type: precision_at_3 value: 29.026000000000003 - type: precision_at_5 value: 18.433 - type: recall_at_1 value: 68.32300000000001 - type: recall_at_10 value: 91.186 - type: recall_at_100 value: 97.439 - type: recall_at_1000 value: 99.004 - type: recall_at_3 value: 82.218 - type: recall_at_5 value: 86.797 - task: type: Retrieval dataset: type: mteb/msmarco name: MTEB MSMARCO config: default split: dev revision: c5a29a104738b98a9e76336939199e264163d4a0 metrics: - type: map_at_1 value: 21.496000000000002 - type: map_at_10 value: 33.82 - type: map_at_100 value: 35.013 - type: map_at_1000 value: 35.063 - type: map_at_3 value: 29.910999999999998 - type: map_at_5 value: 32.086 - type: mrr_at_1 value: 22.092 - type: mrr_at_10 value: 34.404 - type: mrr_at_100 value: 35.534 - type: mrr_at_1000 value: 35.577999999999996 - type: mrr_at_3 value: 30.544 - type: mrr_at_5 value: 32.711 - type: ndcg_at_1 value: 22.092 - type: ndcg_at_10 value: 40.877 - type: ndcg_at_100 value: 46.619 - type: ndcg_at_1000 value: 47.823 - type: ndcg_at_3 value: 32.861000000000004 - type: ndcg_at_5 value: 36.769 - type: precision_at_1 value: 22.092 - type: precision_at_10 value: 6.54 - type: precision_at_100 value: 0.943 - type: precision_at_1000 value: 0.105 - type: precision_at_3 value: 14.069 - type: precision_at_5 value: 10.424 - type: recall_at_1 value: 21.496000000000002 - type: recall_at_10 value: 62.67 - type: recall_at_100 value: 89.24499999999999 - type: recall_at_1000 value: 98.312 - type: recall_at_3 value: 40.796 - type: recall_at_5 value: 50.21600000000001 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 95.74555403556772 - type: f1 value: 95.61381879323093 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 85.82763337893297 - type: f1 value: 63.17139719465236 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 78.51714862138535 - type: f1 value: 76.3995118440293 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (zh-CN) config: zh-CN split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 74.78143913920646 - type: f1 value: 72.6141122227626 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 80.03698722259583 - type: f1 value: 79.36511484240766 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (zh-CN) config: zh-CN split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 76.98722259583053 - type: f1 value: 76.5974920207624 - task: type: Retrieval dataset: type: C-MTEB/MedicalRetrieval name: MTEB MedicalRetrieval config: default split: dev revision: 2039188fb5800a9803ba5048df7b76e6fb151fc6 metrics: - type: map_at_1 value: 51.800000000000004 - type: map_at_10 value: 57.938 - type: map_at_100 value: 58.494 - type: map_at_1000 value: 58.541 - type: map_at_3 value: 56.617 - type: map_at_5 value: 57.302 - type: mrr_at_1 value: 51.800000000000004 - type: mrr_at_10 value: 57.938 - type: mrr_at_100 value: 58.494 - type: mrr_at_1000 value: 58.541 - type: mrr_at_3 value: 56.617 - type: mrr_at_5 value: 57.302 - type: ndcg_at_1 value: 51.800000000000004 - type: ndcg_at_10 value: 60.891 - type: ndcg_at_100 value: 63.897000000000006 - type: ndcg_at_1000 value: 65.231 - type: ndcg_at_3 value: 58.108000000000004 - type: ndcg_at_5 value: 59.343 - type: precision_at_1 value: 51.800000000000004 - type: precision_at_10 value: 7.02 - type: precision_at_100 value: 0.8500000000000001 - type: precision_at_1000 value: 0.096 - type: precision_at_3 value: 20.8 - type: precision_at_5 value: 13.08 - type: recall_at_1 value: 51.800000000000004 - type: recall_at_10 value: 70.19999999999999 - type: recall_at_100 value: 85.0 - type: recall_at_1000 value: 95.7 - type: recall_at_3 value: 62.4 - type: recall_at_5 value: 65.4 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 38.68901889835701 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 38.0740589898848 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 33.41312482460189 - type: mrr value: 34.713530863302495 - task: type: Classification dataset: type: C-MTEB/MultilingualSentiment-classification name: MTEB MultilingualSentiment config: default split: validation revision: 46958b007a63fdbf239b7672c25d0bea67b5ea1a metrics: - type: accuracy value: 80.39333333333335 - type: f1 value: 80.42683132366277 - task: type: Retrieval dataset: type: mteb/nfcorpus name: MTEB NFCorpus config: default split: test revision: ec0fa4fe99da2ff19ca1214b7966684033a58814 metrics: - type: map_at_1 value: 6.232 - type: map_at_10 value: 13.442000000000002 - type: map_at_100 value: 17.443 - type: map_at_1000 value: 19.1 - type: map_at_3 value: 9.794 - type: map_at_5 value: 11.375 - type: mrr_at_1 value: 50.15500000000001 - type: mrr_at_10 value: 58.628 - type: mrr_at_100 value: 59.077 - type: mrr_at_1000 value: 59.119 - type: mrr_at_3 value: 56.914 - type: mrr_at_5 value: 57.921 - type: ndcg_at_1 value: 48.762 - type: ndcg_at_10 value: 37.203 - type: ndcg_at_100 value: 34.556 - type: ndcg_at_1000 value: 43.601 - type: ndcg_at_3 value: 43.004 - type: ndcg_at_5 value: 40.181 - type: precision_at_1 value: 50.15500000000001 - type: precision_at_10 value: 27.276 - type: precision_at_100 value: 8.981 - type: precision_at_1000 value: 2.228 - type: precision_at_3 value: 39.628 - type: precision_at_5 value: 33.808 - type: recall_at_1 value: 6.232 - type: recall_at_10 value: 18.137 - type: recall_at_100 value: 36.101 - type: recall_at_1000 value: 68.733 - type: recall_at_3 value: 10.978 - type: recall_at_5 value: 13.718 - task: type: Retrieval dataset: type: mteb/nq name: MTEB NQ config: default split: test revision: b774495ed302d8c44a3a7ea25c90dbce03968f31 metrics: - type: map_at_1 value: 35.545 - type: map_at_10 value: 52.083 - type: map_at_100 value: 52.954 - type: map_at_1000 value: 52.96999999999999 - type: map_at_3 value: 47.508 - type: map_at_5 value: 50.265 - type: mrr_at_1 value: 40.122 - type: mrr_at_10 value: 54.567 - type: mrr_at_100 value: 55.19199999999999 - type: mrr_at_1000 value: 55.204 - type: mrr_at_3 value: 51.043000000000006 - type: mrr_at_5 value: 53.233 - type: ndcg_at_1 value: 40.122 - type: ndcg_at_10 value: 60.012 - type: ndcg_at_100 value: 63.562 - type: ndcg_at_1000 value: 63.94 - type: ndcg_at_3 value: 51.681 - type: ndcg_at_5 value: 56.154 - type: precision_at_1 value: 40.122 - type: precision_at_10 value: 9.774 - type: precision_at_100 value: 1.176 - type: precision_at_1000 value: 0.121 - type: precision_at_3 value: 23.426 - type: precision_at_5 value: 16.686 - type: recall_at_1 value: 35.545 - type: recall_at_10 value: 81.557 - type: recall_at_100 value: 96.729 - type: recall_at_1000 value: 99.541 - type: recall_at_3 value: 60.185 - type: recall_at_5 value: 70.411 - task: type: PairClassification dataset: type: C-MTEB/OCNLI name: MTEB Ocnli config: default split: validation revision: 66e76a618a34d6d565d5538088562851e6daa7ec metrics: - type: cos_sim_accuracy value: 70.7634001082837 - type: cos_sim_ap value: 74.97527385556558 - type: cos_sim_f1 value: 72.77277277277277 - type: cos_sim_precision value: 69.17221693625119 - type: cos_sim_recall value: 76.76874340021119 - type: dot_accuracy value: 70.7634001082837 - type: dot_ap value: 74.97527385556558 - type: dot_f1 value: 72.77277277277277 - type: dot_precision value: 69.17221693625119 - type: dot_recall value: 76.76874340021119 - type: euclidean_accuracy value: 70.7634001082837 - type: euclidean_ap value: 74.97527385556558 - type: euclidean_f1 value: 72.77277277277277 - type: euclidean_precision value: 69.17221693625119 - type: euclidean_recall value: 76.76874340021119 - type: manhattan_accuracy value: 69.89713048186248 - type: manhattan_ap value: 74.25943370061067 - type: manhattan_f1 value: 72.17268887846082 - type: manhattan_precision value: 64.94932432432432 - type: manhattan_recall value: 81.20380147835269 - type: max_accuracy value: 70.7634001082837 - type: max_ap value: 74.97527385556558 - type: max_f1 value: 72.77277277277277 - task: type: Classification dataset: type: C-MTEB/OnlineShopping-classification name: MTEB OnlineShopping config: default split: test revision: e610f2ebd179a8fda30ae534c3878750a96db120 metrics: - type: accuracy value: 92.92000000000002 - type: ap value: 91.98475625106201 - type: f1 value: 92.91841470541901 - task: type: STS dataset: type: C-MTEB/PAWSX name: MTEB PAWSX config: default split: test revision: 9c6a90e430ac22b5779fb019a23e820b11a8b5e1 metrics: - type: cos_sim_pearson value: 41.23764415526825 - type: cos_sim_spearman value: 46.872669471694664 - type: euclidean_pearson value: 46.434144530918566 - type: euclidean_spearman value: 46.872669471694664 - type: manhattan_pearson value: 46.39678126910133 - type: manhattan_spearman value: 46.55877754642116 - task: type: STS dataset: type: C-MTEB/QBQTC name: MTEB QBQTC config: default split: test revision: 790b0510dc52b1553e8c49f3d2afb48c0e5c48b7 metrics: - type: cos_sim_pearson value: 28.77503601696299 - type: cos_sim_spearman value: 31.818095557325606 - type: euclidean_pearson value: 29.811479220397125 - type: euclidean_spearman value: 31.817046821577673 - type: manhattan_pearson value: 29.901628633314214 - type: manhattan_spearman value: 31.991472038092084 - task: type: Retrieval dataset: type: mteb/quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 68.908 - type: map_at_10 value: 83.19 - type: map_at_100 value: 83.842 - type: map_at_1000 value: 83.858 - type: map_at_3 value: 80.167 - type: map_at_5 value: 82.053 - type: mrr_at_1 value: 79.46 - type: mrr_at_10 value: 86.256 - type: mrr_at_100 value: 86.37 - type: mrr_at_1000 value: 86.371 - type: mrr_at_3 value: 85.177 - type: mrr_at_5 value: 85.908 - type: ndcg_at_1 value: 79.5 - type: ndcg_at_10 value: 87.244 - type: ndcg_at_100 value: 88.532 - type: ndcg_at_1000 value: 88.626 - type: ndcg_at_3 value: 84.161 - type: ndcg_at_5 value: 85.835 - type: precision_at_1 value: 79.5 - type: precision_at_10 value: 13.339 - type: precision_at_100 value: 1.53 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 36.97 - type: precision_at_5 value: 24.384 - type: recall_at_1 value: 68.908 - type: recall_at_10 value: 95.179 - type: recall_at_100 value: 99.579 - type: recall_at_1000 value: 99.964 - type: recall_at_3 value: 86.424 - type: recall_at_5 value: 91.065 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 65.17897847862794 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 66.22194961632586 - task: type: Retrieval dataset: type: mteb/scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 5.668 - type: map_at_10 value: 13.921 - type: map_at_100 value: 16.391 - type: map_at_1000 value: 16.749 - type: map_at_3 value: 10.001999999999999 - type: map_at_5 value: 11.974 - type: mrr_at_1 value: 27.800000000000004 - type: mrr_at_10 value: 39.290000000000006 - type: mrr_at_100 value: 40.313 - type: mrr_at_1000 value: 40.355999999999995 - type: mrr_at_3 value: 35.667 - type: mrr_at_5 value: 37.742 - type: ndcg_at_1 value: 27.800000000000004 - type: ndcg_at_10 value: 23.172 - type: ndcg_at_100 value: 32.307 - type: ndcg_at_1000 value: 38.048 - type: ndcg_at_3 value: 22.043 - type: ndcg_at_5 value: 19.287000000000003 - type: precision_at_1 value: 27.800000000000004 - type: precision_at_10 value: 11.95 - type: precision_at_100 value: 2.5260000000000002 - type: precision_at_1000 value: 0.38999999999999996 - type: precision_at_3 value: 20.433 - type: precision_at_5 value: 16.84 - type: recall_at_1 value: 5.668 - type: recall_at_10 value: 24.22 - type: recall_at_100 value: 51.217 - type: recall_at_1000 value: 79.10000000000001 - type: recall_at_3 value: 12.443 - type: recall_at_5 value: 17.068 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 82.83535239748218 - type: cos_sim_spearman value: 73.98553311584509 - type: euclidean_pearson value: 79.57336200069007 - type: euclidean_spearman value: 73.98553926018461 - type: manhattan_pearson value: 79.02277757114132 - type: manhattan_spearman value: 73.52350678760683 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 81.99055838690317 - type: cos_sim_spearman value: 72.05290668592296 - type: euclidean_pearson value: 81.7130610313565 - type: euclidean_spearman value: 72.0529066787229 - type: manhattan_pearson value: 82.09213883730894 - type: manhattan_spearman value: 72.5171577483134 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 84.4685161191763 - type: cos_sim_spearman value: 84.4847436140129 - type: euclidean_pearson value: 84.05016757016948 - type: euclidean_spearman value: 84.48474353891532 - type: manhattan_pearson value: 83.83064062713048 - type: manhattan_spearman value: 84.30431591842805 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 83.00171021092486 - type: cos_sim_spearman value: 77.91329577609622 - type: euclidean_pearson value: 81.49758593915315 - type: euclidean_spearman value: 77.91329577609622 - type: manhattan_pearson value: 81.23255996803785 - type: manhattan_spearman value: 77.80027024941825 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 86.62608607472492 - type: cos_sim_spearman value: 87.62293916855751 - type: euclidean_pearson value: 87.04313886714989 - type: euclidean_spearman value: 87.62293907119869 - type: manhattan_pearson value: 86.97266321040769 - type: manhattan_spearman value: 87.61807042381702 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 80.8012095789289 - type: cos_sim_spearman value: 81.91868918081325 - type: euclidean_pearson value: 81.2267973811213 - type: euclidean_spearman value: 81.91868918081325 - type: manhattan_pearson value: 81.0173457901168 - type: manhattan_spearman value: 81.79743115887055 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 88.39698537303725 - type: cos_sim_spearman value: 88.78668529808967 - type: euclidean_pearson value: 88.78863351718252 - type: euclidean_spearman value: 88.78668529808967 - type: manhattan_pearson value: 88.41678215762478 - type: manhattan_spearman value: 88.3827998418763 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_pearson value: 68.49024974161408 - type: cos_sim_spearman value: 69.19917146180619 - type: euclidean_pearson value: 70.48882819806336 - type: euclidean_spearman value: 69.19917146180619 - type: manhattan_pearson value: 70.86827961779932 - type: manhattan_spearman value: 69.38456983992613 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (zh) config: zh split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_pearson value: 67.41628669863584 - type: cos_sim_spearman value: 67.87238206703478 - type: euclidean_pearson value: 67.67834985311778 - type: euclidean_spearman value: 67.87238206703478 - type: manhattan_pearson value: 68.23423896742973 - type: manhattan_spearman value: 68.27069260687092 - task: type: STS dataset: type: C-MTEB/STSB name: MTEB STSB config: default split: test revision: 0cde68302b3541bb8b3c340dc0644b0b745b3dc0 metrics: - type: cos_sim_pearson value: 77.31628954400037 - type: cos_sim_spearman value: 76.83296022489624 - type: euclidean_pearson value: 76.69680425261211 - type: euclidean_spearman value: 76.83287843321102 - type: manhattan_pearson value: 76.65603163327958 - type: manhattan_spearman value: 76.80803503360451 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 84.31376078795105 - type: cos_sim_spearman value: 83.3985199217591 - type: euclidean_pearson value: 84.06630133719332 - type: euclidean_spearman value: 83.3985199217591 - type: manhattan_pearson value: 83.7896654474364 - type: manhattan_spearman value: 83.1885039212299 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 85.83161002188668 - type: mrr value: 96.19253114351153 - task: type: Retrieval dataset: type: mteb/scifact name: MTEB SciFact config: default split: test revision: 0228b52cf27578f30900b9e5271d331663a030d7 metrics: - type: map_at_1 value: 48.132999999999996 - type: map_at_10 value: 58.541 - type: map_at_100 value: 59.34 - type: map_at_1000 value: 59.367999999999995 - type: map_at_3 value: 55.191 - type: map_at_5 value: 57.084 - type: mrr_at_1 value: 51.0 - type: mrr_at_10 value: 59.858 - type: mrr_at_100 value: 60.474000000000004 - type: mrr_at_1000 value: 60.501000000000005 - type: mrr_at_3 value: 57.111000000000004 - type: mrr_at_5 value: 58.694 - type: ndcg_at_1 value: 51.0 - type: ndcg_at_10 value: 63.817 - type: ndcg_at_100 value: 67.229 - type: ndcg_at_1000 value: 67.94 - type: ndcg_at_3 value: 57.896 - type: ndcg_at_5 value: 60.785999999999994 - type: precision_at_1 value: 51.0 - type: precision_at_10 value: 8.933 - type: precision_at_100 value: 1.0699999999999998 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 23.111 - type: precision_at_5 value: 15.733 - type: recall_at_1 value: 48.132999999999996 - type: recall_at_10 value: 78.922 - type: recall_at_100 value: 94.167 - type: recall_at_1000 value: 99.667 - type: recall_at_3 value: 62.806 - type: recall_at_5 value: 70.078 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.88415841584158 - type: cos_sim_ap value: 97.72557886493401 - type: cos_sim_f1 value: 94.1294530858003 - type: cos_sim_precision value: 94.46122860020141 - type: cos_sim_recall value: 93.8 - type: dot_accuracy value: 99.88415841584158 - type: dot_ap value: 97.72557439066108 - type: dot_f1 value: 94.1294530858003 - type: dot_precision value: 94.46122860020141 - type: dot_recall value: 93.8 - type: euclidean_accuracy value: 99.88415841584158 - type: euclidean_ap value: 97.72557439066108 - type: euclidean_f1 value: 94.1294530858003 - type: euclidean_precision value: 94.46122860020141 - type: euclidean_recall value: 93.8 - type: manhattan_accuracy value: 99.88514851485148 - type: manhattan_ap value: 97.73324334051959 - type: manhattan_f1 value: 94.1825476429288 - type: manhattan_precision value: 94.46680080482898 - type: manhattan_recall value: 93.89999999999999 - type: max_accuracy value: 99.88514851485148 - type: max_ap value: 97.73324334051959 - type: max_f1 value: 94.1825476429288 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 72.8168026381278 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 44.30948635130784 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 54.11268548719803 - type: mrr value: 55.08079747050335 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.82885852096243 - type: cos_sim_spearman value: 30.800770979226076 - type: dot_pearson value: 30.82885608827704 - type: dot_spearman value: 30.800770979226076 - task: type: Reranking dataset: type: C-MTEB/T2Reranking name: MTEB T2Reranking config: default split: dev revision: 76631901a18387f85eaa53e5450019b87ad58ef9 metrics: - type: map value: 66.73038448968596 - type: mrr value: 77.26510193334836 - task: type: Retrieval dataset: type: C-MTEB/T2Retrieval name: MTEB T2Retrieval config: default split: dev revision: 8731a845f1bf500a4f111cf1070785c793d10e64 metrics: - type: map_at_1 value: 28.157 - type: map_at_10 value: 79.00399999999999 - type: map_at_100 value: 82.51899999999999 - type: map_at_1000 value: 82.577 - type: map_at_3 value: 55.614 - type: map_at_5 value: 68.292 - type: mrr_at_1 value: 91.167 - type: mrr_at_10 value: 93.391 - type: mrr_at_100 value: 93.467 - type: mrr_at_1000 value: 93.47 - type: mrr_at_3 value: 93.001 - type: mrr_at_5 value: 93.254 - type: ndcg_at_1 value: 91.167 - type: ndcg_at_10 value: 86.155 - type: ndcg_at_100 value: 89.425 - type: ndcg_at_1000 value: 89.983 - type: ndcg_at_3 value: 87.516 - type: ndcg_at_5 value: 86.148 - type: precision_at_1 value: 91.167 - type: precision_at_10 value: 42.697 - type: precision_at_100 value: 5.032 - type: precision_at_1000 value: 0.516 - type: precision_at_3 value: 76.45100000000001 - type: precision_at_5 value: 64.051 - type: recall_at_1 value: 28.157 - type: recall_at_10 value: 84.974 - type: recall_at_100 value: 95.759 - type: recall_at_1000 value: 98.583 - type: recall_at_3 value: 57.102 - type: recall_at_5 value: 71.383 - task: type: Classification dataset: type: C-MTEB/TNews-classification name: MTEB TNews config: default split: validation revision: 317f262bf1e6126357bbe89e875451e4b0938fe4 metrics: - type: accuracy value: 55.031 - type: f1 value: 53.07992810732314 - task: type: Retrieval dataset: type: mteb/trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.20400000000000001 - type: map_at_10 value: 1.27 - type: map_at_100 value: 7.993 - type: map_at_1000 value: 20.934 - type: map_at_3 value: 0.469 - type: map_at_5 value: 0.716 - type: mrr_at_1 value: 76.0 - type: mrr_at_10 value: 84.967 - type: mrr_at_100 value: 84.967 - type: mrr_at_1000 value: 84.967 - type: mrr_at_3 value: 83.667 - type: mrr_at_5 value: 84.967 - type: ndcg_at_1 value: 69.0 - type: ndcg_at_10 value: 59.243 - type: ndcg_at_100 value: 48.784 - type: ndcg_at_1000 value: 46.966 - type: ndcg_at_3 value: 64.14 - type: ndcg_at_5 value: 61.60600000000001 - type: precision_at_1 value: 76.0 - type: precision_at_10 value: 62.6 - type: precision_at_100 value: 50.18 - type: precision_at_1000 value: 21.026 - type: precision_at_3 value: 68.667 - type: precision_at_5 value: 66.0 - type: recall_at_1 value: 0.20400000000000001 - type: recall_at_10 value: 1.582 - type: recall_at_100 value: 11.988 - type: recall_at_1000 value: 44.994 - type: recall_at_3 value: 0.515 - type: recall_at_5 value: 0.844 - task: type: Clustering dataset: type: C-MTEB/ThuNewsClusteringP2P name: MTEB ThuNewsClusteringP2P config: default split: test revision: 5798586b105c0434e4f0fe5e767abe619442cf93 metrics: - type: v_measure value: 72.80915114296552 - task: type: Clustering dataset: type: C-MTEB/ThuNewsClusteringS2S name: MTEB ThuNewsClusteringS2S config: default split: test revision: 8a8b2caeda43f39e13c4bc5bea0f8a667896e10d metrics: - type: v_measure value: 70.86374654127641 - task: type: Retrieval dataset: type: mteb/touche2020 name: MTEB Touche2020 config: default split: test revision: a34f9a33db75fa0cbb21bb5cfc3dae8dc8bec93f metrics: - type: map_at_1 value: 3.3009999999999997 - type: map_at_10 value: 11.566 - type: map_at_100 value: 17.645 - type: map_at_1000 value: 19.206 - type: map_at_3 value: 6.986000000000001 - type: map_at_5 value: 8.716 - type: mrr_at_1 value: 42.857 - type: mrr_at_10 value: 58.287 - type: mrr_at_100 value: 59.111000000000004 - type: mrr_at_1000 value: 59.111000000000004 - type: mrr_at_3 value: 55.102 - type: mrr_at_5 value: 57.449 - type: ndcg_at_1 value: 39.796 - type: ndcg_at_10 value: 29.059 - type: ndcg_at_100 value: 40.629 - type: ndcg_at_1000 value: 51.446000000000005 - type: ndcg_at_3 value: 36.254999999999995 - type: ndcg_at_5 value: 32.216 - type: precision_at_1 value: 42.857 - type: precision_at_10 value: 23.469 - type: precision_at_100 value: 8.041 - type: precision_at_1000 value: 1.551 - type: precision_at_3 value: 36.735 - type: precision_at_5 value: 30.203999999999997 - type: recall_at_1 value: 3.3009999999999997 - type: recall_at_10 value: 17.267 - type: recall_at_100 value: 49.36 - type: recall_at_1000 value: 83.673 - type: recall_at_3 value: 8.049000000000001 - type: recall_at_5 value: 11.379999999999999 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 88.7576 - type: ap value: 35.52110634325751 - type: f1 value: 74.14476947482417 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 73.52009054895304 - type: f1 value: 73.81407409876577 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 54.35358706465052 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 83.65619598259522 - type: cos_sim_ap value: 65.824087818991 - type: cos_sim_f1 value: 61.952620244077536 - type: cos_sim_precision value: 56.676882661996494 - type: cos_sim_recall value: 68.311345646438 - type: dot_accuracy value: 83.65619598259522 - type: dot_ap value: 65.82406256999921 - type: dot_f1 value: 61.952620244077536 - type: dot_precision value: 56.676882661996494 - type: dot_recall value: 68.311345646438 - type: euclidean_accuracy value: 83.65619598259522 - type: euclidean_ap value: 65.82409143427542 - type: euclidean_f1 value: 61.952620244077536 - type: euclidean_precision value: 56.676882661996494 - type: euclidean_recall value: 68.311345646438 - type: manhattan_accuracy value: 83.4296954163438 - type: manhattan_ap value: 65.20662449614932 - type: manhattan_f1 value: 61.352885525070946 - type: manhattan_precision value: 55.59365623660523 - type: manhattan_recall value: 68.44327176781002 - type: max_accuracy value: 83.65619598259522 - type: max_ap value: 65.82409143427542 - type: max_f1 value: 61.952620244077536 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 87.90119144642372 - type: cos_sim_ap value: 84.04753852793387 - type: cos_sim_f1 value: 76.27737226277372 - type: cos_sim_precision value: 73.86757068667052 - type: cos_sim_recall value: 78.84970742223591 - type: dot_accuracy value: 87.90119144642372 - type: dot_ap value: 84.04753668117337 - type: dot_f1 value: 76.27737226277372 - type: dot_precision value: 73.86757068667052 - type: dot_recall value: 78.84970742223591 - type: euclidean_accuracy value: 87.90119144642372 - type: euclidean_ap value: 84.04754553468206 - type: euclidean_f1 value: 76.27737226277372 - type: euclidean_precision value: 73.86757068667052 - type: euclidean_recall value: 78.84970742223591 - type: manhattan_accuracy value: 87.87014398261343 - type: manhattan_ap value: 84.05164646221583 - type: manhattan_f1 value: 76.31392706820128 - type: manhattan_precision value: 73.91586694566708 - type: manhattan_recall value: 78.87280566676932 - type: max_accuracy value: 87.90119144642372 - type: max_ap value: 84.05164646221583 - type: max_f1 value: 76.31392706820128 - task: type: Retrieval dataset: type: C-MTEB/VideoRetrieval name: MTEB VideoRetrieval config: default split: dev revision: 58c2597a5943a2ba48f4668c3b90d796283c5639 metrics: - type: map_at_1 value: 63.6 - type: map_at_10 value: 72.673 - type: map_at_100 value: 73.05199999999999 - type: map_at_1000 value: 73.057 - type: map_at_3 value: 70.833 - type: map_at_5 value: 72.05799999999999 - type: mrr_at_1 value: 63.6 - type: mrr_at_10 value: 72.673 - type: mrr_at_100 value: 73.05199999999999 - type: mrr_at_1000 value: 73.057 - type: mrr_at_3 value: 70.833 - type: mrr_at_5 value: 72.05799999999999 - type: ndcg_at_1 value: 63.6 - type: ndcg_at_10 value: 76.776 - type: ndcg_at_100 value: 78.52900000000001 - type: ndcg_at_1000 value: 78.696 - type: ndcg_at_3 value: 73.093 - type: ndcg_at_5 value: 75.288 - type: precision_at_1 value: 63.6 - type: precision_at_10 value: 8.95 - type: precision_at_100 value: 0.975 - type: precision_at_1000 value: 0.099 - type: precision_at_3 value: 26.533 - type: precision_at_5 value: 16.98 - type: recall_at_1 value: 63.6 - type: recall_at_10 value: 89.5 - type: recall_at_100 value: 97.5 - type: recall_at_1000 value: 98.9 - type: recall_at_3 value: 79.60000000000001 - type: recall_at_5 value: 84.89999999999999 - task: type: Classification dataset: type: C-MTEB/waimai-classification name: MTEB Waimai config: default split: test revision: 339287def212450dcaa9df8c22bf93e9980c7023 metrics: - type: accuracy value: 89.39999999999999 - type: ap value: 75.52087544076016 - type: f1 value: 87.7629629899278 --- <p align=\"center\"> <img src=\"images/gme_logo.png\" alt=\"GME Logo\" style=\"width: 100%; max-width: 450px;\"> </p> <p align=\"center\"><b>GME: General Multimodal Embedding</b></p> ## GME-Qwen2-VL-2B We are excited to present series of unified **multimodal embedding models**, which are based on the advanced Qwen2-VL multimodal large language models (MLLMs). The models support three types of input: **text**, **image**, and **image-text pair**, all of which can produce universal vector representations and have powerful retrieval performance. **Key Enhancements of GME Models**: - **Unified Multimodal Representation**: GME models can process both single-modal and combined-modal inputs, resulting in a unified vector representation. This enables versatile retrieval scenarios (Any2Any Search), supporting tasks such as text retrieval, image retrieval from text, and image-to-image searches. - **High Performance**: Achieves state-of-the-art (SOTA) results in our universal multimodal retrieval benchmark (**UMRB**) and demonstrate strong evaluation scores in the Multimodal Textual Evaluation Benchmark (**MTEB**). - **Dynamic Image Resolution**: Benefiting from and our training data, GME models support dynamic resolution image input. - **Strong Visual Retrieval Performance**: Enhanced by the Qwen2-VL model series, our models excel in visual document retrieval tasks that require a nuanced understanding of document screenshots. This capability is particularly beneficial for complex document understanding scenarios, such as multimodal retrieval-augmented generation (RAG) applications focused on academic papers. **Developed by**: Tongyi Lab, Alibaba Group **Paper**: GME: Improving Universal Multimodal Retrieval by Multimodal LLMs ## Model List | Models | Model Size | Max Seq. Length | Dimension | MTEB-en| MTEB-zh | UMRB | |:-----: | :-----: |:-----: |:-----: |:-----: | :-----: | :-----: | |[]( | 2.21B | 32768 | 1536 | 65.27 | 66.92 | 64.45 | |[]( | 8.29B | 32768 | 3584 | 67.48 | 69.73 | 67.44 | ## Usage **Use with custom code** ## Evaluation We validated the performance on our universal multimodal retrieval benchmark (**UMRB**) among others. | | | Single-modal | | Cross-modal | | | Fused-modal | | | | Avg. | |--------------------|------|:------------:|:---------:|:-----------:|:-----------:|:---------:|:-----------:|:----------:|:----------:|:-----------:|:----------:| | | | T→T (16) | I→I (1) | T→I (4) | T→VD (10) | I→T (4) | T→IT (2) | IT→T (5) | IT→I (2) | IT→IT (3) | (47) | | VISTA | 0.2B | 55.15 | **31.98** | 32.88 | 10.12 | 31.23 | 45.81 | 53.32 | 8.97 | 26.26 | 37.32 | | CLIP-SF | 0.4B | 39.75 | 31.42 | 59.05 | 24.09 | 62.95 | 66.41 | 53.32 | 34.9 | 55.65 | 43.66 | | One-Peace | 4B | 43.54 | 31.27 | 61.38 | 42.9 | 65.59 | 42.72 | 28.29 | 6.73 | 23.41 | 42.01 | | DSE | 4.2B | 48.94 | 27.92 | 40.75 | 78.21 | 52.54 | 49.62 | 35.44 | 8.36 | 40.18 | 50.04 | | E5-V | 8.4B | 52.41 | 27.36 | 46.56 | 41.22 | 47.95 | 54.13 | 32.9 | 23.17 | 7.23 | 42.52 | | **GME-Qwen2-VL-2B** | 2.2B | 55.93 | 29.86 | 57.36 | 87.84 | 61.93 | 76.47 | 64.58 | 37.02 | 66.47 | 64.45 | | **GME-Qwen2-VL-7B** | 8.3B | **58.19** | 31.89 | **61.35** | **89.92** | **65.83** | **80.94** | **66.18** | **42.56** | **73.62** | **67.44** | The MTEB Leaderboard English tab shows the text embeddings performence of our model. **More detailed experimental results can be found in the paper**. ## Community support ### Fine-tuning GME models can be fine-tuned by SWIFT: ## Limitations - **Single Image Input**: In , an image could be converted into a very large number of visual tokens. We limit the number of visual tokens to 1024 to obtain a good training efficiency. Due to the lack of relevant data, our models and evaluations retain one single image. - **English-only Training**: Our models are trained on english data only. Although the models are multilingual, the multilingual-multimodal embedding performance are not guaranteed. We will extend to multi-image input, image-text interleaved data as well as multilingual data in the future version. ## Redistribution and Use We encourage and value diverse applications of GME models and continuous enhancements to the models themselves. - If you distribute or make GME models (or any derivative works) available, or if you create a product or service (including another AI model) that incorporates them, you must prominently display on your website, user interface, blog post, About page, or product documentation. - If you utilize GME models or their outputs to develop, train, fine-tune, or improve an AI model that is distributed or made available, you must prefix the name of any such AI model with . ## Cloud API Services In addition to the open-source GME series models, GME series models are also available as commercial API services on Alibaba Cloud. - MultiModal Embedding Models: The model service is available. Note that the models behind the commercial APIs are not entirely identical to the open-source models. ## Hiring We have open positions for Research Interns and Full-Time Researchers to join our team at Tongyi Lab. We are seeking passionate individuals with expertise in representation learning, LLM-driven information retrieval, Retrieval-Augmented Generation (RAG), and agent-based systems. Our team is located in the vibrant cities of Beijing and Hangzhou, offering a collaborative and dynamic work environment where you can contribute to cutting-edge advancements in artificial intelligence and machine learning. If you are driven by curiosity and eager to make a meaningful impact through your work, we would love to hear from you. Please submit your resume along with a brief introduction to <a href=\"mailto:[email protected]\">[email protected]</a>. ## Citation If you find our paper or models helpful, please consider cite:",
27
- "model_explanation_gemini": "Performs multilingual (English and Chinese) text and vision-language tasks including sentence similarity, classification, clustering, retrieval, and reranking, based on the Qwen2-VL-2B-Instruct architecture. \n\n**Features**: \n- Multilingual (English, Chinese) \n- Tasks: Sentence similarity, classification, clustering, retrieval, reranking \n- Base model: Qwen/Qwen2-VL-2B-Instruct \n- License: Apache-2.0 \n\n**Comparison**:"
 
 
 
 
 
 
28
  }
 
24
  "region:us"
25
  ],
26
  "description": "--- license: apache-2.0 base_model: - Qwen/Qwen2-VL-2B-Instruct language: - en - zh tags: - mteb - sentence-transformers - transformers - Qwen2-VL - sentence-similarity - vidore model-index: - name: external results: - task: type: STS dataset: type: C-MTEB/AFQMC name: MTEB AFQMC config: default split: validation revision: b44c3b011063adb25877c13823db83bb193913c4 metrics: - type: cos_sim_pearson value: 61.03190209456061 - type: cos_sim_spearman value: 67.54853383020948 - type: euclidean_pearson value: 65.38958681599493 - type: euclidean_spearman value: 67.54853383020948 - type: manhattan_pearson value: 65.25341659273157 - type: manhattan_spearman value: 67.34190190683134 - task: type: STS dataset: type: C-MTEB/ATEC name: MTEB ATEC config: default split: test revision: 0f319b1142f28d00e055a6770f3f726ae9b7d865 metrics: - type: cos_sim_pearson value: 50.83794357648487 - type: cos_sim_spearman value: 54.03230997664373 - type: euclidean_pearson value: 55.2072028123375 - type: euclidean_spearman value: 54.032311102613264 - type: manhattan_pearson value: 55.05163232251946 - type: manhattan_spearman value: 53.81272176804127 - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 72.55223880597015 - type: ap value: 35.01515316721116 - type: f1 value: 66.44086070814382 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 96.75819999999999 - type: ap value: 95.51009242092881 - type: f1 value: 96.75713119357414 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 61.971999999999994 - type: f1 value: 60.50745575187704 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (zh) config: zh split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 53.49 - type: f1 value: 51.576550662258434 - task: type: Retrieval dataset: type: mteb/arguana name: MTEB ArguAna config: default split: test revision: c22ab2a51041ffd869aaddef7af8d8215647e41a metrics: - type: map_at_1 value: 36.272999999999996 - type: map_at_10 value: 52.782 - type: map_at_100 value: 53.339999999999996 - type: map_at_1000 value: 53.342999999999996 - type: map_at_3 value: 48.4 - type: map_at_5 value: 50.882000000000005 - type: mrr_at_1 value: 36.984 - type: mrr_at_10 value: 53.052 - type: mrr_at_100 value: 53.604 - type: mrr_at_1000 value: 53.607000000000006 - type: mrr_at_3 value: 48.613 - type: mrr_at_5 value: 51.159 - type: ndcg_at_1 value: 36.272999999999996 - type: ndcg_at_10 value: 61.524 - type: ndcg_at_100 value: 63.796 - type: ndcg_at_1000 value: 63.869 - type: ndcg_at_3 value: 52.456 - type: ndcg_at_5 value: 56.964000000000006 - type: precision_at_1 value: 36.272999999999996 - type: precision_at_10 value: 8.926 - type: precision_at_100 value: 0.989 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 21.407999999999998 - type: precision_at_5 value: 15.049999999999999 - type: recall_at_1 value: 36.272999999999996 - type: recall_at_10 value: 89.25999999999999 - type: recall_at_100 value: 98.933 - type: recall_at_1000 value: 99.502 - type: recall_at_3 value: 64.225 - type: recall_at_5 value: 75.249 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 52.45236368396085 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 46.83781937870832 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 60.653430349851746 - type: mrr value: 74.28736314470387 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 89.18568151905953 - type: cos_sim_spearman value: 86.47666922475281 - type: euclidean_pearson value: 87.25416218056225 - type: euclidean_spearman value: 86.47666922475281 - type: manhattan_pearson value: 87.04960508086356 - type: manhattan_spearman value: 86.73992823533615 - task: type: STS dataset: type: C-MTEB/BQ name: MTEB BQ config: default split: test revision: e3dda5e115e487b39ec7e618c0c6a29137052a55 metrics: - type: cos_sim_pearson value: 75.7464284612374 - type: cos_sim_spearman value: 77.71894224189296 - type: euclidean_pearson value: 77.63454068918787 - type: euclidean_spearman value: 77.71894224189296 - type: manhattan_pearson value: 77.58744810404339 - type: manhattan_spearman value: 77.63293552726073 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 80.2435064935065 - type: f1 value: 79.44078343737895 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 44.68220155432257 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 40.666150477589284 - task: type: Clustering dataset: type: C-MTEB/CLSClusteringP2P name: MTEB CLSClusteringP2P config: default split: test revision: 4b6227591c6c1a73bc76b1055f3b7f3588e72476 metrics: - type: v_measure value: 44.23533333311907 - task: type: Clustering dataset: type: C-MTEB/CLSClusteringS2S name: MTEB CLSClusteringS2S config: default split: test revision: e458b3f5414b62b7f9f83499ac1f5497ae2e869f metrics: - type: v_measure value: 43.01114481307774 - task: type: Reranking dataset: type: C-MTEB/CMedQAv1-reranking name: MTEB CMedQAv1 config: default split: test revision: 8d7f1e942507dac42dc58017c1a001c3717da7df metrics: - type: map value: 86.4349853821696 - type: mrr value: 88.80150793650795 - task: type: Reranking dataset: type: C-MTEB/CMedQAv2-reranking name: MTEB CMedQAv2 config: default split: test revision: 23d186750531a14a0357ca22cd92d712fd512ea0 metrics: - type: map value: 87.56417400982208 - type: mrr value: 89.85813492063491 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: f46a197baaae43b4f621051089b82a364682dfeb metrics: - type: map_at_1 value: 30.623 - type: map_at_10 value: 40.482 - type: map_at_100 value: 41.997 - type: map_at_1000 value: 42.135 - type: map_at_3 value: 37.754 - type: map_at_5 value: 39.031 - type: mrr_at_1 value: 37.482 - type: mrr_at_10 value: 46.311 - type: mrr_at_100 value: 47.211999999999996 - type: mrr_at_1000 value: 47.27 - type: mrr_at_3 value: 44.157999999999994 - type: mrr_at_5 value: 45.145 - type: ndcg_at_1 value: 37.482 - type: ndcg_at_10 value: 46.142 - type: ndcg_at_100 value: 51.834 - type: ndcg_at_1000 value: 54.164 - type: ndcg_at_3 value: 42.309000000000005 - type: ndcg_at_5 value: 43.485 - type: precision_at_1 value: 37.482 - type: precision_at_10 value: 8.455 - type: precision_at_100 value: 1.3780000000000001 - type: precision_at_1000 value: 0.188 - type: precision_at_3 value: 20.172 - type: precision_at_5 value: 13.705 - type: recall_at_1 value: 30.623 - type: recall_at_10 value: 56.77100000000001 - type: recall_at_100 value: 80.034 - type: recall_at_1000 value: 94.62899999999999 - type: recall_at_3 value: 44.663000000000004 - type: recall_at_5 value: 48.692 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: ad9991cb51e31e31e430383c75ffb2885547b5f0 metrics: - type: map_at_1 value: 27.941 - type: map_at_10 value: 38.437 - type: map_at_100 value: 39.625 - type: map_at_1000 value: 39.753 - type: map_at_3 value: 35.388999999999996 - type: map_at_5 value: 37.113 - type: mrr_at_1 value: 34.522000000000006 - type: mrr_at_10 value: 43.864999999999995 - type: mrr_at_100 value: 44.533 - type: mrr_at_1000 value: 44.580999999999996 - type: mrr_at_3 value: 41.55 - type: mrr_at_5 value: 42.942 - type: ndcg_at_1 value: 34.522000000000006 - type: ndcg_at_10 value: 44.330000000000005 - type: ndcg_at_100 value: 48.61 - type: ndcg_at_1000 value: 50.712999999999994 - type: ndcg_at_3 value: 39.834 - type: ndcg_at_5 value: 42.016 - type: precision_at_1 value: 34.522000000000006 - type: precision_at_10 value: 8.471 - type: precision_at_100 value: 1.3379999999999999 - type: precision_at_1000 value: 0.182 - type: precision_at_3 value: 19.363 - type: precision_at_5 value: 13.898 - type: recall_at_1 value: 27.941 - type: recall_at_10 value: 55.336 - type: recall_at_100 value: 73.51100000000001 - type: recall_at_1000 value: 86.636 - type: recall_at_3 value: 42.54 - type: recall_at_5 value: 48.392 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: 4885aa143210c98657558c04aaf3dc47cfb54340 metrics: - type: map_at_1 value: 32.681 - type: map_at_10 value: 45.48 - type: map_at_100 value: 46.542 - type: map_at_1000 value: 46.604 - type: map_at_3 value: 42.076 - type: map_at_5 value: 44.076 - type: mrr_at_1 value: 37.492 - type: mrr_at_10 value: 48.746 - type: mrr_at_100 value: 49.485 - type: mrr_at_1000 value: 49.517 - type: mrr_at_3 value: 45.998 - type: mrr_at_5 value: 47.681000000000004 - type: ndcg_at_1 value: 37.492 - type: ndcg_at_10 value: 51.778999999999996 - type: ndcg_at_100 value: 56.294 - type: ndcg_at_1000 value: 57.58 - type: ndcg_at_3 value: 45.856 - type: ndcg_at_5 value: 48.968 - type: precision_at_1 value: 37.492 - type: precision_at_10 value: 8.620999999999999 - type: precision_at_100 value: 1.189 - type: precision_at_1000 value: 0.135 - type: precision_at_3 value: 20.773 - type: precision_at_5 value: 14.596 - type: recall_at_1 value: 32.681 - type: recall_at_10 value: 67.196 - type: recall_at_100 value: 87.027 - type: recall_at_1000 value: 96.146 - type: recall_at_3 value: 51.565000000000005 - type: recall_at_5 value: 59.123999999999995 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: 5003b3064772da1887988e05400cf3806fe491f2 metrics: - type: map_at_1 value: 22.421 - type: map_at_10 value: 30.127 - type: map_at_100 value: 31.253999999999998 - type: map_at_1000 value: 31.344 - type: map_at_3 value: 27.673 - type: map_at_5 value: 29.182000000000002 - type: mrr_at_1 value: 24.068 - type: mrr_at_10 value: 31.857000000000003 - type: mrr_at_100 value: 32.808 - type: mrr_at_1000 value: 32.881 - type: mrr_at_3 value: 29.397000000000002 - type: mrr_at_5 value: 30.883 - type: ndcg_at_1 value: 24.068 - type: ndcg_at_10 value: 34.642 - type: ndcg_at_100 value: 40.327 - type: ndcg_at_1000 value: 42.55 - type: ndcg_at_3 value: 29.868 - type: ndcg_at_5 value: 32.461 - type: precision_at_1 value: 24.068 - type: precision_at_10 value: 5.390000000000001 - type: precision_at_100 value: 0.873 - type: precision_at_1000 value: 0.109 - type: precision_at_3 value: 12.692999999999998 - type: precision_at_5 value: 9.107 - type: recall_at_1 value: 22.421 - type: recall_at_10 value: 46.846 - type: recall_at_100 value: 73.409 - type: recall_at_1000 value: 90.06 - type: recall_at_3 value: 34.198 - type: recall_at_5 value: 40.437 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: 90fceea13679c63fe563ded68f3b6f06e50061de metrics: - type: map_at_1 value: 16.494 - type: map_at_10 value: 24.4 - type: map_at_100 value: 25.718999999999998 - type: map_at_1000 value: 25.840000000000003 - type: map_at_3 value: 21.731 - type: map_at_5 value: 23.247999999999998 - type: mrr_at_1 value: 20.274 - type: mrr_at_10 value: 28.866000000000003 - type: mrr_at_100 value: 29.889 - type: mrr_at_1000 value: 29.957 - type: mrr_at_3 value: 26.284999999999997 - type: mrr_at_5 value: 27.79 - type: ndcg_at_1 value: 20.274 - type: ndcg_at_10 value: 29.666999999999998 - type: ndcg_at_100 value: 36.095 - type: ndcg_at_1000 value: 38.87 - type: ndcg_at_3 value: 24.672 - type: ndcg_at_5 value: 27.106 - type: precision_at_1 value: 20.274 - type: precision_at_10 value: 5.5969999999999995 - type: precision_at_100 value: 1.04 - type: precision_at_1000 value: 0.14100000000000001 - type: precision_at_3 value: 12.023 - type: precision_at_5 value: 8.98 - type: recall_at_1 value: 16.494 - type: recall_at_10 value: 41.400999999999996 - type: recall_at_100 value: 69.811 - type: recall_at_1000 value: 89.422 - type: recall_at_3 value: 27.834999999999997 - type: recall_at_5 value: 33.774 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: 79531abbd1fb92d06c6d6315a0cbbbf5bb247ea4 metrics: - type: map_at_1 value: 26.150000000000002 - type: map_at_10 value: 36.012 - type: map_at_100 value: 37.377 - type: map_at_1000 value: 37.497 - type: map_at_3 value: 32.712 - type: map_at_5 value: 34.475 - type: mrr_at_1 value: 32.05 - type: mrr_at_10 value: 41.556 - type: mrr_at_100 value: 42.451 - type: mrr_at_1000 value: 42.498000000000005 - type: mrr_at_3 value: 38.659 - type: mrr_at_5 value: 40.314 - type: ndcg_at_1 value: 32.05 - type: ndcg_at_10 value: 42.132 - type: ndcg_at_100 value: 48.028999999999996 - type: ndcg_at_1000 value: 50.229 - type: ndcg_at_3 value: 36.622 - type: ndcg_at_5 value: 39.062000000000005 - type: precision_at_1 value: 32.05 - type: precision_at_10 value: 7.767 - type: precision_at_100 value: 1.269 - type: precision_at_1000 value: 0.164 - type: precision_at_3 value: 17.355999999999998 - type: precision_at_5 value: 12.474 - type: recall_at_1 value: 26.150000000000002 - type: recall_at_10 value: 55.205000000000005 - type: recall_at_100 value: 80.2 - type: recall_at_1000 value: 94.524 - type: recall_at_3 value: 39.322 - type: recall_at_5 value: 45.761 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: 6184bc1440d2dbc7612be22b50686b8826d22b32 metrics: - type: map_at_1 value: 23.741 - type: map_at_10 value: 33.51 - type: map_at_100 value: 34.882999999999996 - type: map_at_1000 value: 34.995 - type: map_at_3 value: 30.514000000000003 - type: map_at_5 value: 32.085 - type: mrr_at_1 value: 28.653000000000002 - type: mrr_at_10 value: 38.059 - type: mrr_at_100 value: 39.050000000000004 - type: mrr_at_1000 value: 39.107 - type: mrr_at_3 value: 35.445 - type: mrr_at_5 value: 36.849 - type: ndcg_at_1 value: 28.653000000000002 - type: ndcg_at_10 value: 39.186 - type: ndcg_at_100 value: 45.301 - type: ndcg_at_1000 value: 47.547 - type: ndcg_at_3 value: 34.103 - type: ndcg_at_5 value: 36.239 - type: precision_at_1 value: 28.653000000000002 - type: precision_at_10 value: 7.295 - type: precision_at_100 value: 1.2189999999999999 - type: precision_at_1000 value: 0.159 - type: precision_at_3 value: 16.438 - type: precision_at_5 value: 11.804 - type: recall_at_1 value: 23.741 - type: recall_at_10 value: 51.675000000000004 - type: recall_at_100 value: 78.13799999999999 - type: recall_at_1000 value: 93.12700000000001 - type: recall_at_3 value: 37.033 - type: recall_at_5 value: 42.793 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: map_at_1 value: 25.281666666666663 - type: map_at_10 value: 34.080666666666666 - type: map_at_100 value: 35.278749999999995 - type: map_at_1000 value: 35.40183333333333 - type: map_at_3 value: 31.45316666666667 - type: map_at_5 value: 32.92716666666667 - type: mrr_at_1 value: 29.78783333333333 - type: mrr_at_10 value: 38.077333333333335 - type: mrr_at_100 value: 38.936499999999995 - type: mrr_at_1000 value: 39.000249999999994 - type: mrr_at_3 value: 35.7735 - type: mrr_at_5 value: 37.07683333333334 - type: ndcg_at_1 value: 29.78783333333333 - type: ndcg_at_10 value: 39.18300000000001 - type: ndcg_at_100 value: 44.444750000000006 - type: ndcg_at_1000 value: 46.90316666666667 - type: ndcg_at_3 value: 34.69308333333333 - type: ndcg_at_5 value: 36.80316666666666 - type: precision_at_1 value: 29.78783333333333 - type: precision_at_10 value: 6.820749999999999 - type: precision_at_100 value: 1.1224166666666666 - type: precision_at_1000 value: 0.1525 - type: precision_at_3 value: 15.936333333333335 - type: precision_at_5 value: 11.282333333333334 - type: recall_at_1 value: 25.281666666666663 - type: recall_at_10 value: 50.282 - type: recall_at_100 value: 73.54558333333334 - type: recall_at_1000 value: 90.64241666666666 - type: recall_at_3 value: 37.800999999999995 - type: recall_at_5 value: 43.223000000000006 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: 65ac3a16b8e91f9cee4c9828cc7c335575432a2a metrics: - type: map_at_1 value: 23.452 - type: map_at_10 value: 30.231 - type: map_at_100 value: 31.227 - type: map_at_1000 value: 31.338 - type: map_at_3 value: 28.083000000000002 - type: map_at_5 value: 29.125 - type: mrr_at_1 value: 25.613000000000003 - type: mrr_at_10 value: 32.62 - type: mrr_at_100 value: 33.469 - type: mrr_at_1000 value: 33.554 - type: mrr_at_3 value: 30.368000000000002 - type: mrr_at_5 value: 31.502999999999997 - type: ndcg_at_1 value: 25.613000000000003 - type: ndcg_at_10 value: 34.441 - type: ndcg_at_100 value: 39.253 - type: ndcg_at_1000 value: 42.105 - type: ndcg_at_3 value: 30.183 - type: ndcg_at_5 value: 31.917 - type: precision_at_1 value: 25.613000000000003 - type: precision_at_10 value: 5.367999999999999 - type: precision_at_100 value: 0.848 - type: precision_at_1000 value: 0.117 - type: precision_at_3 value: 12.73 - type: precision_at_5 value: 8.773 - type: recall_at_1 value: 23.452 - type: recall_at_10 value: 45.021 - type: recall_at_100 value: 66.563 - type: recall_at_1000 value: 87.713 - type: recall_at_3 value: 33.433 - type: recall_at_5 value: 37.637 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: 46989137a86843e03a6195de44b09deda022eec7 metrics: - type: map_at_1 value: 16.11 - type: map_at_10 value: 22.832 - type: map_at_100 value: 23.829 - type: map_at_1000 value: 23.959 - type: map_at_3 value: 20.66 - type: map_at_5 value: 21.851000000000003 - type: mrr_at_1 value: 19.408 - type: mrr_at_10 value: 26.354 - type: mrr_at_100 value: 27.237000000000002 - type: mrr_at_1000 value: 27.32 - type: mrr_at_3 value: 24.243000000000002 - type: mrr_at_5 value: 25.430000000000003 - type: ndcg_at_1 value: 19.408 - type: ndcg_at_10 value: 27.239 - type: ndcg_at_100 value: 32.286 - type: ndcg_at_1000 value: 35.498000000000005 - type: ndcg_at_3 value: 23.244 - type: ndcg_at_5 value: 25.080999999999996 - type: precision_at_1 value: 19.408 - type: precision_at_10 value: 4.917 - type: precision_at_100 value: 0.874 - type: precision_at_1000 value: 0.133 - type: precision_at_3 value: 10.863 - type: precision_at_5 value: 7.887 - type: recall_at_1 value: 16.11 - type: recall_at_10 value: 37.075 - type: recall_at_100 value: 60.251999999999995 - type: recall_at_1000 value: 83.38600000000001 - type: recall_at_3 value: 25.901999999999997 - type: recall_at_5 value: 30.612000000000002 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: 6c6430d3a6d36f8d2a829195bc5dc94d7e063e53 metrics: - type: map_at_1 value: 25.941 - type: map_at_10 value: 33.711999999999996 - type: map_at_100 value: 34.926 - type: map_at_1000 value: 35.05 - type: map_at_3 value: 31.075000000000003 - type: map_at_5 value: 32.611000000000004 - type: mrr_at_1 value: 30.784 - type: mrr_at_10 value: 38.079 - type: mrr_at_100 value: 39.018 - type: mrr_at_1000 value: 39.09 - type: mrr_at_3 value: 35.603 - type: mrr_at_5 value: 36.988 - type: ndcg_at_1 value: 30.784 - type: ndcg_at_10 value: 38.586 - type: ndcg_at_100 value: 44.205 - type: ndcg_at_1000 value: 46.916000000000004 - type: ndcg_at_3 value: 33.899 - type: ndcg_at_5 value: 36.11 - type: precision_at_1 value: 30.784 - type: precision_at_10 value: 6.409 - type: precision_at_100 value: 1.034 - type: precision_at_1000 value: 0.13799999999999998 - type: precision_at_3 value: 15.112 - type: precision_at_5 value: 10.728 - type: recall_at_1 value: 25.941 - type: recall_at_10 value: 49.242999999999995 - type: recall_at_100 value: 73.85000000000001 - type: recall_at_1000 value: 92.782 - type: recall_at_3 value: 36.204 - type: recall_at_5 value: 41.908 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: 160c094312a0e1facb97e55eeddb698c0abe3571 metrics: - type: map_at_1 value: 24.401999999999997 - type: map_at_10 value: 33.195 - type: map_at_100 value: 34.699999999999996 - type: map_at_1000 value: 34.946 - type: map_at_3 value: 30.570999999999998 - type: map_at_5 value: 32.0 - type: mrr_at_1 value: 28.656 - type: mrr_at_10 value: 37.039 - type: mrr_at_100 value: 38.049 - type: mrr_at_1000 value: 38.108 - type: mrr_at_3 value: 34.717 - type: mrr_at_5 value: 36.07 - type: ndcg_at_1 value: 28.656 - type: ndcg_at_10 value: 38.557 - type: ndcg_at_100 value: 44.511 - type: ndcg_at_1000 value: 47.346 - type: ndcg_at_3 value: 34.235 - type: ndcg_at_5 value: 36.260999999999996 - type: precision_at_1 value: 28.656 - type: precision_at_10 value: 7.312 - type: precision_at_100 value: 1.451 - type: precision_at_1000 value: 0.242 - type: precision_at_3 value: 15.942 - type: precision_at_5 value: 11.66 - type: recall_at_1 value: 24.401999999999997 - type: recall_at_10 value: 48.791000000000004 - type: recall_at_100 value: 76.211 - type: recall_at_1000 value: 93.92 - type: recall_at_3 value: 36.975 - type: recall_at_5 value: 42.01 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: map_at_1 value: 19.07 - type: map_at_10 value: 26.608999999999998 - type: map_at_100 value: 27.625 - type: map_at_1000 value: 27.743000000000002 - type: map_at_3 value: 24.532999999999998 - type: map_at_5 value: 25.671 - type: mrr_at_1 value: 20.518 - type: mrr_at_10 value: 28.541 - type: mrr_at_100 value: 29.453000000000003 - type: mrr_at_1000 value: 29.536 - type: mrr_at_3 value: 26.71 - type: mrr_at_5 value: 27.708 - type: ndcg_at_1 value: 20.518 - type: ndcg_at_10 value: 30.855 - type: ndcg_at_100 value: 35.973 - type: ndcg_at_1000 value: 38.827 - type: ndcg_at_3 value: 26.868 - type: ndcg_at_5 value: 28.74 - type: precision_at_1 value: 20.518 - type: precision_at_10 value: 4.843 - type: precision_at_100 value: 0.799 - type: precision_at_1000 value: 0.116 - type: precision_at_3 value: 11.645 - type: precision_at_5 value: 8.133 - type: recall_at_1 value: 19.07 - type: recall_at_10 value: 41.925000000000004 - type: recall_at_100 value: 65.68 - type: recall_at_1000 value: 86.713 - type: recall_at_3 value: 31.251 - type: recall_at_5 value: 35.653 - task: type: Retrieval dataset: type: mteb/climate-fever name: MTEB ClimateFEVER config: default split: test revision: 47f2ac6acb640fc46020b02a5b59fdda04d39380 metrics: - type: map_at_1 value: 18.762 - type: map_at_10 value: 32.412 - type: map_at_100 value: 34.506 - type: map_at_1000 value: 34.678 - type: map_at_3 value: 27.594 - type: map_at_5 value: 30.128 - type: mrr_at_1 value: 42.345 - type: mrr_at_10 value: 54.443 - type: mrr_at_100 value: 55.05799999999999 - type: mrr_at_1000 value: 55.076 - type: mrr_at_3 value: 51.553000000000004 - type: mrr_at_5 value: 53.269 - type: ndcg_at_1 value: 42.345 - type: ndcg_at_10 value: 42.304 - type: ndcg_at_100 value: 49.425000000000004 - type: ndcg_at_1000 value: 52.123 - type: ndcg_at_3 value: 36.271 - type: ndcg_at_5 value: 38.216 - type: precision_at_1 value: 42.345 - type: precision_at_10 value: 12.808 - type: precision_at_100 value: 2.062 - type: precision_at_1000 value: 0.258 - type: precision_at_3 value: 26.840000000000003 - type: precision_at_5 value: 20.052 - type: recall_at_1 value: 18.762 - type: recall_at_10 value: 47.976 - type: recall_at_100 value: 71.86 - type: recall_at_1000 value: 86.61999999999999 - type: recall_at_3 value: 32.708999999999996 - type: recall_at_5 value: 39.151 - task: type: Retrieval dataset: type: C-MTEB/CmedqaRetrieval name: MTEB CmedqaRetrieval config: default split: dev revision: cd540c506dae1cf9e9a59c3e06f42030d54e7301 metrics: - type: map_at_1 value: 24.871 - type: map_at_10 value: 37.208999999999996 - type: map_at_100 value: 38.993 - type: map_at_1000 value: 39.122 - type: map_at_3 value: 33.2 - type: map_at_5 value: 35.33 - type: mrr_at_1 value: 37.884 - type: mrr_at_10 value: 46.189 - type: mrr_at_100 value: 47.147 - type: mrr_at_1000 value: 47.195 - type: mrr_at_3 value: 43.728 - type: mrr_at_5 value: 44.994 - type: ndcg_at_1 value: 37.884 - type: ndcg_at_10 value: 43.878 - type: ndcg_at_100 value: 51.002 - type: ndcg_at_1000 value: 53.161 - type: ndcg_at_3 value: 38.729 - type: ndcg_at_5 value: 40.628 - type: precision_at_1 value: 37.884 - type: precision_at_10 value: 9.75 - type: precision_at_100 value: 1.558 - type: precision_at_1000 value: 0.183 - type: precision_at_3 value: 21.964 - type: precision_at_5 value: 15.719 - type: recall_at_1 value: 24.871 - type: recall_at_10 value: 54.615 - type: recall_at_100 value: 84.276 - type: recall_at_1000 value: 98.578 - type: recall_at_3 value: 38.936 - type: recall_at_5 value: 45.061 - task: type: PairClassification dataset: type: C-MTEB/CMNLI name: MTEB Cmnli config: default split: validation revision: 41bc36f332156f7adc9e38f53777c959b2ae9766 metrics: - type: cos_sim_accuracy value: 76.12748045700542 - type: cos_sim_ap value: 84.47948419710998 - type: cos_sim_f1 value: 77.88108108108108 - type: cos_sim_precision value: 72.43112809169516 - type: cos_sim_recall value: 84.21790974982464 - type: dot_accuracy value: 76.12748045700542 - type: dot_ap value: 84.4933237839786 - type: dot_f1 value: 77.88108108108108 - type: dot_precision value: 72.43112809169516 - type: dot_recall value: 84.21790974982464 - type: euclidean_accuracy value: 76.12748045700542 - type: euclidean_ap value: 84.47947997540409 - type: euclidean_f1 value: 77.88108108108108 - type: euclidean_precision value: 72.43112809169516 - type: euclidean_recall value: 84.21790974982464 - type: manhattan_accuracy value: 75.40589296452195 - type: manhattan_ap value: 83.74383956930585 - type: manhattan_f1 value: 77.0983342289092 - type: manhattan_precision value: 71.34049323786795 - type: manhattan_recall value: 83.86719663315408 - type: max_accuracy value: 76.12748045700542 - type: max_ap value: 84.4933237839786 - type: max_f1 value: 77.88108108108108 - task: type: Retrieval dataset: type: C-MTEB/CovidRetrieval name: MTEB CovidRetrieval config: default split: dev revision: 1271c7809071a13532e05f25fb53511ffce77117 metrics: - type: map_at_1 value: 66.781 - type: map_at_10 value: 74.539 - type: map_at_100 value: 74.914 - type: map_at_1000 value: 74.921 - type: map_at_3 value: 72.734 - type: map_at_5 value: 73.788 - type: mrr_at_1 value: 66.913 - type: mrr_at_10 value: 74.543 - type: mrr_at_100 value: 74.914 - type: mrr_at_1000 value: 74.921 - type: mrr_at_3 value: 72.831 - type: mrr_at_5 value: 73.76899999999999 - type: ndcg_at_1 value: 67.018 - type: ndcg_at_10 value: 78.34299999999999 - type: ndcg_at_100 value: 80.138 - type: ndcg_at_1000 value: 80.322 - type: ndcg_at_3 value: 74.667 - type: ndcg_at_5 value: 76.518 - type: precision_at_1 value: 67.018 - type: precision_at_10 value: 9.115 - type: precision_at_100 value: 0.996 - type: precision_at_1000 value: 0.101 - type: precision_at_3 value: 26.906000000000002 - type: precision_at_5 value: 17.092 - type: recall_at_1 value: 66.781 - type: recall_at_10 value: 90.253 - type: recall_at_100 value: 98.52499999999999 - type: recall_at_1000 value: 100.0 - type: recall_at_3 value: 80.05799999999999 - type: recall_at_5 value: 84.615 - task: type: Retrieval dataset: type: mteb/dbpedia name: MTEB DBPedia config: default split: test revision: c0f706b76e590d620bd6618b3ca8efdd34e2d659 metrics: - type: map_at_1 value: 9.685 - type: map_at_10 value: 21.65 - type: map_at_100 value: 30.952 - type: map_at_1000 value: 33.049 - type: map_at_3 value: 14.953 - type: map_at_5 value: 17.592 - type: mrr_at_1 value: 72.0 - type: mrr_at_10 value: 78.054 - type: mrr_at_100 value: 78.41900000000001 - type: mrr_at_1000 value: 78.425 - type: mrr_at_3 value: 76.5 - type: mrr_at_5 value: 77.28699999999999 - type: ndcg_at_1 value: 61.25000000000001 - type: ndcg_at_10 value: 46.306000000000004 - type: ndcg_at_100 value: 50.867 - type: ndcg_at_1000 value: 58.533 - type: ndcg_at_3 value: 50.857 - type: ndcg_at_5 value: 48.283 - type: precision_at_1 value: 72.0 - type: precision_at_10 value: 37.3 - type: precision_at_100 value: 11.95 - type: precision_at_1000 value: 2.528 - type: precision_at_3 value: 53.583000000000006 - type: precision_at_5 value: 46.6 - type: recall_at_1 value: 9.685 - type: recall_at_10 value: 27.474999999999998 - type: recall_at_100 value: 56.825 - type: recall_at_1000 value: 81.792 - type: recall_at_3 value: 15.939 - type: recall_at_5 value: 19.853 - task: type: Retrieval dataset: type: C-MTEB/DuRetrieval name: MTEB DuRetrieval config: default split: dev revision: a1a333e290fe30b10f3f56498e3a0d911a693ced metrics: - type: map_at_1 value: 24.528 - type: map_at_10 value: 76.304 - type: map_at_100 value: 79.327 - type: map_at_1000 value: 79.373 - type: map_at_3 value: 52.035 - type: map_at_5 value: 66.074 - type: mrr_at_1 value: 86.05000000000001 - type: mrr_at_10 value: 90.74 - type: mrr_at_100 value: 90.809 - type: mrr_at_1000 value: 90.81099999999999 - type: mrr_at_3 value: 90.30799999999999 - type: mrr_at_5 value: 90.601 - type: ndcg_at_1 value: 86.05000000000001 - type: ndcg_at_10 value: 84.518 - type: ndcg_at_100 value: 87.779 - type: ndcg_at_1000 value: 88.184 - type: ndcg_at_3 value: 82.339 - type: ndcg_at_5 value: 81.613 - type: precision_at_1 value: 86.05000000000001 - type: precision_at_10 value: 40.945 - type: precision_at_100 value: 4.787 - type: precision_at_1000 value: 0.48900000000000005 - type: precision_at_3 value: 74.117 - type: precision_at_5 value: 62.86000000000001 - type: recall_at_1 value: 24.528 - type: recall_at_10 value: 86.78 - type: recall_at_100 value: 97.198 - type: recall_at_1000 value: 99.227 - type: recall_at_3 value: 54.94799999999999 - type: recall_at_5 value: 72.053 - task: type: Retrieval dataset: type: C-MTEB/EcomRetrieval name: MTEB EcomRetrieval config: default split: dev revision: 687de13dc7294d6fd9be10c6945f9e8fec8166b9 metrics: - type: map_at_1 value: 52.1 - type: map_at_10 value: 62.502 - type: map_at_100 value: 63.026 - type: map_at_1000 value: 63.04 - type: map_at_3 value: 59.782999999999994 - type: map_at_5 value: 61.443000000000005 - type: mrr_at_1 value: 52.1 - type: mrr_at_10 value: 62.502 - type: mrr_at_100 value: 63.026 - type: mrr_at_1000 value: 63.04 - type: mrr_at_3 value: 59.782999999999994 - type: mrr_at_5 value: 61.443000000000005 - type: ndcg_at_1 value: 52.1 - type: ndcg_at_10 value: 67.75999999999999 - type: ndcg_at_100 value: 70.072 - type: ndcg_at_1000 value: 70.441 - type: ndcg_at_3 value: 62.28 - type: ndcg_at_5 value: 65.25800000000001 - type: precision_at_1 value: 52.1 - type: precision_at_10 value: 8.43 - type: precision_at_100 value: 0.946 - type: precision_at_1000 value: 0.098 - type: precision_at_3 value: 23.166999999999998 - type: precision_at_5 value: 15.340000000000002 - type: recall_at_1 value: 52.1 - type: recall_at_10 value: 84.3 - type: recall_at_100 value: 94.6 - type: recall_at_1000 value: 97.5 - type: recall_at_3 value: 69.5 - type: recall_at_5 value: 76.7 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 62.805000000000014 - type: f1 value: 56.401757250989384 - task: type: Retrieval dataset: type: mteb/fever name: MTEB FEVER config: default split: test revision: bea83ef9e8fb933d90a2f1d5515737465d613e12 metrics: - type: map_at_1 value: 83.734 - type: map_at_10 value: 90.089 - type: map_at_100 value: 90.274 - type: map_at_1000 value: 90.286 - type: map_at_3 value: 89.281 - type: map_at_5 value: 89.774 - type: mrr_at_1 value: 90.039 - type: mrr_at_10 value: 94.218 - type: mrr_at_100 value: 94.24 - type: mrr_at_1000 value: 94.24 - type: mrr_at_3 value: 93.979 - type: mrr_at_5 value: 94.137 - type: ndcg_at_1 value: 90.039 - type: ndcg_at_10 value: 92.597 - type: ndcg_at_100 value: 93.147 - type: ndcg_at_1000 value: 93.325 - type: ndcg_at_3 value: 91.64999999999999 - type: ndcg_at_5 value: 92.137 - type: precision_at_1 value: 90.039 - type: precision_at_10 value: 10.809000000000001 - type: precision_at_100 value: 1.133 - type: precision_at_1000 value: 0.116 - type: precision_at_3 value: 34.338 - type: precision_at_5 value: 21.089 - type: recall_at_1 value: 83.734 - type: recall_at_10 value: 96.161 - type: recall_at_100 value: 98.137 - type: recall_at_1000 value: 99.182 - type: recall_at_3 value: 93.551 - type: recall_at_5 value: 94.878 - task: type: Retrieval dataset: type: mteb/fiqa name: MTEB FiQA2018 config: default split: test revision: 27a168819829fe9bcd655c2df245fb19452e8e06 metrics: - type: map_at_1 value: 24.529999999999998 - type: map_at_10 value: 37.229 - type: map_at_100 value: 39.333 - type: map_at_1000 value: 39.491 - type: map_at_3 value: 32.177 - type: map_at_5 value: 35.077999999999996 - type: mrr_at_1 value: 45.678999999999995 - type: mrr_at_10 value: 53.952 - type: mrr_at_100 value: 54.727000000000004 - type: mrr_at_1000 value: 54.761 - type: mrr_at_3 value: 51.568999999999996 - type: mrr_at_5 value: 52.973000000000006 - type: ndcg_at_1 value: 45.678999999999995 - type: ndcg_at_10 value: 45.297 - type: ndcg_at_100 value: 52.516 - type: ndcg_at_1000 value: 55.16 - type: ndcg_at_3 value: 40.569 - type: ndcg_at_5 value: 42.49 - type: precision_at_1 value: 45.678999999999995 - type: precision_at_10 value: 12.269 - type: precision_at_100 value: 1.9709999999999999 - type: precision_at_1000 value: 0.244 - type: precision_at_3 value: 25.72 - type: precision_at_5 value: 19.66 - type: recall_at_1 value: 24.529999999999998 - type: recall_at_10 value: 51.983999999999995 - type: recall_at_100 value: 78.217 - type: recall_at_1000 value: 94.104 - type: recall_at_3 value: 36.449999999999996 - type: recall_at_5 value: 43.336999999999996 - task: type: Retrieval dataset: type: mteb/hotpotqa name: MTEB HotpotQA config: default split: test revision: ab518f4d6fcca38d87c25209f94beba119d02014 metrics: - type: map_at_1 value: 41.519 - type: map_at_10 value: 64.705 - type: map_at_100 value: 65.554 - type: map_at_1000 value: 65.613 - type: map_at_3 value: 61.478 - type: map_at_5 value: 63.55800000000001 - type: mrr_at_1 value: 83.038 - type: mrr_at_10 value: 87.82900000000001 - type: mrr_at_100 value: 87.96000000000001 - type: mrr_at_1000 value: 87.96300000000001 - type: mrr_at_3 value: 87.047 - type: mrr_at_5 value: 87.546 - type: ndcg_at_1 value: 83.038 - type: ndcg_at_10 value: 72.928 - type: ndcg_at_100 value: 75.778 - type: ndcg_at_1000 value: 76.866 - type: ndcg_at_3 value: 68.46600000000001 - type: ndcg_at_5 value: 71.036 - type: precision_at_1 value: 83.038 - type: precision_at_10 value: 15.040999999999999 - type: precision_at_100 value: 1.7260000000000002 - type: precision_at_1000 value: 0.187 - type: precision_at_3 value: 43.597 - type: precision_at_5 value: 28.188999999999997 - type: recall_at_1 value: 41.519 - type: recall_at_10 value: 75.20599999999999 - type: recall_at_100 value: 86.3 - type: recall_at_1000 value: 93.437 - type: recall_at_3 value: 65.39500000000001 - type: recall_at_5 value: 70.473 - task: type: Classification dataset: type: C-MTEB/IFlyTek-classification name: MTEB IFlyTek config: default split: validation revision: 421605374b29664c5fc098418fe20ada9bd55f8a metrics: - type: accuracy value: 52.04309349749903 - type: f1 value: 39.91893257315586 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 96.0428 - type: ap value: 94.48278082595033 - type: f1 value: 96.0409595432081 - task: type: Classification dataset: type: C-MTEB/JDReview-classification name: MTEB JDReview config: default split: test revision: b7c64bd89eb87f8ded463478346f76731f07bf8b metrics: - type: accuracy value: 85.60975609756099 - type: ap value: 54.30148799475452 - type: f1 value: 80.55899583002706 - task: type: STS dataset: type: C-MTEB/LCQMC name: MTEB LCQMC config: default split: test revision: 17f9b096f80380fce5ed12a9be8be7784b337daf metrics: - type: cos_sim_pearson value: 66.44418108776416 - type: cos_sim_spearman value: 72.79912770347306 - type: euclidean_pearson value: 71.11194894579198 - type: euclidean_spearman value: 72.79912104971427 - type: manhattan_pearson value: 70.96800061808604 - type: manhattan_spearman value: 72.63525186107175 - task: type: Reranking dataset: type: C-MTEB/Mmarco-reranking name: MTEB MMarcoReranking config: default split: dev revision: 8e0c766dbe9e16e1d221116a3f36795fbade07f6 metrics: - type: map value: 27.9616280919871 - type: mrr value: 26.544047619047618 - task: type: Retrieval dataset: type: C-MTEB/MMarcoRetrieval name: MTEB MMarcoRetrieval config: default split: dev revision: 539bbde593d947e2a124ba72651aafc09eb33fc2 metrics: - type: map_at_1 value: 68.32300000000001 - type: map_at_10 value: 77.187 - type: map_at_100 value: 77.496 - type: map_at_1000 value: 77.503 - type: map_at_3 value: 75.405 - type: map_at_5 value: 76.539 - type: mrr_at_1 value: 70.616 - type: mrr_at_10 value: 77.703 - type: mrr_at_100 value: 77.97699999999999 - type: mrr_at_1000 value: 77.984 - type: mrr_at_3 value: 76.139 - type: mrr_at_5 value: 77.125 - type: ndcg_at_1 value: 70.616 - type: ndcg_at_10 value: 80.741 - type: ndcg_at_100 value: 82.123 - type: ndcg_at_1000 value: 82.32300000000001 - type: ndcg_at_3 value: 77.35600000000001 - type: ndcg_at_5 value: 79.274 - type: precision_at_1 value: 70.616 - type: precision_at_10 value: 9.696 - type: precision_at_100 value: 1.038 - type: precision_at_1000 value: 0.106 - type: precision_at_3 value: 29.026000000000003 - type: precision_at_5 value: 18.433 - type: recall_at_1 value: 68.32300000000001 - type: recall_at_10 value: 91.186 - type: recall_at_100 value: 97.439 - type: recall_at_1000 value: 99.004 - type: recall_at_3 value: 82.218 - type: recall_at_5 value: 86.797 - task: type: Retrieval dataset: type: mteb/msmarco name: MTEB MSMARCO config: default split: dev revision: c5a29a104738b98a9e76336939199e264163d4a0 metrics: - type: map_at_1 value: 21.496000000000002 - type: map_at_10 value: 33.82 - type: map_at_100 value: 35.013 - type: map_at_1000 value: 35.063 - type: map_at_3 value: 29.910999999999998 - type: map_at_5 value: 32.086 - type: mrr_at_1 value: 22.092 - type: mrr_at_10 value: 34.404 - type: mrr_at_100 value: 35.534 - type: mrr_at_1000 value: 35.577999999999996 - type: mrr_at_3 value: 30.544 - type: mrr_at_5 value: 32.711 - type: ndcg_at_1 value: 22.092 - type: ndcg_at_10 value: 40.877 - type: ndcg_at_100 value: 46.619 - type: ndcg_at_1000 value: 47.823 - type: ndcg_at_3 value: 32.861000000000004 - type: ndcg_at_5 value: 36.769 - type: precision_at_1 value: 22.092 - type: precision_at_10 value: 6.54 - type: precision_at_100 value: 0.943 - type: precision_at_1000 value: 0.105 - type: precision_at_3 value: 14.069 - type: precision_at_5 value: 10.424 - type: recall_at_1 value: 21.496000000000002 - type: recall_at_10 value: 62.67 - type: recall_at_100 value: 89.24499999999999 - type: recall_at_1000 value: 98.312 - type: recall_at_3 value: 40.796 - type: recall_at_5 value: 50.21600000000001 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 95.74555403556772 - type: f1 value: 95.61381879323093 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 85.82763337893297 - type: f1 value: 63.17139719465236 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 78.51714862138535 - type: f1 value: 76.3995118440293 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (zh-CN) config: zh-CN split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 74.78143913920646 - type: f1 value: 72.6141122227626 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 80.03698722259583 - type: f1 value: 79.36511484240766 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (zh-CN) config: zh-CN split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 76.98722259583053 - type: f1 value: 76.5974920207624 - task: type: Retrieval dataset: type: C-MTEB/MedicalRetrieval name: MTEB MedicalRetrieval config: default split: dev revision: 2039188fb5800a9803ba5048df7b76e6fb151fc6 metrics: - type: map_at_1 value: 51.800000000000004 - type: map_at_10 value: 57.938 - type: map_at_100 value: 58.494 - type: map_at_1000 value: 58.541 - type: map_at_3 value: 56.617 - type: map_at_5 value: 57.302 - type: mrr_at_1 value: 51.800000000000004 - type: mrr_at_10 value: 57.938 - type: mrr_at_100 value: 58.494 - type: mrr_at_1000 value: 58.541 - type: mrr_at_3 value: 56.617 - type: mrr_at_5 value: 57.302 - type: ndcg_at_1 value: 51.800000000000004 - type: ndcg_at_10 value: 60.891 - type: ndcg_at_100 value: 63.897000000000006 - type: ndcg_at_1000 value: 65.231 - type: ndcg_at_3 value: 58.108000000000004 - type: ndcg_at_5 value: 59.343 - type: precision_at_1 value: 51.800000000000004 - type: precision_at_10 value: 7.02 - type: precision_at_100 value: 0.8500000000000001 - type: precision_at_1000 value: 0.096 - type: precision_at_3 value: 20.8 - type: precision_at_5 value: 13.08 - type: recall_at_1 value: 51.800000000000004 - type: recall_at_10 value: 70.19999999999999 - type: recall_at_100 value: 85.0 - type: recall_at_1000 value: 95.7 - type: recall_at_3 value: 62.4 - type: recall_at_5 value: 65.4 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 38.68901889835701 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 38.0740589898848 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 33.41312482460189 - type: mrr value: 34.713530863302495 - task: type: Classification dataset: type: C-MTEB/MultilingualSentiment-classification name: MTEB MultilingualSentiment config: default split: validation revision: 46958b007a63fdbf239b7672c25d0bea67b5ea1a metrics: - type: accuracy value: 80.39333333333335 - type: f1 value: 80.42683132366277 - task: type: Retrieval dataset: type: mteb/nfcorpus name: MTEB NFCorpus config: default split: test revision: ec0fa4fe99da2ff19ca1214b7966684033a58814 metrics: - type: map_at_1 value: 6.232 - type: map_at_10 value: 13.442000000000002 - type: map_at_100 value: 17.443 - type: map_at_1000 value: 19.1 - type: map_at_3 value: 9.794 - type: map_at_5 value: 11.375 - type: mrr_at_1 value: 50.15500000000001 - type: mrr_at_10 value: 58.628 - type: mrr_at_100 value: 59.077 - type: mrr_at_1000 value: 59.119 - type: mrr_at_3 value: 56.914 - type: mrr_at_5 value: 57.921 - type: ndcg_at_1 value: 48.762 - type: ndcg_at_10 value: 37.203 - type: ndcg_at_100 value: 34.556 - type: ndcg_at_1000 value: 43.601 - type: ndcg_at_3 value: 43.004 - type: ndcg_at_5 value: 40.181 - type: precision_at_1 value: 50.15500000000001 - type: precision_at_10 value: 27.276 - type: precision_at_100 value: 8.981 - type: precision_at_1000 value: 2.228 - type: precision_at_3 value: 39.628 - type: precision_at_5 value: 33.808 - type: recall_at_1 value: 6.232 - type: recall_at_10 value: 18.137 - type: recall_at_100 value: 36.101 - type: recall_at_1000 value: 68.733 - type: recall_at_3 value: 10.978 - type: recall_at_5 value: 13.718 - task: type: Retrieval dataset: type: mteb/nq name: MTEB NQ config: default split: test revision: b774495ed302d8c44a3a7ea25c90dbce03968f31 metrics: - type: map_at_1 value: 35.545 - type: map_at_10 value: 52.083 - type: map_at_100 value: 52.954 - type: map_at_1000 value: 52.96999999999999 - type: map_at_3 value: 47.508 - type: map_at_5 value: 50.265 - type: mrr_at_1 value: 40.122 - type: mrr_at_10 value: 54.567 - type: mrr_at_100 value: 55.19199999999999 - type: mrr_at_1000 value: 55.204 - type: mrr_at_3 value: 51.043000000000006 - type: mrr_at_5 value: 53.233 - type: ndcg_at_1 value: 40.122 - type: ndcg_at_10 value: 60.012 - type: ndcg_at_100 value: 63.562 - type: ndcg_at_1000 value: 63.94 - type: ndcg_at_3 value: 51.681 - type: ndcg_at_5 value: 56.154 - type: precision_at_1 value: 40.122 - type: precision_at_10 value: 9.774 - type: precision_at_100 value: 1.176 - type: precision_at_1000 value: 0.121 - type: precision_at_3 value: 23.426 - type: precision_at_5 value: 16.686 - type: recall_at_1 value: 35.545 - type: recall_at_10 value: 81.557 - type: recall_at_100 value: 96.729 - type: recall_at_1000 value: 99.541 - type: recall_at_3 value: 60.185 - type: recall_at_5 value: 70.411 - task: type: PairClassification dataset: type: C-MTEB/OCNLI name: MTEB Ocnli config: default split: validation revision: 66e76a618a34d6d565d5538088562851e6daa7ec metrics: - type: cos_sim_accuracy value: 70.7634001082837 - type: cos_sim_ap value: 74.97527385556558 - type: cos_sim_f1 value: 72.77277277277277 - type: cos_sim_precision value: 69.17221693625119 - type: cos_sim_recall value: 76.76874340021119 - type: dot_accuracy value: 70.7634001082837 - type: dot_ap value: 74.97527385556558 - type: dot_f1 value: 72.77277277277277 - type: dot_precision value: 69.17221693625119 - type: dot_recall value: 76.76874340021119 - type: euclidean_accuracy value: 70.7634001082837 - type: euclidean_ap value: 74.97527385556558 - type: euclidean_f1 value: 72.77277277277277 - type: euclidean_precision value: 69.17221693625119 - type: euclidean_recall value: 76.76874340021119 - type: manhattan_accuracy value: 69.89713048186248 - type: manhattan_ap value: 74.25943370061067 - type: manhattan_f1 value: 72.17268887846082 - type: manhattan_precision value: 64.94932432432432 - type: manhattan_recall value: 81.20380147835269 - type: max_accuracy value: 70.7634001082837 - type: max_ap value: 74.97527385556558 - type: max_f1 value: 72.77277277277277 - task: type: Classification dataset: type: C-MTEB/OnlineShopping-classification name: MTEB OnlineShopping config: default split: test revision: e610f2ebd179a8fda30ae534c3878750a96db120 metrics: - type: accuracy value: 92.92000000000002 - type: ap value: 91.98475625106201 - type: f1 value: 92.91841470541901 - task: type: STS dataset: type: C-MTEB/PAWSX name: MTEB PAWSX config: default split: test revision: 9c6a90e430ac22b5779fb019a23e820b11a8b5e1 metrics: - type: cos_sim_pearson value: 41.23764415526825 - type: cos_sim_spearman value: 46.872669471694664 - type: euclidean_pearson value: 46.434144530918566 - type: euclidean_spearman value: 46.872669471694664 - type: manhattan_pearson value: 46.39678126910133 - type: manhattan_spearman value: 46.55877754642116 - task: type: STS dataset: type: C-MTEB/QBQTC name: MTEB QBQTC config: default split: test revision: 790b0510dc52b1553e8c49f3d2afb48c0e5c48b7 metrics: - type: cos_sim_pearson value: 28.77503601696299 - type: cos_sim_spearman value: 31.818095557325606 - type: euclidean_pearson value: 29.811479220397125 - type: euclidean_spearman value: 31.817046821577673 - type: manhattan_pearson value: 29.901628633314214 - type: manhattan_spearman value: 31.991472038092084 - task: type: Retrieval dataset: type: mteb/quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 68.908 - type: map_at_10 value: 83.19 - type: map_at_100 value: 83.842 - type: map_at_1000 value: 83.858 - type: map_at_3 value: 80.167 - type: map_at_5 value: 82.053 - type: mrr_at_1 value: 79.46 - type: mrr_at_10 value: 86.256 - type: mrr_at_100 value: 86.37 - type: mrr_at_1000 value: 86.371 - type: mrr_at_3 value: 85.177 - type: mrr_at_5 value: 85.908 - type: ndcg_at_1 value: 79.5 - type: ndcg_at_10 value: 87.244 - type: ndcg_at_100 value: 88.532 - type: ndcg_at_1000 value: 88.626 - type: ndcg_at_3 value: 84.161 - type: ndcg_at_5 value: 85.835 - type: precision_at_1 value: 79.5 - type: precision_at_10 value: 13.339 - type: precision_at_100 value: 1.53 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 36.97 - type: precision_at_5 value: 24.384 - type: recall_at_1 value: 68.908 - type: recall_at_10 value: 95.179 - type: recall_at_100 value: 99.579 - type: recall_at_1000 value: 99.964 - type: recall_at_3 value: 86.424 - type: recall_at_5 value: 91.065 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 65.17897847862794 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 66.22194961632586 - task: type: Retrieval dataset: type: mteb/scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 5.668 - type: map_at_10 value: 13.921 - type: map_at_100 value: 16.391 - type: map_at_1000 value: 16.749 - type: map_at_3 value: 10.001999999999999 - type: map_at_5 value: 11.974 - type: mrr_at_1 value: 27.800000000000004 - type: mrr_at_10 value: 39.290000000000006 - type: mrr_at_100 value: 40.313 - type: mrr_at_1000 value: 40.355999999999995 - type: mrr_at_3 value: 35.667 - type: mrr_at_5 value: 37.742 - type: ndcg_at_1 value: 27.800000000000004 - type: ndcg_at_10 value: 23.172 - type: ndcg_at_100 value: 32.307 - type: ndcg_at_1000 value: 38.048 - type: ndcg_at_3 value: 22.043 - type: ndcg_at_5 value: 19.287000000000003 - type: precision_at_1 value: 27.800000000000004 - type: precision_at_10 value: 11.95 - type: precision_at_100 value: 2.5260000000000002 - type: precision_at_1000 value: 0.38999999999999996 - type: precision_at_3 value: 20.433 - type: precision_at_5 value: 16.84 - type: recall_at_1 value: 5.668 - type: recall_at_10 value: 24.22 - type: recall_at_100 value: 51.217 - type: recall_at_1000 value: 79.10000000000001 - type: recall_at_3 value: 12.443 - type: recall_at_5 value: 17.068 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 82.83535239748218 - type: cos_sim_spearman value: 73.98553311584509 - type: euclidean_pearson value: 79.57336200069007 - type: euclidean_spearman value: 73.98553926018461 - type: manhattan_pearson value: 79.02277757114132 - type: manhattan_spearman value: 73.52350678760683 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 81.99055838690317 - type: cos_sim_spearman value: 72.05290668592296 - type: euclidean_pearson value: 81.7130610313565 - type: euclidean_spearman value: 72.0529066787229 - type: manhattan_pearson value: 82.09213883730894 - type: manhattan_spearman value: 72.5171577483134 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 84.4685161191763 - type: cos_sim_spearman value: 84.4847436140129 - type: euclidean_pearson value: 84.05016757016948 - type: euclidean_spearman value: 84.48474353891532 - type: manhattan_pearson value: 83.83064062713048 - type: manhattan_spearman value: 84.30431591842805 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 83.00171021092486 - type: cos_sim_spearman value: 77.91329577609622 - type: euclidean_pearson value: 81.49758593915315 - type: euclidean_spearman value: 77.91329577609622 - type: manhattan_pearson value: 81.23255996803785 - type: manhattan_spearman value: 77.80027024941825 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 86.62608607472492 - type: cos_sim_spearman value: 87.62293916855751 - type: euclidean_pearson value: 87.04313886714989 - type: euclidean_spearman value: 87.62293907119869 - type: manhattan_pearson value: 86.97266321040769 - type: manhattan_spearman value: 87.61807042381702 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 80.8012095789289 - type: cos_sim_spearman value: 81.91868918081325 - type: euclidean_pearson value: 81.2267973811213 - type: euclidean_spearman value: 81.91868918081325 - type: manhattan_pearson value: 81.0173457901168 - type: manhattan_spearman value: 81.79743115887055 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 88.39698537303725 - type: cos_sim_spearman value: 88.78668529808967 - type: euclidean_pearson value: 88.78863351718252 - type: euclidean_spearman value: 88.78668529808967 - type: manhattan_pearson value: 88.41678215762478 - type: manhattan_spearman value: 88.3827998418763 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_pearson value: 68.49024974161408 - type: cos_sim_spearman value: 69.19917146180619 - type: euclidean_pearson value: 70.48882819806336 - type: euclidean_spearman value: 69.19917146180619 - type: manhattan_pearson value: 70.86827961779932 - type: manhattan_spearman value: 69.38456983992613 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (zh) config: zh split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_pearson value: 67.41628669863584 - type: cos_sim_spearman value: 67.87238206703478 - type: euclidean_pearson value: 67.67834985311778 - type: euclidean_spearman value: 67.87238206703478 - type: manhattan_pearson value: 68.23423896742973 - type: manhattan_spearman value: 68.27069260687092 - task: type: STS dataset: type: C-MTEB/STSB name: MTEB STSB config: default split: test revision: 0cde68302b3541bb8b3c340dc0644b0b745b3dc0 metrics: - type: cos_sim_pearson value: 77.31628954400037 - type: cos_sim_spearman value: 76.83296022489624 - type: euclidean_pearson value: 76.69680425261211 - type: euclidean_spearman value: 76.83287843321102 - type: manhattan_pearson value: 76.65603163327958 - type: manhattan_spearman value: 76.80803503360451 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 84.31376078795105 - type: cos_sim_spearman value: 83.3985199217591 - type: euclidean_pearson value: 84.06630133719332 - type: euclidean_spearman value: 83.3985199217591 - type: manhattan_pearson value: 83.7896654474364 - type: manhattan_spearman value: 83.1885039212299 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 85.83161002188668 - type: mrr value: 96.19253114351153 - task: type: Retrieval dataset: type: mteb/scifact name: MTEB SciFact config: default split: test revision: 0228b52cf27578f30900b9e5271d331663a030d7 metrics: - type: map_at_1 value: 48.132999999999996 - type: map_at_10 value: 58.541 - type: map_at_100 value: 59.34 - type: map_at_1000 value: 59.367999999999995 - type: map_at_3 value: 55.191 - type: map_at_5 value: 57.084 - type: mrr_at_1 value: 51.0 - type: mrr_at_10 value: 59.858 - type: mrr_at_100 value: 60.474000000000004 - type: mrr_at_1000 value: 60.501000000000005 - type: mrr_at_3 value: 57.111000000000004 - type: mrr_at_5 value: 58.694 - type: ndcg_at_1 value: 51.0 - type: ndcg_at_10 value: 63.817 - type: ndcg_at_100 value: 67.229 - type: ndcg_at_1000 value: 67.94 - type: ndcg_at_3 value: 57.896 - type: ndcg_at_5 value: 60.785999999999994 - type: precision_at_1 value: 51.0 - type: precision_at_10 value: 8.933 - type: precision_at_100 value: 1.0699999999999998 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 23.111 - type: precision_at_5 value: 15.733 - type: recall_at_1 value: 48.132999999999996 - type: recall_at_10 value: 78.922 - type: recall_at_100 value: 94.167 - type: recall_at_1000 value: 99.667 - type: recall_at_3 value: 62.806 - type: recall_at_5 value: 70.078 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.88415841584158 - type: cos_sim_ap value: 97.72557886493401 - type: cos_sim_f1 value: 94.1294530858003 - type: cos_sim_precision value: 94.46122860020141 - type: cos_sim_recall value: 93.8 - type: dot_accuracy value: 99.88415841584158 - type: dot_ap value: 97.72557439066108 - type: dot_f1 value: 94.1294530858003 - type: dot_precision value: 94.46122860020141 - type: dot_recall value: 93.8 - type: euclidean_accuracy value: 99.88415841584158 - type: euclidean_ap value: 97.72557439066108 - type: euclidean_f1 value: 94.1294530858003 - type: euclidean_precision value: 94.46122860020141 - type: euclidean_recall value: 93.8 - type: manhattan_accuracy value: 99.88514851485148 - type: manhattan_ap value: 97.73324334051959 - type: manhattan_f1 value: 94.1825476429288 - type: manhattan_precision value: 94.46680080482898 - type: manhattan_recall value: 93.89999999999999 - type: max_accuracy value: 99.88514851485148 - type: max_ap value: 97.73324334051959 - type: max_f1 value: 94.1825476429288 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 72.8168026381278 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 44.30948635130784 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 54.11268548719803 - type: mrr value: 55.08079747050335 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.82885852096243 - type: cos_sim_spearman value: 30.800770979226076 - type: dot_pearson value: 30.82885608827704 - type: dot_spearman value: 30.800770979226076 - task: type: Reranking dataset: type: C-MTEB/T2Reranking name: MTEB T2Reranking config: default split: dev revision: 76631901a18387f85eaa53e5450019b87ad58ef9 metrics: - type: map value: 66.73038448968596 - type: mrr value: 77.26510193334836 - task: type: Retrieval dataset: type: C-MTEB/T2Retrieval name: MTEB T2Retrieval config: default split: dev revision: 8731a845f1bf500a4f111cf1070785c793d10e64 metrics: - type: map_at_1 value: 28.157 - type: map_at_10 value: 79.00399999999999 - type: map_at_100 value: 82.51899999999999 - type: map_at_1000 value: 82.577 - type: map_at_3 value: 55.614 - type: map_at_5 value: 68.292 - type: mrr_at_1 value: 91.167 - type: mrr_at_10 value: 93.391 - type: mrr_at_100 value: 93.467 - type: mrr_at_1000 value: 93.47 - type: mrr_at_3 value: 93.001 - type: mrr_at_5 value: 93.254 - type: ndcg_at_1 value: 91.167 - type: ndcg_at_10 value: 86.155 - type: ndcg_at_100 value: 89.425 - type: ndcg_at_1000 value: 89.983 - type: ndcg_at_3 value: 87.516 - type: ndcg_at_5 value: 86.148 - type: precision_at_1 value: 91.167 - type: precision_at_10 value: 42.697 - type: precision_at_100 value: 5.032 - type: precision_at_1000 value: 0.516 - type: precision_at_3 value: 76.45100000000001 - type: precision_at_5 value: 64.051 - type: recall_at_1 value: 28.157 - type: recall_at_10 value: 84.974 - type: recall_at_100 value: 95.759 - type: recall_at_1000 value: 98.583 - type: recall_at_3 value: 57.102 - type: recall_at_5 value: 71.383 - task: type: Classification dataset: type: C-MTEB/TNews-classification name: MTEB TNews config: default split: validation revision: 317f262bf1e6126357bbe89e875451e4b0938fe4 metrics: - type: accuracy value: 55.031 - type: f1 value: 53.07992810732314 - task: type: Retrieval dataset: type: mteb/trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.20400000000000001 - type: map_at_10 value: 1.27 - type: map_at_100 value: 7.993 - type: map_at_1000 value: 20.934 - type: map_at_3 value: 0.469 - type: map_at_5 value: 0.716 - type: mrr_at_1 value: 76.0 - type: mrr_at_10 value: 84.967 - type: mrr_at_100 value: 84.967 - type: mrr_at_1000 value: 84.967 - type: mrr_at_3 value: 83.667 - type: mrr_at_5 value: 84.967 - type: ndcg_at_1 value: 69.0 - type: ndcg_at_10 value: 59.243 - type: ndcg_at_100 value: 48.784 - type: ndcg_at_1000 value: 46.966 - type: ndcg_at_3 value: 64.14 - type: ndcg_at_5 value: 61.60600000000001 - type: precision_at_1 value: 76.0 - type: precision_at_10 value: 62.6 - type: precision_at_100 value: 50.18 - type: precision_at_1000 value: 21.026 - type: precision_at_3 value: 68.667 - type: precision_at_5 value: 66.0 - type: recall_at_1 value: 0.20400000000000001 - type: recall_at_10 value: 1.582 - type: recall_at_100 value: 11.988 - type: recall_at_1000 value: 44.994 - type: recall_at_3 value: 0.515 - type: recall_at_5 value: 0.844 - task: type: Clustering dataset: type: C-MTEB/ThuNewsClusteringP2P name: MTEB ThuNewsClusteringP2P config: default split: test revision: 5798586b105c0434e4f0fe5e767abe619442cf93 metrics: - type: v_measure value: 72.80915114296552 - task: type: Clustering dataset: type: C-MTEB/ThuNewsClusteringS2S name: MTEB ThuNewsClusteringS2S config: default split: test revision: 8a8b2caeda43f39e13c4bc5bea0f8a667896e10d metrics: - type: v_measure value: 70.86374654127641 - task: type: Retrieval dataset: type: mteb/touche2020 name: MTEB Touche2020 config: default split: test revision: a34f9a33db75fa0cbb21bb5cfc3dae8dc8bec93f metrics: - type: map_at_1 value: 3.3009999999999997 - type: map_at_10 value: 11.566 - type: map_at_100 value: 17.645 - type: map_at_1000 value: 19.206 - type: map_at_3 value: 6.986000000000001 - type: map_at_5 value: 8.716 - type: mrr_at_1 value: 42.857 - type: mrr_at_10 value: 58.287 - type: mrr_at_100 value: 59.111000000000004 - type: mrr_at_1000 value: 59.111000000000004 - type: mrr_at_3 value: 55.102 - type: mrr_at_5 value: 57.449 - type: ndcg_at_1 value: 39.796 - type: ndcg_at_10 value: 29.059 - type: ndcg_at_100 value: 40.629 - type: ndcg_at_1000 value: 51.446000000000005 - type: ndcg_at_3 value: 36.254999999999995 - type: ndcg_at_5 value: 32.216 - type: precision_at_1 value: 42.857 - type: precision_at_10 value: 23.469 - type: precision_at_100 value: 8.041 - type: precision_at_1000 value: 1.551 - type: precision_at_3 value: 36.735 - type: precision_at_5 value: 30.203999999999997 - type: recall_at_1 value: 3.3009999999999997 - type: recall_at_10 value: 17.267 - type: recall_at_100 value: 49.36 - type: recall_at_1000 value: 83.673 - type: recall_at_3 value: 8.049000000000001 - type: recall_at_5 value: 11.379999999999999 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 88.7576 - type: ap value: 35.52110634325751 - type: f1 value: 74.14476947482417 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 73.52009054895304 - type: f1 value: 73.81407409876577 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 54.35358706465052 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 83.65619598259522 - type: cos_sim_ap value: 65.824087818991 - type: cos_sim_f1 value: 61.952620244077536 - type: cos_sim_precision value: 56.676882661996494 - type: cos_sim_recall value: 68.311345646438 - type: dot_accuracy value: 83.65619598259522 - type: dot_ap value: 65.82406256999921 - type: dot_f1 value: 61.952620244077536 - type: dot_precision value: 56.676882661996494 - type: dot_recall value: 68.311345646438 - type: euclidean_accuracy value: 83.65619598259522 - type: euclidean_ap value: 65.82409143427542 - type: euclidean_f1 value: 61.952620244077536 - type: euclidean_precision value: 56.676882661996494 - type: euclidean_recall value: 68.311345646438 - type: manhattan_accuracy value: 83.4296954163438 - type: manhattan_ap value: 65.20662449614932 - type: manhattan_f1 value: 61.352885525070946 - type: manhattan_precision value: 55.59365623660523 - type: manhattan_recall value: 68.44327176781002 - type: max_accuracy value: 83.65619598259522 - type: max_ap value: 65.82409143427542 - type: max_f1 value: 61.952620244077536 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 87.90119144642372 - type: cos_sim_ap value: 84.04753852793387 - type: cos_sim_f1 value: 76.27737226277372 - type: cos_sim_precision value: 73.86757068667052 - type: cos_sim_recall value: 78.84970742223591 - type: dot_accuracy value: 87.90119144642372 - type: dot_ap value: 84.04753668117337 - type: dot_f1 value: 76.27737226277372 - type: dot_precision value: 73.86757068667052 - type: dot_recall value: 78.84970742223591 - type: euclidean_accuracy value: 87.90119144642372 - type: euclidean_ap value: 84.04754553468206 - type: euclidean_f1 value: 76.27737226277372 - type: euclidean_precision value: 73.86757068667052 - type: euclidean_recall value: 78.84970742223591 - type: manhattan_accuracy value: 87.87014398261343 - type: manhattan_ap value: 84.05164646221583 - type: manhattan_f1 value: 76.31392706820128 - type: manhattan_precision value: 73.91586694566708 - type: manhattan_recall value: 78.87280566676932 - type: max_accuracy value: 87.90119144642372 - type: max_ap value: 84.05164646221583 - type: max_f1 value: 76.31392706820128 - task: type: Retrieval dataset: type: C-MTEB/VideoRetrieval name: MTEB VideoRetrieval config: default split: dev revision: 58c2597a5943a2ba48f4668c3b90d796283c5639 metrics: - type: map_at_1 value: 63.6 - type: map_at_10 value: 72.673 - type: map_at_100 value: 73.05199999999999 - type: map_at_1000 value: 73.057 - type: map_at_3 value: 70.833 - type: map_at_5 value: 72.05799999999999 - type: mrr_at_1 value: 63.6 - type: mrr_at_10 value: 72.673 - type: mrr_at_100 value: 73.05199999999999 - type: mrr_at_1000 value: 73.057 - type: mrr_at_3 value: 70.833 - type: mrr_at_5 value: 72.05799999999999 - type: ndcg_at_1 value: 63.6 - type: ndcg_at_10 value: 76.776 - type: ndcg_at_100 value: 78.52900000000001 - type: ndcg_at_1000 value: 78.696 - type: ndcg_at_3 value: 73.093 - type: ndcg_at_5 value: 75.288 - type: precision_at_1 value: 63.6 - type: precision_at_10 value: 8.95 - type: precision_at_100 value: 0.975 - type: precision_at_1000 value: 0.099 - type: precision_at_3 value: 26.533 - type: precision_at_5 value: 16.98 - type: recall_at_1 value: 63.6 - type: recall_at_10 value: 89.5 - type: recall_at_100 value: 97.5 - type: recall_at_1000 value: 98.9 - type: recall_at_3 value: 79.60000000000001 - type: recall_at_5 value: 84.89999999999999 - task: type: Classification dataset: type: C-MTEB/waimai-classification name: MTEB Waimai config: default split: test revision: 339287def212450dcaa9df8c22bf93e9980c7023 metrics: - type: accuracy value: 89.39999999999999 - type: ap value: 75.52087544076016 - type: f1 value: 87.7629629899278 --- <p align=\"center\"> <img src=\"images/gme_logo.png\" alt=\"GME Logo\" style=\"width: 100%; max-width: 450px;\"> </p> <p align=\"center\"><b>GME: General Multimodal Embedding</b></p> ## GME-Qwen2-VL-2B We are excited to present series of unified **multimodal embedding models**, which are based on the advanced Qwen2-VL multimodal large language models (MLLMs). The models support three types of input: **text**, **image**, and **image-text pair**, all of which can produce universal vector representations and have powerful retrieval performance. **Key Enhancements of GME Models**: - **Unified Multimodal Representation**: GME models can process both single-modal and combined-modal inputs, resulting in a unified vector representation. This enables versatile retrieval scenarios (Any2Any Search), supporting tasks such as text retrieval, image retrieval from text, and image-to-image searches. - **High Performance**: Achieves state-of-the-art (SOTA) results in our universal multimodal retrieval benchmark (**UMRB**) and demonstrate strong evaluation scores in the Multimodal Textual Evaluation Benchmark (**MTEB**). - **Dynamic Image Resolution**: Benefiting from and our training data, GME models support dynamic resolution image input. - **Strong Visual Retrieval Performance**: Enhanced by the Qwen2-VL model series, our models excel in visual document retrieval tasks that require a nuanced understanding of document screenshots. This capability is particularly beneficial for complex document understanding scenarios, such as multimodal retrieval-augmented generation (RAG) applications focused on academic papers. **Developed by**: Tongyi Lab, Alibaba Group **Paper**: GME: Improving Universal Multimodal Retrieval by Multimodal LLMs ## Model List | Models | Model Size | Max Seq. Length | Dimension | MTEB-en| MTEB-zh | UMRB | |:-----: | :-----: |:-----: |:-----: |:-----: | :-----: | :-----: | |[]( | 2.21B | 32768 | 1536 | 65.27 | 66.92 | 64.45 | |[]( | 8.29B | 32768 | 3584 | 67.48 | 69.73 | 67.44 | ## Usage **Use with custom code** ## Evaluation We validated the performance on our universal multimodal retrieval benchmark (**UMRB**) among others. | | | Single-modal | | Cross-modal | | | Fused-modal | | | | Avg. | |--------------------|------|:------------:|:---------:|:-----------:|:-----------:|:---------:|:-----------:|:----------:|:----------:|:-----------:|:----------:| | | | T→T (16) | I→I (1) | T→I (4) | T→VD (10) | I→T (4) | T→IT (2) | IT→T (5) | IT→I (2) | IT→IT (3) | (47) | | VISTA | 0.2B | 55.15 | **31.98** | 32.88 | 10.12 | 31.23 | 45.81 | 53.32 | 8.97 | 26.26 | 37.32 | | CLIP-SF | 0.4B | 39.75 | 31.42 | 59.05 | 24.09 | 62.95 | 66.41 | 53.32 | 34.9 | 55.65 | 43.66 | | One-Peace | 4B | 43.54 | 31.27 | 61.38 | 42.9 | 65.59 | 42.72 | 28.29 | 6.73 | 23.41 | 42.01 | | DSE | 4.2B | 48.94 | 27.92 | 40.75 | 78.21 | 52.54 | 49.62 | 35.44 | 8.36 | 40.18 | 50.04 | | E5-V | 8.4B | 52.41 | 27.36 | 46.56 | 41.22 | 47.95 | 54.13 | 32.9 | 23.17 | 7.23 | 42.52 | | **GME-Qwen2-VL-2B** | 2.2B | 55.93 | 29.86 | 57.36 | 87.84 | 61.93 | 76.47 | 64.58 | 37.02 | 66.47 | 64.45 | | **GME-Qwen2-VL-7B** | 8.3B | **58.19** | 31.89 | **61.35** | **89.92** | **65.83** | **80.94** | **66.18** | **42.56** | **73.62** | **67.44** | The MTEB Leaderboard English tab shows the text embeddings performence of our model. **More detailed experimental results can be found in the paper**. ## Community support ### Fine-tuning GME models can be fine-tuned by SWIFT: ## Limitations - **Single Image Input**: In , an image could be converted into a very large number of visual tokens. We limit the number of visual tokens to 1024 to obtain a good training efficiency. Due to the lack of relevant data, our models and evaluations retain one single image. - **English-only Training**: Our models are trained on english data only. Although the models are multilingual, the multilingual-multimodal embedding performance are not guaranteed. We will extend to multi-image input, image-text interleaved data as well as multilingual data in the future version. ## Redistribution and Use We encourage and value diverse applications of GME models and continuous enhancements to the models themselves. - If you distribute or make GME models (or any derivative works) available, or if you create a product or service (including another AI model) that incorporates them, you must prominently display on your website, user interface, blog post, About page, or product documentation. - If you utilize GME models or their outputs to develop, train, fine-tune, or improve an AI model that is distributed or made available, you must prefix the name of any such AI model with . ## Cloud API Services In addition to the open-source GME series models, GME series models are also available as commercial API services on Alibaba Cloud. - MultiModal Embedding Models: The model service is available. Note that the models behind the commercial APIs are not entirely identical to the open-source models. ## Hiring We have open positions for Research Interns and Full-Time Researchers to join our team at Tongyi Lab. We are seeking passionate individuals with expertise in representation learning, LLM-driven information retrieval, Retrieval-Augmented Generation (RAG), and agent-based systems. Our team is located in the vibrant cities of Beijing and Hangzhou, offering a collaborative and dynamic work environment where you can contribute to cutting-edge advancements in artificial intelligence and machine learning. If you are driven by curiosity and eager to make a meaningful impact through your work, we would love to hear from you. Please submit your resume along with a brief introduction to <a href=\"mailto:[email protected]\">[email protected]</a>. ## Citation If you find our paper or models helpful, please consider cite:",
27
+ "model_explanation_gemini": "Performs multilingual (English and Chinese) text and vision-language tasks including sentence similarity, classification, clustering, retrieval, and reranking, based on the Qwen2-VL-2B-Instruct architecture. \n\n**Features**: \n- Multilingual (English, Chinese) \n- Tasks: Sentence similarity, classification, clustering, retrieval, reranking \n- Base model: Qwen/Qwen2-VL-2B-Instruct \n- License: Apache-2.0 \n\n**Comparison**:",
28
+ "release_year": "2024",
29
+ "parameter_count": "2B",
30
+ "is_fine_tuned": true,
31
+ "category": "Embedding",
32
+ "model_family": "Qwen",
33
+ "api_enhanced": true
34
  }
model_data_json/Alibaba-NLP_gte-Qwen2-1.5B-instruct.json CHANGED
The diff for this file is too large to render. See raw diff
 
model_data_json/Alibaba-NLP_gte-Qwen2-7B-instruct.json CHANGED
The diff for this file is too large to render. See raw diff
 
model_data_json/Alibaba-NLP_gte-base-en-v1.5.json CHANGED
@@ -23,5 +23,10 @@
23
  "region:us"
24
  ],
25
  "description": "--- library_name: transformers tags: - sentence-transformers - gte - mteb - transformers.js - sentence-similarity license: apache-2.0 language: - en model-index: - name: gte-base-en-v1.5 results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 74.7910447761194 - type: ap value: 37.053785713650626 - type: f1 value: 68.51101510998551 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 93.016875 - type: ap value: 89.17750268426342 - type: f1 value: 92.9970977240524 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 53.312000000000005 - type: f1 value: 52.98175784163017 - task: type: Retrieval dataset: type: mteb/arguana name: MTEB ArguAna config: default split: test revision: c22ab2a51041ffd869aaddef7af8d8215647e41a metrics: - type: map_at_1 value: 38.193 - type: map_at_10 value: 54.848 - type: map_at_100 value: 55.388000000000005 - type: map_at_1000 value: 55.388999999999996 - type: map_at_3 value: 50.427 - type: map_at_5 value: 53.105000000000004 - type: mrr_at_1 value: 39.047 - type: mrr_at_10 value: 55.153 - type: mrr_at_100 value: 55.686 - type: mrr_at_1000 value: 55.688 - type: mrr_at_3 value: 50.676 - type: mrr_at_5 value: 53.417 - type: ndcg_at_1 value: 38.193 - type: ndcg_at_10 value: 63.486 - type: ndcg_at_100 value: 65.58 - type: ndcg_at_1000 value: 65.61 - type: ndcg_at_3 value: 54.494 - type: ndcg_at_5 value: 59.339 - type: precision_at_1 value: 38.193 - type: precision_at_10 value: 9.075 - type: precision_at_100 value: 0.9939999999999999 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 22.096 - type: precision_at_5 value: 15.619 - type: recall_at_1 value: 38.193 - type: recall_at_10 value: 90.754 - type: recall_at_100 value: 99.431 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 66.28699999999999 - type: recall_at_5 value: 78.094 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 47.508221208908964 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 42.04668382560096 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 61.828759903716815 - type: mrr value: 74.37343358395991 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 85.03673698773017 - type: cos_sim_spearman value: 83.6470866785058 - type: euclidean_pearson value: 82.64048673096565 - type: euclidean_spearman value: 83.63142367101115 - type: manhattan_pearson value: 82.71493099760228 - type: manhattan_spearman value: 83.60491704294326 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 86.73376623376623 - type: f1 value: 86.70294049278262 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 40.31923804167062 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 37.552547125348454 - task: type: Retrieval dataset: type: mteb/cqadupstack-android name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: f46a197baaae43b4f621051089b82a364682dfeb metrics: - type: map_at_1 value: 30.567 - type: map_at_10 value: 41.269 - type: map_at_100 value: 42.689 - type: map_at_1000 value: 42.84 - type: map_at_3 value: 37.567 - type: map_at_5 value: 39.706 - type: mrr_at_1 value: 37.053000000000004 - type: mrr_at_10 value: 46.900999999999996 - type: mrr_at_100 value: 47.662 - type: mrr_at_1000 value: 47.713 - type: mrr_at_3 value: 43.801 - type: mrr_at_5 value: 45.689 - type: ndcg_at_1 value: 37.053000000000004 - type: ndcg_at_10 value: 47.73 - type: ndcg_at_100 value: 53.128 - type: ndcg_at_1000 value: 55.300000000000004 - type: ndcg_at_3 value: 42.046 - type: ndcg_at_5 value: 44.782 - type: precision_at_1 value: 37.053000000000004 - type: precision_at_10 value: 9.142 - type: precision_at_100 value: 1.485 - type: precision_at_1000 value: 0.197 - type: precision_at_3 value: 20.076 - type: precision_at_5 value: 14.535 - type: recall_at_1 value: 30.567 - type: recall_at_10 value: 60.602999999999994 - type: recall_at_100 value: 83.22800000000001 - type: recall_at_1000 value: 96.696 - type: recall_at_3 value: 44.336999999999996 - type: recall_at_5 value: 51.949 - task: type: Retrieval dataset: type: mteb/cqadupstack-english name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: ad9991cb51e31e31e430383c75ffb2885547b5f0 metrics: - type: map_at_1 value: 28.538000000000004 - type: map_at_10 value: 38.757999999999996 - type: map_at_100 value: 40.129 - type: map_at_1000 value: 40.262 - type: map_at_3 value: 35.866 - type: map_at_5 value: 37.417 - type: mrr_at_1 value: 36.051 - type: mrr_at_10 value: 44.868 - type: mrr_at_100 value: 45.568999999999996 - type: mrr_at_1000 value: 45.615 - type: mrr_at_3 value: 42.558 - type: mrr_at_5 value: 43.883 - type: ndcg_at_1 value: 36.051 - type: ndcg_at_10 value: 44.584 - type: ndcg_at_100 value: 49.356 - type: ndcg_at_1000 value: 51.39 - type: ndcg_at_3 value: 40.389 - type: ndcg_at_5 value: 42.14 - type: precision_at_1 value: 36.051 - type: precision_at_10 value: 8.446 - type: precision_at_100 value: 1.411 - type: precision_at_1000 value: 0.19 - type: precision_at_3 value: 19.639 - type: precision_at_5 value: 13.796 - type: recall_at_1 value: 28.538000000000004 - type: recall_at_10 value: 54.99000000000001 - type: recall_at_100 value: 75.098 - type: recall_at_1000 value: 87.848 - type: recall_at_3 value: 42.236000000000004 - type: recall_at_5 value: 47.377 - task: type: Retrieval dataset: type: mteb/cqadupstack-gaming name: MTEB CQADupstackGamingRetrieval config: default split: test revision: 4885aa143210c98657558c04aaf3dc47cfb54340 metrics: - type: map_at_1 value: 37.188 - type: map_at_10 value: 50.861000000000004 - type: map_at_100 value: 51.917 - type: map_at_1000 value: 51.964999999999996 - type: map_at_3 value: 47.144000000000005 - type: map_at_5 value: 49.417 - type: mrr_at_1 value: 42.571 - type: mrr_at_10 value: 54.086999999999996 - type: mrr_at_100 value: 54.739000000000004 - type: mrr_at_1000 value: 54.762 - type: mrr_at_3 value: 51.285000000000004 - type: mrr_at_5 value: 53.0 - type: ndcg_at_1 value: 42.571 - type: ndcg_at_10 value: 57.282 - type: ndcg_at_100 value: 61.477000000000004 - type: ndcg_at_1000 value: 62.426 - type: ndcg_at_3 value: 51.0 - type: ndcg_at_5 value: 54.346000000000004 - type: precision_at_1 value: 42.571 - type: precision_at_10 value: 9.467 - type: precision_at_100 value: 1.2550000000000001 - type: precision_at_1000 value: 0.13799999999999998 - type: precision_at_3 value: 23.114 - type: precision_at_5 value: 16.250999999999998 - type: recall_at_1 value: 37.188 - type: recall_at_10 value: 73.068 - type: recall_at_100 value: 91.203 - type: recall_at_1000 value: 97.916 - type: recall_at_3 value: 56.552 - type: recall_at_5 value: 64.567 - task: type: Retrieval dataset: type: mteb/cqadupstack-gis name: MTEB CQADupstackGisRetrieval config: default split: test revision: 5003b3064772da1887988e05400cf3806fe491f2 metrics: - type: map_at_1 value: 25.041000000000004 - type: map_at_10 value: 33.86 - type: map_at_100 value: 34.988 - type: map_at_1000 value: 35.064 - type: map_at_3 value: 31.049 - type: map_at_5 value: 32.845 - type: mrr_at_1 value: 26.893 - type: mrr_at_10 value: 35.594 - type: mrr_at_100 value: 36.617 - type: mrr_at_1000 value: 36.671 - type: mrr_at_3 value: 33.051 - type: mrr_at_5 value: 34.61 - type: ndcg_at_1 value: 26.893 - type: ndcg_at_10 value: 38.674 - type: ndcg_at_100 value: 44.178 - type: ndcg_at_1000 value: 46.089999999999996 - type: ndcg_at_3 value: 33.485 - type: ndcg_at_5 value: 36.402 - type: precision_at_1 value: 26.893 - type: precision_at_10 value: 5.989 - type: precision_at_100 value: 0.918 - type: precision_at_1000 value: 0.11100000000000002 - type: precision_at_3 value: 14.2 - type: precision_at_5 value: 10.26 - type: recall_at_1 value: 25.041000000000004 - type: recall_at_10 value: 51.666000000000004 - type: recall_at_100 value: 76.896 - type: recall_at_1000 value: 91.243 - type: recall_at_3 value: 38.035999999999994 - type: recall_at_5 value: 44.999 - task: type: Retrieval dataset: type: mteb/cqadupstack-mathematica name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: 90fceea13679c63fe563ded68f3b6f06e50061de metrics: - type: map_at_1 value: 15.909999999999998 - type: map_at_10 value: 23.901 - type: map_at_100 value: 25.165 - type: map_at_1000 value: 25.291000000000004 - type: map_at_3 value: 21.356 - type: map_at_5 value: 22.816 - type: mrr_at_1 value: 20.025000000000002 - type: mrr_at_10 value: 28.382 - type: mrr_at_100 value: 29.465000000000003 - type: mrr_at_1000 value: 29.535 - type: mrr_at_3 value: 25.933 - type: mrr_at_5 value: 27.332 - type: ndcg_at_1 value: 20.025000000000002 - type: ndcg_at_10 value: 29.099000000000004 - type: ndcg_at_100 value: 35.127 - type: ndcg_at_1000 value: 38.096000000000004 - type: ndcg_at_3 value: 24.464 - type: ndcg_at_5 value: 26.709 - type: precision_at_1 value: 20.025000000000002 - type: precision_at_10 value: 5.398 - type: precision_at_100 value: 0.9690000000000001 - type: precision_at_1000 value: 0.13699999999999998 - type: precision_at_3 value: 11.774 - type: precision_at_5 value: 8.632 - type: recall_at_1 value: 15.909999999999998 - type: recall_at_10 value: 40.672000000000004 - type: recall_at_100 value: 66.855 - type: recall_at_1000 value: 87.922 - type: recall_at_3 value: 28.069 - type: recall_at_5 value: 33.812 - task: type: Retrieval dataset: type: mteb/cqadupstack-physics name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: 79531abbd1fb92d06c6d6315a0cbbbf5bb247ea4 metrics: - type: map_at_1 value: 30.175 - type: map_at_10 value: 41.36 - type: map_at_100 value: 42.701 - type: map_at_1000 value: 42.817 - type: map_at_3 value: 37.931 - type: map_at_5 value: 39.943 - type: mrr_at_1 value: 35.611 - type: mrr_at_10 value: 46.346 - type: mrr_at_100 value: 47.160000000000004 - type: mrr_at_1000 value: 47.203 - type: mrr_at_3 value: 43.712 - type: mrr_at_5 value: 45.367000000000004 - type: ndcg_at_1 value: 35.611 - type: ndcg_at_10 value: 47.532000000000004 - type: ndcg_at_100 value: 53.003 - type: ndcg_at_1000 value: 55.007 - type: ndcg_at_3 value: 42.043 - type: ndcg_at_5 value: 44.86 - type: precision_at_1 value: 35.611 - type: precision_at_10 value: 8.624 - type: precision_at_100 value: 1.332 - type: precision_at_1000 value: 0.169 - type: precision_at_3 value: 20.083000000000002 - type: precision_at_5 value: 14.437 - type: recall_at_1 value: 30.175 - type: recall_at_10 value: 60.5 - type: recall_at_100 value: 83.399 - type: recall_at_1000 value: 96.255 - type: recall_at_3 value: 45.448 - type: recall_at_5 value: 52.432 - task: type: Retrieval dataset: type: mteb/cqadupstack-programmers name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: 6184bc1440d2dbc7612be22b50686b8826d22b32 metrics: - type: map_at_1 value: 22.467000000000002 - type: map_at_10 value: 33.812999999999995 - type: map_at_100 value: 35.248000000000005 - type: map_at_1000 value: 35.359 - type: map_at_3 value: 30.316 - type: map_at_5 value: 32.233000000000004 - type: mrr_at_1 value: 28.310999999999996 - type: mrr_at_10 value: 38.979 - type: mrr_at_100 value: 39.937 - type: mrr_at_1000 value: 39.989999999999995 - type: mrr_at_3 value: 36.244 - type: mrr_at_5 value: 37.871 - type: ndcg_at_1 value: 28.310999999999996 - type: ndcg_at_10 value: 40.282000000000004 - type: ndcg_at_100 value: 46.22 - type: ndcg_at_1000 value: 48.507 - type: ndcg_at_3 value: 34.596 - type: ndcg_at_5 value: 37.267 - type: precision_at_1 value: 28.310999999999996 - type: precision_at_10 value: 7.831 - type: precision_at_100 value: 1.257 - type: precision_at_1000 value: 0.164 - type: precision_at_3 value: 17.275 - type: precision_at_5 value: 12.556999999999999 - type: recall_at_1 value: 22.467000000000002 - type: recall_at_10 value: 54.14099999999999 - type: recall_at_100 value: 79.593 - type: recall_at_1000 value: 95.063 - type: recall_at_3 value: 38.539 - type: recall_at_5 value: 45.403 - task: type: Retrieval dataset: type: mteb/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: map_at_1 value: 24.18591666666667 - type: map_at_10 value: 33.84258333333333 - type: map_at_100 value: 35.11391666666666 - type: map_at_1000 value: 35.23258333333333 - type: map_at_3 value: 30.764249999999997 - type: map_at_5 value: 32.52333333333334 - type: mrr_at_1 value: 28.54733333333333 - type: mrr_at_10 value: 37.81725 - type: mrr_at_100 value: 38.716499999999996 - type: mrr_at_1000 value: 38.77458333333333 - type: mrr_at_3 value: 35.157833333333336 - type: mrr_at_5 value: 36.69816666666667 - type: ndcg_at_1 value: 28.54733333333333 - type: ndcg_at_10 value: 39.51508333333334 - type: ndcg_at_100 value: 44.95316666666666 - type: ndcg_at_1000 value: 47.257083333333334 - type: ndcg_at_3 value: 34.205833333333324 - type: ndcg_at_5 value: 36.78266666666667 - type: precision_at_1 value: 28.54733333333333 - type: precision_at_10 value: 7.082583333333334 - type: precision_at_100 value: 1.1590833333333332 - type: precision_at_1000 value: 0.15516666666666662 - type: precision_at_3 value: 15.908750000000001 - type: precision_at_5 value: 11.505416666666669 - type: recall_at_1 value: 24.18591666666667 - type: recall_at_10 value: 52.38758333333333 - type: recall_at_100 value: 76.13666666666667 - type: recall_at_1000 value: 91.99066666666667 - type: recall_at_3 value: 37.78333333333334 - type: recall_at_5 value: 44.30141666666666 - task: type: Retrieval dataset: type: mteb/cqadupstack-stats name: MTEB CQADupstackStatsRetrieval config: default split: test revision: 65ac3a16b8e91f9cee4c9828cc7c335575432a2a metrics: - type: map_at_1 value: 21.975 - type: map_at_10 value: 29.781000000000002 - type: map_at_100 value: 30.847 - type: map_at_1000 value: 30.94 - type: map_at_3 value: 27.167 - type: map_at_5 value: 28.633999999999997 - type: mrr_at_1 value: 24.387 - type: mrr_at_10 value: 32.476 - type: mrr_at_100 value: 33.337 - type: mrr_at_1000 value: 33.403 - type: mrr_at_3 value: 29.881999999999998 - type: mrr_at_5 value: 31.339 - type: ndcg_at_1 value: 24.387 - type: ndcg_at_10 value: 34.596 - type: ndcg_at_100 value: 39.635 - type: ndcg_at_1000 value: 42.079 - type: ndcg_at_3 value: 29.516 - type: ndcg_at_5 value: 31.959 - type: precision_at_1 value: 24.387 - type: precision_at_10 value: 5.6129999999999995 - type: precision_at_100 value: 0.8909999999999999 - type: precision_at_1000 value: 0.117 - type: precision_at_3 value: 12.73 - type: precision_at_5 value: 9.171999999999999 - type: recall_at_1 value: 21.975 - type: recall_at_10 value: 46.826 - type: recall_at_100 value: 69.554 - type: recall_at_1000 value: 87.749 - type: recall_at_3 value: 33.016 - type: recall_at_5 value: 38.97 - task: type: Retrieval dataset: type: mteb/cqadupstack-tex name: MTEB CQADupstackTexRetrieval config: default split: test revision: 46989137a86843e03a6195de44b09deda022eec7 metrics: - type: map_at_1 value: 15.614 - type: map_at_10 value: 22.927 - type: map_at_100 value: 24.185000000000002 - type: map_at_1000 value: 24.319 - type: map_at_3 value: 20.596 - type: map_at_5 value: 21.854000000000003 - type: mrr_at_1 value: 18.858 - type: mrr_at_10 value: 26.535999999999998 - type: mrr_at_100 value: 27.582 - type: mrr_at_1000 value: 27.665 - type: mrr_at_3 value: 24.295 - type: mrr_at_5 value: 25.532 - type: ndcg_at_1 value: 18.858 - type: ndcg_at_10 value: 27.583000000000002 - type: ndcg_at_100 value: 33.635 - type: ndcg_at_1000 value: 36.647 - type: ndcg_at_3 value: 23.348 - type: ndcg_at_5 value: 25.257 - type: precision_at_1 value: 18.858 - type: precision_at_10 value: 5.158 - type: precision_at_100 value: 0.964 - type: precision_at_1000 value: 0.13999999999999999 - type: precision_at_3 value: 11.092 - type: precision_at_5 value: 8.1 - type: recall_at_1 value: 15.614 - type: recall_at_10 value: 37.916 - type: recall_at_100 value: 65.205 - type: recall_at_1000 value: 86.453 - type: recall_at_3 value: 26.137 - type: recall_at_5 value: 31.087999999999997 - task: type: Retrieval dataset: type: mteb/cqadupstack-unix name: MTEB CQADupstackUnixRetrieval config: default split: test revision: 6c6430d3a6d36f8d2a829195bc5dc94d7e063e53 metrics: - type: map_at_1 value: 23.078000000000003 - type: map_at_10 value: 31.941999999999997 - type: map_at_100 value: 33.196999999999996 - type: map_at_1000 value: 33.303 - type: map_at_3 value: 28.927000000000003 - type: map_at_5 value: 30.707 - type: mrr_at_1 value: 26.866 - type: mrr_at_10 value: 35.557 - type: mrr_at_100 value: 36.569 - type: mrr_at_1000 value: 36.632 - type: mrr_at_3 value: 32.897999999999996 - type: mrr_at_5 value: 34.437 - type: ndcg_at_1 value: 26.866 - type: ndcg_at_10 value: 37.372 - type: ndcg_at_100 value: 43.248 - type: ndcg_at_1000 value: 45.632 - type: ndcg_at_3 value: 31.852999999999998 - type: ndcg_at_5 value: 34.582 - type: precision_at_1 value: 26.866 - type: precision_at_10 value: 6.511 - type: precision_at_100 value: 1.078 - type: precision_at_1000 value: 0.13899999999999998 - type: precision_at_3 value: 14.582999999999998 - type: precision_at_5 value: 10.634 - type: recall_at_1 value: 23.078000000000003 - type: recall_at_10 value: 50.334 - type: recall_at_100 value: 75.787 - type: recall_at_1000 value: 92.485 - type: recall_at_3 value: 35.386 - type: recall_at_5 value: 42.225 - task: type: Retrieval dataset: type: mteb/cqadupstack-webmasters name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: 160c094312a0e1facb97e55eeddb698c0abe3571 metrics: - type: map_at_1 value: 22.203999999999997 - type: map_at_10 value: 31.276 - type: map_at_100 value: 32.844 - type: map_at_1000 value: 33.062999999999995 - type: map_at_3 value: 27.733999999999998 - type: map_at_5 value: 29.64 - type: mrr_at_1 value: 27.272999999999996 - type: mrr_at_10 value: 36.083 - type: mrr_at_100 value: 37.008 - type: mrr_at_1000 value: 37.076 - type: mrr_at_3 value: 33.004 - type: mrr_at_5 value: 34.664 - type: ndcg_at_1 value: 27.272999999999996 - type: ndcg_at_10 value: 37.763000000000005 - type: ndcg_at_100 value: 43.566 - type: ndcg_at_1000 value: 46.356 - type: ndcg_at_3 value: 31.673000000000002 - type: ndcg_at_5 value: 34.501 - type: precision_at_1 value: 27.272999999999996 - type: precision_at_10 value: 7.470000000000001 - type: precision_at_100 value: 1.502 - type: precision_at_1000 value: 0.24 - type: precision_at_3 value: 14.756 - type: precision_at_5 value: 11.225 - type: recall_at_1 value: 22.203999999999997 - type: recall_at_10 value: 51.437999999999995 - type: recall_at_100 value: 76.845 - type: recall_at_1000 value: 94.38600000000001 - type: recall_at_3 value: 34.258 - type: recall_at_5 value: 41.512 - task: type: Retrieval dataset: type: mteb/cqadupstack-wordpress name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: map_at_1 value: 17.474 - type: map_at_10 value: 26.362999999999996 - type: map_at_100 value: 27.456999999999997 - type: map_at_1000 value: 27.567999999999998 - type: map_at_3 value: 23.518 - type: map_at_5 value: 25.068 - type: mrr_at_1 value: 18.669 - type: mrr_at_10 value: 27.998 - type: mrr_at_100 value: 28.953 - type: mrr_at_1000 value: 29.03 - type: mrr_at_3 value: 25.230999999999998 - type: mrr_at_5 value: 26.654 - type: ndcg_at_1 value: 18.669 - type: ndcg_at_10 value: 31.684 - type: ndcg_at_100 value: 36.864999999999995 - type: ndcg_at_1000 value: 39.555 - type: ndcg_at_3 value: 26.057000000000002 - type: ndcg_at_5 value: 28.587 - type: precision_at_1 value: 18.669 - type: precision_at_10 value: 5.3420000000000005 - type: precision_at_100 value: 0.847 - type: precision_at_1000 value: 0.12 - type: precision_at_3 value: 11.583 - type: precision_at_5 value: 8.466 - type: recall_at_1 value: 17.474 - type: recall_at_10 value: 46.497 - type: recall_at_100 value: 69.977 - type: recall_at_1000 value: 89.872 - type: recall_at_3 value: 31.385999999999996 - type: recall_at_5 value: 37.283 - task: type: Retrieval dataset: type: mteb/climate-fever name: MTEB ClimateFEVER config: default split: test revision: 47f2ac6acb640fc46020b02a5b59fdda04d39380 metrics: - type: map_at_1 value: 17.173 - type: map_at_10 value: 30.407 - type: map_at_100 value: 32.528 - type: map_at_1000 value: 32.698 - type: map_at_3 value: 25.523 - type: map_at_5 value: 28.038 - type: mrr_at_1 value: 38.958 - type: mrr_at_10 value: 51.515 - type: mrr_at_100 value: 52.214000000000006 - type: mrr_at_1000 value: 52.237 - type: mrr_at_3 value: 48.502 - type: mrr_at_5 value: 50.251000000000005 - type: ndcg_at_1 value: 38.958 - type: ndcg_at_10 value: 40.355000000000004 - type: ndcg_at_100 value: 47.68 - type: ndcg_at_1000 value: 50.370000000000005 - type: ndcg_at_3 value: 33.946 - type: ndcg_at_5 value: 36.057 - type: precision_at_1 value: 38.958 - type: precision_at_10 value: 12.508 - type: precision_at_100 value: 2.054 - type: precision_at_1000 value: 0.256 - type: precision_at_3 value: 25.581 - type: precision_at_5 value: 19.256999999999998 - type: recall_at_1 value: 17.173 - type: recall_at_10 value: 46.967 - type: recall_at_100 value: 71.47200000000001 - type: recall_at_1000 value: 86.238 - type: recall_at_3 value: 30.961 - type: recall_at_5 value: 37.539 - task: type: Retrieval dataset: type: mteb/dbpedia name: MTEB DBPedia config: default split: test revision: c0f706b76e590d620bd6618b3ca8efdd34e2d659 metrics: - type: map_at_1 value: 8.999 - type: map_at_10 value: 18.989 - type: map_at_100 value: 26.133 - type: map_at_1000 value: 27.666 - type: map_at_3 value: 13.918 - type: map_at_5 value: 16.473 - type: mrr_at_1 value: 66.25 - type: mrr_at_10 value: 74.161 - type: mrr_at_100 value: 74.516 - type: mrr_at_1000 value: 74.524 - type: mrr_at_3 value: 72.875 - type: mrr_at_5 value: 73.613 - type: ndcg_at_1 value: 54.37499999999999 - type: ndcg_at_10 value: 39.902 - type: ndcg_at_100 value: 44.212 - type: ndcg_at_1000 value: 51.62 - type: ndcg_at_3 value: 45.193 - type: ndcg_at_5 value: 42.541000000000004 - type: precision_at_1 value: 66.25 - type: precision_at_10 value: 30.425 - type: precision_at_100 value: 9.754999999999999 - type: precision_at_1000 value: 2.043 - type: precision_at_3 value: 48.25 - type: precision_at_5 value: 40.65 - type: recall_at_1 value: 8.999 - type: recall_at_10 value: 24.133 - type: recall_at_100 value: 49.138999999999996 - type: recall_at_1000 value: 72.639 - type: recall_at_3 value: 15.287999999999998 - type: recall_at_5 value: 19.415 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 46.38999999999999 - type: f1 value: 41.444205512055234 - task: type: Retrieval dataset: type: mteb/fever name: MTEB FEVER config: default split: test revision: bea83ef9e8fb933d90a2f1d5515737465d613e12 metrics: - type: map_at_1 value: 87.35000000000001 - type: map_at_10 value: 92.837 - type: map_at_100 value: 92.996 - type: map_at_1000 value: 93.006 - type: map_at_3 value: 92.187 - type: map_at_5 value: 92.595 - type: mrr_at_1 value: 93.864 - type: mrr_at_10 value: 96.723 - type: mrr_at_100 value: 96.72500000000001 - type: mrr_at_1000 value: 96.72500000000001 - type: mrr_at_3 value: 96.64 - type: mrr_at_5 value: 96.71499999999999 - type: ndcg_at_1 value: 93.864 - type: ndcg_at_10 value: 94.813 - type: ndcg_at_100 value: 95.243 - type: ndcg_at_1000 value: 95.38600000000001 - type: ndcg_at_3 value: 94.196 - type: ndcg_at_5 value: 94.521 - type: precision_at_1 value: 93.864 - type: precision_at_10 value: 10.951 - type: precision_at_100 value: 1.1400000000000001 - type: precision_at_1000 value: 0.117 - type: precision_at_3 value: 35.114000000000004 - type: precision_at_5 value: 21.476 - type: recall_at_1 value: 87.35000000000001 - type: recall_at_10 value: 96.941 - type: recall_at_100 value: 98.397 - type: recall_at_1000 value: 99.21600000000001 - type: recall_at_3 value: 95.149 - type: recall_at_5 value: 96.131 - task: type: Retrieval dataset: type: mteb/fiqa name: MTEB FiQA2018 config: default split: test revision: 27a168819829fe9bcd655c2df245fb19452e8e06 metrics: - type: map_at_1 value: 24.476 - type: map_at_10 value: 40.11 - type: map_at_100 value: 42.229 - type: map_at_1000 value: 42.378 - type: map_at_3 value: 34.512 - type: map_at_5 value: 38.037 - type: mrr_at_1 value: 47.839999999999996 - type: mrr_at_10 value: 57.053 - type: mrr_at_100 value: 57.772 - type: mrr_at_1000 value: 57.799 - type: mrr_at_3 value: 54.552 - type: mrr_at_5 value: 56.011 - type: ndcg_at_1 value: 47.839999999999996 - type: ndcg_at_10 value: 48.650999999999996 - type: ndcg_at_100 value: 55.681000000000004 - type: ndcg_at_1000 value: 57.979 - type: ndcg_at_3 value: 43.923 - type: ndcg_at_5 value: 46.037 - type: precision_at_1 value: 47.839999999999996 - type: precision_at_10 value: 13.395000000000001 - type: precision_at_100 value: 2.0660000000000003 - type: precision_at_1000 value: 0.248 - type: precision_at_3 value: 29.064 - type: precision_at_5 value: 22.006 - type: recall_at_1 value: 24.476 - type: recall_at_10 value: 56.216 - type: recall_at_100 value: 81.798 - type: recall_at_1000 value: 95.48299999999999 - type: recall_at_3 value: 39.357 - type: recall_at_5 value: 47.802 - task: type: Retrieval dataset: type: mteb/hotpotqa name: MTEB HotpotQA config: default split: test revision: ab518f4d6fcca38d87c25209f94beba119d02014 metrics: - type: map_at_1 value: 42.728 - type: map_at_10 value: 57.737 - type: map_at_100 value: 58.531 - type: map_at_1000 value: 58.594 - type: map_at_3 value: 54.869 - type: map_at_5 value: 56.55 - type: mrr_at_1 value: 85.456 - type: mrr_at_10 value: 90.062 - type: mrr_at_100 value: 90.159 - type: mrr_at_1000 value: 90.16 - type: mrr_at_3 value: 89.37899999999999 - type: mrr_at_5 value: 89.81 - type: ndcg_at_1 value: 85.456 - type: ndcg_at_10 value: 67.755 - type: ndcg_at_100 value: 70.341 - type: ndcg_at_1000 value: 71.538 - type: ndcg_at_3 value: 63.735 - type: ndcg_at_5 value: 65.823 - type: precision_at_1 value: 85.456 - type: precision_at_10 value: 13.450000000000001 - type: precision_at_100 value: 1.545 - type: precision_at_1000 value: 0.16999999999999998 - type: precision_at_3 value: 38.861000000000004 - type: precision_at_5 value: 24.964 - type: recall_at_1 value: 42.728 - type: recall_at_10 value: 67.252 - type: recall_at_100 value: 77.265 - type: recall_at_1000 value: 85.246 - type: recall_at_3 value: 58.292 - type: recall_at_5 value: 62.41100000000001 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 87.4836 - type: ap value: 82.29552224030336 - type: f1 value: 87.42791432227448 - task: type: Retrieval dataset: type: mteb/msmarco name: MTEB MSMARCO config: default split: dev revision: c5a29a104738b98a9e76336939199e264163d4a0 metrics: - type: map_at_1 value: 23.015 - type: map_at_10 value: 35.621 - type: map_at_100 value: 36.809 - type: map_at_1000 value: 36.853 - type: map_at_3 value: 31.832 - type: map_at_5 value: 34.006 - type: mrr_at_1 value: 23.738999999999997 - type: mrr_at_10 value: 36.309999999999995 - type: mrr_at_100 value: 37.422 - type: mrr_at_1000 value: 37.461 - type: mrr_at_3 value: 32.592999999999996 - type: mrr_at_5 value: 34.736 - type: ndcg_at_1 value: 23.724999999999998 - type: ndcg_at_10 value: 42.617 - type: ndcg_at_100 value: 48.217999999999996 - type: ndcg_at_1000 value: 49.309 - type: ndcg_at_3 value: 34.905 - type: ndcg_at_5 value: 38.769 - type: precision_at_1 value: 23.724999999999998 - type: precision_at_10 value: 6.689 - type: precision_at_100 value: 0.9480000000000001 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 14.89 - type: precision_at_5 value: 10.897 - type: recall_at_1 value: 23.015 - type: recall_at_10 value: 64.041 - type: recall_at_100 value: 89.724 - type: recall_at_1000 value: 98.00999999999999 - type: recall_at_3 value: 43.064 - type: recall_at_5 value: 52.31099999999999 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 96.49794801641588 - type: f1 value: 96.28931114498003 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 82.81121751025992 - type: f1 value: 63.18740125901853 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 77.66644250168123 - type: f1 value: 74.93211186867839 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 81.77202420981843 - type: f1 value: 81.63681969283554 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 34.596687684870645 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 32.26965660101405 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 31.33619694846802 - type: mrr value: 32.53719657720334 - task: type: Retrieval dataset: type: mteb/nfcorpus name: MTEB NFCorpus config: default split: test revision: ec0fa4fe99da2ff19ca1214b7966684033a58814 metrics: - type: map_at_1 value: 6.0729999999999995 - type: map_at_10 value: 13.245999999999999 - type: map_at_100 value: 16.747999999999998 - type: map_at_1000 value: 18.163 - type: map_at_3 value: 10.064 - type: map_at_5 value: 11.513 - type: mrr_at_1 value: 49.536 - type: mrr_at_10 value: 58.092 - type: mrr_at_100 value: 58.752 - type: mrr_at_1000 value: 58.78 - type: mrr_at_3 value: 56.398 - type: mrr_at_5 value: 57.389 - type: ndcg_at_1 value: 47.059 - type: ndcg_at_10 value: 35.881 - type: ndcg_at_100 value: 32.751999999999995 - type: ndcg_at_1000 value: 41.498000000000005 - type: ndcg_at_3 value: 42.518 - type: ndcg_at_5 value: 39.550999999999995 - type: precision_at_1 value: 49.536 - type: precision_at_10 value: 26.316 - type: precision_at_100 value: 8.084 - type: precision_at_1000 value: 2.081 - type: precision_at_3 value: 39.938 - type: precision_at_5 value: 34.056 - type: recall_at_1 value: 6.0729999999999995 - type: recall_at_10 value: 16.593 - type: recall_at_100 value: 32.883 - type: recall_at_1000 value: 64.654 - type: recall_at_3 value: 11.174000000000001 - type: recall_at_5 value: 13.528 - task: type: Retrieval dataset: type: mteb/nq name: MTEB NQ config: default split: test revision: b774495ed302d8c44a3a7ea25c90dbce03968f31 metrics: - type: map_at_1 value: 30.043 - type: map_at_10 value: 45.318999999999996 - type: map_at_100 value: 46.381 - type: map_at_1000 value: 46.412 - type: map_at_3 value: 40.941 - type: map_at_5 value: 43.662 - type: mrr_at_1 value: 33.98 - type: mrr_at_10 value: 47.870000000000005 - type: mrr_at_100 value: 48.681999999999995 - type: mrr_at_1000 value: 48.703 - type: mrr_at_3 value: 44.341 - type: mrr_at_5 value: 46.547 - type: ndcg_at_1 value: 33.98 - type: ndcg_at_10 value: 52.957 - type: ndcg_at_100 value: 57.434 - type: ndcg_at_1000 value: 58.103 - type: ndcg_at_3 value: 44.896 - type: ndcg_at_5 value: 49.353 - type: precision_at_1 value: 33.98 - type: precision_at_10 value: 8.786 - type: precision_at_100 value: 1.1280000000000001 - type: precision_at_1000 value: 0.11900000000000001 - type: precision_at_3 value: 20.577 - type: precision_at_5 value: 14.942 - type: recall_at_1 value: 30.043 - type: recall_at_10 value: 73.593 - type: recall_at_100 value: 93.026 - type: recall_at_1000 value: 97.943 - type: recall_at_3 value: 52.955 - type: recall_at_5 value: 63.132 - task: type: Retrieval dataset: type: mteb/quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 70.808 - type: map_at_10 value: 84.675 - type: map_at_100 value: 85.322 - type: map_at_1000 value: 85.33800000000001 - type: map_at_3 value: 81.68900000000001 - type: map_at_5 value: 83.543 - type: mrr_at_1 value: 81.5 - type: mrr_at_10 value: 87.59700000000001 - type: mrr_at_100 value: 87.705 - type: mrr_at_1000 value: 87.70599999999999 - type: mrr_at_3 value: 86.607 - type: mrr_at_5 value: 87.289 - type: ndcg_at_1 value: 81.51 - type: ndcg_at_10 value: 88.41799999999999 - type: ndcg_at_100 value: 89.644 - type: ndcg_at_1000 value: 89.725 - type: ndcg_at_3 value: 85.49900000000001 - type: ndcg_at_5 value: 87.078 - type: precision_at_1 value: 81.51 - type: precision_at_10 value: 13.438 - type: precision_at_100 value: 1.532 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 37.363 - type: precision_at_5 value: 24.57 - type: recall_at_1 value: 70.808 - type: recall_at_10 value: 95.575 - type: recall_at_100 value: 99.667 - type: recall_at_1000 value: 99.98899999999999 - type: recall_at_3 value: 87.223 - type: recall_at_5 value: 91.682 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 58.614831329137715 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 66.86580408560826 - task: type: Retrieval dataset: type: mteb/scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 5.093 - type: map_at_10 value: 13.014000000000001 - type: map_at_100 value: 15.412999999999998 - type: map_at_1000 value: 15.756999999999998 - type: map_at_3 value: 9.216000000000001 - type: map_at_5 value: 11.036999999999999 - type: mrr_at_1 value: 25.1 - type: mrr_at_10 value: 37.133 - type: mrr_at_100 value: 38.165 - type: mrr_at_1000 value: 38.198 - type: mrr_at_3 value: 33.217 - type: mrr_at_5 value: 35.732 - type: ndcg_at_1 value: 25.1 - type: ndcg_at_10 value: 21.918000000000003 - type: ndcg_at_100 value: 30.983 - type: ndcg_at_1000 value: 36.629 - type: ndcg_at_3 value: 20.544999999999998 - type: ndcg_at_5 value: 18.192 - type: precision_at_1 value: 25.1 - type: precision_at_10 value: 11.44 - type: precision_at_100 value: 2.459 - type: precision_at_1000 value: 0.381 - type: precision_at_3 value: 19.267 - type: precision_at_5 value: 16.16 - type: recall_at_1 value: 5.093 - type: recall_at_10 value: 23.215 - type: recall_at_100 value: 49.902 - type: recall_at_1000 value: 77.403 - type: recall_at_3 value: 11.733 - type: recall_at_5 value: 16.372999999999998 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 82.9365442977452 - type: cos_sim_spearman value: 79.36960687383745 - type: euclidean_pearson value: 79.6045204840714 - type: euclidean_spearman value: 79.26382712751337 - type: manhattan_pearson value: 79.4805084789529 - type: manhattan_spearman value: 79.21847863209523 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 83.27906192961453 - type: cos_sim_spearman value: 74.38364712099211 - type: euclidean_pearson value: 78.54358927241223 - type: euclidean_spearman value: 74.22185560806376 - type: manhattan_pearson value: 78.50904327377751 - type: manhattan_spearman value: 74.2627500781748 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 84.66863742649639 - type: cos_sim_spearman value: 84.70630905216271 - type: euclidean_pearson value: 84.64498334705334 - type: euclidean_spearman value: 84.87204770690148 - type: manhattan_pearson value: 84.65774227976077 - type: manhattan_spearman value: 84.91251851797985 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 83.1577763924467 - type: cos_sim_spearman value: 80.10314039230198 - type: euclidean_pearson value: 81.51346991046043 - type: euclidean_spearman value: 80.08678485109435 - type: manhattan_pearson value: 81.57058914661894 - type: manhattan_spearman value: 80.1516230725106 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 86.40310839662533 - type: cos_sim_spearman value: 87.16293477217867 - type: euclidean_pearson value: 86.50688711184775 - type: euclidean_spearman value: 87.08651444923031 - type: manhattan_pearson value: 86.54674677557857 - type: manhattan_spearman value: 87.15079017870971 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 84.32886275207817 - type: cos_sim_spearman value: 85.0190460590732 - type: euclidean_pearson value: 84.42553652784679 - type: euclidean_spearman value: 85.20027364279328 - type: manhattan_pearson value: 84.42926246281078 - type: manhattan_spearman value: 85.20187419804306 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 90.76732216967812 - type: cos_sim_spearman value: 90.63701653633909 - type: euclidean_pearson value: 90.26678186114682 - type: euclidean_spearman value: 90.67288073455427 - type: manhattan_pearson value: 90.20772020584582 - type: manhattan_spearman value: 90.60764863983702 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_pearson value: 69.09280387698125 - type: cos_sim_spearman value: 68.62743151172162 - type: euclidean_pearson value: 69.89386398104689 - type: euclidean_spearman value: 68.71191066733556 - type: manhattan_pearson value: 69.92516500604872 - type: manhattan_spearman value: 68.80452846992576 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 86.13178592019887 - type: cos_sim_spearman value: 86.03947178806887 - type: euclidean_pearson value: 85.87029414285313 - type: euclidean_spearman value: 86.04960843306998 - type: manhattan_pearson value: 85.92946858580146 - type: manhattan_spearman value: 86.12575341860442 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 85.16657063002837 - type: mrr value: 95.73671063867141 - task: type: Retrieval dataset: type: mteb/scifact name: MTEB SciFact config: default split: test revision: 0228b52cf27578f30900b9e5271d331663a030d7 metrics: - type: map_at_1 value: 63.510999999999996 - type: map_at_10 value: 72.76899999999999 - type: map_at_100 value: 73.303 - type: map_at_1000 value: 73.32499999999999 - type: map_at_3 value: 70.514 - type: map_at_5 value: 71.929 - type: mrr_at_1 value: 66.333 - type: mrr_at_10 value: 73.75 - type: mrr_at_100 value: 74.119 - type: mrr_at_1000 value: 74.138 - type: mrr_at_3 value: 72.222 - type: mrr_at_5 value: 73.122 - type: ndcg_at_1 value: 66.333 - type: ndcg_at_10 value: 76.774 - type: ndcg_at_100 value: 78.78500000000001 - type: ndcg_at_1000 value: 79.254 - type: ndcg_at_3 value: 73.088 - type: ndcg_at_5 value: 75.002 - type: precision_at_1 value: 66.333 - type: precision_at_10 value: 9.833 - type: precision_at_100 value: 1.093 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 28.222 - type: precision_at_5 value: 18.333 - type: recall_at_1 value: 63.510999999999996 - type: recall_at_10 value: 87.98899999999999 - type: recall_at_100 value: 96.5 - type: recall_at_1000 value: 100.0 - type: recall_at_3 value: 77.86699999999999 - type: recall_at_5 value: 82.73899999999999 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.78514851485149 - type: cos_sim_ap value: 94.94214383862038 - type: cos_sim_f1 value: 89.02255639097744 - type: cos_sim_precision value: 89.2462311557789 - type: cos_sim_recall value: 88.8 - type: dot_accuracy value: 99.78217821782178 - type: dot_ap value: 94.69965247836805 - type: dot_f1 value: 88.78695208970439 - type: dot_precision value: 90.54054054054053 - type: dot_recall value: 87.1 - type: euclidean_accuracy value: 99.78118811881188 - type: euclidean_ap value: 94.9865187695411 - type: euclidean_f1 value: 88.99950223992036 - type: euclidean_precision value: 88.60257680872151 - type: euclidean_recall value: 89.4 - type: manhattan_accuracy value: 99.78811881188119 - type: manhattan_ap value: 95.0021236766459 - type: manhattan_f1 value: 89.12071535022356 - type: manhattan_precision value: 88.54886475814413 - type: manhattan_recall value: 89.7 - type: max_accuracy value: 99.78811881188119 - type: max_ap value: 95.0021236766459 - type: max_f1 value: 89.12071535022356 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 68.93190546593995 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 37.602808534760655 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 52.29214480978073 - type: mrr value: 53.123169722434426 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.967800769650022 - type: cos_sim_spearman value: 31.168490040206926 - type: dot_pearson value: 30.888603021128553 - type: dot_spearman value: 31.028241262520385 - task: type: Retrieval dataset: type: mteb/trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.22300000000000003 - type: map_at_10 value: 1.781 - type: map_at_100 value: 9.905999999999999 - type: map_at_1000 value: 23.455000000000002 - type: map_at_3 value: 0.569 - type: map_at_5 value: 0.918 - type: mrr_at_1 value: 84.0 - type: mrr_at_10 value: 91.067 - type: mrr_at_100 value: 91.067 - type: mrr_at_1000 value: 91.067 - type: mrr_at_3 value: 90.667 - type: mrr_at_5 value: 91.067 - type: ndcg_at_1 value: 78.0 - type: ndcg_at_10 value: 73.13499999999999 - type: ndcg_at_100 value: 55.32 - type: ndcg_at_1000 value: 49.532 - type: ndcg_at_3 value: 73.715 - type: ndcg_at_5 value: 72.74199999999999 - type: precision_at_1 value: 84.0 - type: precision_at_10 value: 78.8 - type: precision_at_100 value: 56.32 - type: precision_at_1000 value: 21.504 - type: precision_at_3 value: 77.333 - type: precision_at_5 value: 78.0 - type: recall_at_1 value: 0.22300000000000003 - type: recall_at_10 value: 2.049 - type: recall_at_100 value: 13.553 - type: recall_at_1000 value: 46.367999999999995 - type: recall_at_3 value: 0.604 - type: recall_at_5 value: 1.015 - task: type: Retrieval dataset: type: mteb/touche2020 name: MTEB Touche2020 config: default split: test revision: a34f9a33db75fa0cbb21bb5cfc3dae8dc8bec93f metrics: - type: map_at_1 value: 3.0380000000000003 - type: map_at_10 value: 10.188 - type: map_at_100 value: 16.395 - type: map_at_1000 value: 18.024 - type: map_at_3 value: 6.236 - type: map_at_5 value: 7.276000000000001 - type: mrr_at_1 value: 34.694 - type: mrr_at_10 value: 46.292 - type: mrr_at_100 value: 47.446 - type: mrr_at_1000 value: 47.446 - type: mrr_at_3 value: 41.156 - type: mrr_at_5 value: 44.32 - type: ndcg_at_1 value: 32.653 - type: ndcg_at_10 value: 25.219 - type: ndcg_at_100 value: 37.802 - type: ndcg_at_1000 value: 49.274 - type: ndcg_at_3 value: 28.605999999999998 - type: ndcg_at_5 value: 26.21 - type: precision_at_1 value: 34.694 - type: precision_at_10 value: 21.837 - type: precision_at_100 value: 7.776 - type: precision_at_1000 value: 1.522 - type: precision_at_3 value: 28.571 - type: precision_at_5 value: 25.306 - type: recall_at_1 value: 3.0380000000000003 - type: recall_at_10 value: 16.298000000000002 - type: recall_at_100 value: 48.712 - type: recall_at_1000 value: 83.16799999999999 - type: recall_at_3 value: 7.265000000000001 - type: recall_at_5 value: 9.551 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 83.978 - type: ap value: 24.751887949330015 - type: f1 value: 66.8685134049279 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 61.573288058856825 - type: f1 value: 61.973261751726604 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 48.75483298792469 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 86.36824223639506 - type: cos_sim_ap value: 75.53126388573047 - type: cos_sim_f1 value: 67.9912831688245 - type: cos_sim_precision value: 66.11817501869858 - type: cos_sim_recall value: 69.9736147757256 - type: dot_accuracy value: 86.39804494248078 - type: dot_ap value: 75.27598891718046 - type: dot_f1 value: 67.91146284159763 - type: dot_precision value: 63.90505003490807 - type: dot_recall value: 72.45382585751979 - type: euclidean_accuracy value: 86.36228169517793 - type: euclidean_ap value: 75.51438087434647 - type: euclidean_f1 value: 68.02370523061066 - type: euclidean_precision value: 66.46525679758308 - type: euclidean_recall value: 69.65699208443272 - type: manhattan_accuracy value: 86.46361089586935 - type: manhattan_ap value: 75.50800785730111 - type: manhattan_f1 value: 67.9220437187253 - type: manhattan_precision value: 67.79705573080967 - type: manhattan_recall value: 68.04749340369392 - type: max_accuracy value: 86.46361089586935 - type: max_ap value: 75.53126388573047 - type: max_f1 value: 68.02370523061066 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 88.80350836341057 - type: cos_sim_ap value: 85.51101933260743 - type: cos_sim_f1 value: 77.9152271629704 - type: cos_sim_precision value: 75.27815662910056 - type: cos_sim_recall value: 80.74376347397599 - type: dot_accuracy value: 88.84425815966158 - type: dot_ap value: 85.49726945962519 - type: dot_f1 value: 77.94445269567801 - type: dot_precision value: 75.27251864601261 - type: dot_recall value: 80.81305820757623 - type: euclidean_accuracy value: 88.80350836341057 - type: euclidean_ap value: 85.4882880790211 - type: euclidean_f1 value: 77.87063284615103 - type: euclidean_precision value: 74.61022927689595 - type: euclidean_recall value: 81.42901139513397 - type: manhattan_accuracy value: 88.7161873714441 - type: manhattan_ap value: 85.45753871906821 - type: manhattan_f1 value: 77.8686401480111 - type: manhattan_precision value: 74.95903683123174 - type: manhattan_recall value: 81.01324299353249 - type: max_accuracy value: 88.84425815966158 - type: max_ap value: 85.51101933260743 - type: max_f1 value: 77.94445269567801 --- <!-- **English** | 中文 --> # gte-base-en-v1.5 We introduce series, upgraded embeddings that support the context length of up to **8192**, while further enhancing model performance. The models are built upon the encoder backbone (BERT + RoPE + GLU). The series achieve state-of-the-art scores on the MTEB benchmark within the same model size category and prodvide competitive on the LoCo long-context retrieval tests (refer to Evaluation). We also present the []( a SOTA instruction-tuned multi-lingual embedding model that ranked 2nd in MTEB and 1st in C-MTEB. <!-- Provide a longer summary of what this model is. --> - **Developed by:** Institute for Intelligent Computing, Alibaba Group - **Model type:** Text Embeddings - **Paper:** mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval <!-- - **Demo [optional]:** [More Information Needed] --> ### Model list | Models | Language | Model Size | Max Seq. Length | Dimension | MTEB-en | LoCo | |:-----: | :-----: |:-----: |:-----: |:-----: | :-----: | :-----: | |[]( Multiple | 7720 | 32768 | 4096 | 67.34 | 87.57 | |[]( | English | 434 | 8192 | 1024 | 65.39 | 86.71 | |[]( | English | 137 | 8192 | 768 | 64.11 | 87.44 | ## How to Get Started with the Model Use the code below to get started with the model. **It is recommended to install xformers and enable unpadding for acceleration, refer to enable-unpadding-and-xformers.** Use with : Use with : Use with infinity: Infinity is a MIT licensed server for OpenAI-compatible deployment. ## Training Details ### Training Data - Masked language modeling (MLM): - Weak-supervised contrastive pre-training (CPT): GTE pre-training data - Supervised contrastive fine-tuning: GTE fine-tuning data ### Training Procedure To enable the backbone model to support a context length of 8192, we adopted a multi-stage training strategy. The model first undergoes preliminary MLM pre-training on shorter lengths. And then, we resample the data, reducing the proportion of short texts, and continue the MLM pre-training. The entire training process is as follows: - MLM-2048: lr 5e-4, mlm_probability 0.3, batch_size 4096, num_steps 70000, rope_base 10000 - MLM-8192: lr 5e-5, mlm_probability 0.3, batch_size 1024, num_steps 20000, rope_base 500000 - CPT: max_len 512, lr 2e-4, batch_size 32768, num_steps 100000 - Fine-tuning: TODO ## Evaluation ### MTEB The results of other models are retrieved from MTEB leaderboard. The gte evaluation setting: , and set ntk scaling factor to 2 (equivalent to rope_base * 2). | Model Name | Param Size (M) | Dimension | Sequence Length | Average (56) | Class. (12) | Clust. (11) | Pair Class. (3) | Reran. (4) | Retr. (15) | STS (10) | Summ. (1) | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | **gte-large-en-v1.5** | 434 | 1024 | 8192 | **65.39** | 77.75 | 47.95 | 84.63 | 58.50 | 57.91 | 81.43 | 30.91 | | mxbai-embed-large-v1 | 335 | 1024 | 512 | 64.68 | 75.64 | 46.71 | 87.2 | 60.11 | 54.39 | 85 | 32.71 | | multilingual-e5-large-instruct | 560 | 1024 | 514 | 64.41 | 77.56 | 47.1 | 86.19 | 58.58 | 52.47 | 84.78 | 30.39 | | bge-large-en-v1.5| 335 | 1024 | 512 | 64.23 | 75.97 | 46.08 | 87.12 | 60.03 | 54.29 | 83.11 | 31.61 | | **gte-base-en-v1.5** | 137 | 768 | 8192 | **64.11** | 77.17 | 46.82 | 85.33 | 57.66 | 54.09 | 81.97 | 31.17 | | bge-base-en-v1.5| 109 | 768 | 512 | 63.55 | 75.53 | 45.77 | 86.55 | 58.86 | 53.25 | 82.4 | 31.07 | ### LoCo | Model Name | Dimension | Sequence Length | Average (5) | QsmsumRetrieval | SummScreenRetrieval | QasperAbastractRetrieval | QasperTitleRetrieval | GovReportRetrieval | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | gte-qwen1.5-7b | 4096 | 32768 | 87.57 | 49.37 | 93.10 | 99.67 | 97.54 | 98.21 | | gte-large-v1.5 |1024 | 8192 | 86.71 | 44.55 | 92.61 | 99.82 | 97.81 | 98.74 | | gte-base-v1.5 | 768 | 8192 | 87.44 | 49.91 | 91.78 | 99.82 | 97.13 | 98.58 | ## Citation If you find our paper or models helpful, please consider citing them as follows:",
26
- "model_explanation_gemini": "Generates sentence embeddings for tasks like classification, retrieval, clustering, and similarity in English, optimized for performance across various benchmarks.\n\nModel Features: \n- Sentence embedding generation \n- Supports classification, retrieval, clustering, and similarity tasks \n- Optimized for English language \n- Evaluated on multiple MTEB benchmarks \n\nComparative Explanation: \nOutperforms or matches other models in tasks like Amazon review classification (93.02% accuracy) and BIOSSES semantic textual similarity (85.04 Pearson correlation),"
 
 
 
 
 
27
  }
 
23
  "region:us"
24
  ],
25
  "description": "--- library_name: transformers tags: - sentence-transformers - gte - mteb - transformers.js - sentence-similarity license: apache-2.0 language: - en model-index: - name: gte-base-en-v1.5 results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 74.7910447761194 - type: ap value: 37.053785713650626 - type: f1 value: 68.51101510998551 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 93.016875 - type: ap value: 89.17750268426342 - type: f1 value: 92.9970977240524 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 53.312000000000005 - type: f1 value: 52.98175784163017 - task: type: Retrieval dataset: type: mteb/arguana name: MTEB ArguAna config: default split: test revision: c22ab2a51041ffd869aaddef7af8d8215647e41a metrics: - type: map_at_1 value: 38.193 - type: map_at_10 value: 54.848 - type: map_at_100 value: 55.388000000000005 - type: map_at_1000 value: 55.388999999999996 - type: map_at_3 value: 50.427 - type: map_at_5 value: 53.105000000000004 - type: mrr_at_1 value: 39.047 - type: mrr_at_10 value: 55.153 - type: mrr_at_100 value: 55.686 - type: mrr_at_1000 value: 55.688 - type: mrr_at_3 value: 50.676 - type: mrr_at_5 value: 53.417 - type: ndcg_at_1 value: 38.193 - type: ndcg_at_10 value: 63.486 - type: ndcg_at_100 value: 65.58 - type: ndcg_at_1000 value: 65.61 - type: ndcg_at_3 value: 54.494 - type: ndcg_at_5 value: 59.339 - type: precision_at_1 value: 38.193 - type: precision_at_10 value: 9.075 - type: precision_at_100 value: 0.9939999999999999 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 22.096 - type: precision_at_5 value: 15.619 - type: recall_at_1 value: 38.193 - type: recall_at_10 value: 90.754 - type: recall_at_100 value: 99.431 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 66.28699999999999 - type: recall_at_5 value: 78.094 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 47.508221208908964 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 42.04668382560096 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 61.828759903716815 - type: mrr value: 74.37343358395991 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 85.03673698773017 - type: cos_sim_spearman value: 83.6470866785058 - type: euclidean_pearson value: 82.64048673096565 - type: euclidean_spearman value: 83.63142367101115 - type: manhattan_pearson value: 82.71493099760228 - type: manhattan_spearman value: 83.60491704294326 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 86.73376623376623 - type: f1 value: 86.70294049278262 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 40.31923804167062 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 37.552547125348454 - task: type: Retrieval dataset: type: mteb/cqadupstack-android name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: f46a197baaae43b4f621051089b82a364682dfeb metrics: - type: map_at_1 value: 30.567 - type: map_at_10 value: 41.269 - type: map_at_100 value: 42.689 - type: map_at_1000 value: 42.84 - type: map_at_3 value: 37.567 - type: map_at_5 value: 39.706 - type: mrr_at_1 value: 37.053000000000004 - type: mrr_at_10 value: 46.900999999999996 - type: mrr_at_100 value: 47.662 - type: mrr_at_1000 value: 47.713 - type: mrr_at_3 value: 43.801 - type: mrr_at_5 value: 45.689 - type: ndcg_at_1 value: 37.053000000000004 - type: ndcg_at_10 value: 47.73 - type: ndcg_at_100 value: 53.128 - type: ndcg_at_1000 value: 55.300000000000004 - type: ndcg_at_3 value: 42.046 - type: ndcg_at_5 value: 44.782 - type: precision_at_1 value: 37.053000000000004 - type: precision_at_10 value: 9.142 - type: precision_at_100 value: 1.485 - type: precision_at_1000 value: 0.197 - type: precision_at_3 value: 20.076 - type: precision_at_5 value: 14.535 - type: recall_at_1 value: 30.567 - type: recall_at_10 value: 60.602999999999994 - type: recall_at_100 value: 83.22800000000001 - type: recall_at_1000 value: 96.696 - type: recall_at_3 value: 44.336999999999996 - type: recall_at_5 value: 51.949 - task: type: Retrieval dataset: type: mteb/cqadupstack-english name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: ad9991cb51e31e31e430383c75ffb2885547b5f0 metrics: - type: map_at_1 value: 28.538000000000004 - type: map_at_10 value: 38.757999999999996 - type: map_at_100 value: 40.129 - type: map_at_1000 value: 40.262 - type: map_at_3 value: 35.866 - type: map_at_5 value: 37.417 - type: mrr_at_1 value: 36.051 - type: mrr_at_10 value: 44.868 - type: mrr_at_100 value: 45.568999999999996 - type: mrr_at_1000 value: 45.615 - type: mrr_at_3 value: 42.558 - type: mrr_at_5 value: 43.883 - type: ndcg_at_1 value: 36.051 - type: ndcg_at_10 value: 44.584 - type: ndcg_at_100 value: 49.356 - type: ndcg_at_1000 value: 51.39 - type: ndcg_at_3 value: 40.389 - type: ndcg_at_5 value: 42.14 - type: precision_at_1 value: 36.051 - type: precision_at_10 value: 8.446 - type: precision_at_100 value: 1.411 - type: precision_at_1000 value: 0.19 - type: precision_at_3 value: 19.639 - type: precision_at_5 value: 13.796 - type: recall_at_1 value: 28.538000000000004 - type: recall_at_10 value: 54.99000000000001 - type: recall_at_100 value: 75.098 - type: recall_at_1000 value: 87.848 - type: recall_at_3 value: 42.236000000000004 - type: recall_at_5 value: 47.377 - task: type: Retrieval dataset: type: mteb/cqadupstack-gaming name: MTEB CQADupstackGamingRetrieval config: default split: test revision: 4885aa143210c98657558c04aaf3dc47cfb54340 metrics: - type: map_at_1 value: 37.188 - type: map_at_10 value: 50.861000000000004 - type: map_at_100 value: 51.917 - type: map_at_1000 value: 51.964999999999996 - type: map_at_3 value: 47.144000000000005 - type: map_at_5 value: 49.417 - type: mrr_at_1 value: 42.571 - type: mrr_at_10 value: 54.086999999999996 - type: mrr_at_100 value: 54.739000000000004 - type: mrr_at_1000 value: 54.762 - type: mrr_at_3 value: 51.285000000000004 - type: mrr_at_5 value: 53.0 - type: ndcg_at_1 value: 42.571 - type: ndcg_at_10 value: 57.282 - type: ndcg_at_100 value: 61.477000000000004 - type: ndcg_at_1000 value: 62.426 - type: ndcg_at_3 value: 51.0 - type: ndcg_at_5 value: 54.346000000000004 - type: precision_at_1 value: 42.571 - type: precision_at_10 value: 9.467 - type: precision_at_100 value: 1.2550000000000001 - type: precision_at_1000 value: 0.13799999999999998 - type: precision_at_3 value: 23.114 - type: precision_at_5 value: 16.250999999999998 - type: recall_at_1 value: 37.188 - type: recall_at_10 value: 73.068 - type: recall_at_100 value: 91.203 - type: recall_at_1000 value: 97.916 - type: recall_at_3 value: 56.552 - type: recall_at_5 value: 64.567 - task: type: Retrieval dataset: type: mteb/cqadupstack-gis name: MTEB CQADupstackGisRetrieval config: default split: test revision: 5003b3064772da1887988e05400cf3806fe491f2 metrics: - type: map_at_1 value: 25.041000000000004 - type: map_at_10 value: 33.86 - type: map_at_100 value: 34.988 - type: map_at_1000 value: 35.064 - type: map_at_3 value: 31.049 - type: map_at_5 value: 32.845 - type: mrr_at_1 value: 26.893 - type: mrr_at_10 value: 35.594 - type: mrr_at_100 value: 36.617 - type: mrr_at_1000 value: 36.671 - type: mrr_at_3 value: 33.051 - type: mrr_at_5 value: 34.61 - type: ndcg_at_1 value: 26.893 - type: ndcg_at_10 value: 38.674 - type: ndcg_at_100 value: 44.178 - type: ndcg_at_1000 value: 46.089999999999996 - type: ndcg_at_3 value: 33.485 - type: ndcg_at_5 value: 36.402 - type: precision_at_1 value: 26.893 - type: precision_at_10 value: 5.989 - type: precision_at_100 value: 0.918 - type: precision_at_1000 value: 0.11100000000000002 - type: precision_at_3 value: 14.2 - type: precision_at_5 value: 10.26 - type: recall_at_1 value: 25.041000000000004 - type: recall_at_10 value: 51.666000000000004 - type: recall_at_100 value: 76.896 - type: recall_at_1000 value: 91.243 - type: recall_at_3 value: 38.035999999999994 - type: recall_at_5 value: 44.999 - task: type: Retrieval dataset: type: mteb/cqadupstack-mathematica name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: 90fceea13679c63fe563ded68f3b6f06e50061de metrics: - type: map_at_1 value: 15.909999999999998 - type: map_at_10 value: 23.901 - type: map_at_100 value: 25.165 - type: map_at_1000 value: 25.291000000000004 - type: map_at_3 value: 21.356 - type: map_at_5 value: 22.816 - type: mrr_at_1 value: 20.025000000000002 - type: mrr_at_10 value: 28.382 - type: mrr_at_100 value: 29.465000000000003 - type: mrr_at_1000 value: 29.535 - type: mrr_at_3 value: 25.933 - type: mrr_at_5 value: 27.332 - type: ndcg_at_1 value: 20.025000000000002 - type: ndcg_at_10 value: 29.099000000000004 - type: ndcg_at_100 value: 35.127 - type: ndcg_at_1000 value: 38.096000000000004 - type: ndcg_at_3 value: 24.464 - type: ndcg_at_5 value: 26.709 - type: precision_at_1 value: 20.025000000000002 - type: precision_at_10 value: 5.398 - type: precision_at_100 value: 0.9690000000000001 - type: precision_at_1000 value: 0.13699999999999998 - type: precision_at_3 value: 11.774 - type: precision_at_5 value: 8.632 - type: recall_at_1 value: 15.909999999999998 - type: recall_at_10 value: 40.672000000000004 - type: recall_at_100 value: 66.855 - type: recall_at_1000 value: 87.922 - type: recall_at_3 value: 28.069 - type: recall_at_5 value: 33.812 - task: type: Retrieval dataset: type: mteb/cqadupstack-physics name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: 79531abbd1fb92d06c6d6315a0cbbbf5bb247ea4 metrics: - type: map_at_1 value: 30.175 - type: map_at_10 value: 41.36 - type: map_at_100 value: 42.701 - type: map_at_1000 value: 42.817 - type: map_at_3 value: 37.931 - type: map_at_5 value: 39.943 - type: mrr_at_1 value: 35.611 - type: mrr_at_10 value: 46.346 - type: mrr_at_100 value: 47.160000000000004 - type: mrr_at_1000 value: 47.203 - type: mrr_at_3 value: 43.712 - type: mrr_at_5 value: 45.367000000000004 - type: ndcg_at_1 value: 35.611 - type: ndcg_at_10 value: 47.532000000000004 - type: ndcg_at_100 value: 53.003 - type: ndcg_at_1000 value: 55.007 - type: ndcg_at_3 value: 42.043 - type: ndcg_at_5 value: 44.86 - type: precision_at_1 value: 35.611 - type: precision_at_10 value: 8.624 - type: precision_at_100 value: 1.332 - type: precision_at_1000 value: 0.169 - type: precision_at_3 value: 20.083000000000002 - type: precision_at_5 value: 14.437 - type: recall_at_1 value: 30.175 - type: recall_at_10 value: 60.5 - type: recall_at_100 value: 83.399 - type: recall_at_1000 value: 96.255 - type: recall_at_3 value: 45.448 - type: recall_at_5 value: 52.432 - task: type: Retrieval dataset: type: mteb/cqadupstack-programmers name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: 6184bc1440d2dbc7612be22b50686b8826d22b32 metrics: - type: map_at_1 value: 22.467000000000002 - type: map_at_10 value: 33.812999999999995 - type: map_at_100 value: 35.248000000000005 - type: map_at_1000 value: 35.359 - type: map_at_3 value: 30.316 - type: map_at_5 value: 32.233000000000004 - type: mrr_at_1 value: 28.310999999999996 - type: mrr_at_10 value: 38.979 - type: mrr_at_100 value: 39.937 - type: mrr_at_1000 value: 39.989999999999995 - type: mrr_at_3 value: 36.244 - type: mrr_at_5 value: 37.871 - type: ndcg_at_1 value: 28.310999999999996 - type: ndcg_at_10 value: 40.282000000000004 - type: ndcg_at_100 value: 46.22 - type: ndcg_at_1000 value: 48.507 - type: ndcg_at_3 value: 34.596 - type: ndcg_at_5 value: 37.267 - type: precision_at_1 value: 28.310999999999996 - type: precision_at_10 value: 7.831 - type: precision_at_100 value: 1.257 - type: precision_at_1000 value: 0.164 - type: precision_at_3 value: 17.275 - type: precision_at_5 value: 12.556999999999999 - type: recall_at_1 value: 22.467000000000002 - type: recall_at_10 value: 54.14099999999999 - type: recall_at_100 value: 79.593 - type: recall_at_1000 value: 95.063 - type: recall_at_3 value: 38.539 - type: recall_at_5 value: 45.403 - task: type: Retrieval dataset: type: mteb/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: map_at_1 value: 24.18591666666667 - type: map_at_10 value: 33.84258333333333 - type: map_at_100 value: 35.11391666666666 - type: map_at_1000 value: 35.23258333333333 - type: map_at_3 value: 30.764249999999997 - type: map_at_5 value: 32.52333333333334 - type: mrr_at_1 value: 28.54733333333333 - type: mrr_at_10 value: 37.81725 - type: mrr_at_100 value: 38.716499999999996 - type: mrr_at_1000 value: 38.77458333333333 - type: mrr_at_3 value: 35.157833333333336 - type: mrr_at_5 value: 36.69816666666667 - type: ndcg_at_1 value: 28.54733333333333 - type: ndcg_at_10 value: 39.51508333333334 - type: ndcg_at_100 value: 44.95316666666666 - type: ndcg_at_1000 value: 47.257083333333334 - type: ndcg_at_3 value: 34.205833333333324 - type: ndcg_at_5 value: 36.78266666666667 - type: precision_at_1 value: 28.54733333333333 - type: precision_at_10 value: 7.082583333333334 - type: precision_at_100 value: 1.1590833333333332 - type: precision_at_1000 value: 0.15516666666666662 - type: precision_at_3 value: 15.908750000000001 - type: precision_at_5 value: 11.505416666666669 - type: recall_at_1 value: 24.18591666666667 - type: recall_at_10 value: 52.38758333333333 - type: recall_at_100 value: 76.13666666666667 - type: recall_at_1000 value: 91.99066666666667 - type: recall_at_3 value: 37.78333333333334 - type: recall_at_5 value: 44.30141666666666 - task: type: Retrieval dataset: type: mteb/cqadupstack-stats name: MTEB CQADupstackStatsRetrieval config: default split: test revision: 65ac3a16b8e91f9cee4c9828cc7c335575432a2a metrics: - type: map_at_1 value: 21.975 - type: map_at_10 value: 29.781000000000002 - type: map_at_100 value: 30.847 - type: map_at_1000 value: 30.94 - type: map_at_3 value: 27.167 - type: map_at_5 value: 28.633999999999997 - type: mrr_at_1 value: 24.387 - type: mrr_at_10 value: 32.476 - type: mrr_at_100 value: 33.337 - type: mrr_at_1000 value: 33.403 - type: mrr_at_3 value: 29.881999999999998 - type: mrr_at_5 value: 31.339 - type: ndcg_at_1 value: 24.387 - type: ndcg_at_10 value: 34.596 - type: ndcg_at_100 value: 39.635 - type: ndcg_at_1000 value: 42.079 - type: ndcg_at_3 value: 29.516 - type: ndcg_at_5 value: 31.959 - type: precision_at_1 value: 24.387 - type: precision_at_10 value: 5.6129999999999995 - type: precision_at_100 value: 0.8909999999999999 - type: precision_at_1000 value: 0.117 - type: precision_at_3 value: 12.73 - type: precision_at_5 value: 9.171999999999999 - type: recall_at_1 value: 21.975 - type: recall_at_10 value: 46.826 - type: recall_at_100 value: 69.554 - type: recall_at_1000 value: 87.749 - type: recall_at_3 value: 33.016 - type: recall_at_5 value: 38.97 - task: type: Retrieval dataset: type: mteb/cqadupstack-tex name: MTEB CQADupstackTexRetrieval config: default split: test revision: 46989137a86843e03a6195de44b09deda022eec7 metrics: - type: map_at_1 value: 15.614 - type: map_at_10 value: 22.927 - type: map_at_100 value: 24.185000000000002 - type: map_at_1000 value: 24.319 - type: map_at_3 value: 20.596 - type: map_at_5 value: 21.854000000000003 - type: mrr_at_1 value: 18.858 - type: mrr_at_10 value: 26.535999999999998 - type: mrr_at_100 value: 27.582 - type: mrr_at_1000 value: 27.665 - type: mrr_at_3 value: 24.295 - type: mrr_at_5 value: 25.532 - type: ndcg_at_1 value: 18.858 - type: ndcg_at_10 value: 27.583000000000002 - type: ndcg_at_100 value: 33.635 - type: ndcg_at_1000 value: 36.647 - type: ndcg_at_3 value: 23.348 - type: ndcg_at_5 value: 25.257 - type: precision_at_1 value: 18.858 - type: precision_at_10 value: 5.158 - type: precision_at_100 value: 0.964 - type: precision_at_1000 value: 0.13999999999999999 - type: precision_at_3 value: 11.092 - type: precision_at_5 value: 8.1 - type: recall_at_1 value: 15.614 - type: recall_at_10 value: 37.916 - type: recall_at_100 value: 65.205 - type: recall_at_1000 value: 86.453 - type: recall_at_3 value: 26.137 - type: recall_at_5 value: 31.087999999999997 - task: type: Retrieval dataset: type: mteb/cqadupstack-unix name: MTEB CQADupstackUnixRetrieval config: default split: test revision: 6c6430d3a6d36f8d2a829195bc5dc94d7e063e53 metrics: - type: map_at_1 value: 23.078000000000003 - type: map_at_10 value: 31.941999999999997 - type: map_at_100 value: 33.196999999999996 - type: map_at_1000 value: 33.303 - type: map_at_3 value: 28.927000000000003 - type: map_at_5 value: 30.707 - type: mrr_at_1 value: 26.866 - type: mrr_at_10 value: 35.557 - type: mrr_at_100 value: 36.569 - type: mrr_at_1000 value: 36.632 - type: mrr_at_3 value: 32.897999999999996 - type: mrr_at_5 value: 34.437 - type: ndcg_at_1 value: 26.866 - type: ndcg_at_10 value: 37.372 - type: ndcg_at_100 value: 43.248 - type: ndcg_at_1000 value: 45.632 - type: ndcg_at_3 value: 31.852999999999998 - type: ndcg_at_5 value: 34.582 - type: precision_at_1 value: 26.866 - type: precision_at_10 value: 6.511 - type: precision_at_100 value: 1.078 - type: precision_at_1000 value: 0.13899999999999998 - type: precision_at_3 value: 14.582999999999998 - type: precision_at_5 value: 10.634 - type: recall_at_1 value: 23.078000000000003 - type: recall_at_10 value: 50.334 - type: recall_at_100 value: 75.787 - type: recall_at_1000 value: 92.485 - type: recall_at_3 value: 35.386 - type: recall_at_5 value: 42.225 - task: type: Retrieval dataset: type: mteb/cqadupstack-webmasters name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: 160c094312a0e1facb97e55eeddb698c0abe3571 metrics: - type: map_at_1 value: 22.203999999999997 - type: map_at_10 value: 31.276 - type: map_at_100 value: 32.844 - type: map_at_1000 value: 33.062999999999995 - type: map_at_3 value: 27.733999999999998 - type: map_at_5 value: 29.64 - type: mrr_at_1 value: 27.272999999999996 - type: mrr_at_10 value: 36.083 - type: mrr_at_100 value: 37.008 - type: mrr_at_1000 value: 37.076 - type: mrr_at_3 value: 33.004 - type: mrr_at_5 value: 34.664 - type: ndcg_at_1 value: 27.272999999999996 - type: ndcg_at_10 value: 37.763000000000005 - type: ndcg_at_100 value: 43.566 - type: ndcg_at_1000 value: 46.356 - type: ndcg_at_3 value: 31.673000000000002 - type: ndcg_at_5 value: 34.501 - type: precision_at_1 value: 27.272999999999996 - type: precision_at_10 value: 7.470000000000001 - type: precision_at_100 value: 1.502 - type: precision_at_1000 value: 0.24 - type: precision_at_3 value: 14.756 - type: precision_at_5 value: 11.225 - type: recall_at_1 value: 22.203999999999997 - type: recall_at_10 value: 51.437999999999995 - type: recall_at_100 value: 76.845 - type: recall_at_1000 value: 94.38600000000001 - type: recall_at_3 value: 34.258 - type: recall_at_5 value: 41.512 - task: type: Retrieval dataset: type: mteb/cqadupstack-wordpress name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: map_at_1 value: 17.474 - type: map_at_10 value: 26.362999999999996 - type: map_at_100 value: 27.456999999999997 - type: map_at_1000 value: 27.567999999999998 - type: map_at_3 value: 23.518 - type: map_at_5 value: 25.068 - type: mrr_at_1 value: 18.669 - type: mrr_at_10 value: 27.998 - type: mrr_at_100 value: 28.953 - type: mrr_at_1000 value: 29.03 - type: mrr_at_3 value: 25.230999999999998 - type: mrr_at_5 value: 26.654 - type: ndcg_at_1 value: 18.669 - type: ndcg_at_10 value: 31.684 - type: ndcg_at_100 value: 36.864999999999995 - type: ndcg_at_1000 value: 39.555 - type: ndcg_at_3 value: 26.057000000000002 - type: ndcg_at_5 value: 28.587 - type: precision_at_1 value: 18.669 - type: precision_at_10 value: 5.3420000000000005 - type: precision_at_100 value: 0.847 - type: precision_at_1000 value: 0.12 - type: precision_at_3 value: 11.583 - type: precision_at_5 value: 8.466 - type: recall_at_1 value: 17.474 - type: recall_at_10 value: 46.497 - type: recall_at_100 value: 69.977 - type: recall_at_1000 value: 89.872 - type: recall_at_3 value: 31.385999999999996 - type: recall_at_5 value: 37.283 - task: type: Retrieval dataset: type: mteb/climate-fever name: MTEB ClimateFEVER config: default split: test revision: 47f2ac6acb640fc46020b02a5b59fdda04d39380 metrics: - type: map_at_1 value: 17.173 - type: map_at_10 value: 30.407 - type: map_at_100 value: 32.528 - type: map_at_1000 value: 32.698 - type: map_at_3 value: 25.523 - type: map_at_5 value: 28.038 - type: mrr_at_1 value: 38.958 - type: mrr_at_10 value: 51.515 - type: mrr_at_100 value: 52.214000000000006 - type: mrr_at_1000 value: 52.237 - type: mrr_at_3 value: 48.502 - type: mrr_at_5 value: 50.251000000000005 - type: ndcg_at_1 value: 38.958 - type: ndcg_at_10 value: 40.355000000000004 - type: ndcg_at_100 value: 47.68 - type: ndcg_at_1000 value: 50.370000000000005 - type: ndcg_at_3 value: 33.946 - type: ndcg_at_5 value: 36.057 - type: precision_at_1 value: 38.958 - type: precision_at_10 value: 12.508 - type: precision_at_100 value: 2.054 - type: precision_at_1000 value: 0.256 - type: precision_at_3 value: 25.581 - type: precision_at_5 value: 19.256999999999998 - type: recall_at_1 value: 17.173 - type: recall_at_10 value: 46.967 - type: recall_at_100 value: 71.47200000000001 - type: recall_at_1000 value: 86.238 - type: recall_at_3 value: 30.961 - type: recall_at_5 value: 37.539 - task: type: Retrieval dataset: type: mteb/dbpedia name: MTEB DBPedia config: default split: test revision: c0f706b76e590d620bd6618b3ca8efdd34e2d659 metrics: - type: map_at_1 value: 8.999 - type: map_at_10 value: 18.989 - type: map_at_100 value: 26.133 - type: map_at_1000 value: 27.666 - type: map_at_3 value: 13.918 - type: map_at_5 value: 16.473 - type: mrr_at_1 value: 66.25 - type: mrr_at_10 value: 74.161 - type: mrr_at_100 value: 74.516 - type: mrr_at_1000 value: 74.524 - type: mrr_at_3 value: 72.875 - type: mrr_at_5 value: 73.613 - type: ndcg_at_1 value: 54.37499999999999 - type: ndcg_at_10 value: 39.902 - type: ndcg_at_100 value: 44.212 - type: ndcg_at_1000 value: 51.62 - type: ndcg_at_3 value: 45.193 - type: ndcg_at_5 value: 42.541000000000004 - type: precision_at_1 value: 66.25 - type: precision_at_10 value: 30.425 - type: precision_at_100 value: 9.754999999999999 - type: precision_at_1000 value: 2.043 - type: precision_at_3 value: 48.25 - type: precision_at_5 value: 40.65 - type: recall_at_1 value: 8.999 - type: recall_at_10 value: 24.133 - type: recall_at_100 value: 49.138999999999996 - type: recall_at_1000 value: 72.639 - type: recall_at_3 value: 15.287999999999998 - type: recall_at_5 value: 19.415 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 46.38999999999999 - type: f1 value: 41.444205512055234 - task: type: Retrieval dataset: type: mteb/fever name: MTEB FEVER config: default split: test revision: bea83ef9e8fb933d90a2f1d5515737465d613e12 metrics: - type: map_at_1 value: 87.35000000000001 - type: map_at_10 value: 92.837 - type: map_at_100 value: 92.996 - type: map_at_1000 value: 93.006 - type: map_at_3 value: 92.187 - type: map_at_5 value: 92.595 - type: mrr_at_1 value: 93.864 - type: mrr_at_10 value: 96.723 - type: mrr_at_100 value: 96.72500000000001 - type: mrr_at_1000 value: 96.72500000000001 - type: mrr_at_3 value: 96.64 - type: mrr_at_5 value: 96.71499999999999 - type: ndcg_at_1 value: 93.864 - type: ndcg_at_10 value: 94.813 - type: ndcg_at_100 value: 95.243 - type: ndcg_at_1000 value: 95.38600000000001 - type: ndcg_at_3 value: 94.196 - type: ndcg_at_5 value: 94.521 - type: precision_at_1 value: 93.864 - type: precision_at_10 value: 10.951 - type: precision_at_100 value: 1.1400000000000001 - type: precision_at_1000 value: 0.117 - type: precision_at_3 value: 35.114000000000004 - type: precision_at_5 value: 21.476 - type: recall_at_1 value: 87.35000000000001 - type: recall_at_10 value: 96.941 - type: recall_at_100 value: 98.397 - type: recall_at_1000 value: 99.21600000000001 - type: recall_at_3 value: 95.149 - type: recall_at_5 value: 96.131 - task: type: Retrieval dataset: type: mteb/fiqa name: MTEB FiQA2018 config: default split: test revision: 27a168819829fe9bcd655c2df245fb19452e8e06 metrics: - type: map_at_1 value: 24.476 - type: map_at_10 value: 40.11 - type: map_at_100 value: 42.229 - type: map_at_1000 value: 42.378 - type: map_at_3 value: 34.512 - type: map_at_5 value: 38.037 - type: mrr_at_1 value: 47.839999999999996 - type: mrr_at_10 value: 57.053 - type: mrr_at_100 value: 57.772 - type: mrr_at_1000 value: 57.799 - type: mrr_at_3 value: 54.552 - type: mrr_at_5 value: 56.011 - type: ndcg_at_1 value: 47.839999999999996 - type: ndcg_at_10 value: 48.650999999999996 - type: ndcg_at_100 value: 55.681000000000004 - type: ndcg_at_1000 value: 57.979 - type: ndcg_at_3 value: 43.923 - type: ndcg_at_5 value: 46.037 - type: precision_at_1 value: 47.839999999999996 - type: precision_at_10 value: 13.395000000000001 - type: precision_at_100 value: 2.0660000000000003 - type: precision_at_1000 value: 0.248 - type: precision_at_3 value: 29.064 - type: precision_at_5 value: 22.006 - type: recall_at_1 value: 24.476 - type: recall_at_10 value: 56.216 - type: recall_at_100 value: 81.798 - type: recall_at_1000 value: 95.48299999999999 - type: recall_at_3 value: 39.357 - type: recall_at_5 value: 47.802 - task: type: Retrieval dataset: type: mteb/hotpotqa name: MTEB HotpotQA config: default split: test revision: ab518f4d6fcca38d87c25209f94beba119d02014 metrics: - type: map_at_1 value: 42.728 - type: map_at_10 value: 57.737 - type: map_at_100 value: 58.531 - type: map_at_1000 value: 58.594 - type: map_at_3 value: 54.869 - type: map_at_5 value: 56.55 - type: mrr_at_1 value: 85.456 - type: mrr_at_10 value: 90.062 - type: mrr_at_100 value: 90.159 - type: mrr_at_1000 value: 90.16 - type: mrr_at_3 value: 89.37899999999999 - type: mrr_at_5 value: 89.81 - type: ndcg_at_1 value: 85.456 - type: ndcg_at_10 value: 67.755 - type: ndcg_at_100 value: 70.341 - type: ndcg_at_1000 value: 71.538 - type: ndcg_at_3 value: 63.735 - type: ndcg_at_5 value: 65.823 - type: precision_at_1 value: 85.456 - type: precision_at_10 value: 13.450000000000001 - type: precision_at_100 value: 1.545 - type: precision_at_1000 value: 0.16999999999999998 - type: precision_at_3 value: 38.861000000000004 - type: precision_at_5 value: 24.964 - type: recall_at_1 value: 42.728 - type: recall_at_10 value: 67.252 - type: recall_at_100 value: 77.265 - type: recall_at_1000 value: 85.246 - type: recall_at_3 value: 58.292 - type: recall_at_5 value: 62.41100000000001 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 87.4836 - type: ap value: 82.29552224030336 - type: f1 value: 87.42791432227448 - task: type: Retrieval dataset: type: mteb/msmarco name: MTEB MSMARCO config: default split: dev revision: c5a29a104738b98a9e76336939199e264163d4a0 metrics: - type: map_at_1 value: 23.015 - type: map_at_10 value: 35.621 - type: map_at_100 value: 36.809 - type: map_at_1000 value: 36.853 - type: map_at_3 value: 31.832 - type: map_at_5 value: 34.006 - type: mrr_at_1 value: 23.738999999999997 - type: mrr_at_10 value: 36.309999999999995 - type: mrr_at_100 value: 37.422 - type: mrr_at_1000 value: 37.461 - type: mrr_at_3 value: 32.592999999999996 - type: mrr_at_5 value: 34.736 - type: ndcg_at_1 value: 23.724999999999998 - type: ndcg_at_10 value: 42.617 - type: ndcg_at_100 value: 48.217999999999996 - type: ndcg_at_1000 value: 49.309 - type: ndcg_at_3 value: 34.905 - type: ndcg_at_5 value: 38.769 - type: precision_at_1 value: 23.724999999999998 - type: precision_at_10 value: 6.689 - type: precision_at_100 value: 0.9480000000000001 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 14.89 - type: precision_at_5 value: 10.897 - type: recall_at_1 value: 23.015 - type: recall_at_10 value: 64.041 - type: recall_at_100 value: 89.724 - type: recall_at_1000 value: 98.00999999999999 - type: recall_at_3 value: 43.064 - type: recall_at_5 value: 52.31099999999999 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 96.49794801641588 - type: f1 value: 96.28931114498003 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 82.81121751025992 - type: f1 value: 63.18740125901853 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 77.66644250168123 - type: f1 value: 74.93211186867839 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 81.77202420981843 - type: f1 value: 81.63681969283554 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 34.596687684870645 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 32.26965660101405 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 31.33619694846802 - type: mrr value: 32.53719657720334 - task: type: Retrieval dataset: type: mteb/nfcorpus name: MTEB NFCorpus config: default split: test revision: ec0fa4fe99da2ff19ca1214b7966684033a58814 metrics: - type: map_at_1 value: 6.0729999999999995 - type: map_at_10 value: 13.245999999999999 - type: map_at_100 value: 16.747999999999998 - type: map_at_1000 value: 18.163 - type: map_at_3 value: 10.064 - type: map_at_5 value: 11.513 - type: mrr_at_1 value: 49.536 - type: mrr_at_10 value: 58.092 - type: mrr_at_100 value: 58.752 - type: mrr_at_1000 value: 58.78 - type: mrr_at_3 value: 56.398 - type: mrr_at_5 value: 57.389 - type: ndcg_at_1 value: 47.059 - type: ndcg_at_10 value: 35.881 - type: ndcg_at_100 value: 32.751999999999995 - type: ndcg_at_1000 value: 41.498000000000005 - type: ndcg_at_3 value: 42.518 - type: ndcg_at_5 value: 39.550999999999995 - type: precision_at_1 value: 49.536 - type: precision_at_10 value: 26.316 - type: precision_at_100 value: 8.084 - type: precision_at_1000 value: 2.081 - type: precision_at_3 value: 39.938 - type: precision_at_5 value: 34.056 - type: recall_at_1 value: 6.0729999999999995 - type: recall_at_10 value: 16.593 - type: recall_at_100 value: 32.883 - type: recall_at_1000 value: 64.654 - type: recall_at_3 value: 11.174000000000001 - type: recall_at_5 value: 13.528 - task: type: Retrieval dataset: type: mteb/nq name: MTEB NQ config: default split: test revision: b774495ed302d8c44a3a7ea25c90dbce03968f31 metrics: - type: map_at_1 value: 30.043 - type: map_at_10 value: 45.318999999999996 - type: map_at_100 value: 46.381 - type: map_at_1000 value: 46.412 - type: map_at_3 value: 40.941 - type: map_at_5 value: 43.662 - type: mrr_at_1 value: 33.98 - type: mrr_at_10 value: 47.870000000000005 - type: mrr_at_100 value: 48.681999999999995 - type: mrr_at_1000 value: 48.703 - type: mrr_at_3 value: 44.341 - type: mrr_at_5 value: 46.547 - type: ndcg_at_1 value: 33.98 - type: ndcg_at_10 value: 52.957 - type: ndcg_at_100 value: 57.434 - type: ndcg_at_1000 value: 58.103 - type: ndcg_at_3 value: 44.896 - type: ndcg_at_5 value: 49.353 - type: precision_at_1 value: 33.98 - type: precision_at_10 value: 8.786 - type: precision_at_100 value: 1.1280000000000001 - type: precision_at_1000 value: 0.11900000000000001 - type: precision_at_3 value: 20.577 - type: precision_at_5 value: 14.942 - type: recall_at_1 value: 30.043 - type: recall_at_10 value: 73.593 - type: recall_at_100 value: 93.026 - type: recall_at_1000 value: 97.943 - type: recall_at_3 value: 52.955 - type: recall_at_5 value: 63.132 - task: type: Retrieval dataset: type: mteb/quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 70.808 - type: map_at_10 value: 84.675 - type: map_at_100 value: 85.322 - type: map_at_1000 value: 85.33800000000001 - type: map_at_3 value: 81.68900000000001 - type: map_at_5 value: 83.543 - type: mrr_at_1 value: 81.5 - type: mrr_at_10 value: 87.59700000000001 - type: mrr_at_100 value: 87.705 - type: mrr_at_1000 value: 87.70599999999999 - type: mrr_at_3 value: 86.607 - type: mrr_at_5 value: 87.289 - type: ndcg_at_1 value: 81.51 - type: ndcg_at_10 value: 88.41799999999999 - type: ndcg_at_100 value: 89.644 - type: ndcg_at_1000 value: 89.725 - type: ndcg_at_3 value: 85.49900000000001 - type: ndcg_at_5 value: 87.078 - type: precision_at_1 value: 81.51 - type: precision_at_10 value: 13.438 - type: precision_at_100 value: 1.532 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 37.363 - type: precision_at_5 value: 24.57 - type: recall_at_1 value: 70.808 - type: recall_at_10 value: 95.575 - type: recall_at_100 value: 99.667 - type: recall_at_1000 value: 99.98899999999999 - type: recall_at_3 value: 87.223 - type: recall_at_5 value: 91.682 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 58.614831329137715 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 66.86580408560826 - task: type: Retrieval dataset: type: mteb/scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 5.093 - type: map_at_10 value: 13.014000000000001 - type: map_at_100 value: 15.412999999999998 - type: map_at_1000 value: 15.756999999999998 - type: map_at_3 value: 9.216000000000001 - type: map_at_5 value: 11.036999999999999 - type: mrr_at_1 value: 25.1 - type: mrr_at_10 value: 37.133 - type: mrr_at_100 value: 38.165 - type: mrr_at_1000 value: 38.198 - type: mrr_at_3 value: 33.217 - type: mrr_at_5 value: 35.732 - type: ndcg_at_1 value: 25.1 - type: ndcg_at_10 value: 21.918000000000003 - type: ndcg_at_100 value: 30.983 - type: ndcg_at_1000 value: 36.629 - type: ndcg_at_3 value: 20.544999999999998 - type: ndcg_at_5 value: 18.192 - type: precision_at_1 value: 25.1 - type: precision_at_10 value: 11.44 - type: precision_at_100 value: 2.459 - type: precision_at_1000 value: 0.381 - type: precision_at_3 value: 19.267 - type: precision_at_5 value: 16.16 - type: recall_at_1 value: 5.093 - type: recall_at_10 value: 23.215 - type: recall_at_100 value: 49.902 - type: recall_at_1000 value: 77.403 - type: recall_at_3 value: 11.733 - type: recall_at_5 value: 16.372999999999998 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 82.9365442977452 - type: cos_sim_spearman value: 79.36960687383745 - type: euclidean_pearson value: 79.6045204840714 - type: euclidean_spearman value: 79.26382712751337 - type: manhattan_pearson value: 79.4805084789529 - type: manhattan_spearman value: 79.21847863209523 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 83.27906192961453 - type: cos_sim_spearman value: 74.38364712099211 - type: euclidean_pearson value: 78.54358927241223 - type: euclidean_spearman value: 74.22185560806376 - type: manhattan_pearson value: 78.50904327377751 - type: manhattan_spearman value: 74.2627500781748 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 84.66863742649639 - type: cos_sim_spearman value: 84.70630905216271 - type: euclidean_pearson value: 84.64498334705334 - type: euclidean_spearman value: 84.87204770690148 - type: manhattan_pearson value: 84.65774227976077 - type: manhattan_spearman value: 84.91251851797985 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 83.1577763924467 - type: cos_sim_spearman value: 80.10314039230198 - type: euclidean_pearson value: 81.51346991046043 - type: euclidean_spearman value: 80.08678485109435 - type: manhattan_pearson value: 81.57058914661894 - type: manhattan_spearman value: 80.1516230725106 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 86.40310839662533 - type: cos_sim_spearman value: 87.16293477217867 - type: euclidean_pearson value: 86.50688711184775 - type: euclidean_spearman value: 87.08651444923031 - type: manhattan_pearson value: 86.54674677557857 - type: manhattan_spearman value: 87.15079017870971 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 84.32886275207817 - type: cos_sim_spearman value: 85.0190460590732 - type: euclidean_pearson value: 84.42553652784679 - type: euclidean_spearman value: 85.20027364279328 - type: manhattan_pearson value: 84.42926246281078 - type: manhattan_spearman value: 85.20187419804306 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 90.76732216967812 - type: cos_sim_spearman value: 90.63701653633909 - type: euclidean_pearson value: 90.26678186114682 - type: euclidean_spearman value: 90.67288073455427 - type: manhattan_pearson value: 90.20772020584582 - type: manhattan_spearman value: 90.60764863983702 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_pearson value: 69.09280387698125 - type: cos_sim_spearman value: 68.62743151172162 - type: euclidean_pearson value: 69.89386398104689 - type: euclidean_spearman value: 68.71191066733556 - type: manhattan_pearson value: 69.92516500604872 - type: manhattan_spearman value: 68.80452846992576 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 86.13178592019887 - type: cos_sim_spearman value: 86.03947178806887 - type: euclidean_pearson value: 85.87029414285313 - type: euclidean_spearman value: 86.04960843306998 - type: manhattan_pearson value: 85.92946858580146 - type: manhattan_spearman value: 86.12575341860442 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 85.16657063002837 - type: mrr value: 95.73671063867141 - task: type: Retrieval dataset: type: mteb/scifact name: MTEB SciFact config: default split: test revision: 0228b52cf27578f30900b9e5271d331663a030d7 metrics: - type: map_at_1 value: 63.510999999999996 - type: map_at_10 value: 72.76899999999999 - type: map_at_100 value: 73.303 - type: map_at_1000 value: 73.32499999999999 - type: map_at_3 value: 70.514 - type: map_at_5 value: 71.929 - type: mrr_at_1 value: 66.333 - type: mrr_at_10 value: 73.75 - type: mrr_at_100 value: 74.119 - type: mrr_at_1000 value: 74.138 - type: mrr_at_3 value: 72.222 - type: mrr_at_5 value: 73.122 - type: ndcg_at_1 value: 66.333 - type: ndcg_at_10 value: 76.774 - type: ndcg_at_100 value: 78.78500000000001 - type: ndcg_at_1000 value: 79.254 - type: ndcg_at_3 value: 73.088 - type: ndcg_at_5 value: 75.002 - type: precision_at_1 value: 66.333 - type: precision_at_10 value: 9.833 - type: precision_at_100 value: 1.093 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 28.222 - type: precision_at_5 value: 18.333 - type: recall_at_1 value: 63.510999999999996 - type: recall_at_10 value: 87.98899999999999 - type: recall_at_100 value: 96.5 - type: recall_at_1000 value: 100.0 - type: recall_at_3 value: 77.86699999999999 - type: recall_at_5 value: 82.73899999999999 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.78514851485149 - type: cos_sim_ap value: 94.94214383862038 - type: cos_sim_f1 value: 89.02255639097744 - type: cos_sim_precision value: 89.2462311557789 - type: cos_sim_recall value: 88.8 - type: dot_accuracy value: 99.78217821782178 - type: dot_ap value: 94.69965247836805 - type: dot_f1 value: 88.78695208970439 - type: dot_precision value: 90.54054054054053 - type: dot_recall value: 87.1 - type: euclidean_accuracy value: 99.78118811881188 - type: euclidean_ap value: 94.9865187695411 - type: euclidean_f1 value: 88.99950223992036 - type: euclidean_precision value: 88.60257680872151 - type: euclidean_recall value: 89.4 - type: manhattan_accuracy value: 99.78811881188119 - type: manhattan_ap value: 95.0021236766459 - type: manhattan_f1 value: 89.12071535022356 - type: manhattan_precision value: 88.54886475814413 - type: manhattan_recall value: 89.7 - type: max_accuracy value: 99.78811881188119 - type: max_ap value: 95.0021236766459 - type: max_f1 value: 89.12071535022356 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 68.93190546593995 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 37.602808534760655 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 52.29214480978073 - type: mrr value: 53.123169722434426 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.967800769650022 - type: cos_sim_spearman value: 31.168490040206926 - type: dot_pearson value: 30.888603021128553 - type: dot_spearman value: 31.028241262520385 - task: type: Retrieval dataset: type: mteb/trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.22300000000000003 - type: map_at_10 value: 1.781 - type: map_at_100 value: 9.905999999999999 - type: map_at_1000 value: 23.455000000000002 - type: map_at_3 value: 0.569 - type: map_at_5 value: 0.918 - type: mrr_at_1 value: 84.0 - type: mrr_at_10 value: 91.067 - type: mrr_at_100 value: 91.067 - type: mrr_at_1000 value: 91.067 - type: mrr_at_3 value: 90.667 - type: mrr_at_5 value: 91.067 - type: ndcg_at_1 value: 78.0 - type: ndcg_at_10 value: 73.13499999999999 - type: ndcg_at_100 value: 55.32 - type: ndcg_at_1000 value: 49.532 - type: ndcg_at_3 value: 73.715 - type: ndcg_at_5 value: 72.74199999999999 - type: precision_at_1 value: 84.0 - type: precision_at_10 value: 78.8 - type: precision_at_100 value: 56.32 - type: precision_at_1000 value: 21.504 - type: precision_at_3 value: 77.333 - type: precision_at_5 value: 78.0 - type: recall_at_1 value: 0.22300000000000003 - type: recall_at_10 value: 2.049 - type: recall_at_100 value: 13.553 - type: recall_at_1000 value: 46.367999999999995 - type: recall_at_3 value: 0.604 - type: recall_at_5 value: 1.015 - task: type: Retrieval dataset: type: mteb/touche2020 name: MTEB Touche2020 config: default split: test revision: a34f9a33db75fa0cbb21bb5cfc3dae8dc8bec93f metrics: - type: map_at_1 value: 3.0380000000000003 - type: map_at_10 value: 10.188 - type: map_at_100 value: 16.395 - type: map_at_1000 value: 18.024 - type: map_at_3 value: 6.236 - type: map_at_5 value: 7.276000000000001 - type: mrr_at_1 value: 34.694 - type: mrr_at_10 value: 46.292 - type: mrr_at_100 value: 47.446 - type: mrr_at_1000 value: 47.446 - type: mrr_at_3 value: 41.156 - type: mrr_at_5 value: 44.32 - type: ndcg_at_1 value: 32.653 - type: ndcg_at_10 value: 25.219 - type: ndcg_at_100 value: 37.802 - type: ndcg_at_1000 value: 49.274 - type: ndcg_at_3 value: 28.605999999999998 - type: ndcg_at_5 value: 26.21 - type: precision_at_1 value: 34.694 - type: precision_at_10 value: 21.837 - type: precision_at_100 value: 7.776 - type: precision_at_1000 value: 1.522 - type: precision_at_3 value: 28.571 - type: precision_at_5 value: 25.306 - type: recall_at_1 value: 3.0380000000000003 - type: recall_at_10 value: 16.298000000000002 - type: recall_at_100 value: 48.712 - type: recall_at_1000 value: 83.16799999999999 - type: recall_at_3 value: 7.265000000000001 - type: recall_at_5 value: 9.551 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 83.978 - type: ap value: 24.751887949330015 - type: f1 value: 66.8685134049279 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 61.573288058856825 - type: f1 value: 61.973261751726604 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 48.75483298792469 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 86.36824223639506 - type: cos_sim_ap value: 75.53126388573047 - type: cos_sim_f1 value: 67.9912831688245 - type: cos_sim_precision value: 66.11817501869858 - type: cos_sim_recall value: 69.9736147757256 - type: dot_accuracy value: 86.39804494248078 - type: dot_ap value: 75.27598891718046 - type: dot_f1 value: 67.91146284159763 - type: dot_precision value: 63.90505003490807 - type: dot_recall value: 72.45382585751979 - type: euclidean_accuracy value: 86.36228169517793 - type: euclidean_ap value: 75.51438087434647 - type: euclidean_f1 value: 68.02370523061066 - type: euclidean_precision value: 66.46525679758308 - type: euclidean_recall value: 69.65699208443272 - type: manhattan_accuracy value: 86.46361089586935 - type: manhattan_ap value: 75.50800785730111 - type: manhattan_f1 value: 67.9220437187253 - type: manhattan_precision value: 67.79705573080967 - type: manhattan_recall value: 68.04749340369392 - type: max_accuracy value: 86.46361089586935 - type: max_ap value: 75.53126388573047 - type: max_f1 value: 68.02370523061066 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 88.80350836341057 - type: cos_sim_ap value: 85.51101933260743 - type: cos_sim_f1 value: 77.9152271629704 - type: cos_sim_precision value: 75.27815662910056 - type: cos_sim_recall value: 80.74376347397599 - type: dot_accuracy value: 88.84425815966158 - type: dot_ap value: 85.49726945962519 - type: dot_f1 value: 77.94445269567801 - type: dot_precision value: 75.27251864601261 - type: dot_recall value: 80.81305820757623 - type: euclidean_accuracy value: 88.80350836341057 - type: euclidean_ap value: 85.4882880790211 - type: euclidean_f1 value: 77.87063284615103 - type: euclidean_precision value: 74.61022927689595 - type: euclidean_recall value: 81.42901139513397 - type: manhattan_accuracy value: 88.7161873714441 - type: manhattan_ap value: 85.45753871906821 - type: manhattan_f1 value: 77.8686401480111 - type: manhattan_precision value: 74.95903683123174 - type: manhattan_recall value: 81.01324299353249 - type: max_accuracy value: 88.84425815966158 - type: max_ap value: 85.51101933260743 - type: max_f1 value: 77.94445269567801 --- <!-- **English** | 中文 --> # gte-base-en-v1.5 We introduce series, upgraded embeddings that support the context length of up to **8192**, while further enhancing model performance. The models are built upon the encoder backbone (BERT + RoPE + GLU). The series achieve state-of-the-art scores on the MTEB benchmark within the same model size category and prodvide competitive on the LoCo long-context retrieval tests (refer to Evaluation). We also present the []( a SOTA instruction-tuned multi-lingual embedding model that ranked 2nd in MTEB and 1st in C-MTEB. <!-- Provide a longer summary of what this model is. --> - **Developed by:** Institute for Intelligent Computing, Alibaba Group - **Model type:** Text Embeddings - **Paper:** mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval <!-- - **Demo [optional]:** [More Information Needed] --> ### Model list | Models | Language | Model Size | Max Seq. Length | Dimension | MTEB-en | LoCo | |:-----: | :-----: |:-----: |:-----: |:-----: | :-----: | :-----: | |[]( Multiple | 7720 | 32768 | 4096 | 67.34 | 87.57 | |[]( | English | 434 | 8192 | 1024 | 65.39 | 86.71 | |[]( | English | 137 | 8192 | 768 | 64.11 | 87.44 | ## How to Get Started with the Model Use the code below to get started with the model. **It is recommended to install xformers and enable unpadding for acceleration, refer to enable-unpadding-and-xformers.** Use with : Use with : Use with infinity: Infinity is a MIT licensed server for OpenAI-compatible deployment. ## Training Details ### Training Data - Masked language modeling (MLM): - Weak-supervised contrastive pre-training (CPT): GTE pre-training data - Supervised contrastive fine-tuning: GTE fine-tuning data ### Training Procedure To enable the backbone model to support a context length of 8192, we adopted a multi-stage training strategy. The model first undergoes preliminary MLM pre-training on shorter lengths. And then, we resample the data, reducing the proportion of short texts, and continue the MLM pre-training. The entire training process is as follows: - MLM-2048: lr 5e-4, mlm_probability 0.3, batch_size 4096, num_steps 70000, rope_base 10000 - MLM-8192: lr 5e-5, mlm_probability 0.3, batch_size 1024, num_steps 20000, rope_base 500000 - CPT: max_len 512, lr 2e-4, batch_size 32768, num_steps 100000 - Fine-tuning: TODO ## Evaluation ### MTEB The results of other models are retrieved from MTEB leaderboard. The gte evaluation setting: , and set ntk scaling factor to 2 (equivalent to rope_base * 2). | Model Name | Param Size (M) | Dimension | Sequence Length | Average (56) | Class. (12) | Clust. (11) | Pair Class. (3) | Reran. (4) | Retr. (15) | STS (10) | Summ. (1) | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | **gte-large-en-v1.5** | 434 | 1024 | 8192 | **65.39** | 77.75 | 47.95 | 84.63 | 58.50 | 57.91 | 81.43 | 30.91 | | mxbai-embed-large-v1 | 335 | 1024 | 512 | 64.68 | 75.64 | 46.71 | 87.2 | 60.11 | 54.39 | 85 | 32.71 | | multilingual-e5-large-instruct | 560 | 1024 | 514 | 64.41 | 77.56 | 47.1 | 86.19 | 58.58 | 52.47 | 84.78 | 30.39 | | bge-large-en-v1.5| 335 | 1024 | 512 | 64.23 | 75.97 | 46.08 | 87.12 | 60.03 | 54.29 | 83.11 | 31.61 | | **gte-base-en-v1.5** | 137 | 768 | 8192 | **64.11** | 77.17 | 46.82 | 85.33 | 57.66 | 54.09 | 81.97 | 31.17 | | bge-base-en-v1.5| 109 | 768 | 512 | 63.55 | 75.53 | 45.77 | 86.55 | 58.86 | 53.25 | 82.4 | 31.07 | ### LoCo | Model Name | Dimension | Sequence Length | Average (5) | QsmsumRetrieval | SummScreenRetrieval | QasperAbastractRetrieval | QasperTitleRetrieval | GovReportRetrieval | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | gte-qwen1.5-7b | 4096 | 32768 | 87.57 | 49.37 | 93.10 | 99.67 | 97.54 | 98.21 | | gte-large-v1.5 |1024 | 8192 | 86.71 | 44.55 | 92.61 | 99.82 | 97.81 | 98.74 | | gte-base-v1.5 | 768 | 8192 | 87.44 | 49.91 | 91.78 | 99.82 | 97.13 | 98.58 | ## Citation If you find our paper or models helpful, please consider citing them as follows:",
26
+ "model_explanation_gemini": "Generates sentence embeddings for tasks like classification, retrieval, clustering, and similarity in English, optimized for performance across various benchmarks.\n\nModel Features: \n- Sentence embedding generation \n- Supports classification, retrieval, clustering, and similarity tasks \n- Optimized for English language \n- Evaluated on multiple MTEB benchmarks \n\nComparative Explanation: \nOutperforms or matches other models in tasks like Amazon review classification (93.02% accuracy) and BIOSSES semantic textual similarity (85.04 Pearson correlation),",
27
+ "release_year": "2024",
28
+ "parameter_count": null,
29
+ "is_fine_tuned": false,
30
+ "category": "Embedding",
31
+ "api_enhanced": true
32
  }
model_data_json/Alibaba-NLP_gte-large-en-v1.5.json CHANGED
@@ -24,5 +24,10 @@
24
  "region:us"
25
  ],
26
  "description": "--- datasets: - allenai/c4 library_name: transformers tags: - sentence-transformers - gte - mteb - transformers.js - sentence-similarity license: apache-2.0 language: - en model-index: - name: gte-large-en-v1.5 results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 73.01492537313432 - type: ap value: 35.05341696659522 - type: f1 value: 66.71270310883853 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 93.97189999999999 - type: ap value: 90.5952493948908 - type: f1 value: 93.95848137716877 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 54.196 - type: f1 value: 53.80122334012787 - task: type: Retrieval dataset: type: mteb/arguana name: MTEB ArguAna config: default split: test revision: c22ab2a51041ffd869aaddef7af8d8215647e41a metrics: - type: map_at_1 value: 47.297 - type: map_at_10 value: 64.303 - type: map_at_100 value: 64.541 - type: map_at_1000 value: 64.541 - type: map_at_3 value: 60.728 - type: map_at_5 value: 63.114000000000004 - type: mrr_at_1 value: 48.435 - type: mrr_at_10 value: 64.657 - type: mrr_at_100 value: 64.901 - type: mrr_at_1000 value: 64.901 - type: mrr_at_3 value: 61.06 - type: mrr_at_5 value: 63.514 - type: ndcg_at_1 value: 47.297 - type: ndcg_at_10 value: 72.107 - type: ndcg_at_100 value: 72.963 - type: ndcg_at_1000 value: 72.963 - type: ndcg_at_3 value: 65.063 - type: ndcg_at_5 value: 69.352 - type: precision_at_1 value: 47.297 - type: precision_at_10 value: 9.623 - type: precision_at_100 value: 0.996 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 25.865 - type: precision_at_5 value: 17.596 - type: recall_at_1 value: 47.297 - type: recall_at_10 value: 96.23 - type: recall_at_100 value: 99.644 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 77.596 - type: recall_at_5 value: 87.98 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 48.467787861077475 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 43.39198391914257 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 63.12794820591384 - type: mrr value: 75.9331442641692 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 87.85062993863319 - type: cos_sim_spearman value: 85.39049989733459 - type: euclidean_pearson value: 86.00222680278333 - type: euclidean_spearman value: 85.45556162077396 - type: manhattan_pearson value: 85.88769871785621 - type: manhattan_spearman value: 85.11760211290839 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 87.32792207792208 - type: f1 value: 87.29132945999555 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 40.5779328301945 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 37.94425623865118 - task: type: Retrieval dataset: type: mteb/cqadupstack-android name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: f46a197baaae43b4f621051089b82a364682dfeb metrics: - type: map_at_1 value: 32.978 - type: map_at_10 value: 44.45 - type: map_at_100 value: 46.19 - type: map_at_1000 value: 46.303 - type: map_at_3 value: 40.849000000000004 - type: map_at_5 value: 42.55 - type: mrr_at_1 value: 40.629 - type: mrr_at_10 value: 50.848000000000006 - type: mrr_at_100 value: 51.669 - type: mrr_at_1000 value: 51.705 - type: mrr_at_3 value: 47.997 - type: mrr_at_5 value: 49.506 - type: ndcg_at_1 value: 40.629 - type: ndcg_at_10 value: 51.102000000000004 - type: ndcg_at_100 value: 57.159000000000006 - type: ndcg_at_1000 value: 58.669000000000004 - type: ndcg_at_3 value: 45.738 - type: ndcg_at_5 value: 47.632999999999996 - type: precision_at_1 value: 40.629 - type: precision_at_10 value: 9.700000000000001 - type: precision_at_100 value: 1.5970000000000002 - type: precision_at_1000 value: 0.202 - type: precision_at_3 value: 21.698 - type: precision_at_5 value: 15.393 - type: recall_at_1 value: 32.978 - type: recall_at_10 value: 63.711 - type: recall_at_100 value: 88.39399999999999 - type: recall_at_1000 value: 97.513 - type: recall_at_3 value: 48.025 - type: recall_at_5 value: 53.52 - task: type: Retrieval dataset: type: mteb/cqadupstack-english name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: ad9991cb51e31e31e430383c75ffb2885547b5f0 metrics: - type: map_at_1 value: 30.767 - type: map_at_10 value: 42.195 - type: map_at_100 value: 43.541999999999994 - type: map_at_1000 value: 43.673 - type: map_at_3 value: 38.561 - type: map_at_5 value: 40.532000000000004 - type: mrr_at_1 value: 38.79 - type: mrr_at_10 value: 48.021 - type: mrr_at_100 value: 48.735 - type: mrr_at_1000 value: 48.776 - type: mrr_at_3 value: 45.594 - type: mrr_at_5 value: 46.986 - type: ndcg_at_1 value: 38.79 - type: ndcg_at_10 value: 48.468 - type: ndcg_at_100 value: 53.037 - type: ndcg_at_1000 value: 55.001999999999995 - type: ndcg_at_3 value: 43.409 - type: ndcg_at_5 value: 45.654 - type: precision_at_1 value: 38.79 - type: precision_at_10 value: 9.452 - type: precision_at_100 value: 1.518 - type: precision_at_1000 value: 0.201 - type: precision_at_3 value: 21.21 - type: precision_at_5 value: 15.171999999999999 - type: recall_at_1 value: 30.767 - type: recall_at_10 value: 60.118 - type: recall_at_100 value: 79.271 - type: recall_at_1000 value: 91.43299999999999 - type: recall_at_3 value: 45.36 - type: recall_at_5 value: 51.705 - task: type: Retrieval dataset: type: mteb/cqadupstack-gaming name: MTEB CQADupstackGamingRetrieval config: default split: test revision: 4885aa143210c98657558c04aaf3dc47cfb54340 metrics: - type: map_at_1 value: 40.007 - type: map_at_10 value: 53.529 - type: map_at_100 value: 54.602 - type: map_at_1000 value: 54.647 - type: map_at_3 value: 49.951 - type: map_at_5 value: 52.066 - type: mrr_at_1 value: 45.705 - type: mrr_at_10 value: 56.745000000000005 - type: mrr_at_100 value: 57.43899999999999 - type: mrr_at_1000 value: 57.462999999999994 - type: mrr_at_3 value: 54.25299999999999 - type: mrr_at_5 value: 55.842000000000006 - type: ndcg_at_1 value: 45.705 - type: ndcg_at_10 value: 59.809 - type: ndcg_at_100 value: 63.837999999999994 - type: ndcg_at_1000 value: 64.729 - type: ndcg_at_3 value: 53.994 - type: ndcg_at_5 value: 57.028 - type: precision_at_1 value: 45.705 - type: precision_at_10 value: 9.762 - type: precision_at_100 value: 1.275 - type: precision_at_1000 value: 0.13899999999999998 - type: precision_at_3 value: 24.368000000000002 - type: precision_at_5 value: 16.84 - type: recall_at_1 value: 40.007 - type: recall_at_10 value: 75.017 - type: recall_at_100 value: 91.99000000000001 - type: recall_at_1000 value: 98.265 - type: recall_at_3 value: 59.704 - type: recall_at_5 value: 67.109 - task: type: Retrieval dataset: type: mteb/cqadupstack-gis name: MTEB CQADupstackGisRetrieval config: default split: test revision: 5003b3064772da1887988e05400cf3806fe491f2 metrics: - type: map_at_1 value: 26.639000000000003 - type: map_at_10 value: 35.926 - type: map_at_100 value: 37.126999999999995 - type: map_at_1000 value: 37.202 - type: map_at_3 value: 32.989000000000004 - type: map_at_5 value: 34.465 - type: mrr_at_1 value: 28.475 - type: mrr_at_10 value: 37.7 - type: mrr_at_100 value: 38.753 - type: mrr_at_1000 value: 38.807 - type: mrr_at_3 value: 35.066 - type: mrr_at_5 value: 36.512 - type: ndcg_at_1 value: 28.475 - type: ndcg_at_10 value: 41.245 - type: ndcg_at_100 value: 46.814 - type: ndcg_at_1000 value: 48.571 - type: ndcg_at_3 value: 35.528999999999996 - type: ndcg_at_5 value: 38.066 - type: precision_at_1 value: 28.475 - type: precision_at_10 value: 6.497 - type: precision_at_100 value: 0.9650000000000001 - type: precision_at_1000 value: 0.11499999999999999 - type: precision_at_3 value: 15.065999999999999 - type: precision_at_5 value: 10.599 - type: recall_at_1 value: 26.639000000000003 - type: recall_at_10 value: 55.759 - type: recall_at_100 value: 80.913 - type: recall_at_1000 value: 93.929 - type: recall_at_3 value: 40.454 - type: recall_at_5 value: 46.439 - task: type: Retrieval dataset: type: mteb/cqadupstack-mathematica name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: 90fceea13679c63fe563ded68f3b6f06e50061de metrics: - type: map_at_1 value: 15.767999999999999 - type: map_at_10 value: 24.811 - type: map_at_100 value: 26.064999999999998 - type: map_at_1000 value: 26.186999999999998 - type: map_at_3 value: 21.736 - type: map_at_5 value: 23.283 - type: mrr_at_1 value: 19.527 - type: mrr_at_10 value: 29.179 - type: mrr_at_100 value: 30.153999999999996 - type: mrr_at_1000 value: 30.215999999999998 - type: mrr_at_3 value: 26.223000000000003 - type: mrr_at_5 value: 27.733999999999998 - type: ndcg_at_1 value: 19.527 - type: ndcg_at_10 value: 30.786 - type: ndcg_at_100 value: 36.644 - type: ndcg_at_1000 value: 39.440999999999995 - type: ndcg_at_3 value: 24.958 - type: ndcg_at_5 value: 27.392 - type: precision_at_1 value: 19.527 - type: precision_at_10 value: 5.995 - type: precision_at_100 value: 1.03 - type: precision_at_1000 value: 0.14100000000000001 - type: precision_at_3 value: 12.520999999999999 - type: precision_at_5 value: 9.129 - type: recall_at_1 value: 15.767999999999999 - type: recall_at_10 value: 44.824000000000005 - type: recall_at_100 value: 70.186 - type: recall_at_1000 value: 89.934 - type: recall_at_3 value: 28.607 - type: recall_at_5 value: 34.836 - task: type: Retrieval dataset: type: mteb/cqadupstack-physics name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: 79531abbd1fb92d06c6d6315a0cbbbf5bb247ea4 metrics: - type: map_at_1 value: 31.952 - type: map_at_10 value: 44.438 - type: map_at_100 value: 45.778 - type: map_at_1000 value: 45.883 - type: map_at_3 value: 41.044000000000004 - type: map_at_5 value: 42.986000000000004 - type: mrr_at_1 value: 39.172000000000004 - type: mrr_at_10 value: 49.76 - type: mrr_at_100 value: 50.583999999999996 - type: mrr_at_1000 value: 50.621 - type: mrr_at_3 value: 47.353 - type: mrr_at_5 value: 48.739 - type: ndcg_at_1 value: 39.172000000000004 - type: ndcg_at_10 value: 50.760000000000005 - type: ndcg_at_100 value: 56.084 - type: ndcg_at_1000 value: 57.865 - type: ndcg_at_3 value: 45.663 - type: ndcg_at_5 value: 48.178 - type: precision_at_1 value: 39.172000000000004 - type: precision_at_10 value: 9.22 - type: precision_at_100 value: 1.387 - type: precision_at_1000 value: 0.17099999999999999 - type: precision_at_3 value: 21.976000000000003 - type: precision_at_5 value: 15.457 - type: recall_at_1 value: 31.952 - type: recall_at_10 value: 63.900999999999996 - type: recall_at_100 value: 85.676 - type: recall_at_1000 value: 97.03699999999999 - type: recall_at_3 value: 49.781 - type: recall_at_5 value: 56.330000000000005 - task: type: Retrieval dataset: type: mteb/cqadupstack-programmers name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: 6184bc1440d2dbc7612be22b50686b8826d22b32 metrics: - type: map_at_1 value: 25.332 - type: map_at_10 value: 36.874 - type: map_at_100 value: 38.340999999999994 - type: map_at_1000 value: 38.452 - type: map_at_3 value: 33.068 - type: map_at_5 value: 35.324 - type: mrr_at_1 value: 30.822 - type: mrr_at_10 value: 41.641 - type: mrr_at_100 value: 42.519 - type: mrr_at_1000 value: 42.573 - type: mrr_at_3 value: 38.413000000000004 - type: mrr_at_5 value: 40.542 - type: ndcg_at_1 value: 30.822 - type: ndcg_at_10 value: 43.414 - type: ndcg_at_100 value: 49.196 - type: ndcg_at_1000 value: 51.237 - type: ndcg_at_3 value: 37.230000000000004 - type: ndcg_at_5 value: 40.405 - type: precision_at_1 value: 30.822 - type: precision_at_10 value: 8.379 - type: precision_at_100 value: 1.315 - type: precision_at_1000 value: 0.168 - type: precision_at_3 value: 18.417 - type: precision_at_5 value: 13.744 - type: recall_at_1 value: 25.332 - type: recall_at_10 value: 57.774 - type: recall_at_100 value: 82.071 - type: recall_at_1000 value: 95.60600000000001 - type: recall_at_3 value: 40.722 - type: recall_at_5 value: 48.754999999999995 - task: type: Retrieval dataset: type: mteb/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: map_at_1 value: 25.91033333333334 - type: map_at_10 value: 36.23225000000001 - type: map_at_100 value: 37.55766666666667 - type: map_at_1000 value: 37.672583333333336 - type: map_at_3 value: 32.95666666666667 - type: map_at_5 value: 34.73375 - type: mrr_at_1 value: 30.634 - type: mrr_at_10 value: 40.19449999999999 - type: mrr_at_100 value: 41.099250000000005 - type: mrr_at_1000 value: 41.15091666666667 - type: mrr_at_3 value: 37.4615 - type: mrr_at_5 value: 39.00216666666667 - type: ndcg_at_1 value: 30.634 - type: ndcg_at_10 value: 42.162166666666664 - type: ndcg_at_100 value: 47.60708333333333 - type: ndcg_at_1000 value: 49.68616666666666 - type: ndcg_at_3 value: 36.60316666666666 - type: ndcg_at_5 value: 39.15616666666668 - type: precision_at_1 value: 30.634 - type: precision_at_10 value: 7.6193333333333335 - type: precision_at_100 value: 1.2198333333333333 - type: precision_at_1000 value: 0.15975000000000003 - type: precision_at_3 value: 17.087 - type: precision_at_5 value: 12.298333333333334 - type: recall_at_1 value: 25.91033333333334 - type: recall_at_10 value: 55.67300000000001 - type: recall_at_100 value: 79.20608333333334 - type: recall_at_1000 value: 93.34866666666667 - type: recall_at_3 value: 40.34858333333333 - type: recall_at_5 value: 46.834083333333325 - task: type: Retrieval dataset: type: mteb/cqadupstack-stats name: MTEB CQADupstackStatsRetrieval config: default split: test revision: 65ac3a16b8e91f9cee4c9828cc7c335575432a2a metrics: - type: map_at_1 value: 25.006 - type: map_at_10 value: 32.177 - type: map_at_100 value: 33.324999999999996 - type: map_at_1000 value: 33.419 - type: map_at_3 value: 29.952 - type: map_at_5 value: 31.095 - type: mrr_at_1 value: 28.066999999999997 - type: mrr_at_10 value: 34.995 - type: mrr_at_100 value: 35.978 - type: mrr_at_1000 value: 36.042 - type: mrr_at_3 value: 33.103 - type: mrr_at_5 value: 34.001 - type: ndcg_at_1 value: 28.066999999999997 - type: ndcg_at_10 value: 36.481 - type: ndcg_at_100 value: 42.022999999999996 - type: ndcg_at_1000 value: 44.377 - type: ndcg_at_3 value: 32.394 - type: ndcg_at_5 value: 34.108 - type: precision_at_1 value: 28.066999999999997 - type: precision_at_10 value: 5.736 - type: precision_at_100 value: 0.9259999999999999 - type: precision_at_1000 value: 0.12 - type: precision_at_3 value: 13.804 - type: precision_at_5 value: 9.508999999999999 - type: recall_at_1 value: 25.006 - type: recall_at_10 value: 46.972 - type: recall_at_100 value: 72.138 - type: recall_at_1000 value: 89.479 - type: recall_at_3 value: 35.793 - type: recall_at_5 value: 39.947 - task: type: Retrieval dataset: type: mteb/cqadupstack-tex name: MTEB CQADupstackTexRetrieval config: default split: test revision: 46989137a86843e03a6195de44b09deda022eec7 metrics: - type: map_at_1 value: 16.07 - type: map_at_10 value: 24.447 - type: map_at_100 value: 25.685999999999996 - type: map_at_1000 value: 25.813999999999997 - type: map_at_3 value: 21.634 - type: map_at_5 value: 23.133 - type: mrr_at_1 value: 19.580000000000002 - type: mrr_at_10 value: 28.127999999999997 - type: mrr_at_100 value: 29.119 - type: mrr_at_1000 value: 29.192 - type: mrr_at_3 value: 25.509999999999998 - type: mrr_at_5 value: 26.878 - type: ndcg_at_1 value: 19.580000000000002 - type: ndcg_at_10 value: 29.804000000000002 - type: ndcg_at_100 value: 35.555 - type: ndcg_at_1000 value: 38.421 - type: ndcg_at_3 value: 24.654999999999998 - type: ndcg_at_5 value: 26.881 - type: precision_at_1 value: 19.580000000000002 - type: precision_at_10 value: 5.736 - type: precision_at_100 value: 1.005 - type: precision_at_1000 value: 0.145 - type: precision_at_3 value: 12.033000000000001 - type: precision_at_5 value: 8.871 - type: recall_at_1 value: 16.07 - type: recall_at_10 value: 42.364000000000004 - type: recall_at_100 value: 68.01899999999999 - type: recall_at_1000 value: 88.122 - type: recall_at_3 value: 27.846 - type: recall_at_5 value: 33.638 - task: type: Retrieval dataset: type: mteb/cqadupstack-unix name: MTEB CQADupstackUnixRetrieval config: default split: test revision: 6c6430d3a6d36f8d2a829195bc5dc94d7e063e53 metrics: - type: map_at_1 value: 26.365 - type: map_at_10 value: 36.591 - type: map_at_100 value: 37.730000000000004 - type: map_at_1000 value: 37.84 - type: map_at_3 value: 33.403 - type: map_at_5 value: 35.272999999999996 - type: mrr_at_1 value: 30.503999999999998 - type: mrr_at_10 value: 39.940999999999995 - type: mrr_at_100 value: 40.818 - type: mrr_at_1000 value: 40.876000000000005 - type: mrr_at_3 value: 37.065 - type: mrr_at_5 value: 38.814 - type: ndcg_at_1 value: 30.503999999999998 - type: ndcg_at_10 value: 42.185 - type: ndcg_at_100 value: 47.416000000000004 - type: ndcg_at_1000 value: 49.705 - type: ndcg_at_3 value: 36.568 - type: ndcg_at_5 value: 39.416000000000004 - type: precision_at_1 value: 30.503999999999998 - type: precision_at_10 value: 7.276000000000001 - type: precision_at_100 value: 1.118 - type: precision_at_1000 value: 0.14300000000000002 - type: precision_at_3 value: 16.729 - type: precision_at_5 value: 12.107999999999999 - type: recall_at_1 value: 26.365 - type: recall_at_10 value: 55.616 - type: recall_at_100 value: 78.129 - type: recall_at_1000 value: 93.95599999999999 - type: recall_at_3 value: 40.686 - type: recall_at_5 value: 47.668 - task: type: Retrieval dataset: type: mteb/cqadupstack-webmasters name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: 160c094312a0e1facb97e55eeddb698c0abe3571 metrics: - type: map_at_1 value: 22.750999999999998 - type: map_at_10 value: 33.446 - type: map_at_100 value: 35.235 - type: map_at_1000 value: 35.478 - type: map_at_3 value: 29.358 - type: map_at_5 value: 31.525 - type: mrr_at_1 value: 27.668 - type: mrr_at_10 value: 37.694 - type: mrr_at_100 value: 38.732 - type: mrr_at_1000 value: 38.779 - type: mrr_at_3 value: 34.223 - type: mrr_at_5 value: 36.08 - type: ndcg_at_1 value: 27.668 - type: ndcg_at_10 value: 40.557 - type: ndcg_at_100 value: 46.605999999999995 - type: ndcg_at_1000 value: 48.917 - type: ndcg_at_3 value: 33.677 - type: ndcg_at_5 value: 36.85 - type: precision_at_1 value: 27.668 - type: precision_at_10 value: 8.3 - type: precision_at_100 value: 1.6260000000000001 - type: precision_at_1000 value: 0.253 - type: precision_at_3 value: 16.008 - type: precision_at_5 value: 12.292 - type: recall_at_1 value: 22.750999999999998 - type: recall_at_10 value: 55.643 - type: recall_at_100 value: 82.151 - type: recall_at_1000 value: 95.963 - type: recall_at_3 value: 36.623 - type: recall_at_5 value: 44.708 - task: type: Retrieval dataset: type: mteb/cqadupstack-wordpress name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: map_at_1 value: 17.288999999999998 - type: map_at_10 value: 25.903 - type: map_at_100 value: 27.071 - type: map_at_1000 value: 27.173000000000002 - type: map_at_3 value: 22.935 - type: map_at_5 value: 24.573 - type: mrr_at_1 value: 18.669 - type: mrr_at_10 value: 27.682000000000002 - type: mrr_at_100 value: 28.691 - type: mrr_at_1000 value: 28.761 - type: mrr_at_3 value: 24.738 - type: mrr_at_5 value: 26.392 - type: ndcg_at_1 value: 18.669 - type: ndcg_at_10 value: 31.335 - type: ndcg_at_100 value: 36.913000000000004 - type: ndcg_at_1000 value: 39.300000000000004 - type: ndcg_at_3 value: 25.423000000000002 - type: ndcg_at_5 value: 28.262999999999998 - type: precision_at_1 value: 18.669 - type: precision_at_10 value: 5.379 - type: precision_at_100 value: 0.876 - type: precision_at_1000 value: 0.11900000000000001 - type: precision_at_3 value: 11.214 - type: precision_at_5 value: 8.466 - type: recall_at_1 value: 17.288999999999998 - type: recall_at_10 value: 46.377 - type: recall_at_100 value: 71.53500000000001 - type: recall_at_1000 value: 88.947 - type: recall_at_3 value: 30.581999999999997 - type: recall_at_5 value: 37.354 - task: type: Retrieval dataset: type: mteb/climate-fever name: MTEB ClimateFEVER config: default split: test revision: 47f2ac6acb640fc46020b02a5b59fdda04d39380 metrics: - type: map_at_1 value: 21.795 - type: map_at_10 value: 37.614999999999995 - type: map_at_100 value: 40.037 - type: map_at_1000 value: 40.184999999999995 - type: map_at_3 value: 32.221 - type: map_at_5 value: 35.154999999999994 - type: mrr_at_1 value: 50.358000000000004 - type: mrr_at_10 value: 62.129 - type: mrr_at_100 value: 62.613 - type: mrr_at_1000 value: 62.62 - type: mrr_at_3 value: 59.272999999999996 - type: mrr_at_5 value: 61.138999999999996 - type: ndcg_at_1 value: 50.358000000000004 - type: ndcg_at_10 value: 48.362 - type: ndcg_at_100 value: 55.932 - type: ndcg_at_1000 value: 58.062999999999995 - type: ndcg_at_3 value: 42.111 - type: ndcg_at_5 value: 44.063 - type: precision_at_1 value: 50.358000000000004 - type: precision_at_10 value: 14.677999999999999 - type: precision_at_100 value: 2.2950000000000004 - type: precision_at_1000 value: 0.271 - type: precision_at_3 value: 31.77 - type: precision_at_5 value: 23.375 - type: recall_at_1 value: 21.795 - type: recall_at_10 value: 53.846000000000004 - type: recall_at_100 value: 78.952 - type: recall_at_1000 value: 90.41900000000001 - type: recall_at_3 value: 37.257 - type: recall_at_5 value: 44.661 - task: type: Retrieval dataset: type: mteb/dbpedia name: MTEB DBPedia config: default split: test revision: c0f706b76e590d620bd6618b3ca8efdd34e2d659 metrics: - type: map_at_1 value: 9.728 - type: map_at_10 value: 22.691 - type: map_at_100 value: 31.734 - type: map_at_1000 value: 33.464 - type: map_at_3 value: 16.273 - type: map_at_5 value: 19.016 - type: mrr_at_1 value: 73.25 - type: mrr_at_10 value: 80.782 - type: mrr_at_100 value: 81.01899999999999 - type: mrr_at_1000 value: 81.021 - type: mrr_at_3 value: 79.583 - type: mrr_at_5 value: 80.146 - type: ndcg_at_1 value: 59.62499999999999 - type: ndcg_at_10 value: 46.304 - type: ndcg_at_100 value: 51.23 - type: ndcg_at_1000 value: 58.048 - type: ndcg_at_3 value: 51.541000000000004 - type: ndcg_at_5 value: 48.635 - type: precision_at_1 value: 73.25 - type: precision_at_10 value: 36.375 - type: precision_at_100 value: 11.53 - type: precision_at_1000 value: 2.23 - type: precision_at_3 value: 55.583000000000006 - type: precision_at_5 value: 47.15 - type: recall_at_1 value: 9.728 - type: recall_at_10 value: 28.793999999999997 - type: recall_at_100 value: 57.885 - type: recall_at_1000 value: 78.759 - type: recall_at_3 value: 17.79 - type: recall_at_5 value: 21.733 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 46.775 - type: f1 value: 41.89794273264891 - task: type: Retrieval dataset: type: mteb/fever name: MTEB FEVER config: default split: test revision: bea83ef9e8fb933d90a2f1d5515737465d613e12 metrics: - type: map_at_1 value: 85.378 - type: map_at_10 value: 91.51 - type: map_at_100 value: 91.666 - type: map_at_1000 value: 91.676 - type: map_at_3 value: 90.757 - type: map_at_5 value: 91.277 - type: mrr_at_1 value: 91.839 - type: mrr_at_10 value: 95.49 - type: mrr_at_100 value: 95.493 - type: mrr_at_1000 value: 95.493 - type: mrr_at_3 value: 95.345 - type: mrr_at_5 value: 95.47200000000001 - type: ndcg_at_1 value: 91.839 - type: ndcg_at_10 value: 93.806 - type: ndcg_at_100 value: 94.255 - type: ndcg_at_1000 value: 94.399 - type: ndcg_at_3 value: 93.027 - type: ndcg_at_5 value: 93.51 - type: precision_at_1 value: 91.839 - type: precision_at_10 value: 10.93 - type: precision_at_100 value: 1.1400000000000001 - type: precision_at_1000 value: 0.117 - type: precision_at_3 value: 34.873 - type: precision_at_5 value: 21.44 - type: recall_at_1 value: 85.378 - type: recall_at_10 value: 96.814 - type: recall_at_100 value: 98.386 - type: recall_at_1000 value: 99.21600000000001 - type: recall_at_3 value: 94.643 - type: recall_at_5 value: 95.976 - task: type: Retrieval dataset: type: mteb/fiqa name: MTEB FiQA2018 config: default split: test revision: 27a168819829fe9bcd655c2df245fb19452e8e06 metrics: - type: map_at_1 value: 32.190000000000005 - type: map_at_10 value: 53.605000000000004 - type: map_at_100 value: 55.550999999999995 - type: map_at_1000 value: 55.665 - type: map_at_3 value: 46.62 - type: map_at_5 value: 50.517999999999994 - type: mrr_at_1 value: 60.34 - type: mrr_at_10 value: 70.775 - type: mrr_at_100 value: 71.238 - type: mrr_at_1000 value: 71.244 - type: mrr_at_3 value: 68.72399999999999 - type: mrr_at_5 value: 69.959 - type: ndcg_at_1 value: 60.34 - type: ndcg_at_10 value: 63.226000000000006 - type: ndcg_at_100 value: 68.60300000000001 - type: ndcg_at_1000 value: 69.901 - type: ndcg_at_3 value: 58.048 - type: ndcg_at_5 value: 59.789 - type: precision_at_1 value: 60.34 - type: precision_at_10 value: 17.130000000000003 - type: precision_at_100 value: 2.29 - type: precision_at_1000 value: 0.256 - type: precision_at_3 value: 38.323 - type: precision_at_5 value: 27.87 - type: recall_at_1 value: 32.190000000000005 - type: recall_at_10 value: 73.041 - type: recall_at_100 value: 91.31 - type: recall_at_1000 value: 98.104 - type: recall_at_3 value: 53.70399999999999 - type: recall_at_5 value: 62.358999999999995 - task: type: Retrieval dataset: type: mteb/hotpotqa name: MTEB HotpotQA config: default split: test revision: ab518f4d6fcca38d87c25209f94beba119d02014 metrics: - type: map_at_1 value: 43.511 - type: map_at_10 value: 58.15 - type: map_at_100 value: 58.95399999999999 - type: map_at_1000 value: 59.018 - type: map_at_3 value: 55.31700000000001 - type: map_at_5 value: 57.04900000000001 - type: mrr_at_1 value: 87.022 - type: mrr_at_10 value: 91.32000000000001 - type: mrr_at_100 value: 91.401 - type: mrr_at_1000 value: 91.403 - type: mrr_at_3 value: 90.77 - type: mrr_at_5 value: 91.156 - type: ndcg_at_1 value: 87.022 - type: ndcg_at_10 value: 68.183 - type: ndcg_at_100 value: 70.781 - type: ndcg_at_1000 value: 72.009 - type: ndcg_at_3 value: 64.334 - type: ndcg_at_5 value: 66.449 - type: precision_at_1 value: 87.022 - type: precision_at_10 value: 13.406 - type: precision_at_100 value: 1.542 - type: precision_at_1000 value: 0.17099999999999999 - type: precision_at_3 value: 39.023 - type: precision_at_5 value: 25.080000000000002 - type: recall_at_1 value: 43.511 - type: recall_at_10 value: 67.02900000000001 - type: recall_at_100 value: 77.11 - type: recall_at_1000 value: 85.294 - type: recall_at_3 value: 58.535000000000004 - type: recall_at_5 value: 62.70099999999999 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 92.0996 - type: ap value: 87.86206089096373 - type: f1 value: 92.07554547510763 - task: type: Retrieval dataset: type: mteb/msmarco name: MTEB MSMARCO config: default split: dev revision: c5a29a104738b98a9e76336939199e264163d4a0 metrics: - type: map_at_1 value: 23.179 - type: map_at_10 value: 35.86 - type: map_at_100 value: 37.025999999999996 - type: map_at_1000 value: 37.068 - type: map_at_3 value: 31.921 - type: map_at_5 value: 34.172000000000004 - type: mrr_at_1 value: 23.926 - type: mrr_at_10 value: 36.525999999999996 - type: mrr_at_100 value: 37.627 - type: mrr_at_1000 value: 37.665 - type: mrr_at_3 value: 32.653 - type: mrr_at_5 value: 34.897 - type: ndcg_at_1 value: 23.910999999999998 - type: ndcg_at_10 value: 42.927 - type: ndcg_at_100 value: 48.464 - type: ndcg_at_1000 value: 49.533 - type: ndcg_at_3 value: 34.910000000000004 - type: ndcg_at_5 value: 38.937 - type: precision_at_1 value: 23.910999999999998 - type: precision_at_10 value: 6.758 - type: precision_at_100 value: 0.9520000000000001 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 14.838000000000001 - type: precision_at_5 value: 10.934000000000001 - type: recall_at_1 value: 23.179 - type: recall_at_10 value: 64.622 - type: recall_at_100 value: 90.135 - type: recall_at_1000 value: 98.301 - type: recall_at_3 value: 42.836999999999996 - type: recall_at_5 value: 52.512 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 96.59598723210215 - type: f1 value: 96.41913500001952 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 82.89557683538533 - type: f1 value: 63.379319722356264 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 78.93745796906524 - type: f1 value: 75.71616541785902 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 81.41223940820443 - type: f1 value: 81.2877893719078 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 35.03682528325662 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 32.942529406124 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 31.459949660460317 - type: mrr value: 32.70509582031616 - task: type: Retrieval dataset: type: mteb/nfcorpus name: MTEB NFCorpus config: default split: test revision: ec0fa4fe99da2ff19ca1214b7966684033a58814 metrics: - type: map_at_1 value: 6.497 - type: map_at_10 value: 13.843 - type: map_at_100 value: 17.713 - type: map_at_1000 value: 19.241 - type: map_at_3 value: 10.096 - type: map_at_5 value: 11.85 - type: mrr_at_1 value: 48.916 - type: mrr_at_10 value: 57.764 - type: mrr_at_100 value: 58.251 - type: mrr_at_1000 value: 58.282999999999994 - type: mrr_at_3 value: 55.623999999999995 - type: mrr_at_5 value: 57.018 - type: ndcg_at_1 value: 46.594 - type: ndcg_at_10 value: 36.945 - type: ndcg_at_100 value: 34.06 - type: ndcg_at_1000 value: 43.05 - type: ndcg_at_3 value: 41.738 - type: ndcg_at_5 value: 39.330999999999996 - type: precision_at_1 value: 48.916 - type: precision_at_10 value: 27.43 - type: precision_at_100 value: 8.616 - type: precision_at_1000 value: 2.155 - type: precision_at_3 value: 39.112 - type: precision_at_5 value: 33.808 - type: recall_at_1 value: 6.497 - type: recall_at_10 value: 18.163 - type: recall_at_100 value: 34.566 - type: recall_at_1000 value: 67.15 - type: recall_at_3 value: 11.100999999999999 - type: recall_at_5 value: 14.205000000000002 - task: type: Retrieval dataset: type: mteb/nq name: MTEB NQ config: default split: test revision: b774495ed302d8c44a3a7ea25c90dbce03968f31 metrics: - type: map_at_1 value: 31.916 - type: map_at_10 value: 48.123 - type: map_at_100 value: 49.103 - type: map_at_1000 value: 49.131 - type: map_at_3 value: 43.711 - type: map_at_5 value: 46.323 - type: mrr_at_1 value: 36.181999999999995 - type: mrr_at_10 value: 50.617999999999995 - type: mrr_at_100 value: 51.329 - type: mrr_at_1000 value: 51.348000000000006 - type: mrr_at_3 value: 47.010999999999996 - type: mrr_at_5 value: 49.175000000000004 - type: ndcg_at_1 value: 36.181999999999995 - type: ndcg_at_10 value: 56.077999999999996 - type: ndcg_at_100 value: 60.037 - type: ndcg_at_1000 value: 60.63499999999999 - type: ndcg_at_3 value: 47.859 - type: ndcg_at_5 value: 52.178999999999995 - type: precision_at_1 value: 36.181999999999995 - type: precision_at_10 value: 9.284 - type: precision_at_100 value: 1.149 - type: precision_at_1000 value: 0.121 - type: precision_at_3 value: 22.006999999999998 - type: precision_at_5 value: 15.695 - type: recall_at_1 value: 31.916 - type: recall_at_10 value: 77.771 - type: recall_at_100 value: 94.602 - type: recall_at_1000 value: 98.967 - type: recall_at_3 value: 56.528 - type: recall_at_5 value: 66.527 - task: type: Retrieval dataset: type: mteb/quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 71.486 - type: map_at_10 value: 85.978 - type: map_at_100 value: 86.587 - type: map_at_1000 value: 86.598 - type: map_at_3 value: 83.04899999999999 - type: map_at_5 value: 84.857 - type: mrr_at_1 value: 82.32000000000001 - type: mrr_at_10 value: 88.64 - type: mrr_at_100 value: 88.702 - type: mrr_at_1000 value: 88.702 - type: mrr_at_3 value: 87.735 - type: mrr_at_5 value: 88.36 - type: ndcg_at_1 value: 82.34 - type: ndcg_at_10 value: 89.67 - type: ndcg_at_100 value: 90.642 - type: ndcg_at_1000 value: 90.688 - type: ndcg_at_3 value: 86.932 - type: ndcg_at_5 value: 88.408 - type: precision_at_1 value: 82.34 - type: precision_at_10 value: 13.675999999999998 - type: precision_at_100 value: 1.544 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 38.24 - type: precision_at_5 value: 25.068 - type: recall_at_1 value: 71.486 - type: recall_at_10 value: 96.844 - type: recall_at_100 value: 99.843 - type: recall_at_1000 value: 99.996 - type: recall_at_3 value: 88.92099999999999 - type: recall_at_5 value: 93.215 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 59.75758437908334 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 68.03497914092789 - task: type: Retrieval dataset: type: mteb/scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 5.808 - type: map_at_10 value: 16.059 - type: map_at_100 value: 19.048000000000002 - type: map_at_1000 value: 19.43 - type: map_at_3 value: 10.953 - type: map_at_5 value: 13.363 - type: mrr_at_1 value: 28.7 - type: mrr_at_10 value: 42.436 - type: mrr_at_100 value: 43.599 - type: mrr_at_1000 value: 43.62 - type: mrr_at_3 value: 38.45 - type: mrr_at_5 value: 40.89 - type: ndcg_at_1 value: 28.7 - type: ndcg_at_10 value: 26.346000000000004 - type: ndcg_at_100 value: 36.758 - type: ndcg_at_1000 value: 42.113 - type: ndcg_at_3 value: 24.254 - type: ndcg_at_5 value: 21.506 - type: precision_at_1 value: 28.7 - type: precision_at_10 value: 13.969999999999999 - type: precision_at_100 value: 2.881 - type: precision_at_1000 value: 0.414 - type: precision_at_3 value: 22.933 - type: precision_at_5 value: 19.220000000000002 - type: recall_at_1 value: 5.808 - type: recall_at_10 value: 28.310000000000002 - type: recall_at_100 value: 58.475 - type: recall_at_1000 value: 84.072 - type: recall_at_3 value: 13.957 - type: recall_at_5 value: 19.515 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 82.39274129958557 - type: cos_sim_spearman value: 79.78021235170053 - type: euclidean_pearson value: 79.35335401300166 - type: euclidean_spearman value: 79.7271870968275 - type: manhattan_pearson value: 79.35256263340601 - type: manhattan_spearman value: 79.76036386976321 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 83.99130429246708 - type: cos_sim_spearman value: 73.88322811171203 - type: euclidean_pearson value: 80.7569419170376 - type: euclidean_spearman value: 73.82542155409597 - type: manhattan_pearson value: 80.79468183847625 - type: manhattan_spearman value: 73.87027144047784 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 84.88548789489907 - type: cos_sim_spearman value: 85.07535893847255 - type: euclidean_pearson value: 84.6637222061494 - type: euclidean_spearman value: 85.14200626702456 - type: manhattan_pearson value: 84.75327892344734 - type: manhattan_spearman value: 85.24406181838596 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 82.88140039325008 - type: cos_sim_spearman value: 79.61211268112362 - type: euclidean_pearson value: 81.29639728816458 - type: euclidean_spearman value: 79.51284578041442 - type: manhattan_pearson value: 81.3381797137111 - type: manhattan_spearman value: 79.55683684039808 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 85.16716737270485 - type: cos_sim_spearman value: 86.14823841857738 - type: euclidean_pearson value: 85.36325733440725 - type: euclidean_spearman value: 86.04919691402029 - type: manhattan_pearson value: 85.3147511385052 - type: manhattan_spearman value: 86.00676205857764 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 80.34266645861588 - type: cos_sim_spearman value: 81.59914035005882 - type: euclidean_pearson value: 81.15053076245988 - type: euclidean_spearman value: 81.52776915798489 - type: manhattan_pearson value: 81.1819647418673 - type: manhattan_spearman value: 81.57479527353556 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 89.38263326821439 - type: cos_sim_spearman value: 89.10946308202642 - type: euclidean_pearson value: 88.87831312540068 - type: euclidean_spearman value: 89.03615865973664 - type: manhattan_pearson value: 88.79835539970384 - type: manhattan_spearman value: 88.9766156339753 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_pearson value: 70.1574915581685 - type: cos_sim_spearman value: 70.59144980004054 - type: euclidean_pearson value: 71.43246306918755 - type: euclidean_spearman value: 70.5544189562984 - type: manhattan_pearson value: 71.4071414609503 - type: manhattan_spearman value: 70.31799126163712 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 83.36215796635351 - type: cos_sim_spearman value: 83.07276756467208 - type: euclidean_pearson value: 83.06690453635584 - type: euclidean_spearman value: 82.9635366303289 - type: manhattan_pearson value: 83.04994049700815 - type: manhattan_spearman value: 82.98120125356036 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 86.92530011616722 - type: mrr value: 96.21826793395421 - task: type: Retrieval dataset: type: mteb/scifact name: MTEB SciFact config: default split: test revision: 0228b52cf27578f30900b9e5271d331663a030d7 metrics: - type: map_at_1 value: 65.75 - type: map_at_10 value: 77.701 - type: map_at_100 value: 78.005 - type: map_at_1000 value: 78.006 - type: map_at_3 value: 75.48 - type: map_at_5 value: 76.927 - type: mrr_at_1 value: 68.333 - type: mrr_at_10 value: 78.511 - type: mrr_at_100 value: 78.704 - type: mrr_at_1000 value: 78.704 - type: mrr_at_3 value: 77 - type: mrr_at_5 value: 78.083 - type: ndcg_at_1 value: 68.333 - type: ndcg_at_10 value: 82.42699999999999 - type: ndcg_at_100 value: 83.486 - type: ndcg_at_1000 value: 83.511 - type: ndcg_at_3 value: 78.96300000000001 - type: ndcg_at_5 value: 81.028 - type: precision_at_1 value: 68.333 - type: precision_at_10 value: 10.667 - type: precision_at_100 value: 1.127 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 31.333 - type: precision_at_5 value: 20.133000000000003 - type: recall_at_1 value: 65.75 - type: recall_at_10 value: 95.578 - type: recall_at_100 value: 99.833 - type: recall_at_1000 value: 100 - type: recall_at_3 value: 86.506 - type: recall_at_5 value: 91.75 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.75247524752476 - type: cos_sim_ap value: 94.16065078045173 - type: cos_sim_f1 value: 87.22986247544205 - type: cos_sim_precision value: 85.71428571428571 - type: cos_sim_recall value: 88.8 - type: dot_accuracy value: 99.74554455445545 - type: dot_ap value: 93.90633887037264 - type: dot_f1 value: 86.9873417721519 - type: dot_precision value: 88.1025641025641 - type: dot_recall value: 85.9 - type: euclidean_accuracy value: 99.75247524752476 - type: euclidean_ap value: 94.17466319018055 - type: euclidean_f1 value: 87.3405299313052 - type: euclidean_precision value: 85.74181117533719 - type: euclidean_recall value: 89 - type: manhattan_accuracy value: 99.75445544554455 - type: manhattan_ap value: 94.27688371923577 - type: manhattan_f1 value: 87.74002954209749 - type: manhattan_precision value: 86.42095053346266 - type: manhattan_recall value: 89.1 - type: max_accuracy value: 99.75445544554455 - type: max_ap value: 94.27688371923577 - type: max_f1 value: 87.74002954209749 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 71.26500637517056 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 39.17507906280528 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 52.4848744828509 - type: mrr value: 53.33678168236992 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.599864323827887 - type: cos_sim_spearman value: 30.91116204665598 - type: dot_pearson value: 30.82637894269936 - type: dot_spearman value: 30.957573868416066 - task: type: Retrieval dataset: type: mteb/trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.23600000000000002 - type: map_at_10 value: 1.892 - type: map_at_100 value: 11.586 - type: map_at_1000 value: 27.761999999999997 - type: map_at_3 value: 0.653 - type: map_at_5 value: 1.028 - type: mrr_at_1 value: 88 - type: mrr_at_10 value: 94 - type: mrr_at_100 value: 94 - type: mrr_at_1000 value: 94 - type: mrr_at_3 value: 94 - type: mrr_at_5 value: 94 - type: ndcg_at_1 value: 82 - type: ndcg_at_10 value: 77.48899999999999 - type: ndcg_at_100 value: 60.141 - type: ndcg_at_1000 value: 54.228 - type: ndcg_at_3 value: 82.358 - type: ndcg_at_5 value: 80.449 - type: precision_at_1 value: 88 - type: precision_at_10 value: 82.19999999999999 - type: precision_at_100 value: 61.760000000000005 - type: precision_at_1000 value: 23.684 - type: precision_at_3 value: 88 - type: precision_at_5 value: 85.6 - type: recall_at_1 value: 0.23600000000000002 - type: recall_at_10 value: 2.117 - type: recall_at_100 value: 14.985000000000001 - type: recall_at_1000 value: 51.107 - type: recall_at_3 value: 0.688 - type: recall_at_5 value: 1.1039999999999999 - task: type: Retrieval dataset: type: mteb/touche2020 name: MTEB Touche2020 config: default split: test revision: a34f9a33db75fa0cbb21bb5cfc3dae8dc8bec93f metrics: - type: map_at_1 value: 2.3040000000000003 - type: map_at_10 value: 9.025 - type: map_at_100 value: 15.312999999999999 - type: map_at_1000 value: 16.954 - type: map_at_3 value: 4.981 - type: map_at_5 value: 6.32 - type: mrr_at_1 value: 24.490000000000002 - type: mrr_at_10 value: 39.835 - type: mrr_at_100 value: 40.8 - type: mrr_at_1000 value: 40.8 - type: mrr_at_3 value: 35.034 - type: mrr_at_5 value: 37.687 - type: ndcg_at_1 value: 22.448999999999998 - type: ndcg_at_10 value: 22.545 - type: ndcg_at_100 value: 35.931999999999995 - type: ndcg_at_1000 value: 47.665 - type: ndcg_at_3 value: 23.311 - type: ndcg_at_5 value: 22.421 - type: precision_at_1 value: 24.490000000000002 - type: precision_at_10 value: 20.408 - type: precision_at_100 value: 7.815999999999999 - type: precision_at_1000 value: 1.553 - type: precision_at_3 value: 25.169999999999998 - type: precision_at_5 value: 23.265 - type: recall_at_1 value: 2.3040000000000003 - type: recall_at_10 value: 15.693999999999999 - type: recall_at_100 value: 48.917 - type: recall_at_1000 value: 84.964 - type: recall_at_3 value: 6.026 - type: recall_at_5 value: 9.066 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 82.6074 - type: ap value: 23.187467098602013 - type: f1 value: 65.36829506379657 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 63.16355404640635 - type: f1 value: 63.534725639863346 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 50.91004094411276 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 86.55301901412649 - type: cos_sim_ap value: 75.25312618556728 - type: cos_sim_f1 value: 68.76561719140429 - type: cos_sim_precision value: 65.3061224489796 - type: cos_sim_recall value: 72.61213720316623 - type: dot_accuracy value: 86.29671574178936 - type: dot_ap value: 75.11910195501207 - type: dot_f1 value: 68.44048376830045 - type: dot_precision value: 66.12546125461255 - type: dot_recall value: 70.92348284960423 - type: euclidean_accuracy value: 86.5828217202122 - type: euclidean_ap value: 75.22986344900924 - type: euclidean_f1 value: 68.81267797449549 - type: euclidean_precision value: 64.8238861674831 - type: euclidean_recall value: 73.3245382585752 - type: manhattan_accuracy value: 86.61262442629791 - type: manhattan_ap value: 75.24401608557328 - type: manhattan_f1 value: 68.80473982483257 - type: manhattan_precision value: 67.21187720181177 - type: manhattan_recall value: 70.47493403693932 - type: max_accuracy value: 86.61262442629791 - type: max_ap value: 75.25312618556728 - type: max_f1 value: 68.81267797449549 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 88.10688089416696 - type: cos_sim_ap value: 84.17862178779863 - type: cos_sim_f1 value: 76.17305208781748 - type: cos_sim_precision value: 71.31246641590543 - type: cos_sim_recall value: 81.74468740375731 - type: dot_accuracy value: 88.1844995536927 - type: dot_ap value: 84.33816725235876 - type: dot_f1 value: 76.43554032918746 - type: dot_precision value: 74.01557767200346 - type: dot_recall value: 79.0190945488143 - type: euclidean_accuracy value: 88.07001203089223 - type: euclidean_ap value: 84.12267000814985 - type: euclidean_f1 value: 76.12232600180778 - type: euclidean_precision value: 74.50604541433205 - type: euclidean_recall value: 77.81028641823221 - type: manhattan_accuracy value: 88.06419063142779 - type: manhattan_ap value: 84.11648917164187 - type: manhattan_f1 value: 76.20579953925474 - type: manhattan_precision value: 72.56772755762935 - type: manhattan_recall value: 80.22790267939637 - type: max_accuracy value: 88.1844995536927 - type: max_ap value: 84.33816725235876 - type: max_f1 value: 76.43554032918746 --- <!-- **English** | 中文 --> # gte-large-en-v1.5 We introduce series, upgraded embeddings that support the context length of up to **8192**, while further enhancing model performance. The models are built upon the encoder backbone (BERT + RoPE + GLU). The series achieve state-of-the-art scores on the MTEB benchmark within the same model size category and prodvide competitive on the LoCo long-context retrieval tests (refer to Evaluation). We also present the []( a SOTA instruction-tuned multi-lingual embedding model that ranked 2nd in MTEB and 1st in C-MTEB. <!-- Provide a longer summary of what this model is. --> - **Developed by:** Institute for Intelligent Computing, Alibaba Group - **Model type:** Text Embeddings - **Paper:** mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval <!-- - **Demo [optional]:** [More Information Needed] --> ### Model list | Models | Language | Model Size | Max Seq. Length | Dimension | MTEB-en | LoCo | |:-----: | :-----: |:-----: |:-----: |:-----: | :-----: | :-----: | |[]( Multiple | 7720 | 32768 | 4096 | 67.34 | 87.57 | |[]( | English | 434 | 8192 | 1024 | 65.39 | 86.71 | |[]( | English | 137 | 8192 | 768 | 64.11 | 87.44 | ## How to Get Started with the Model Use the code below to get started with the model. **It is recommended to install xformers and enable unpadding for acceleration, refer to enable-unpadding-and-xformers.** Use with sentence-transformers: Use with : ## Training Details ### Training Data - Masked language modeling (MLM): - Weak-supervised contrastive pre-training (CPT): GTE pre-training data - Supervised contrastive fine-tuning: GTE fine-tuning data ### Training Procedure To enable the backbone model to support a context length of 8192, we adopted a multi-stage training strategy. The model first undergoes preliminary MLM pre-training on shorter lengths. And then, we resample the data, reducing the proportion of short texts, and continue the MLM pre-training. The entire training process is as follows: - MLM-512: lr 2e-4, mlm_probability 0.3, batch_size 4096, num_steps 300000, rope_base 10000 - MLM-2048: lr 5e-5, mlm_probability 0.3, batch_size 4096, num_steps 30000, rope_base 10000 - MLM-8192: lr 5e-5, mlm_probability 0.3, batch_size 1024, num_steps 30000, rope_base 160000 - CPT: max_len 512, lr 5e-5, batch_size 28672, num_steps 100000 - Fine-tuning: TODO ## Evaluation ### MTEB The results of other models are retrieved from MTEB leaderboard. The gte evaluation setting: , and set ntk scaling factor to 2 (equivalent to rope_base * 2). | Model Name | Param Size (M) | Dimension | Sequence Length | Average (56) | Class. (12) | Clust. (11) | Pair Class. (3) | Reran. (4) | Retr. (15) | STS (10) | Summ. (1) | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | **gte-large-en-v1.5** | 409 | 1024 | 8192 | **65.39** | 77.75 | 47.95 | 84.63 | 58.50 | 57.91 | 81.43 | 30.91 | | mxbai-embed-large-v1 | 335 | 1024 | 512 | 64.68 | 75.64 | 46.71 | 87.2 | 60.11 | 54.39 | 85 | 32.71 | | multilingual-e5-large-instruct | 560 | 1024 | 514 | 64.41 | 77.56 | 47.1 | 86.19 | 58.58 | 52.47 | 84.78 | 30.39 | | bge-large-en-v1.5| 335 | 1024 | 512 | 64.23 | 75.97 | 46.08 | 87.12 | 60.03 | 54.29 | 83.11 | 31.61 | | **gte-base-en-v1.5** | 137 | 768 | 8192 | **64.11** | 77.17 | 46.82 | 85.33 | 57.66 | 54.09 | 81.97 | 31.17 | | bge-base-en-v1.5| 109 | 768 | 512 | 63.55 | 75.53 | 45.77 | 86.55 | 58.86 | 53.25 | 82.4 | 31.07 | ### LoCo | Model Name | Dimension | Sequence Length | Average (5) | QsmsumRetrieval | SummScreenRetrieval | QasperAbastractRetrieval | QasperTitleRetrieval | GovReportRetrieval | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | gte-qwen1.5-7b | 4096 | 32768 | 87.57 | 49.37 | 93.10 | 99.67 | 97.54 | 98.21 | | gte-large-v1.5 |1024 | 8192 | 86.71 | 44.55 | 92.61 | 99.82 | 97.81 | 98.74 | | gte-base-v1.5 | 768 | 8192 | 87.44 | 49.91 | 91.78 | 99.82 | 97.13 | 98.58 | ## Citation If you find our paper or models helpful, please consider citing them as follows:",
27
- "model_explanation_gemini": "Generates English sentence embeddings for tasks like classification, retrieval, clustering, and similarity scoring, achieving strong performance on MTEB benchmarks.\n\nModel Features: \n- **Tasks**: Sentence embeddings, classification, retrieval, clustering, reranking, STS (semantic textual similarity). \n- **Benchmarks**: Evaluated on MTEB (Massive Text Embedding Benchmark) with metrics like accuracy, F1, MAP, NDCG, and Pearson correlation. \n- **Performance**: High scores on Amazon polarity"
 
 
 
 
 
28
  }
 
24
  "region:us"
25
  ],
26
  "description": "--- datasets: - allenai/c4 library_name: transformers tags: - sentence-transformers - gte - mteb - transformers.js - sentence-similarity license: apache-2.0 language: - en model-index: - name: gte-large-en-v1.5 results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 73.01492537313432 - type: ap value: 35.05341696659522 - type: f1 value: 66.71270310883853 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 93.97189999999999 - type: ap value: 90.5952493948908 - type: f1 value: 93.95848137716877 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 54.196 - type: f1 value: 53.80122334012787 - task: type: Retrieval dataset: type: mteb/arguana name: MTEB ArguAna config: default split: test revision: c22ab2a51041ffd869aaddef7af8d8215647e41a metrics: - type: map_at_1 value: 47.297 - type: map_at_10 value: 64.303 - type: map_at_100 value: 64.541 - type: map_at_1000 value: 64.541 - type: map_at_3 value: 60.728 - type: map_at_5 value: 63.114000000000004 - type: mrr_at_1 value: 48.435 - type: mrr_at_10 value: 64.657 - type: mrr_at_100 value: 64.901 - type: mrr_at_1000 value: 64.901 - type: mrr_at_3 value: 61.06 - type: mrr_at_5 value: 63.514 - type: ndcg_at_1 value: 47.297 - type: ndcg_at_10 value: 72.107 - type: ndcg_at_100 value: 72.963 - type: ndcg_at_1000 value: 72.963 - type: ndcg_at_3 value: 65.063 - type: ndcg_at_5 value: 69.352 - type: precision_at_1 value: 47.297 - type: precision_at_10 value: 9.623 - type: precision_at_100 value: 0.996 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 25.865 - type: precision_at_5 value: 17.596 - type: recall_at_1 value: 47.297 - type: recall_at_10 value: 96.23 - type: recall_at_100 value: 99.644 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 77.596 - type: recall_at_5 value: 87.98 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 48.467787861077475 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 43.39198391914257 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 63.12794820591384 - type: mrr value: 75.9331442641692 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 87.85062993863319 - type: cos_sim_spearman value: 85.39049989733459 - type: euclidean_pearson value: 86.00222680278333 - type: euclidean_spearman value: 85.45556162077396 - type: manhattan_pearson value: 85.88769871785621 - type: manhattan_spearman value: 85.11760211290839 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 87.32792207792208 - type: f1 value: 87.29132945999555 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 40.5779328301945 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 37.94425623865118 - task: type: Retrieval dataset: type: mteb/cqadupstack-android name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: f46a197baaae43b4f621051089b82a364682dfeb metrics: - type: map_at_1 value: 32.978 - type: map_at_10 value: 44.45 - type: map_at_100 value: 46.19 - type: map_at_1000 value: 46.303 - type: map_at_3 value: 40.849000000000004 - type: map_at_5 value: 42.55 - type: mrr_at_1 value: 40.629 - type: mrr_at_10 value: 50.848000000000006 - type: mrr_at_100 value: 51.669 - type: mrr_at_1000 value: 51.705 - type: mrr_at_3 value: 47.997 - type: mrr_at_5 value: 49.506 - type: ndcg_at_1 value: 40.629 - type: ndcg_at_10 value: 51.102000000000004 - type: ndcg_at_100 value: 57.159000000000006 - type: ndcg_at_1000 value: 58.669000000000004 - type: ndcg_at_3 value: 45.738 - type: ndcg_at_5 value: 47.632999999999996 - type: precision_at_1 value: 40.629 - type: precision_at_10 value: 9.700000000000001 - type: precision_at_100 value: 1.5970000000000002 - type: precision_at_1000 value: 0.202 - type: precision_at_3 value: 21.698 - type: precision_at_5 value: 15.393 - type: recall_at_1 value: 32.978 - type: recall_at_10 value: 63.711 - type: recall_at_100 value: 88.39399999999999 - type: recall_at_1000 value: 97.513 - type: recall_at_3 value: 48.025 - type: recall_at_5 value: 53.52 - task: type: Retrieval dataset: type: mteb/cqadupstack-english name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: ad9991cb51e31e31e430383c75ffb2885547b5f0 metrics: - type: map_at_1 value: 30.767 - type: map_at_10 value: 42.195 - type: map_at_100 value: 43.541999999999994 - type: map_at_1000 value: 43.673 - type: map_at_3 value: 38.561 - type: map_at_5 value: 40.532000000000004 - type: mrr_at_1 value: 38.79 - type: mrr_at_10 value: 48.021 - type: mrr_at_100 value: 48.735 - type: mrr_at_1000 value: 48.776 - type: mrr_at_3 value: 45.594 - type: mrr_at_5 value: 46.986 - type: ndcg_at_1 value: 38.79 - type: ndcg_at_10 value: 48.468 - type: ndcg_at_100 value: 53.037 - type: ndcg_at_1000 value: 55.001999999999995 - type: ndcg_at_3 value: 43.409 - type: ndcg_at_5 value: 45.654 - type: precision_at_1 value: 38.79 - type: precision_at_10 value: 9.452 - type: precision_at_100 value: 1.518 - type: precision_at_1000 value: 0.201 - type: precision_at_3 value: 21.21 - type: precision_at_5 value: 15.171999999999999 - type: recall_at_1 value: 30.767 - type: recall_at_10 value: 60.118 - type: recall_at_100 value: 79.271 - type: recall_at_1000 value: 91.43299999999999 - type: recall_at_3 value: 45.36 - type: recall_at_5 value: 51.705 - task: type: Retrieval dataset: type: mteb/cqadupstack-gaming name: MTEB CQADupstackGamingRetrieval config: default split: test revision: 4885aa143210c98657558c04aaf3dc47cfb54340 metrics: - type: map_at_1 value: 40.007 - type: map_at_10 value: 53.529 - type: map_at_100 value: 54.602 - type: map_at_1000 value: 54.647 - type: map_at_3 value: 49.951 - type: map_at_5 value: 52.066 - type: mrr_at_1 value: 45.705 - type: mrr_at_10 value: 56.745000000000005 - type: mrr_at_100 value: 57.43899999999999 - type: mrr_at_1000 value: 57.462999999999994 - type: mrr_at_3 value: 54.25299999999999 - type: mrr_at_5 value: 55.842000000000006 - type: ndcg_at_1 value: 45.705 - type: ndcg_at_10 value: 59.809 - type: ndcg_at_100 value: 63.837999999999994 - type: ndcg_at_1000 value: 64.729 - type: ndcg_at_3 value: 53.994 - type: ndcg_at_5 value: 57.028 - type: precision_at_1 value: 45.705 - type: precision_at_10 value: 9.762 - type: precision_at_100 value: 1.275 - type: precision_at_1000 value: 0.13899999999999998 - type: precision_at_3 value: 24.368000000000002 - type: precision_at_5 value: 16.84 - type: recall_at_1 value: 40.007 - type: recall_at_10 value: 75.017 - type: recall_at_100 value: 91.99000000000001 - type: recall_at_1000 value: 98.265 - type: recall_at_3 value: 59.704 - type: recall_at_5 value: 67.109 - task: type: Retrieval dataset: type: mteb/cqadupstack-gis name: MTEB CQADupstackGisRetrieval config: default split: test revision: 5003b3064772da1887988e05400cf3806fe491f2 metrics: - type: map_at_1 value: 26.639000000000003 - type: map_at_10 value: 35.926 - type: map_at_100 value: 37.126999999999995 - type: map_at_1000 value: 37.202 - type: map_at_3 value: 32.989000000000004 - type: map_at_5 value: 34.465 - type: mrr_at_1 value: 28.475 - type: mrr_at_10 value: 37.7 - type: mrr_at_100 value: 38.753 - type: mrr_at_1000 value: 38.807 - type: mrr_at_3 value: 35.066 - type: mrr_at_5 value: 36.512 - type: ndcg_at_1 value: 28.475 - type: ndcg_at_10 value: 41.245 - type: ndcg_at_100 value: 46.814 - type: ndcg_at_1000 value: 48.571 - type: ndcg_at_3 value: 35.528999999999996 - type: ndcg_at_5 value: 38.066 - type: precision_at_1 value: 28.475 - type: precision_at_10 value: 6.497 - type: precision_at_100 value: 0.9650000000000001 - type: precision_at_1000 value: 0.11499999999999999 - type: precision_at_3 value: 15.065999999999999 - type: precision_at_5 value: 10.599 - type: recall_at_1 value: 26.639000000000003 - type: recall_at_10 value: 55.759 - type: recall_at_100 value: 80.913 - type: recall_at_1000 value: 93.929 - type: recall_at_3 value: 40.454 - type: recall_at_5 value: 46.439 - task: type: Retrieval dataset: type: mteb/cqadupstack-mathematica name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: 90fceea13679c63fe563ded68f3b6f06e50061de metrics: - type: map_at_1 value: 15.767999999999999 - type: map_at_10 value: 24.811 - type: map_at_100 value: 26.064999999999998 - type: map_at_1000 value: 26.186999999999998 - type: map_at_3 value: 21.736 - type: map_at_5 value: 23.283 - type: mrr_at_1 value: 19.527 - type: mrr_at_10 value: 29.179 - type: mrr_at_100 value: 30.153999999999996 - type: mrr_at_1000 value: 30.215999999999998 - type: mrr_at_3 value: 26.223000000000003 - type: mrr_at_5 value: 27.733999999999998 - type: ndcg_at_1 value: 19.527 - type: ndcg_at_10 value: 30.786 - type: ndcg_at_100 value: 36.644 - type: ndcg_at_1000 value: 39.440999999999995 - type: ndcg_at_3 value: 24.958 - type: ndcg_at_5 value: 27.392 - type: precision_at_1 value: 19.527 - type: precision_at_10 value: 5.995 - type: precision_at_100 value: 1.03 - type: precision_at_1000 value: 0.14100000000000001 - type: precision_at_3 value: 12.520999999999999 - type: precision_at_5 value: 9.129 - type: recall_at_1 value: 15.767999999999999 - type: recall_at_10 value: 44.824000000000005 - type: recall_at_100 value: 70.186 - type: recall_at_1000 value: 89.934 - type: recall_at_3 value: 28.607 - type: recall_at_5 value: 34.836 - task: type: Retrieval dataset: type: mteb/cqadupstack-physics name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: 79531abbd1fb92d06c6d6315a0cbbbf5bb247ea4 metrics: - type: map_at_1 value: 31.952 - type: map_at_10 value: 44.438 - type: map_at_100 value: 45.778 - type: map_at_1000 value: 45.883 - type: map_at_3 value: 41.044000000000004 - type: map_at_5 value: 42.986000000000004 - type: mrr_at_1 value: 39.172000000000004 - type: mrr_at_10 value: 49.76 - type: mrr_at_100 value: 50.583999999999996 - type: mrr_at_1000 value: 50.621 - type: mrr_at_3 value: 47.353 - type: mrr_at_5 value: 48.739 - type: ndcg_at_1 value: 39.172000000000004 - type: ndcg_at_10 value: 50.760000000000005 - type: ndcg_at_100 value: 56.084 - type: ndcg_at_1000 value: 57.865 - type: ndcg_at_3 value: 45.663 - type: ndcg_at_5 value: 48.178 - type: precision_at_1 value: 39.172000000000004 - type: precision_at_10 value: 9.22 - type: precision_at_100 value: 1.387 - type: precision_at_1000 value: 0.17099999999999999 - type: precision_at_3 value: 21.976000000000003 - type: precision_at_5 value: 15.457 - type: recall_at_1 value: 31.952 - type: recall_at_10 value: 63.900999999999996 - type: recall_at_100 value: 85.676 - type: recall_at_1000 value: 97.03699999999999 - type: recall_at_3 value: 49.781 - type: recall_at_5 value: 56.330000000000005 - task: type: Retrieval dataset: type: mteb/cqadupstack-programmers name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: 6184bc1440d2dbc7612be22b50686b8826d22b32 metrics: - type: map_at_1 value: 25.332 - type: map_at_10 value: 36.874 - type: map_at_100 value: 38.340999999999994 - type: map_at_1000 value: 38.452 - type: map_at_3 value: 33.068 - type: map_at_5 value: 35.324 - type: mrr_at_1 value: 30.822 - type: mrr_at_10 value: 41.641 - type: mrr_at_100 value: 42.519 - type: mrr_at_1000 value: 42.573 - type: mrr_at_3 value: 38.413000000000004 - type: mrr_at_5 value: 40.542 - type: ndcg_at_1 value: 30.822 - type: ndcg_at_10 value: 43.414 - type: ndcg_at_100 value: 49.196 - type: ndcg_at_1000 value: 51.237 - type: ndcg_at_3 value: 37.230000000000004 - type: ndcg_at_5 value: 40.405 - type: precision_at_1 value: 30.822 - type: precision_at_10 value: 8.379 - type: precision_at_100 value: 1.315 - type: precision_at_1000 value: 0.168 - type: precision_at_3 value: 18.417 - type: precision_at_5 value: 13.744 - type: recall_at_1 value: 25.332 - type: recall_at_10 value: 57.774 - type: recall_at_100 value: 82.071 - type: recall_at_1000 value: 95.60600000000001 - type: recall_at_3 value: 40.722 - type: recall_at_5 value: 48.754999999999995 - task: type: Retrieval dataset: type: mteb/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: map_at_1 value: 25.91033333333334 - type: map_at_10 value: 36.23225000000001 - type: map_at_100 value: 37.55766666666667 - type: map_at_1000 value: 37.672583333333336 - type: map_at_3 value: 32.95666666666667 - type: map_at_5 value: 34.73375 - type: mrr_at_1 value: 30.634 - type: mrr_at_10 value: 40.19449999999999 - type: mrr_at_100 value: 41.099250000000005 - type: mrr_at_1000 value: 41.15091666666667 - type: mrr_at_3 value: 37.4615 - type: mrr_at_5 value: 39.00216666666667 - type: ndcg_at_1 value: 30.634 - type: ndcg_at_10 value: 42.162166666666664 - type: ndcg_at_100 value: 47.60708333333333 - type: ndcg_at_1000 value: 49.68616666666666 - type: ndcg_at_3 value: 36.60316666666666 - type: ndcg_at_5 value: 39.15616666666668 - type: precision_at_1 value: 30.634 - type: precision_at_10 value: 7.6193333333333335 - type: precision_at_100 value: 1.2198333333333333 - type: precision_at_1000 value: 0.15975000000000003 - type: precision_at_3 value: 17.087 - type: precision_at_5 value: 12.298333333333334 - type: recall_at_1 value: 25.91033333333334 - type: recall_at_10 value: 55.67300000000001 - type: recall_at_100 value: 79.20608333333334 - type: recall_at_1000 value: 93.34866666666667 - type: recall_at_3 value: 40.34858333333333 - type: recall_at_5 value: 46.834083333333325 - task: type: Retrieval dataset: type: mteb/cqadupstack-stats name: MTEB CQADupstackStatsRetrieval config: default split: test revision: 65ac3a16b8e91f9cee4c9828cc7c335575432a2a metrics: - type: map_at_1 value: 25.006 - type: map_at_10 value: 32.177 - type: map_at_100 value: 33.324999999999996 - type: map_at_1000 value: 33.419 - type: map_at_3 value: 29.952 - type: map_at_5 value: 31.095 - type: mrr_at_1 value: 28.066999999999997 - type: mrr_at_10 value: 34.995 - type: mrr_at_100 value: 35.978 - type: mrr_at_1000 value: 36.042 - type: mrr_at_3 value: 33.103 - type: mrr_at_5 value: 34.001 - type: ndcg_at_1 value: 28.066999999999997 - type: ndcg_at_10 value: 36.481 - type: ndcg_at_100 value: 42.022999999999996 - type: ndcg_at_1000 value: 44.377 - type: ndcg_at_3 value: 32.394 - type: ndcg_at_5 value: 34.108 - type: precision_at_1 value: 28.066999999999997 - type: precision_at_10 value: 5.736 - type: precision_at_100 value: 0.9259999999999999 - type: precision_at_1000 value: 0.12 - type: precision_at_3 value: 13.804 - type: precision_at_5 value: 9.508999999999999 - type: recall_at_1 value: 25.006 - type: recall_at_10 value: 46.972 - type: recall_at_100 value: 72.138 - type: recall_at_1000 value: 89.479 - type: recall_at_3 value: 35.793 - type: recall_at_5 value: 39.947 - task: type: Retrieval dataset: type: mteb/cqadupstack-tex name: MTEB CQADupstackTexRetrieval config: default split: test revision: 46989137a86843e03a6195de44b09deda022eec7 metrics: - type: map_at_1 value: 16.07 - type: map_at_10 value: 24.447 - type: map_at_100 value: 25.685999999999996 - type: map_at_1000 value: 25.813999999999997 - type: map_at_3 value: 21.634 - type: map_at_5 value: 23.133 - type: mrr_at_1 value: 19.580000000000002 - type: mrr_at_10 value: 28.127999999999997 - type: mrr_at_100 value: 29.119 - type: mrr_at_1000 value: 29.192 - type: mrr_at_3 value: 25.509999999999998 - type: mrr_at_5 value: 26.878 - type: ndcg_at_1 value: 19.580000000000002 - type: ndcg_at_10 value: 29.804000000000002 - type: ndcg_at_100 value: 35.555 - type: ndcg_at_1000 value: 38.421 - type: ndcg_at_3 value: 24.654999999999998 - type: ndcg_at_5 value: 26.881 - type: precision_at_1 value: 19.580000000000002 - type: precision_at_10 value: 5.736 - type: precision_at_100 value: 1.005 - type: precision_at_1000 value: 0.145 - type: precision_at_3 value: 12.033000000000001 - type: precision_at_5 value: 8.871 - type: recall_at_1 value: 16.07 - type: recall_at_10 value: 42.364000000000004 - type: recall_at_100 value: 68.01899999999999 - type: recall_at_1000 value: 88.122 - type: recall_at_3 value: 27.846 - type: recall_at_5 value: 33.638 - task: type: Retrieval dataset: type: mteb/cqadupstack-unix name: MTEB CQADupstackUnixRetrieval config: default split: test revision: 6c6430d3a6d36f8d2a829195bc5dc94d7e063e53 metrics: - type: map_at_1 value: 26.365 - type: map_at_10 value: 36.591 - type: map_at_100 value: 37.730000000000004 - type: map_at_1000 value: 37.84 - type: map_at_3 value: 33.403 - type: map_at_5 value: 35.272999999999996 - type: mrr_at_1 value: 30.503999999999998 - type: mrr_at_10 value: 39.940999999999995 - type: mrr_at_100 value: 40.818 - type: mrr_at_1000 value: 40.876000000000005 - type: mrr_at_3 value: 37.065 - type: mrr_at_5 value: 38.814 - type: ndcg_at_1 value: 30.503999999999998 - type: ndcg_at_10 value: 42.185 - type: ndcg_at_100 value: 47.416000000000004 - type: ndcg_at_1000 value: 49.705 - type: ndcg_at_3 value: 36.568 - type: ndcg_at_5 value: 39.416000000000004 - type: precision_at_1 value: 30.503999999999998 - type: precision_at_10 value: 7.276000000000001 - type: precision_at_100 value: 1.118 - type: precision_at_1000 value: 0.14300000000000002 - type: precision_at_3 value: 16.729 - type: precision_at_5 value: 12.107999999999999 - type: recall_at_1 value: 26.365 - type: recall_at_10 value: 55.616 - type: recall_at_100 value: 78.129 - type: recall_at_1000 value: 93.95599999999999 - type: recall_at_3 value: 40.686 - type: recall_at_5 value: 47.668 - task: type: Retrieval dataset: type: mteb/cqadupstack-webmasters name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: 160c094312a0e1facb97e55eeddb698c0abe3571 metrics: - type: map_at_1 value: 22.750999999999998 - type: map_at_10 value: 33.446 - type: map_at_100 value: 35.235 - type: map_at_1000 value: 35.478 - type: map_at_3 value: 29.358 - type: map_at_5 value: 31.525 - type: mrr_at_1 value: 27.668 - type: mrr_at_10 value: 37.694 - type: mrr_at_100 value: 38.732 - type: mrr_at_1000 value: 38.779 - type: mrr_at_3 value: 34.223 - type: mrr_at_5 value: 36.08 - type: ndcg_at_1 value: 27.668 - type: ndcg_at_10 value: 40.557 - type: ndcg_at_100 value: 46.605999999999995 - type: ndcg_at_1000 value: 48.917 - type: ndcg_at_3 value: 33.677 - type: ndcg_at_5 value: 36.85 - type: precision_at_1 value: 27.668 - type: precision_at_10 value: 8.3 - type: precision_at_100 value: 1.6260000000000001 - type: precision_at_1000 value: 0.253 - type: precision_at_3 value: 16.008 - type: precision_at_5 value: 12.292 - type: recall_at_1 value: 22.750999999999998 - type: recall_at_10 value: 55.643 - type: recall_at_100 value: 82.151 - type: recall_at_1000 value: 95.963 - type: recall_at_3 value: 36.623 - type: recall_at_5 value: 44.708 - task: type: Retrieval dataset: type: mteb/cqadupstack-wordpress name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: map_at_1 value: 17.288999999999998 - type: map_at_10 value: 25.903 - type: map_at_100 value: 27.071 - type: map_at_1000 value: 27.173000000000002 - type: map_at_3 value: 22.935 - type: map_at_5 value: 24.573 - type: mrr_at_1 value: 18.669 - type: mrr_at_10 value: 27.682000000000002 - type: mrr_at_100 value: 28.691 - type: mrr_at_1000 value: 28.761 - type: mrr_at_3 value: 24.738 - type: mrr_at_5 value: 26.392 - type: ndcg_at_1 value: 18.669 - type: ndcg_at_10 value: 31.335 - type: ndcg_at_100 value: 36.913000000000004 - type: ndcg_at_1000 value: 39.300000000000004 - type: ndcg_at_3 value: 25.423000000000002 - type: ndcg_at_5 value: 28.262999999999998 - type: precision_at_1 value: 18.669 - type: precision_at_10 value: 5.379 - type: precision_at_100 value: 0.876 - type: precision_at_1000 value: 0.11900000000000001 - type: precision_at_3 value: 11.214 - type: precision_at_5 value: 8.466 - type: recall_at_1 value: 17.288999999999998 - type: recall_at_10 value: 46.377 - type: recall_at_100 value: 71.53500000000001 - type: recall_at_1000 value: 88.947 - type: recall_at_3 value: 30.581999999999997 - type: recall_at_5 value: 37.354 - task: type: Retrieval dataset: type: mteb/climate-fever name: MTEB ClimateFEVER config: default split: test revision: 47f2ac6acb640fc46020b02a5b59fdda04d39380 metrics: - type: map_at_1 value: 21.795 - type: map_at_10 value: 37.614999999999995 - type: map_at_100 value: 40.037 - type: map_at_1000 value: 40.184999999999995 - type: map_at_3 value: 32.221 - type: map_at_5 value: 35.154999999999994 - type: mrr_at_1 value: 50.358000000000004 - type: mrr_at_10 value: 62.129 - type: mrr_at_100 value: 62.613 - type: mrr_at_1000 value: 62.62 - type: mrr_at_3 value: 59.272999999999996 - type: mrr_at_5 value: 61.138999999999996 - type: ndcg_at_1 value: 50.358000000000004 - type: ndcg_at_10 value: 48.362 - type: ndcg_at_100 value: 55.932 - type: ndcg_at_1000 value: 58.062999999999995 - type: ndcg_at_3 value: 42.111 - type: ndcg_at_5 value: 44.063 - type: precision_at_1 value: 50.358000000000004 - type: precision_at_10 value: 14.677999999999999 - type: precision_at_100 value: 2.2950000000000004 - type: precision_at_1000 value: 0.271 - type: precision_at_3 value: 31.77 - type: precision_at_5 value: 23.375 - type: recall_at_1 value: 21.795 - type: recall_at_10 value: 53.846000000000004 - type: recall_at_100 value: 78.952 - type: recall_at_1000 value: 90.41900000000001 - type: recall_at_3 value: 37.257 - type: recall_at_5 value: 44.661 - task: type: Retrieval dataset: type: mteb/dbpedia name: MTEB DBPedia config: default split: test revision: c0f706b76e590d620bd6618b3ca8efdd34e2d659 metrics: - type: map_at_1 value: 9.728 - type: map_at_10 value: 22.691 - type: map_at_100 value: 31.734 - type: map_at_1000 value: 33.464 - type: map_at_3 value: 16.273 - type: map_at_5 value: 19.016 - type: mrr_at_1 value: 73.25 - type: mrr_at_10 value: 80.782 - type: mrr_at_100 value: 81.01899999999999 - type: mrr_at_1000 value: 81.021 - type: mrr_at_3 value: 79.583 - type: mrr_at_5 value: 80.146 - type: ndcg_at_1 value: 59.62499999999999 - type: ndcg_at_10 value: 46.304 - type: ndcg_at_100 value: 51.23 - type: ndcg_at_1000 value: 58.048 - type: ndcg_at_3 value: 51.541000000000004 - type: ndcg_at_5 value: 48.635 - type: precision_at_1 value: 73.25 - type: precision_at_10 value: 36.375 - type: precision_at_100 value: 11.53 - type: precision_at_1000 value: 2.23 - type: precision_at_3 value: 55.583000000000006 - type: precision_at_5 value: 47.15 - type: recall_at_1 value: 9.728 - type: recall_at_10 value: 28.793999999999997 - type: recall_at_100 value: 57.885 - type: recall_at_1000 value: 78.759 - type: recall_at_3 value: 17.79 - type: recall_at_5 value: 21.733 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 46.775 - type: f1 value: 41.89794273264891 - task: type: Retrieval dataset: type: mteb/fever name: MTEB FEVER config: default split: test revision: bea83ef9e8fb933d90a2f1d5515737465d613e12 metrics: - type: map_at_1 value: 85.378 - type: map_at_10 value: 91.51 - type: map_at_100 value: 91.666 - type: map_at_1000 value: 91.676 - type: map_at_3 value: 90.757 - type: map_at_5 value: 91.277 - type: mrr_at_1 value: 91.839 - type: mrr_at_10 value: 95.49 - type: mrr_at_100 value: 95.493 - type: mrr_at_1000 value: 95.493 - type: mrr_at_3 value: 95.345 - type: mrr_at_5 value: 95.47200000000001 - type: ndcg_at_1 value: 91.839 - type: ndcg_at_10 value: 93.806 - type: ndcg_at_100 value: 94.255 - type: ndcg_at_1000 value: 94.399 - type: ndcg_at_3 value: 93.027 - type: ndcg_at_5 value: 93.51 - type: precision_at_1 value: 91.839 - type: precision_at_10 value: 10.93 - type: precision_at_100 value: 1.1400000000000001 - type: precision_at_1000 value: 0.117 - type: precision_at_3 value: 34.873 - type: precision_at_5 value: 21.44 - type: recall_at_1 value: 85.378 - type: recall_at_10 value: 96.814 - type: recall_at_100 value: 98.386 - type: recall_at_1000 value: 99.21600000000001 - type: recall_at_3 value: 94.643 - type: recall_at_5 value: 95.976 - task: type: Retrieval dataset: type: mteb/fiqa name: MTEB FiQA2018 config: default split: test revision: 27a168819829fe9bcd655c2df245fb19452e8e06 metrics: - type: map_at_1 value: 32.190000000000005 - type: map_at_10 value: 53.605000000000004 - type: map_at_100 value: 55.550999999999995 - type: map_at_1000 value: 55.665 - type: map_at_3 value: 46.62 - type: map_at_5 value: 50.517999999999994 - type: mrr_at_1 value: 60.34 - type: mrr_at_10 value: 70.775 - type: mrr_at_100 value: 71.238 - type: mrr_at_1000 value: 71.244 - type: mrr_at_3 value: 68.72399999999999 - type: mrr_at_5 value: 69.959 - type: ndcg_at_1 value: 60.34 - type: ndcg_at_10 value: 63.226000000000006 - type: ndcg_at_100 value: 68.60300000000001 - type: ndcg_at_1000 value: 69.901 - type: ndcg_at_3 value: 58.048 - type: ndcg_at_5 value: 59.789 - type: precision_at_1 value: 60.34 - type: precision_at_10 value: 17.130000000000003 - type: precision_at_100 value: 2.29 - type: precision_at_1000 value: 0.256 - type: precision_at_3 value: 38.323 - type: precision_at_5 value: 27.87 - type: recall_at_1 value: 32.190000000000005 - type: recall_at_10 value: 73.041 - type: recall_at_100 value: 91.31 - type: recall_at_1000 value: 98.104 - type: recall_at_3 value: 53.70399999999999 - type: recall_at_5 value: 62.358999999999995 - task: type: Retrieval dataset: type: mteb/hotpotqa name: MTEB HotpotQA config: default split: test revision: ab518f4d6fcca38d87c25209f94beba119d02014 metrics: - type: map_at_1 value: 43.511 - type: map_at_10 value: 58.15 - type: map_at_100 value: 58.95399999999999 - type: map_at_1000 value: 59.018 - type: map_at_3 value: 55.31700000000001 - type: map_at_5 value: 57.04900000000001 - type: mrr_at_1 value: 87.022 - type: mrr_at_10 value: 91.32000000000001 - type: mrr_at_100 value: 91.401 - type: mrr_at_1000 value: 91.403 - type: mrr_at_3 value: 90.77 - type: mrr_at_5 value: 91.156 - type: ndcg_at_1 value: 87.022 - type: ndcg_at_10 value: 68.183 - type: ndcg_at_100 value: 70.781 - type: ndcg_at_1000 value: 72.009 - type: ndcg_at_3 value: 64.334 - type: ndcg_at_5 value: 66.449 - type: precision_at_1 value: 87.022 - type: precision_at_10 value: 13.406 - type: precision_at_100 value: 1.542 - type: precision_at_1000 value: 0.17099999999999999 - type: precision_at_3 value: 39.023 - type: precision_at_5 value: 25.080000000000002 - type: recall_at_1 value: 43.511 - type: recall_at_10 value: 67.02900000000001 - type: recall_at_100 value: 77.11 - type: recall_at_1000 value: 85.294 - type: recall_at_3 value: 58.535000000000004 - type: recall_at_5 value: 62.70099999999999 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 92.0996 - type: ap value: 87.86206089096373 - type: f1 value: 92.07554547510763 - task: type: Retrieval dataset: type: mteb/msmarco name: MTEB MSMARCO config: default split: dev revision: c5a29a104738b98a9e76336939199e264163d4a0 metrics: - type: map_at_1 value: 23.179 - type: map_at_10 value: 35.86 - type: map_at_100 value: 37.025999999999996 - type: map_at_1000 value: 37.068 - type: map_at_3 value: 31.921 - type: map_at_5 value: 34.172000000000004 - type: mrr_at_1 value: 23.926 - type: mrr_at_10 value: 36.525999999999996 - type: mrr_at_100 value: 37.627 - type: mrr_at_1000 value: 37.665 - type: mrr_at_3 value: 32.653 - type: mrr_at_5 value: 34.897 - type: ndcg_at_1 value: 23.910999999999998 - type: ndcg_at_10 value: 42.927 - type: ndcg_at_100 value: 48.464 - type: ndcg_at_1000 value: 49.533 - type: ndcg_at_3 value: 34.910000000000004 - type: ndcg_at_5 value: 38.937 - type: precision_at_1 value: 23.910999999999998 - type: precision_at_10 value: 6.758 - type: precision_at_100 value: 0.9520000000000001 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 14.838000000000001 - type: precision_at_5 value: 10.934000000000001 - type: recall_at_1 value: 23.179 - type: recall_at_10 value: 64.622 - type: recall_at_100 value: 90.135 - type: recall_at_1000 value: 98.301 - type: recall_at_3 value: 42.836999999999996 - type: recall_at_5 value: 52.512 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 96.59598723210215 - type: f1 value: 96.41913500001952 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 82.89557683538533 - type: f1 value: 63.379319722356264 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 78.93745796906524 - type: f1 value: 75.71616541785902 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 81.41223940820443 - type: f1 value: 81.2877893719078 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 35.03682528325662 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 32.942529406124 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 31.459949660460317 - type: mrr value: 32.70509582031616 - task: type: Retrieval dataset: type: mteb/nfcorpus name: MTEB NFCorpus config: default split: test revision: ec0fa4fe99da2ff19ca1214b7966684033a58814 metrics: - type: map_at_1 value: 6.497 - type: map_at_10 value: 13.843 - type: map_at_100 value: 17.713 - type: map_at_1000 value: 19.241 - type: map_at_3 value: 10.096 - type: map_at_5 value: 11.85 - type: mrr_at_1 value: 48.916 - type: mrr_at_10 value: 57.764 - type: mrr_at_100 value: 58.251 - type: mrr_at_1000 value: 58.282999999999994 - type: mrr_at_3 value: 55.623999999999995 - type: mrr_at_5 value: 57.018 - type: ndcg_at_1 value: 46.594 - type: ndcg_at_10 value: 36.945 - type: ndcg_at_100 value: 34.06 - type: ndcg_at_1000 value: 43.05 - type: ndcg_at_3 value: 41.738 - type: ndcg_at_5 value: 39.330999999999996 - type: precision_at_1 value: 48.916 - type: precision_at_10 value: 27.43 - type: precision_at_100 value: 8.616 - type: precision_at_1000 value: 2.155 - type: precision_at_3 value: 39.112 - type: precision_at_5 value: 33.808 - type: recall_at_1 value: 6.497 - type: recall_at_10 value: 18.163 - type: recall_at_100 value: 34.566 - type: recall_at_1000 value: 67.15 - type: recall_at_3 value: 11.100999999999999 - type: recall_at_5 value: 14.205000000000002 - task: type: Retrieval dataset: type: mteb/nq name: MTEB NQ config: default split: test revision: b774495ed302d8c44a3a7ea25c90dbce03968f31 metrics: - type: map_at_1 value: 31.916 - type: map_at_10 value: 48.123 - type: map_at_100 value: 49.103 - type: map_at_1000 value: 49.131 - type: map_at_3 value: 43.711 - type: map_at_5 value: 46.323 - type: mrr_at_1 value: 36.181999999999995 - type: mrr_at_10 value: 50.617999999999995 - type: mrr_at_100 value: 51.329 - type: mrr_at_1000 value: 51.348000000000006 - type: mrr_at_3 value: 47.010999999999996 - type: mrr_at_5 value: 49.175000000000004 - type: ndcg_at_1 value: 36.181999999999995 - type: ndcg_at_10 value: 56.077999999999996 - type: ndcg_at_100 value: 60.037 - type: ndcg_at_1000 value: 60.63499999999999 - type: ndcg_at_3 value: 47.859 - type: ndcg_at_5 value: 52.178999999999995 - type: precision_at_1 value: 36.181999999999995 - type: precision_at_10 value: 9.284 - type: precision_at_100 value: 1.149 - type: precision_at_1000 value: 0.121 - type: precision_at_3 value: 22.006999999999998 - type: precision_at_5 value: 15.695 - type: recall_at_1 value: 31.916 - type: recall_at_10 value: 77.771 - type: recall_at_100 value: 94.602 - type: recall_at_1000 value: 98.967 - type: recall_at_3 value: 56.528 - type: recall_at_5 value: 66.527 - task: type: Retrieval dataset: type: mteb/quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 71.486 - type: map_at_10 value: 85.978 - type: map_at_100 value: 86.587 - type: map_at_1000 value: 86.598 - type: map_at_3 value: 83.04899999999999 - type: map_at_5 value: 84.857 - type: mrr_at_1 value: 82.32000000000001 - type: mrr_at_10 value: 88.64 - type: mrr_at_100 value: 88.702 - type: mrr_at_1000 value: 88.702 - type: mrr_at_3 value: 87.735 - type: mrr_at_5 value: 88.36 - type: ndcg_at_1 value: 82.34 - type: ndcg_at_10 value: 89.67 - type: ndcg_at_100 value: 90.642 - type: ndcg_at_1000 value: 90.688 - type: ndcg_at_3 value: 86.932 - type: ndcg_at_5 value: 88.408 - type: precision_at_1 value: 82.34 - type: precision_at_10 value: 13.675999999999998 - type: precision_at_100 value: 1.544 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 38.24 - type: precision_at_5 value: 25.068 - type: recall_at_1 value: 71.486 - type: recall_at_10 value: 96.844 - type: recall_at_100 value: 99.843 - type: recall_at_1000 value: 99.996 - type: recall_at_3 value: 88.92099999999999 - type: recall_at_5 value: 93.215 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 59.75758437908334 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 68.03497914092789 - task: type: Retrieval dataset: type: mteb/scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 5.808 - type: map_at_10 value: 16.059 - type: map_at_100 value: 19.048000000000002 - type: map_at_1000 value: 19.43 - type: map_at_3 value: 10.953 - type: map_at_5 value: 13.363 - type: mrr_at_1 value: 28.7 - type: mrr_at_10 value: 42.436 - type: mrr_at_100 value: 43.599 - type: mrr_at_1000 value: 43.62 - type: mrr_at_3 value: 38.45 - type: mrr_at_5 value: 40.89 - type: ndcg_at_1 value: 28.7 - type: ndcg_at_10 value: 26.346000000000004 - type: ndcg_at_100 value: 36.758 - type: ndcg_at_1000 value: 42.113 - type: ndcg_at_3 value: 24.254 - type: ndcg_at_5 value: 21.506 - type: precision_at_1 value: 28.7 - type: precision_at_10 value: 13.969999999999999 - type: precision_at_100 value: 2.881 - type: precision_at_1000 value: 0.414 - type: precision_at_3 value: 22.933 - type: precision_at_5 value: 19.220000000000002 - type: recall_at_1 value: 5.808 - type: recall_at_10 value: 28.310000000000002 - type: recall_at_100 value: 58.475 - type: recall_at_1000 value: 84.072 - type: recall_at_3 value: 13.957 - type: recall_at_5 value: 19.515 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 82.39274129958557 - type: cos_sim_spearman value: 79.78021235170053 - type: euclidean_pearson value: 79.35335401300166 - type: euclidean_spearman value: 79.7271870968275 - type: manhattan_pearson value: 79.35256263340601 - type: manhattan_spearman value: 79.76036386976321 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 83.99130429246708 - type: cos_sim_spearman value: 73.88322811171203 - type: euclidean_pearson value: 80.7569419170376 - type: euclidean_spearman value: 73.82542155409597 - type: manhattan_pearson value: 80.79468183847625 - type: manhattan_spearman value: 73.87027144047784 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 84.88548789489907 - type: cos_sim_spearman value: 85.07535893847255 - type: euclidean_pearson value: 84.6637222061494 - type: euclidean_spearman value: 85.14200626702456 - type: manhattan_pearson value: 84.75327892344734 - type: manhattan_spearman value: 85.24406181838596 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 82.88140039325008 - type: cos_sim_spearman value: 79.61211268112362 - type: euclidean_pearson value: 81.29639728816458 - type: euclidean_spearman value: 79.51284578041442 - type: manhattan_pearson value: 81.3381797137111 - type: manhattan_spearman value: 79.55683684039808 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 85.16716737270485 - type: cos_sim_spearman value: 86.14823841857738 - type: euclidean_pearson value: 85.36325733440725 - type: euclidean_spearman value: 86.04919691402029 - type: manhattan_pearson value: 85.3147511385052 - type: manhattan_spearman value: 86.00676205857764 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 80.34266645861588 - type: cos_sim_spearman value: 81.59914035005882 - type: euclidean_pearson value: 81.15053076245988 - type: euclidean_spearman value: 81.52776915798489 - type: manhattan_pearson value: 81.1819647418673 - type: manhattan_spearman value: 81.57479527353556 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 89.38263326821439 - type: cos_sim_spearman value: 89.10946308202642 - type: euclidean_pearson value: 88.87831312540068 - type: euclidean_spearman value: 89.03615865973664 - type: manhattan_pearson value: 88.79835539970384 - type: manhattan_spearman value: 88.9766156339753 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_pearson value: 70.1574915581685 - type: cos_sim_spearman value: 70.59144980004054 - type: euclidean_pearson value: 71.43246306918755 - type: euclidean_spearman value: 70.5544189562984 - type: manhattan_pearson value: 71.4071414609503 - type: manhattan_spearman value: 70.31799126163712 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 83.36215796635351 - type: cos_sim_spearman value: 83.07276756467208 - type: euclidean_pearson value: 83.06690453635584 - type: euclidean_spearman value: 82.9635366303289 - type: manhattan_pearson value: 83.04994049700815 - type: manhattan_spearman value: 82.98120125356036 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 86.92530011616722 - type: mrr value: 96.21826793395421 - task: type: Retrieval dataset: type: mteb/scifact name: MTEB SciFact config: default split: test revision: 0228b52cf27578f30900b9e5271d331663a030d7 metrics: - type: map_at_1 value: 65.75 - type: map_at_10 value: 77.701 - type: map_at_100 value: 78.005 - type: map_at_1000 value: 78.006 - type: map_at_3 value: 75.48 - type: map_at_5 value: 76.927 - type: mrr_at_1 value: 68.333 - type: mrr_at_10 value: 78.511 - type: mrr_at_100 value: 78.704 - type: mrr_at_1000 value: 78.704 - type: mrr_at_3 value: 77 - type: mrr_at_5 value: 78.083 - type: ndcg_at_1 value: 68.333 - type: ndcg_at_10 value: 82.42699999999999 - type: ndcg_at_100 value: 83.486 - type: ndcg_at_1000 value: 83.511 - type: ndcg_at_3 value: 78.96300000000001 - type: ndcg_at_5 value: 81.028 - type: precision_at_1 value: 68.333 - type: precision_at_10 value: 10.667 - type: precision_at_100 value: 1.127 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 31.333 - type: precision_at_5 value: 20.133000000000003 - type: recall_at_1 value: 65.75 - type: recall_at_10 value: 95.578 - type: recall_at_100 value: 99.833 - type: recall_at_1000 value: 100 - type: recall_at_3 value: 86.506 - type: recall_at_5 value: 91.75 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.75247524752476 - type: cos_sim_ap value: 94.16065078045173 - type: cos_sim_f1 value: 87.22986247544205 - type: cos_sim_precision value: 85.71428571428571 - type: cos_sim_recall value: 88.8 - type: dot_accuracy value: 99.74554455445545 - type: dot_ap value: 93.90633887037264 - type: dot_f1 value: 86.9873417721519 - type: dot_precision value: 88.1025641025641 - type: dot_recall value: 85.9 - type: euclidean_accuracy value: 99.75247524752476 - type: euclidean_ap value: 94.17466319018055 - type: euclidean_f1 value: 87.3405299313052 - type: euclidean_precision value: 85.74181117533719 - type: euclidean_recall value: 89 - type: manhattan_accuracy value: 99.75445544554455 - type: manhattan_ap value: 94.27688371923577 - type: manhattan_f1 value: 87.74002954209749 - type: manhattan_precision value: 86.42095053346266 - type: manhattan_recall value: 89.1 - type: max_accuracy value: 99.75445544554455 - type: max_ap value: 94.27688371923577 - type: max_f1 value: 87.74002954209749 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 71.26500637517056 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 39.17507906280528 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 52.4848744828509 - type: mrr value: 53.33678168236992 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.599864323827887 - type: cos_sim_spearman value: 30.91116204665598 - type: dot_pearson value: 30.82637894269936 - type: dot_spearman value: 30.957573868416066 - task: type: Retrieval dataset: type: mteb/trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.23600000000000002 - type: map_at_10 value: 1.892 - type: map_at_100 value: 11.586 - type: map_at_1000 value: 27.761999999999997 - type: map_at_3 value: 0.653 - type: map_at_5 value: 1.028 - type: mrr_at_1 value: 88 - type: mrr_at_10 value: 94 - type: mrr_at_100 value: 94 - type: mrr_at_1000 value: 94 - type: mrr_at_3 value: 94 - type: mrr_at_5 value: 94 - type: ndcg_at_1 value: 82 - type: ndcg_at_10 value: 77.48899999999999 - type: ndcg_at_100 value: 60.141 - type: ndcg_at_1000 value: 54.228 - type: ndcg_at_3 value: 82.358 - type: ndcg_at_5 value: 80.449 - type: precision_at_1 value: 88 - type: precision_at_10 value: 82.19999999999999 - type: precision_at_100 value: 61.760000000000005 - type: precision_at_1000 value: 23.684 - type: precision_at_3 value: 88 - type: precision_at_5 value: 85.6 - type: recall_at_1 value: 0.23600000000000002 - type: recall_at_10 value: 2.117 - type: recall_at_100 value: 14.985000000000001 - type: recall_at_1000 value: 51.107 - type: recall_at_3 value: 0.688 - type: recall_at_5 value: 1.1039999999999999 - task: type: Retrieval dataset: type: mteb/touche2020 name: MTEB Touche2020 config: default split: test revision: a34f9a33db75fa0cbb21bb5cfc3dae8dc8bec93f metrics: - type: map_at_1 value: 2.3040000000000003 - type: map_at_10 value: 9.025 - type: map_at_100 value: 15.312999999999999 - type: map_at_1000 value: 16.954 - type: map_at_3 value: 4.981 - type: map_at_5 value: 6.32 - type: mrr_at_1 value: 24.490000000000002 - type: mrr_at_10 value: 39.835 - type: mrr_at_100 value: 40.8 - type: mrr_at_1000 value: 40.8 - type: mrr_at_3 value: 35.034 - type: mrr_at_5 value: 37.687 - type: ndcg_at_1 value: 22.448999999999998 - type: ndcg_at_10 value: 22.545 - type: ndcg_at_100 value: 35.931999999999995 - type: ndcg_at_1000 value: 47.665 - type: ndcg_at_3 value: 23.311 - type: ndcg_at_5 value: 22.421 - type: precision_at_1 value: 24.490000000000002 - type: precision_at_10 value: 20.408 - type: precision_at_100 value: 7.815999999999999 - type: precision_at_1000 value: 1.553 - type: precision_at_3 value: 25.169999999999998 - type: precision_at_5 value: 23.265 - type: recall_at_1 value: 2.3040000000000003 - type: recall_at_10 value: 15.693999999999999 - type: recall_at_100 value: 48.917 - type: recall_at_1000 value: 84.964 - type: recall_at_3 value: 6.026 - type: recall_at_5 value: 9.066 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 82.6074 - type: ap value: 23.187467098602013 - type: f1 value: 65.36829506379657 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 63.16355404640635 - type: f1 value: 63.534725639863346 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 50.91004094411276 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 86.55301901412649 - type: cos_sim_ap value: 75.25312618556728 - type: cos_sim_f1 value: 68.76561719140429 - type: cos_sim_precision value: 65.3061224489796 - type: cos_sim_recall value: 72.61213720316623 - type: dot_accuracy value: 86.29671574178936 - type: dot_ap value: 75.11910195501207 - type: dot_f1 value: 68.44048376830045 - type: dot_precision value: 66.12546125461255 - type: dot_recall value: 70.92348284960423 - type: euclidean_accuracy value: 86.5828217202122 - type: euclidean_ap value: 75.22986344900924 - type: euclidean_f1 value: 68.81267797449549 - type: euclidean_precision value: 64.8238861674831 - type: euclidean_recall value: 73.3245382585752 - type: manhattan_accuracy value: 86.61262442629791 - type: manhattan_ap value: 75.24401608557328 - type: manhattan_f1 value: 68.80473982483257 - type: manhattan_precision value: 67.21187720181177 - type: manhattan_recall value: 70.47493403693932 - type: max_accuracy value: 86.61262442629791 - type: max_ap value: 75.25312618556728 - type: max_f1 value: 68.81267797449549 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 88.10688089416696 - type: cos_sim_ap value: 84.17862178779863 - type: cos_sim_f1 value: 76.17305208781748 - type: cos_sim_precision value: 71.31246641590543 - type: cos_sim_recall value: 81.74468740375731 - type: dot_accuracy value: 88.1844995536927 - type: dot_ap value: 84.33816725235876 - type: dot_f1 value: 76.43554032918746 - type: dot_precision value: 74.01557767200346 - type: dot_recall value: 79.0190945488143 - type: euclidean_accuracy value: 88.07001203089223 - type: euclidean_ap value: 84.12267000814985 - type: euclidean_f1 value: 76.12232600180778 - type: euclidean_precision value: 74.50604541433205 - type: euclidean_recall value: 77.81028641823221 - type: manhattan_accuracy value: 88.06419063142779 - type: manhattan_ap value: 84.11648917164187 - type: manhattan_f1 value: 76.20579953925474 - type: manhattan_precision value: 72.56772755762935 - type: manhattan_recall value: 80.22790267939637 - type: max_accuracy value: 88.1844995536927 - type: max_ap value: 84.33816725235876 - type: max_f1 value: 76.43554032918746 --- <!-- **English** | 中文 --> # gte-large-en-v1.5 We introduce series, upgraded embeddings that support the context length of up to **8192**, while further enhancing model performance. The models are built upon the encoder backbone (BERT + RoPE + GLU). The series achieve state-of-the-art scores on the MTEB benchmark within the same model size category and prodvide competitive on the LoCo long-context retrieval tests (refer to Evaluation). We also present the []( a SOTA instruction-tuned multi-lingual embedding model that ranked 2nd in MTEB and 1st in C-MTEB. <!-- Provide a longer summary of what this model is. --> - **Developed by:** Institute for Intelligent Computing, Alibaba Group - **Model type:** Text Embeddings - **Paper:** mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval <!-- - **Demo [optional]:** [More Information Needed] --> ### Model list | Models | Language | Model Size | Max Seq. Length | Dimension | MTEB-en | LoCo | |:-----: | :-----: |:-----: |:-----: |:-----: | :-----: | :-----: | |[]( Multiple | 7720 | 32768 | 4096 | 67.34 | 87.57 | |[]( | English | 434 | 8192 | 1024 | 65.39 | 86.71 | |[]( | English | 137 | 8192 | 768 | 64.11 | 87.44 | ## How to Get Started with the Model Use the code below to get started with the model. **It is recommended to install xformers and enable unpadding for acceleration, refer to enable-unpadding-and-xformers.** Use with sentence-transformers: Use with : ## Training Details ### Training Data - Masked language modeling (MLM): - Weak-supervised contrastive pre-training (CPT): GTE pre-training data - Supervised contrastive fine-tuning: GTE fine-tuning data ### Training Procedure To enable the backbone model to support a context length of 8192, we adopted a multi-stage training strategy. The model first undergoes preliminary MLM pre-training on shorter lengths. And then, we resample the data, reducing the proportion of short texts, and continue the MLM pre-training. The entire training process is as follows: - MLM-512: lr 2e-4, mlm_probability 0.3, batch_size 4096, num_steps 300000, rope_base 10000 - MLM-2048: lr 5e-5, mlm_probability 0.3, batch_size 4096, num_steps 30000, rope_base 10000 - MLM-8192: lr 5e-5, mlm_probability 0.3, batch_size 1024, num_steps 30000, rope_base 160000 - CPT: max_len 512, lr 5e-5, batch_size 28672, num_steps 100000 - Fine-tuning: TODO ## Evaluation ### MTEB The results of other models are retrieved from MTEB leaderboard. The gte evaluation setting: , and set ntk scaling factor to 2 (equivalent to rope_base * 2). | Model Name | Param Size (M) | Dimension | Sequence Length | Average (56) | Class. (12) | Clust. (11) | Pair Class. (3) | Reran. (4) | Retr. (15) | STS (10) | Summ. (1) | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | **gte-large-en-v1.5** | 409 | 1024 | 8192 | **65.39** | 77.75 | 47.95 | 84.63 | 58.50 | 57.91 | 81.43 | 30.91 | | mxbai-embed-large-v1 | 335 | 1024 | 512 | 64.68 | 75.64 | 46.71 | 87.2 | 60.11 | 54.39 | 85 | 32.71 | | multilingual-e5-large-instruct | 560 | 1024 | 514 | 64.41 | 77.56 | 47.1 | 86.19 | 58.58 | 52.47 | 84.78 | 30.39 | | bge-large-en-v1.5| 335 | 1024 | 512 | 64.23 | 75.97 | 46.08 | 87.12 | 60.03 | 54.29 | 83.11 | 31.61 | | **gte-base-en-v1.5** | 137 | 768 | 8192 | **64.11** | 77.17 | 46.82 | 85.33 | 57.66 | 54.09 | 81.97 | 31.17 | | bge-base-en-v1.5| 109 | 768 | 512 | 63.55 | 75.53 | 45.77 | 86.55 | 58.86 | 53.25 | 82.4 | 31.07 | ### LoCo | Model Name | Dimension | Sequence Length | Average (5) | QsmsumRetrieval | SummScreenRetrieval | QasperAbastractRetrieval | QasperTitleRetrieval | GovReportRetrieval | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | gte-qwen1.5-7b | 4096 | 32768 | 87.57 | 49.37 | 93.10 | 99.67 | 97.54 | 98.21 | | gte-large-v1.5 |1024 | 8192 | 86.71 | 44.55 | 92.61 | 99.82 | 97.81 | 98.74 | | gte-base-v1.5 | 768 | 8192 | 87.44 | 49.91 | 91.78 | 99.82 | 97.13 | 98.58 | ## Citation If you find our paper or models helpful, please consider citing them as follows:",
27
+ "model_explanation_gemini": "Generates English sentence embeddings for tasks like classification, retrieval, clustering, and similarity scoring, achieving strong performance on MTEB benchmarks.\n\nModel Features: \n- **Tasks**: Sentence embeddings, classification, retrieval, clustering, reranking, STS (semantic textual similarity). \n- **Benchmarks**: Evaluated on MTEB (Massive Text Embedding Benchmark) with metrics like accuracy, F1, MAP, NDCG, and Pearson correlation. \n- **Performance**: High scores on Amazon polarity",
28
+ "release_year": "2024",
29
+ "parameter_count": null,
30
+ "is_fine_tuned": false,
31
+ "category": "Embedding",
32
+ "api_enhanced": true
33
  }
model_data_json/Alibaba-NLP_gte-modernbert-base.json CHANGED
@@ -22,5 +22,11 @@
22
  "region:us"
23
  ],
24
  "description": "--- license: apache-2.0 language: - en base_model: - answerdotai/ModernBERT-base base_model_relation: finetune pipeline_tag: sentence-similarity library_name: transformers tags: - sentence-transformers - mteb - embedding - transformers.js --- # gte-modernbert-base We are excited to introduce the series of models, which are built upon the latest modernBERT pre-trained encoder-only foundation models. The series models include both text embedding models and rerank models. The models demonstrates competitive performance in several text embedding and text retrieval evaluation tasks when compared to similar-scale models from the current open-source community. This includes assessments such as MTEB, LoCO, and COIR evaluation. ## Model Overview - Developed by: Tongyi Lab, Alibaba Group - Model Type: Text Embedding - Primary Language: English - Model Size: 149M - Max Input Length: 8192 tokens - Output Dimension: 768 ### Model list | Models | Language | Model Type | Model Size | Max Seq. Length | Dimension | MTEB-en | BEIR | LoCo | CoIR | |:--------------------------------------------------------------------------------------:|:--------:|:----------------------:|:----------:|:---------------:|:---------:|:-------:|:----:|:----:|:----:| | []( | English | text embedding | 149M | 8192 | 768 | 64.38 | 55.33 | 87.57 | 79.31 | | []( | English | text reranker | 149M | 8192 | - | - | 56.19 | 90.68 | 79.99 | ## Usage > [!TIP] > For and , if your GPU supports it, the efficient Flash Attention 2 will be used automatically if you have installed. It is not mandatory. > > Use with Use with : Use with : ## Training Details The series of models follows the training scheme of the previous GTE models, with the only difference being that the pre-training language model base has been replaced from GTE-MLM to ModernBert. For more training details, please refer to our paper: mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval ## Evaluation ### MTEB The results of other models are retrieved from MTEB leaderboard. Given that all models in the series have a size of less than 1B parameters, we focused exclusively on the results of models under 1B from the MTEB leaderboard. | Model Name | Param Size (M) | Dimension | Sequence Length | Average (56) | Class. (12) | Clust. (11) | Pair Class. (3) | Reran. (4) | Retr. (15) | STS (10) | Summ. (1) | |:------------------------------------------------------------------------------------------------:|:--------------:|:---------:|:---------------:|:------------:|:-----------:|:---:|:---:|:---:|:---:|:-----------:|:--------:| | mxbai-embed-large-v1 | 335 | 1024 | 512 | 64.68 | 75.64 | 46.71 | 87.2 | 60.11 | 54.39 | 85 | 32.71 | | multilingual-e5-large-instruct | 560 | 1024 | 514 | 64.41 | 77.56 | 47.1 | 86.19 | 58.58 | 52.47 | 84.78 | 30.39 | | bge-large-en-v1.5 | 335 | 1024 | 512 | 64.23 | 75.97 | 46.08 | 87.12 | 60.03 | 54.29 | 83.11 | 31.61 | | gte-base-en-v1.5 | 137 | 768 | 8192 | 64.11 | 77.17 | 46.82 | 85.33 | 57.66 | 54.09 | 81.97 | 31.17 | | bge-base-en-v1.5 | 109 | 768 | 512 | 63.55 | 75.53 | 45.77 | 86.55 | 58.86 | 53.25 | 82.4 | 31.07 | | gte-large-en-v1.5 | 409 | 1024 | 8192 | 65.39 | 77.75 | 47.95 | 84.63 | 58.50 | 57.91 | 81.43 | 30.91 | | modernbert-embed-base | 149 | 768 | 8192 | 62.62 | 74.31 | 44.98 | 83.96 | 56.42 | 52.89 | 81.78 | 31.39 | | nomic-embed-text-v1.5 | | 768 | 8192 | 62.28 | 73.55 | 43.93 | 84.61 | 55.78 | 53.01| 81.94 | 30.4 | | gte-multilingual-base | 305 | 768 | 8192 | 61.4 | 70.89 | 44.31 | 84.24 | 57.47 |51.08 | 82.11 | 30.58 | | jina-embeddings-v3 | 572 | 1024 | 8192 | 65.51 | 82.58 |45.21 |84.01 |58.13 |53.88 | 85.81 | 29.71 | | **gte-modernbert-base** | 149 | 768 | 8192 | **64.38** | **76.99** | **46.47** | **85.93** | **59.24** | **55.33** | **81.57** | **30.68** | ### LoCo (Long Document Retrieval)(NDCG@10) | Model Name | Dimension | Sequence Length | Average (5) | QsmsumRetrieval | SummScreenRetrieval | QasperAbastractRetrieval | QasperTitleRetrieval | GovReportRetrieval | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | gte-qwen1.5-7b | 4096 | 32768 | 87.57 | 49.37 | 93.10 | 99.67 | 97.54 | 98.21 | | gte-large-v1.5 |1024 | 8192 | 86.71 | 44.55 | 92.61 | 99.82 | 97.81 | 98.74 | | gte-base-v1.5 | 768 | 8192 | 87.44 | 49.91 | 91.78 | 99.82 | 97.13 | 98.58 | | gte-modernbert-base | 768 | 8192 | 88.88 | 54.45 | 93.00 | 99.82 | 98.03 | 98.70 | | gte-reranker-modernbert-base | - | 8192 | 90.68 | 70.86 | 94.06 | 99.73 | 99.11 | 89.67 | ### COIR (Code Retrieval Task)(NDCG@10) | Model Name | Dimension | Sequence Length | Average(20) | CodeSearchNet-ccr-go | CodeSearchNet-ccr-java | CodeSearchNet-ccr-javascript | CodeSearchNet-ccr-php | CodeSearchNet-ccr-python | CodeSearchNet-ccr-ruby | CodeSearchNet-go | CodeSearchNet-java | CodeSearchNet-javascript | CodeSearchNet-php | CodeSearchNet-python | CodeSearchNet-ruby | apps | codefeedback-mt | codefeedback-st | codetrans-contest | codetrans-dl | cosqa | stackoverflow-qa | synthetic-text2sql | |:----:|:---:|:---:|:---:|:---:| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | | gte-modernbert-base | 768 | 8192 | 79.31 | 94.15 | 93.57 | 94.27 | 91.51 | 93.93 | 90.63 | 88.32 | 83.27 | 76.05 | 85.12 | 88.16 | 77.59 | 57.54 | 82.34 | 85.95 | 71.89 | 35.46 | 43.47 | 91.2 | 61.87 | | gte-reranker-modernbert-base | - | 8192 | 79.99 | 96.43 | 96.88 | 98.32 | 91.81 | 97.7 | 91.96 | 88.81 | 79.71 | 76.27 | 89.39 | 98.37 | 84.11 | 47.57 | 83.37 | 88.91 | 49.66 | 36.36 | 44.37 | 89.58 | 64.21 | ### BEIR(NDCG@10) | Model Name | Dimension | Sequence Length | Average(15) | ArguAna | ClimateFEVER | CQADupstackAndroidRetrieval | DBPedia | FEVER | FiQA2018 | HotpotQA | MSMARCO | NFCorpus | NQ | QuoraRetrieval | SCIDOCS | SciFact | Touche2020 | TRECCOVID | | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | | gte-modernbert-base | 768 | 8192 | 55.33 | 72.68 | 37.74 | 42.63 | 41.79 | 91.03 | 48.81 | 69.47 | 40.9 | 36.44 | 57.62 | 88.55 | 21.29 | 77.4 | 21.68 | 81.95 | | gte-reranker-modernbert-base | - | 8192 | 56.73 | 69.03 | 37.79 | 44.68 | 47.23 | 94.54 | 49.81 | 78.16 | 45.38 | 30.69 | 64.57 | 87.77 | 20.60 | 73.57 | 27.36 | 79.89 | ## Hiring We have open positions for **Research Interns** and **Full-Time Researchers** to join our team at Tongyi Lab. We are seeking passionate individuals with expertise in representation learning, LLM-driven information retrieval, Retrieval-Augmented Generation (RAG), and agent-based systems. Our team is located in the vibrant cities of **Beijing** and **Hangzhou**. If you are driven by curiosity and eager to make a meaningful impact through your work, we would love to hear from you. Please submit your resume along with a brief introduction to <a href=\"mailto:[email protected]\">[email protected]</a>. ## Citation If you find our paper or models helpful, feel free to give us a cite.",
25
- "model_explanation_gemini": "Generates English text embeddings for sentence similarity and retrieval tasks, achieving competitive performance in MTEB, LoCO, and COIR evaluations compared to similar-scale models.\n\nModel Features: \n- **Type**: Text Embedding \n- **Base Model**: ModernBERT-base \n- **Language**: English \n- **Size**: 149M parameters \n- **Max Input Length**: 8192 tokens \n- **Output Dimension**: 768 \n- **Key Metrics**: MTEB (64.38 avg),"
 
 
 
 
 
 
26
  }
 
22
  "region:us"
23
  ],
24
  "description": "--- license: apache-2.0 language: - en base_model: - answerdotai/ModernBERT-base base_model_relation: finetune pipeline_tag: sentence-similarity library_name: transformers tags: - sentence-transformers - mteb - embedding - transformers.js --- # gte-modernbert-base We are excited to introduce the series of models, which are built upon the latest modernBERT pre-trained encoder-only foundation models. The series models include both text embedding models and rerank models. The models demonstrates competitive performance in several text embedding and text retrieval evaluation tasks when compared to similar-scale models from the current open-source community. This includes assessments such as MTEB, LoCO, and COIR evaluation. ## Model Overview - Developed by: Tongyi Lab, Alibaba Group - Model Type: Text Embedding - Primary Language: English - Model Size: 149M - Max Input Length: 8192 tokens - Output Dimension: 768 ### Model list | Models | Language | Model Type | Model Size | Max Seq. Length | Dimension | MTEB-en | BEIR | LoCo | CoIR | |:--------------------------------------------------------------------------------------:|:--------:|:----------------------:|:----------:|:---------------:|:---------:|:-------:|:----:|:----:|:----:| | []( | English | text embedding | 149M | 8192 | 768 | 64.38 | 55.33 | 87.57 | 79.31 | | []( | English | text reranker | 149M | 8192 | - | - | 56.19 | 90.68 | 79.99 | ## Usage > [!TIP] > For and , if your GPU supports it, the efficient Flash Attention 2 will be used automatically if you have installed. It is not mandatory. > > Use with Use with : Use with : ## Training Details The series of models follows the training scheme of the previous GTE models, with the only difference being that the pre-training language model base has been replaced from GTE-MLM to ModernBert. For more training details, please refer to our paper: mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval ## Evaluation ### MTEB The results of other models are retrieved from MTEB leaderboard. Given that all models in the series have a size of less than 1B parameters, we focused exclusively on the results of models under 1B from the MTEB leaderboard. | Model Name | Param Size (M) | Dimension | Sequence Length | Average (56) | Class. (12) | Clust. (11) | Pair Class. (3) | Reran. (4) | Retr. (15) | STS (10) | Summ. (1) | |:------------------------------------------------------------------------------------------------:|:--------------:|:---------:|:---------------:|:------------:|:-----------:|:---:|:---:|:---:|:---:|:-----------:|:--------:| | mxbai-embed-large-v1 | 335 | 1024 | 512 | 64.68 | 75.64 | 46.71 | 87.2 | 60.11 | 54.39 | 85 | 32.71 | | multilingual-e5-large-instruct | 560 | 1024 | 514 | 64.41 | 77.56 | 47.1 | 86.19 | 58.58 | 52.47 | 84.78 | 30.39 | | bge-large-en-v1.5 | 335 | 1024 | 512 | 64.23 | 75.97 | 46.08 | 87.12 | 60.03 | 54.29 | 83.11 | 31.61 | | gte-base-en-v1.5 | 137 | 768 | 8192 | 64.11 | 77.17 | 46.82 | 85.33 | 57.66 | 54.09 | 81.97 | 31.17 | | bge-base-en-v1.5 | 109 | 768 | 512 | 63.55 | 75.53 | 45.77 | 86.55 | 58.86 | 53.25 | 82.4 | 31.07 | | gte-large-en-v1.5 | 409 | 1024 | 8192 | 65.39 | 77.75 | 47.95 | 84.63 | 58.50 | 57.91 | 81.43 | 30.91 | | modernbert-embed-base | 149 | 768 | 8192 | 62.62 | 74.31 | 44.98 | 83.96 | 56.42 | 52.89 | 81.78 | 31.39 | | nomic-embed-text-v1.5 | | 768 | 8192 | 62.28 | 73.55 | 43.93 | 84.61 | 55.78 | 53.01| 81.94 | 30.4 | | gte-multilingual-base | 305 | 768 | 8192 | 61.4 | 70.89 | 44.31 | 84.24 | 57.47 |51.08 | 82.11 | 30.58 | | jina-embeddings-v3 | 572 | 1024 | 8192 | 65.51 | 82.58 |45.21 |84.01 |58.13 |53.88 | 85.81 | 29.71 | | **gte-modernbert-base** | 149 | 768 | 8192 | **64.38** | **76.99** | **46.47** | **85.93** | **59.24** | **55.33** | **81.57** | **30.68** | ### LoCo (Long Document Retrieval)(NDCG@10) | Model Name | Dimension | Sequence Length | Average (5) | QsmsumRetrieval | SummScreenRetrieval | QasperAbastractRetrieval | QasperTitleRetrieval | GovReportRetrieval | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | gte-qwen1.5-7b | 4096 | 32768 | 87.57 | 49.37 | 93.10 | 99.67 | 97.54 | 98.21 | | gte-large-v1.5 |1024 | 8192 | 86.71 | 44.55 | 92.61 | 99.82 | 97.81 | 98.74 | | gte-base-v1.5 | 768 | 8192 | 87.44 | 49.91 | 91.78 | 99.82 | 97.13 | 98.58 | | gte-modernbert-base | 768 | 8192 | 88.88 | 54.45 | 93.00 | 99.82 | 98.03 | 98.70 | | gte-reranker-modernbert-base | - | 8192 | 90.68 | 70.86 | 94.06 | 99.73 | 99.11 | 89.67 | ### COIR (Code Retrieval Task)(NDCG@10) | Model Name | Dimension | Sequence Length | Average(20) | CodeSearchNet-ccr-go | CodeSearchNet-ccr-java | CodeSearchNet-ccr-javascript | CodeSearchNet-ccr-php | CodeSearchNet-ccr-python | CodeSearchNet-ccr-ruby | CodeSearchNet-go | CodeSearchNet-java | CodeSearchNet-javascript | CodeSearchNet-php | CodeSearchNet-python | CodeSearchNet-ruby | apps | codefeedback-mt | codefeedback-st | codetrans-contest | codetrans-dl | cosqa | stackoverflow-qa | synthetic-text2sql | |:----:|:---:|:---:|:---:|:---:| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | | gte-modernbert-base | 768 | 8192 | 79.31 | 94.15 | 93.57 | 94.27 | 91.51 | 93.93 | 90.63 | 88.32 | 83.27 | 76.05 | 85.12 | 88.16 | 77.59 | 57.54 | 82.34 | 85.95 | 71.89 | 35.46 | 43.47 | 91.2 | 61.87 | | gte-reranker-modernbert-base | - | 8192 | 79.99 | 96.43 | 96.88 | 98.32 | 91.81 | 97.7 | 91.96 | 88.81 | 79.71 | 76.27 | 89.39 | 98.37 | 84.11 | 47.57 | 83.37 | 88.91 | 49.66 | 36.36 | 44.37 | 89.58 | 64.21 | ### BEIR(NDCG@10) | Model Name | Dimension | Sequence Length | Average(15) | ArguAna | ClimateFEVER | CQADupstackAndroidRetrieval | DBPedia | FEVER | FiQA2018 | HotpotQA | MSMARCO | NFCorpus | NQ | QuoraRetrieval | SCIDOCS | SciFact | Touche2020 | TRECCOVID | | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | | gte-modernbert-base | 768 | 8192 | 55.33 | 72.68 | 37.74 | 42.63 | 41.79 | 91.03 | 48.81 | 69.47 | 40.9 | 36.44 | 57.62 | 88.55 | 21.29 | 77.4 | 21.68 | 81.95 | | gte-reranker-modernbert-base | - | 8192 | 56.73 | 69.03 | 37.79 | 44.68 | 47.23 | 94.54 | 49.81 | 78.16 | 45.38 | 30.69 | 64.57 | 87.77 | 20.60 | 73.57 | 27.36 | 79.89 | ## Hiring We have open positions for **Research Interns** and **Full-Time Researchers** to join our team at Tongyi Lab. We are seeking passionate individuals with expertise in representation learning, LLM-driven information retrieval, Retrieval-Augmented Generation (RAG), and agent-based systems. Our team is located in the vibrant cities of **Beijing** and **Hangzhou**. If you are driven by curiosity and eager to make a meaningful impact through your work, we would love to hear from you. Please submit your resume along with a brief introduction to <a href=\"mailto:[email protected]\">[email protected]</a>. ## Citation If you find our paper or models helpful, feel free to give us a cite.",
25
+ "model_explanation_gemini": "Generates English text embeddings for sentence similarity and retrieval tasks, achieving competitive performance in MTEB, LoCO, and COIR evaluations compared to similar-scale models.\n\nModel Features: \n- **Type**: Text Embedding \n- **Base Model**: ModernBERT-base \n- **Language**: English \n- **Size**: 149M parameters \n- **Max Input Length**: 8192 tokens \n- **Output Dimension**: 768 \n- **Key Metrics**: MTEB (64.38 avg),",
26
+ "release_year": "2023",
27
+ "parameter_count": null,
28
+ "is_fine_tuned": true,
29
+ "category": "Embedding",
30
+ "model_family": "BERT",
31
+ "api_enhanced": true
32
  }
model_data_json/Alibaba-NLP_gte-multilingual-base.json CHANGED
@@ -100,5 +100,10 @@
100
  "region:us"
101
  ],
102
  "description": "--- tags: - mteb - sentence-transformers - transformers - multilingual - sentence-similarity license: apache-2.0 language: - af - ar - az - be - bg - bn - ca - ceb - cs - cy - da - de - el - en - es - et - eu - fa - fi - fr - gl - gu - he - hi - hr - ht - hu - hy - id - is - it - ja - jv - ka - kk - km - kn - ko - ky - lo - lt - lv - mk - ml - mn - mr - ms - my - ne - nl - 'no' - pa - pl - pt - qu - ro - ru - si - sk - sl - so - sq - sr - sv - sw - ta - te - th - tl - tr - uk - ur - vi - yo - zh model-index: - name: gte-multilingual-base (dense) results: - task: type: Clustering dataset: type: PL-MTEB/8tags-clustering name: MTEB 8TagsClustering config: default split: test revision: None metrics: - type: v_measure value: 33.66681726329994 - task: type: STS dataset: type: C-MTEB/AFQMC name: MTEB AFQMC config: default split: validation revision: b44c3b011063adb25877c13823db83bb193913c4 metrics: - type: cos_sim_spearman value: 43.54760696384009 - task: type: STS dataset: type: C-MTEB/ATEC name: MTEB ATEC config: default split: test revision: 0f319b1142f28d00e055a6770f3f726ae9b7d865 metrics: - type: cos_sim_spearman value: 48.91186363417501 - task: type: Classification dataset: type: PL-MTEB/allegro-reviews name: MTEB AllegroReviews config: default split: test revision: None metrics: - type: accuracy value: 41.689860834990064 - task: type: Clustering dataset: type: lyon-nlp/alloprof name: MTEB AlloProfClusteringP2P config: default split: test revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b metrics: - type: v_measure value: 54.20241337977897 - task: type: Clustering dataset: type: lyon-nlp/alloprof name: MTEB AlloProfClusteringS2S config: default split: test revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b metrics: - type: v_measure value: 44.34083695608643 - task: type: Reranking dataset: type: lyon-nlp/mteb-fr-reranking-alloprof-s2p name: MTEB AlloprofReranking config: default split: test revision: 666fdacebe0291776e86f29345663dfaf80a0db9 metrics: - type: map value: 64.91495250072002 - task: type: Retrieval dataset: type: lyon-nlp/alloprof name: MTEB AlloprofRetrieval config: default split: test revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b metrics: - type: ndcg_at_10 value: 53.638 - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 75.95522388059702 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 80.717625 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 43.64199999999999 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (de) config: de split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 40.108 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (es) config: es split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 40.169999999999995 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (fr) config: fr split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 39.56799999999999 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (ja) config: ja split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 35.75000000000001 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (zh) config: zh split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 33.342000000000006 - task: type: Retrieval dataset: type: mteb/arguana name: MTEB ArguAna config: default split: test revision: c22ab2a51041ffd869aaddef7af8d8215647e41a metrics: - type: ndcg_at_10 value: 58.231 - task: type: Retrieval dataset: type: clarin-knext/arguana-pl name: MTEB ArguAna-PL config: default split: test revision: 63fc86750af76253e8c760fc9e534bbf24d260a2 metrics: - type: ndcg_at_10 value: 53.166000000000004 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 46.01900557959478 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 41.06626465345723 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 61.87514497610431 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_spearman value: 81.21450112991194 - task: type: STS dataset: type: C-MTEB/BQ name: MTEB BQ config: default split: test revision: e3dda5e115e487b39ec7e618c0c6a29137052a55 metrics: - type: cos_sim_spearman value: 51.71589543397271 - task: type: Retrieval dataset: type: maastrichtlawtech/bsard name: MTEB BSARDRetrieval config: default split: test revision: 5effa1b9b5fa3b0f9e12523e6e43e5f86a6e6d59 metrics: - type: ndcg_at_10 value: 26.115 - task: type: BitextMining dataset: type: mteb/bucc-bitext-mining name: MTEB BUCC (de-en) config: de-en split: test revision: d51519689f32196a32af33b075a01d0e7c51e252 metrics: - type: f1 value: 98.6169102296451 - task: type: BitextMining dataset: type: mteb/bucc-bitext-mining name: MTEB BUCC (fr-en) config: fr-en split: test revision: d51519689f32196a32af33b075a01d0e7c51e252 metrics: - type: f1 value: 97.89603052314916 - task: type: BitextMining dataset: type: mteb/bucc-bitext-mining name: MTEB BUCC (ru-en) config: ru-en split: test revision: d51519689f32196a32af33b075a01d0e7c51e252 metrics: - type: f1 value: 97.12388869645537 - task: type: BitextMining dataset: type: mteb/bucc-bitext-mining name: MTEB BUCC (zh-en) config: zh-en split: test revision: d51519689f32196a32af33b075a01d0e7c51e252 metrics: - type: f1 value: 98.15692469720906 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 85.36038961038962 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 37.5903826674123 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 34.21474277151329 - task: type: Classification dataset: type: PL-MTEB/cbd name: MTEB CBD config: default split: test revision: None metrics: - type: accuracy value: 62.519999999999996 - task: type: PairClassification dataset: type: PL-MTEB/cdsce-pairclassification name: MTEB CDSC-E config: default split: test revision: None metrics: - type: cos_sim_ap value: 74.90132799162956 - task: type: STS dataset: type: PL-MTEB/cdscr-sts name: MTEB CDSC-R config: default split: test revision: None metrics: - type: cos_sim_spearman value: 90.30727955142524 - task: type: Clustering dataset: type: C-MTEB/CLSClusteringP2P name: MTEB CLSClusteringP2P config: default split: test revision: 4b6227591c6c1a73bc76b1055f3b7f3588e72476 metrics: - type: v_measure value: 37.94850105022274 - task: type: Clustering dataset: type: C-MTEB/CLSClusteringS2S name: MTEB CLSClusteringS2S config: default split: test revision: e458b3f5414b62b7f9f83499ac1f5497ae2e869f metrics: - type: v_measure value: 38.11958675421534 - task: type: Reranking dataset: type: C-MTEB/CMedQAv1-reranking name: MTEB CMedQAv1 config: default split: test revision: 8d7f1e942507dac42dc58017c1a001c3717da7df metrics: - type: map value: 86.10950950485399 - task: type: Reranking dataset: type: C-MTEB/CMedQAv2-reranking name: MTEB CMedQAv2 config: default split: test revision: 23d186750531a14a0357ca22cd92d712fd512ea0 metrics: - type: map value: 87.28038294231966 - task: type: Retrieval dataset: type: mteb/cqadupstack-android name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: f46a197baaae43b4f621051089b82a364682dfeb metrics: - type: ndcg_at_10 value: 47.099000000000004 - task: type: Retrieval dataset: type: mteb/cqadupstack-english name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: ad9991cb51e31e31e430383c75ffb2885547b5f0 metrics: - type: ndcg_at_10 value: 45.973000000000006 - task: type: Retrieval dataset: type: mteb/cqadupstack-gaming name: MTEB CQADupstackGamingRetrieval config: default split: test revision: 4885aa143210c98657558c04aaf3dc47cfb54340 metrics: - type: ndcg_at_10 value: 55.606 - task: type: Retrieval dataset: type: mteb/cqadupstack-gis name: MTEB CQADupstackGisRetrieval config: default split: test revision: 5003b3064772da1887988e05400cf3806fe491f2 metrics: - type: ndcg_at_10 value: 36.638 - task: type: Retrieval dataset: type: mteb/cqadupstack-mathematica name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: 90fceea13679c63fe563ded68f3b6f06e50061de metrics: - type: ndcg_at_10 value: 30.711 - task: type: Retrieval dataset: type: mteb/cqadupstack-physics name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: 79531abbd1fb92d06c6d6315a0cbbbf5bb247ea4 metrics: - type: ndcg_at_10 value: 44.523 - task: type: Retrieval dataset: type: mteb/cqadupstack-programmers name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: 6184bc1440d2dbc7612be22b50686b8826d22b32 metrics: - type: ndcg_at_10 value: 37.940000000000005 - task: type: Retrieval dataset: type: mteb/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: ndcg_at_10 value: 38.12183333333333 - task: type: Retrieval dataset: type: mteb/cqadupstack-stats name: MTEB CQADupstackStatsRetrieval config: default split: test revision: 65ac3a16b8e91f9cee4c9828cc7c335575432a2a metrics: - type: ndcg_at_10 value: 32.684000000000005 - task: type: Retrieval dataset: type: mteb/cqadupstack-tex name: MTEB CQADupstackTexRetrieval config: default split: test revision: 46989137a86843e03a6195de44b09deda022eec7 metrics: - type: ndcg_at_10 value: 26.735 - task: type: Retrieval dataset: type: mteb/cqadupstack-unix name: MTEB CQADupstackUnixRetrieval config: default split: test revision: 6c6430d3a6d36f8d2a829195bc5dc94d7e063e53 metrics: - type: ndcg_at_10 value: 36.933 - task: type: Retrieval dataset: type: mteb/cqadupstack-webmasters name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: 160c094312a0e1facb97e55eeddb698c0abe3571 metrics: - type: ndcg_at_10 value: 33.747 - task: type: Retrieval dataset: type: mteb/cqadupstack-wordpress name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: ndcg_at_10 value: 28.872999999999998 - task: type: Retrieval dataset: type: mteb/climate-fever name: MTEB ClimateFEVER config: default split: test revision: 47f2ac6acb640fc46020b02a5b59fdda04d39380 metrics: - type: ndcg_at_10 value: 34.833 - task: type: Retrieval dataset: type: C-MTEB/CmedqaRetrieval name: MTEB CmedqaRetrieval config: default split: dev revision: cd540c506dae1cf9e9a59c3e06f42030d54e7301 metrics: - type: ndcg_at_10 value: 43.78 - task: type: PairClassification dataset: type: C-MTEB/CMNLI name: MTEB Cmnli config: default split: validation revision: 41bc36f332156f7adc9e38f53777c959b2ae9766 metrics: - type: cos_sim_ap value: 84.00640599186677 - task: type: Retrieval dataset: type: C-MTEB/CovidRetrieval name: MTEB CovidRetrieval config: default split: dev revision: 1271c7809071a13532e05f25fb53511ffce77117 metrics: - type: ndcg_at_10 value: 80.60000000000001 - task: type: Retrieval dataset: type: mteb/dbpedia name: MTEB DBPedia config: default split: test revision: c0f706b76e590d620bd6618b3ca8efdd34e2d659 metrics: - type: ndcg_at_10 value: 40.116 - task: type: Retrieval dataset: type: clarin-knext/dbpedia-pl name: MTEB DBPedia-PL config: default split: test revision: 76afe41d9af165cc40999fcaa92312b8b012064a metrics: - type: ndcg_at_10 value: 32.498 - task: type: Retrieval dataset: type: C-MTEB/DuRetrieval name: MTEB DuRetrieval config: default split: dev revision: a1a333e290fe30b10f3f56498e3a0d911a693ced metrics: - type: ndcg_at_10 value: 87.547 - task: type: Retrieval dataset: type: C-MTEB/EcomRetrieval name: MTEB EcomRetrieval config: default split: dev revision: 687de13dc7294d6fd9be10c6945f9e8fec8166b9 metrics: - type: ndcg_at_10 value: 64.85 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 47.949999999999996 - task: type: Retrieval dataset: type: mteb/fever name: MTEB FEVER config: default split: test revision: bea83ef9e8fb933d90a2f1d5515737465d613e12 metrics: - type: ndcg_at_10 value: 92.111 - task: type: Retrieval dataset: type: clarin-knext/fiqa-pl name: MTEB FiQA-PL config: default split: test revision: 2e535829717f8bf9dc829b7f911cc5bbd4e6608e metrics: - type: ndcg_at_10 value: 28.962 - task: type: Retrieval dataset: type: mteb/fiqa name: MTEB FiQA2018 config: default split: test revision: 27a168819829fe9bcd655c2df245fb19452e8e06 metrics: - type: ndcg_at_10 value: 45.005 - task: type: Clustering dataset: type: lyon-nlp/clustering-hal-s2s name: MTEB HALClusteringS2S config: default split: test revision: e06ebbbb123f8144bef1a5d18796f3dec9ae2915 metrics: - type: v_measure value: 25.133776435657595 - task: type: Retrieval dataset: type: mteb/hotpotqa name: MTEB HotpotQA config: default split: test revision: ab518f4d6fcca38d87c25209f94beba119d02014 metrics: - type: ndcg_at_10 value: 63.036 - task: type: Retrieval dataset: type: clarin-knext/hotpotqa-pl name: MTEB HotpotQA-PL config: default split: test revision: a0bd479ac97b4ccb5bd6ce320c415d0bb4beb907 metrics: - type: ndcg_at_10 value: 56.904999999999994 - task: type: Classification dataset: type: C-MTEB/IFlyTek-classification name: MTEB IFlyTek config: default split: validation revision: 421605374b29664c5fc098418fe20ada9bd55f8a metrics: - type: accuracy value: 44.59407464409388 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 74.912 - task: type: Classification dataset: type: C-MTEB/JDReview-classification name: MTEB JDReview config: default split: test revision: b7c64bd89eb87f8ded463478346f76731f07bf8b metrics: - type: accuracy value: 79.26829268292683 - task: type: STS dataset: type: C-MTEB/LCQMC name: MTEB LCQMC config: default split: test revision: 17f9b096f80380fce5ed12a9be8be7784b337daf metrics: - type: cos_sim_spearman value: 74.8601229809791 - task: type: Clustering dataset: type: mlsum name: MTEB MLSUMClusteringP2P config: default split: test revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7 metrics: - type: v_measure value: 42.331902754246556 - task: type: Clustering dataset: type: mlsum name: MTEB MLSUMClusteringS2S config: default split: test revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7 metrics: - type: v_measure value: 40.92029335502153 - task: type: Reranking dataset: type: C-MTEB/Mmarco-reranking name: MTEB MMarcoReranking config: default split: dev revision: 8e0c766dbe9e16e1d221116a3f36795fbade07f6 metrics: - type: map value: 32.19266316591337 - task: type: Retrieval dataset: type: C-MTEB/MMarcoRetrieval name: MTEB MMarcoRetrieval config: default split: dev revision: 539bbde593d947e2a124ba72651aafc09eb33fc2 metrics: - type: ndcg_at_10 value: 79.346 - task: type: Retrieval dataset: type: mteb/msmarco name: MTEB MSMARCO config: default split: dev revision: c5a29a104738b98a9e76336939199e264163d4a0 metrics: - type: ndcg_at_10 value: 39.922999999999995 - task: type: Retrieval dataset: type: clarin-knext/msmarco-pl name: MTEB MSMARCO-PL config: default split: test revision: 8634c07806d5cce3a6138e260e59b81760a0a640 metrics: - type: ndcg_at_10 value: 55.620999999999995 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 92.53989968080255 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (de) config: de split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 88.26993519301212 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (es) config: es split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 90.87725150100067 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (fr) config: fr split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 87.48512370811149 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (hi) config: hi split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 89.45141627823591 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (th) config: th split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 83.45750452079565 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 72.57637938896488 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (de) config: de split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 63.50803043110736 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (es) config: es split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 71.6577718478986 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (fr) config: fr split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 64.05887879736925 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (hi) config: hi split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 65.27070634636071 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (th) config: th split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 63.04520795660037 - task: type: Classification dataset: type: masakhane/masakhanews name: MTEB MasakhaNEWSClassification (fra) config: fra split: test revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 metrics: - type: accuracy value: 80.66350710900474 - task: type: Clustering dataset: type: masakhane/masakhanews name: MTEB MasakhaNEWSClusteringP2P (fra) config: fra split: test revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 metrics: - type: v_measure value: 44.016506455899425 - task: type: Clustering dataset: type: masakhane/masakhanews name: MTEB MasakhaNEWSClusteringS2S (fra) config: fra split: test revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 metrics: - type: v_measure value: 40.67730129573544 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (af) config: af split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 57.94552790854068 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (am) config: am split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 49.273705447209146 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ar) config: ar split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 55.490921318090116 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (az) config: az split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 60.97511768661733 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (bn) config: bn split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 57.5689307330195 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (cy) config: cy split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 48.34902488231337 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (da) config: da split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 63.6684599865501 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (de) config: de split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 62.54539340954942 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (el) config: el split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 63.08675184936112 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 72.12508406186953 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (es) config: es split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 67.41425689307331 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (fa) config: fa split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 65.59515803631474 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (fi) config: fi split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 62.90517821116342 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (fr) config: fr split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 67.91526563550774 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (he) config: he split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 55.198386012104905 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (hi) config: hi split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 65.04371217215869 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (hu) config: hu split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 63.31203765971756 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (hy) config: hy split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 55.521183591123055 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (id) config: id split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 66.06254203093476 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (is) config: is split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 56.01546738399461 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (it) config: it split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 67.27975790181574 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ja) config: ja split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 66.79556153328849 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (jv) config: jv split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 50.18493611297915 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ka) config: ka split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 47.888365837256224 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (km) config: km split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 50.79690652320108 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (kn) config: kn split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 57.225958305312716 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ko) config: ko split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 64.58641560188299 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (lv) config: lv split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 59.08204438466711 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ml) config: ml split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 59.54606590450572 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (mn) config: mn split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 53.443174176193665 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ms) config: ms split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 61.65097511768661 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (my) config: my split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 53.45662407531944 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (nb) config: nb split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 63.739071956960316 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (nl) config: nl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 66.36180228648286 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (pl) config: pl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 66.3920645595158 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (pt) config: pt split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 68.06993947545395 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ro) config: ro split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 63.123739071956955 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ru) config: ru split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 67.46133154001346 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (sl) config: sl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 60.54472091459314 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (sq) config: sq split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 58.204438466711494 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (sv) config: sv split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 65.69603227975792 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (sw) config: sw split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 51.684599865501 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ta) config: ta split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 58.523873570948226 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (te) config: te split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 58.53396099529253 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (th) config: th split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 61.88298587760591 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (tl) config: tl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 56.65097511768662 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (tr) config: tr split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 64.8453261600538 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ur) config: ur split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 58.6247478143914 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (vi) config: vi split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 64.16274377942166 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (zh-CN) config: zh-CN split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 69.61667787491594 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (zh-TW) config: zh-TW split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 64.17283120376598 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (af) config: af split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 64.89912575655683 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (am) config: am split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 57.27975790181573 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ar) config: ar split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 62.269670477471415 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (az) config: az split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 65.10423671822461 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (bn) config: bn split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 62.40753194351043 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (cy) config: cy split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 55.369872225958304 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (da) config: da split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.60726294552792 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (de) config: de split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.30262273032952 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (el) config: el split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 69.52925353059851 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 76.28446536650976 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (es) config: es split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 72.45460659045058 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (fa) config: fa split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.26563550773368 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (fi) config: fi split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 67.20578345662408 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (fr) config: fr split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 72.64963012777405 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (he) config: he split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 61.698049764626774 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (hi) config: hi split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.14458641560188 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (hu) config: hu split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.51445864156018 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (hy) config: hy split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 60.13786146603901 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (id) config: id split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.61533288500337 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (is) config: is split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 61.526563550773375 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (it) config: it split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.99731002017484 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ja) config: ja split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.59381304640216 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (jv) config: jv split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 57.010759919300604 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ka) config: ka split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 53.26160053799597 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (km) config: km split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 57.800941492938804 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (kn) config: kn split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 62.387357094821795 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ko) config: ko split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 69.5359784801614 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (lv) config: lv split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 63.36919973100203 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ml) config: ml split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 64.81506388702084 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (mn) config: mn split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 59.35104236718225 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ms) config: ms split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 66.67787491593813 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (my) config: my split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 59.4250168123739 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (nb) config: nb split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.49630127774043 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (nl) config: nl split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.95696032279758 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (pl) config: pl split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.11768661735036 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (pt) config: pt split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.86953597848016 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ro) config: ro split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 68.51042367182247 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ru) config: ru split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.65097511768661 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (sl) config: sl split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 66.81573638197713 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (sq) config: sq split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 65.26227303295225 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (sv) config: sv split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 72.51513113651646 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (sw) config: sw split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 58.29858776059179 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ta) config: ta split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 62.72696704774714 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (te) config: te split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 66.57700067249496 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (th) config: th split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 68.22797579018157 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (tl) config: tl split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 61.97041022192333 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (tr) config: tr split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.72629455279085 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ur) config: ur split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 63.16072629455278 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (vi) config: vi split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 67.92199058507062 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (zh-CN) config: zh-CN split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 74.40484196368527 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (zh-TW) config: zh-TW split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.61398789509079 - task: type: Retrieval dataset: type: C-MTEB/MedicalRetrieval name: MTEB MedicalRetrieval config: default split: dev revision: 2039188fb5800a9803ba5048df7b76e6fb151fc6 metrics: - type: ndcg_at_10 value: 61.934999999999995 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 33.052031054565205 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 31.969909524076794 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 31.7530992892652 - task: type: Retrieval dataset: type: jinaai/mintakaqa name: MTEB MintakaRetrieval (fr) config: fr split: test revision: efa78cc2f74bbcd21eff2261f9e13aebe40b814e metrics: - type: ndcg_at_10 value: 34.705999999999996 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (ar) config: ar split: test revision: None metrics: - type: ndcg_at_10 value: 55.166000000000004 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (de) config: de split: test revision: None metrics: - type: ndcg_at_10 value: 55.155 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (en) config: en split: test revision: None metrics: - type: ndcg_at_10 value: 50.993 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (es) config: es split: test revision: None metrics: - type: ndcg_at_10 value: 81.228 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (fr) config: fr split: test revision: None metrics: - type: ndcg_at_10 value: 76.19 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (hi) config: hi split: test revision: None metrics: - type: ndcg_at_10 value: 45.206 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (it) config: it split: test revision: None metrics: - type: ndcg_at_10 value: 66.741 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (ja) config: ja split: test revision: None metrics: - type: ndcg_at_10 value: 52.111 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (ko) config: ko split: test revision: None metrics: - type: ndcg_at_10 value: 46.733000000000004 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (pt) config: pt split: test revision: None metrics: - type: ndcg_at_10 value: 79.105 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (ru) config: ru split: test revision: None metrics: - type: ndcg_at_10 value: 64.21 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (th) config: th split: test revision: None metrics: - type: ndcg_at_10 value: 35.467 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (zh) config: zh split: test revision: None metrics: - type: ndcg_at_10 value: 27.419 - task: type: Classification dataset: type: C-MTEB/MultilingualSentiment-classification name: MTEB MultilingualSentiment config: default split: validation revision: 46958b007a63fdbf239b7672c25d0bea67b5ea1a metrics: - type: accuracy value: 61.02000000000001 - task: type: Retrieval dataset: type: mteb/nfcorpus name: MTEB NFCorpus config: default split: test revision: ec0fa4fe99da2ff19ca1214b7966684033a58814 metrics: - type: ndcg_at_10 value: 36.65 - task: type: Retrieval dataset: type: clarin-knext/nfcorpus-pl name: MTEB NFCorpus-PL config: default split: test revision: 9a6f9567fda928260afed2de480d79c98bf0bec0 metrics: - type: ndcg_at_10 value: 26.831 - task: type: Retrieval dataset: type: mteb/nq name: MTEB NQ config: default split: test revision: b774495ed302d8c44a3a7ea25c90dbce03968f31 metrics: - type: ndcg_at_10 value: 58.111000000000004 - task: type: Retrieval dataset: type: clarin-knext/nq-pl name: MTEB NQ-PL config: default split: test revision: f171245712cf85dd4700b06bef18001578d0ca8d metrics: - type: ndcg_at_10 value: 43.126999999999995 - task: type: PairClassification dataset: type: C-MTEB/OCNLI name: MTEB Ocnli config: default split: validation revision: 66e76a618a34d6d565d5538088562851e6daa7ec metrics: - type: cos_sim_ap value: 72.67630697316041 - task: type: Classification dataset: type: C-MTEB/OnlineShopping-classification name: MTEB OnlineShopping config: default split: test revision: e610f2ebd179a8fda30ae534c3878750a96db120 metrics: - type: accuracy value: 84.85000000000001 - task: type: PairClassification dataset: type: GEM/opusparcus name: MTEB OpusparcusPC (fr) config: fr split: test revision: 9e9b1f8ef51616073f47f306f7f47dd91663f86a metrics: - type: cos_sim_ap value: 100 - task: type: Classification dataset: type: laugustyniak/abusive-clauses-pl name: MTEB PAC config: default split: test revision: None metrics: - type: accuracy value: 65.99189110918043 - task: type: STS dataset: type: C-MTEB/PAWSX name: MTEB PAWSX config: default split: test revision: 9c6a90e430ac22b5779fb019a23e820b11a8b5e1 metrics: - type: cos_sim_spearman value: 16.124364530596228 - task: type: PairClassification dataset: type: PL-MTEB/ppc-pairclassification name: MTEB PPC config: default split: test revision: None metrics: - type: cos_sim_ap value: 92.43431057460192 - task: type: PairClassification dataset: type: PL-MTEB/psc-pairclassification name: MTEB PSC config: default split: test revision: None metrics: - type: cos_sim_ap value: 99.06090138049724 - task: type: PairClassification dataset: type: paws-x name: MTEB PawsX (fr) config: fr split: test revision: 8a04d940a42cd40658986fdd8e3da561533a3646 metrics: - type: cos_sim_ap value: 58.9314954874314 - task: type: Classification dataset: type: PL-MTEB/polemo2_in name: MTEB PolEmo2.0-IN config: default split: test revision: None metrics: - type: accuracy value: 69.59833795013851 - task: type: Classification dataset: type: PL-MTEB/polemo2_out name: MTEB PolEmo2.0-OUT config: default split: test revision: None metrics: - type: accuracy value: 44.73684210526315 - task: type: STS dataset: type: C-MTEB/QBQTC name: MTEB QBQTC config: default split: test revision: 790b0510dc52b1553e8c49f3d2afb48c0e5c48b7 metrics: - type: cos_sim_spearman value: 39.36450754137984 - task: type: Retrieval dataset: type: clarin-knext/quora-pl name: MTEB Quora-PL config: default split: test revision: 0be27e93455051e531182b85e85e425aba12e9d4 metrics: - type: ndcg_at_10 value: 80.76299999999999 - task: type: Retrieval dataset: type: mteb/quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: ndcg_at_10 value: 88.022 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 55.719165988934385 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 62.25390069273025 - task: type: Retrieval dataset: type: mteb/scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: ndcg_at_10 value: 18.243000000000002 - task: type: Retrieval dataset: type: clarin-knext/scidocs-pl name: MTEB SCIDOCS-PL config: default split: test revision: 45452b03f05560207ef19149545f168e596c9337 metrics: - type: ndcg_at_10 value: 14.219000000000001 - task: type: PairClassification dataset: type: PL-MTEB/sicke-pl-pairclassification name: MTEB SICK-E-PL config: default split: test revision: None metrics: - type: cos_sim_ap value: 75.4022630307816 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_spearman value: 79.34269390198548 - task: type: STS dataset: type: PL-MTEB/sickr-pl-sts name: MTEB SICK-R-PL config: default split: test revision: None metrics: - type: cos_sim_spearman value: 74.0651660446132 - task: type: STS dataset: type: Lajavaness/SICK-fr name: MTEB SICKFr config: default split: test revision: e077ab4cf4774a1e36d86d593b150422fafd8e8a metrics: - type: cos_sim_spearman value: 78.62693119733123 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_spearman value: 77.50660544631359 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_spearman value: 85.55415077723738 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_spearman value: 81.67550814479077 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_spearman value: 88.94601412322764 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_spearman value: 84.33844259337481 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (ko-ko) config: ko-ko split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 81.58650681159105 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (ar-ar) config: ar-ar split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 78.82472265884256 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-ar) config: en-ar split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 76.43637938260397 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-de) config: en-de split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 84.71008299464059 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 88.88074713413747 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-tr) config: en-tr split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 76.36405640457285 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (es-en) config: es-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 83.84737910084762 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (es-es) config: es-es split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 87.03931621433031 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (fr-en) config: fr-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 84.43335591752246 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (it-en) config: it-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 83.85268648747021 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (nl-en) config: nl-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 82.45786516224341 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 67.20227303970304 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (de) config: de split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 60.892838305537126 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (es) config: es split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 72.01876318464508 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (pl) config: pl split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 42.3879320510127 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (tr) config: tr split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 65.54048784845729 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (ar) config: ar split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 58.55244068334867 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (ru) config: ru split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 66.48710288440624 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (zh) config: zh split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 66.585754901838 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (fr) config: fr split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 81.03001290557805 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (de-en) config: de-en split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 62.28001859884359 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (es-en) config: es-en split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 79.64106342105019 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (it) config: it split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 78.27915339361124 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (pl-en) config: pl-en split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 78.28574268257462 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (zh-en) config: zh-en split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 72.92658860751482 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (es-it) config: es-it split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 74.83418886368217 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (de-fr) config: de-fr split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 56.01064022625769 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (de-pl) config: de-pl split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 53.64332829635126 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (fr-pl) config: fr-pl split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 73.24670207647144 - task: type: STS dataset: type: C-MTEB/STSB name: MTEB STSB config: default split: test revision: 0cde68302b3541bb8b3c340dc0644b0b745b3dc0 metrics: - type: cos_sim_spearman value: 80.7157790971544 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_spearman value: 86.45763616928973 - task: type: STS dataset: type: stsb_multi_mt name: MTEB STSBenchmarkMultilingualSTS (fr) config: fr split: test revision: 93d57ef91790589e3ce9c365164337a8a78b7632 metrics: - type: cos_sim_spearman value: 84.4335500335282 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 84.15276484499303 - task: type: Retrieval dataset: type: mteb/scifact name: MTEB SciFact config: default split: test revision: 0228b52cf27578f30900b9e5271d331663a030d7 metrics: - type: ndcg_at_10 value: 73.433 - task: type: Retrieval dataset: type: clarin-knext/scifact-pl name: MTEB SciFact-PL config: default split: test revision: 47932a35f045ef8ed01ba82bf9ff67f6e109207e metrics: - type: ndcg_at_10 value: 58.919999999999995 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_ap value: 95.40564890916419 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 63.41856697730145 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 31.709285904909112 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 52.09341030060322 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_spearman value: 30.58262517835034 - task: type: Summarization dataset: type: lyon-nlp/summarization-summeval-fr-p2p name: MTEB SummEvalFr config: default split: test revision: b385812de6a9577b6f4d0f88c6a6e35395a94054 metrics: - type: cos_sim_spearman value: 29.744542072951358 - task: type: Reranking dataset: type: lyon-nlp/mteb-fr-reranking-syntec-s2p name: MTEB SyntecReranking config: default split: test revision: b205c5084a0934ce8af14338bf03feb19499c84d metrics: - type: map value: 88.03333333333333 - task: type: Retrieval dataset: type: lyon-nlp/mteb-fr-retrieval-syntec-s2p name: MTEB SyntecRetrieval config: default split: test revision: 77f7e271bf4a92b24fce5119f3486b583ca016ff metrics: - type: ndcg_at_10 value: 83.043 - task: type: Reranking dataset: type: C-MTEB/T2Reranking name: MTEB T2Reranking config: default split: dev revision: 76631901a18387f85eaa53e5450019b87ad58ef9 metrics: - type: map value: 67.08577894804324 - task: type: Retrieval dataset: type: C-MTEB/T2Retrieval name: MTEB T2Retrieval config: default split: dev revision: 8731a845f1bf500a4f111cf1070785c793d10e64 metrics: - type: ndcg_at_10 value: 84.718 - task: type: Classification dataset: type: C-MTEB/TNews-classification name: MTEB TNews config: default split: validation revision: 317f262bf1e6126357bbe89e875451e4b0938fe4 metrics: - type: accuracy value: 48.726 - task: type: Retrieval dataset: type: mteb/trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: ndcg_at_10 value: 57.56 - task: type: Retrieval dataset: type: clarin-knext/trec-covid-pl name: MTEB TRECCOVID-PL config: default split: test revision: 81bcb408f33366c2a20ac54adafad1ae7e877fdd metrics: - type: ndcg_at_10 value: 59.355999999999995 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (sqi-eng) config: sqi-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 82.765 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (fry-eng) config: fry-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 73.69942196531792 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kur-eng) config: kur-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 32.86585365853657 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tur-eng) config: tur-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 95.81666666666666 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (deu-eng) config: deu-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 97.75 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nld-eng) config: nld-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 93.78333333333335 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ron-eng) config: ron-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 90.72333333333333 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ang-eng) config: ang-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 42.45202558635395 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ido-eng) config: ido-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 77.59238095238095 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (jav-eng) config: jav-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 35.69686411149825 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (isl-eng) config: isl-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 82.59333333333333 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (slv-eng) config: slv-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 84.1456922987907 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cym-eng) config: cym-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 52.47462133594857 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kaz-eng) config: kaz-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 67.62965440356746 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (est-eng) config: est-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 79.48412698412699 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (heb-eng) config: heb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 75.85 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (gla-eng) config: gla-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 27.32600866497127 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mar-eng) config: mar-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 84.38 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (lat-eng) config: lat-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 42.98888712165028 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (bel-eng) config: bel-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 85.55690476190476 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (pms-eng) config: pms-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 46.68466031323174 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (gle-eng) config: gle-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 32.73071428571428 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (pes-eng) config: pes-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 88.26333333333334 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nob-eng) config: nob-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 96.61666666666666 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (bul-eng) config: bul-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 91.30666666666666 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cbk-eng) config: cbk-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 70.03714285714285 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hun-eng) config: hun-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 89.09 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (uig-eng) config: uig-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 59.570476190476185 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (rus-eng) config: rus-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 92.9 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (spa-eng) config: spa-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 97.68333333333334 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hye-eng) config: hye-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 80.40880503144653 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tel-eng) config: tel-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 89.7008547008547 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (afr-eng) config: afr-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 81.84833333333333 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mon-eng) config: mon-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 71.69696969696969 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (arz-eng) config: arz-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 55.76985790822269 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hrv-eng) config: hrv-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 91.66666666666666 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nov-eng) config: nov-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 68.36668519547896 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (gsw-eng) config: gsw-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 36.73992673992674 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nds-eng) config: nds-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 63.420952380952365 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ukr-eng) config: ukr-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 91.28999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (uzb-eng) config: uzb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 40.95392490046146 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (lit-eng) config: lit-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 77.58936507936508 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ina-eng) config: ina-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 91.28999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (lfn-eng) config: lfn-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 63.563650793650794 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (zsm-eng) config: zsm-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 94.35 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ita-eng) config: ita-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 91.43 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cmn-eng) config: cmn-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 95.73333333333332 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (lvs-eng) config: lvs-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 79.38666666666667 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (glg-eng) config: glg-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 89.64 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ceb-eng) config: ceb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 21.257184628237262 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (bre-eng) config: bre-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 13.592316017316017 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ben-eng) config: ben-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 73.22666666666666 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (swg-eng) config: swg-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 51.711309523809526 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (arq-eng) config: arq-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 24.98790634904795 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kab-eng) config: kab-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 17.19218192918193 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (fra-eng) config: fra-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 93.26666666666667 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (por-eng) config: por-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 94.57333333333334 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tat-eng) config: tat-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 42.35127206127206 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (oci-eng) config: oci-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 51.12318903318903 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (pol-eng) config: pol-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 94.89999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (war-eng) config: war-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 23.856320290390055 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (aze-eng) config: aze-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 79.52833333333334 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (vie-eng) config: vie-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 95.93333333333334 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nno-eng) config: nno-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 90.75333333333333 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cha-eng) config: cha-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 30.802919708029197 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mhr-eng) config: mhr-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 15.984076294076294 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (dan-eng) config: dan-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 91.82666666666667 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ell-eng) config: ell-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 91.9 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (amh-eng) config: amh-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 76.36054421768706 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (pam-eng) config: pam-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 9.232711399711398 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hsb-eng) config: hsb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 45.640803181175855 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (srp-eng) config: srp-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 86.29 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (epo-eng) config: epo-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 88.90833333333332 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kzj-eng) config: kzj-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 11.11880248978075 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (awa-eng) config: awa-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 48.45839345839346 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (fao-eng) config: fao-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 65.68157033805888 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mal-eng) config: mal-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 94.63852498786997 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ile-eng) config: ile-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 81.67904761904761 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (bos-eng) config: bos-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 89.35969868173258 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cor-eng) config: cor-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 5.957229437229437 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cat-eng) config: cat-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 91.50333333333333 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (eus-eng) config: eus-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 63.75498778998778 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (yue-eng) config: yue-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 82.99190476190476 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (swe-eng) config: swe-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 92.95 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (dtp-eng) config: dtp-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 9.054042624042623 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kat-eng) config: kat-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 72.77064981488574 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (jpn-eng) config: jpn-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 93.14 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (csb-eng) config: csb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 29.976786498525627 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (xho-eng) config: xho-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 67.6525821596244 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (orv-eng) config: orv-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 33.12964812964813 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ind-eng) config: ind-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 92.30666666666666 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tuk-eng) config: tuk-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 34.36077879427633 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (max-eng) config: max-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 52.571845212690285 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (swh-eng) config: swh-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 58.13107263107262 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hin-eng) config: hin-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 93.33333333333333 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (dsb-eng) config: dsb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 42.87370133925458 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ber-eng) config: ber-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 20.394327616827614 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tam-eng) config: tam-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 84.29967426710098 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (slk-eng) config: slk-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 88.80666666666667 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tgl-eng) config: tgl-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 67.23062271062273 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ast-eng) config: ast-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 78.08398950131233 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mkd-eng) config: mkd-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 77.85166666666666 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (khm-eng) config: khm-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 67.63004001231148 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ces-eng) config: ces-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 89.77000000000001 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tzl-eng) config: tzl-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 40.2654503616042 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (urd-eng) config: urd-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 83.90333333333334 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ara-eng) config: ara-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 77.80666666666666 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kor-eng) config: kor-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 84.08 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (yid-eng) config: yid-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 60.43098607367475 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (fin-eng) config: fin-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 88.19333333333333 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tha-eng) config: tha-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 90.55352798053529 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (wuu-eng) config: wuu-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 88.44999999999999 - task: type: Clustering dataset: type: C-MTEB/ThuNewsClusteringP2P name: MTEB ThuNewsClusteringP2P config: default split: test revision: 5798586b105c0434e4f0fe5e767abe619442cf93 metrics: - type: v_measure value: 57.25416429643288 - task: type: Clustering dataset: type: C-MTEB/ThuNewsClusteringS2S name: MTEB ThuNewsClusteringS2S config: default split: test revision: 8a8b2caeda43f39e13c4bc5bea0f8a667896e10d metrics: - type: v_measure value: 56.616646560243524 - task: type: Retrieval dataset: type: mteb/touche2020 name: MTEB Touche2020 config: default split: test revision: a34f9a33db75fa0cbb21bb5cfc3dae8dc8bec93f metrics: - type: ndcg_at_10 value: 22.819 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 71.02579999999999 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 57.60045274476514 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 50.346666699466205 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_ap value: 71.88199004440489 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_ap value: 85.41587779677383 - task: type: Retrieval dataset: type: C-MTEB/VideoRetrieval name: MTEB VideoRetrieval config: default split: dev revision: 58c2597a5943a2ba48f4668c3b90d796283c5639 metrics: - type: ndcg_at_10 value: 72.792 - task: type: Classification dataset: type: C-MTEB/waimai-classification name: MTEB Waimai config: default split: test revision: 339287def212450dcaa9df8c22bf93e9980c7023 metrics: - type: accuracy value: 82.58000000000001 - task: type: Retrieval dataset: type: jinaai/xpqa name: MTEB XPQARetrieval (fr) config: fr split: test revision: c99d599f0a6ab9b85b065da6f9d94f9cf731679f metrics: - type: ndcg_at_10 value: 67.327 --- ## gte-multilingual-base The **gte-multilingual-base** model is the latest in the GTE (General Text Embedding) family of models, featuring several key attributes: - **High Performance**: Achieves state-of-the-art (SOTA) results in multilingual retrieval tasks and multi-task representation model evaluations when compared to models of similar size. - **Training Architecture**: Trained using an encoder-only transformers architecture, resulting in a smaller model size. Unlike previous models based on decode-only LLM architecture (e.g., gte-qwen2-1.5b-instruct), this model has lower hardware requirements for inference, offering a 10x increase in inference speed. - **Long Context**: Supports text lengths up to **8192** tokens. - **Multilingual Capability**: Supports over **70** languages. - **Elastic Dense Embedding**: Support elastic output dense representation while maintaining the effectiveness of downstream tasks, which significantly reduces storage costs and improves execution efficiency. - **Sparse Vectors**: In addition to dense representations, it can also generate sparse vectors. **Paper**: mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval ## Model Information - Model Size: 305M - Embedding Dimension: 768 - Max Input Tokens: 8192 ## Usage - **It is recommended to install xformers and enable unpadding for acceleration, refer to enable-unpadding-and-xformers.** - **How to use it offline: new-impl/discussions/2** - **How to use with TEI: refs/pr/7** ### Get Dense Embeddings with Transformers ### Use with sentence-transformers ### Use with infinity Usage via docker and infinity, MIT Licensed. ### Use with custom code to get dense embeddings and sparse token weights ## Evaluation We validated the performance of the **gte-multilingual-base** model on multiple downstream tasks, including multilingual retrieval, cross-lingual retrieval, long text retrieval, and general text representation evaluation on the MTEB Leaderboard, among others. ### Retrieval Task Retrieval results on MIRACL and MLDR (multilingual), MKQA (crosslingual), BEIR and LoCo (English). !image - Detail results on MLDR !image - Detail results on LoCo ### MTEB Results on MTEB English, Chinese, French, Polish !image **More detailed experimental results can be found in the paper**. ## Cloud API Services In addition to the open-source GTE series models, GTE series models are also available as commercial API services on Alibaba Cloud. - Embedding Models: Three versions of the text embedding models are available: text-embedding-v1/v2/v3, with v3 being the latest API service. - ReRank Models: The gte-rerank model service is available. Note that the models behind the commercial APIs are not entirely identical to the open-source models. ## Citation If you find our paper or models helpful, please consider cite:",
103
- "model_explanation_gemini": "Generates multilingual sentence embeddings for tasks like clustering, semantic textual similarity (STS), classification, reranking, and retrieval across over 50 languages. \n\n**Features:** \n- Multilingual support (50+ languages) \n- Sentence embedding generation \n- Tasks: Clustering, STS, classification, reranking, retrieval \n- Performance metrics: v_measure, cos_sim_spearman, accuracy, ndcg_at_10, map \n\n**Comparison:** \nOutperforms or competes with"
 
 
 
 
 
104
  }
 
100
  "region:us"
101
  ],
102
  "description": "--- tags: - mteb - sentence-transformers - transformers - multilingual - sentence-similarity license: apache-2.0 language: - af - ar - az - be - bg - bn - ca - ceb - cs - cy - da - de - el - en - es - et - eu - fa - fi - fr - gl - gu - he - hi - hr - ht - hu - hy - id - is - it - ja - jv - ka - kk - km - kn - ko - ky - lo - lt - lv - mk - ml - mn - mr - ms - my - ne - nl - 'no' - pa - pl - pt - qu - ro - ru - si - sk - sl - so - sq - sr - sv - sw - ta - te - th - tl - tr - uk - ur - vi - yo - zh model-index: - name: gte-multilingual-base (dense) results: - task: type: Clustering dataset: type: PL-MTEB/8tags-clustering name: MTEB 8TagsClustering config: default split: test revision: None metrics: - type: v_measure value: 33.66681726329994 - task: type: STS dataset: type: C-MTEB/AFQMC name: MTEB AFQMC config: default split: validation revision: b44c3b011063adb25877c13823db83bb193913c4 metrics: - type: cos_sim_spearman value: 43.54760696384009 - task: type: STS dataset: type: C-MTEB/ATEC name: MTEB ATEC config: default split: test revision: 0f319b1142f28d00e055a6770f3f726ae9b7d865 metrics: - type: cos_sim_spearman value: 48.91186363417501 - task: type: Classification dataset: type: PL-MTEB/allegro-reviews name: MTEB AllegroReviews config: default split: test revision: None metrics: - type: accuracy value: 41.689860834990064 - task: type: Clustering dataset: type: lyon-nlp/alloprof name: MTEB AlloProfClusteringP2P config: default split: test revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b metrics: - type: v_measure value: 54.20241337977897 - task: type: Clustering dataset: type: lyon-nlp/alloprof name: MTEB AlloProfClusteringS2S config: default split: test revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b metrics: - type: v_measure value: 44.34083695608643 - task: type: Reranking dataset: type: lyon-nlp/mteb-fr-reranking-alloprof-s2p name: MTEB AlloprofReranking config: default split: test revision: 666fdacebe0291776e86f29345663dfaf80a0db9 metrics: - type: map value: 64.91495250072002 - task: type: Retrieval dataset: type: lyon-nlp/alloprof name: MTEB AlloprofRetrieval config: default split: test revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b metrics: - type: ndcg_at_10 value: 53.638 - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 75.95522388059702 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 80.717625 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 43.64199999999999 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (de) config: de split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 40.108 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (es) config: es split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 40.169999999999995 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (fr) config: fr split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 39.56799999999999 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (ja) config: ja split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 35.75000000000001 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (zh) config: zh split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 33.342000000000006 - task: type: Retrieval dataset: type: mteb/arguana name: MTEB ArguAna config: default split: test revision: c22ab2a51041ffd869aaddef7af8d8215647e41a metrics: - type: ndcg_at_10 value: 58.231 - task: type: Retrieval dataset: type: clarin-knext/arguana-pl name: MTEB ArguAna-PL config: default split: test revision: 63fc86750af76253e8c760fc9e534bbf24d260a2 metrics: - type: ndcg_at_10 value: 53.166000000000004 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 46.01900557959478 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 41.06626465345723 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 61.87514497610431 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_spearman value: 81.21450112991194 - task: type: STS dataset: type: C-MTEB/BQ name: MTEB BQ config: default split: test revision: e3dda5e115e487b39ec7e618c0c6a29137052a55 metrics: - type: cos_sim_spearman value: 51.71589543397271 - task: type: Retrieval dataset: type: maastrichtlawtech/bsard name: MTEB BSARDRetrieval config: default split: test revision: 5effa1b9b5fa3b0f9e12523e6e43e5f86a6e6d59 metrics: - type: ndcg_at_10 value: 26.115 - task: type: BitextMining dataset: type: mteb/bucc-bitext-mining name: MTEB BUCC (de-en) config: de-en split: test revision: d51519689f32196a32af33b075a01d0e7c51e252 metrics: - type: f1 value: 98.6169102296451 - task: type: BitextMining dataset: type: mteb/bucc-bitext-mining name: MTEB BUCC (fr-en) config: fr-en split: test revision: d51519689f32196a32af33b075a01d0e7c51e252 metrics: - type: f1 value: 97.89603052314916 - task: type: BitextMining dataset: type: mteb/bucc-bitext-mining name: MTEB BUCC (ru-en) config: ru-en split: test revision: d51519689f32196a32af33b075a01d0e7c51e252 metrics: - type: f1 value: 97.12388869645537 - task: type: BitextMining dataset: type: mteb/bucc-bitext-mining name: MTEB BUCC (zh-en) config: zh-en split: test revision: d51519689f32196a32af33b075a01d0e7c51e252 metrics: - type: f1 value: 98.15692469720906 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 85.36038961038962 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 37.5903826674123 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 34.21474277151329 - task: type: Classification dataset: type: PL-MTEB/cbd name: MTEB CBD config: default split: test revision: None metrics: - type: accuracy value: 62.519999999999996 - task: type: PairClassification dataset: type: PL-MTEB/cdsce-pairclassification name: MTEB CDSC-E config: default split: test revision: None metrics: - type: cos_sim_ap value: 74.90132799162956 - task: type: STS dataset: type: PL-MTEB/cdscr-sts name: MTEB CDSC-R config: default split: test revision: None metrics: - type: cos_sim_spearman value: 90.30727955142524 - task: type: Clustering dataset: type: C-MTEB/CLSClusteringP2P name: MTEB CLSClusteringP2P config: default split: test revision: 4b6227591c6c1a73bc76b1055f3b7f3588e72476 metrics: - type: v_measure value: 37.94850105022274 - task: type: Clustering dataset: type: C-MTEB/CLSClusteringS2S name: MTEB CLSClusteringS2S config: default split: test revision: e458b3f5414b62b7f9f83499ac1f5497ae2e869f metrics: - type: v_measure value: 38.11958675421534 - task: type: Reranking dataset: type: C-MTEB/CMedQAv1-reranking name: MTEB CMedQAv1 config: default split: test revision: 8d7f1e942507dac42dc58017c1a001c3717da7df metrics: - type: map value: 86.10950950485399 - task: type: Reranking dataset: type: C-MTEB/CMedQAv2-reranking name: MTEB CMedQAv2 config: default split: test revision: 23d186750531a14a0357ca22cd92d712fd512ea0 metrics: - type: map value: 87.28038294231966 - task: type: Retrieval dataset: type: mteb/cqadupstack-android name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: f46a197baaae43b4f621051089b82a364682dfeb metrics: - type: ndcg_at_10 value: 47.099000000000004 - task: type: Retrieval dataset: type: mteb/cqadupstack-english name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: ad9991cb51e31e31e430383c75ffb2885547b5f0 metrics: - type: ndcg_at_10 value: 45.973000000000006 - task: type: Retrieval dataset: type: mteb/cqadupstack-gaming name: MTEB CQADupstackGamingRetrieval config: default split: test revision: 4885aa143210c98657558c04aaf3dc47cfb54340 metrics: - type: ndcg_at_10 value: 55.606 - task: type: Retrieval dataset: type: mteb/cqadupstack-gis name: MTEB CQADupstackGisRetrieval config: default split: test revision: 5003b3064772da1887988e05400cf3806fe491f2 metrics: - type: ndcg_at_10 value: 36.638 - task: type: Retrieval dataset: type: mteb/cqadupstack-mathematica name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: 90fceea13679c63fe563ded68f3b6f06e50061de metrics: - type: ndcg_at_10 value: 30.711 - task: type: Retrieval dataset: type: mteb/cqadupstack-physics name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: 79531abbd1fb92d06c6d6315a0cbbbf5bb247ea4 metrics: - type: ndcg_at_10 value: 44.523 - task: type: Retrieval dataset: type: mteb/cqadupstack-programmers name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: 6184bc1440d2dbc7612be22b50686b8826d22b32 metrics: - type: ndcg_at_10 value: 37.940000000000005 - task: type: Retrieval dataset: type: mteb/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: ndcg_at_10 value: 38.12183333333333 - task: type: Retrieval dataset: type: mteb/cqadupstack-stats name: MTEB CQADupstackStatsRetrieval config: default split: test revision: 65ac3a16b8e91f9cee4c9828cc7c335575432a2a metrics: - type: ndcg_at_10 value: 32.684000000000005 - task: type: Retrieval dataset: type: mteb/cqadupstack-tex name: MTEB CQADupstackTexRetrieval config: default split: test revision: 46989137a86843e03a6195de44b09deda022eec7 metrics: - type: ndcg_at_10 value: 26.735 - task: type: Retrieval dataset: type: mteb/cqadupstack-unix name: MTEB CQADupstackUnixRetrieval config: default split: test revision: 6c6430d3a6d36f8d2a829195bc5dc94d7e063e53 metrics: - type: ndcg_at_10 value: 36.933 - task: type: Retrieval dataset: type: mteb/cqadupstack-webmasters name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: 160c094312a0e1facb97e55eeddb698c0abe3571 metrics: - type: ndcg_at_10 value: 33.747 - task: type: Retrieval dataset: type: mteb/cqadupstack-wordpress name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: ndcg_at_10 value: 28.872999999999998 - task: type: Retrieval dataset: type: mteb/climate-fever name: MTEB ClimateFEVER config: default split: test revision: 47f2ac6acb640fc46020b02a5b59fdda04d39380 metrics: - type: ndcg_at_10 value: 34.833 - task: type: Retrieval dataset: type: C-MTEB/CmedqaRetrieval name: MTEB CmedqaRetrieval config: default split: dev revision: cd540c506dae1cf9e9a59c3e06f42030d54e7301 metrics: - type: ndcg_at_10 value: 43.78 - task: type: PairClassification dataset: type: C-MTEB/CMNLI name: MTEB Cmnli config: default split: validation revision: 41bc36f332156f7adc9e38f53777c959b2ae9766 metrics: - type: cos_sim_ap value: 84.00640599186677 - task: type: Retrieval dataset: type: C-MTEB/CovidRetrieval name: MTEB CovidRetrieval config: default split: dev revision: 1271c7809071a13532e05f25fb53511ffce77117 metrics: - type: ndcg_at_10 value: 80.60000000000001 - task: type: Retrieval dataset: type: mteb/dbpedia name: MTEB DBPedia config: default split: test revision: c0f706b76e590d620bd6618b3ca8efdd34e2d659 metrics: - type: ndcg_at_10 value: 40.116 - task: type: Retrieval dataset: type: clarin-knext/dbpedia-pl name: MTEB DBPedia-PL config: default split: test revision: 76afe41d9af165cc40999fcaa92312b8b012064a metrics: - type: ndcg_at_10 value: 32.498 - task: type: Retrieval dataset: type: C-MTEB/DuRetrieval name: MTEB DuRetrieval config: default split: dev revision: a1a333e290fe30b10f3f56498e3a0d911a693ced metrics: - type: ndcg_at_10 value: 87.547 - task: type: Retrieval dataset: type: C-MTEB/EcomRetrieval name: MTEB EcomRetrieval config: default split: dev revision: 687de13dc7294d6fd9be10c6945f9e8fec8166b9 metrics: - type: ndcg_at_10 value: 64.85 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 47.949999999999996 - task: type: Retrieval dataset: type: mteb/fever name: MTEB FEVER config: default split: test revision: bea83ef9e8fb933d90a2f1d5515737465d613e12 metrics: - type: ndcg_at_10 value: 92.111 - task: type: Retrieval dataset: type: clarin-knext/fiqa-pl name: MTEB FiQA-PL config: default split: test revision: 2e535829717f8bf9dc829b7f911cc5bbd4e6608e metrics: - type: ndcg_at_10 value: 28.962 - task: type: Retrieval dataset: type: mteb/fiqa name: MTEB FiQA2018 config: default split: test revision: 27a168819829fe9bcd655c2df245fb19452e8e06 metrics: - type: ndcg_at_10 value: 45.005 - task: type: Clustering dataset: type: lyon-nlp/clustering-hal-s2s name: MTEB HALClusteringS2S config: default split: test revision: e06ebbbb123f8144bef1a5d18796f3dec9ae2915 metrics: - type: v_measure value: 25.133776435657595 - task: type: Retrieval dataset: type: mteb/hotpotqa name: MTEB HotpotQA config: default split: test revision: ab518f4d6fcca38d87c25209f94beba119d02014 metrics: - type: ndcg_at_10 value: 63.036 - task: type: Retrieval dataset: type: clarin-knext/hotpotqa-pl name: MTEB HotpotQA-PL config: default split: test revision: a0bd479ac97b4ccb5bd6ce320c415d0bb4beb907 metrics: - type: ndcg_at_10 value: 56.904999999999994 - task: type: Classification dataset: type: C-MTEB/IFlyTek-classification name: MTEB IFlyTek config: default split: validation revision: 421605374b29664c5fc098418fe20ada9bd55f8a metrics: - type: accuracy value: 44.59407464409388 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 74.912 - task: type: Classification dataset: type: C-MTEB/JDReview-classification name: MTEB JDReview config: default split: test revision: b7c64bd89eb87f8ded463478346f76731f07bf8b metrics: - type: accuracy value: 79.26829268292683 - task: type: STS dataset: type: C-MTEB/LCQMC name: MTEB LCQMC config: default split: test revision: 17f9b096f80380fce5ed12a9be8be7784b337daf metrics: - type: cos_sim_spearman value: 74.8601229809791 - task: type: Clustering dataset: type: mlsum name: MTEB MLSUMClusteringP2P config: default split: test revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7 metrics: - type: v_measure value: 42.331902754246556 - task: type: Clustering dataset: type: mlsum name: MTEB MLSUMClusteringS2S config: default split: test revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7 metrics: - type: v_measure value: 40.92029335502153 - task: type: Reranking dataset: type: C-MTEB/Mmarco-reranking name: MTEB MMarcoReranking config: default split: dev revision: 8e0c766dbe9e16e1d221116a3f36795fbade07f6 metrics: - type: map value: 32.19266316591337 - task: type: Retrieval dataset: type: C-MTEB/MMarcoRetrieval name: MTEB MMarcoRetrieval config: default split: dev revision: 539bbde593d947e2a124ba72651aafc09eb33fc2 metrics: - type: ndcg_at_10 value: 79.346 - task: type: Retrieval dataset: type: mteb/msmarco name: MTEB MSMARCO config: default split: dev revision: c5a29a104738b98a9e76336939199e264163d4a0 metrics: - type: ndcg_at_10 value: 39.922999999999995 - task: type: Retrieval dataset: type: clarin-knext/msmarco-pl name: MTEB MSMARCO-PL config: default split: test revision: 8634c07806d5cce3a6138e260e59b81760a0a640 metrics: - type: ndcg_at_10 value: 55.620999999999995 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 92.53989968080255 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (de) config: de split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 88.26993519301212 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (es) config: es split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 90.87725150100067 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (fr) config: fr split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 87.48512370811149 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (hi) config: hi split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 89.45141627823591 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (th) config: th split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 83.45750452079565 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 72.57637938896488 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (de) config: de split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 63.50803043110736 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (es) config: es split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 71.6577718478986 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (fr) config: fr split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 64.05887879736925 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (hi) config: hi split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 65.27070634636071 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (th) config: th split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 63.04520795660037 - task: type: Classification dataset: type: masakhane/masakhanews name: MTEB MasakhaNEWSClassification (fra) config: fra split: test revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 metrics: - type: accuracy value: 80.66350710900474 - task: type: Clustering dataset: type: masakhane/masakhanews name: MTEB MasakhaNEWSClusteringP2P (fra) config: fra split: test revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 metrics: - type: v_measure value: 44.016506455899425 - task: type: Clustering dataset: type: masakhane/masakhanews name: MTEB MasakhaNEWSClusteringS2S (fra) config: fra split: test revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 metrics: - type: v_measure value: 40.67730129573544 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (af) config: af split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 57.94552790854068 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (am) config: am split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 49.273705447209146 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ar) config: ar split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 55.490921318090116 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (az) config: az split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 60.97511768661733 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (bn) config: bn split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 57.5689307330195 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (cy) config: cy split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 48.34902488231337 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (da) config: da split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 63.6684599865501 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (de) config: de split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 62.54539340954942 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (el) config: el split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 63.08675184936112 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 72.12508406186953 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (es) config: es split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 67.41425689307331 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (fa) config: fa split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 65.59515803631474 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (fi) config: fi split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 62.90517821116342 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (fr) config: fr split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 67.91526563550774 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (he) config: he split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 55.198386012104905 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (hi) config: hi split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 65.04371217215869 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (hu) config: hu split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 63.31203765971756 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (hy) config: hy split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 55.521183591123055 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (id) config: id split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 66.06254203093476 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (is) config: is split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 56.01546738399461 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (it) config: it split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 67.27975790181574 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ja) config: ja split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 66.79556153328849 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (jv) config: jv split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 50.18493611297915 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ka) config: ka split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 47.888365837256224 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (km) config: km split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 50.79690652320108 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (kn) config: kn split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 57.225958305312716 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ko) config: ko split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 64.58641560188299 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (lv) config: lv split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 59.08204438466711 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ml) config: ml split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 59.54606590450572 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (mn) config: mn split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 53.443174176193665 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ms) config: ms split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 61.65097511768661 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (my) config: my split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 53.45662407531944 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (nb) config: nb split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 63.739071956960316 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (nl) config: nl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 66.36180228648286 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (pl) config: pl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 66.3920645595158 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (pt) config: pt split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 68.06993947545395 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ro) config: ro split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 63.123739071956955 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ru) config: ru split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 67.46133154001346 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (sl) config: sl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 60.54472091459314 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (sq) config: sq split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 58.204438466711494 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (sv) config: sv split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 65.69603227975792 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (sw) config: sw split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 51.684599865501 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ta) config: ta split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 58.523873570948226 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (te) config: te split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 58.53396099529253 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (th) config: th split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 61.88298587760591 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (tl) config: tl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 56.65097511768662 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (tr) config: tr split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 64.8453261600538 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ur) config: ur split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 58.6247478143914 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (vi) config: vi split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 64.16274377942166 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (zh-CN) config: zh-CN split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 69.61667787491594 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (zh-TW) config: zh-TW split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 64.17283120376598 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (af) config: af split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 64.89912575655683 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (am) config: am split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 57.27975790181573 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ar) config: ar split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 62.269670477471415 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (az) config: az split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 65.10423671822461 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (bn) config: bn split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 62.40753194351043 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (cy) config: cy split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 55.369872225958304 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (da) config: da split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.60726294552792 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (de) config: de split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.30262273032952 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (el) config: el split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 69.52925353059851 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 76.28446536650976 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (es) config: es split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 72.45460659045058 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (fa) config: fa split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.26563550773368 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (fi) config: fi split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 67.20578345662408 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (fr) config: fr split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 72.64963012777405 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (he) config: he split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 61.698049764626774 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (hi) config: hi split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.14458641560188 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (hu) config: hu split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.51445864156018 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (hy) config: hy split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 60.13786146603901 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (id) config: id split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.61533288500337 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (is) config: is split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 61.526563550773375 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (it) config: it split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.99731002017484 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ja) config: ja split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.59381304640216 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (jv) config: jv split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 57.010759919300604 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ka) config: ka split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 53.26160053799597 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (km) config: km split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 57.800941492938804 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (kn) config: kn split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 62.387357094821795 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ko) config: ko split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 69.5359784801614 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (lv) config: lv split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 63.36919973100203 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ml) config: ml split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 64.81506388702084 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (mn) config: mn split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 59.35104236718225 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ms) config: ms split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 66.67787491593813 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (my) config: my split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 59.4250168123739 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (nb) config: nb split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.49630127774043 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (nl) config: nl split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.95696032279758 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (pl) config: pl split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.11768661735036 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (pt) config: pt split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.86953597848016 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ro) config: ro split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 68.51042367182247 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ru) config: ru split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.65097511768661 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (sl) config: sl split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 66.81573638197713 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (sq) config: sq split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 65.26227303295225 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (sv) config: sv split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 72.51513113651646 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (sw) config: sw split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 58.29858776059179 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ta) config: ta split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 62.72696704774714 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (te) config: te split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 66.57700067249496 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (th) config: th split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 68.22797579018157 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (tl) config: tl split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 61.97041022192333 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (tr) config: tr split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.72629455279085 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ur) config: ur split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 63.16072629455278 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (vi) config: vi split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 67.92199058507062 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (zh-CN) config: zh-CN split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 74.40484196368527 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (zh-TW) config: zh-TW split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.61398789509079 - task: type: Retrieval dataset: type: C-MTEB/MedicalRetrieval name: MTEB MedicalRetrieval config: default split: dev revision: 2039188fb5800a9803ba5048df7b76e6fb151fc6 metrics: - type: ndcg_at_10 value: 61.934999999999995 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 33.052031054565205 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 31.969909524076794 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 31.7530992892652 - task: type: Retrieval dataset: type: jinaai/mintakaqa name: MTEB MintakaRetrieval (fr) config: fr split: test revision: efa78cc2f74bbcd21eff2261f9e13aebe40b814e metrics: - type: ndcg_at_10 value: 34.705999999999996 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (ar) config: ar split: test revision: None metrics: - type: ndcg_at_10 value: 55.166000000000004 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (de) config: de split: test revision: None metrics: - type: ndcg_at_10 value: 55.155 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (en) config: en split: test revision: None metrics: - type: ndcg_at_10 value: 50.993 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (es) config: es split: test revision: None metrics: - type: ndcg_at_10 value: 81.228 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (fr) config: fr split: test revision: None metrics: - type: ndcg_at_10 value: 76.19 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (hi) config: hi split: test revision: None metrics: - type: ndcg_at_10 value: 45.206 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (it) config: it split: test revision: None metrics: - type: ndcg_at_10 value: 66.741 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (ja) config: ja split: test revision: None metrics: - type: ndcg_at_10 value: 52.111 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (ko) config: ko split: test revision: None metrics: - type: ndcg_at_10 value: 46.733000000000004 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (pt) config: pt split: test revision: None metrics: - type: ndcg_at_10 value: 79.105 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (ru) config: ru split: test revision: None metrics: - type: ndcg_at_10 value: 64.21 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (th) config: th split: test revision: None metrics: - type: ndcg_at_10 value: 35.467 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (zh) config: zh split: test revision: None metrics: - type: ndcg_at_10 value: 27.419 - task: type: Classification dataset: type: C-MTEB/MultilingualSentiment-classification name: MTEB MultilingualSentiment config: default split: validation revision: 46958b007a63fdbf239b7672c25d0bea67b5ea1a metrics: - type: accuracy value: 61.02000000000001 - task: type: Retrieval dataset: type: mteb/nfcorpus name: MTEB NFCorpus config: default split: test revision: ec0fa4fe99da2ff19ca1214b7966684033a58814 metrics: - type: ndcg_at_10 value: 36.65 - task: type: Retrieval dataset: type: clarin-knext/nfcorpus-pl name: MTEB NFCorpus-PL config: default split: test revision: 9a6f9567fda928260afed2de480d79c98bf0bec0 metrics: - type: ndcg_at_10 value: 26.831 - task: type: Retrieval dataset: type: mteb/nq name: MTEB NQ config: default split: test revision: b774495ed302d8c44a3a7ea25c90dbce03968f31 metrics: - type: ndcg_at_10 value: 58.111000000000004 - task: type: Retrieval dataset: type: clarin-knext/nq-pl name: MTEB NQ-PL config: default split: test revision: f171245712cf85dd4700b06bef18001578d0ca8d metrics: - type: ndcg_at_10 value: 43.126999999999995 - task: type: PairClassification dataset: type: C-MTEB/OCNLI name: MTEB Ocnli config: default split: validation revision: 66e76a618a34d6d565d5538088562851e6daa7ec metrics: - type: cos_sim_ap value: 72.67630697316041 - task: type: Classification dataset: type: C-MTEB/OnlineShopping-classification name: MTEB OnlineShopping config: default split: test revision: e610f2ebd179a8fda30ae534c3878750a96db120 metrics: - type: accuracy value: 84.85000000000001 - task: type: PairClassification dataset: type: GEM/opusparcus name: MTEB OpusparcusPC (fr) config: fr split: test revision: 9e9b1f8ef51616073f47f306f7f47dd91663f86a metrics: - type: cos_sim_ap value: 100 - task: type: Classification dataset: type: laugustyniak/abusive-clauses-pl name: MTEB PAC config: default split: test revision: None metrics: - type: accuracy value: 65.99189110918043 - task: type: STS dataset: type: C-MTEB/PAWSX name: MTEB PAWSX config: default split: test revision: 9c6a90e430ac22b5779fb019a23e820b11a8b5e1 metrics: - type: cos_sim_spearman value: 16.124364530596228 - task: type: PairClassification dataset: type: PL-MTEB/ppc-pairclassification name: MTEB PPC config: default split: test revision: None metrics: - type: cos_sim_ap value: 92.43431057460192 - task: type: PairClassification dataset: type: PL-MTEB/psc-pairclassification name: MTEB PSC config: default split: test revision: None metrics: - type: cos_sim_ap value: 99.06090138049724 - task: type: PairClassification dataset: type: paws-x name: MTEB PawsX (fr) config: fr split: test revision: 8a04d940a42cd40658986fdd8e3da561533a3646 metrics: - type: cos_sim_ap value: 58.9314954874314 - task: type: Classification dataset: type: PL-MTEB/polemo2_in name: MTEB PolEmo2.0-IN config: default split: test revision: None metrics: - type: accuracy value: 69.59833795013851 - task: type: Classification dataset: type: PL-MTEB/polemo2_out name: MTEB PolEmo2.0-OUT config: default split: test revision: None metrics: - type: accuracy value: 44.73684210526315 - task: type: STS dataset: type: C-MTEB/QBQTC name: MTEB QBQTC config: default split: test revision: 790b0510dc52b1553e8c49f3d2afb48c0e5c48b7 metrics: - type: cos_sim_spearman value: 39.36450754137984 - task: type: Retrieval dataset: type: clarin-knext/quora-pl name: MTEB Quora-PL config: default split: test revision: 0be27e93455051e531182b85e85e425aba12e9d4 metrics: - type: ndcg_at_10 value: 80.76299999999999 - task: type: Retrieval dataset: type: mteb/quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: ndcg_at_10 value: 88.022 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 55.719165988934385 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 62.25390069273025 - task: type: Retrieval dataset: type: mteb/scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: ndcg_at_10 value: 18.243000000000002 - task: type: Retrieval dataset: type: clarin-knext/scidocs-pl name: MTEB SCIDOCS-PL config: default split: test revision: 45452b03f05560207ef19149545f168e596c9337 metrics: - type: ndcg_at_10 value: 14.219000000000001 - task: type: PairClassification dataset: type: PL-MTEB/sicke-pl-pairclassification name: MTEB SICK-E-PL config: default split: test revision: None metrics: - type: cos_sim_ap value: 75.4022630307816 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_spearman value: 79.34269390198548 - task: type: STS dataset: type: PL-MTEB/sickr-pl-sts name: MTEB SICK-R-PL config: default split: test revision: None metrics: - type: cos_sim_spearman value: 74.0651660446132 - task: type: STS dataset: type: Lajavaness/SICK-fr name: MTEB SICKFr config: default split: test revision: e077ab4cf4774a1e36d86d593b150422fafd8e8a metrics: - type: cos_sim_spearman value: 78.62693119733123 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_spearman value: 77.50660544631359 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_spearman value: 85.55415077723738 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_spearman value: 81.67550814479077 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_spearman value: 88.94601412322764 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_spearman value: 84.33844259337481 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (ko-ko) config: ko-ko split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 81.58650681159105 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (ar-ar) config: ar-ar split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 78.82472265884256 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-ar) config: en-ar split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 76.43637938260397 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-de) config: en-de split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 84.71008299464059 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 88.88074713413747 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-tr) config: en-tr split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 76.36405640457285 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (es-en) config: es-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 83.84737910084762 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (es-es) config: es-es split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 87.03931621433031 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (fr-en) config: fr-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 84.43335591752246 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (it-en) config: it-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 83.85268648747021 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (nl-en) config: nl-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 82.45786516224341 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 67.20227303970304 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (de) config: de split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 60.892838305537126 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (es) config: es split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 72.01876318464508 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (pl) config: pl split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 42.3879320510127 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (tr) config: tr split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 65.54048784845729 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (ar) config: ar split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 58.55244068334867 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (ru) config: ru split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 66.48710288440624 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (zh) config: zh split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 66.585754901838 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (fr) config: fr split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 81.03001290557805 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (de-en) config: de-en split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 62.28001859884359 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (es-en) config: es-en split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 79.64106342105019 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (it) config: it split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 78.27915339361124 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (pl-en) config: pl-en split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 78.28574268257462 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (zh-en) config: zh-en split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 72.92658860751482 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (es-it) config: es-it split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 74.83418886368217 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (de-fr) config: de-fr split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 56.01064022625769 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (de-pl) config: de-pl split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 53.64332829635126 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (fr-pl) config: fr-pl split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 73.24670207647144 - task: type: STS dataset: type: C-MTEB/STSB name: MTEB STSB config: default split: test revision: 0cde68302b3541bb8b3c340dc0644b0b745b3dc0 metrics: - type: cos_sim_spearman value: 80.7157790971544 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_spearman value: 86.45763616928973 - task: type: STS dataset: type: stsb_multi_mt name: MTEB STSBenchmarkMultilingualSTS (fr) config: fr split: test revision: 93d57ef91790589e3ce9c365164337a8a78b7632 metrics: - type: cos_sim_spearman value: 84.4335500335282 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 84.15276484499303 - task: type: Retrieval dataset: type: mteb/scifact name: MTEB SciFact config: default split: test revision: 0228b52cf27578f30900b9e5271d331663a030d7 metrics: - type: ndcg_at_10 value: 73.433 - task: type: Retrieval dataset: type: clarin-knext/scifact-pl name: MTEB SciFact-PL config: default split: test revision: 47932a35f045ef8ed01ba82bf9ff67f6e109207e metrics: - type: ndcg_at_10 value: 58.919999999999995 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_ap value: 95.40564890916419 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 63.41856697730145 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 31.709285904909112 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 52.09341030060322 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_spearman value: 30.58262517835034 - task: type: Summarization dataset: type: lyon-nlp/summarization-summeval-fr-p2p name: MTEB SummEvalFr config: default split: test revision: b385812de6a9577b6f4d0f88c6a6e35395a94054 metrics: - type: cos_sim_spearman value: 29.744542072951358 - task: type: Reranking dataset: type: lyon-nlp/mteb-fr-reranking-syntec-s2p name: MTEB SyntecReranking config: default split: test revision: b205c5084a0934ce8af14338bf03feb19499c84d metrics: - type: map value: 88.03333333333333 - task: type: Retrieval dataset: type: lyon-nlp/mteb-fr-retrieval-syntec-s2p name: MTEB SyntecRetrieval config: default split: test revision: 77f7e271bf4a92b24fce5119f3486b583ca016ff metrics: - type: ndcg_at_10 value: 83.043 - task: type: Reranking dataset: type: C-MTEB/T2Reranking name: MTEB T2Reranking config: default split: dev revision: 76631901a18387f85eaa53e5450019b87ad58ef9 metrics: - type: map value: 67.08577894804324 - task: type: Retrieval dataset: type: C-MTEB/T2Retrieval name: MTEB T2Retrieval config: default split: dev revision: 8731a845f1bf500a4f111cf1070785c793d10e64 metrics: - type: ndcg_at_10 value: 84.718 - task: type: Classification dataset: type: C-MTEB/TNews-classification name: MTEB TNews config: default split: validation revision: 317f262bf1e6126357bbe89e875451e4b0938fe4 metrics: - type: accuracy value: 48.726 - task: type: Retrieval dataset: type: mteb/trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: ndcg_at_10 value: 57.56 - task: type: Retrieval dataset: type: clarin-knext/trec-covid-pl name: MTEB TRECCOVID-PL config: default split: test revision: 81bcb408f33366c2a20ac54adafad1ae7e877fdd metrics: - type: ndcg_at_10 value: 59.355999999999995 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (sqi-eng) config: sqi-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 82.765 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (fry-eng) config: fry-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 73.69942196531792 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kur-eng) config: kur-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 32.86585365853657 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tur-eng) config: tur-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 95.81666666666666 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (deu-eng) config: deu-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 97.75 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nld-eng) config: nld-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 93.78333333333335 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ron-eng) config: ron-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 90.72333333333333 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ang-eng) config: ang-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 42.45202558635395 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ido-eng) config: ido-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 77.59238095238095 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (jav-eng) config: jav-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 35.69686411149825 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (isl-eng) config: isl-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 82.59333333333333 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (slv-eng) config: slv-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 84.1456922987907 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cym-eng) config: cym-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 52.47462133594857 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kaz-eng) config: kaz-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 67.62965440356746 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (est-eng) config: est-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 79.48412698412699 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (heb-eng) config: heb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 75.85 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (gla-eng) config: gla-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 27.32600866497127 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mar-eng) config: mar-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 84.38 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (lat-eng) config: lat-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 42.98888712165028 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (bel-eng) config: bel-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 85.55690476190476 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (pms-eng) config: pms-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 46.68466031323174 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (gle-eng) config: gle-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 32.73071428571428 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (pes-eng) config: pes-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 88.26333333333334 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nob-eng) config: nob-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 96.61666666666666 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (bul-eng) config: bul-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 91.30666666666666 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cbk-eng) config: cbk-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 70.03714285714285 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hun-eng) config: hun-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 89.09 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (uig-eng) config: uig-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 59.570476190476185 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (rus-eng) config: rus-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 92.9 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (spa-eng) config: spa-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 97.68333333333334 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hye-eng) config: hye-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 80.40880503144653 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tel-eng) config: tel-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 89.7008547008547 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (afr-eng) config: afr-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 81.84833333333333 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mon-eng) config: mon-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 71.69696969696969 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (arz-eng) config: arz-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 55.76985790822269 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hrv-eng) config: hrv-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 91.66666666666666 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nov-eng) config: nov-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 68.36668519547896 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (gsw-eng) config: gsw-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 36.73992673992674 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nds-eng) config: nds-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 63.420952380952365 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ukr-eng) config: ukr-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 91.28999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (uzb-eng) config: uzb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 40.95392490046146 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (lit-eng) config: lit-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 77.58936507936508 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ina-eng) config: ina-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 91.28999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (lfn-eng) config: lfn-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 63.563650793650794 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (zsm-eng) config: zsm-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 94.35 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ita-eng) config: ita-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 91.43 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cmn-eng) config: cmn-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 95.73333333333332 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (lvs-eng) config: lvs-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 79.38666666666667 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (glg-eng) config: glg-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 89.64 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ceb-eng) config: ceb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 21.257184628237262 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (bre-eng) config: bre-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 13.592316017316017 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ben-eng) config: ben-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 73.22666666666666 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (swg-eng) config: swg-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 51.711309523809526 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (arq-eng) config: arq-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 24.98790634904795 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kab-eng) config: kab-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 17.19218192918193 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (fra-eng) config: fra-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 93.26666666666667 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (por-eng) config: por-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 94.57333333333334 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tat-eng) config: tat-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 42.35127206127206 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (oci-eng) config: oci-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 51.12318903318903 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (pol-eng) config: pol-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 94.89999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (war-eng) config: war-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 23.856320290390055 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (aze-eng) config: aze-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 79.52833333333334 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (vie-eng) config: vie-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 95.93333333333334 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nno-eng) config: nno-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 90.75333333333333 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cha-eng) config: cha-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 30.802919708029197 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mhr-eng) config: mhr-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 15.984076294076294 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (dan-eng) config: dan-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 91.82666666666667 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ell-eng) config: ell-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 91.9 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (amh-eng) config: amh-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 76.36054421768706 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (pam-eng) config: pam-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 9.232711399711398 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hsb-eng) config: hsb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 45.640803181175855 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (srp-eng) config: srp-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 86.29 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (epo-eng) config: epo-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 88.90833333333332 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kzj-eng) config: kzj-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 11.11880248978075 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (awa-eng) config: awa-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 48.45839345839346 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (fao-eng) config: fao-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 65.68157033805888 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mal-eng) config: mal-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 94.63852498786997 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ile-eng) config: ile-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 81.67904761904761 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (bos-eng) config: bos-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 89.35969868173258 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cor-eng) config: cor-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 5.957229437229437 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cat-eng) config: cat-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 91.50333333333333 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (eus-eng) config: eus-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 63.75498778998778 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (yue-eng) config: yue-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 82.99190476190476 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (swe-eng) config: swe-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 92.95 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (dtp-eng) config: dtp-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 9.054042624042623 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kat-eng) config: kat-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 72.77064981488574 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (jpn-eng) config: jpn-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 93.14 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (csb-eng) config: csb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 29.976786498525627 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (xho-eng) config: xho-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 67.6525821596244 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (orv-eng) config: orv-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 33.12964812964813 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ind-eng) config: ind-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 92.30666666666666 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tuk-eng) config: tuk-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 34.36077879427633 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (max-eng) config: max-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 52.571845212690285 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (swh-eng) config: swh-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 58.13107263107262 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hin-eng) config: hin-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 93.33333333333333 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (dsb-eng) config: dsb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 42.87370133925458 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ber-eng) config: ber-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 20.394327616827614 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tam-eng) config: tam-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 84.29967426710098 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (slk-eng) config: slk-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 88.80666666666667 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tgl-eng) config: tgl-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 67.23062271062273 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ast-eng) config: ast-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 78.08398950131233 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mkd-eng) config: mkd-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 77.85166666666666 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (khm-eng) config: khm-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 67.63004001231148 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ces-eng) config: ces-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 89.77000000000001 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tzl-eng) config: tzl-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 40.2654503616042 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (urd-eng) config: urd-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 83.90333333333334 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ara-eng) config: ara-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 77.80666666666666 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kor-eng) config: kor-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 84.08 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (yid-eng) config: yid-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 60.43098607367475 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (fin-eng) config: fin-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 88.19333333333333 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tha-eng) config: tha-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 90.55352798053529 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (wuu-eng) config: wuu-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 88.44999999999999 - task: type: Clustering dataset: type: C-MTEB/ThuNewsClusteringP2P name: MTEB ThuNewsClusteringP2P config: default split: test revision: 5798586b105c0434e4f0fe5e767abe619442cf93 metrics: - type: v_measure value: 57.25416429643288 - task: type: Clustering dataset: type: C-MTEB/ThuNewsClusteringS2S name: MTEB ThuNewsClusteringS2S config: default split: test revision: 8a8b2caeda43f39e13c4bc5bea0f8a667896e10d metrics: - type: v_measure value: 56.616646560243524 - task: type: Retrieval dataset: type: mteb/touche2020 name: MTEB Touche2020 config: default split: test revision: a34f9a33db75fa0cbb21bb5cfc3dae8dc8bec93f metrics: - type: ndcg_at_10 value: 22.819 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 71.02579999999999 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 57.60045274476514 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 50.346666699466205 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_ap value: 71.88199004440489 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_ap value: 85.41587779677383 - task: type: Retrieval dataset: type: C-MTEB/VideoRetrieval name: MTEB VideoRetrieval config: default split: dev revision: 58c2597a5943a2ba48f4668c3b90d796283c5639 metrics: - type: ndcg_at_10 value: 72.792 - task: type: Classification dataset: type: C-MTEB/waimai-classification name: MTEB Waimai config: default split: test revision: 339287def212450dcaa9df8c22bf93e9980c7023 metrics: - type: accuracy value: 82.58000000000001 - task: type: Retrieval dataset: type: jinaai/xpqa name: MTEB XPQARetrieval (fr) config: fr split: test revision: c99d599f0a6ab9b85b065da6f9d94f9cf731679f metrics: - type: ndcg_at_10 value: 67.327 --- ## gte-multilingual-base The **gte-multilingual-base** model is the latest in the GTE (General Text Embedding) family of models, featuring several key attributes: - **High Performance**: Achieves state-of-the-art (SOTA) results in multilingual retrieval tasks and multi-task representation model evaluations when compared to models of similar size. - **Training Architecture**: Trained using an encoder-only transformers architecture, resulting in a smaller model size. Unlike previous models based on decode-only LLM architecture (e.g., gte-qwen2-1.5b-instruct), this model has lower hardware requirements for inference, offering a 10x increase in inference speed. - **Long Context**: Supports text lengths up to **8192** tokens. - **Multilingual Capability**: Supports over **70** languages. - **Elastic Dense Embedding**: Support elastic output dense representation while maintaining the effectiveness of downstream tasks, which significantly reduces storage costs and improves execution efficiency. - **Sparse Vectors**: In addition to dense representations, it can also generate sparse vectors. **Paper**: mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval ## Model Information - Model Size: 305M - Embedding Dimension: 768 - Max Input Tokens: 8192 ## Usage - **It is recommended to install xformers and enable unpadding for acceleration, refer to enable-unpadding-and-xformers.** - **How to use it offline: new-impl/discussions/2** - **How to use with TEI: refs/pr/7** ### Get Dense Embeddings with Transformers ### Use with sentence-transformers ### Use with infinity Usage via docker and infinity, MIT Licensed. ### Use with custom code to get dense embeddings and sparse token weights ## Evaluation We validated the performance of the **gte-multilingual-base** model on multiple downstream tasks, including multilingual retrieval, cross-lingual retrieval, long text retrieval, and general text representation evaluation on the MTEB Leaderboard, among others. ### Retrieval Task Retrieval results on MIRACL and MLDR (multilingual), MKQA (crosslingual), BEIR and LoCo (English). !image - Detail results on MLDR !image - Detail results on LoCo ### MTEB Results on MTEB English, Chinese, French, Polish !image **More detailed experimental results can be found in the paper**. ## Cloud API Services In addition to the open-source GTE series models, GTE series models are also available as commercial API services on Alibaba Cloud. - Embedding Models: Three versions of the text embedding models are available: text-embedding-v1/v2/v3, with v3 being the latest API service. - ReRank Models: The gte-rerank model service is available. Note that the models behind the commercial APIs are not entirely identical to the open-source models. ## Citation If you find our paper or models helpful, please consider cite:",
103
+ "model_explanation_gemini": "Generates multilingual sentence embeddings for tasks like clustering, semantic textual similarity (STS), classification, reranking, and retrieval across over 50 languages. \n\n**Features:** \n- Multilingual support (50+ languages) \n- Sentence embedding generation \n- Tasks: Clustering, STS, classification, reranking, retrieval \n- Performance metrics: v_measure, cos_sim_spearman, accuracy, ndcg_at_10, map \n\n**Comparison:** \nOutperforms or competes with",
104
+ "release_year": "2024",
105
+ "parameter_count": null,
106
+ "is_fine_tuned": false,
107
+ "category": "Embedding",
108
+ "api_enhanced": true
109
  }
model_data_json/Alibaba-NLP_gte-multilingual-reranker-base.json CHANGED
@@ -90,5 +90,10 @@
90
  "region:us"
91
  ],
92
  "description": "--- license: apache-2.0 pipeline_tag: text-ranking tags: - transformers - sentence-transformers - text-embeddings-inference language: - af - ar - az - be - bg - bn - ca - ceb - cs - cy - da - de - el - en - es - et - eu - fa - fi - fr - gl - gu - he - hi - hr - ht - hu - hy - id - is - it - ja - jv - ka - kk - km - kn - ko - ky - lo - lt - lv - mk - ml - mn - mr - ms - my - ne - nl - 'no' - pa - pl - pt - qu - ro - ru - si - sk - sl - so - sq - sr - sv - sw - ta - te - th - tl - tr - uk - ur - vi - yo - zh library_name: sentence-transformers --- ## gte-multilingual-reranker-base The **gte-multilingual-reranker-base** model is the first reranker model in the GTE family of models, featuring several key attributes: - **High Performance**: Achieves state-of-the-art (SOTA) results in multilingual retrieval tasks and multi-task representation model evaluations when compared to reranker models of similar size. - **Training Architecture**: Trained using an encoder-only transformers architecture, resulting in a smaller model size. Unlike previous models based on decode-only LLM architecture (e.g., gte-qwen2-1.5b-instruct), this model has lower hardware requirements for inference, offering a 10x increase in inference speed. - **Long Context**: Supports text lengths up to **8192** tokens. - **Multilingual Capability**: Supports over **70** languages. ## Model Information - Model Size: 306M - Max Input Tokens: 8192 ### Usage - **It is recommended to install xformers and enable unpadding for acceleration, refer to enable-unpadding-and-xformers.** - **How to use it offline: new-impl/discussions/2** Using Huggingface transformers (transformers>=4.36.0) Usage with infinity: Infinity, a MIT Licensed Inference RestAPI Server. ## Evaluation Results of reranking based on multiple text retreival datasets !image **More detailed experimental results can be found in the paper**. ## Cloud API Services In addition to the open-source GTE series models, GTE series models are also available as commercial API services on Alibaba Cloud. - Embedding Models: Three versions of the text embedding models are available: text-embedding-v1/v2/v3, with v3 being the latest API service. - ReRank Models: The gte-rerank model service is available. Note that the models behind the commercial APIs are not entirely identical to the open-source models. ## Citation If you find our paper or models helpful, please consider cite:",
93
- "model_explanation_gemini": "\"Multilingual text reranker model supporting 70+ languages with high performance, long-context handling (8192 tokens), and faster inference compared to similar-sized models.\"\n\nFeatures: \n- Multilingual (70+ languages) \n- Encoder-only transformer architecture \n- 306M parameters \n- 8192-token context length \n- 10x faster inference than comparable models \n- State-of-the-art performance in multilingual retrieval \n\nComparison: \nOutperforms similar-sized reranker models in"
 
 
 
 
 
94
  }
 
90
  "region:us"
91
  ],
92
  "description": "--- license: apache-2.0 pipeline_tag: text-ranking tags: - transformers - sentence-transformers - text-embeddings-inference language: - af - ar - az - be - bg - bn - ca - ceb - cs - cy - da - de - el - en - es - et - eu - fa - fi - fr - gl - gu - he - hi - hr - ht - hu - hy - id - is - it - ja - jv - ka - kk - km - kn - ko - ky - lo - lt - lv - mk - ml - mn - mr - ms - my - ne - nl - 'no' - pa - pl - pt - qu - ro - ru - si - sk - sl - so - sq - sr - sv - sw - ta - te - th - tl - tr - uk - ur - vi - yo - zh library_name: sentence-transformers --- ## gte-multilingual-reranker-base The **gte-multilingual-reranker-base** model is the first reranker model in the GTE family of models, featuring several key attributes: - **High Performance**: Achieves state-of-the-art (SOTA) results in multilingual retrieval tasks and multi-task representation model evaluations when compared to reranker models of similar size. - **Training Architecture**: Trained using an encoder-only transformers architecture, resulting in a smaller model size. Unlike previous models based on decode-only LLM architecture (e.g., gte-qwen2-1.5b-instruct), this model has lower hardware requirements for inference, offering a 10x increase in inference speed. - **Long Context**: Supports text lengths up to **8192** tokens. - **Multilingual Capability**: Supports over **70** languages. ## Model Information - Model Size: 306M - Max Input Tokens: 8192 ### Usage - **It is recommended to install xformers and enable unpadding for acceleration, refer to enable-unpadding-and-xformers.** - **How to use it offline: new-impl/discussions/2** Using Huggingface transformers (transformers>=4.36.0) Usage with infinity: Infinity, a MIT Licensed Inference RestAPI Server. ## Evaluation Results of reranking based on multiple text retreival datasets !image **More detailed experimental results can be found in the paper**. ## Cloud API Services In addition to the open-source GTE series models, GTE series models are also available as commercial API services on Alibaba Cloud. - Embedding Models: Three versions of the text embedding models are available: text-embedding-v1/v2/v3, with v3 being the latest API service. - ReRank Models: The gte-rerank model service is available. Note that the models behind the commercial APIs are not entirely identical to the open-source models. ## Citation If you find our paper or models helpful, please consider cite:",
93
+ "model_explanation_gemini": "\"Multilingual text reranker model supporting 70+ languages with high performance, long-context handling (8192 tokens), and faster inference compared to similar-sized models.\"\n\nFeatures: \n- Multilingual (70+ languages) \n- Encoder-only transformer architecture \n- 306M parameters \n- 8192-token context length \n- 10x faster inference than comparable models \n- State-of-the-art performance in multilingual retrieval \n\nComparison: \nOutperforms similar-sized reranker models in",
94
+ "release_year": "2024",
95
+ "parameter_count": null,
96
+ "is_fine_tuned": false,
97
+ "category": "Reranker",
98
+ "api_enhanced": true
99
  }
model_data_json/Alibaba-NLP_gte-reranker-modernbert-base.json CHANGED
@@ -20,5 +20,11 @@
20
  "region:us"
21
  ],
22
  "description": "--- license: apache-2.0 language: - en base_model: - answerdotai/ModernBERT-base base_model_relation: finetune pipeline_tag: text-ranking library_name: transformers tags: - sentence-transformers - transformers.js --- # gte-reranker-modernbert-base We are excited to introduce the series of models, which are built upon the latest modernBERT pre-trained encoder-only foundation models. The series models include both text embedding models and rerank models. The models demonstrates competitive performance in several text embedding and text retrieval evaluation tasks when compared to similar-scale models from the current open-source community. This includes assessments such as **MTEB**, **LoCO**, and **COIR** evaluation. ## Model Overview - Developed by: Tongyi Lab, Alibaba Group - Model Type: Text reranker - Primary Language: English - Model Size: 149M - Max Input Length: 8192 tokens ### Model list | Models | Language | Model Type | Model Size | Max Seq. Length | Dimension | MTEB-en | BEIR | LoCo | CoIR | |:--------------------------------------------------------------------------------------:|:--------:|:----------------------:|:----------:|:---------------:|:---------:|:-------:|:----:|:----:|:----:| | []( | English | text embedding | 149M | 8192 | 768 | 64.38 | 55.33 | 87.57 | 79.31 | | []( | English | text reranker | 149M | 8192 | - | - | 56.19 | 90.68 | 79.99 | ## Usage > [!TIP] > For and , if your GPU supports it, the efficient Flash Attention 2 will be used automatically if you have installed. It is not mandatory. > > Use with Use with : Before you start, install the sentence-transformers libraries: Use with ## Training Details The series of models follows the training scheme of the previous GTE models, with the only difference being that the pre-training language model base has been replaced from GTE-MLM to ModernBert. For more training details, please refer to our paper: mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval ## Evaluation ### MTEB The results of other models are retrieved from MTEB leaderboard. Given that all models in the series have a size of less than 1B parameters, we focused exclusively on the results of models under 1B from the MTEB leaderboard. | Model Name | Param Size (M) | Dimension | Sequence Length | Average (56) | Class. (12) | Clust. (11) | Pair Class. (3) | Reran. (4) | Retr. (15) | STS (10) | Summ. (1) | |:------------------------------------------------------------------------------------------------:|:--------------:|:---------:|:---------------:|:------------:|:-----------:|:---:|:---:|:---:|:---:|:-----------:|:--------:| | mxbai-embed-large-v1 | 335 | 1024 | 512 | 64.68 | 75.64 | 46.71 | 87.2 | 60.11 | 54.39 | 85 | 32.71 | | multilingual-e5-large-instruct | 560 | 1024 | 514 | 64.41 | 77.56 | 47.1 | 86.19 | 58.58 | 52.47 | 84.78 | 30.39 | | bge-large-en-v1.5 | 335 | 1024 | 512 | 64.23 | 75.97 | 46.08 | 87.12 | 60.03 | 54.29 | 83.11 | 31.61 | | gte-base-en-v1.5 | 137 | 768 | 8192 | 64.11 | 77.17 | 46.82 | 85.33 | 57.66 | 54.09 | 81.97 | 31.17 | | bge-base-en-v1.5 | 109 | 768 | 512 | 63.55 | 75.53 | 45.77 | 86.55 | 58.86 | 53.25 | 82.4 | 31.07 | | gte-large-en-v1.5 | 409 | 1024 | 8192 | 65.39 | 77.75 | 47.95 | 84.63 | 58.50 | 57.91 | 81.43 | 30.91 | | modernbert-embed-base | 149 | 768 | 8192 | 62.62 | 74.31 | 44.98 | 83.96 | 56.42 | 52.89 | 81.78 | 31.39 | | nomic-embed-text-v1.5 | | 768 | 8192 | 62.28 | 73.55 | 43.93 | 84.61 | 55.78 | 53.01| 81.94 | 30.4 | | gte-multilingual-base | 305 | 768 | 8192 | 61.4 | 70.89 | 44.31 | 84.24 | 57.47 |51.08 | 82.11 | 30.58 | | jina-embeddings-v3 | 572 | 1024 | 8192 | 65.51 | 82.58 |45.21 |84.01 |58.13 |53.88 | 85.81 | 29.71 | | **gte-modernbert-base** | 149 | 768 | 8192 | **64.38** | **76.99** | **46.47** | **85.93** | **59.24** | **55.33** | **81.57** | **30.68** | ### LoCo (Long Document Retrieval) | Model Name | Dimension | Sequence Length | Average (5) | QsmsumRetrieval | SummScreenRetrieval | QasperAbastractRetrieval | QasperTitleRetrieval | GovReportRetrieval | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | gte-qwen1.5-7b | 4096 | 32768 | 87.57 | 49.37 | 93.10 | 99.67 | 97.54 | 98.21 | | gte-large-v1.5 |1024 | 8192 | 86.71 | 44.55 | 92.61 | 99.82 | 97.81 | 98.74 | | gte-base-v1.5 | 768 | 8192 | 87.44 | 49.91 | 91.78 | 99.82 | 97.13 | 98.58 | | gte-modernbert-base | 768 | 8192 | 88.88 | 54.45 | 93.00 | 99.82 | 98.03 | 98.70 | | gte-reranker-modernbert-base | - | 8192 | 90.68 | 70.86 | 94.06 | 99.73 | 99.11 | 89.67 | ### COIR (Code Retrieval Task) | Model Name | Dimension | Sequence Length | Average(20) | CodeSearchNet-ccr-go | CodeSearchNet-ccr-java | CodeSearchNet-ccr-javascript | CodeSearchNet-ccr-php | CodeSearchNet-ccr-python | CodeSearchNet-ccr-ruby | CodeSearchNet-go | CodeSearchNet-java | CodeSearchNet-javascript | CodeSearchNet-php | CodeSearchNet-python | CodeSearchNet-ruby | apps | codefeedback-mt | codefeedback-st | codetrans-contest | codetrans-dl | cosqa | stackoverflow-qa | synthetic-text2sql | |:----:|:---:|:---:|:---:|:---:| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | | gte-modernbert-base | 768 | 8192 | 79.31 | 94.15 | 93.57 | 94.27 | 91.51 | 93.93 | 90.63 | 88.32 | 83.27 | 76.05 | 85.12 | 88.16 | 77.59 | 57.54 | 82.34 | 85.95 | 71.89 | 35.46 | 43.47 | 91.2 | 61.87 | | gte-reranker-modernbert-base | - | 8192 | 79.99 | 96.43 | 96.88 | 98.32 | 91.81 | 97.7 | 91.96 | 88.81 | 79.71 | 76.27 | 89.39 | 98.37 | 84.11 | 47.57 | 83.37 | 88.91 | 49.66 | 36.36 | 44.37 | 89.58 | 64.21 | ### BEIR | Model Name | Dimension | Sequence Length | Average(15) | ArguAna | ClimateFEVER | CQADupstackAndroidRetrieval | DBPedia | FEVER | FiQA2018 | HotpotQA | MSMARCO | NFCorpus | NQ | QuoraRetrieval | SCIDOCS | SciFact | Touche2020 | TRECCOVID | | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | | gte-modernbert-base | 768 | 8192 | 55.33 | 72.68 | 37.74 | 42.63 | 41.79 | 91.03 | 48.81 | 69.47 | 40.9 | 36.44 | 57.62 | 88.55 | 21.29 | 77.4 | 21.68 | 81.95 | | gte-reranker-modernbert-base | - | 8192 | 56.73 | 69.03 | 37.79 | 44.68 | 47.23 | 94.54 | 49.81 | 78.16 | 45.38 | 30.69 | 64.57 | 87.77 | 20.60 | 73.57 | 27.36 | 79.89 | ## Hiring We have open positions for **Research Interns** and **Full-Time Researchers** to join our team at Tongyi Lab. We are seeking passionate individuals with expertise in representation learning, LLM-driven information retrieval, Retrieval-Augmented Generation (RAG), and agent-based systems. Our team is located in the vibrant cities of **Beijing** and **Hangzhou**. If you are driven by curiosity and eager to make a meaningful impact through your work, we would love to hear from you. Please submit your resume along with a brief introduction to <a href=\"mailto:[email protected]\">[email protected]</a>. ## Citation If you find our paper or models helpful, feel free to give us a cite.",
23
- "model_explanation_gemini": "\"An English text reranker model based on ModernBERT, optimized for long-context retrieval tasks with competitive performance in benchmarks like MTEB, LoCo, and COIR.\"\n\nModel Features: \n- **Type**: Text reranker \n- **Base Model**: ModernBERT-base \n- **Language**: English \n- **Size**: 149M parameters \n- **Max Input Length**: 8192 tokens \n- **Benchmarks**: Strong performance in MTEB (64.38 avg), LoCo"
 
 
 
 
 
 
24
  }
 
20
  "region:us"
21
  ],
22
  "description": "--- license: apache-2.0 language: - en base_model: - answerdotai/ModernBERT-base base_model_relation: finetune pipeline_tag: text-ranking library_name: transformers tags: - sentence-transformers - transformers.js --- # gte-reranker-modernbert-base We are excited to introduce the series of models, which are built upon the latest modernBERT pre-trained encoder-only foundation models. The series models include both text embedding models and rerank models. The models demonstrates competitive performance in several text embedding and text retrieval evaluation tasks when compared to similar-scale models from the current open-source community. This includes assessments such as **MTEB**, **LoCO**, and **COIR** evaluation. ## Model Overview - Developed by: Tongyi Lab, Alibaba Group - Model Type: Text reranker - Primary Language: English - Model Size: 149M - Max Input Length: 8192 tokens ### Model list | Models | Language | Model Type | Model Size | Max Seq. Length | Dimension | MTEB-en | BEIR | LoCo | CoIR | |:--------------------------------------------------------------------------------------:|:--------:|:----------------------:|:----------:|:---------------:|:---------:|:-------:|:----:|:----:|:----:| | []( | English | text embedding | 149M | 8192 | 768 | 64.38 | 55.33 | 87.57 | 79.31 | | []( | English | text reranker | 149M | 8192 | - | - | 56.19 | 90.68 | 79.99 | ## Usage > [!TIP] > For and , if your GPU supports it, the efficient Flash Attention 2 will be used automatically if you have installed. It is not mandatory. > > Use with Use with : Before you start, install the sentence-transformers libraries: Use with ## Training Details The series of models follows the training scheme of the previous GTE models, with the only difference being that the pre-training language model base has been replaced from GTE-MLM to ModernBert. For more training details, please refer to our paper: mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval ## Evaluation ### MTEB The results of other models are retrieved from MTEB leaderboard. Given that all models in the series have a size of less than 1B parameters, we focused exclusively on the results of models under 1B from the MTEB leaderboard. | Model Name | Param Size (M) | Dimension | Sequence Length | Average (56) | Class. (12) | Clust. (11) | Pair Class. (3) | Reran. (4) | Retr. (15) | STS (10) | Summ. (1) | |:------------------------------------------------------------------------------------------------:|:--------------:|:---------:|:---------------:|:------------:|:-----------:|:---:|:---:|:---:|:---:|:-----------:|:--------:| | mxbai-embed-large-v1 | 335 | 1024 | 512 | 64.68 | 75.64 | 46.71 | 87.2 | 60.11 | 54.39 | 85 | 32.71 | | multilingual-e5-large-instruct | 560 | 1024 | 514 | 64.41 | 77.56 | 47.1 | 86.19 | 58.58 | 52.47 | 84.78 | 30.39 | | bge-large-en-v1.5 | 335 | 1024 | 512 | 64.23 | 75.97 | 46.08 | 87.12 | 60.03 | 54.29 | 83.11 | 31.61 | | gte-base-en-v1.5 | 137 | 768 | 8192 | 64.11 | 77.17 | 46.82 | 85.33 | 57.66 | 54.09 | 81.97 | 31.17 | | bge-base-en-v1.5 | 109 | 768 | 512 | 63.55 | 75.53 | 45.77 | 86.55 | 58.86 | 53.25 | 82.4 | 31.07 | | gte-large-en-v1.5 | 409 | 1024 | 8192 | 65.39 | 77.75 | 47.95 | 84.63 | 58.50 | 57.91 | 81.43 | 30.91 | | modernbert-embed-base | 149 | 768 | 8192 | 62.62 | 74.31 | 44.98 | 83.96 | 56.42 | 52.89 | 81.78 | 31.39 | | nomic-embed-text-v1.5 | | 768 | 8192 | 62.28 | 73.55 | 43.93 | 84.61 | 55.78 | 53.01| 81.94 | 30.4 | | gte-multilingual-base | 305 | 768 | 8192 | 61.4 | 70.89 | 44.31 | 84.24 | 57.47 |51.08 | 82.11 | 30.58 | | jina-embeddings-v3 | 572 | 1024 | 8192 | 65.51 | 82.58 |45.21 |84.01 |58.13 |53.88 | 85.81 | 29.71 | | **gte-modernbert-base** | 149 | 768 | 8192 | **64.38** | **76.99** | **46.47** | **85.93** | **59.24** | **55.33** | **81.57** | **30.68** | ### LoCo (Long Document Retrieval) | Model Name | Dimension | Sequence Length | Average (5) | QsmsumRetrieval | SummScreenRetrieval | QasperAbastractRetrieval | QasperTitleRetrieval | GovReportRetrieval | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | gte-qwen1.5-7b | 4096 | 32768 | 87.57 | 49.37 | 93.10 | 99.67 | 97.54 | 98.21 | | gte-large-v1.5 |1024 | 8192 | 86.71 | 44.55 | 92.61 | 99.82 | 97.81 | 98.74 | | gte-base-v1.5 | 768 | 8192 | 87.44 | 49.91 | 91.78 | 99.82 | 97.13 | 98.58 | | gte-modernbert-base | 768 | 8192 | 88.88 | 54.45 | 93.00 | 99.82 | 98.03 | 98.70 | | gte-reranker-modernbert-base | - | 8192 | 90.68 | 70.86 | 94.06 | 99.73 | 99.11 | 89.67 | ### COIR (Code Retrieval Task) | Model Name | Dimension | Sequence Length | Average(20) | CodeSearchNet-ccr-go | CodeSearchNet-ccr-java | CodeSearchNet-ccr-javascript | CodeSearchNet-ccr-php | CodeSearchNet-ccr-python | CodeSearchNet-ccr-ruby | CodeSearchNet-go | CodeSearchNet-java | CodeSearchNet-javascript | CodeSearchNet-php | CodeSearchNet-python | CodeSearchNet-ruby | apps | codefeedback-mt | codefeedback-st | codetrans-contest | codetrans-dl | cosqa | stackoverflow-qa | synthetic-text2sql | |:----:|:---:|:---:|:---:|:---:| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | | gte-modernbert-base | 768 | 8192 | 79.31 | 94.15 | 93.57 | 94.27 | 91.51 | 93.93 | 90.63 | 88.32 | 83.27 | 76.05 | 85.12 | 88.16 | 77.59 | 57.54 | 82.34 | 85.95 | 71.89 | 35.46 | 43.47 | 91.2 | 61.87 | | gte-reranker-modernbert-base | - | 8192 | 79.99 | 96.43 | 96.88 | 98.32 | 91.81 | 97.7 | 91.96 | 88.81 | 79.71 | 76.27 | 89.39 | 98.37 | 84.11 | 47.57 | 83.37 | 88.91 | 49.66 | 36.36 | 44.37 | 89.58 | 64.21 | ### BEIR | Model Name | Dimension | Sequence Length | Average(15) | ArguAna | ClimateFEVER | CQADupstackAndroidRetrieval | DBPedia | FEVER | FiQA2018 | HotpotQA | MSMARCO | NFCorpus | NQ | QuoraRetrieval | SCIDOCS | SciFact | Touche2020 | TRECCOVID | | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | | gte-modernbert-base | 768 | 8192 | 55.33 | 72.68 | 37.74 | 42.63 | 41.79 | 91.03 | 48.81 | 69.47 | 40.9 | 36.44 | 57.62 | 88.55 | 21.29 | 77.4 | 21.68 | 81.95 | | gte-reranker-modernbert-base | - | 8192 | 56.73 | 69.03 | 37.79 | 44.68 | 47.23 | 94.54 | 49.81 | 78.16 | 45.38 | 30.69 | 64.57 | 87.77 | 20.60 | 73.57 | 27.36 | 79.89 | ## Hiring We have open positions for **Research Interns** and **Full-Time Researchers** to join our team at Tongyi Lab. We are seeking passionate individuals with expertise in representation learning, LLM-driven information retrieval, Retrieval-Augmented Generation (RAG), and agent-based systems. Our team is located in the vibrant cities of **Beijing** and **Hangzhou**. If you are driven by curiosity and eager to make a meaningful impact through your work, we would love to hear from you. Please submit your resume along with a brief introduction to <a href=\"mailto:[email protected]\">[email protected]</a>. ## Citation If you find our paper or models helpful, feel free to give us a cite.",
23
+ "model_explanation_gemini": "\"An English text reranker model based on ModernBERT, optimized for long-context retrieval tasks with competitive performance in benchmarks like MTEB, LoCo, and COIR.\"\n\nModel Features: \n- **Type**: Text reranker \n- **Base Model**: ModernBERT-base \n- **Language**: English \n- **Size**: 149M parameters \n- **Max Input Length**: 8192 tokens \n- **Benchmarks**: Strong performance in MTEB (64.38 avg), LoCo",
24
+ "release_year": "2023",
25
+ "parameter_count": null,
26
+ "is_fine_tuned": true,
27
+ "category": "Reranker",
28
+ "model_family": "BERT",
29
+ "api_enhanced": true
30
  }
model_data_json/Alpha-VLLM_Lumina-Image-2.0.json CHANGED
@@ -11,5 +11,10 @@
11
  "region:us"
12
  ],
13
  "description": "--- pipeline_tag: text-to-image library_name: diffusers license: apache-2.0 --- !Lumina-Image 2.0 Lumina-Image-2.0 is a 2 billion parameter flow-based diffusion transformer capable of generating images from text descriptions. For more information, visit our GitHub. ## Gradio Demo We provide an official Gradio demo. You can use the link we provided to try it out. ## Usage This is a Hugging Face Diffusers implementation of the paper Lumina-Image 2.0: A Unified and Efficient Image Generative Framework. ## Citation If you find the provided code or models useful for your research, consider citing them as:",
14
- "model_explanation_gemini": "Generates images from text descriptions using a 2B parameter flow-based diffusion transformer. \n\nFeatures: \n- Text-to-image generation \n- 2 billion parameter architecture \n- Flow-based diffusion transformer \n- Hugging Face Diffusers implementation \n\nComparison: Not explicitly compared to other models in the description."
 
 
 
 
 
15
  }
 
11
  "region:us"
12
  ],
13
  "description": "--- pipeline_tag: text-to-image library_name: diffusers license: apache-2.0 --- !Lumina-Image 2.0 Lumina-Image-2.0 is a 2 billion parameter flow-based diffusion transformer capable of generating images from text descriptions. For more information, visit our GitHub. ## Gradio Demo We provide an official Gradio demo. You can use the link we provided to try it out. ## Usage This is a Hugging Face Diffusers implementation of the paper Lumina-Image 2.0: A Unified and Efficient Image Generative Framework. ## Citation If you find the provided code or models useful for your research, consider citing them as:",
14
+ "model_explanation_gemini": "Generates images from text descriptions using a 2B parameter flow-based diffusion transformer. \n\nFeatures: \n- Text-to-image generation \n- 2 billion parameter architecture \n- Flow-based diffusion transformer \n- Hugging Face Diffusers implementation \n\nComparison: Not explicitly compared to other models in the description.",
15
+ "release_year": "2024",
16
+ "parameter_count": null,
17
+ "is_fine_tuned": false,
18
+ "category": "Diffusion",
19
+ "api_enhanced": true
20
  }
model_data_json/Alpha-VLLM_Lumina-mGPT-7B-768.json CHANGED
@@ -14,5 +14,11 @@
14
  "region:us"
15
  ],
16
  "description": "--- pipeline_tag: any-to-any tags: - Any2Any --- **Lumina-mGPT** is a family of multimodal autoregressive models capable of various vision and language tasks, particularly excelling in generating flexible photorealistic images from text descriptions. ![Lumina-mGPT]( !image/png # Usage We provide the implementation of Lumina-mGPT, as well as sampling code, in our github repository.",
17
- "model_explanation_gemini": "Generates photorealistic images from text descriptions using multimodal autoregressive capabilities.\n\nFeatures: \n- Multimodal (vision and language) \n- Autoregressive \n- Text-to-image generation \n- Photorealistic output \n\nComparison: \nExcels in flexible photorealistic image generation compared to other multimodal models, with a focus on autoregressive any-to-any task handling."
 
 
 
 
 
 
18
  }
 
14
  "region:us"
15
  ],
16
  "description": "--- pipeline_tag: any-to-any tags: - Any2Any --- **Lumina-mGPT** is a family of multimodal autoregressive models capable of various vision and language tasks, particularly excelling in generating flexible photorealistic images from text descriptions. ![Lumina-mGPT]( !image/png # Usage We provide the implementation of Lumina-mGPT, as well as sampling code, in our github repository.",
17
+ "model_explanation_gemini": "Generates photorealistic images from text descriptions using multimodal autoregressive capabilities.\n\nFeatures: \n- Multimodal (vision and language) \n- Autoregressive \n- Text-to-image generation \n- Photorealistic output \n\nComparison: \nExcels in flexible photorealistic image generation compared to other multimodal models, with a focus on autoregressive any-to-any task handling.",
18
+ "release_year": "2024",
19
+ "parameter_count": "7B",
20
+ "is_fine_tuned": false,
21
+ "category": "Large Language Model",
22
+ "model_family": "GPT",
23
+ "api_enhanced": true
24
  }
model_data_json/Alvenir_bert-punct-restoration-da.json CHANGED
@@ -15,5 +15,11 @@
15
  "region:us"
16
  ],
17
  "description": "--- language: da tags: - bert - punctuation restoration license: apache-2.0 datasets: - custom --- # Bert Punctuation Restoration Danish This model performs the punctuation restoration task in Danish. The method used is sequence classification similar to how NER models are trained. ## Model description TODO ### How to use The model requires some additional inference code, hence we created an awesome little pip package for inference. The inference code is based on the pipeline from huggingface. First, install the little package by running Then restoration is as simple as the following snippet: ## Training data To Do ## Training procedure To Do ### Preprocessing TODO ## Evaluation results TODO",
18
- "model_explanation_gemini": "Restores punctuation in Danish text using BERT-based sequence classification.\n\nFeatures: \n- Language: Danish (da) \n- Task: Punctuation restoration \n- Architecture: BERT \n- Training method: Sequence classification (NER-like approach) \n- License: Apache-2.0 \n- Custom dataset \n\n(No comparative analysis provided in the original description)"
 
 
 
 
 
 
19
  }
 
15
  "region:us"
16
  ],
17
  "description": "--- language: da tags: - bert - punctuation restoration license: apache-2.0 datasets: - custom --- # Bert Punctuation Restoration Danish This model performs the punctuation restoration task in Danish. The method used is sequence classification similar to how NER models are trained. ## Model description TODO ### How to use The model requires some additional inference code, hence we created an awesome little pip package for inference. The inference code is based on the pipeline from huggingface. First, install the little package by running Then restoration is as simple as the following snippet: ## Training data To Do ## Training procedure To Do ### Preprocessing TODO ## Evaluation results TODO",
18
+ "model_explanation_gemini": "Restores punctuation in Danish text using BERT-based sequence classification.\n\nFeatures: \n- Language: Danish (da) \n- Task: Punctuation restoration \n- Architecture: BERT \n- Training method: Sequence classification (NER-like approach) \n- License: Apache-2.0 \n- Custom dataset \n\n(No comparative analysis provided in the original description)",
19
+ "release_year": null,
20
+ "parameter_count": null,
21
+ "is_fine_tuned": false,
22
+ "category": "Named Entity Recognition",
23
+ "model_family": "BERT",
24
+ "api_enhanced": true
25
  }
model_data_json/Amrrs_wav2vec2-large-xlsr-53-tamil.json CHANGED
@@ -18,5 +18,11 @@
18
  "region:us"
19
  ],
20
  "description": "--- language: ta datasets: - common_voice tags: - audio - automatic-speech-recognition - speech - xlsr-fine-tuning-week license: apache-2.0 model-index: - name: XLSR Wav2Vec2 Tamil by Amrrs results: - task: name: Speech Recognition type: automatic-speech-recognition dataset: name: Common Voice ta type: common_voice args: ta metrics: - name: Test WER type: wer value: 82.94 --- # Wav2Vec2-Large-XLSR-53-Tamil Fine-tuned facebook/wav2vec2-large-xlsr-53 in Tamil using the Common Voice When using this model, make sure that your speech input is sampled at 16kHz. ## Usage The model can be used directly (without a language model) as follows: ## Evaluation The model can be evaluated as follows on the {language} test data of Common Voice. **Test Result**: 82.94 % ## Training The Common Voice , datasets were used for training. The script used for training can be found here",
21
- "model_explanation_gemini": "\"Fine-tuned for Tamil automatic speech recognition using Common Voice data, achieving an 82.94% WER.\"\n\nFeatures: \n- Language: Tamil (ta) \n- Base Model: facebook/wav2vec2-large-xlsr-53 \n- Task: Automatic Speech Recognition (ASR) \n- Training Data: Common Voice dataset \n- Input Requirement: 16kHz sampled audio \n- Metric: 82.94% Word Error Rate (WER) \n\nComparison:"
 
 
 
 
 
 
22
  }
 
18
  "region:us"
19
  ],
20
  "description": "--- language: ta datasets: - common_voice tags: - audio - automatic-speech-recognition - speech - xlsr-fine-tuning-week license: apache-2.0 model-index: - name: XLSR Wav2Vec2 Tamil by Amrrs results: - task: name: Speech Recognition type: automatic-speech-recognition dataset: name: Common Voice ta type: common_voice args: ta metrics: - name: Test WER type: wer value: 82.94 --- # Wav2Vec2-Large-XLSR-53-Tamil Fine-tuned facebook/wav2vec2-large-xlsr-53 in Tamil using the Common Voice When using this model, make sure that your speech input is sampled at 16kHz. ## Usage The model can be used directly (without a language model) as follows: ## Evaluation The model can be evaluated as follows on the {language} test data of Common Voice. **Test Result**: 82.94 % ## Training The Common Voice , datasets were used for training. The script used for training can be found here",
21
+ "model_explanation_gemini": "\"Fine-tuned for Tamil automatic speech recognition using Common Voice data, achieving an 82.94% WER.\"\n\nFeatures: \n- Language: Tamil (ta) \n- Base Model: facebook/wav2vec2-large-xlsr-53 \n- Task: Automatic Speech Recognition (ASR) \n- Training Data: Common Voice dataset \n- Input Requirement: 16kHz sampled audio \n- Metric: 82.94% Word Error Rate (WER) \n\nComparison:",
22
+ "release_year": null,
23
+ "parameter_count": null,
24
+ "is_fine_tuned": false,
25
+ "category": "Audio",
26
+ "model_family": "Wav2Vec",
27
+ "api_enhanced": true
28
  }
model_data_json/Angelakeke_RaTE-NER-Deberta.json CHANGED
@@ -15,5 +15,11 @@
15
  "region:us"
16
  ],
17
  "description": "--- license: mit language: - en tags: - medical - radiology model-index: - name: rate-ner-rad results: [] pipeline_tag: token-classification widget: - text: No suspicious focal mass lesion is seen in the left kidney. example_title: Example in radiopaedia --- # RaTE-NER-Deberta This model is a fine-tuned version of DeBERTa on the RaTE-NER dataset. ## Model description This model is trained to serve the RaTEScore metric, if you are interested in our pipeline, please refer to our paper and Github. This model also can be used to extract **Abnormality, Non-Abnormality, Anatomy, Disease, Non-Disease** in medical radiology reports. ## Usage <details> <summary> Click to expand the usage of this model. </summary> <pre><code> from transformers import AutoTokenizer, AutoModelForTokenClassification import torch def post_process(tokenized_text, predicted_entities, tokenizer): entity_spans = [] start = end = None entity_type = None for i, (token, label) in enumerate(zip(tokenized_text, predicted_entities[:len(tokenized_text)])): if token in [\"[CLS]\", \"[SEP]\"]: continue if label != \"O\" and i < len(predicted_entities) - 1: if label.startswith(\"B-\") and predicted_entities[i+1].startswith(\"I-\"): start = i entity_type = label[2:] elif label.startswith(\"B-\") and predicted_entities[i+1].startswith(\"B-\"): start = i end = i entity_spans.append((start, end, label[2:])) start = i entity_type = label[2:] elif label.startswith(\"B-\") and predicted_entities[i+1].startswith(\"O\"): start = i end = i entity_spans.append((start, end, label[2:])) start = end = None entity_type = None elif label.startswith(\"I-\") and predicted_entities[i+1].startswith(\"B-\"): end = i if start is not None: entity_spans.append((start, end, entity_type)) start = i entity_type = label[2:] elif label.startswith(\"I-\") and predicted_entities[i+1].startswith(\"O\"): end = i if start is not None: entity_spans.append((start, end, entity_type)) start = end = None entity_type = None if start is not None and end is None: end = len(tokenized_text) - 2 entity_spans.append((start, end, entity_type)) save_pair = [] for start, end, entity_type in entity_spans: entity_str = tokenizer.convert_tokens_to_string(tokenized_text[start:end+1]) save_pair.append((entity_str, entity_type)) return save_pair def run_ner(texts, idx2label, tokenizer, model, device): inputs = tokenizer(texts, max_length=512, padding=True, truncation=True, return_tensors=\"pt\").to(device) with torch.no_grad(): outputs = model(**inputs) predicted_labels = torch.argmax(outputs.logits, dim=2).tolist() save_pairs = [] for i in range(len(texts)): predicted_entities = [idx2label[label] for label in predicted_labels[i]] non_pad_mask = inputs[\"input_ids\"][i] != tokenizer.pad_token_id non_pad_length = non_pad_mask.sum().item() non_pad_input_ids = inputs[\"input_ids\"][i][:non_pad_length] tokenized_text = tokenizer.convert_ids_to_tokens(non_pad_input_ids) save_pair = post_process(tokenized_text, predicted_entities, tokenizer) if i == 0: save_pairs = save_pair else: save_pairs.extend(save_pair) return save_pairs ner_labels = ['B-ABNORMALITY', 'I-ABNORMALITY', 'B-NON-ABNORMALITY', 'I-NON-ABNORMALITY', 'B-DISEASE', 'I-DISEASE', 'B-NON-DISEASE', 'I-NON-DISEASE', 'B-ANATOMY', 'I-ANATOMY', 'O'] idx2label = {i: label for i, label in enumerate(ner_labels)} tokenizer = AutoTokenizer.from_pretrained('Angelakeke/RaTE-NER-Deberta') model = AutoModelForTokenClassification.from_pretrained('Angelakeke/RaTE-NER-Deberta') device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\") model.to(device) model.eval() We recommend to inference by sentences. text = \"\" texts = text.split('. ') save_pair = run_ner(texts, idx2label, tokenizer, model, device) </code></pre> </details> ## Author Author: Weike Zhao If you have any questions, please feel free to contact [email protected]. ## Citation",
18
- "model_explanation_gemini": "Extracts medical entities (Abnormality, Non-Abnormality, Anatomy, Disease, Non-Disease) from radiology reports using a fine-tuned DeBERTa model trained on the RaTE-NER dataset. \n\n**Features**: \n- Token classification for medical NER \n- Supports entity types: Abnormality, Non-Abnormality, Anatomy, Disease, Non-Disease \n- Optimized for radiology reports \n- Built on DeBERTa architecture \n\n**Comparison**:"
 
 
 
 
 
 
19
  }
 
15
  "region:us"
16
  ],
17
  "description": "--- license: mit language: - en tags: - medical - radiology model-index: - name: rate-ner-rad results: [] pipeline_tag: token-classification widget: - text: No suspicious focal mass lesion is seen in the left kidney. example_title: Example in radiopaedia --- # RaTE-NER-Deberta This model is a fine-tuned version of DeBERTa on the RaTE-NER dataset. ## Model description This model is trained to serve the RaTEScore metric, if you are interested in our pipeline, please refer to our paper and Github. This model also can be used to extract **Abnormality, Non-Abnormality, Anatomy, Disease, Non-Disease** in medical radiology reports. ## Usage <details> <summary> Click to expand the usage of this model. </summary> <pre><code> from transformers import AutoTokenizer, AutoModelForTokenClassification import torch def post_process(tokenized_text, predicted_entities, tokenizer): entity_spans = [] start = end = None entity_type = None for i, (token, label) in enumerate(zip(tokenized_text, predicted_entities[:len(tokenized_text)])): if token in [\"[CLS]\", \"[SEP]\"]: continue if label != \"O\" and i < len(predicted_entities) - 1: if label.startswith(\"B-\") and predicted_entities[i+1].startswith(\"I-\"): start = i entity_type = label[2:] elif label.startswith(\"B-\") and predicted_entities[i+1].startswith(\"B-\"): start = i end = i entity_spans.append((start, end, label[2:])) start = i entity_type = label[2:] elif label.startswith(\"B-\") and predicted_entities[i+1].startswith(\"O\"): start = i end = i entity_spans.append((start, end, label[2:])) start = end = None entity_type = None elif label.startswith(\"I-\") and predicted_entities[i+1].startswith(\"B-\"): end = i if start is not None: entity_spans.append((start, end, entity_type)) start = i entity_type = label[2:] elif label.startswith(\"I-\") and predicted_entities[i+1].startswith(\"O\"): end = i if start is not None: entity_spans.append((start, end, entity_type)) start = end = None entity_type = None if start is not None and end is None: end = len(tokenized_text) - 2 entity_spans.append((start, end, entity_type)) save_pair = [] for start, end, entity_type in entity_spans: entity_str = tokenizer.convert_tokens_to_string(tokenized_text[start:end+1]) save_pair.append((entity_str, entity_type)) return save_pair def run_ner(texts, idx2label, tokenizer, model, device): inputs = tokenizer(texts, max_length=512, padding=True, truncation=True, return_tensors=\"pt\").to(device) with torch.no_grad(): outputs = model(**inputs) predicted_labels = torch.argmax(outputs.logits, dim=2).tolist() save_pairs = [] for i in range(len(texts)): predicted_entities = [idx2label[label] for label in predicted_labels[i]] non_pad_mask = inputs[\"input_ids\"][i] != tokenizer.pad_token_id non_pad_length = non_pad_mask.sum().item() non_pad_input_ids = inputs[\"input_ids\"][i][:non_pad_length] tokenized_text = tokenizer.convert_ids_to_tokens(non_pad_input_ids) save_pair = post_process(tokenized_text, predicted_entities, tokenizer) if i == 0: save_pairs = save_pair else: save_pairs.extend(save_pair) return save_pairs ner_labels = ['B-ABNORMALITY', 'I-ABNORMALITY', 'B-NON-ABNORMALITY', 'I-NON-ABNORMALITY', 'B-DISEASE', 'I-DISEASE', 'B-NON-DISEASE', 'I-NON-DISEASE', 'B-ANATOMY', 'I-ANATOMY', 'O'] idx2label = {i: label for i, label in enumerate(ner_labels)} tokenizer = AutoTokenizer.from_pretrained('Angelakeke/RaTE-NER-Deberta') model = AutoModelForTokenClassification.from_pretrained('Angelakeke/RaTE-NER-Deberta') device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\") model.to(device) model.eval() We recommend to inference by sentences. text = \"\" texts = text.split('. ') save_pair = run_ner(texts, idx2label, tokenizer, model, device) </code></pre> </details> ## Author Author: Weike Zhao If you have any questions, please feel free to contact [email protected]. ## Citation",
18
+ "model_explanation_gemini": "Extracts medical entities (Abnormality, Non-Abnormality, Anatomy, Disease, Non-Disease) from radiology reports using a fine-tuned DeBERTa model trained on the RaTE-NER dataset. \n\n**Features**: \n- Token classification for medical NER \n- Supports entity types: Abnormality, Non-Abnormality, Anatomy, Disease, Non-Disease \n- Optimized for radiology reports \n- Built on DeBERTa architecture \n\n**Comparison**:",
19
+ "release_year": null,
20
+ "parameter_count": null,
21
+ "is_fine_tuned": false,
22
+ "category": "Named Entity Recognition",
23
+ "model_family": "BERT",
24
+ "api_enhanced": true
25
  }
model_data_json/Anonymous-Pineapple_roberta-base.json CHANGED
@@ -12,5 +12,11 @@
12
  "region:us"
13
  ],
14
  "description": "--- library_name: transformers tags: [] --- # Model Card for Model ID <!-- Provide a quick summary of what the model is/does. --> ## Model Details ### Model Description <!-- Provide a longer summary of what this model is. --> This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated. - **Developed by:** [More Information Needed] - **Funded by [optional]:** [More Information Needed] - **Shared by [optional]:** [More Information Needed] - **Model type:** [More Information Needed] - **Language(s) (NLP):** [More Information Needed] - **License:** [More Information Needed] - **Finetuned from model [optional]:** [More Information Needed] ### Model Sources [optional] <!-- Provide the basic links for the model. --> - **Repository:** [More Information Needed] - **Paper [optional]:** [More Information Needed] - **Demo [optional]:** [More Information Needed] ## Uses <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. --> ### Direct Use <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. --> [More Information Needed] ### Downstream Use [optional] <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app --> [More Information Needed] ### Out-of-Scope Use <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. --> [More Information Needed] ## Bias, Risks, and Limitations <!-- This section is meant to convey both technical and sociotechnical limitations. --> [More Information Needed] ### Recommendations <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. --> Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. ## How to Get Started with the Model Use the code below to get started with the model. [More Information Needed] ## Training Details ### Training Data <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. --> [More Information Needed] ### Training Procedure <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. --> #### Preprocessing [optional] [More Information Needed] #### Training Hyperparameters - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision --> #### Speeds, Sizes, Times [optional] <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. --> [More Information Needed] ## Evaluation <!-- This section describes the evaluation protocols and provides the results. --> ### Testing Data, Factors & Metrics #### Testing Data <!-- This should link to a Dataset Card if possible. --> [More Information Needed] #### Factors <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. --> [More Information Needed] #### Metrics <!-- These are the evaluation metrics being used, ideally with a description of why. --> [More Information Needed] ### Results [More Information Needed] #### Summary ## Model Examination [optional] <!-- Relevant interpretability work for the model goes here --> [More Information Needed] ## Environmental Impact <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly --> Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). - **Hardware Type:** [More Information Needed] - **Hours used:** [More Information Needed] - **Cloud Provider:** [More Information Needed] - **Compute Region:** [More Information Needed] - **Carbon Emitted:** [More Information Needed] ## Technical Specifications [optional] ### Model Architecture and Objective [More Information Needed] ### Compute Infrastructure [More Information Needed] #### Hardware [More Information Needed] #### Software [More Information Needed] ## Citation [optional] <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. --> **BibTeX:** [More Information Needed] **APA:** [More Information Needed] ## Glossary [optional] <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. --> [More Information Needed] ## More Information [optional] [More Information Needed] ## Model Card Authors [optional] [More Information Needed] ## Model Card Contact [More Information Needed]",
15
- "model_explanation_gemini": "The model lacks sufficient details to determine its purpose, features, or functionality, as the description contains placeholder text and no specific information.\n\nFeatures: \n- No identifiable features or tasks are listed in the description. \n\nComparison: \n- Not applicable due to insufficient information about the model's capabilities or architecture."
 
 
 
 
 
 
16
  }
 
12
  "region:us"
13
  ],
14
  "description": "--- library_name: transformers tags: [] --- # Model Card for Model ID <!-- Provide a quick summary of what the model is/does. --> ## Model Details ### Model Description <!-- Provide a longer summary of what this model is. --> This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated. - **Developed by:** [More Information Needed] - **Funded by [optional]:** [More Information Needed] - **Shared by [optional]:** [More Information Needed] - **Model type:** [More Information Needed] - **Language(s) (NLP):** [More Information Needed] - **License:** [More Information Needed] - **Finetuned from model [optional]:** [More Information Needed] ### Model Sources [optional] <!-- Provide the basic links for the model. --> - **Repository:** [More Information Needed] - **Paper [optional]:** [More Information Needed] - **Demo [optional]:** [More Information Needed] ## Uses <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. --> ### Direct Use <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. --> [More Information Needed] ### Downstream Use [optional] <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app --> [More Information Needed] ### Out-of-Scope Use <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. --> [More Information Needed] ## Bias, Risks, and Limitations <!-- This section is meant to convey both technical and sociotechnical limitations. --> [More Information Needed] ### Recommendations <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. --> Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. ## How to Get Started with the Model Use the code below to get started with the model. [More Information Needed] ## Training Details ### Training Data <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. --> [More Information Needed] ### Training Procedure <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. --> #### Preprocessing [optional] [More Information Needed] #### Training Hyperparameters - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision --> #### Speeds, Sizes, Times [optional] <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. --> [More Information Needed] ## Evaluation <!-- This section describes the evaluation protocols and provides the results. --> ### Testing Data, Factors & Metrics #### Testing Data <!-- This should link to a Dataset Card if possible. --> [More Information Needed] #### Factors <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. --> [More Information Needed] #### Metrics <!-- These are the evaluation metrics being used, ideally with a description of why. --> [More Information Needed] ### Results [More Information Needed] #### Summary ## Model Examination [optional] <!-- Relevant interpretability work for the model goes here --> [More Information Needed] ## Environmental Impact <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly --> Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). - **Hardware Type:** [More Information Needed] - **Hours used:** [More Information Needed] - **Cloud Provider:** [More Information Needed] - **Compute Region:** [More Information Needed] - **Carbon Emitted:** [More Information Needed] ## Technical Specifications [optional] ### Model Architecture and Objective [More Information Needed] ### Compute Infrastructure [More Information Needed] #### Hardware [More Information Needed] #### Software [More Information Needed] ## Citation [optional] <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. --> **BibTeX:** [More Information Needed] **APA:** [More Information Needed] ## Glossary [optional] <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. --> [More Information Needed] ## More Information [optional] [More Information Needed] ## Model Card Authors [optional] [More Information Needed] ## Model Card Contact [More Information Needed]",
15
+ "model_explanation_gemini": "The model lacks sufficient details to determine its purpose, features, or functionality, as the description contains placeholder text and no specific information.\n\nFeatures: \n- No identifiable features or tasks are listed in the description. \n\nComparison: \n- Not applicable due to insufficient information about the model's capabilities or architecture.",
16
+ "release_year": "2019",
17
+ "parameter_count": null,
18
+ "is_fine_tuned": false,
19
+ "category": "Other",
20
+ "model_family": "BERT",
21
+ "api_enhanced": true
22
  }
model_data_json/Anzhc_Anzhcs_YOLOs.json CHANGED
@@ -14,5 +14,10 @@
14
  "region:us"
15
  ],
16
  "description": "--- license: agpl-3.0 tags: - pytorch - YOLOv8 - art - Ultralytics base_model: - Ultralytics/YOLOv8 - Ultralytics/YOLO11 library_name: ultralytics pipeline_tag: object-detection metrics: - mAP50 - mAP50-95 --- # Description YOLOs in this repo are trained with datasets that i have annotated myself, or with the help of my friends(They will be appropriately mentioned in those cases). YOLOs on open datasets will have their own pages. #### Want to request a model? Im open to commissions, hit me up in Discord - **anzhc** > ## **Table of Contents** > - **Face segmentation** > - *Universal* > - *Real Face, gendered* > - **Eyes segmentation** > - **Head+Hair segmentation** > - **Breasts** > - *Breasts Segmentation* > - *Breast size detection/classification* > - **Drone detection** > - **Anime Art Scoring** > - **Support** P.S. All model names in tables have download links attached :3 ## Available Models ### Face segmentation: #### Universal: Series of models aiming at detecting and segmenting face accurately. Trained on closed dataset i annotated myself. | Model | Target | mAP 50 | mAP 50-95 |Classes |Dataset size|Training Resolution| |----------------------------------------------------------------------------|-----------------------|--------------------------------|---------------------------|---------------|------------|-------------------| | Anzhc Face -seg.pt | Face: illustration, real | LOST DATA | LOST DATA |2(male, female)|LOST DATA| 640| | Anzhc Face seg 640 v2 y8n.pt | Face: illustration, real |0.791(box) 0.765(mask) | 0.608(box) 0.445(mask)|1(face) |~500| 640| | Anzhc Face seg 768 v2 y8n.pt | Face: illustration, real | 0.765(box) 0.748(mask) | 0.572(box) 0.431(mask) |1(face) |~500| 768| | Anzhc Face seg 768MS v2 y8n.pt | Face: illustration, real | 0.807(box) 0.770(mask) | 0.601(box) 0.432(mask) |1(face) |~500| 768|(Multi-scale)| | Anzhc Face seg 1024 v2 y8n.pt | Face: illustration, real | 0.768(box) 0.740(mask) | 0.557(box) 0.394(mask)|1(face) |~500| 1024| | Anzhc Face seg 640 v3 y11n.pt | Face: illustration | 0.882(box) 0.871(mask) | 0.689(box) 0.570(mask)|1(face) |~660| 640| UPDATE: v3 model has a bit different face target compared to v2, so stats of v2 models suffer compared to v3 in newer benchmark, especially in mask, while box is +- same. Dataset for v3 and above is going to be targeting inclusion of eyebrows and full eyelashes, for better adetailer experience without large dillution parameter. Also starting from v3, im moving to yolo11 models, as they seem to be direct upgrade over v8. v12 did not show significant improvement while requiring 50% more time to train, even with installed Flash Attention, so it's unlikely i will switch to it anytime soon. Benchmark was performed in 640px. Difference in v2 models are only in their target resolution, so their performance spread is marginal. !image/png !image/png #### Real Face, gendered: Trained only on real photos for the most part, so will perform poorly with illustrations, but is gendered, and can be used for male/female detection stack. | Model | Target | mAP 50 | mAP 50-95 |Classes |Dataset size|Training Resolution| | --------------------------- | --------------------- | ----------------------------- | ------------------------- |---------------|------------|-------------------| | Anzhcs ManFace v02 1024 y8n.pt | Face: real | 0.883(box),0.883(mask) | 0.778(box), 0.704(mask) |1(face) |~340 |1024| | Anzhcs WomanFace v05 1024 y8n.pt | Face: real | 0.82(box),0.82(mask) | 0.713(box), 0.659(mask) |1(face) |~600 |1024| Benchmark was performed in 640px. !image/png !image/png ### Eyes segmentation: Was trained for the purpose of inpainting eyes with Adetailer extension, and specializes on detecting anime eyes, particularly - sclera area, without adding eyelashes and outer eye area to detection. Current benchmark is likely inaccurate (but it is all i have), due to data being re-scrambled multi times (dataset expansion for future versions). | Model | Target | mAP 50 | mAP 50-95 |Classes |Dataset size|Training Resolution| | --------------------------- | --------------------- | ----------------------------- | ------------------------- |---------------|------------|-------------------| | Anzhc Eyes -seg-hd.pt | Eyes: illustration | 0.925(box),0.868(mask) | 0.721(box), 0.511(mask) |1(eye) |~500(?) |1024| !image/png !image/png ### Head+Hair segmentation: An old model (one of my first). Detects head + hair. Can be useful in likeness inpaint pipelines that need to be automated. | Model | Target | mAP 50 | mAP 50-95 |Classes |Dataset size|Training Resolution| | --------------------------- | --------------------- | ----------------------------- | ------------------------- |---------------|------------|-------------------| | Anzhc HeadHair seg y8n.pt | Head: illustration, real | 0.775(box),0.777(mask) | 0.576(box), 0.552(mask) |1(head) |~3180 |640| | Anzhc HeadHair seg y8m.pt | Head: illustration, real | 0.867(box),0.862(mask) | 0.674(box), 0.626(mask) |1(head) |~3180 |640| !image/png !image/png ### Breasts: #### Breasts segmentation: Model for segmenting breasts. Was trained on anime images only, therefore has very weak realistic performance, but still is possible. | Model | Target | mAP 50 | mAP 50-95 |Classes |Dataset size|Training Resolution| | --------------------------- | --------------------- | ----------------------------- | ------------------------- |---------------|------------|-------------------| | Anzhc Breasts Seg v1 1024n.pt | Breasts: illustration | 0.742(box),0.73(mask) | 0.563(box), 0.535(mask) |1(breasts) |~2000 |1024| | Anzhc Breasts Seg v1 1024s.pt | Breasts: illustration | 0.768(box),0.763(mask) | 0.596(box), 0.575(mask) |1(breasts) |~2000 |1024| | Anzhc Breasts Seg v1 1024m.pt | Breasts: illustration | 0.782(box),0.775(mask) | 0.644(box), 0.614(mask) |1(breasts) |~2000 |1024| !image/png !image/png #### Breast size detection and classification: Model for Detecting and classifying breast size. Can be used for tagging and moderating content. Utilizes custom scale, combining default Booru sizes with quite freeform upper range of scale from rule34, simplifying and standartizing it. Size range is established relative to body proportion, instead of relative to scene, to not be confused in cases of gigantism and be disentangled from scene. And of course it's subjective, since i was the only one annotating data. | Model | Target |Classes |Dataset size|Training Resolution| | --------------------------- | --------------------- |---------------|------------|-------------------| | Anzhcs Breast Size det cls v8 y11m.pt| Breasts: illustration and real |15(size range)|~16100 |640| mAPs are not displayed in table, because i think we need more complex stats for this model. !image/png Accurate ratio - correct predictions, exactly matching val. +1, -1, +-1 ratio - expanded range of acceptable predictions, by +,- and +-1 class. I suggest using this stat as main accuracy, because +-1 range is likely an acceptable margin of error. At annotation, usual rate of error of original data according to this size scale was in range of +-2 to +-3 in some cases, so +-1 should be quite good. Miscalss ratio - Correct detection, but classification goes beyond +-1 error. Miss ratio - Not seen by model, completely missed. False-Positive ratio - Detection of something that isn't there. In case of this model i suspect that FPR is also including confusion rate. In some cases multiple detection will be made for single instance, and only 1 will be accepted. That can be counted as false-positive, while it will be covered in +-1 acc. Actual FPR should be lower than reported, as tested manually. GT Instances - amount of instances of data per class in dataset. With that established, v8 provides pretty decent quality detection and classification, except for extremes of class 11+, and class 0(flat chest), well, since it's not too simple to detect what's not there. Class 2(medium) is one of the most confusing in this case, and has lowest accuracy. From charts, it's mostly mistaken with class 1. Rest of classes with reasonable amount of data perform quite well, and achieve high 70s to mid 80s for normal sizes, and up to high 90s for bigger size range. Misclassification is quite rare, and im happy with model performance in that regard. Average rate of misclassification is just ~3%. Missing predictions is unfortunately over 10%, but data is highly skewed with classes 0-2, which are hard to detect. FPR for v8 is very reasonable, assuming confused detections(of 2 classes at once) are counted as FPR. Size range is smooth, and lots of cases where both classes could be applied. Last class(unmeasurable) is used for classifying outliers that are hard to measure in currently visible area(e.g. mostly out of frame), but model will try to reasonably predict obstructed and partially visible instances. All ratios are calculated relative to their respective GT instance count. I will continue to use this benchmark approach for future detection models. !image/png ### Drone detection Model for segmenting and detecting drones. What a wild swing after entry for breast model, huh. I don't really know, just had an idea, made it work, here we are. **I would highly advice against using it in anything serious.** Starting from v03. Consider it as v1, since v03 is my internal iteration. HIGHLY SENSITIVE TO DRONE MODELS - will have hard time detecting certain types, especially close-up. Performs poorly on cluttered background. | Model | Target | mAP 50 | mAP 50-95 |Classes |Dataset size|Training Resolution| | --------------------------- | --------------------- | ----------------------------- | ------------------------- |---------------|------------|-------------------| | Anzhcs Drones v03 1024 y11n.pt | Drones | 0.927(box) 0.888(mask) | 0.753(box) 0.508(mask) |1(drone) |~3460 |1024| !image/png !image/png ## Anime Art Scoring A classification model trained to assign a percentile group based on human preference, instead of trying to directly assign a \"quality\" label. Dataset was composed of about 100k images aged from 1 to 2 years on Danbooru (newer and older images were not used). That limits data to images that were sufficiently viewed and rated, while not being overly exposed due to age, nor underexposed. Scores were used and split into percentile groups, each 10%. Main interest in making this one was to find out if there is a significant discoverable correlation between scores and image quality. Here are my custom charts: !image/png (top100 is second class due to alphabetical sorting, but for margin acceptance chart it was re-sorted) From this chart, considering there are 10 classes in total, i found weak-to-modest correlation between scores and upper half of chart, negative correlation with middle-low part, weak for low, and moderate for lowest. What does that mean? It means that there is meaningful correlation between scoring of people relative to features of art in question, but there is no meaningful correlation between art that is scoring neutrally. Negative scoring (top80-100) has moderate correlation, which suggests that there are some uniform negative features we can infere. Top60 class is very interesting, because it presents no correlation between provided images, even in top-3 accuracy(it performs at near-random selection in that case(10%)). That suggests that there is no feature correlation between art being not noticed, at least not the one YOLO was able to find. We can reasonably predict art that will end up in top of the chart by human score, but we are not able to predict middle-of-the line art, which would constitute majority of art in real case. We can predict low quality based on human preference reasonably well, but far from ideal. Margin acceptance charts - A top-1 accuracy, but with margin of class acceptance(1, 2 and 3(starts with -1, then adds +1 and then -2 class)(it/s not +-1-3 as naming suggests)) This allows us to see how well are classes correlate. If we see significant increase relative to first chart, that means that second best prediction was selected as top-1. We can also see extended correlation trend across classes. We once again can see that middle classes have very low correlation and accuracy, suggesting no meaningful features. That kinda suggests to me that there is no reason for art that ended up in middle of dataset to be there, and it would end up higher or lower in perfect world. Top10-40 correlates very well, and that can be used for human preference detection. Funny note on that: **bigger the breasts - better the score**. And i wholeheartedly support that notion. NSFW art in general will have higher preference score, well, what an unexpected outcome, amirite? Dataset was composed ~50/50% from Danbooru/Safebooru(safebooru.donmai.us), so it's not due to overrepresentation of NSFW. That is also why you should not use scores for quality tagging, but if you are looking for a thing to maintain high compatibility with current anime models - be my guest. Correlation between bottom scores(that you'd use for low quality/worst quality) is weaker, so be conservative with that. Bigger model and data will likely see more correlation, but from quick test of just running larger variation did not lead me to better performance. | Model | Target |Top-1 acc/(w/ margin(1/2/3))|Top-2 acc|Top-3 acc|Classes |Dataset size|Training Resolution| | --------------------------- | --------------------- |---------|---------|---------|---------------|------------|-------------------| | Anzhcs Anime Score CLS v1.pt| Anime illustration |0.336(0.467/0.645/0.679)|0.566|0.696|10(top10 to top100)|~98000 |224| Additionally, i will provide a script for tagging your datasets with that, if you want - Simple Utility Scripts repo ### Support If you want to support me, feel free to donate on ko-fi: Or send me some BTC: bc1qpc5kmxrpqp6x8ykdu6976s4rvsz0utk22h80j9 /--UNDER CONSTRUCTION--/",
17
- "model_explanation_gemini": "Detects and segments facial features, eyes, head/hair, and breasts in illustrations and real photos, with specialized models for gendered detection and anime art scoring. \n\n**Features:** \n- Face segmentation (universal & gendered) \n- Eyes segmentation (anime-focused) \n- Head + hair segmentation \n- Breasts segmentation & size classification \n- Drone detection \n- Anime art scoring \n- Trained on custom-annotated datasets \n- Supports YOLOv8 and YOLOv"
 
 
 
 
 
18
  }
 
14
  "region:us"
15
  ],
16
  "description": "--- license: agpl-3.0 tags: - pytorch - YOLOv8 - art - Ultralytics base_model: - Ultralytics/YOLOv8 - Ultralytics/YOLO11 library_name: ultralytics pipeline_tag: object-detection metrics: - mAP50 - mAP50-95 --- # Description YOLOs in this repo are trained with datasets that i have annotated myself, or with the help of my friends(They will be appropriately mentioned in those cases). YOLOs on open datasets will have their own pages. #### Want to request a model? Im open to commissions, hit me up in Discord - **anzhc** > ## **Table of Contents** > - **Face segmentation** > - *Universal* > - *Real Face, gendered* > - **Eyes segmentation** > - **Head+Hair segmentation** > - **Breasts** > - *Breasts Segmentation* > - *Breast size detection/classification* > - **Drone detection** > - **Anime Art Scoring** > - **Support** P.S. All model names in tables have download links attached :3 ## Available Models ### Face segmentation: #### Universal: Series of models aiming at detecting and segmenting face accurately. Trained on closed dataset i annotated myself. | Model | Target | mAP 50 | mAP 50-95 |Classes |Dataset size|Training Resolution| |----------------------------------------------------------------------------|-----------------------|--------------------------------|---------------------------|---------------|------------|-------------------| | Anzhc Face -seg.pt | Face: illustration, real | LOST DATA | LOST DATA |2(male, female)|LOST DATA| 640| | Anzhc Face seg 640 v2 y8n.pt | Face: illustration, real |0.791(box) 0.765(mask) | 0.608(box) 0.445(mask)|1(face) |~500| 640| | Anzhc Face seg 768 v2 y8n.pt | Face: illustration, real | 0.765(box) 0.748(mask) | 0.572(box) 0.431(mask) |1(face) |~500| 768| | Anzhc Face seg 768MS v2 y8n.pt | Face: illustration, real | 0.807(box) 0.770(mask) | 0.601(box) 0.432(mask) |1(face) |~500| 768|(Multi-scale)| | Anzhc Face seg 1024 v2 y8n.pt | Face: illustration, real | 0.768(box) 0.740(mask) | 0.557(box) 0.394(mask)|1(face) |~500| 1024| | Anzhc Face seg 640 v3 y11n.pt | Face: illustration | 0.882(box) 0.871(mask) | 0.689(box) 0.570(mask)|1(face) |~660| 640| UPDATE: v3 model has a bit different face target compared to v2, so stats of v2 models suffer compared to v3 in newer benchmark, especially in mask, while box is +- same. Dataset for v3 and above is going to be targeting inclusion of eyebrows and full eyelashes, for better adetailer experience without large dillution parameter. Also starting from v3, im moving to yolo11 models, as they seem to be direct upgrade over v8. v12 did not show significant improvement while requiring 50% more time to train, even with installed Flash Attention, so it's unlikely i will switch to it anytime soon. Benchmark was performed in 640px. Difference in v2 models are only in their target resolution, so their performance spread is marginal. !image/png !image/png #### Real Face, gendered: Trained only on real photos for the most part, so will perform poorly with illustrations, but is gendered, and can be used for male/female detection stack. | Model | Target | mAP 50 | mAP 50-95 |Classes |Dataset size|Training Resolution| | --------------------------- | --------------------- | ----------------------------- | ------------------------- |---------------|------------|-------------------| | Anzhcs ManFace v02 1024 y8n.pt | Face: real | 0.883(box),0.883(mask) | 0.778(box), 0.704(mask) |1(face) |~340 |1024| | Anzhcs WomanFace v05 1024 y8n.pt | Face: real | 0.82(box),0.82(mask) | 0.713(box), 0.659(mask) |1(face) |~600 |1024| Benchmark was performed in 640px. !image/png !image/png ### Eyes segmentation: Was trained for the purpose of inpainting eyes with Adetailer extension, and specializes on detecting anime eyes, particularly - sclera area, without adding eyelashes and outer eye area to detection. Current benchmark is likely inaccurate (but it is all i have), due to data being re-scrambled multi times (dataset expansion for future versions). | Model | Target | mAP 50 | mAP 50-95 |Classes |Dataset size|Training Resolution| | --------------------------- | --------------------- | ----------------------------- | ------------------------- |---------------|------------|-------------------| | Anzhc Eyes -seg-hd.pt | Eyes: illustration | 0.925(box),0.868(mask) | 0.721(box), 0.511(mask) |1(eye) |~500(?) |1024| !image/png !image/png ### Head+Hair segmentation: An old model (one of my first). Detects head + hair. Can be useful in likeness inpaint pipelines that need to be automated. | Model | Target | mAP 50 | mAP 50-95 |Classes |Dataset size|Training Resolution| | --------------------------- | --------------------- | ----------------------------- | ------------------------- |---------------|------------|-------------------| | Anzhc HeadHair seg y8n.pt | Head: illustration, real | 0.775(box),0.777(mask) | 0.576(box), 0.552(mask) |1(head) |~3180 |640| | Anzhc HeadHair seg y8m.pt | Head: illustration, real | 0.867(box),0.862(mask) | 0.674(box), 0.626(mask) |1(head) |~3180 |640| !image/png !image/png ### Breasts: #### Breasts segmentation: Model for segmenting breasts. Was trained on anime images only, therefore has very weak realistic performance, but still is possible. | Model | Target | mAP 50 | mAP 50-95 |Classes |Dataset size|Training Resolution| | --------------------------- | --------------------- | ----------------------------- | ------------------------- |---------------|------------|-------------------| | Anzhc Breasts Seg v1 1024n.pt | Breasts: illustration | 0.742(box),0.73(mask) | 0.563(box), 0.535(mask) |1(breasts) |~2000 |1024| | Anzhc Breasts Seg v1 1024s.pt | Breasts: illustration | 0.768(box),0.763(mask) | 0.596(box), 0.575(mask) |1(breasts) |~2000 |1024| | Anzhc Breasts Seg v1 1024m.pt | Breasts: illustration | 0.782(box),0.775(mask) | 0.644(box), 0.614(mask) |1(breasts) |~2000 |1024| !image/png !image/png #### Breast size detection and classification: Model for Detecting and classifying breast size. Can be used for tagging and moderating content. Utilizes custom scale, combining default Booru sizes with quite freeform upper range of scale from rule34, simplifying and standartizing it. Size range is established relative to body proportion, instead of relative to scene, to not be confused in cases of gigantism and be disentangled from scene. And of course it's subjective, since i was the only one annotating data. | Model | Target |Classes |Dataset size|Training Resolution| | --------------------------- | --------------------- |---------------|------------|-------------------| | Anzhcs Breast Size det cls v8 y11m.pt| Breasts: illustration and real |15(size range)|~16100 |640| mAPs are not displayed in table, because i think we need more complex stats for this model. !image/png Accurate ratio - correct predictions, exactly matching val. +1, -1, +-1 ratio - expanded range of acceptable predictions, by +,- and +-1 class. I suggest using this stat as main accuracy, because +-1 range is likely an acceptable margin of error. At annotation, usual rate of error of original data according to this size scale was in range of +-2 to +-3 in some cases, so +-1 should be quite good. Miscalss ratio - Correct detection, but classification goes beyond +-1 error. Miss ratio - Not seen by model, completely missed. False-Positive ratio - Detection of something that isn't there. In case of this model i suspect that FPR is also including confusion rate. In some cases multiple detection will be made for single instance, and only 1 will be accepted. That can be counted as false-positive, while it will be covered in +-1 acc. Actual FPR should be lower than reported, as tested manually. GT Instances - amount of instances of data per class in dataset. With that established, v8 provides pretty decent quality detection and classification, except for extremes of class 11+, and class 0(flat chest), well, since it's not too simple to detect what's not there. Class 2(medium) is one of the most confusing in this case, and has lowest accuracy. From charts, it's mostly mistaken with class 1. Rest of classes with reasonable amount of data perform quite well, and achieve high 70s to mid 80s for normal sizes, and up to high 90s for bigger size range. Misclassification is quite rare, and im happy with model performance in that regard. Average rate of misclassification is just ~3%. Missing predictions is unfortunately over 10%, but data is highly skewed with classes 0-2, which are hard to detect. FPR for v8 is very reasonable, assuming confused detections(of 2 classes at once) are counted as FPR. Size range is smooth, and lots of cases where both classes could be applied. Last class(unmeasurable) is used for classifying outliers that are hard to measure in currently visible area(e.g. mostly out of frame), but model will try to reasonably predict obstructed and partially visible instances. All ratios are calculated relative to their respective GT instance count. I will continue to use this benchmark approach for future detection models. !image/png ### Drone detection Model for segmenting and detecting drones. What a wild swing after entry for breast model, huh. I don't really know, just had an idea, made it work, here we are. **I would highly advice against using it in anything serious.** Starting from v03. Consider it as v1, since v03 is my internal iteration. HIGHLY SENSITIVE TO DRONE MODELS - will have hard time detecting certain types, especially close-up. Performs poorly on cluttered background. | Model | Target | mAP 50 | mAP 50-95 |Classes |Dataset size|Training Resolution| | --------------------------- | --------------------- | ----------------------------- | ------------------------- |---------------|------------|-------------------| | Anzhcs Drones v03 1024 y11n.pt | Drones | 0.927(box) 0.888(mask) | 0.753(box) 0.508(mask) |1(drone) |~3460 |1024| !image/png !image/png ## Anime Art Scoring A classification model trained to assign a percentile group based on human preference, instead of trying to directly assign a \"quality\" label. Dataset was composed of about 100k images aged from 1 to 2 years on Danbooru (newer and older images were not used). That limits data to images that were sufficiently viewed and rated, while not being overly exposed due to age, nor underexposed. Scores were used and split into percentile groups, each 10%. Main interest in making this one was to find out if there is a significant discoverable correlation between scores and image quality. Here are my custom charts: !image/png (top100 is second class due to alphabetical sorting, but for margin acceptance chart it was re-sorted) From this chart, considering there are 10 classes in total, i found weak-to-modest correlation between scores and upper half of chart, negative correlation with middle-low part, weak for low, and moderate for lowest. What does that mean? It means that there is meaningful correlation between scoring of people relative to features of art in question, but there is no meaningful correlation between art that is scoring neutrally. Negative scoring (top80-100) has moderate correlation, which suggests that there are some uniform negative features we can infere. Top60 class is very interesting, because it presents no correlation between provided images, even in top-3 accuracy(it performs at near-random selection in that case(10%)). That suggests that there is no feature correlation between art being not noticed, at least not the one YOLO was able to find. We can reasonably predict art that will end up in top of the chart by human score, but we are not able to predict middle-of-the line art, which would constitute majority of art in real case. We can predict low quality based on human preference reasonably well, but far from ideal. Margin acceptance charts - A top-1 accuracy, but with margin of class acceptance(1, 2 and 3(starts with -1, then adds +1 and then -2 class)(it/s not +-1-3 as naming suggests)) This allows us to see how well are classes correlate. If we see significant increase relative to first chart, that means that second best prediction was selected as top-1. We can also see extended correlation trend across classes. We once again can see that middle classes have very low correlation and accuracy, suggesting no meaningful features. That kinda suggests to me that there is no reason for art that ended up in middle of dataset to be there, and it would end up higher or lower in perfect world. Top10-40 correlates very well, and that can be used for human preference detection. Funny note on that: **bigger the breasts - better the score**. And i wholeheartedly support that notion. NSFW art in general will have higher preference score, well, what an unexpected outcome, amirite? Dataset was composed ~50/50% from Danbooru/Safebooru(safebooru.donmai.us), so it's not due to overrepresentation of NSFW. That is also why you should not use scores for quality tagging, but if you are looking for a thing to maintain high compatibility with current anime models - be my guest. Correlation between bottom scores(that you'd use for low quality/worst quality) is weaker, so be conservative with that. Bigger model and data will likely see more correlation, but from quick test of just running larger variation did not lead me to better performance. | Model | Target |Top-1 acc/(w/ margin(1/2/3))|Top-2 acc|Top-3 acc|Classes |Dataset size|Training Resolution| | --------------------------- | --------------------- |---------|---------|---------|---------------|------------|-------------------| | Anzhcs Anime Score CLS v1.pt| Anime illustration |0.336(0.467/0.645/0.679)|0.566|0.696|10(top10 to top100)|~98000 |224| Additionally, i will provide a script for tagging your datasets with that, if you want - Simple Utility Scripts repo ### Support If you want to support me, feel free to donate on ko-fi: Or send me some BTC: bc1qpc5kmxrpqp6x8ykdu6976s4rvsz0utk22h80j9 /--UNDER CONSTRUCTION--/",
17
+ "model_explanation_gemini": "Detects and segments facial features, eyes, head/hair, and breasts in illustrations and real photos, with specialized models for gendered detection and anime art scoring. \n\n**Features:** \n- Face segmentation (universal & gendered) \n- Eyes segmentation (anime-focused) \n- Head + hair segmentation \n- Breasts segmentation & size classification \n- Drone detection \n- Anime art scoring \n- Trained on custom-annotated datasets \n- Supports YOLOv8 and YOLOv",
18
+ "release_year": null,
19
+ "parameter_count": null,
20
+ "is_fine_tuned": true,
21
+ "category": "Vision",
22
+ "api_enhanced": true
23
  }
model_data_json/Arki05_Grok-1-GGUF.json CHANGED
@@ -10,5 +10,10 @@
10
  "region:us"
11
  ],
12
  "description": "--- license: apache-2.0 --- # Grok-1 GGUF Quantizations This repository contains unofficial GGUF Quantizations of Grok-1, compatible with as of PR- Add grok-1 support #6204. ## Updates #### Native Split Support in llama.cpp - The splits have been updated to utilize the improvements from PR: llama_model_loader: support multiple split/shard GGUFs. As a result, manual merging with is no longer required. With this, there is no need to merge the split files before use. Just download all splits and run llama.cpp with the first split like you would previously. It'll detect the other splits and load them as well. #### Direct Split Download from huggingface using llama.cpp - Thanks to a new PR common: llama_load_model_from_url split support #6192 from phymbert it's now possible load model splits from url. That means this downloads and runs the model: And that is very cool (@phymbert) ## Available Quantizations The following Quantizations are currently available for download: | Quant | Split Files | Size | |----------|-------------|----------| | | 1-of-9, 2-of-9, 3-of-9, 4-of-9, 5-of-9, 6-of-9, 7-of-9, 8-of-9, 9-of-9 | 112.4 GB | | | 1-of-9, 2-of-9, 3-of-9, 4-of-9, 5-of-9, 6-of-9, 7-of-9, 8-of-9, 9-of-9 | 125.4 GB | | | 1-of-9, 2-of-9, 3-of-9, 4-of-9, 5-of-9, 6-of-9, 7-of-9, 8-of-9, 9-of-9 | 186.0 GB | | | 1-of-9, 2-of-9, 3-of-9, 4-of-9, 5-of-9, 6-of-9, 7-of-9, 8-of-9, 9-of-9 | 259.8 GB | I would recommend the version for now. *More Quantizations will be uploaded soon. All current Quants are created without any importance matrix.*",
13
- "model_explanation_gemini": "\"Unofficial GGUF quantizations of Grok-1, split into multiple files for easier handling and compatible with llama.cpp, offering various size options without manual merging.\"\n\nFeatures: \n- GGUF quantization format \n- Split-file support (1-of-9 to 9-of-9) \n- No manual merging required \n- Multiple size options (112.4GB to 259.8GB) \n- Compatibility with llama.cpp \n- Direct download capability \n\nComparison: \nUnlike standard"
 
 
 
 
 
14
  }
 
10
  "region:us"
11
  ],
12
  "description": "--- license: apache-2.0 --- # Grok-1 GGUF Quantizations This repository contains unofficial GGUF Quantizations of Grok-1, compatible with as of PR- Add grok-1 support #6204. ## Updates #### Native Split Support in llama.cpp - The splits have been updated to utilize the improvements from PR: llama_model_loader: support multiple split/shard GGUFs. As a result, manual merging with is no longer required. With this, there is no need to merge the split files before use. Just download all splits and run llama.cpp with the first split like you would previously. It'll detect the other splits and load them as well. #### Direct Split Download from huggingface using llama.cpp - Thanks to a new PR common: llama_load_model_from_url split support #6192 from phymbert it's now possible load model splits from url. That means this downloads and runs the model: And that is very cool (@phymbert) ## Available Quantizations The following Quantizations are currently available for download: | Quant | Split Files | Size | |----------|-------------|----------| | | 1-of-9, 2-of-9, 3-of-9, 4-of-9, 5-of-9, 6-of-9, 7-of-9, 8-of-9, 9-of-9 | 112.4 GB | | | 1-of-9, 2-of-9, 3-of-9, 4-of-9, 5-of-9, 6-of-9, 7-of-9, 8-of-9, 9-of-9 | 125.4 GB | | | 1-of-9, 2-of-9, 3-of-9, 4-of-9, 5-of-9, 6-of-9, 7-of-9, 8-of-9, 9-of-9 | 186.0 GB | | | 1-of-9, 2-of-9, 3-of-9, 4-of-9, 5-of-9, 6-of-9, 7-of-9, 8-of-9, 9-of-9 | 259.8 GB | I would recommend the version for now. *More Quantizations will be uploaded soon. All current Quants are created without any importance matrix.*",
13
+ "model_explanation_gemini": "\"Unofficial GGUF quantizations of Grok-1, split into multiple files for easier handling and compatible with llama.cpp, offering various size options without manual merging.\"\n\nFeatures: \n- GGUF quantization format \n- Split-file support (1-of-9 to 9-of-9) \n- No manual merging required \n- Multiple size options (112.4GB to 259.8GB) \n- Compatibility with llama.cpp \n- Direct download capability \n\nComparison: \nUnlike standard",
14
+ "release_year": null,
15
+ "parameter_count": null,
16
+ "is_fine_tuned": false,
17
+ "category": "Other",
18
+ "api_enhanced": true
19
  }
model_data_json/ArthurZ_Ilama-3.2-1B.json CHANGED
@@ -13,5 +13,11 @@
13
  "region:us"
14
  ],
15
  "description": "--- library_name: transformers tags: [] --- # Model Card for Model ID <!-- Provide a quick summary of what the model is/does. --> ## Model Details ### Model Description <!-- Provide a longer summary of what this model is. --> This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated. - **Developed by:** [More Information Needed] - **Funded by [optional]:** [More Information Needed] - **Shared by [optional]:** [More Information Needed] - **Model type:** [More Information Needed] - **Language(s) (NLP):** [More Information Needed] - **License:** [More Information Needed] - **Finetuned from model [optional]:** [More Information Needed] ### Model Sources [optional] <!-- Provide the basic links for the model. --> - **Repository:** [More Information Needed] - **Paper [optional]:** [More Information Needed] - **Demo [optional]:** [More Information Needed] ## Uses <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. --> ### Direct Use <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. --> [More Information Needed] ### Downstream Use [optional] <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app --> [More Information Needed] ### Out-of-Scope Use <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. --> [More Information Needed] ## Bias, Risks, and Limitations <!-- This section is meant to convey both technical and sociotechnical limitations. --> [More Information Needed] ### Recommendations <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. --> Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. ## How to Get Started with the Model Use the code below to get started with the model. [More Information Needed] ## Training Details ### Training Data <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. --> [More Information Needed] ### Training Procedure <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. --> #### Preprocessing [optional] [More Information Needed] #### Training Hyperparameters - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision --> #### Speeds, Sizes, Times [optional] <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. --> [More Information Needed] ## Evaluation <!-- This section describes the evaluation protocols and provides the results. --> ### Testing Data, Factors & Metrics #### Testing Data <!-- This should link to a Dataset Card if possible. --> [More Information Needed] #### Factors <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. --> [More Information Needed] #### Metrics <!-- These are the evaluation metrics being used, ideally with a description of why. --> [More Information Needed] ### Results [More Information Needed] #### Summary ## Model Examination [optional] <!-- Relevant interpretability work for the model goes here --> [More Information Needed] ## Environmental Impact <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly --> Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). - **Hardware Type:** [More Information Needed] - **Hours used:** [More Information Needed] - **Cloud Provider:** [More Information Needed] - **Compute Region:** [More Information Needed] - **Carbon Emitted:** [More Information Needed] ## Technical Specifications [optional] ### Model Architecture and Objective [More Information Needed] ### Compute Infrastructure [More Information Needed] #### Hardware [More Information Needed] #### Software [More Information Needed] ## Citation [optional] <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. --> **BibTeX:** [More Information Needed] **APA:** [More Information Needed] ## Glossary [optional] <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. --> [More Information Needed] ## More Information [optional] [More Information Needed] ## Model Card Authors [optional] [More Information Needed] ## Model Card Contact [More Information Needed]",
16
- "model_explanation_gemini": "The description provides no substantive information about the model's purpose, features, or functionality, as all relevant fields are marked \"[More Information Needed].\"\n\nFeatures: \n- No discernible features or tasks are described. \n- No comparative analysis with other models is provided. \n\nComparison: \n- Not applicable due to lack of model details."
 
 
 
 
 
 
17
  }
 
13
  "region:us"
14
  ],
15
  "description": "--- library_name: transformers tags: [] --- # Model Card for Model ID <!-- Provide a quick summary of what the model is/does. --> ## Model Details ### Model Description <!-- Provide a longer summary of what this model is. --> This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated. - **Developed by:** [More Information Needed] - **Funded by [optional]:** [More Information Needed] - **Shared by [optional]:** [More Information Needed] - **Model type:** [More Information Needed] - **Language(s) (NLP):** [More Information Needed] - **License:** [More Information Needed] - **Finetuned from model [optional]:** [More Information Needed] ### Model Sources [optional] <!-- Provide the basic links for the model. --> - **Repository:** [More Information Needed] - **Paper [optional]:** [More Information Needed] - **Demo [optional]:** [More Information Needed] ## Uses <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. --> ### Direct Use <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. --> [More Information Needed] ### Downstream Use [optional] <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app --> [More Information Needed] ### Out-of-Scope Use <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. --> [More Information Needed] ## Bias, Risks, and Limitations <!-- This section is meant to convey both technical and sociotechnical limitations. --> [More Information Needed] ### Recommendations <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. --> Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. ## How to Get Started with the Model Use the code below to get started with the model. [More Information Needed] ## Training Details ### Training Data <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. --> [More Information Needed] ### Training Procedure <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. --> #### Preprocessing [optional] [More Information Needed] #### Training Hyperparameters - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision --> #### Speeds, Sizes, Times [optional] <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. --> [More Information Needed] ## Evaluation <!-- This section describes the evaluation protocols and provides the results. --> ### Testing Data, Factors & Metrics #### Testing Data <!-- This should link to a Dataset Card if possible. --> [More Information Needed] #### Factors <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. --> [More Information Needed] #### Metrics <!-- These are the evaluation metrics being used, ideally with a description of why. --> [More Information Needed] ### Results [More Information Needed] #### Summary ## Model Examination [optional] <!-- Relevant interpretability work for the model goes here --> [More Information Needed] ## Environmental Impact <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly --> Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). - **Hardware Type:** [More Information Needed] - **Hours used:** [More Information Needed] - **Cloud Provider:** [More Information Needed] - **Compute Region:** [More Information Needed] - **Carbon Emitted:** [More Information Needed] ## Technical Specifications [optional] ### Model Architecture and Objective [More Information Needed] ### Compute Infrastructure [More Information Needed] #### Hardware [More Information Needed] #### Software [More Information Needed] ## Citation [optional] <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. --> **BibTeX:** [More Information Needed] **APA:** [More Information Needed] ## Glossary [optional] <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. --> [More Information Needed] ## More Information [optional] [More Information Needed] ## Model Card Authors [optional] [More Information Needed] ## Model Card Contact [More Information Needed]",
16
+ "model_explanation_gemini": "The description provides no substantive information about the model's purpose, features, or functionality, as all relevant fields are marked \"[More Information Needed].\"\n\nFeatures: \n- No discernible features or tasks are described. \n- No comparative analysis with other models is provided. \n\nComparison: \n- Not applicable due to lack of model details.",
17
+ "release_year": "2019",
18
+ "parameter_count": "1B",
19
+ "is_fine_tuned": false,
20
+ "category": "Embedding",
21
+ "model_family": "LLaMA",
22
+ "api_enhanced": true
23
  }
model_data_json/Aryn_deformable-detr-DocLayNet.json CHANGED
@@ -15,5 +15,10 @@
15
  "region:us"
16
  ],
17
  "description": "--- license: apache-2.0 tags: - object-detection - vision datasets: - DocLayNet widget: - src: example_title: DocLayNet Example 1 - src: example_title: DocLayNet Example 2 - src: example_title: DocLayNet Example 3 --- # Deformable DETR model trained on DocLayNet Deformable DEtection TRansformer (DETR), trained on DocLayNet (including 80k annotated pages in 11 classes). You can use this model in the serverless Aryn Partitioning Service. You can get started here ## Model description The DETR model is an encoder-decoder transformer with a convolutional backbone. Two heads are added on top of the decoder outputs in order to perform object detection: a linear layer for the class labels and a MLP (multi-layer perceptron) for the bounding boxes. The model uses so-called object queries to detect objects in an image. Each object query looks for a particular object in the image. For COCO, the number of object queries is set to 100. The model is trained using a \"bipartite matching loss\": one compares the predicted classes + bounding boxes of each of the N = 100 object queries to the ground truth annotations, padded up to the same length N (so if an image only contains 4 objects, 96 annotations will just have a \"no object\" as class and \"no bounding box\" as bounding box). The Hungarian matching algorithm is used to create an optimal one-to-one mapping between each of the N queries and each of the N annotations. Next, standard cross-entropy (for the classes) and a linear combination of the L1 and generalized IoU loss (for the bounding boxes) are used to optimize the parameters of the model. !model image ## Intended uses & limitations You can use the raw model for object detection. See the model hub to look for all available Deformable DETR models. ### How to use Here is how to use this model: ## Evaluation results This model achieves 57.1 box mAP on DocLayNet. ## Training data The Deformable DETR model was trained on DocLayNet. It was introduced in the paper DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis by Pfitzmann et al. and first released in this repository. ### BibTeX entry and citation info",
18
- "model_explanation_gemini": "Detects objects in document layouts using a transformer-based model trained on DocLayNet, achieving 57.1 box mAP. \n\n**Features:** \n- Object detection for document layouts (11 classes) \n- Encoder-decoder transformer with convolutional backbone \n- Uses object queries and bipartite matching loss \n- Trained on DocLayNet (80k annotated pages) \n\n**Comparison:** \nUnlike standard DETR models, this variant uses deformable attention for improved efficiency and performance"
 
 
 
 
 
19
  }
 
15
  "region:us"
16
  ],
17
  "description": "--- license: apache-2.0 tags: - object-detection - vision datasets: - DocLayNet widget: - src: example_title: DocLayNet Example 1 - src: example_title: DocLayNet Example 2 - src: example_title: DocLayNet Example 3 --- # Deformable DETR model trained on DocLayNet Deformable DEtection TRansformer (DETR), trained on DocLayNet (including 80k annotated pages in 11 classes). You can use this model in the serverless Aryn Partitioning Service. You can get started here ## Model description The DETR model is an encoder-decoder transformer with a convolutional backbone. Two heads are added on top of the decoder outputs in order to perform object detection: a linear layer for the class labels and a MLP (multi-layer perceptron) for the bounding boxes. The model uses so-called object queries to detect objects in an image. Each object query looks for a particular object in the image. For COCO, the number of object queries is set to 100. The model is trained using a \"bipartite matching loss\": one compares the predicted classes + bounding boxes of each of the N = 100 object queries to the ground truth annotations, padded up to the same length N (so if an image only contains 4 objects, 96 annotations will just have a \"no object\" as class and \"no bounding box\" as bounding box). The Hungarian matching algorithm is used to create an optimal one-to-one mapping between each of the N queries and each of the N annotations. Next, standard cross-entropy (for the classes) and a linear combination of the L1 and generalized IoU loss (for the bounding boxes) are used to optimize the parameters of the model. !model image ## Intended uses & limitations You can use the raw model for object detection. See the model hub to look for all available Deformable DETR models. ### How to use Here is how to use this model: ## Evaluation results This model achieves 57.1 box mAP on DocLayNet. ## Training data The Deformable DETR model was trained on DocLayNet. It was introduced in the paper DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis by Pfitzmann et al. and first released in this repository. ### BibTeX entry and citation info",
18
+ "model_explanation_gemini": "Detects objects in document layouts using a transformer-based model trained on DocLayNet, achieving 57.1 box mAP. \n\n**Features:** \n- Object detection for document layouts (11 classes) \n- Encoder-decoder transformer with convolutional backbone \n- Uses object queries and bipartite matching loss \n- Trained on DocLayNet (80k annotated pages) \n\n**Comparison:** \nUnlike standard DETR models, this variant uses deformable attention for improved efficiency and performance",
19
+ "release_year": "2022",
20
+ "parameter_count": null,
21
+ "is_fine_tuned": false,
22
+ "category": "Vision",
23
+ "api_enhanced": true
24
  }
model_data_json/AskUI_PTA-1.json CHANGED
@@ -17,5 +17,10 @@
17
  "region:us"
18
  ],
19
  "description": "--- library_name: transformers pipeline_tag: image-text-to-text tags: - computer use license: mit language: - en base_model: - microsoft/Florence-2-base --- # PTA-1: Controlling Computers with Small Models PTA (Prompt-to-Automation) is a vision language model for computer & phone automation, based on Florence-2. With only 270M parameters it outperforms much larger models in GUI text and element localization. This enables low-latency computer automation with local execution. ▶️ Try the demo at: AskUI/PTA-1 **Model Input:** Screenshot + description_of_target_element **Model Output:** BoundingBox for Target Element !image ## How to Get Started with the Model Use the code below to get started with the model. *Requirements:* torch, timm, einops, Pillow, transformers ## Evaluation **Note:** This is a first version of our evaluation, based on 999 samples (333 samples from each dataset). We are still running all models on the full test sets, and we are seeing ±5% deviations for a subset of the models we have already evaluated. | Model | Parameters | Mean | agentsea/wave-ui | AskUI/pta-text | ivelin/rico_refexp_combined | |--------------------------------------------|------------|--------|------------------|----------------|-----------------------------| | AskUI/PTA-1 | 0.27B | 79.98 | 90.69* | 76.28 | 72.97* | | anthropic.claude-3-5-sonnet-20241022-v2:0 | - | 70.37 | 82.28 | 83.18 | 45.65 | | agentsea/paligemma-3b-ft-waveui-896 | 3.29B | 57.76 | 70.57* | 67.87 | 34.83 | | Qwen/Qwen2-VL-7B-Instruct | 8.29B | 57.26 | 47.45 | 60.66 | 63.66 | | agentsea/paligemma-3b-ft-widgetcap-waveui-448 | 3.29B | 53.15 | 74.17* | 53.45 | 31.83 | | microsoft/Florence-2-base | 0.27B | 39.44 | 22.22 | 81.38 | 14.71 | | microsoft/Florence-2-large | 0.82B | 36.64 | 14.11 | 81.98 | 13.81 | | EasyOCR | - | 29.43 | 3.9 | 75.08 | 9.31 | | adept/fuyu-8b | 9.41B | 26.83 | 5.71 | 71.47 | 3.3 | | Qwen/Qwen2-VL-2B-Instruct | 2.21B | 23.32 | 17.12 | 26.13 | 26.73 | | Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4 | 0.90B | 18.92 | 10.81 | 22.82 | 23.12 | \\* Models is known to be trained on the train split of that dataset. The high benchmark scores for our model are partially due to data bias. Therefore, we expect users of the model to fine-tune it according to the data distributions of their use case. #### Metrics Click success rate is calculated as the number of clicks inside the target bounding box relative to all clicks. If a model predicts a target bounding box instead of a click coordinate, its center is used as its click prediction.",
20
- "model_explanation_gemini": "\"AskUI_PTA-1 is a 270M-parameter vision-language model for computer/phone automation, excelling in GUI text/element localization by generating bounding boxes from screenshots and target descriptions, enabling low-latency local execution.\" \n\n**Features:** \n- Small size (270M parameters) \n- Image-text-to-text pipeline (screenshot + description → bounding box) \n- Optimized for GUI text/element localization \n- Low-latency local execution \n- Out"
 
 
 
 
 
21
  }
 
17
  "region:us"
18
  ],
19
  "description": "--- library_name: transformers pipeline_tag: image-text-to-text tags: - computer use license: mit language: - en base_model: - microsoft/Florence-2-base --- # PTA-1: Controlling Computers with Small Models PTA (Prompt-to-Automation) is a vision language model for computer & phone automation, based on Florence-2. With only 270M parameters it outperforms much larger models in GUI text and element localization. This enables low-latency computer automation with local execution. ▶️ Try the demo at: AskUI/PTA-1 **Model Input:** Screenshot + description_of_target_element **Model Output:** BoundingBox for Target Element !image ## How to Get Started with the Model Use the code below to get started with the model. *Requirements:* torch, timm, einops, Pillow, transformers ## Evaluation **Note:** This is a first version of our evaluation, based on 999 samples (333 samples from each dataset). We are still running all models on the full test sets, and we are seeing ±5% deviations for a subset of the models we have already evaluated. | Model | Parameters | Mean | agentsea/wave-ui | AskUI/pta-text | ivelin/rico_refexp_combined | |--------------------------------------------|------------|--------|------------------|----------------|-----------------------------| | AskUI/PTA-1 | 0.27B | 79.98 | 90.69* | 76.28 | 72.97* | | anthropic.claude-3-5-sonnet-20241022-v2:0 | - | 70.37 | 82.28 | 83.18 | 45.65 | | agentsea/paligemma-3b-ft-waveui-896 | 3.29B | 57.76 | 70.57* | 67.87 | 34.83 | | Qwen/Qwen2-VL-7B-Instruct | 8.29B | 57.26 | 47.45 | 60.66 | 63.66 | | agentsea/paligemma-3b-ft-widgetcap-waveui-448 | 3.29B | 53.15 | 74.17* | 53.45 | 31.83 | | microsoft/Florence-2-base | 0.27B | 39.44 | 22.22 | 81.38 | 14.71 | | microsoft/Florence-2-large | 0.82B | 36.64 | 14.11 | 81.98 | 13.81 | | EasyOCR | - | 29.43 | 3.9 | 75.08 | 9.31 | | adept/fuyu-8b | 9.41B | 26.83 | 5.71 | 71.47 | 3.3 | | Qwen/Qwen2-VL-2B-Instruct | 2.21B | 23.32 | 17.12 | 26.13 | 26.73 | | Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4 | 0.90B | 18.92 | 10.81 | 22.82 | 23.12 | \\* Models is known to be trained on the train split of that dataset. The high benchmark scores for our model are partially due to data bias. Therefore, we expect users of the model to fine-tune it according to the data distributions of their use case. #### Metrics Click success rate is calculated as the number of clicks inside the target bounding box relative to all clicks. If a model predicts a target bounding box instead of a click coordinate, its center is used as its click prediction.",
20
+ "model_explanation_gemini": "\"AskUI_PTA-1 is a 270M-parameter vision-language model for computer/phone automation, excelling in GUI text/element localization by generating bounding boxes from screenshots and target descriptions, enabling low-latency local execution.\" \n\n**Features:** \n- Small size (270M parameters) \n- Image-text-to-text pipeline (screenshot + description → bounding box) \n- Optimized for GUI text/element localization \n- Low-latency local execution \n- Out",
21
+ "release_year": null,
22
+ "parameter_count": null,
23
+ "is_fine_tuned": true,
24
+ "category": "Multimodal",
25
+ "api_enhanced": true
26
  }
model_data_json/AstraMindAI_xtts2-gpt.json CHANGED
@@ -11,5 +11,11 @@
11
  "region:us"
12
  ],
13
  "description": "--- license: apache-2.0 base_model: - coqui/XTTS-v2 --- # Auralis 🌌 ## Model Details 🛠️ **Model Name:** Auralis **Model Architecture:** Based on Coqui XTTS-v2 **License:** - license: Apache 2.0 - base_model: XTTS-v2 Components Coqui AI License **Language Support:** English, Spanish, French, German, Italian, Portuguese, Polish, Turkish, Russian, Dutch, Czech, Arabic, Chinese (Simplified), Hungarian, Korean, Japanese, Hindi **Developed by:** AstraMind.ai **GitHub:** AstraMind AI **Primary Use Case:** Text-to-Speech (TTS) generation for real-world applications, including books, dialogues, and multilingual tasks. --- ## Model Description 🚀 Auralis transforms text into natural, high-quality speech with exceptional speed and scalability. It is powered by Coqui XTTS-v2 and optimized for both consumer-grade and high-performance GPUs. Auralis is designed to meet real-world needs like long-text processing, voice cloning, and concurrent request handling. ### Key Features: - **Warp-Speed Processing:** Generate speech for an entire novel (e.g., Harry Potter) in ~10 minutes. - **Hardware Friendly:** Requires <10GB VRAM on a single NVIDIA RTX 3090. - **Scalable:** Handles multiple requests simultaneously. - **Streaming:** Seamlessly processes long texts in a streaming format. - **Custom Voices:** Enables voice cloning from short reference audio. --- ## Quick Start ⭐ --- ## Ebook Generation 📚 Auralis converting ebooks into audio formats at lightning speed. For Python script, check out ebook_audio_generator.py. --- ## Intended Use 🌟 Auralis is designed for: - **Content Creators:** Generate audiobooks, podcasts, or voiceovers. - **Developers:** Integrate TTS into applications via a simple Python API. - **Accessibility**: Providing audio versions of digital content for people with visual or reading difficulties. - **Multilingual Scenarios:** Convert text to speech in multiple supported languages. --- ## Performance 📊 **Benchmarks on NVIDIA RTX 3090:** - Short phrases (<100 characters): ~1 second - Medium texts (<1,000 characters): ~5-10 seconds - Full books (~100,000 characters): ~10 minutes **Memory Usage:** - Base VRAM: ~4GB - Peak VRAM: ~10GB --- ## Model Features 🛸 1. **Speed & Efficiency:** - Smart batching for rapid processing of long texts. - Memory-optimized for consumer GPUs. 2. **Easy Integration:** - Python API with support for synchronous and asynchronous workflows. - Streaming mode for continuous playback during generation. 3. **Audio Quality Enhancements:** - Background noise reduction. - Voice clarity and volume normalization. - Customizable audio preprocessing. 4. **Multilingual Support:** - Automatic language detection. - High-quality speech in 15+ languages. 5. **Customization:** - Voice cloning using short reference clips. - Adjustable parameters for tone, pacing, and language. --- ## Limitations & Ethical Considerations ⚠️ - **Voice Cloning Risks:** Auralis supports voice cloning, which may raise ethical concerns about misuse. Use responsibly and ensure proper consent. - **Accent Limitations:** While robust for many languages, accents and intonations may vary based on the input. --- ## Citation 📜 If you use Auralis in your research or projects, please cite:",
14
- "model_explanation_gemini": "Generates high-quality, multilingual text-to-speech with fast processing, voice cloning, and scalability for applications like audiobooks and accessibility.\n\nModel Features:\n1. Fast processing (e.g., full novel in ~10 minutes)\n2. Low VRAM usage (<10GB on RTX 3090)\n3. Scalable for concurrent requests\n4. Streaming support for long texts\n5. Voice cloning capability\n6. Multilingual support (15+ languages)\n7. Noise reduction and"
 
 
 
 
 
 
15
  }
 
11
  "region:us"
12
  ],
13
  "description": "--- license: apache-2.0 base_model: - coqui/XTTS-v2 --- # Auralis 🌌 ## Model Details 🛠️ **Model Name:** Auralis **Model Architecture:** Based on Coqui XTTS-v2 **License:** - license: Apache 2.0 - base_model: XTTS-v2 Components Coqui AI License **Language Support:** English, Spanish, French, German, Italian, Portuguese, Polish, Turkish, Russian, Dutch, Czech, Arabic, Chinese (Simplified), Hungarian, Korean, Japanese, Hindi **Developed by:** AstraMind.ai **GitHub:** AstraMind AI **Primary Use Case:** Text-to-Speech (TTS) generation for real-world applications, including books, dialogues, and multilingual tasks. --- ## Model Description 🚀 Auralis transforms text into natural, high-quality speech with exceptional speed and scalability. It is powered by Coqui XTTS-v2 and optimized for both consumer-grade and high-performance GPUs. Auralis is designed to meet real-world needs like long-text processing, voice cloning, and concurrent request handling. ### Key Features: - **Warp-Speed Processing:** Generate speech for an entire novel (e.g., Harry Potter) in ~10 minutes. - **Hardware Friendly:** Requires <10GB VRAM on a single NVIDIA RTX 3090. - **Scalable:** Handles multiple requests simultaneously. - **Streaming:** Seamlessly processes long texts in a streaming format. - **Custom Voices:** Enables voice cloning from short reference audio. --- ## Quick Start ⭐ --- ## Ebook Generation 📚 Auralis converting ebooks into audio formats at lightning speed. For Python script, check out ebook_audio_generator.py. --- ## Intended Use 🌟 Auralis is designed for: - **Content Creators:** Generate audiobooks, podcasts, or voiceovers. - **Developers:** Integrate TTS into applications via a simple Python API. - **Accessibility**: Providing audio versions of digital content for people with visual or reading difficulties. - **Multilingual Scenarios:** Convert text to speech in multiple supported languages. --- ## Performance 📊 **Benchmarks on NVIDIA RTX 3090:** - Short phrases (<100 characters): ~1 second - Medium texts (<1,000 characters): ~5-10 seconds - Full books (~100,000 characters): ~10 minutes **Memory Usage:** - Base VRAM: ~4GB - Peak VRAM: ~10GB --- ## Model Features 🛸 1. **Speed & Efficiency:** - Smart batching for rapid processing of long texts. - Memory-optimized for consumer GPUs. 2. **Easy Integration:** - Python API with support for synchronous and asynchronous workflows. - Streaming mode for continuous playback during generation. 3. **Audio Quality Enhancements:** - Background noise reduction. - Voice clarity and volume normalization. - Customizable audio preprocessing. 4. **Multilingual Support:** - Automatic language detection. - High-quality speech in 15+ languages. 5. **Customization:** - Voice cloning using short reference clips. - Adjustable parameters for tone, pacing, and language. --- ## Limitations & Ethical Considerations ⚠️ - **Voice Cloning Risks:** Auralis supports voice cloning, which may raise ethical concerns about misuse. Use responsibly and ensure proper consent. - **Accent Limitations:** While robust for many languages, accents and intonations may vary based on the input. --- ## Citation 📜 If you use Auralis in your research or projects, please cite:",
14
+ "model_explanation_gemini": "Generates high-quality, multilingual text-to-speech with fast processing, voice cloning, and scalability for applications like audiobooks and accessibility.\n\nModel Features:\n1. Fast processing (e.g., full novel in ~10 minutes)\n2. Low VRAM usage (<10GB on RTX 3090)\n3. Scalable for concurrent requests\n4. Streaming support for long texts\n5. Voice cloning capability\n6. Multilingual support (15+ languages)\n7. Noise reduction and",
15
+ "release_year": null,
16
+ "parameter_count": null,
17
+ "is_fine_tuned": true,
18
+ "category": "Large Language Model",
19
+ "model_family": "GPT",
20
+ "api_enhanced": true
21
  }
model_data_json/AstraMindAI_xttsv2.json CHANGED
@@ -10,5 +10,10 @@
10
  "region:us"
11
  ],
12
  "description": "--- license: apache-2.0 base_model: - coqui/XTTS-v2 --- # Auralis 🌌 ## Model Details 🛠️ **Model Name:** Auralis **Model Architecture:** Based on Coqui XTTS-v2 **License:** - license: Apache 2.0 - base_model: XTTS-v2 Components Coqui AI License **Language Support:** English, Spanish, French, German, Italian, Portuguese, Polish, Turkish, Russian, Dutch, Czech, Arabic, Chinese (Simplified), Hungarian, Korean, Japanese, Hindi **Developed by:** AstraMind.ai **GitHub:** AstraMind AI **Primary Use Case:** Text-to-Speech (TTS) generation for real-world applications, including books, dialogues, and multilingual tasks. --- ## Model Description 🚀 Auralis transforms text into natural, high-quality speech with exceptional speed and scalability. It is powered by Coqui XTTS-v2 and optimized for both consumer-grade and high-performance GPUs. Auralis is designed to meet real-world needs like long-text processing, voice cloning, and concurrent request handling. ### Key Features: - **Warp-Speed Processing:** Generate speech for an entire novel (e.g., Harry Potter) in ~10 minutes. - **Hardware Friendly:** Requires <10GB VRAM on a single NVIDIA RTX 3090. - **Scalable:** Handles multiple requests simultaneously. - **Streaming:** Seamlessly processes long texts in a streaming format. - **Custom Voices:** Enables voice cloning from short reference audio. --- ## Quick Start ⭐ --- ## Ebook Generation 📚 Auralis converting ebooks into audio formats at lightning speed. For Python script, check out ebook_audio_generator.py. --- ## Intended Use 🌟 Auralis is designed for: - **Content Creators:** Generate audiobooks, podcasts, or voiceovers. - **Developers:** Integrate TTS into applications via a simple Python API. - **Accessibility**: Providing audio versions of digital content for people with visual or reading difficulties. - **Multilingual Scenarios:** Convert text to speech in multiple supported languages. --- ## Performance 📊 **Benchmarks on NVIDIA RTX 3090:** - Short phrases (<100 characters): ~1 second - Medium texts (<1,000 characters): ~5-10 seconds - Full books (~100,000 characters): ~10 minutes **Memory Usage:** - Base VRAM: ~4GB - Peak VRAM: ~10GB --- ## Model Features 🛸 1. **Speed & Efficiency:** - Smart batching for rapid processing of long texts. - Memory-optimized for consumer GPUs. 2. **Easy Integration:** - Python API with support for synchronous and asynchronous workflows. - Streaming mode for continuous playback during generation. 3. **Audio Quality Enhancements:** - Background noise reduction. - Voice clarity and volume normalization. - Customizable audio preprocessing. 4. **Multilingual Support:** - Automatic language detection. - High-quality speech in 15+ languages. 5. **Customization:** - Voice cloning using short reference clips. - Adjustable parameters for tone, pacing, and language. --- ## Limitations & Ethical Considerations ⚠️ - **Voice Cloning Risks:** Auralis supports voice cloning, which may raise ethical concerns about misuse. Use responsibly and ensure proper consent. - **Accent Limitations:** While robust for many languages, accents and intonations may vary based on the input. --- ## Citation 📜 If you use Auralis in your research or projects, please cite:",
13
- "model_explanation_gemini": "Generates high-quality, multilingual text-to-speech with fast processing, voice cloning, and scalability for applications like audiobooks and accessibility.\n\nModel Features: \n1. Multilingual support (15+ languages) \n2. High-speed processing (e.g., full novel in ~10 minutes) \n3. Low VRAM requirements (<10GB on RTX 3090) \n4. Voice cloning from short audio samples \n5. Streaming capability for long texts \n6. Noise reduction and audio"
 
 
 
 
 
14
  }
 
10
  "region:us"
11
  ],
12
  "description": "--- license: apache-2.0 base_model: - coqui/XTTS-v2 --- # Auralis 🌌 ## Model Details 🛠️ **Model Name:** Auralis **Model Architecture:** Based on Coqui XTTS-v2 **License:** - license: Apache 2.0 - base_model: XTTS-v2 Components Coqui AI License **Language Support:** English, Spanish, French, German, Italian, Portuguese, Polish, Turkish, Russian, Dutch, Czech, Arabic, Chinese (Simplified), Hungarian, Korean, Japanese, Hindi **Developed by:** AstraMind.ai **GitHub:** AstraMind AI **Primary Use Case:** Text-to-Speech (TTS) generation for real-world applications, including books, dialogues, and multilingual tasks. --- ## Model Description 🚀 Auralis transforms text into natural, high-quality speech with exceptional speed and scalability. It is powered by Coqui XTTS-v2 and optimized for both consumer-grade and high-performance GPUs. Auralis is designed to meet real-world needs like long-text processing, voice cloning, and concurrent request handling. ### Key Features: - **Warp-Speed Processing:** Generate speech for an entire novel (e.g., Harry Potter) in ~10 minutes. - **Hardware Friendly:** Requires <10GB VRAM on a single NVIDIA RTX 3090. - **Scalable:** Handles multiple requests simultaneously. - **Streaming:** Seamlessly processes long texts in a streaming format. - **Custom Voices:** Enables voice cloning from short reference audio. --- ## Quick Start ⭐ --- ## Ebook Generation 📚 Auralis converting ebooks into audio formats at lightning speed. For Python script, check out ebook_audio_generator.py. --- ## Intended Use 🌟 Auralis is designed for: - **Content Creators:** Generate audiobooks, podcasts, or voiceovers. - **Developers:** Integrate TTS into applications via a simple Python API. - **Accessibility**: Providing audio versions of digital content for people with visual or reading difficulties. - **Multilingual Scenarios:** Convert text to speech in multiple supported languages. --- ## Performance 📊 **Benchmarks on NVIDIA RTX 3090:** - Short phrases (<100 characters): ~1 second - Medium texts (<1,000 characters): ~5-10 seconds - Full books (~100,000 characters): ~10 minutes **Memory Usage:** - Base VRAM: ~4GB - Peak VRAM: ~10GB --- ## Model Features 🛸 1. **Speed & Efficiency:** - Smart batching for rapid processing of long texts. - Memory-optimized for consumer GPUs. 2. **Easy Integration:** - Python API with support for synchronous and asynchronous workflows. - Streaming mode for continuous playback during generation. 3. **Audio Quality Enhancements:** - Background noise reduction. - Voice clarity and volume normalization. - Customizable audio preprocessing. 4. **Multilingual Support:** - Automatic language detection. - High-quality speech in 15+ languages. 5. **Customization:** - Voice cloning using short reference clips. - Adjustable parameters for tone, pacing, and language. --- ## Limitations & Ethical Considerations ⚠️ - **Voice Cloning Risks:** Auralis supports voice cloning, which may raise ethical concerns about misuse. Use responsibly and ensure proper consent. - **Accent Limitations:** While robust for many languages, accents and intonations may vary based on the input. --- ## Citation 📜 If you use Auralis in your research or projects, please cite:",
13
+ "model_explanation_gemini": "Generates high-quality, multilingual text-to-speech with fast processing, voice cloning, and scalability for applications like audiobooks and accessibility.\n\nModel Features: \n1. Multilingual support (15+ languages) \n2. High-speed processing (e.g., full novel in ~10 minutes) \n3. Low VRAM requirements (<10GB on RTX 3090) \n4. Voice cloning from short audio samples \n5. Streaming capability for long texts \n6. Noise reduction and audio",
14
+ "release_year": null,
15
+ "parameter_count": null,
16
+ "is_fine_tuned": true,
17
+ "category": "Other",
18
+ "api_enhanced": true
19
  }
model_data_json/AutonLab_MOMENT-1-large.json CHANGED
@@ -21,5 +21,10 @@
21
  "region:us"
22
  ],
23
  "description": "--- license: mit datasets: - AutonLab/Timeseries-PILE metrics: - accuracy - mse - mae - f1 tags: - time series - forecasting - classification - anomaly detection - imputation - transformers - pretrained models - foundation models - time-series pipeline_tag: time-series-forecasting --- # MOMENT-Large MOMENT is a family of foundation models for general-purpose time-series analysis. The models in this family (1) serve as a building block for diverse **time-series analysis tasks** (e.g., forecasting, classification, anomaly detection, and imputation, etc.), (2) are effective **out-of-the-box**, i.e., with no (or few) task-specific exemplars (enabling e.g., zero-shot forecasting, few-shot classification, etc.), and (3) are **tunable** using in-distribution and task-specific data to improve performance. For details on MOMENT models, training data, and experimental results, please refer to the paper MOMENT: A Family of Open Time-series Foundation Models. MOMENT-1 comes in 3 sizes: Small, Base, and Large. # Usage **Recommended Python Version:** Python 3.11 (support for additional versions is expected soon). You can install the package using pip: Alternatively, to install the latest version directly from the GitHub repository: To load the pre-trained model for one of the tasks, use one of the following code snippets: **Forecasting** **Classification** **Anomaly Detection, Imputation, and Pre-training** **Representation Learning** ### Tutorials Here is the list of tutorials and reproducibile experiments to get started with MOMENT for various tasks: - Forecasting - Classification - Anomaly Detection - Imputation - Representation Learning - Real-world Electrocardiogram (ECG) Case Study -- This tutorial also shows how to fine-tune MOMENT for a real-world ECG classification problem, performing training and inference on multiple GPUs and parameter efficient fine-tuning (PEFT). ## Model Details ### Model Description - **Developed by:** Auton Lab, Carnegie Mellon University - **Model type:** Time-series Foundation Model - **License:** MIT License ### Model Sources <!-- Provide the basic links for the model. --> - **Repository:** (Pre-training and research code coming out soon!) - **Paper:** - **Demo:** ## Environmental Impact <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly --> We train multiple models over many days resulting in significant energy usage and a sizeable carbon footprint. However, we hope that releasing our models will ensure that future time-series modeling efforts are quicker and more efficient, resulting in lower carbon emissions. We use the Total Graphics Power (TGP) to calculate the total power consumed for training MOMENT models, although the total power consumed by the GPU will likely vary a little based on the GPU utilization while training our model. Our calculations do not account for power demands from other sources of our compute. We use 336.566 Kg C02/MWH as the standard value of CO2 emission per megawatt hour of energy consumed for Pittsburgh. - **Hardware Type:** NVIDIA RTX A6000 GPU - **GPU Hours:** 404 - **Compute Region:** Pittsburgh, USA - **Carbon Emission (tCO2eq):** #### Hardware All models were trained and evaluated on a computing cluster consisting of 128 AMD EPYC 7502 CPUs, 503 GB of RAM, and 8 NVIDIA RTX A6000 GPUs each with 49 GiB RAM. All MOMENT variants were trained on a single A6000 GPU (with any data or model parallelism). ## Citation <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. --> **BibTeX:** If you use MOMENT please cite our paper: **APA:** Goswami, M., Szafer, K., Choudhry, A., Cai, Y., Li, S., & Dubrawski, A. (2024). MOMENT: A Family of Open Time-series Foundation Models. In International Conference on Machine Learning. PMLR.",
24
- "model_explanation_gemini": "A foundation model for general-purpose time-series analysis, capable of forecasting, classification, anomaly detection, and imputation, with zero-shot and few-shot capabilities. \n\n**Features:** \n- Supports multiple time-series tasks (forecasting, classification, anomaly detection, imputation) \n- Pretrained for out-of-the-box use (zero-shot forecasting, few-shot classification) \n- Tunable with task-specific data \n- Available in three sizes (Small, Base, Large) \n\n**Comparison:** \nM"
 
 
 
 
 
25
  }
 
21
  "region:us"
22
  ],
23
  "description": "--- license: mit datasets: - AutonLab/Timeseries-PILE metrics: - accuracy - mse - mae - f1 tags: - time series - forecasting - classification - anomaly detection - imputation - transformers - pretrained models - foundation models - time-series pipeline_tag: time-series-forecasting --- # MOMENT-Large MOMENT is a family of foundation models for general-purpose time-series analysis. The models in this family (1) serve as a building block for diverse **time-series analysis tasks** (e.g., forecasting, classification, anomaly detection, and imputation, etc.), (2) are effective **out-of-the-box**, i.e., with no (or few) task-specific exemplars (enabling e.g., zero-shot forecasting, few-shot classification, etc.), and (3) are **tunable** using in-distribution and task-specific data to improve performance. For details on MOMENT models, training data, and experimental results, please refer to the paper MOMENT: A Family of Open Time-series Foundation Models. MOMENT-1 comes in 3 sizes: Small, Base, and Large. # Usage **Recommended Python Version:** Python 3.11 (support for additional versions is expected soon). You can install the package using pip: Alternatively, to install the latest version directly from the GitHub repository: To load the pre-trained model for one of the tasks, use one of the following code snippets: **Forecasting** **Classification** **Anomaly Detection, Imputation, and Pre-training** **Representation Learning** ### Tutorials Here is the list of tutorials and reproducibile experiments to get started with MOMENT for various tasks: - Forecasting - Classification - Anomaly Detection - Imputation - Representation Learning - Real-world Electrocardiogram (ECG) Case Study -- This tutorial also shows how to fine-tune MOMENT for a real-world ECG classification problem, performing training and inference on multiple GPUs and parameter efficient fine-tuning (PEFT). ## Model Details ### Model Description - **Developed by:** Auton Lab, Carnegie Mellon University - **Model type:** Time-series Foundation Model - **License:** MIT License ### Model Sources <!-- Provide the basic links for the model. --> - **Repository:** (Pre-training and research code coming out soon!) - **Paper:** - **Demo:** ## Environmental Impact <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly --> We train multiple models over many days resulting in significant energy usage and a sizeable carbon footprint. However, we hope that releasing our models will ensure that future time-series modeling efforts are quicker and more efficient, resulting in lower carbon emissions. We use the Total Graphics Power (TGP) to calculate the total power consumed for training MOMENT models, although the total power consumed by the GPU will likely vary a little based on the GPU utilization while training our model. Our calculations do not account for power demands from other sources of our compute. We use 336.566 Kg C02/MWH as the standard value of CO2 emission per megawatt hour of energy consumed for Pittsburgh. - **Hardware Type:** NVIDIA RTX A6000 GPU - **GPU Hours:** 404 - **Compute Region:** Pittsburgh, USA - **Carbon Emission (tCO2eq):** #### Hardware All models were trained and evaluated on a computing cluster consisting of 128 AMD EPYC 7502 CPUs, 503 GB of RAM, and 8 NVIDIA RTX A6000 GPUs each with 49 GiB RAM. All MOMENT variants were trained on a single A6000 GPU (with any data or model parallelism). ## Citation <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. --> **BibTeX:** If you use MOMENT please cite our paper: **APA:** Goswami, M., Szafer, K., Choudhry, A., Cai, Y., Li, S., & Dubrawski, A. (2024). MOMENT: A Family of Open Time-series Foundation Models. In International Conference on Machine Learning. PMLR.",
24
+ "model_explanation_gemini": "A foundation model for general-purpose time-series analysis, capable of forecasting, classification, anomaly detection, and imputation, with zero-shot and few-shot capabilities. \n\n**Features:** \n- Supports multiple time-series tasks (forecasting, classification, anomaly detection, imputation) \n- Pretrained for out-of-the-box use (zero-shot forecasting, few-shot classification) \n- Tunable with task-specific data \n- Available in three sizes (Small, Base, Large) \n\n**Comparison:** \nM",
25
+ "release_year": "2024",
26
+ "parameter_count": null,
27
+ "is_fine_tuned": false,
28
+ "category": "Other",
29
+ "api_enhanced": true
30
  }
model_data_json/AutonLab_MOMENT-1-small.json CHANGED
@@ -21,5 +21,10 @@
21
  "region:us"
22
  ],
23
  "description": "--- license: mit datasets: - AutonLab/Timeseries-PILE metrics: - accuracy - mse - mae - f1 tags: - time series - forecasting - classification - anomaly detection - imputation - transformers - pretrained models - foundation models - time-series pipeline_tag: time-series-forecasting --- # MOMENT-Small MOMENT is a family of foundation models for general-purpose time-series analysis. The models in this family (1) serve as a building block for diverse **time-series analysis tasks** (e.g., forecasting, classification, anomaly detection, and imputation, etc.), (2) are effective **out-of-the-box**, i.e., with no (or few) task-specific exemplars (enabling e.g., zero-shot forecasting, few-shot classification, etc.), and (3) are **tunable** using in-distribution and task-specific data to improve performance. For details on MOMENT models, training data, and experimental results, please refer to the paper MOMENT: A Family of Open Time-series Foundation Models. MOMENT-1 comes in 3 sizes: Small, Base, and Large. # Usage **Recommended Python Version:** Python 3.11 (support for additional versions is expected soon). You can install the package using pip: Alternatively, to install the latest version directly from the GitHub repository: To load the pre-trained model for one of the tasks, use one of the following code snippets: **Forecasting** **Classification** **Anomaly Detection, Imputation, and Pre-training** **Representation Learning** ### Tutorials Here is the list of tutorials and reproducibile experiments to get started with MOMENT for various tasks: - Forecasting - Classification - Anomaly Detection - Imputation - Representation Learning - Real-world Electrocardiogram (ECG) Case Study -- This tutorial also shows how to fine-tune MOMENT for a real-world ECG classification problem, performing training and inference on multiple GPUs and parameter efficient fine-tuning (PEFT). ## Model Details ### Model Description - **Developed by:** Auton Lab, Carnegie Mellon University - **Model type:** Time-series Foundation Model - **License:** MIT License ### Model Sources <!-- Provide the basic links for the model. --> - **Repository:** (Pre-training and research code coming out soon!) - **Paper:** - **Demo:** ## Environmental Impact <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly --> We train multiple models over many days resulting in significant energy usage and a sizeable carbon footprint. However, we hope that releasing our models will ensure that future time-series modeling efforts are quicker and more efficient, resulting in lower carbon emissions. We use the Total Graphics Power (TGP) to calculate the total power consumed for training MOMENT models, although the total power consumed by the GPU will likely vary a little based on the GPU utilization while training our model. Our calculations do not account for power demands from other sources of our compute. We use 336.566 Kg C02/MWH as the standard value of CO2 emission per megawatt hour of energy consumed for Pittsburgh. - **Hardware Type:** NVIDIA RTX A6000 GPU - **GPU Hours:** 48 - **Compute Region:** Pittsburgh, USA - **Carbon Emission (tCO2eq):** #### Hardware All models were trained and evaluated on a computing cluster consisting of 128 AMD EPYC 7502 CPUs, 503 GB of RAM, and 8 NVIDIA RTX A6000 GPUs each with 49 GiB RAM. All MOMENT variants were trained on a single A6000 GPU (with any data or model parallelism). ## Citation <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. --> **BibTeX:** If you use MOMENT please cite our paper: **APA:** Goswami, M., Szafer, K., Choudhry, A., Cai, Y., Li, S., & Dubrawski, A. (2024). MOMENT: A Family of Open Time-series Foundation Models. In International Conference on Machine Learning. PMLR.",
24
- "model_explanation_gemini": "\"MOMENT-1-small is a general-purpose time-series foundation model designed for diverse tasks like forecasting, classification, anomaly detection, and imputation, offering out-of-the-box effectiveness and tunability for improved performance.\"\n\nModel Features: \n- Supports multiple time-series tasks (forecasting, classification, anomaly detection, imputation) \n- Pretrained for out-of-the-box use (zero-shot/few-shot capable) \n- Tunable with task-specific data \n- Part of a model family (Small"
 
 
 
 
 
25
  }
 
21
  "region:us"
22
  ],
23
  "description": "--- license: mit datasets: - AutonLab/Timeseries-PILE metrics: - accuracy - mse - mae - f1 tags: - time series - forecasting - classification - anomaly detection - imputation - transformers - pretrained models - foundation models - time-series pipeline_tag: time-series-forecasting --- # MOMENT-Small MOMENT is a family of foundation models for general-purpose time-series analysis. The models in this family (1) serve as a building block for diverse **time-series analysis tasks** (e.g., forecasting, classification, anomaly detection, and imputation, etc.), (2) are effective **out-of-the-box**, i.e., with no (or few) task-specific exemplars (enabling e.g., zero-shot forecasting, few-shot classification, etc.), and (3) are **tunable** using in-distribution and task-specific data to improve performance. For details on MOMENT models, training data, and experimental results, please refer to the paper MOMENT: A Family of Open Time-series Foundation Models. MOMENT-1 comes in 3 sizes: Small, Base, and Large. # Usage **Recommended Python Version:** Python 3.11 (support for additional versions is expected soon). You can install the package using pip: Alternatively, to install the latest version directly from the GitHub repository: To load the pre-trained model for one of the tasks, use one of the following code snippets: **Forecasting** **Classification** **Anomaly Detection, Imputation, and Pre-training** **Representation Learning** ### Tutorials Here is the list of tutorials and reproducibile experiments to get started with MOMENT for various tasks: - Forecasting - Classification - Anomaly Detection - Imputation - Representation Learning - Real-world Electrocardiogram (ECG) Case Study -- This tutorial also shows how to fine-tune MOMENT for a real-world ECG classification problem, performing training and inference on multiple GPUs and parameter efficient fine-tuning (PEFT). ## Model Details ### Model Description - **Developed by:** Auton Lab, Carnegie Mellon University - **Model type:** Time-series Foundation Model - **License:** MIT License ### Model Sources <!-- Provide the basic links for the model. --> - **Repository:** (Pre-training and research code coming out soon!) - **Paper:** - **Demo:** ## Environmental Impact <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly --> We train multiple models over many days resulting in significant energy usage and a sizeable carbon footprint. However, we hope that releasing our models will ensure that future time-series modeling efforts are quicker and more efficient, resulting in lower carbon emissions. We use the Total Graphics Power (TGP) to calculate the total power consumed for training MOMENT models, although the total power consumed by the GPU will likely vary a little based on the GPU utilization while training our model. Our calculations do not account for power demands from other sources of our compute. We use 336.566 Kg C02/MWH as the standard value of CO2 emission per megawatt hour of energy consumed for Pittsburgh. - **Hardware Type:** NVIDIA RTX A6000 GPU - **GPU Hours:** 48 - **Compute Region:** Pittsburgh, USA - **Carbon Emission (tCO2eq):** #### Hardware All models were trained and evaluated on a computing cluster consisting of 128 AMD EPYC 7502 CPUs, 503 GB of RAM, and 8 NVIDIA RTX A6000 GPUs each with 49 GiB RAM. All MOMENT variants were trained on a single A6000 GPU (with any data or model parallelism). ## Citation <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. --> **BibTeX:** If you use MOMENT please cite our paper: **APA:** Goswami, M., Szafer, K., Choudhry, A., Cai, Y., Li, S., & Dubrawski, A. (2024). MOMENT: A Family of Open Time-series Foundation Models. In International Conference on Machine Learning. PMLR.",
24
+ "model_explanation_gemini": "\"MOMENT-1-small is a general-purpose time-series foundation model designed for diverse tasks like forecasting, classification, anomaly detection, and imputation, offering out-of-the-box effectiveness and tunability for improved performance.\"\n\nModel Features: \n- Supports multiple time-series tasks (forecasting, classification, anomaly detection, imputation) \n- Pretrained for out-of-the-box use (zero-shot/few-shot capable) \n- Tunable with task-specific data \n- Part of a model family (Small",
25
+ "release_year": "2024",
26
+ "parameter_count": null,
27
+ "is_fine_tuned": false,
28
+ "category": "Other",
29
+ "api_enhanced": true
30
  }
model_data_json/BAAI_Emu3-Chat-hf.json CHANGED
@@ -13,5 +13,10 @@
13
  "region:us"
14
  ],
15
  "description": "--- license: apache-2.0 language: - en pipeline_tag: image-text-to-text inference: false tags: - vision - image-text-to-text arxiv: 2409.18869 --- <div align='center'> <h1>Emu3: Next-Token Prediction is All You Need</h1h1> <h3></h3> Emu3 Team, BAAI </div> <div align='left'> <img src=\" class=\"interpolation-image\" alt=\"arch.\" height=\"80%\" width=\"70%\" /> </div> Below is the model card of Emu3-Chat model, which is adapted from the original Emu3 model card that you can find here. ## Model details **Model type:** Emu3 is an open-source multimodal models trained with next-token prediction task. By tokenizing images and text into a discrete space, Emu3 is trained as a single transformer from scratch on a mixture of multimodal sequences. It is an auto-regressive language model, based on the transformer architecture. **Paper or resources for more information:** ## Highlights - **Emu3** is capable of generating high-quality images following the text input, by simply predicting the next vision token. The model naturally supports flexible resolutions and styles. - **Emu3** shows strong vision-language understanding capabilities to see the physical world and provides coherent text responses. Notably, this capability is achieved without depending on a CLIP and a pretrained LLM. - **Emu3** simply generates a video causally by predicting the next token in a video sequence, unlike the video diffusion model as in Sora. With a video in context, Emu3 can also naturally extend the video and predict what will happen next. - **Emu3** outperforms several well-established task-specific models in both generation and perception tasks, surpassing flagship open models such as SDXL, LLaVA-1.6 and OpenSora-1.2, while eliminating the need for diffusion or compositional architectures. ## How to use the model First, make sure to have . Make sure also to follow the correct prompt template () and add the token to the location where you want to query images: ### Using : ### Using pure : Below is an example script to run generation in precision on a GPU device: ### Model optimization #### Use Flash-Attention 2 to further speed-up generation First make sure to install . Refer to the original repository of Flash Attention regarding that package installation. Simply change the snippet above with: # Citation",
16
- "model_explanation_gemini": "Generates images and videos from text inputs via next-token prediction while excelling in vision-language understanding, outperforming models like SDXL and LLaVA-1.6 without relying on diffusion or pretrained components. \n\n**Features:** \n- Auto-regressive transformer trained on multimodal sequences \n- Text-to-image and text-to-video generation via token prediction \n- Supports flexible resolutions and styles \n- Strong vision-language understanding without CLIP or pretrained LLMs \n- Video generation/extending capabilities"
 
 
 
 
 
17
  }
 
13
  "region:us"
14
  ],
15
  "description": "--- license: apache-2.0 language: - en pipeline_tag: image-text-to-text inference: false tags: - vision - image-text-to-text arxiv: 2409.18869 --- <div align='center'> <h1>Emu3: Next-Token Prediction is All You Need</h1h1> <h3></h3> Emu3 Team, BAAI </div> <div align='left'> <img src=\" class=\"interpolation-image\" alt=\"arch.\" height=\"80%\" width=\"70%\" /> </div> Below is the model card of Emu3-Chat model, which is adapted from the original Emu3 model card that you can find here. ## Model details **Model type:** Emu3 is an open-source multimodal models trained with next-token prediction task. By tokenizing images and text into a discrete space, Emu3 is trained as a single transformer from scratch on a mixture of multimodal sequences. It is an auto-regressive language model, based on the transformer architecture. **Paper or resources for more information:** ## Highlights - **Emu3** is capable of generating high-quality images following the text input, by simply predicting the next vision token. The model naturally supports flexible resolutions and styles. - **Emu3** shows strong vision-language understanding capabilities to see the physical world and provides coherent text responses. Notably, this capability is achieved without depending on a CLIP and a pretrained LLM. - **Emu3** simply generates a video causally by predicting the next token in a video sequence, unlike the video diffusion model as in Sora. With a video in context, Emu3 can also naturally extend the video and predict what will happen next. - **Emu3** outperforms several well-established task-specific models in both generation and perception tasks, surpassing flagship open models such as SDXL, LLaVA-1.6 and OpenSora-1.2, while eliminating the need for diffusion or compositional architectures. ## How to use the model First, make sure to have . Make sure also to follow the correct prompt template () and add the token to the location where you want to query images: ### Using : ### Using pure : Below is an example script to run generation in precision on a GPU device: ### Model optimization #### Use Flash-Attention 2 to further speed-up generation First make sure to install . Refer to the original repository of Flash Attention regarding that package installation. Simply change the snippet above with: # Citation",
16
+ "model_explanation_gemini": "Generates images and videos from text inputs via next-token prediction while excelling in vision-language understanding, outperforming models like SDXL and LLaVA-1.6 without relying on diffusion or pretrained components. \n\n**Features:** \n- Auto-regressive transformer trained on multimodal sequences \n- Text-to-image and text-to-video generation via token prediction \n- Supports flexible resolutions and styles \n- Strong vision-language understanding without CLIP or pretrained LLMs \n- Video generation/extending capabilities",
17
+ "release_year": "2024",
18
+ "parameter_count": null,
19
+ "is_fine_tuned": false,
20
+ "category": "Multimodal",
21
+ "api_enhanced": true
22
  }
model_data_json/BAAI_Emu3-VisionTokenizer.json CHANGED
@@ -12,5 +12,10 @@
12
  "region:us"
13
  ],
14
  "description": "--- license: apache-2.0 library_name: transformers --- <div align='center'> <h1>Emu3: Next-Token Prediction is All You Need</h1h1> <h3></h3> Emu3 Team, BAAI | Project Page | Paper | 🤗HF Models | github | Demo | </div> <div align='center'> <img src=\" class=\"interpolation-image\" alt=\"arch.\" height=\"80%\" width=\"70%\" /> </div> We introduce **Emu3**, a new suite of state-of-the-art multimodal models trained solely with **<i>next-token prediction</i>**! By tokenizing images, text, and videos into a discrete space, we train a single transformer from scratch on a mixture of multimodal sequences. ### Emu3 excels in both generation and perception **Emu3** outperforms several well-established task-specific models in both generation and perception tasks, surpassing flagship open models such as SDXL, LLaVA-1.6 and OpenSora-1.2, while eliminating the need for diffusion or compositional architectures. <div align='center'> <img src=\" class=\"interpolation-image\" alt=\"comparison.\" height=\"80%\" width=\"80%\" /> </div> ### Highlights - **Emu3** is capable of generating high-quality images following the text input, by simply predicting the next vision token. The model naturally supports flexible resolutions and styles. - **Emu3** shows strong vision-language understanding capabilities to see the physical world and provides coherent text responses. Notably, this capability is achieved without depending on a CLIP and a pretrained LLM. - **Emu3** simply generates a video causally by predicting the next token in a video sequence, unlike the video diffusion model as in Sora. With a video in context, Emu3 can also naturally extend the video and predict what will happen next. ### Quickstart for Autoencoding",
15
- "model_explanation_gemini": "\"Emu3 is a multimodal model trained with next-token prediction to generate images, videos, and text while excelling in vision-language understanding, outperforming models like SDXL, LLaVA-1.6, and OpenSora-1.2 without relying on diffusion or compositional architectures.\"\n\n### Features: \n1. **Multimodal generation** (images, videos, text) via next-token prediction. \n2. **Flexible resolutions/styles** for image generation."
 
 
 
 
 
16
  }
 
12
  "region:us"
13
  ],
14
  "description": "--- license: apache-2.0 library_name: transformers --- <div align='center'> <h1>Emu3: Next-Token Prediction is All You Need</h1h1> <h3></h3> Emu3 Team, BAAI | Project Page | Paper | 🤗HF Models | github | Demo | </div> <div align='center'> <img src=\" class=\"interpolation-image\" alt=\"arch.\" height=\"80%\" width=\"70%\" /> </div> We introduce **Emu3**, a new suite of state-of-the-art multimodal models trained solely with **<i>next-token prediction</i>**! By tokenizing images, text, and videos into a discrete space, we train a single transformer from scratch on a mixture of multimodal sequences. ### Emu3 excels in both generation and perception **Emu3** outperforms several well-established task-specific models in both generation and perception tasks, surpassing flagship open models such as SDXL, LLaVA-1.6 and OpenSora-1.2, while eliminating the need for diffusion or compositional architectures. <div align='center'> <img src=\" class=\"interpolation-image\" alt=\"comparison.\" height=\"80%\" width=\"80%\" /> </div> ### Highlights - **Emu3** is capable of generating high-quality images following the text input, by simply predicting the next vision token. The model naturally supports flexible resolutions and styles. - **Emu3** shows strong vision-language understanding capabilities to see the physical world and provides coherent text responses. Notably, this capability is achieved without depending on a CLIP and a pretrained LLM. - **Emu3** simply generates a video causally by predicting the next token in a video sequence, unlike the video diffusion model as in Sora. With a video in context, Emu3 can also naturally extend the video and predict what will happen next. ### Quickstart for Autoencoding",
15
+ "model_explanation_gemini": "\"Emu3 is a multimodal model trained with next-token prediction to generate images, videos, and text while excelling in vision-language understanding, outperforming models like SDXL, LLaVA-1.6, and OpenSora-1.2 without relying on diffusion or compositional architectures.\"\n\n### Features: \n1. **Multimodal generation** (images, videos, text) via next-token prediction. \n2. **Flexible resolutions/styles** for image generation.",
16
+ "release_year": "2024",
17
+ "parameter_count": null,
18
+ "is_fine_tuned": false,
19
+ "category": "Embedding",
20
+ "api_enhanced": true
21
  }
model_data_json/BAAI_bge-base-en-v1.5.json CHANGED
@@ -25,5 +25,11 @@
25
  "region:us"
26
  ],
27
  "description": "--- tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers - mteb model-index: - name: bge-base-en-v1.5 results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 76.14925373134328 - type: ap value: 39.32336517995478 - type: f1 value: 70.16902252611425 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 93.386825 - type: ap value: 90.21276917991995 - type: f1 value: 93.37741030006174 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 48.846000000000004 - type: f1 value: 48.14646269778261 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 40.754000000000005 - type: map_at_10 value: 55.761 - type: map_at_100 value: 56.330999999999996 - type: map_at_1000 value: 56.333999999999996 - type: map_at_3 value: 51.92 - type: map_at_5 value: 54.010999999999996 - type: mrr_at_1 value: 41.181 - type: mrr_at_10 value: 55.967999999999996 - type: mrr_at_100 value: 56.538 - type: mrr_at_1000 value: 56.542 - type: mrr_at_3 value: 51.980000000000004 - type: mrr_at_5 value: 54.208999999999996 - type: ndcg_at_1 value: 40.754000000000005 - type: ndcg_at_10 value: 63.605000000000004 - type: ndcg_at_100 value: 66.05199999999999 - type: ndcg_at_1000 value: 66.12 - type: ndcg_at_3 value: 55.708 - type: ndcg_at_5 value: 59.452000000000005 - type: precision_at_1 value: 40.754000000000005 - type: precision_at_10 value: 8.841000000000001 - type: precision_at_100 value: 0.991 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 22.238 - type: precision_at_5 value: 15.149000000000001 - type: recall_at_1 value: 40.754000000000005 - type: recall_at_10 value: 88.407 - type: recall_at_100 value: 99.14699999999999 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 66.714 - type: recall_at_5 value: 75.747 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 48.74884539679369 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 42.8075893810716 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 62.128470519187736 - type: mrr value: 74.28065778481289 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 89.24629081484655 - type: cos_sim_spearman value: 86.93752309911496 - type: euclidean_pearson value: 87.58589628573816 - type: euclidean_spearman value: 88.05622328825284 - type: manhattan_pearson value: 87.5594959805773 - type: manhattan_spearman value: 88.19658793233961 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 86.9512987012987 - type: f1 value: 86.92515357973708 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 39.10263762928872 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 36.69711517426737 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 32.327 - type: map_at_10 value: 44.099 - type: map_at_100 value: 45.525 - type: map_at_1000 value: 45.641999999999996 - type: map_at_3 value: 40.47 - type: map_at_5 value: 42.36 - type: mrr_at_1 value: 39.199 - type: mrr_at_10 value: 49.651 - type: mrr_at_100 value: 50.29 - type: mrr_at_1000 value: 50.329 - type: mrr_at_3 value: 46.924 - type: mrr_at_5 value: 48.548 - type: ndcg_at_1 value: 39.199 - type: ndcg_at_10 value: 50.773 - type: ndcg_at_100 value: 55.67999999999999 - type: ndcg_at_1000 value: 57.495 - type: ndcg_at_3 value: 45.513999999999996 - type: ndcg_at_5 value: 47.703 - type: precision_at_1 value: 39.199 - type: precision_at_10 value: 9.914000000000001 - type: precision_at_100 value: 1.5310000000000001 - type: precision_at_1000 value: 0.198 - type: precision_at_3 value: 21.984 - type: precision_at_5 value: 15.737000000000002 - type: recall_at_1 value: 32.327 - type: recall_at_10 value: 63.743 - type: recall_at_100 value: 84.538 - type: recall_at_1000 value: 96.089 - type: recall_at_3 value: 48.065000000000005 - type: recall_at_5 value: 54.519 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 32.671 - type: map_at_10 value: 42.954 - type: map_at_100 value: 44.151 - type: map_at_1000 value: 44.287 - type: map_at_3 value: 39.912 - type: map_at_5 value: 41.798 - type: mrr_at_1 value: 41.465 - type: mrr_at_10 value: 49.351 - type: mrr_at_100 value: 49.980000000000004 - type: mrr_at_1000 value: 50.016000000000005 - type: mrr_at_3 value: 47.144000000000005 - type: mrr_at_5 value: 48.592999999999996 - type: ndcg_at_1 value: 41.465 - type: ndcg_at_10 value: 48.565999999999995 - type: ndcg_at_100 value: 52.76499999999999 - type: ndcg_at_1000 value: 54.749 - type: ndcg_at_3 value: 44.57 - type: ndcg_at_5 value: 46.759 - type: precision_at_1 value: 41.465 - type: precision_at_10 value: 9.107999999999999 - type: precision_at_100 value: 1.433 - type: precision_at_1000 value: 0.191 - type: precision_at_3 value: 21.423000000000002 - type: precision_at_5 value: 15.414 - type: recall_at_1 value: 32.671 - type: recall_at_10 value: 57.738 - type: recall_at_100 value: 75.86500000000001 - type: recall_at_1000 value: 88.36 - type: recall_at_3 value: 45.626 - type: recall_at_5 value: 51.812000000000005 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 41.185 - type: map_at_10 value: 53.929 - type: map_at_100 value: 54.92 - type: map_at_1000 value: 54.967999999999996 - type: map_at_3 value: 50.70400000000001 - type: map_at_5 value: 52.673 - type: mrr_at_1 value: 47.398 - type: mrr_at_10 value: 57.303000000000004 - type: mrr_at_100 value: 57.959 - type: mrr_at_1000 value: 57.985 - type: mrr_at_3 value: 54.932 - type: mrr_at_5 value: 56.464999999999996 - type: ndcg_at_1 value: 47.398 - type: ndcg_at_10 value: 59.653 - type: ndcg_at_100 value: 63.627 - type: ndcg_at_1000 value: 64.596 - type: ndcg_at_3 value: 54.455 - type: ndcg_at_5 value: 57.245000000000005 - type: precision_at_1 value: 47.398 - type: precision_at_10 value: 9.524000000000001 - type: precision_at_100 value: 1.243 - type: precision_at_1000 value: 0.13699999999999998 - type: precision_at_3 value: 24.389 - type: precision_at_5 value: 16.752 - type: recall_at_1 value: 41.185 - type: recall_at_10 value: 73.193 - type: recall_at_100 value: 90.357 - type: recall_at_1000 value: 97.253 - type: recall_at_3 value: 59.199999999999996 - type: recall_at_5 value: 66.118 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.27 - type: map_at_10 value: 36.223 - type: map_at_100 value: 37.218 - type: map_at_1000 value: 37.293 - type: map_at_3 value: 33.503 - type: map_at_5 value: 35.097 - type: mrr_at_1 value: 29.492 - type: mrr_at_10 value: 38.352000000000004 - type: mrr_at_100 value: 39.188 - type: mrr_at_1000 value: 39.247 - type: mrr_at_3 value: 35.876000000000005 - type: mrr_at_5 value: 37.401 - type: ndcg_at_1 value: 29.492 - type: ndcg_at_10 value: 41.239 - type: ndcg_at_100 value: 46.066 - type: ndcg_at_1000 value: 47.992000000000004 - type: ndcg_at_3 value: 36.11 - type: ndcg_at_5 value: 38.772 - type: precision_at_1 value: 29.492 - type: precision_at_10 value: 6.260000000000001 - type: precision_at_100 value: 0.914 - type: precision_at_1000 value: 0.11100000000000002 - type: precision_at_3 value: 15.104000000000001 - type: precision_at_5 value: 10.644 - type: recall_at_1 value: 27.27 - type: recall_at_10 value: 54.589 - type: recall_at_100 value: 76.70700000000001 - type: recall_at_1000 value: 91.158 - type: recall_at_3 value: 40.974 - type: recall_at_5 value: 47.327000000000005 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 17.848 - type: map_at_10 value: 26.207 - type: map_at_100 value: 27.478 - type: map_at_1000 value: 27.602 - type: map_at_3 value: 23.405 - type: map_at_5 value: 24.98 - type: mrr_at_1 value: 21.891 - type: mrr_at_10 value: 31.041999999999998 - type: mrr_at_100 value: 32.092 - type: mrr_at_1000 value: 32.151999999999994 - type: mrr_at_3 value: 28.358 - type: mrr_at_5 value: 29.969 - type: ndcg_at_1 value: 21.891 - type: ndcg_at_10 value: 31.585 - type: ndcg_at_100 value: 37.531 - type: ndcg_at_1000 value: 40.256 - type: ndcg_at_3 value: 26.508 - type: ndcg_at_5 value: 28.894 - type: precision_at_1 value: 21.891 - type: precision_at_10 value: 5.795999999999999 - type: precision_at_100 value: 0.9990000000000001 - type: precision_at_1000 value: 0.13799999999999998 - type: precision_at_3 value: 12.769 - type: precision_at_5 value: 9.279 - type: recall_at_1 value: 17.848 - type: recall_at_10 value: 43.452 - type: recall_at_100 value: 69.216 - type: recall_at_1000 value: 88.102 - type: recall_at_3 value: 29.18 - type: recall_at_5 value: 35.347 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 30.94 - type: map_at_10 value: 41.248000000000005 - type: map_at_100 value: 42.495 - type: map_at_1000 value: 42.602000000000004 - type: map_at_3 value: 37.939 - type: map_at_5 value: 39.924 - type: mrr_at_1 value: 37.824999999999996 - type: mrr_at_10 value: 47.041 - type: mrr_at_100 value: 47.83 - type: mrr_at_1000 value: 47.878 - type: mrr_at_3 value: 44.466 - type: mrr_at_5 value: 46.111999999999995 - type: ndcg_at_1 value: 37.824999999999996 - type: ndcg_at_10 value: 47.223 - type: ndcg_at_100 value: 52.394 - type: ndcg_at_1000 value: 54.432 - type: ndcg_at_3 value: 42.032000000000004 - type: ndcg_at_5 value: 44.772 - type: precision_at_1 value: 37.824999999999996 - type: precision_at_10 value: 8.393 - type: precision_at_100 value: 1.2890000000000001 - type: precision_at_1000 value: 0.164 - type: precision_at_3 value: 19.698 - type: precision_at_5 value: 14.013 - type: recall_at_1 value: 30.94 - type: recall_at_10 value: 59.316 - type: recall_at_100 value: 80.783 - type: recall_at_1000 value: 94.15400000000001 - type: recall_at_3 value: 44.712 - type: recall_at_5 value: 51.932 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.104 - type: map_at_10 value: 36.675999999999995 - type: map_at_100 value: 38.076 - type: map_at_1000 value: 38.189 - type: map_at_3 value: 33.733999999999995 - type: map_at_5 value: 35.287 - type: mrr_at_1 value: 33.904 - type: mrr_at_10 value: 42.55 - type: mrr_at_100 value: 43.434 - type: mrr_at_1000 value: 43.494 - type: mrr_at_3 value: 40.126 - type: mrr_at_5 value: 41.473 - type: ndcg_at_1 value: 33.904 - type: ndcg_at_10 value: 42.414 - type: ndcg_at_100 value: 48.203 - type: ndcg_at_1000 value: 50.437 - type: ndcg_at_3 value: 37.633 - type: ndcg_at_5 value: 39.67 - type: precision_at_1 value: 33.904 - type: precision_at_10 value: 7.82 - type: precision_at_100 value: 1.2409999999999999 - type: precision_at_1000 value: 0.159 - type: precision_at_3 value: 17.884 - type: precision_at_5 value: 12.648000000000001 - type: recall_at_1 value: 27.104 - type: recall_at_10 value: 53.563 - type: recall_at_100 value: 78.557 - type: recall_at_1000 value: 93.533 - type: recall_at_3 value: 39.92 - type: recall_at_5 value: 45.457 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.707749999999997 - type: map_at_10 value: 36.961 - type: map_at_100 value: 38.158833333333334 - type: map_at_1000 value: 38.270333333333326 - type: map_at_3 value: 34.07183333333334 - type: map_at_5 value: 35.69533333333334 - type: mrr_at_1 value: 32.81875 - type: mrr_at_10 value: 41.293 - type: mrr_at_100 value: 42.116499999999995 - type: mrr_at_1000 value: 42.170249999999996 - type: mrr_at_3 value: 38.83983333333333 - type: mrr_at_5 value: 40.29775 - type: ndcg_at_1 value: 32.81875 - type: ndcg_at_10 value: 42.355 - type: ndcg_at_100 value: 47.41374999999999 - type: ndcg_at_1000 value: 49.5805 - type: ndcg_at_3 value: 37.52825 - type: ndcg_at_5 value: 39.83266666666667 - type: precision_at_1 value: 32.81875 - type: precision_at_10 value: 7.382416666666666 - type: precision_at_100 value: 1.1640833333333334 - type: precision_at_1000 value: 0.15383333333333335 - type: precision_at_3 value: 17.134166666666665 - type: precision_at_5 value: 12.174833333333336 - type: recall_at_1 value: 27.707749999999997 - type: recall_at_10 value: 53.945 - type: recall_at_100 value: 76.191 - type: recall_at_1000 value: 91.101 - type: recall_at_3 value: 40.39083333333334 - type: recall_at_5 value: 46.40083333333333 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.482 - type: map_at_10 value: 33.201 - type: map_at_100 value: 34.107 - type: map_at_1000 value: 34.197 - type: map_at_3 value: 31.174000000000003 - type: map_at_5 value: 32.279 - type: mrr_at_1 value: 29.908 - type: mrr_at_10 value: 36.235 - type: mrr_at_100 value: 37.04 - type: mrr_at_1000 value: 37.105 - type: mrr_at_3 value: 34.355999999999995 - type: mrr_at_5 value: 35.382999999999996 - type: ndcg_at_1 value: 29.908 - type: ndcg_at_10 value: 37.325 - type: ndcg_at_100 value: 41.795 - type: ndcg_at_1000 value: 44.105 - type: ndcg_at_3 value: 33.555 - type: ndcg_at_5 value: 35.266999999999996 - type: precision_at_1 value: 29.908 - type: precision_at_10 value: 5.721 - type: precision_at_100 value: 0.8630000000000001 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 14.008000000000001 - type: precision_at_5 value: 9.754999999999999 - type: recall_at_1 value: 26.482 - type: recall_at_10 value: 47.072 - type: recall_at_100 value: 67.27 - type: recall_at_1000 value: 84.371 - type: recall_at_3 value: 36.65 - type: recall_at_5 value: 40.774 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 18.815 - type: map_at_10 value: 26.369999999999997 - type: map_at_100 value: 27.458 - type: map_at_1000 value: 27.588 - type: map_at_3 value: 23.990000000000002 - type: map_at_5 value: 25.345000000000002 - type: mrr_at_1 value: 22.953000000000003 - type: mrr_at_10 value: 30.342999999999996 - type: mrr_at_100 value: 31.241000000000003 - type: mrr_at_1000 value: 31.319000000000003 - type: mrr_at_3 value: 28.16 - type: mrr_at_5 value: 29.406 - type: ndcg_at_1 value: 22.953000000000003 - type: ndcg_at_10 value: 31.151 - type: ndcg_at_100 value: 36.309000000000005 - type: ndcg_at_1000 value: 39.227000000000004 - type: ndcg_at_3 value: 26.921 - type: ndcg_at_5 value: 28.938000000000002 - type: precision_at_1 value: 22.953000000000003 - type: precision_at_10 value: 5.602 - type: precision_at_100 value: 0.9530000000000001 - type: precision_at_1000 value: 0.13899999999999998 - type: precision_at_3 value: 12.606 - type: precision_at_5 value: 9.119 - type: recall_at_1 value: 18.815 - type: recall_at_10 value: 41.574 - type: recall_at_100 value: 64.84400000000001 - type: recall_at_1000 value: 85.406 - type: recall_at_3 value: 29.694 - type: recall_at_5 value: 34.935 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.840999999999998 - type: map_at_10 value: 36.797999999999995 - type: map_at_100 value: 37.993 - type: map_at_1000 value: 38.086999999999996 - type: map_at_3 value: 34.050999999999995 - type: map_at_5 value: 35.379 - type: mrr_at_1 value: 32.649 - type: mrr_at_10 value: 41.025 - type: mrr_at_100 value: 41.878 - type: mrr_at_1000 value: 41.929 - type: mrr_at_3 value: 38.573 - type: mrr_at_5 value: 39.715 - type: ndcg_at_1 value: 32.649 - type: ndcg_at_10 value: 42.142 - type: ndcg_at_100 value: 47.558 - type: ndcg_at_1000 value: 49.643 - type: ndcg_at_3 value: 37.12 - type: ndcg_at_5 value: 38.983000000000004 - type: precision_at_1 value: 32.649 - type: precision_at_10 value: 7.08 - type: precision_at_100 value: 1.1039999999999999 - type: precision_at_1000 value: 0.13899999999999998 - type: precision_at_3 value: 16.698 - type: precision_at_5 value: 11.511000000000001 - type: recall_at_1 value: 27.840999999999998 - type: recall_at_10 value: 54.245 - type: recall_at_100 value: 77.947 - type: recall_at_1000 value: 92.36999999999999 - type: recall_at_3 value: 40.146 - type: recall_at_5 value: 44.951 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.529000000000003 - type: map_at_10 value: 35.010000000000005 - type: map_at_100 value: 36.647 - type: map_at_1000 value: 36.857 - type: map_at_3 value: 31.968000000000004 - type: map_at_5 value: 33.554 - type: mrr_at_1 value: 31.818 - type: mrr_at_10 value: 39.550999999999995 - type: mrr_at_100 value: 40.54 - type: mrr_at_1000 value: 40.596 - type: mrr_at_3 value: 36.726 - type: mrr_at_5 value: 38.416 - type: ndcg_at_1 value: 31.818 - type: ndcg_at_10 value: 40.675 - type: ndcg_at_100 value: 46.548 - type: ndcg_at_1000 value: 49.126 - type: ndcg_at_3 value: 35.829 - type: ndcg_at_5 value: 38.0 - type: precision_at_1 value: 31.818 - type: precision_at_10 value: 7.826 - type: precision_at_100 value: 1.538 - type: precision_at_1000 value: 0.24 - type: precision_at_3 value: 16.601 - type: precision_at_5 value: 12.095 - type: recall_at_1 value: 26.529000000000003 - type: recall_at_10 value: 51.03 - type: recall_at_100 value: 77.556 - type: recall_at_1000 value: 93.804 - type: recall_at_3 value: 36.986000000000004 - type: recall_at_5 value: 43.096000000000004 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.480999999999998 - type: map_at_10 value: 30.817 - type: map_at_100 value: 31.838 - type: map_at_1000 value: 31.932 - type: map_at_3 value: 28.011999999999997 - type: map_at_5 value: 29.668 - type: mrr_at_1 value: 25.323 - type: mrr_at_10 value: 33.072 - type: mrr_at_100 value: 33.926 - type: mrr_at_1000 value: 33.993 - type: mrr_at_3 value: 30.436999999999998 - type: mrr_at_5 value: 32.092 - type: ndcg_at_1 value: 25.323 - type: ndcg_at_10 value: 35.514 - type: ndcg_at_100 value: 40.489000000000004 - type: ndcg_at_1000 value: 42.908 - type: ndcg_at_3 value: 30.092000000000002 - type: ndcg_at_5 value: 32.989000000000004 - type: precision_at_1 value: 25.323 - type: precision_at_10 value: 5.545 - type: precision_at_100 value: 0.861 - type: precision_at_1000 value: 0.117 - type: precision_at_3 value: 12.446 - type: precision_at_5 value: 9.131 - type: recall_at_1 value: 23.480999999999998 - type: recall_at_10 value: 47.825 - type: recall_at_100 value: 70.652 - type: recall_at_1000 value: 88.612 - type: recall_at_3 value: 33.537 - type: recall_at_5 value: 40.542 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 13.333999999999998 - type: map_at_10 value: 22.524 - type: map_at_100 value: 24.506 - type: map_at_1000 value: 24.715 - type: map_at_3 value: 19.022 - type: map_at_5 value: 20.693 - type: mrr_at_1 value: 29.186 - type: mrr_at_10 value: 41.22 - type: mrr_at_100 value: 42.16 - type: mrr_at_1000 value: 42.192 - type: mrr_at_3 value: 38.013000000000005 - type: mrr_at_5 value: 39.704 - type: ndcg_at_1 value: 29.186 - type: ndcg_at_10 value: 31.167 - type: ndcg_at_100 value: 38.879000000000005 - type: ndcg_at_1000 value: 42.376000000000005 - type: ndcg_at_3 value: 25.817 - type: ndcg_at_5 value: 27.377000000000002 - type: precision_at_1 value: 29.186 - type: precision_at_10 value: 9.693999999999999 - type: precision_at_100 value: 1.8030000000000002 - type: precision_at_1000 value: 0.246 - type: precision_at_3 value: 19.11 - type: precision_at_5 value: 14.344999999999999 - type: recall_at_1 value: 13.333999999999998 - type: recall_at_10 value: 37.092000000000006 - type: recall_at_100 value: 63.651 - type: recall_at_1000 value: 83.05 - type: recall_at_3 value: 23.74 - type: recall_at_5 value: 28.655 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 9.151 - type: map_at_10 value: 19.653000000000002 - type: map_at_100 value: 28.053 - type: map_at_1000 value: 29.709000000000003 - type: map_at_3 value: 14.191 - type: map_at_5 value: 16.456 - type: mrr_at_1 value: 66.25 - type: mrr_at_10 value: 74.4 - type: mrr_at_100 value: 74.715 - type: mrr_at_1000 value: 74.726 - type: mrr_at_3 value: 72.417 - type: mrr_at_5 value: 73.667 - type: ndcg_at_1 value: 54.25 - type: ndcg_at_10 value: 40.77 - type: ndcg_at_100 value: 46.359 - type: ndcg_at_1000 value: 54.193000000000005 - type: ndcg_at_3 value: 44.832 - type: ndcg_at_5 value: 42.63 - type: precision_at_1 value: 66.25 - type: precision_at_10 value: 32.175 - type: precision_at_100 value: 10.668 - type: precision_at_1000 value: 2.067 - type: precision_at_3 value: 47.667 - type: precision_at_5 value: 41.3 - type: recall_at_1 value: 9.151 - type: recall_at_10 value: 25.003999999999998 - type: recall_at_100 value: 52.976 - type: recall_at_1000 value: 78.315 - type: recall_at_3 value: 15.487 - type: recall_at_5 value: 18.999 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 51.89999999999999 - type: f1 value: 46.47777925067403 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 73.706 - type: map_at_10 value: 82.423 - type: map_at_100 value: 82.67999999999999 - type: map_at_1000 value: 82.694 - type: map_at_3 value: 81.328 - type: map_at_5 value: 82.001 - type: mrr_at_1 value: 79.613 - type: mrr_at_10 value: 87.07000000000001 - type: mrr_at_100 value: 87.169 - type: mrr_at_1000 value: 87.17 - type: mrr_at_3 value: 86.404 - type: mrr_at_5 value: 86.856 - type: ndcg_at_1 value: 79.613 - type: ndcg_at_10 value: 86.289 - type: ndcg_at_100 value: 87.201 - type: ndcg_at_1000 value: 87.428 - type: ndcg_at_3 value: 84.625 - type: ndcg_at_5 value: 85.53699999999999 - type: precision_at_1 value: 79.613 - type: precision_at_10 value: 10.399 - type: precision_at_100 value: 1.1079999999999999 - type: precision_at_1000 value: 0.11499999999999999 - type: precision_at_3 value: 32.473 - type: precision_at_5 value: 20.132 - type: recall_at_1 value: 73.706 - type: recall_at_10 value: 93.559 - type: recall_at_100 value: 97.188 - type: recall_at_1000 value: 98.555 - type: recall_at_3 value: 88.98700000000001 - type: recall_at_5 value: 91.373 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 19.841 - type: map_at_10 value: 32.643 - type: map_at_100 value: 34.575 - type: map_at_1000 value: 34.736 - type: map_at_3 value: 28.317999999999998 - type: map_at_5 value: 30.964000000000002 - type: mrr_at_1 value: 39.660000000000004 - type: mrr_at_10 value: 48.620000000000005 - type: mrr_at_100 value: 49.384 - type: mrr_at_1000 value: 49.415 - type: mrr_at_3 value: 45.988 - type: mrr_at_5 value: 47.361 - type: ndcg_at_1 value: 39.660000000000004 - type: ndcg_at_10 value: 40.646 - type: ndcg_at_100 value: 47.657 - type: ndcg_at_1000 value: 50.428 - type: ndcg_at_3 value: 36.689 - type: ndcg_at_5 value: 38.211 - type: precision_at_1 value: 39.660000000000004 - type: precision_at_10 value: 11.235000000000001 - type: precision_at_100 value: 1.8530000000000002 - type: precision_at_1000 value: 0.23600000000000002 - type: precision_at_3 value: 24.587999999999997 - type: precision_at_5 value: 18.395 - type: recall_at_1 value: 19.841 - type: recall_at_10 value: 48.135 - type: recall_at_100 value: 74.224 - type: recall_at_1000 value: 90.826 - type: recall_at_3 value: 33.536 - type: recall_at_5 value: 40.311 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 40.358 - type: map_at_10 value: 64.497 - type: map_at_100 value: 65.362 - type: map_at_1000 value: 65.41900000000001 - type: map_at_3 value: 61.06700000000001 - type: map_at_5 value: 63.317 - type: mrr_at_1 value: 80.716 - type: mrr_at_10 value: 86.10799999999999 - type: mrr_at_100 value: 86.265 - type: mrr_at_1000 value: 86.27 - type: mrr_at_3 value: 85.271 - type: mrr_at_5 value: 85.82499999999999 - type: ndcg_at_1 value: 80.716 - type: ndcg_at_10 value: 72.597 - type: ndcg_at_100 value: 75.549 - type: ndcg_at_1000 value: 76.61 - type: ndcg_at_3 value: 67.874 - type: ndcg_at_5 value: 70.655 - type: precision_at_1 value: 80.716 - type: precision_at_10 value: 15.148 - type: precision_at_100 value: 1.745 - type: precision_at_1000 value: 0.188 - type: precision_at_3 value: 43.597 - type: precision_at_5 value: 28.351 - type: recall_at_1 value: 40.358 - type: recall_at_10 value: 75.739 - type: recall_at_100 value: 87.259 - type: recall_at_1000 value: 94.234 - type: recall_at_3 value: 65.39500000000001 - type: recall_at_5 value: 70.878 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 90.80799999999998 - type: ap value: 86.81350378180757 - type: f1 value: 90.79901248314215 - task: type: Retrieval dataset: type: msmarco name: MTEB MSMARCO config: default split: dev revision: None metrics: - type: map_at_1 value: 22.096 - type: map_at_10 value: 34.384 - type: map_at_100 value: 35.541 - type: map_at_1000 value: 35.589999999999996 - type: map_at_3 value: 30.496000000000002 - type: map_at_5 value: 32.718 - type: mrr_at_1 value: 22.750999999999998 - type: mrr_at_10 value: 35.024 - type: mrr_at_100 value: 36.125 - type: mrr_at_1000 value: 36.168 - type: mrr_at_3 value: 31.225 - type: mrr_at_5 value: 33.416000000000004 - type: ndcg_at_1 value: 22.750999999999998 - type: ndcg_at_10 value: 41.351 - type: ndcg_at_100 value: 46.92 - type: ndcg_at_1000 value: 48.111 - type: ndcg_at_3 value: 33.439 - type: ndcg_at_5 value: 37.407000000000004 - type: precision_at_1 value: 22.750999999999998 - type: precision_at_10 value: 6.564 - type: precision_at_100 value: 0.935 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 14.288 - type: precision_at_5 value: 10.581999999999999 - type: recall_at_1 value: 22.096 - type: recall_at_10 value: 62.771 - type: recall_at_100 value: 88.529 - type: recall_at_1000 value: 97.55 - type: recall_at_3 value: 41.245 - type: recall_at_5 value: 50.788 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 94.16780665754673 - type: f1 value: 93.96331194859894 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 76.90606475148198 - type: f1 value: 58.58344986604187 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 76.14660390047075 - type: f1 value: 74.31533923533614 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 80.16139878950908 - type: f1 value: 80.18532656824924 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 32.949880906135085 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 31.56300351524862 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 31.196521894371315 - type: mrr value: 32.22644231694389 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 6.783 - type: map_at_10 value: 14.549000000000001 - type: map_at_100 value: 18.433 - type: map_at_1000 value: 19.949 - type: map_at_3 value: 10.936 - type: map_at_5 value: 12.514 - type: mrr_at_1 value: 47.368 - type: mrr_at_10 value: 56.42 - type: mrr_at_100 value: 56.908 - type: mrr_at_1000 value: 56.95 - type: mrr_at_3 value: 54.283 - type: mrr_at_5 value: 55.568 - type: ndcg_at_1 value: 45.666000000000004 - type: ndcg_at_10 value: 37.389 - type: ndcg_at_100 value: 34.253 - type: ndcg_at_1000 value: 43.059999999999995 - type: ndcg_at_3 value: 42.725 - type: ndcg_at_5 value: 40.193 - type: precision_at_1 value: 47.368 - type: precision_at_10 value: 27.988000000000003 - type: precision_at_100 value: 8.672 - type: precision_at_1000 value: 2.164 - type: precision_at_3 value: 40.248 - type: precision_at_5 value: 34.737 - type: recall_at_1 value: 6.783 - type: recall_at_10 value: 17.838 - type: recall_at_100 value: 33.672000000000004 - type: recall_at_1000 value: 66.166 - type: recall_at_3 value: 11.849 - type: recall_at_5 value: 14.205000000000002 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 31.698999999999998 - type: map_at_10 value: 46.556 - type: map_at_100 value: 47.652 - type: map_at_1000 value: 47.68 - type: map_at_3 value: 42.492000000000004 - type: map_at_5 value: 44.763999999999996 - type: mrr_at_1 value: 35.747 - type: mrr_at_10 value: 49.242999999999995 - type: mrr_at_100 value: 50.052 - type: mrr_at_1000 value: 50.068 - type: mrr_at_3 value: 45.867000000000004 - type: mrr_at_5 value: 47.778999999999996 - type: ndcg_at_1 value: 35.717999999999996 - type: ndcg_at_10 value: 54.14600000000001 - type: ndcg_at_100 value: 58.672999999999995 - type: ndcg_at_1000 value: 59.279 - type: ndcg_at_3 value: 46.407 - type: ndcg_at_5 value: 50.181 - type: precision_at_1 value: 35.717999999999996 - type: precision_at_10 value: 8.844000000000001 - type: precision_at_100 value: 1.139 - type: precision_at_1000 value: 0.12 - type: precision_at_3 value: 20.993000000000002 - type: precision_at_5 value: 14.791000000000002 - type: recall_at_1 value: 31.698999999999998 - type: recall_at_10 value: 74.693 - type: recall_at_100 value: 94.15299999999999 - type: recall_at_1000 value: 98.585 - type: recall_at_3 value: 54.388999999999996 - type: recall_at_5 value: 63.08200000000001 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 71.283 - type: map_at_10 value: 85.24000000000001 - type: map_at_100 value: 85.882 - type: map_at_1000 value: 85.897 - type: map_at_3 value: 82.326 - type: map_at_5 value: 84.177 - type: mrr_at_1 value: 82.21000000000001 - type: mrr_at_10 value: 88.228 - type: mrr_at_100 value: 88.32 - type: mrr_at_1000 value: 88.32 - type: mrr_at_3 value: 87.323 - type: mrr_at_5 value: 87.94800000000001 - type: ndcg_at_1 value: 82.17999999999999 - type: ndcg_at_10 value: 88.9 - type: ndcg_at_100 value: 90.079 - type: ndcg_at_1000 value: 90.158 - type: ndcg_at_3 value: 86.18299999999999 - type: ndcg_at_5 value: 87.71799999999999 - type: precision_at_1 value: 82.17999999999999 - type: precision_at_10 value: 13.464 - type: precision_at_100 value: 1.533 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 37.693 - type: precision_at_5 value: 24.792 - type: recall_at_1 value: 71.283 - type: recall_at_10 value: 95.742 - type: recall_at_100 value: 99.67200000000001 - type: recall_at_1000 value: 99.981 - type: recall_at_3 value: 87.888 - type: recall_at_5 value: 92.24 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 56.24267063669042 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 62.88056988932578 - task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 4.903 - type: map_at_10 value: 13.202 - type: map_at_100 value: 15.5 - type: map_at_1000 value: 15.870999999999999 - type: map_at_3 value: 9.407 - type: map_at_5 value: 11.238 - type: mrr_at_1 value: 24.2 - type: mrr_at_10 value: 35.867 - type: mrr_at_100 value: 37.001 - type: mrr_at_1000 value: 37.043 - type: mrr_at_3 value: 32.5 - type: mrr_at_5 value: 34.35 - type: ndcg_at_1 value: 24.2 - type: ndcg_at_10 value: 21.731 - type: ndcg_at_100 value: 30.7 - type: ndcg_at_1000 value: 36.618 - type: ndcg_at_3 value: 20.72 - type: ndcg_at_5 value: 17.954 - type: precision_at_1 value: 24.2 - type: precision_at_10 value: 11.33 - type: precision_at_100 value: 2.4410000000000003 - type: precision_at_1000 value: 0.386 - type: precision_at_3 value: 19.667 - type: precision_at_5 value: 15.86 - type: recall_at_1 value: 4.903 - type: recall_at_10 value: 22.962 - type: recall_at_100 value: 49.563 - type: recall_at_1000 value: 78.238 - type: recall_at_3 value: 11.953 - type: recall_at_5 value: 16.067999999999998 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 84.12694254604078 - type: cos_sim_spearman value: 80.30141815181918 - type: euclidean_pearson value: 81.34015449877128 - type: euclidean_spearman value: 80.13984197010849 - type: manhattan_pearson value: 81.31767068124086 - type: manhattan_spearman value: 80.11720513114103 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 86.13112984010417 - type: cos_sim_spearman value: 78.03063573402875 - type: euclidean_pearson value: 83.51928418844804 - type: euclidean_spearman value: 78.4045235411144 - type: manhattan_pearson value: 83.49981637388689 - type: manhattan_spearman value: 78.4042575139372 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 82.50327987379504 - type: cos_sim_spearman value: 84.18556767756205 - type: euclidean_pearson value: 82.69684424327679 - type: euclidean_spearman value: 83.5368106038335 - type: manhattan_pearson value: 82.57967581007374 - type: manhattan_spearman value: 83.43009053133697 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 82.50756863007814 - type: cos_sim_spearman value: 82.27204331279108 - type: euclidean_pearson value: 81.39535251429741 - type: euclidean_spearman value: 81.84386626336239 - type: manhattan_pearson value: 81.34281737280695 - type: manhattan_spearman value: 81.81149375673166 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 86.8727714856726 - type: cos_sim_spearman value: 87.95738287792312 - type: euclidean_pearson value: 86.62920602795887 - type: euclidean_spearman value: 87.05207355381243 - type: manhattan_pearson value: 86.53587918472225 - type: manhattan_spearman value: 86.95382961029586 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 83.52240359769479 - type: cos_sim_spearman value: 85.47685776238286 - type: euclidean_pearson value: 84.25815333483058 - type: euclidean_spearman value: 85.27415639683198 - type: manhattan_pearson value: 84.29127757025637 - type: manhattan_spearman value: 85.30226224917351 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 86.42501708915708 - type: cos_sim_spearman value: 86.42276182795041 - type: euclidean_pearson value: 86.5408207354761 - type: euclidean_spearman value: 85.46096321750838 - type: manhattan_pearson value: 86.54177303026881 - type: manhattan_spearman value: 85.50313151916117 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 64.86521089250766 - type: cos_sim_spearman value: 65.94868540323003 - type: euclidean_pearson value: 67.16569626533084 - type: euclidean_spearman value: 66.37667004134917 - type: manhattan_pearson value: 67.1482365102333 - type: manhattan_spearman value: 66.53240122580029 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 84.64746265365318 - type: cos_sim_spearman value: 86.41888825906786 - type: euclidean_pearson value: 85.27453642725811 - type: euclidean_spearman value: 85.94095796602544 - type: manhattan_pearson value: 85.28643660505334 - type: manhattan_spearman value: 85.95028003260744 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 87.48903153618527 - type: mrr value: 96.41081503826601 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 58.594 - type: map_at_10 value: 69.296 - type: map_at_100 value: 69.782 - type: map_at_1000 value: 69.795 - type: map_at_3 value: 66.23 - type: map_at_5 value: 68.293 - type: mrr_at_1 value: 61.667 - type: mrr_at_10 value: 70.339 - type: mrr_at_100 value: 70.708 - type: mrr_at_1000 value: 70.722 - type: mrr_at_3 value: 68.0 - type: mrr_at_5 value: 69.56700000000001 - type: ndcg_at_1 value: 61.667 - type: ndcg_at_10 value: 74.039 - type: ndcg_at_100 value: 76.103 - type: ndcg_at_1000 value: 76.47800000000001 - type: ndcg_at_3 value: 68.967 - type: ndcg_at_5 value: 71.96900000000001 - type: precision_at_1 value: 61.667 - type: precision_at_10 value: 9.866999999999999 - type: precision_at_100 value: 1.097 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 27.111 - type: precision_at_5 value: 18.2 - type: recall_at_1 value: 58.594 - type: recall_at_10 value: 87.422 - type: recall_at_100 value: 96.667 - type: recall_at_1000 value: 99.667 - type: recall_at_3 value: 74.217 - type: recall_at_5 value: 81.539 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.85049504950496 - type: cos_sim_ap value: 96.33111544137081 - type: cos_sim_f1 value: 92.35443037974684 - type: cos_sim_precision value: 93.53846153846153 - type: cos_sim_recall value: 91.2 - type: dot_accuracy value: 99.82376237623762 - type: dot_ap value: 95.38082527310888 - type: dot_f1 value: 90.90909090909092 - type: dot_precision value: 92.90187891440502 - type: dot_recall value: 89.0 - type: euclidean_accuracy value: 99.84851485148515 - type: euclidean_ap value: 96.32316003996347 - type: euclidean_f1 value: 92.2071392659628 - type: euclidean_precision value: 92.71991911021233 - type: euclidean_recall value: 91.7 - type: manhattan_accuracy value: 99.84851485148515 - type: manhattan_ap value: 96.3655668249217 - type: manhattan_f1 value: 92.18356026222895 - type: manhattan_precision value: 92.98067141403867 - type: manhattan_recall value: 91.4 - type: max_accuracy value: 99.85049504950496 - type: max_ap value: 96.3655668249217 - type: max_f1 value: 92.35443037974684 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 65.94861371629051 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 35.009430451385 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 54.61164066427969 - type: mrr value: 55.49710603938544 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.622620124907662 - type: cos_sim_spearman value: 31.0678351356163 - type: dot_pearson value: 30.863727693306814 - type: dot_spearman value: 31.230306567021255 - task: type: Retrieval dataset: type: trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.22 - type: map_at_10 value: 2.011 - type: map_at_100 value: 10.974 - type: map_at_1000 value: 25.819 - type: map_at_3 value: 0.6649999999999999 - type: map_at_5 value: 1.076 - type: mrr_at_1 value: 86.0 - type: mrr_at_10 value: 91.8 - type: mrr_at_100 value: 91.8 - type: mrr_at_1000 value: 91.8 - type: mrr_at_3 value: 91.0 - type: mrr_at_5 value: 91.8 - type: ndcg_at_1 value: 82.0 - type: ndcg_at_10 value: 78.07300000000001 - type: ndcg_at_100 value: 58.231 - type: ndcg_at_1000 value: 51.153000000000006 - type: ndcg_at_3 value: 81.123 - type: ndcg_at_5 value: 81.059 - type: precision_at_1 value: 86.0 - type: precision_at_10 value: 83.0 - type: precision_at_100 value: 59.38 - type: precision_at_1000 value: 22.55 - type: precision_at_3 value: 87.333 - type: precision_at_5 value: 86.8 - type: recall_at_1 value: 0.22 - type: recall_at_10 value: 2.2079999999999997 - type: recall_at_100 value: 14.069 - type: recall_at_1000 value: 47.678 - type: recall_at_3 value: 0.7040000000000001 - type: recall_at_5 value: 1.161 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 2.809 - type: map_at_10 value: 10.394 - type: map_at_100 value: 16.598 - type: map_at_1000 value: 18.142 - type: map_at_3 value: 5.572 - type: map_at_5 value: 7.1370000000000005 - type: mrr_at_1 value: 32.653 - type: mrr_at_10 value: 46.564 - type: mrr_at_100 value: 47.469 - type: mrr_at_1000 value: 47.469 - type: mrr_at_3 value: 42.177 - type: mrr_at_5 value: 44.524 - type: ndcg_at_1 value: 30.612000000000002 - type: ndcg_at_10 value: 25.701 - type: ndcg_at_100 value: 37.532 - type: ndcg_at_1000 value: 48.757 - type: ndcg_at_3 value: 28.199999999999996 - type: ndcg_at_5 value: 25.987 - type: precision_at_1 value: 32.653 - type: precision_at_10 value: 23.469 - type: precision_at_100 value: 7.9799999999999995 - type: precision_at_1000 value: 1.5350000000000001 - type: precision_at_3 value: 29.932 - type: precision_at_5 value: 26.122 - type: recall_at_1 value: 2.809 - type: recall_at_10 value: 16.887 - type: recall_at_100 value: 48.67 - type: recall_at_1000 value: 82.89699999999999 - type: recall_at_3 value: 6.521000000000001 - type: recall_at_5 value: 9.609 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 71.57860000000001 - type: ap value: 13.82629211536393 - type: f1 value: 54.59860966183956 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 59.38030560271647 - type: f1 value: 59.69685552567865 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 51.4736717043405 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 86.92853311080646 - type: cos_sim_ap value: 77.67872502591382 - type: cos_sim_f1 value: 70.33941236068895 - type: cos_sim_precision value: 67.63273258645884 - type: cos_sim_recall value: 73.27176781002639 - type: dot_accuracy value: 85.79603027954938 - type: dot_ap value: 73.73786190233379 - type: dot_f1 value: 67.3437901774235 - type: dot_precision value: 65.67201604814443 - type: dot_recall value: 69.10290237467018 - type: euclidean_accuracy value: 86.94045419324074 - type: euclidean_ap value: 77.6687791535167 - type: euclidean_f1 value: 70.47209214023542 - type: euclidean_precision value: 67.7207492094381 - type: euclidean_recall value: 73.45646437994723 - type: manhattan_accuracy value: 86.87488823985218 - type: manhattan_ap value: 77.63373392430728 - type: manhattan_f1 value: 70.40920716112532 - type: manhattan_precision value: 68.31265508684864 - type: manhattan_recall value: 72.63852242744063 - type: max_accuracy value: 86.94045419324074 - type: max_ap value: 77.67872502591382 - type: max_f1 value: 70.47209214023542 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 88.67155664221679 - type: cos_sim_ap value: 85.64591703003417 - type: cos_sim_f1 value: 77.59531005352656 - type: cos_sim_precision value: 73.60967184801382 - type: cos_sim_recall value: 82.03726516784724 - type: dot_accuracy value: 88.41541506578181 - type: dot_ap value: 84.6482788957769 - type: dot_f1 value: 77.04748541466657 - type: dot_precision value: 74.02440754931176 - type: dot_recall value: 80.3279950723745 - type: euclidean_accuracy value: 88.63080684596576 - type: euclidean_ap value: 85.44570045321562 - type: euclidean_f1 value: 77.28769403336106 - type: euclidean_precision value: 72.90600040958427 - type: euclidean_recall value: 82.22975053895904 - type: manhattan_accuracy value: 88.59393798269105 - type: manhattan_ap value: 85.40271361038187 - type: manhattan_f1 value: 77.17606419344392 - type: manhattan_precision value: 72.4447747078295 - type: manhattan_recall value: 82.5685247921158 - type: max_accuracy value: 88.67155664221679 - type: max_ap value: 85.64591703003417 - type: max_f1 value: 77.59531005352656 license: mit language: - en --- <h1 align=\"center\">FlagEmbedding</h1> <h4 align=\"center\"> <p> <a href=#model-list>Model List</a> | <a href=#frequently-asked-questions>FAQ</a> | <a href=#usage>Usage</a> | <a href=\"#evaluation\">Evaluation</a> | <a href=\"#train\">Train</a> | <a href=\"#contact\">Contact</a> | <a href=\"#citation\">Citation</a> | <a href=\"#license\">License</a> <p> </h4> For more details please refer to our Github: FlagEmbedding. If you are looking for a model that supports more languages, longer texts, and other retrieval methods, you can try using bge-m3. English | 中文 FlagEmbedding focuses on retrieval-augmented LLMs, consisting of the following projects currently: - **Long-Context LLM**: Activation Beacon - **Fine-tuning of LM** : LM-Cocktail - **Dense Retrieval**: BGE-M3, LLM Embedder, BGE Embedding - **Reranker Model**: BGE Reranker - **Benchmark**: C-MTEB ## News - 1/30/2024: Release **BGE-M3**, a new member to BGE model series! M3 stands for **M**ulti-linguality (100+ languages), **M**ulti-granularities (input length up to 8192), **M**ulti-Functionality (unification of dense, lexical, multi-vec/colbert retrieval). It is the first embedding model which supports all three retrieval methods, achieving new SOTA on multi-lingual (MIRACL) and cross-lingual (MKQA) benchmarks. Technical Report and Code. :fire: - 1/9/2024: Release Activation-Beacon, an effective, efficient, compatible, and low-cost (training) method to extend the context length of LLM. Technical Report :fire: - 12/24/2023: Release **LLaRA**, a LLaMA-7B based dense retriever, leading to state-of-the-art performances on MS MARCO and BEIR. Model and code will be open-sourced. Please stay tuned. Technical Report :fire: - 11/23/2023: Release LM-Cocktail, a method to maintain general capabilities during fine-tuning by merging multiple language models. Technical Report :fire: - 10/12/2023: Release LLM-Embedder, a unified embedding model to support diverse retrieval augmentation needs for LLMs. Technical Report - 09/15/2023: The technical report and massive training data of BGE has been released - 09/12/2023: New models: - **New reranker model**: release cross-encoder models and , which are more powerful than embedding model. We recommend to use/fine-tune them to re-rank top-k documents returned by embedding models. - **update embedding model**: release embedding model to alleviate the issue of the similarity distribution, and enhance its retrieval ability without instruction. <details> <summary>More</summary> <!-- ### More --> - 09/07/2023: Update fine-tune code: Add script to mine hard negatives and support adding instruction during fine-tuning. - 08/09/2023: BGE Models are integrated into **Langchain**, you can use it like this; C-MTEB **leaderboard** is available. - 08/05/2023: Release base-scale and small-scale models, **best performance among the models of the same size 🤗** - 08/02/2023: Release (short for BAAI General Embedding) Models, **rank 1st on MTEB and C-MTEB benchmark!** :tada: :tada: - 08/01/2023: We release the Chinese Massive Text Embedding Benchmark (**C-MTEB**), consisting of 31 test dataset. </details> ## Model List is short for . | Model | Language | | Description | query instruction for retrieval [1] | |:-------------------------------|:--------:| :--------:| :--------:|:--------:| | BAAI/bge-m3 | Multilingual | Inference Fine-tune | Multi-Functionality(dense retrieval, sparse retrieval, multi-vector(colbert)), Multi-Linguality, and Multi-Granularity(8192 tokens) | | | BAAI/llm-embedder | English | Inference Fine-tune | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See README | | BAAI/bge-reranker-large | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-reranker-base | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-large-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-en | English | Inference Fine-tune | :trophy: rank **1st** in MTEB leaderboard | | | BAAI/bge-base-en | English | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-en | English | Inference Fine-tune |a small-scale model but with competitive performance | | | BAAI/bge-large-zh | Chinese | Inference Fine-tune | :trophy: rank **1st** in C-MTEB benchmark | | | BAAI/bge-base-zh | Chinese | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-zh | Chinese | Inference Fine-tune | a small-scale model but with competitive performance | | [1\\]: If you need to search the relevant passages to a query, we suggest to add the instruction to the query; in other cases, no instruction is needed, just use the original query directly. In all cases, **no instruction** needs to be added to passages. [2\\]: Different from embedding model, reranker uses question and document as input and directly output similarity instead of embedding. To balance the accuracy and time cost, cross-encoder is widely used to re-rank top-k documents retrieved by other simple models. For examples, use bge embedding model to retrieve top 100 relevant documents, and then use bge reranker to re-rank the top 100 document to get the final top-3 results. All models have been uploaded to Huggingface Hub, and you can see them at If you cannot open the Huggingface Hub, you also can download the models at . ## Frequently asked questions <details> <summary>1. How to fine-tune bge embedding model?</summary> <!-- ### How to fine-tune bge embedding model? --> Following this example to prepare data and fine-tune your model. Some suggestions: - Mine hard negatives following this example, which can improve the retrieval performance. - If you pre-train bge on your data, the pre-trained model cannot be directly used to calculate similarity, and it must be fine-tuned with contrastive learning before computing similarity. - If the accuracy of the fine-tuned model is still not high, it is recommended to use/fine-tune the cross-encoder model (bge-reranker) to re-rank top-k results. Hard negatives also are needed to fine-tune reranker. </details> <details> <summary>2. The similarity score between two dissimilar sentences is higher than 0.5</summary> <!-- ### The similarity score between two dissimilar sentences is higher than 0.5 --> **Suggest to use bge v1.5, which alleviates the issue of the similarity distribution.** Since we finetune the models by contrastive learning with a temperature of 0.01, the similarity distribution of the current BGE model is about in the interval \\[0.6, 1\\]. So a similarity score greater than 0.5 does not indicate that the two sentences are similar. For downstream tasks, such as passage retrieval or semantic similarity, **what matters is the relative order of the scores, not the absolute value.** If you need to filter similar sentences based on a similarity threshold, please select an appropriate similarity threshold based on the similarity distribution on your data (such as 0.8, 0.85, or even 0.9). </details> <details> <summary>3. When does the query instruction need to be used</summary> <!-- ### When does the query instruction need to be used --> For the , we improve its retrieval ability when not using instruction. No instruction only has a slight degradation in retrieval performance compared with using instruction. So you can generate embedding without instruction in all cases for convenience. For a retrieval task that uses short queries to find long related documents, it is recommended to add instructions for these short queries. **The best method to decide whether to add instructions for queries is choosing the setting that achieves better performance on your task.** In all cases, the documents/passages do not need to add the instruction. </details> ## Usage ### Usage for Embedding Model Here are some examples for using models with FlagEmbedding, Sentence-Transformers, Langchain, or Huggingface Transformers. #### Using FlagEmbedding If it doesn't work for you, you can see FlagEmbedding for more methods to install FlagEmbedding. For the value of the argument , see Model List. By default, FlagModel will use all available GPUs when encoding. Please set to select specific GPUs. You also can set to make all GPUs unavailable. #### Using Sentence-Transformers You can also use the models with sentence-transformers: For s2p(short query to long passage) retrieval task, each short query should start with an instruction (instructions see Model List). But the instruction is not needed for passages. #### Using Langchain You can use in langchain like this: #### Using HuggingFace Transformers With the transformers package, you can use the model like this: First, you pass your input through the transformer model, then you select the last hidden state of the first token (i.e., [CLS]) as the sentence embedding. #### Usage of the ONNX files #### Usage via infinity Its also possible to deploy the onnx files with the infinity_emb pip package. ### Usage for Reranker Different from embedding model, reranker uses question and document as input and directly output similarity instead of embedding. You can get a relevance score by inputting query and passage to the reranker. The reranker is optimized based cross-entropy loss, so the relevance score is not bounded to a specific range. #### Using FlagEmbedding Get relevance scores (higher scores indicate more relevance): #### Using Huggingface transformers ## Evaluation models achieve **state-of-the-art performance on both MTEB and C-MTEB leaderboard!** For more details and evaluation tools see our scripts. - **MTEB**: | Model Name | Dimension | Sequence Length | Average (56) | Retrieval (15) |Clustering (11) | Pair Classification (3) | Reranking (4) | STS (10) | Summarization (1) | Classification (12) | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | BAAI/bge-large-en-v1.5 | 1024 | 512 | **64.23** | **54.29** | 46.08 | 87.12 | 60.03 | 83.11 | 31.61 | 75.97 | | BAAI/bge-base-en-v1.5 | 768 | 512 | 63.55 | 53.25 | 45.77 | 86.55 | 58.86 | 82.4 | 31.07 | 75.53 | | BAAI/bge-small-en-v1.5 | 384 | 512 | 62.17 |51.68 | 43.82 | 84.92 | 58.36 | 81.59 | 30.12 | 74.14 | | bge-large-en | 1024 | 512 | 63.98 | 53.9 | 46.98 | 85.8 | 59.48 | 81.56 | 32.06 | 76.21 | | bge-base-en | 768 | 512 | 63.36 | 53.0 | 46.32 | 85.86 | 58.7 | 81.84 | 29.27 | 75.27 | | gte-large | 1024 | 512 | 63.13 | 52.22 | 46.84 | 85.00 | 59.13 | 83.35 | 31.66 | 73.33 | | gte-base | 768 | 512 | 62.39 | 51.14 | 46.2 | 84.57 | 58.61 | 82.3 | 31.17 | 73.01 | | e5-large-v2 | 1024| 512 | 62.25 | 50.56 | 44.49 | 86.03 | 56.61 | 82.05 | 30.19 | 75.24 | | bge-small-en | 384 | 512 | 62.11 | 51.82 | 44.31 | 83.78 | 57.97 | 80.72 | 30.53 | 74.37 | | instructor-xl | 768 | 512 | 61.79 | 49.26 | 44.74 | 86.62 | 57.29 | 83.06 | 32.32 | 61.79 | | e5-base-v2 | 768 | 512 | 61.5 | 50.29 | 43.80 | 85.73 | 55.91 | 81.05 | 30.28 | 73.84 | | gte-small | 384 | 512 | 61.36 | 49.46 | 44.89 | 83.54 | 57.7 | 82.07 | 30.42 | 72.31 | | text-embedding-ada-002 | 1536 | 8192 | 60.99 | 49.25 | 45.9 | 84.89 | 56.32 | 80.97 | 30.8 | 70.93 | | e5-small-v2 | 384 | 512 | 59.93 | 49.04 | 39.92 | 84.67 | 54.32 | 80.39 | 31.16 | 72.94 | | sentence-t5-xxl | 768 | 512 | 59.51 | 42.24 | 43.72 | 85.06 | 56.42 | 82.63 | 30.08 | 73.42 | | all-mpnet-base-v2 | 768 | 514 | 57.78 | 43.81 | 43.69 | 83.04 | 59.36 | 80.28 | 27.49 | 65.07 | | sgpt-bloom-7b1-msmarco | 4096 | 2048 | 57.59 | 48.22 | 38.93 | 81.9 | 55.65 | 77.74 | 33.6 | 66.19 | - **C-MTEB**: We create the benchmark C-MTEB for Chinese text embedding which consists of 31 datasets from 6 tasks. Please refer to C_MTEB for a detailed introduction. | Model | Embedding dimension | Avg | Retrieval | STS | PairClassification | Classification | Reranking | Clustering | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | **BAAI/bge-large-zh-v1.5** | 1024 | **64.53** | 70.46 | 56.25 | 81.6 | 69.13 | 65.84 | 48.99 | | BAAI/bge-base-zh-v1.5 | 768 | 63.13 | 69.49 | 53.72 | 79.75 | 68.07 | 65.39 | 47.53 | | BAAI/bge-small-zh-v1.5 | 512 | 57.82 | 61.77 | 49.11 | 70.41 | 63.96 | 60.92 | 44.18 | | BAAI/bge-large-zh | 1024 | 64.20 | 71.53 | 54.98 | 78.94 | 68.32 | 65.11 | 48.39 | | bge-large-zh-noinstruct | 1024 | 63.53 | 70.55 | 53 | 76.77 | 68.58 | 64.91 | 50.01 | | BAAI/bge-base-zh | 768 | 62.96 | 69.53 | 54.12 | 77.5 | 67.07 | 64.91 | 47.63 | | multilingual-e5-large | 1024 | 58.79 | 63.66 | 48.44 | 69.89 | 67.34 | 56.00 | 48.23 | | BAAI/bge-small-zh | 512 | 58.27 | 63.07 | 49.45 | 70.35 | 63.64 | 61.48 | 45.09 | | m3e-base | 768 | 57.10 | 56.91 | 50.47 | 63.99 | 67.52 | 59.34 | 47.68 | | m3e-large | 1024 | 57.05 | 54.75 | 50.42 | 64.3 | 68.2 | 59.66 | 48.88 | | multilingual-e5-base | 768 | 55.48 | 61.63 | 46.49 | 67.07 | 65.35 | 54.35 | 40.68 | | multilingual-e5-small | 384 | 55.38 | 59.95 | 45.27 | 66.45 | 65.85 | 53.86 | 45.26 | | text-embedding-ada-002(OpenAI) | 1536 | 53.02 | 52.0 | 43.35 | 69.56 | 64.31 | 54.28 | 45.68 | | luotuo | 1024 | 49.37 | 44.4 | 42.78 | 66.62 | 61 | 49.25 | 44.39 | | text2vec-base | 768 | 47.63 | 38.79 | 43.41 | 67.41 | 62.19 | 49.45 | 37.66 | | text2vec-large | 1024 | 47.36 | 41.94 | 44.97 | 70.86 | 60.66 | 49.16 | 30.02 | - **Reranking**: See C_MTEB for evaluation script. | Model | T2Reranking | T2RerankingZh2En\\* | T2RerankingEn2Zh\\* | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | text2vec-base-multilingual | 64.66 | 62.94 | 62.51 | 14.37 | 48.46 | 48.6 | 50.26 | | multilingual-e5-small | 65.62 | 60.94 | 56.41 | 29.91 | 67.26 | 66.54 | 57.78 | | multilingual-e5-large | 64.55 | 61.61 | 54.28 | 28.6 | 67.42 | 67.92 | 57.4 | | multilingual-e5-base | 64.21 | 62.13 | 54.68 | 29.5 | 66.23 | 66.98 | 57.29 | | m3e-base | 66.03 | 62.74 | 56.07 | 17.51 | 77.05 | 76.76 | 59.36 | | m3e-large | 66.13 | 62.72 | 56.1 | 16.46 | 77.76 | 78.27 | 59.57 | | bge-base-zh-v1.5 | 66.49 | 63.25 | 57.02 | 29.74 | 80.47 | 84.88 | 63.64 | | bge-large-zh-v1.5 | 65.74 | 63.39 | 57.03 | 28.74 | 83.45 | 85.44 | 63.97 | | BAAI/bge-reranker-base | 67.28 | 63.95 | 60.45 | 35.46 | 81.26 | 84.1 | 65.42 | | BAAI/bge-reranker-large | 67.6 | 64.03 | 61.44 | 37.16 | 82.15 | 84.18 | 66.09 | \\* : T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks ## Train ### BAAI Embedding We pre-train the models using retromae and train them on large-scale pairs data using contrastive learning. **You can fine-tune the embedding model on your data following our examples.** We also provide a pre-train example. Note that the goal of pre-training is to reconstruct the text, and the pre-trained model cannot be used for similarity calculation directly, it needs to be fine-tuned. More training details for bge see baai_general_embedding. ### BGE Reranker Cross-encoder will perform full-attention over the input pair, which is more accurate than embedding model (i.e., bi-encoder) but more time-consuming than embedding model. Therefore, it can be used to re-rank the top-k documents returned by embedding model. We train the cross-encoder on a multilingual pair data, The data format is the same as embedding model, so you can fine-tune it easily following our example. More details please refer to ./FlagEmbedding/reranker/README.md ## Contact If you have any question or suggestion related to this project, feel free to open an issue or pull request. You also can email Shitao Xiao([email protected]) and Zheng Liu([email protected]). ## Citation If you find this repository useful, please consider giving a star :star: and citation ## License FlagEmbedding is licensed under the MIT License. The released models can be used for commercial purposes free of charge.",
28
- "model_explanation_gemini": "Generates sentence embeddings for tasks like text classification, retrieval, clustering, and semantic similarity. \n\nFeatures: \n- Sentence embedding generation \n- Text classification \n- Retrieval tasks \n- Clustering \n- Semantic similarity (STS) \n- Reranking \n\nComparison: \nOutperforms or competes with other models in MTEB benchmarks for classification, retrieval, and semantic similarity tasks."
 
 
 
 
 
 
29
  }
 
25
  "region:us"
26
  ],
27
  "description": "--- tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers - mteb model-index: - name: bge-base-en-v1.5 results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 76.14925373134328 - type: ap value: 39.32336517995478 - type: f1 value: 70.16902252611425 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 93.386825 - type: ap value: 90.21276917991995 - type: f1 value: 93.37741030006174 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 48.846000000000004 - type: f1 value: 48.14646269778261 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 40.754000000000005 - type: map_at_10 value: 55.761 - type: map_at_100 value: 56.330999999999996 - type: map_at_1000 value: 56.333999999999996 - type: map_at_3 value: 51.92 - type: map_at_5 value: 54.010999999999996 - type: mrr_at_1 value: 41.181 - type: mrr_at_10 value: 55.967999999999996 - type: mrr_at_100 value: 56.538 - type: mrr_at_1000 value: 56.542 - type: mrr_at_3 value: 51.980000000000004 - type: mrr_at_5 value: 54.208999999999996 - type: ndcg_at_1 value: 40.754000000000005 - type: ndcg_at_10 value: 63.605000000000004 - type: ndcg_at_100 value: 66.05199999999999 - type: ndcg_at_1000 value: 66.12 - type: ndcg_at_3 value: 55.708 - type: ndcg_at_5 value: 59.452000000000005 - type: precision_at_1 value: 40.754000000000005 - type: precision_at_10 value: 8.841000000000001 - type: precision_at_100 value: 0.991 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 22.238 - type: precision_at_5 value: 15.149000000000001 - type: recall_at_1 value: 40.754000000000005 - type: recall_at_10 value: 88.407 - type: recall_at_100 value: 99.14699999999999 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 66.714 - type: recall_at_5 value: 75.747 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 48.74884539679369 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 42.8075893810716 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 62.128470519187736 - type: mrr value: 74.28065778481289 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 89.24629081484655 - type: cos_sim_spearman value: 86.93752309911496 - type: euclidean_pearson value: 87.58589628573816 - type: euclidean_spearman value: 88.05622328825284 - type: manhattan_pearson value: 87.5594959805773 - type: manhattan_spearman value: 88.19658793233961 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 86.9512987012987 - type: f1 value: 86.92515357973708 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 39.10263762928872 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 36.69711517426737 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 32.327 - type: map_at_10 value: 44.099 - type: map_at_100 value: 45.525 - type: map_at_1000 value: 45.641999999999996 - type: map_at_3 value: 40.47 - type: map_at_5 value: 42.36 - type: mrr_at_1 value: 39.199 - type: mrr_at_10 value: 49.651 - type: mrr_at_100 value: 50.29 - type: mrr_at_1000 value: 50.329 - type: mrr_at_3 value: 46.924 - type: mrr_at_5 value: 48.548 - type: ndcg_at_1 value: 39.199 - type: ndcg_at_10 value: 50.773 - type: ndcg_at_100 value: 55.67999999999999 - type: ndcg_at_1000 value: 57.495 - type: ndcg_at_3 value: 45.513999999999996 - type: ndcg_at_5 value: 47.703 - type: precision_at_1 value: 39.199 - type: precision_at_10 value: 9.914000000000001 - type: precision_at_100 value: 1.5310000000000001 - type: precision_at_1000 value: 0.198 - type: precision_at_3 value: 21.984 - type: precision_at_5 value: 15.737000000000002 - type: recall_at_1 value: 32.327 - type: recall_at_10 value: 63.743 - type: recall_at_100 value: 84.538 - type: recall_at_1000 value: 96.089 - type: recall_at_3 value: 48.065000000000005 - type: recall_at_5 value: 54.519 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 32.671 - type: map_at_10 value: 42.954 - type: map_at_100 value: 44.151 - type: map_at_1000 value: 44.287 - type: map_at_3 value: 39.912 - type: map_at_5 value: 41.798 - type: mrr_at_1 value: 41.465 - type: mrr_at_10 value: 49.351 - type: mrr_at_100 value: 49.980000000000004 - type: mrr_at_1000 value: 50.016000000000005 - type: mrr_at_3 value: 47.144000000000005 - type: mrr_at_5 value: 48.592999999999996 - type: ndcg_at_1 value: 41.465 - type: ndcg_at_10 value: 48.565999999999995 - type: ndcg_at_100 value: 52.76499999999999 - type: ndcg_at_1000 value: 54.749 - type: ndcg_at_3 value: 44.57 - type: ndcg_at_5 value: 46.759 - type: precision_at_1 value: 41.465 - type: precision_at_10 value: 9.107999999999999 - type: precision_at_100 value: 1.433 - type: precision_at_1000 value: 0.191 - type: precision_at_3 value: 21.423000000000002 - type: precision_at_5 value: 15.414 - type: recall_at_1 value: 32.671 - type: recall_at_10 value: 57.738 - type: recall_at_100 value: 75.86500000000001 - type: recall_at_1000 value: 88.36 - type: recall_at_3 value: 45.626 - type: recall_at_5 value: 51.812000000000005 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 41.185 - type: map_at_10 value: 53.929 - type: map_at_100 value: 54.92 - type: map_at_1000 value: 54.967999999999996 - type: map_at_3 value: 50.70400000000001 - type: map_at_5 value: 52.673 - type: mrr_at_1 value: 47.398 - type: mrr_at_10 value: 57.303000000000004 - type: mrr_at_100 value: 57.959 - type: mrr_at_1000 value: 57.985 - type: mrr_at_3 value: 54.932 - type: mrr_at_5 value: 56.464999999999996 - type: ndcg_at_1 value: 47.398 - type: ndcg_at_10 value: 59.653 - type: ndcg_at_100 value: 63.627 - type: ndcg_at_1000 value: 64.596 - type: ndcg_at_3 value: 54.455 - type: ndcg_at_5 value: 57.245000000000005 - type: precision_at_1 value: 47.398 - type: precision_at_10 value: 9.524000000000001 - type: precision_at_100 value: 1.243 - type: precision_at_1000 value: 0.13699999999999998 - type: precision_at_3 value: 24.389 - type: precision_at_5 value: 16.752 - type: recall_at_1 value: 41.185 - type: recall_at_10 value: 73.193 - type: recall_at_100 value: 90.357 - type: recall_at_1000 value: 97.253 - type: recall_at_3 value: 59.199999999999996 - type: recall_at_5 value: 66.118 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.27 - type: map_at_10 value: 36.223 - type: map_at_100 value: 37.218 - type: map_at_1000 value: 37.293 - type: map_at_3 value: 33.503 - type: map_at_5 value: 35.097 - type: mrr_at_1 value: 29.492 - type: mrr_at_10 value: 38.352000000000004 - type: mrr_at_100 value: 39.188 - type: mrr_at_1000 value: 39.247 - type: mrr_at_3 value: 35.876000000000005 - type: mrr_at_5 value: 37.401 - type: ndcg_at_1 value: 29.492 - type: ndcg_at_10 value: 41.239 - type: ndcg_at_100 value: 46.066 - type: ndcg_at_1000 value: 47.992000000000004 - type: ndcg_at_3 value: 36.11 - type: ndcg_at_5 value: 38.772 - type: precision_at_1 value: 29.492 - type: precision_at_10 value: 6.260000000000001 - type: precision_at_100 value: 0.914 - type: precision_at_1000 value: 0.11100000000000002 - type: precision_at_3 value: 15.104000000000001 - type: precision_at_5 value: 10.644 - type: recall_at_1 value: 27.27 - type: recall_at_10 value: 54.589 - type: recall_at_100 value: 76.70700000000001 - type: recall_at_1000 value: 91.158 - type: recall_at_3 value: 40.974 - type: recall_at_5 value: 47.327000000000005 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 17.848 - type: map_at_10 value: 26.207 - type: map_at_100 value: 27.478 - type: map_at_1000 value: 27.602 - type: map_at_3 value: 23.405 - type: map_at_5 value: 24.98 - type: mrr_at_1 value: 21.891 - type: mrr_at_10 value: 31.041999999999998 - type: mrr_at_100 value: 32.092 - type: mrr_at_1000 value: 32.151999999999994 - type: mrr_at_3 value: 28.358 - type: mrr_at_5 value: 29.969 - type: ndcg_at_1 value: 21.891 - type: ndcg_at_10 value: 31.585 - type: ndcg_at_100 value: 37.531 - type: ndcg_at_1000 value: 40.256 - type: ndcg_at_3 value: 26.508 - type: ndcg_at_5 value: 28.894 - type: precision_at_1 value: 21.891 - type: precision_at_10 value: 5.795999999999999 - type: precision_at_100 value: 0.9990000000000001 - type: precision_at_1000 value: 0.13799999999999998 - type: precision_at_3 value: 12.769 - type: precision_at_5 value: 9.279 - type: recall_at_1 value: 17.848 - type: recall_at_10 value: 43.452 - type: recall_at_100 value: 69.216 - type: recall_at_1000 value: 88.102 - type: recall_at_3 value: 29.18 - type: recall_at_5 value: 35.347 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 30.94 - type: map_at_10 value: 41.248000000000005 - type: map_at_100 value: 42.495 - type: map_at_1000 value: 42.602000000000004 - type: map_at_3 value: 37.939 - type: map_at_5 value: 39.924 - type: mrr_at_1 value: 37.824999999999996 - type: mrr_at_10 value: 47.041 - type: mrr_at_100 value: 47.83 - type: mrr_at_1000 value: 47.878 - type: mrr_at_3 value: 44.466 - type: mrr_at_5 value: 46.111999999999995 - type: ndcg_at_1 value: 37.824999999999996 - type: ndcg_at_10 value: 47.223 - type: ndcg_at_100 value: 52.394 - type: ndcg_at_1000 value: 54.432 - type: ndcg_at_3 value: 42.032000000000004 - type: ndcg_at_5 value: 44.772 - type: precision_at_1 value: 37.824999999999996 - type: precision_at_10 value: 8.393 - type: precision_at_100 value: 1.2890000000000001 - type: precision_at_1000 value: 0.164 - type: precision_at_3 value: 19.698 - type: precision_at_5 value: 14.013 - type: recall_at_1 value: 30.94 - type: recall_at_10 value: 59.316 - type: recall_at_100 value: 80.783 - type: recall_at_1000 value: 94.15400000000001 - type: recall_at_3 value: 44.712 - type: recall_at_5 value: 51.932 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.104 - type: map_at_10 value: 36.675999999999995 - type: map_at_100 value: 38.076 - type: map_at_1000 value: 38.189 - type: map_at_3 value: 33.733999999999995 - type: map_at_5 value: 35.287 - type: mrr_at_1 value: 33.904 - type: mrr_at_10 value: 42.55 - type: mrr_at_100 value: 43.434 - type: mrr_at_1000 value: 43.494 - type: mrr_at_3 value: 40.126 - type: mrr_at_5 value: 41.473 - type: ndcg_at_1 value: 33.904 - type: ndcg_at_10 value: 42.414 - type: ndcg_at_100 value: 48.203 - type: ndcg_at_1000 value: 50.437 - type: ndcg_at_3 value: 37.633 - type: ndcg_at_5 value: 39.67 - type: precision_at_1 value: 33.904 - type: precision_at_10 value: 7.82 - type: precision_at_100 value: 1.2409999999999999 - type: precision_at_1000 value: 0.159 - type: precision_at_3 value: 17.884 - type: precision_at_5 value: 12.648000000000001 - type: recall_at_1 value: 27.104 - type: recall_at_10 value: 53.563 - type: recall_at_100 value: 78.557 - type: recall_at_1000 value: 93.533 - type: recall_at_3 value: 39.92 - type: recall_at_5 value: 45.457 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.707749999999997 - type: map_at_10 value: 36.961 - type: map_at_100 value: 38.158833333333334 - type: map_at_1000 value: 38.270333333333326 - type: map_at_3 value: 34.07183333333334 - type: map_at_5 value: 35.69533333333334 - type: mrr_at_1 value: 32.81875 - type: mrr_at_10 value: 41.293 - type: mrr_at_100 value: 42.116499999999995 - type: mrr_at_1000 value: 42.170249999999996 - type: mrr_at_3 value: 38.83983333333333 - type: mrr_at_5 value: 40.29775 - type: ndcg_at_1 value: 32.81875 - type: ndcg_at_10 value: 42.355 - type: ndcg_at_100 value: 47.41374999999999 - type: ndcg_at_1000 value: 49.5805 - type: ndcg_at_3 value: 37.52825 - type: ndcg_at_5 value: 39.83266666666667 - type: precision_at_1 value: 32.81875 - type: precision_at_10 value: 7.382416666666666 - type: precision_at_100 value: 1.1640833333333334 - type: precision_at_1000 value: 0.15383333333333335 - type: precision_at_3 value: 17.134166666666665 - type: precision_at_5 value: 12.174833333333336 - type: recall_at_1 value: 27.707749999999997 - type: recall_at_10 value: 53.945 - type: recall_at_100 value: 76.191 - type: recall_at_1000 value: 91.101 - type: recall_at_3 value: 40.39083333333334 - type: recall_at_5 value: 46.40083333333333 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.482 - type: map_at_10 value: 33.201 - type: map_at_100 value: 34.107 - type: map_at_1000 value: 34.197 - type: map_at_3 value: 31.174000000000003 - type: map_at_5 value: 32.279 - type: mrr_at_1 value: 29.908 - type: mrr_at_10 value: 36.235 - type: mrr_at_100 value: 37.04 - type: mrr_at_1000 value: 37.105 - type: mrr_at_3 value: 34.355999999999995 - type: mrr_at_5 value: 35.382999999999996 - type: ndcg_at_1 value: 29.908 - type: ndcg_at_10 value: 37.325 - type: ndcg_at_100 value: 41.795 - type: ndcg_at_1000 value: 44.105 - type: ndcg_at_3 value: 33.555 - type: ndcg_at_5 value: 35.266999999999996 - type: precision_at_1 value: 29.908 - type: precision_at_10 value: 5.721 - type: precision_at_100 value: 0.8630000000000001 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 14.008000000000001 - type: precision_at_5 value: 9.754999999999999 - type: recall_at_1 value: 26.482 - type: recall_at_10 value: 47.072 - type: recall_at_100 value: 67.27 - type: recall_at_1000 value: 84.371 - type: recall_at_3 value: 36.65 - type: recall_at_5 value: 40.774 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 18.815 - type: map_at_10 value: 26.369999999999997 - type: map_at_100 value: 27.458 - type: map_at_1000 value: 27.588 - type: map_at_3 value: 23.990000000000002 - type: map_at_5 value: 25.345000000000002 - type: mrr_at_1 value: 22.953000000000003 - type: mrr_at_10 value: 30.342999999999996 - type: mrr_at_100 value: 31.241000000000003 - type: mrr_at_1000 value: 31.319000000000003 - type: mrr_at_3 value: 28.16 - type: mrr_at_5 value: 29.406 - type: ndcg_at_1 value: 22.953000000000003 - type: ndcg_at_10 value: 31.151 - type: ndcg_at_100 value: 36.309000000000005 - type: ndcg_at_1000 value: 39.227000000000004 - type: ndcg_at_3 value: 26.921 - type: ndcg_at_5 value: 28.938000000000002 - type: precision_at_1 value: 22.953000000000003 - type: precision_at_10 value: 5.602 - type: precision_at_100 value: 0.9530000000000001 - type: precision_at_1000 value: 0.13899999999999998 - type: precision_at_3 value: 12.606 - type: precision_at_5 value: 9.119 - type: recall_at_1 value: 18.815 - type: recall_at_10 value: 41.574 - type: recall_at_100 value: 64.84400000000001 - type: recall_at_1000 value: 85.406 - type: recall_at_3 value: 29.694 - type: recall_at_5 value: 34.935 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.840999999999998 - type: map_at_10 value: 36.797999999999995 - type: map_at_100 value: 37.993 - type: map_at_1000 value: 38.086999999999996 - type: map_at_3 value: 34.050999999999995 - type: map_at_5 value: 35.379 - type: mrr_at_1 value: 32.649 - type: mrr_at_10 value: 41.025 - type: mrr_at_100 value: 41.878 - type: mrr_at_1000 value: 41.929 - type: mrr_at_3 value: 38.573 - type: mrr_at_5 value: 39.715 - type: ndcg_at_1 value: 32.649 - type: ndcg_at_10 value: 42.142 - type: ndcg_at_100 value: 47.558 - type: ndcg_at_1000 value: 49.643 - type: ndcg_at_3 value: 37.12 - type: ndcg_at_5 value: 38.983000000000004 - type: precision_at_1 value: 32.649 - type: precision_at_10 value: 7.08 - type: precision_at_100 value: 1.1039999999999999 - type: precision_at_1000 value: 0.13899999999999998 - type: precision_at_3 value: 16.698 - type: precision_at_5 value: 11.511000000000001 - type: recall_at_1 value: 27.840999999999998 - type: recall_at_10 value: 54.245 - type: recall_at_100 value: 77.947 - type: recall_at_1000 value: 92.36999999999999 - type: recall_at_3 value: 40.146 - type: recall_at_5 value: 44.951 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.529000000000003 - type: map_at_10 value: 35.010000000000005 - type: map_at_100 value: 36.647 - type: map_at_1000 value: 36.857 - type: map_at_3 value: 31.968000000000004 - type: map_at_5 value: 33.554 - type: mrr_at_1 value: 31.818 - type: mrr_at_10 value: 39.550999999999995 - type: mrr_at_100 value: 40.54 - type: mrr_at_1000 value: 40.596 - type: mrr_at_3 value: 36.726 - type: mrr_at_5 value: 38.416 - type: ndcg_at_1 value: 31.818 - type: ndcg_at_10 value: 40.675 - type: ndcg_at_100 value: 46.548 - type: ndcg_at_1000 value: 49.126 - type: ndcg_at_3 value: 35.829 - type: ndcg_at_5 value: 38.0 - type: precision_at_1 value: 31.818 - type: precision_at_10 value: 7.826 - type: precision_at_100 value: 1.538 - type: precision_at_1000 value: 0.24 - type: precision_at_3 value: 16.601 - type: precision_at_5 value: 12.095 - type: recall_at_1 value: 26.529000000000003 - type: recall_at_10 value: 51.03 - type: recall_at_100 value: 77.556 - type: recall_at_1000 value: 93.804 - type: recall_at_3 value: 36.986000000000004 - type: recall_at_5 value: 43.096000000000004 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.480999999999998 - type: map_at_10 value: 30.817 - type: map_at_100 value: 31.838 - type: map_at_1000 value: 31.932 - type: map_at_3 value: 28.011999999999997 - type: map_at_5 value: 29.668 - type: mrr_at_1 value: 25.323 - type: mrr_at_10 value: 33.072 - type: mrr_at_100 value: 33.926 - type: mrr_at_1000 value: 33.993 - type: mrr_at_3 value: 30.436999999999998 - type: mrr_at_5 value: 32.092 - type: ndcg_at_1 value: 25.323 - type: ndcg_at_10 value: 35.514 - type: ndcg_at_100 value: 40.489000000000004 - type: ndcg_at_1000 value: 42.908 - type: ndcg_at_3 value: 30.092000000000002 - type: ndcg_at_5 value: 32.989000000000004 - type: precision_at_1 value: 25.323 - type: precision_at_10 value: 5.545 - type: precision_at_100 value: 0.861 - type: precision_at_1000 value: 0.117 - type: precision_at_3 value: 12.446 - type: precision_at_5 value: 9.131 - type: recall_at_1 value: 23.480999999999998 - type: recall_at_10 value: 47.825 - type: recall_at_100 value: 70.652 - type: recall_at_1000 value: 88.612 - type: recall_at_3 value: 33.537 - type: recall_at_5 value: 40.542 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 13.333999999999998 - type: map_at_10 value: 22.524 - type: map_at_100 value: 24.506 - type: map_at_1000 value: 24.715 - type: map_at_3 value: 19.022 - type: map_at_5 value: 20.693 - type: mrr_at_1 value: 29.186 - type: mrr_at_10 value: 41.22 - type: mrr_at_100 value: 42.16 - type: mrr_at_1000 value: 42.192 - type: mrr_at_3 value: 38.013000000000005 - type: mrr_at_5 value: 39.704 - type: ndcg_at_1 value: 29.186 - type: ndcg_at_10 value: 31.167 - type: ndcg_at_100 value: 38.879000000000005 - type: ndcg_at_1000 value: 42.376000000000005 - type: ndcg_at_3 value: 25.817 - type: ndcg_at_5 value: 27.377000000000002 - type: precision_at_1 value: 29.186 - type: precision_at_10 value: 9.693999999999999 - type: precision_at_100 value: 1.8030000000000002 - type: precision_at_1000 value: 0.246 - type: precision_at_3 value: 19.11 - type: precision_at_5 value: 14.344999999999999 - type: recall_at_1 value: 13.333999999999998 - type: recall_at_10 value: 37.092000000000006 - type: recall_at_100 value: 63.651 - type: recall_at_1000 value: 83.05 - type: recall_at_3 value: 23.74 - type: recall_at_5 value: 28.655 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 9.151 - type: map_at_10 value: 19.653000000000002 - type: map_at_100 value: 28.053 - type: map_at_1000 value: 29.709000000000003 - type: map_at_3 value: 14.191 - type: map_at_5 value: 16.456 - type: mrr_at_1 value: 66.25 - type: mrr_at_10 value: 74.4 - type: mrr_at_100 value: 74.715 - type: mrr_at_1000 value: 74.726 - type: mrr_at_3 value: 72.417 - type: mrr_at_5 value: 73.667 - type: ndcg_at_1 value: 54.25 - type: ndcg_at_10 value: 40.77 - type: ndcg_at_100 value: 46.359 - type: ndcg_at_1000 value: 54.193000000000005 - type: ndcg_at_3 value: 44.832 - type: ndcg_at_5 value: 42.63 - type: precision_at_1 value: 66.25 - type: precision_at_10 value: 32.175 - type: precision_at_100 value: 10.668 - type: precision_at_1000 value: 2.067 - type: precision_at_3 value: 47.667 - type: precision_at_5 value: 41.3 - type: recall_at_1 value: 9.151 - type: recall_at_10 value: 25.003999999999998 - type: recall_at_100 value: 52.976 - type: recall_at_1000 value: 78.315 - type: recall_at_3 value: 15.487 - type: recall_at_5 value: 18.999 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 51.89999999999999 - type: f1 value: 46.47777925067403 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 73.706 - type: map_at_10 value: 82.423 - type: map_at_100 value: 82.67999999999999 - type: map_at_1000 value: 82.694 - type: map_at_3 value: 81.328 - type: map_at_5 value: 82.001 - type: mrr_at_1 value: 79.613 - type: mrr_at_10 value: 87.07000000000001 - type: mrr_at_100 value: 87.169 - type: mrr_at_1000 value: 87.17 - type: mrr_at_3 value: 86.404 - type: mrr_at_5 value: 86.856 - type: ndcg_at_1 value: 79.613 - type: ndcg_at_10 value: 86.289 - type: ndcg_at_100 value: 87.201 - type: ndcg_at_1000 value: 87.428 - type: ndcg_at_3 value: 84.625 - type: ndcg_at_5 value: 85.53699999999999 - type: precision_at_1 value: 79.613 - type: precision_at_10 value: 10.399 - type: precision_at_100 value: 1.1079999999999999 - type: precision_at_1000 value: 0.11499999999999999 - type: precision_at_3 value: 32.473 - type: precision_at_5 value: 20.132 - type: recall_at_1 value: 73.706 - type: recall_at_10 value: 93.559 - type: recall_at_100 value: 97.188 - type: recall_at_1000 value: 98.555 - type: recall_at_3 value: 88.98700000000001 - type: recall_at_5 value: 91.373 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 19.841 - type: map_at_10 value: 32.643 - type: map_at_100 value: 34.575 - type: map_at_1000 value: 34.736 - type: map_at_3 value: 28.317999999999998 - type: map_at_5 value: 30.964000000000002 - type: mrr_at_1 value: 39.660000000000004 - type: mrr_at_10 value: 48.620000000000005 - type: mrr_at_100 value: 49.384 - type: mrr_at_1000 value: 49.415 - type: mrr_at_3 value: 45.988 - type: mrr_at_5 value: 47.361 - type: ndcg_at_1 value: 39.660000000000004 - type: ndcg_at_10 value: 40.646 - type: ndcg_at_100 value: 47.657 - type: ndcg_at_1000 value: 50.428 - type: ndcg_at_3 value: 36.689 - type: ndcg_at_5 value: 38.211 - type: precision_at_1 value: 39.660000000000004 - type: precision_at_10 value: 11.235000000000001 - type: precision_at_100 value: 1.8530000000000002 - type: precision_at_1000 value: 0.23600000000000002 - type: precision_at_3 value: 24.587999999999997 - type: precision_at_5 value: 18.395 - type: recall_at_1 value: 19.841 - type: recall_at_10 value: 48.135 - type: recall_at_100 value: 74.224 - type: recall_at_1000 value: 90.826 - type: recall_at_3 value: 33.536 - type: recall_at_5 value: 40.311 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 40.358 - type: map_at_10 value: 64.497 - type: map_at_100 value: 65.362 - type: map_at_1000 value: 65.41900000000001 - type: map_at_3 value: 61.06700000000001 - type: map_at_5 value: 63.317 - type: mrr_at_1 value: 80.716 - type: mrr_at_10 value: 86.10799999999999 - type: mrr_at_100 value: 86.265 - type: mrr_at_1000 value: 86.27 - type: mrr_at_3 value: 85.271 - type: mrr_at_5 value: 85.82499999999999 - type: ndcg_at_1 value: 80.716 - type: ndcg_at_10 value: 72.597 - type: ndcg_at_100 value: 75.549 - type: ndcg_at_1000 value: 76.61 - type: ndcg_at_3 value: 67.874 - type: ndcg_at_5 value: 70.655 - type: precision_at_1 value: 80.716 - type: precision_at_10 value: 15.148 - type: precision_at_100 value: 1.745 - type: precision_at_1000 value: 0.188 - type: precision_at_3 value: 43.597 - type: precision_at_5 value: 28.351 - type: recall_at_1 value: 40.358 - type: recall_at_10 value: 75.739 - type: recall_at_100 value: 87.259 - type: recall_at_1000 value: 94.234 - type: recall_at_3 value: 65.39500000000001 - type: recall_at_5 value: 70.878 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 90.80799999999998 - type: ap value: 86.81350378180757 - type: f1 value: 90.79901248314215 - task: type: Retrieval dataset: type: msmarco name: MTEB MSMARCO config: default split: dev revision: None metrics: - type: map_at_1 value: 22.096 - type: map_at_10 value: 34.384 - type: map_at_100 value: 35.541 - type: map_at_1000 value: 35.589999999999996 - type: map_at_3 value: 30.496000000000002 - type: map_at_5 value: 32.718 - type: mrr_at_1 value: 22.750999999999998 - type: mrr_at_10 value: 35.024 - type: mrr_at_100 value: 36.125 - type: mrr_at_1000 value: 36.168 - type: mrr_at_3 value: 31.225 - type: mrr_at_5 value: 33.416000000000004 - type: ndcg_at_1 value: 22.750999999999998 - type: ndcg_at_10 value: 41.351 - type: ndcg_at_100 value: 46.92 - type: ndcg_at_1000 value: 48.111 - type: ndcg_at_3 value: 33.439 - type: ndcg_at_5 value: 37.407000000000004 - type: precision_at_1 value: 22.750999999999998 - type: precision_at_10 value: 6.564 - type: precision_at_100 value: 0.935 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 14.288 - type: precision_at_5 value: 10.581999999999999 - type: recall_at_1 value: 22.096 - type: recall_at_10 value: 62.771 - type: recall_at_100 value: 88.529 - type: recall_at_1000 value: 97.55 - type: recall_at_3 value: 41.245 - type: recall_at_5 value: 50.788 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 94.16780665754673 - type: f1 value: 93.96331194859894 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 76.90606475148198 - type: f1 value: 58.58344986604187 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 76.14660390047075 - type: f1 value: 74.31533923533614 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 80.16139878950908 - type: f1 value: 80.18532656824924 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 32.949880906135085 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 31.56300351524862 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 31.196521894371315 - type: mrr value: 32.22644231694389 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 6.783 - type: map_at_10 value: 14.549000000000001 - type: map_at_100 value: 18.433 - type: map_at_1000 value: 19.949 - type: map_at_3 value: 10.936 - type: map_at_5 value: 12.514 - type: mrr_at_1 value: 47.368 - type: mrr_at_10 value: 56.42 - type: mrr_at_100 value: 56.908 - type: mrr_at_1000 value: 56.95 - type: mrr_at_3 value: 54.283 - type: mrr_at_5 value: 55.568 - type: ndcg_at_1 value: 45.666000000000004 - type: ndcg_at_10 value: 37.389 - type: ndcg_at_100 value: 34.253 - type: ndcg_at_1000 value: 43.059999999999995 - type: ndcg_at_3 value: 42.725 - type: ndcg_at_5 value: 40.193 - type: precision_at_1 value: 47.368 - type: precision_at_10 value: 27.988000000000003 - type: precision_at_100 value: 8.672 - type: precision_at_1000 value: 2.164 - type: precision_at_3 value: 40.248 - type: precision_at_5 value: 34.737 - type: recall_at_1 value: 6.783 - type: recall_at_10 value: 17.838 - type: recall_at_100 value: 33.672000000000004 - type: recall_at_1000 value: 66.166 - type: recall_at_3 value: 11.849 - type: recall_at_5 value: 14.205000000000002 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 31.698999999999998 - type: map_at_10 value: 46.556 - type: map_at_100 value: 47.652 - type: map_at_1000 value: 47.68 - type: map_at_3 value: 42.492000000000004 - type: map_at_5 value: 44.763999999999996 - type: mrr_at_1 value: 35.747 - type: mrr_at_10 value: 49.242999999999995 - type: mrr_at_100 value: 50.052 - type: mrr_at_1000 value: 50.068 - type: mrr_at_3 value: 45.867000000000004 - type: mrr_at_5 value: 47.778999999999996 - type: ndcg_at_1 value: 35.717999999999996 - type: ndcg_at_10 value: 54.14600000000001 - type: ndcg_at_100 value: 58.672999999999995 - type: ndcg_at_1000 value: 59.279 - type: ndcg_at_3 value: 46.407 - type: ndcg_at_5 value: 50.181 - type: precision_at_1 value: 35.717999999999996 - type: precision_at_10 value: 8.844000000000001 - type: precision_at_100 value: 1.139 - type: precision_at_1000 value: 0.12 - type: precision_at_3 value: 20.993000000000002 - type: precision_at_5 value: 14.791000000000002 - type: recall_at_1 value: 31.698999999999998 - type: recall_at_10 value: 74.693 - type: recall_at_100 value: 94.15299999999999 - type: recall_at_1000 value: 98.585 - type: recall_at_3 value: 54.388999999999996 - type: recall_at_5 value: 63.08200000000001 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 71.283 - type: map_at_10 value: 85.24000000000001 - type: map_at_100 value: 85.882 - type: map_at_1000 value: 85.897 - type: map_at_3 value: 82.326 - type: map_at_5 value: 84.177 - type: mrr_at_1 value: 82.21000000000001 - type: mrr_at_10 value: 88.228 - type: mrr_at_100 value: 88.32 - type: mrr_at_1000 value: 88.32 - type: mrr_at_3 value: 87.323 - type: mrr_at_5 value: 87.94800000000001 - type: ndcg_at_1 value: 82.17999999999999 - type: ndcg_at_10 value: 88.9 - type: ndcg_at_100 value: 90.079 - type: ndcg_at_1000 value: 90.158 - type: ndcg_at_3 value: 86.18299999999999 - type: ndcg_at_5 value: 87.71799999999999 - type: precision_at_1 value: 82.17999999999999 - type: precision_at_10 value: 13.464 - type: precision_at_100 value: 1.533 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 37.693 - type: precision_at_5 value: 24.792 - type: recall_at_1 value: 71.283 - type: recall_at_10 value: 95.742 - type: recall_at_100 value: 99.67200000000001 - type: recall_at_1000 value: 99.981 - type: recall_at_3 value: 87.888 - type: recall_at_5 value: 92.24 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 56.24267063669042 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 62.88056988932578 - task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 4.903 - type: map_at_10 value: 13.202 - type: map_at_100 value: 15.5 - type: map_at_1000 value: 15.870999999999999 - type: map_at_3 value: 9.407 - type: map_at_5 value: 11.238 - type: mrr_at_1 value: 24.2 - type: mrr_at_10 value: 35.867 - type: mrr_at_100 value: 37.001 - type: mrr_at_1000 value: 37.043 - type: mrr_at_3 value: 32.5 - type: mrr_at_5 value: 34.35 - type: ndcg_at_1 value: 24.2 - type: ndcg_at_10 value: 21.731 - type: ndcg_at_100 value: 30.7 - type: ndcg_at_1000 value: 36.618 - type: ndcg_at_3 value: 20.72 - type: ndcg_at_5 value: 17.954 - type: precision_at_1 value: 24.2 - type: precision_at_10 value: 11.33 - type: precision_at_100 value: 2.4410000000000003 - type: precision_at_1000 value: 0.386 - type: precision_at_3 value: 19.667 - type: precision_at_5 value: 15.86 - type: recall_at_1 value: 4.903 - type: recall_at_10 value: 22.962 - type: recall_at_100 value: 49.563 - type: recall_at_1000 value: 78.238 - type: recall_at_3 value: 11.953 - type: recall_at_5 value: 16.067999999999998 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 84.12694254604078 - type: cos_sim_spearman value: 80.30141815181918 - type: euclidean_pearson value: 81.34015449877128 - type: euclidean_spearman value: 80.13984197010849 - type: manhattan_pearson value: 81.31767068124086 - type: manhattan_spearman value: 80.11720513114103 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 86.13112984010417 - type: cos_sim_spearman value: 78.03063573402875 - type: euclidean_pearson value: 83.51928418844804 - type: euclidean_spearman value: 78.4045235411144 - type: manhattan_pearson value: 83.49981637388689 - type: manhattan_spearman value: 78.4042575139372 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 82.50327987379504 - type: cos_sim_spearman value: 84.18556767756205 - type: euclidean_pearson value: 82.69684424327679 - type: euclidean_spearman value: 83.5368106038335 - type: manhattan_pearson value: 82.57967581007374 - type: manhattan_spearman value: 83.43009053133697 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 82.50756863007814 - type: cos_sim_spearman value: 82.27204331279108 - type: euclidean_pearson value: 81.39535251429741 - type: euclidean_spearman value: 81.84386626336239 - type: manhattan_pearson value: 81.34281737280695 - type: manhattan_spearman value: 81.81149375673166 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 86.8727714856726 - type: cos_sim_spearman value: 87.95738287792312 - type: euclidean_pearson value: 86.62920602795887 - type: euclidean_spearman value: 87.05207355381243 - type: manhattan_pearson value: 86.53587918472225 - type: manhattan_spearman value: 86.95382961029586 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 83.52240359769479 - type: cos_sim_spearman value: 85.47685776238286 - type: euclidean_pearson value: 84.25815333483058 - type: euclidean_spearman value: 85.27415639683198 - type: manhattan_pearson value: 84.29127757025637 - type: manhattan_spearman value: 85.30226224917351 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 86.42501708915708 - type: cos_sim_spearman value: 86.42276182795041 - type: euclidean_pearson value: 86.5408207354761 - type: euclidean_spearman value: 85.46096321750838 - type: manhattan_pearson value: 86.54177303026881 - type: manhattan_spearman value: 85.50313151916117 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 64.86521089250766 - type: cos_sim_spearman value: 65.94868540323003 - type: euclidean_pearson value: 67.16569626533084 - type: euclidean_spearman value: 66.37667004134917 - type: manhattan_pearson value: 67.1482365102333 - type: manhattan_spearman value: 66.53240122580029 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 84.64746265365318 - type: cos_sim_spearman value: 86.41888825906786 - type: euclidean_pearson value: 85.27453642725811 - type: euclidean_spearman value: 85.94095796602544 - type: manhattan_pearson value: 85.28643660505334 - type: manhattan_spearman value: 85.95028003260744 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 87.48903153618527 - type: mrr value: 96.41081503826601 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 58.594 - type: map_at_10 value: 69.296 - type: map_at_100 value: 69.782 - type: map_at_1000 value: 69.795 - type: map_at_3 value: 66.23 - type: map_at_5 value: 68.293 - type: mrr_at_1 value: 61.667 - type: mrr_at_10 value: 70.339 - type: mrr_at_100 value: 70.708 - type: mrr_at_1000 value: 70.722 - type: mrr_at_3 value: 68.0 - type: mrr_at_5 value: 69.56700000000001 - type: ndcg_at_1 value: 61.667 - type: ndcg_at_10 value: 74.039 - type: ndcg_at_100 value: 76.103 - type: ndcg_at_1000 value: 76.47800000000001 - type: ndcg_at_3 value: 68.967 - type: ndcg_at_5 value: 71.96900000000001 - type: precision_at_1 value: 61.667 - type: precision_at_10 value: 9.866999999999999 - type: precision_at_100 value: 1.097 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 27.111 - type: precision_at_5 value: 18.2 - type: recall_at_1 value: 58.594 - type: recall_at_10 value: 87.422 - type: recall_at_100 value: 96.667 - type: recall_at_1000 value: 99.667 - type: recall_at_3 value: 74.217 - type: recall_at_5 value: 81.539 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.85049504950496 - type: cos_sim_ap value: 96.33111544137081 - type: cos_sim_f1 value: 92.35443037974684 - type: cos_sim_precision value: 93.53846153846153 - type: cos_sim_recall value: 91.2 - type: dot_accuracy value: 99.82376237623762 - type: dot_ap value: 95.38082527310888 - type: dot_f1 value: 90.90909090909092 - type: dot_precision value: 92.90187891440502 - type: dot_recall value: 89.0 - type: euclidean_accuracy value: 99.84851485148515 - type: euclidean_ap value: 96.32316003996347 - type: euclidean_f1 value: 92.2071392659628 - type: euclidean_precision value: 92.71991911021233 - type: euclidean_recall value: 91.7 - type: manhattan_accuracy value: 99.84851485148515 - type: manhattan_ap value: 96.3655668249217 - type: manhattan_f1 value: 92.18356026222895 - type: manhattan_precision value: 92.98067141403867 - type: manhattan_recall value: 91.4 - type: max_accuracy value: 99.85049504950496 - type: max_ap value: 96.3655668249217 - type: max_f1 value: 92.35443037974684 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 65.94861371629051 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 35.009430451385 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 54.61164066427969 - type: mrr value: 55.49710603938544 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.622620124907662 - type: cos_sim_spearman value: 31.0678351356163 - type: dot_pearson value: 30.863727693306814 - type: dot_spearman value: 31.230306567021255 - task: type: Retrieval dataset: type: trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.22 - type: map_at_10 value: 2.011 - type: map_at_100 value: 10.974 - type: map_at_1000 value: 25.819 - type: map_at_3 value: 0.6649999999999999 - type: map_at_5 value: 1.076 - type: mrr_at_1 value: 86.0 - type: mrr_at_10 value: 91.8 - type: mrr_at_100 value: 91.8 - type: mrr_at_1000 value: 91.8 - type: mrr_at_3 value: 91.0 - type: mrr_at_5 value: 91.8 - type: ndcg_at_1 value: 82.0 - type: ndcg_at_10 value: 78.07300000000001 - type: ndcg_at_100 value: 58.231 - type: ndcg_at_1000 value: 51.153000000000006 - type: ndcg_at_3 value: 81.123 - type: ndcg_at_5 value: 81.059 - type: precision_at_1 value: 86.0 - type: precision_at_10 value: 83.0 - type: precision_at_100 value: 59.38 - type: precision_at_1000 value: 22.55 - type: precision_at_3 value: 87.333 - type: precision_at_5 value: 86.8 - type: recall_at_1 value: 0.22 - type: recall_at_10 value: 2.2079999999999997 - type: recall_at_100 value: 14.069 - type: recall_at_1000 value: 47.678 - type: recall_at_3 value: 0.7040000000000001 - type: recall_at_5 value: 1.161 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 2.809 - type: map_at_10 value: 10.394 - type: map_at_100 value: 16.598 - type: map_at_1000 value: 18.142 - type: map_at_3 value: 5.572 - type: map_at_5 value: 7.1370000000000005 - type: mrr_at_1 value: 32.653 - type: mrr_at_10 value: 46.564 - type: mrr_at_100 value: 47.469 - type: mrr_at_1000 value: 47.469 - type: mrr_at_3 value: 42.177 - type: mrr_at_5 value: 44.524 - type: ndcg_at_1 value: 30.612000000000002 - type: ndcg_at_10 value: 25.701 - type: ndcg_at_100 value: 37.532 - type: ndcg_at_1000 value: 48.757 - type: ndcg_at_3 value: 28.199999999999996 - type: ndcg_at_5 value: 25.987 - type: precision_at_1 value: 32.653 - type: precision_at_10 value: 23.469 - type: precision_at_100 value: 7.9799999999999995 - type: precision_at_1000 value: 1.5350000000000001 - type: precision_at_3 value: 29.932 - type: precision_at_5 value: 26.122 - type: recall_at_1 value: 2.809 - type: recall_at_10 value: 16.887 - type: recall_at_100 value: 48.67 - type: recall_at_1000 value: 82.89699999999999 - type: recall_at_3 value: 6.521000000000001 - type: recall_at_5 value: 9.609 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 71.57860000000001 - type: ap value: 13.82629211536393 - type: f1 value: 54.59860966183956 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 59.38030560271647 - type: f1 value: 59.69685552567865 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 51.4736717043405 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 86.92853311080646 - type: cos_sim_ap value: 77.67872502591382 - type: cos_sim_f1 value: 70.33941236068895 - type: cos_sim_precision value: 67.63273258645884 - type: cos_sim_recall value: 73.27176781002639 - type: dot_accuracy value: 85.79603027954938 - type: dot_ap value: 73.73786190233379 - type: dot_f1 value: 67.3437901774235 - type: dot_precision value: 65.67201604814443 - type: dot_recall value: 69.10290237467018 - type: euclidean_accuracy value: 86.94045419324074 - type: euclidean_ap value: 77.6687791535167 - type: euclidean_f1 value: 70.47209214023542 - type: euclidean_precision value: 67.7207492094381 - type: euclidean_recall value: 73.45646437994723 - type: manhattan_accuracy value: 86.87488823985218 - type: manhattan_ap value: 77.63373392430728 - type: manhattan_f1 value: 70.40920716112532 - type: manhattan_precision value: 68.31265508684864 - type: manhattan_recall value: 72.63852242744063 - type: max_accuracy value: 86.94045419324074 - type: max_ap value: 77.67872502591382 - type: max_f1 value: 70.47209214023542 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 88.67155664221679 - type: cos_sim_ap value: 85.64591703003417 - type: cos_sim_f1 value: 77.59531005352656 - type: cos_sim_precision value: 73.60967184801382 - type: cos_sim_recall value: 82.03726516784724 - type: dot_accuracy value: 88.41541506578181 - type: dot_ap value: 84.6482788957769 - type: dot_f1 value: 77.04748541466657 - type: dot_precision value: 74.02440754931176 - type: dot_recall value: 80.3279950723745 - type: euclidean_accuracy value: 88.63080684596576 - type: euclidean_ap value: 85.44570045321562 - type: euclidean_f1 value: 77.28769403336106 - type: euclidean_precision value: 72.90600040958427 - type: euclidean_recall value: 82.22975053895904 - type: manhattan_accuracy value: 88.59393798269105 - type: manhattan_ap value: 85.40271361038187 - type: manhattan_f1 value: 77.17606419344392 - type: manhattan_precision value: 72.4447747078295 - type: manhattan_recall value: 82.5685247921158 - type: max_accuracy value: 88.67155664221679 - type: max_ap value: 85.64591703003417 - type: max_f1 value: 77.59531005352656 license: mit language: - en --- <h1 align=\"center\">FlagEmbedding</h1> <h4 align=\"center\"> <p> <a href=#model-list>Model List</a> | <a href=#frequently-asked-questions>FAQ</a> | <a href=#usage>Usage</a> | <a href=\"#evaluation\">Evaluation</a> | <a href=\"#train\">Train</a> | <a href=\"#contact\">Contact</a> | <a href=\"#citation\">Citation</a> | <a href=\"#license\">License</a> <p> </h4> For more details please refer to our Github: FlagEmbedding. If you are looking for a model that supports more languages, longer texts, and other retrieval methods, you can try using bge-m3. English | 中文 FlagEmbedding focuses on retrieval-augmented LLMs, consisting of the following projects currently: - **Long-Context LLM**: Activation Beacon - **Fine-tuning of LM** : LM-Cocktail - **Dense Retrieval**: BGE-M3, LLM Embedder, BGE Embedding - **Reranker Model**: BGE Reranker - **Benchmark**: C-MTEB ## News - 1/30/2024: Release **BGE-M3**, a new member to BGE model series! M3 stands for **M**ulti-linguality (100+ languages), **M**ulti-granularities (input length up to 8192), **M**ulti-Functionality (unification of dense, lexical, multi-vec/colbert retrieval). It is the first embedding model which supports all three retrieval methods, achieving new SOTA on multi-lingual (MIRACL) and cross-lingual (MKQA) benchmarks. Technical Report and Code. :fire: - 1/9/2024: Release Activation-Beacon, an effective, efficient, compatible, and low-cost (training) method to extend the context length of LLM. Technical Report :fire: - 12/24/2023: Release **LLaRA**, a LLaMA-7B based dense retriever, leading to state-of-the-art performances on MS MARCO and BEIR. Model and code will be open-sourced. Please stay tuned. Technical Report :fire: - 11/23/2023: Release LM-Cocktail, a method to maintain general capabilities during fine-tuning by merging multiple language models. Technical Report :fire: - 10/12/2023: Release LLM-Embedder, a unified embedding model to support diverse retrieval augmentation needs for LLMs. Technical Report - 09/15/2023: The technical report and massive training data of BGE has been released - 09/12/2023: New models: - **New reranker model**: release cross-encoder models and , which are more powerful than embedding model. We recommend to use/fine-tune them to re-rank top-k documents returned by embedding models. - **update embedding model**: release embedding model to alleviate the issue of the similarity distribution, and enhance its retrieval ability without instruction. <details> <summary>More</summary> <!-- ### More --> - 09/07/2023: Update fine-tune code: Add script to mine hard negatives and support adding instruction during fine-tuning. - 08/09/2023: BGE Models are integrated into **Langchain**, you can use it like this; C-MTEB **leaderboard** is available. - 08/05/2023: Release base-scale and small-scale models, **best performance among the models of the same size 🤗** - 08/02/2023: Release (short for BAAI General Embedding) Models, **rank 1st on MTEB and C-MTEB benchmark!** :tada: :tada: - 08/01/2023: We release the Chinese Massive Text Embedding Benchmark (**C-MTEB**), consisting of 31 test dataset. </details> ## Model List is short for . | Model | Language | | Description | query instruction for retrieval [1] | |:-------------------------------|:--------:| :--------:| :--------:|:--------:| | BAAI/bge-m3 | Multilingual | Inference Fine-tune | Multi-Functionality(dense retrieval, sparse retrieval, multi-vector(colbert)), Multi-Linguality, and Multi-Granularity(8192 tokens) | | | BAAI/llm-embedder | English | Inference Fine-tune | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See README | | BAAI/bge-reranker-large | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-reranker-base | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-large-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-en | English | Inference Fine-tune | :trophy: rank **1st** in MTEB leaderboard | | | BAAI/bge-base-en | English | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-en | English | Inference Fine-tune |a small-scale model but with competitive performance | | | BAAI/bge-large-zh | Chinese | Inference Fine-tune | :trophy: rank **1st** in C-MTEB benchmark | | | BAAI/bge-base-zh | Chinese | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-zh | Chinese | Inference Fine-tune | a small-scale model but with competitive performance | | [1\\]: If you need to search the relevant passages to a query, we suggest to add the instruction to the query; in other cases, no instruction is needed, just use the original query directly. In all cases, **no instruction** needs to be added to passages. [2\\]: Different from embedding model, reranker uses question and document as input and directly output similarity instead of embedding. To balance the accuracy and time cost, cross-encoder is widely used to re-rank top-k documents retrieved by other simple models. For examples, use bge embedding model to retrieve top 100 relevant documents, and then use bge reranker to re-rank the top 100 document to get the final top-3 results. All models have been uploaded to Huggingface Hub, and you can see them at If you cannot open the Huggingface Hub, you also can download the models at . ## Frequently asked questions <details> <summary>1. How to fine-tune bge embedding model?</summary> <!-- ### How to fine-tune bge embedding model? --> Following this example to prepare data and fine-tune your model. Some suggestions: - Mine hard negatives following this example, which can improve the retrieval performance. - If you pre-train bge on your data, the pre-trained model cannot be directly used to calculate similarity, and it must be fine-tuned with contrastive learning before computing similarity. - If the accuracy of the fine-tuned model is still not high, it is recommended to use/fine-tune the cross-encoder model (bge-reranker) to re-rank top-k results. Hard negatives also are needed to fine-tune reranker. </details> <details> <summary>2. The similarity score between two dissimilar sentences is higher than 0.5</summary> <!-- ### The similarity score between two dissimilar sentences is higher than 0.5 --> **Suggest to use bge v1.5, which alleviates the issue of the similarity distribution.** Since we finetune the models by contrastive learning with a temperature of 0.01, the similarity distribution of the current BGE model is about in the interval \\[0.6, 1\\]. So a similarity score greater than 0.5 does not indicate that the two sentences are similar. For downstream tasks, such as passage retrieval or semantic similarity, **what matters is the relative order of the scores, not the absolute value.** If you need to filter similar sentences based on a similarity threshold, please select an appropriate similarity threshold based on the similarity distribution on your data (such as 0.8, 0.85, or even 0.9). </details> <details> <summary>3. When does the query instruction need to be used</summary> <!-- ### When does the query instruction need to be used --> For the , we improve its retrieval ability when not using instruction. No instruction only has a slight degradation in retrieval performance compared with using instruction. So you can generate embedding without instruction in all cases for convenience. For a retrieval task that uses short queries to find long related documents, it is recommended to add instructions for these short queries. **The best method to decide whether to add instructions for queries is choosing the setting that achieves better performance on your task.** In all cases, the documents/passages do not need to add the instruction. </details> ## Usage ### Usage for Embedding Model Here are some examples for using models with FlagEmbedding, Sentence-Transformers, Langchain, or Huggingface Transformers. #### Using FlagEmbedding If it doesn't work for you, you can see FlagEmbedding for more methods to install FlagEmbedding. For the value of the argument , see Model List. By default, FlagModel will use all available GPUs when encoding. Please set to select specific GPUs. You also can set to make all GPUs unavailable. #### Using Sentence-Transformers You can also use the models with sentence-transformers: For s2p(short query to long passage) retrieval task, each short query should start with an instruction (instructions see Model List). But the instruction is not needed for passages. #### Using Langchain You can use in langchain like this: #### Using HuggingFace Transformers With the transformers package, you can use the model like this: First, you pass your input through the transformer model, then you select the last hidden state of the first token (i.e., [CLS]) as the sentence embedding. #### Usage of the ONNX files #### Usage via infinity Its also possible to deploy the onnx files with the infinity_emb pip package. ### Usage for Reranker Different from embedding model, reranker uses question and document as input and directly output similarity instead of embedding. You can get a relevance score by inputting query and passage to the reranker. The reranker is optimized based cross-entropy loss, so the relevance score is not bounded to a specific range. #### Using FlagEmbedding Get relevance scores (higher scores indicate more relevance): #### Using Huggingface transformers ## Evaluation models achieve **state-of-the-art performance on both MTEB and C-MTEB leaderboard!** For more details and evaluation tools see our scripts. - **MTEB**: | Model Name | Dimension | Sequence Length | Average (56) | Retrieval (15) |Clustering (11) | Pair Classification (3) | Reranking (4) | STS (10) | Summarization (1) | Classification (12) | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | BAAI/bge-large-en-v1.5 | 1024 | 512 | **64.23** | **54.29** | 46.08 | 87.12 | 60.03 | 83.11 | 31.61 | 75.97 | | BAAI/bge-base-en-v1.5 | 768 | 512 | 63.55 | 53.25 | 45.77 | 86.55 | 58.86 | 82.4 | 31.07 | 75.53 | | BAAI/bge-small-en-v1.5 | 384 | 512 | 62.17 |51.68 | 43.82 | 84.92 | 58.36 | 81.59 | 30.12 | 74.14 | | bge-large-en | 1024 | 512 | 63.98 | 53.9 | 46.98 | 85.8 | 59.48 | 81.56 | 32.06 | 76.21 | | bge-base-en | 768 | 512 | 63.36 | 53.0 | 46.32 | 85.86 | 58.7 | 81.84 | 29.27 | 75.27 | | gte-large | 1024 | 512 | 63.13 | 52.22 | 46.84 | 85.00 | 59.13 | 83.35 | 31.66 | 73.33 | | gte-base | 768 | 512 | 62.39 | 51.14 | 46.2 | 84.57 | 58.61 | 82.3 | 31.17 | 73.01 | | e5-large-v2 | 1024| 512 | 62.25 | 50.56 | 44.49 | 86.03 | 56.61 | 82.05 | 30.19 | 75.24 | | bge-small-en | 384 | 512 | 62.11 | 51.82 | 44.31 | 83.78 | 57.97 | 80.72 | 30.53 | 74.37 | | instructor-xl | 768 | 512 | 61.79 | 49.26 | 44.74 | 86.62 | 57.29 | 83.06 | 32.32 | 61.79 | | e5-base-v2 | 768 | 512 | 61.5 | 50.29 | 43.80 | 85.73 | 55.91 | 81.05 | 30.28 | 73.84 | | gte-small | 384 | 512 | 61.36 | 49.46 | 44.89 | 83.54 | 57.7 | 82.07 | 30.42 | 72.31 | | text-embedding-ada-002 | 1536 | 8192 | 60.99 | 49.25 | 45.9 | 84.89 | 56.32 | 80.97 | 30.8 | 70.93 | | e5-small-v2 | 384 | 512 | 59.93 | 49.04 | 39.92 | 84.67 | 54.32 | 80.39 | 31.16 | 72.94 | | sentence-t5-xxl | 768 | 512 | 59.51 | 42.24 | 43.72 | 85.06 | 56.42 | 82.63 | 30.08 | 73.42 | | all-mpnet-base-v2 | 768 | 514 | 57.78 | 43.81 | 43.69 | 83.04 | 59.36 | 80.28 | 27.49 | 65.07 | | sgpt-bloom-7b1-msmarco | 4096 | 2048 | 57.59 | 48.22 | 38.93 | 81.9 | 55.65 | 77.74 | 33.6 | 66.19 | - **C-MTEB**: We create the benchmark C-MTEB for Chinese text embedding which consists of 31 datasets from 6 tasks. Please refer to C_MTEB for a detailed introduction. | Model | Embedding dimension | Avg | Retrieval | STS | PairClassification | Classification | Reranking | Clustering | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | **BAAI/bge-large-zh-v1.5** | 1024 | **64.53** | 70.46 | 56.25 | 81.6 | 69.13 | 65.84 | 48.99 | | BAAI/bge-base-zh-v1.5 | 768 | 63.13 | 69.49 | 53.72 | 79.75 | 68.07 | 65.39 | 47.53 | | BAAI/bge-small-zh-v1.5 | 512 | 57.82 | 61.77 | 49.11 | 70.41 | 63.96 | 60.92 | 44.18 | | BAAI/bge-large-zh | 1024 | 64.20 | 71.53 | 54.98 | 78.94 | 68.32 | 65.11 | 48.39 | | bge-large-zh-noinstruct | 1024 | 63.53 | 70.55 | 53 | 76.77 | 68.58 | 64.91 | 50.01 | | BAAI/bge-base-zh | 768 | 62.96 | 69.53 | 54.12 | 77.5 | 67.07 | 64.91 | 47.63 | | multilingual-e5-large | 1024 | 58.79 | 63.66 | 48.44 | 69.89 | 67.34 | 56.00 | 48.23 | | BAAI/bge-small-zh | 512 | 58.27 | 63.07 | 49.45 | 70.35 | 63.64 | 61.48 | 45.09 | | m3e-base | 768 | 57.10 | 56.91 | 50.47 | 63.99 | 67.52 | 59.34 | 47.68 | | m3e-large | 1024 | 57.05 | 54.75 | 50.42 | 64.3 | 68.2 | 59.66 | 48.88 | | multilingual-e5-base | 768 | 55.48 | 61.63 | 46.49 | 67.07 | 65.35 | 54.35 | 40.68 | | multilingual-e5-small | 384 | 55.38 | 59.95 | 45.27 | 66.45 | 65.85 | 53.86 | 45.26 | | text-embedding-ada-002(OpenAI) | 1536 | 53.02 | 52.0 | 43.35 | 69.56 | 64.31 | 54.28 | 45.68 | | luotuo | 1024 | 49.37 | 44.4 | 42.78 | 66.62 | 61 | 49.25 | 44.39 | | text2vec-base | 768 | 47.63 | 38.79 | 43.41 | 67.41 | 62.19 | 49.45 | 37.66 | | text2vec-large | 1024 | 47.36 | 41.94 | 44.97 | 70.86 | 60.66 | 49.16 | 30.02 | - **Reranking**: See C_MTEB for evaluation script. | Model | T2Reranking | T2RerankingZh2En\\* | T2RerankingEn2Zh\\* | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | text2vec-base-multilingual | 64.66 | 62.94 | 62.51 | 14.37 | 48.46 | 48.6 | 50.26 | | multilingual-e5-small | 65.62 | 60.94 | 56.41 | 29.91 | 67.26 | 66.54 | 57.78 | | multilingual-e5-large | 64.55 | 61.61 | 54.28 | 28.6 | 67.42 | 67.92 | 57.4 | | multilingual-e5-base | 64.21 | 62.13 | 54.68 | 29.5 | 66.23 | 66.98 | 57.29 | | m3e-base | 66.03 | 62.74 | 56.07 | 17.51 | 77.05 | 76.76 | 59.36 | | m3e-large | 66.13 | 62.72 | 56.1 | 16.46 | 77.76 | 78.27 | 59.57 | | bge-base-zh-v1.5 | 66.49 | 63.25 | 57.02 | 29.74 | 80.47 | 84.88 | 63.64 | | bge-large-zh-v1.5 | 65.74 | 63.39 | 57.03 | 28.74 | 83.45 | 85.44 | 63.97 | | BAAI/bge-reranker-base | 67.28 | 63.95 | 60.45 | 35.46 | 81.26 | 84.1 | 65.42 | | BAAI/bge-reranker-large | 67.6 | 64.03 | 61.44 | 37.16 | 82.15 | 84.18 | 66.09 | \\* : T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks ## Train ### BAAI Embedding We pre-train the models using retromae and train them on large-scale pairs data using contrastive learning. **You can fine-tune the embedding model on your data following our examples.** We also provide a pre-train example. Note that the goal of pre-training is to reconstruct the text, and the pre-trained model cannot be used for similarity calculation directly, it needs to be fine-tuned. More training details for bge see baai_general_embedding. ### BGE Reranker Cross-encoder will perform full-attention over the input pair, which is more accurate than embedding model (i.e., bi-encoder) but more time-consuming than embedding model. Therefore, it can be used to re-rank the top-k documents returned by embedding model. We train the cross-encoder on a multilingual pair data, The data format is the same as embedding model, so you can fine-tune it easily following our example. More details please refer to ./FlagEmbedding/reranker/README.md ## Contact If you have any question or suggestion related to this project, feel free to open an issue or pull request. You also can email Shitao Xiao([email protected]) and Zheng Liu([email protected]). ## Citation If you find this repository useful, please consider giving a star :star: and citation ## License FlagEmbedding is licensed under the MIT License. The released models can be used for commercial purposes free of charge.",
28
+ "model_explanation_gemini": "Generates sentence embeddings for tasks like text classification, retrieval, clustering, and semantic similarity. \n\nFeatures: \n- Sentence embedding generation \n- Text classification \n- Retrieval tasks \n- Clustering \n- Semantic similarity (STS) \n- Reranking \n\nComparison: \nOutperforms or competes with other models in MTEB benchmarks for classification, retrieval, and semantic similarity tasks.",
29
+ "release_year": "2024",
30
+ "parameter_count": null,
31
+ "is_fine_tuned": false,
32
+ "category": "Embedding",
33
+ "model_family": "BERT",
34
+ "api_enhanced": true
35
  }
model_data_json/BAAI_bge-base-en.json CHANGED
@@ -19,5 +19,11 @@
19
  "region:us"
20
  ],
21
  "description": "--- tags: - mteb model-index: - name: bge-base-en results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 75.73134328358209 - type: ap value: 38.97277232632892 - type: f1 value: 69.81740361139785 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 92.56522500000001 - type: ap value: 88.88821771869553 - type: f1 value: 92.54817512659696 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 46.91 - type: f1 value: 46.28536394320311 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 38.834 - type: map_at_10 value: 53.564 - type: map_at_100 value: 54.230000000000004 - type: map_at_1000 value: 54.235 - type: map_at_3 value: 49.49 - type: map_at_5 value: 51.784 - type: mrr_at_1 value: 39.26 - type: mrr_at_10 value: 53.744 - type: mrr_at_100 value: 54.410000000000004 - type: mrr_at_1000 value: 54.415 - type: mrr_at_3 value: 49.656 - type: mrr_at_5 value: 52.018 - type: ndcg_at_1 value: 38.834 - type: ndcg_at_10 value: 61.487 - type: ndcg_at_100 value: 64.303 - type: ndcg_at_1000 value: 64.408 - type: ndcg_at_3 value: 53.116 - type: ndcg_at_5 value: 57.248 - type: precision_at_1 value: 38.834 - type: precision_at_10 value: 8.663 - type: precision_at_100 value: 0.989 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 21.218999999999998 - type: precision_at_5 value: 14.737 - type: recall_at_1 value: 38.834 - type: recall_at_10 value: 86.629 - type: recall_at_100 value: 98.86200000000001 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 63.656 - type: recall_at_5 value: 73.68400000000001 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 48.88475477433035 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 42.85053138403176 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 62.23221013208242 - type: mrr value: 74.64857318735436 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 87.4403443247284 - type: cos_sim_spearman value: 85.5326718115169 - type: euclidean_pearson value: 86.0114007449595 - type: euclidean_spearman value: 86.05979225604875 - type: manhattan_pearson value: 86.05423806568598 - type: manhattan_spearman value: 86.02485170086835 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 86.44480519480518 - type: f1 value: 86.41301900941988 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 40.17547250880036 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 37.74514172687293 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 32.096000000000004 - type: map_at_10 value: 43.345 - type: map_at_100 value: 44.73 - type: map_at_1000 value: 44.85 - type: map_at_3 value: 39.956 - type: map_at_5 value: 41.727 - type: mrr_at_1 value: 38.769999999999996 - type: mrr_at_10 value: 48.742000000000004 - type: mrr_at_100 value: 49.474000000000004 - type: mrr_at_1000 value: 49.513 - type: mrr_at_3 value: 46.161 - type: mrr_at_5 value: 47.721000000000004 - type: ndcg_at_1 value: 38.769999999999996 - type: ndcg_at_10 value: 49.464999999999996 - type: ndcg_at_100 value: 54.632000000000005 - type: ndcg_at_1000 value: 56.52 - type: ndcg_at_3 value: 44.687 - type: ndcg_at_5 value: 46.814 - type: precision_at_1 value: 38.769999999999996 - type: precision_at_10 value: 9.471 - type: precision_at_100 value: 1.4909999999999999 - type: precision_at_1000 value: 0.194 - type: precision_at_3 value: 21.268 - type: precision_at_5 value: 15.079 - type: recall_at_1 value: 32.096000000000004 - type: recall_at_10 value: 60.99099999999999 - type: recall_at_100 value: 83.075 - type: recall_at_1000 value: 95.178 - type: recall_at_3 value: 47.009 - type: recall_at_5 value: 53.348 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 32.588 - type: map_at_10 value: 42.251 - type: map_at_100 value: 43.478 - type: map_at_1000 value: 43.617 - type: map_at_3 value: 39.381 - type: map_at_5 value: 41.141 - type: mrr_at_1 value: 41.21 - type: mrr_at_10 value: 48.765 - type: mrr_at_100 value: 49.403000000000006 - type: mrr_at_1000 value: 49.451 - type: mrr_at_3 value: 46.73 - type: mrr_at_5 value: 47.965999999999994 - type: ndcg_at_1 value: 41.21 - type: ndcg_at_10 value: 47.704 - type: ndcg_at_100 value: 51.916 - type: ndcg_at_1000 value: 54.013999999999996 - type: ndcg_at_3 value: 44.007000000000005 - type: ndcg_at_5 value: 45.936 - type: precision_at_1 value: 41.21 - type: precision_at_10 value: 8.885 - type: precision_at_100 value: 1.409 - type: precision_at_1000 value: 0.189 - type: precision_at_3 value: 21.274 - type: precision_at_5 value: 15.045 - type: recall_at_1 value: 32.588 - type: recall_at_10 value: 56.333 - type: recall_at_100 value: 74.251 - type: recall_at_1000 value: 87.518 - type: recall_at_3 value: 44.962 - type: recall_at_5 value: 50.609 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 40.308 - type: map_at_10 value: 53.12 - type: map_at_100 value: 54.123 - type: map_at_1000 value: 54.173 - type: map_at_3 value: 50.017999999999994 - type: map_at_5 value: 51.902 - type: mrr_at_1 value: 46.394999999999996 - type: mrr_at_10 value: 56.531 - type: mrr_at_100 value: 57.19800000000001 - type: mrr_at_1000 value: 57.225 - type: mrr_at_3 value: 54.368 - type: mrr_at_5 value: 55.713 - type: ndcg_at_1 value: 46.394999999999996 - type: ndcg_at_10 value: 58.811 - type: ndcg_at_100 value: 62.834 - type: ndcg_at_1000 value: 63.849999999999994 - type: ndcg_at_3 value: 53.88699999999999 - type: ndcg_at_5 value: 56.477999999999994 - type: precision_at_1 value: 46.394999999999996 - type: precision_at_10 value: 9.398 - type: precision_at_100 value: 1.2309999999999999 - type: precision_at_1000 value: 0.136 - type: precision_at_3 value: 24.221999999999998 - type: precision_at_5 value: 16.539 - type: recall_at_1 value: 40.308 - type: recall_at_10 value: 72.146 - type: recall_at_100 value: 89.60900000000001 - type: recall_at_1000 value: 96.733 - type: recall_at_3 value: 58.91499999999999 - type: recall_at_5 value: 65.34299999999999 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.383000000000003 - type: map_at_10 value: 35.802 - type: map_at_100 value: 36.756 - type: map_at_1000 value: 36.826 - type: map_at_3 value: 32.923 - type: map_at_5 value: 34.577999999999996 - type: mrr_at_1 value: 29.604999999999997 - type: mrr_at_10 value: 37.918 - type: mrr_at_100 value: 38.732 - type: mrr_at_1000 value: 38.786 - type: mrr_at_3 value: 35.198 - type: mrr_at_5 value: 36.808 - type: ndcg_at_1 value: 29.604999999999997 - type: ndcg_at_10 value: 40.836 - type: ndcg_at_100 value: 45.622 - type: ndcg_at_1000 value: 47.427 - type: ndcg_at_3 value: 35.208 - type: ndcg_at_5 value: 38.066 - type: precision_at_1 value: 29.604999999999997 - type: precision_at_10 value: 6.226 - type: precision_at_100 value: 0.9079999999999999 - type: precision_at_1000 value: 0.11 - type: precision_at_3 value: 14.463000000000001 - type: precision_at_5 value: 10.35 - type: recall_at_1 value: 27.383000000000003 - type: recall_at_10 value: 54.434000000000005 - type: recall_at_100 value: 76.632 - type: recall_at_1000 value: 90.25 - type: recall_at_3 value: 39.275 - type: recall_at_5 value: 46.225 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 17.885 - type: map_at_10 value: 25.724000000000004 - type: map_at_100 value: 26.992 - type: map_at_1000 value: 27.107999999999997 - type: map_at_3 value: 23.04 - type: map_at_5 value: 24.529 - type: mrr_at_1 value: 22.264 - type: mrr_at_10 value: 30.548 - type: mrr_at_100 value: 31.593 - type: mrr_at_1000 value: 31.657999999999998 - type: mrr_at_3 value: 27.756999999999998 - type: mrr_at_5 value: 29.398999999999997 - type: ndcg_at_1 value: 22.264 - type: ndcg_at_10 value: 30.902 - type: ndcg_at_100 value: 36.918 - type: ndcg_at_1000 value: 39.735 - type: ndcg_at_3 value: 25.915 - type: ndcg_at_5 value: 28.255999999999997 - type: precision_at_1 value: 22.264 - type: precision_at_10 value: 5.634 - type: precision_at_100 value: 0.9939999999999999 - type: precision_at_1000 value: 0.13699999999999998 - type: precision_at_3 value: 12.396 - type: precision_at_5 value: 9.055 - type: recall_at_1 value: 17.885 - type: recall_at_10 value: 42.237 - type: recall_at_100 value: 68.489 - type: recall_at_1000 value: 88.721 - type: recall_at_3 value: 28.283 - type: recall_at_5 value: 34.300000000000004 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 29.737000000000002 - type: map_at_10 value: 39.757 - type: map_at_100 value: 40.992 - type: map_at_1000 value: 41.102 - type: map_at_3 value: 36.612 - type: map_at_5 value: 38.413000000000004 - type: mrr_at_1 value: 35.804 - type: mrr_at_10 value: 45.178000000000004 - type: mrr_at_100 value: 45.975 - type: mrr_at_1000 value: 46.021 - type: mrr_at_3 value: 42.541000000000004 - type: mrr_at_5 value: 44.167 - type: ndcg_at_1 value: 35.804 - type: ndcg_at_10 value: 45.608 - type: ndcg_at_100 value: 50.746 - type: ndcg_at_1000 value: 52.839999999999996 - type: ndcg_at_3 value: 40.52 - type: ndcg_at_5 value: 43.051 - type: precision_at_1 value: 35.804 - type: precision_at_10 value: 8.104 - type: precision_at_100 value: 1.256 - type: precision_at_1000 value: 0.161 - type: precision_at_3 value: 19.121 - type: precision_at_5 value: 13.532 - type: recall_at_1 value: 29.737000000000002 - type: recall_at_10 value: 57.66 - type: recall_at_100 value: 79.121 - type: recall_at_1000 value: 93.023 - type: recall_at_3 value: 43.13 - type: recall_at_5 value: 49.836000000000006 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.299 - type: map_at_10 value: 35.617 - type: map_at_100 value: 36.972 - type: map_at_1000 value: 37.096000000000004 - type: map_at_3 value: 32.653999999999996 - type: map_at_5 value: 34.363 - type: mrr_at_1 value: 32.877 - type: mrr_at_10 value: 41.423 - type: mrr_at_100 value: 42.333999999999996 - type: mrr_at_1000 value: 42.398 - type: mrr_at_3 value: 39.193 - type: mrr_at_5 value: 40.426 - type: ndcg_at_1 value: 32.877 - type: ndcg_at_10 value: 41.271 - type: ndcg_at_100 value: 46.843 - type: ndcg_at_1000 value: 49.366 - type: ndcg_at_3 value: 36.735 - type: ndcg_at_5 value: 38.775999999999996 - type: precision_at_1 value: 32.877 - type: precision_at_10 value: 7.580000000000001 - type: precision_at_100 value: 1.192 - type: precision_at_1000 value: 0.158 - type: precision_at_3 value: 17.541999999999998 - type: precision_at_5 value: 12.443 - type: recall_at_1 value: 26.299 - type: recall_at_10 value: 52.256 - type: recall_at_100 value: 75.919 - type: recall_at_1000 value: 93.185 - type: recall_at_3 value: 39.271 - type: recall_at_5 value: 44.901 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.05741666666667 - type: map_at_10 value: 36.086416666666665 - type: map_at_100 value: 37.26916666666667 - type: map_at_1000 value: 37.38191666666666 - type: map_at_3 value: 33.34225 - type: map_at_5 value: 34.86425 - type: mrr_at_1 value: 32.06008333333333 - type: mrr_at_10 value: 40.36658333333333 - type: mrr_at_100 value: 41.206500000000005 - type: mrr_at_1000 value: 41.261083333333325 - type: mrr_at_3 value: 38.01208333333334 - type: mrr_at_5 value: 39.36858333333333 - type: ndcg_at_1 value: 32.06008333333333 - type: ndcg_at_10 value: 41.3535 - type: ndcg_at_100 value: 46.42066666666666 - type: ndcg_at_1000 value: 48.655166666666666 - type: ndcg_at_3 value: 36.78041666666667 - type: ndcg_at_5 value: 38.91783333333334 - type: precision_at_1 value: 32.06008333333333 - type: precision_at_10 value: 7.169833333333332 - type: precision_at_100 value: 1.1395 - type: precision_at_1000 value: 0.15158333333333332 - type: precision_at_3 value: 16.852 - type: precision_at_5 value: 11.8645 - type: recall_at_1 value: 27.05741666666667 - type: recall_at_10 value: 52.64491666666666 - type: recall_at_100 value: 74.99791666666667 - type: recall_at_1000 value: 90.50524999999999 - type: recall_at_3 value: 39.684000000000005 - type: recall_at_5 value: 45.37225 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 25.607999999999997 - type: map_at_10 value: 32.28 - type: map_at_100 value: 33.261 - type: map_at_1000 value: 33.346 - type: map_at_3 value: 30.514999999999997 - type: map_at_5 value: 31.415 - type: mrr_at_1 value: 28.988000000000003 - type: mrr_at_10 value: 35.384 - type: mrr_at_100 value: 36.24 - type: mrr_at_1000 value: 36.299 - type: mrr_at_3 value: 33.717000000000006 - type: mrr_at_5 value: 34.507 - type: ndcg_at_1 value: 28.988000000000003 - type: ndcg_at_10 value: 36.248000000000005 - type: ndcg_at_100 value: 41.034 - type: ndcg_at_1000 value: 43.35 - type: ndcg_at_3 value: 32.987 - type: ndcg_at_5 value: 34.333999999999996 - type: precision_at_1 value: 28.988000000000003 - type: precision_at_10 value: 5.506 - type: precision_at_100 value: 0.853 - type: precision_at_1000 value: 0.11199999999999999 - type: precision_at_3 value: 14.11 - type: precision_at_5 value: 9.417 - type: recall_at_1 value: 25.607999999999997 - type: recall_at_10 value: 45.344 - type: recall_at_100 value: 67.132 - type: recall_at_1000 value: 84.676 - type: recall_at_3 value: 36.02 - type: recall_at_5 value: 39.613 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 18.44 - type: map_at_10 value: 25.651000000000003 - type: map_at_100 value: 26.735 - type: map_at_1000 value: 26.86 - type: map_at_3 value: 23.409 - type: map_at_5 value: 24.604 - type: mrr_at_1 value: 22.195 - type: mrr_at_10 value: 29.482000000000003 - type: mrr_at_100 value: 30.395 - type: mrr_at_1000 value: 30.471999999999998 - type: mrr_at_3 value: 27.409 - type: mrr_at_5 value: 28.553 - type: ndcg_at_1 value: 22.195 - type: ndcg_at_10 value: 30.242 - type: ndcg_at_100 value: 35.397 - type: ndcg_at_1000 value: 38.287 - type: ndcg_at_3 value: 26.201 - type: ndcg_at_5 value: 28.008 - type: precision_at_1 value: 22.195 - type: precision_at_10 value: 5.372 - type: precision_at_100 value: 0.9259999999999999 - type: precision_at_1000 value: 0.135 - type: precision_at_3 value: 12.228 - type: precision_at_5 value: 8.727 - type: recall_at_1 value: 18.44 - type: recall_at_10 value: 40.325 - type: recall_at_100 value: 63.504000000000005 - type: recall_at_1000 value: 83.909 - type: recall_at_3 value: 28.925 - type: recall_at_5 value: 33.641 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.535999999999998 - type: map_at_10 value: 35.358000000000004 - type: map_at_100 value: 36.498999999999995 - type: map_at_1000 value: 36.597 - type: map_at_3 value: 32.598 - type: map_at_5 value: 34.185 - type: mrr_at_1 value: 31.25 - type: mrr_at_10 value: 39.593 - type: mrr_at_100 value: 40.443 - type: mrr_at_1000 value: 40.498 - type: mrr_at_3 value: 37.018 - type: mrr_at_5 value: 38.492 - type: ndcg_at_1 value: 31.25 - type: ndcg_at_10 value: 40.71 - type: ndcg_at_100 value: 46.079 - type: ndcg_at_1000 value: 48.287 - type: ndcg_at_3 value: 35.667 - type: ndcg_at_5 value: 38.080000000000005 - type: precision_at_1 value: 31.25 - type: precision_at_10 value: 6.847 - type: precision_at_100 value: 1.079 - type: precision_at_1000 value: 0.13699999999999998 - type: precision_at_3 value: 16.262 - type: precision_at_5 value: 11.455 - type: recall_at_1 value: 26.535999999999998 - type: recall_at_10 value: 52.92099999999999 - type: recall_at_100 value: 76.669 - type: recall_at_1000 value: 92.096 - type: recall_at_3 value: 38.956 - type: recall_at_5 value: 45.239000000000004 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.691 - type: map_at_10 value: 33.417 - type: map_at_100 value: 35.036 - type: map_at_1000 value: 35.251 - type: map_at_3 value: 30.646 - type: map_at_5 value: 32.177 - type: mrr_at_1 value: 30.04 - type: mrr_at_10 value: 37.905 - type: mrr_at_100 value: 38.929 - type: mrr_at_1000 value: 38.983000000000004 - type: mrr_at_3 value: 35.276999999999994 - type: mrr_at_5 value: 36.897000000000006 - type: ndcg_at_1 value: 30.04 - type: ndcg_at_10 value: 39.037 - type: ndcg_at_100 value: 44.944 - type: ndcg_at_1000 value: 47.644 - type: ndcg_at_3 value: 34.833999999999996 - type: ndcg_at_5 value: 36.83 - type: precision_at_1 value: 30.04 - type: precision_at_10 value: 7.4510000000000005 - type: precision_at_100 value: 1.492 - type: precision_at_1000 value: 0.234 - type: precision_at_3 value: 16.337 - type: precision_at_5 value: 11.897 - type: recall_at_1 value: 24.691 - type: recall_at_10 value: 49.303999999999995 - type: recall_at_100 value: 76.20400000000001 - type: recall_at_1000 value: 93.30000000000001 - type: recall_at_3 value: 36.594 - type: recall_at_5 value: 42.41 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.118 - type: map_at_10 value: 30.714999999999996 - type: map_at_100 value: 31.656000000000002 - type: map_at_1000 value: 31.757 - type: map_at_3 value: 28.355000000000004 - type: map_at_5 value: 29.337000000000003 - type: mrr_at_1 value: 25.323 - type: mrr_at_10 value: 32.93 - type: mrr_at_100 value: 33.762 - type: mrr_at_1000 value: 33.829 - type: mrr_at_3 value: 30.775999999999996 - type: mrr_at_5 value: 31.774 - type: ndcg_at_1 value: 25.323 - type: ndcg_at_10 value: 35.408 - type: ndcg_at_100 value: 40.083 - type: ndcg_at_1000 value: 42.542 - type: ndcg_at_3 value: 30.717 - type: ndcg_at_5 value: 32.385000000000005 - type: precision_at_1 value: 25.323 - type: precision_at_10 value: 5.564 - type: precision_at_100 value: 0.843 - type: precision_at_1000 value: 0.116 - type: precision_at_3 value: 13.001 - type: precision_at_5 value: 8.834999999999999 - type: recall_at_1 value: 23.118 - type: recall_at_10 value: 47.788000000000004 - type: recall_at_100 value: 69.37 - type: recall_at_1000 value: 87.47399999999999 - type: recall_at_3 value: 34.868 - type: recall_at_5 value: 39.001999999999995 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 14.288 - type: map_at_10 value: 23.256 - type: map_at_100 value: 25.115 - type: map_at_1000 value: 25.319000000000003 - type: map_at_3 value: 20.005 - type: map_at_5 value: 21.529999999999998 - type: mrr_at_1 value: 31.401 - type: mrr_at_10 value: 42.251 - type: mrr_at_100 value: 43.236999999999995 - type: mrr_at_1000 value: 43.272 - type: mrr_at_3 value: 39.164 - type: mrr_at_5 value: 40.881 - type: ndcg_at_1 value: 31.401 - type: ndcg_at_10 value: 31.615 - type: ndcg_at_100 value: 38.982 - type: ndcg_at_1000 value: 42.496 - type: ndcg_at_3 value: 26.608999999999998 - type: ndcg_at_5 value: 28.048000000000002 - type: precision_at_1 value: 31.401 - type: precision_at_10 value: 9.536999999999999 - type: precision_at_100 value: 1.763 - type: precision_at_1000 value: 0.241 - type: precision_at_3 value: 19.153000000000002 - type: precision_at_5 value: 14.228 - type: recall_at_1 value: 14.288 - type: recall_at_10 value: 36.717 - type: recall_at_100 value: 61.9 - type: recall_at_1000 value: 81.676 - type: recall_at_3 value: 24.203 - type: recall_at_5 value: 28.793999999999997 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 9.019 - type: map_at_10 value: 19.963 - type: map_at_100 value: 28.834 - type: map_at_1000 value: 30.537999999999997 - type: map_at_3 value: 14.45 - type: map_at_5 value: 16.817999999999998 - type: mrr_at_1 value: 65.75 - type: mrr_at_10 value: 74.646 - type: mrr_at_100 value: 74.946 - type: mrr_at_1000 value: 74.95100000000001 - type: mrr_at_3 value: 72.625 - type: mrr_at_5 value: 74.012 - type: ndcg_at_1 value: 54 - type: ndcg_at_10 value: 42.014 - type: ndcg_at_100 value: 47.527 - type: ndcg_at_1000 value: 54.911 - type: ndcg_at_3 value: 46.586 - type: ndcg_at_5 value: 43.836999999999996 - type: precision_at_1 value: 65.75 - type: precision_at_10 value: 33.475 - type: precision_at_100 value: 11.16 - type: precision_at_1000 value: 2.145 - type: precision_at_3 value: 50.083 - type: precision_at_5 value: 42.55 - type: recall_at_1 value: 9.019 - type: recall_at_10 value: 25.558999999999997 - type: recall_at_100 value: 53.937999999999995 - type: recall_at_1000 value: 77.67399999999999 - type: recall_at_3 value: 15.456 - type: recall_at_5 value: 19.259 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 52.635 - type: f1 value: 47.692783881403926 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 76.893 - type: map_at_10 value: 84.897 - type: map_at_100 value: 85.122 - type: map_at_1000 value: 85.135 - type: map_at_3 value: 83.88 - type: map_at_5 value: 84.565 - type: mrr_at_1 value: 83.003 - type: mrr_at_10 value: 89.506 - type: mrr_at_100 value: 89.574 - type: mrr_at_1000 value: 89.575 - type: mrr_at_3 value: 88.991 - type: mrr_at_5 value: 89.349 - type: ndcg_at_1 value: 83.003 - type: ndcg_at_10 value: 88.351 - type: ndcg_at_100 value: 89.128 - type: ndcg_at_1000 value: 89.34100000000001 - type: ndcg_at_3 value: 86.92 - type: ndcg_at_5 value: 87.78200000000001 - type: precision_at_1 value: 83.003 - type: precision_at_10 value: 10.517999999999999 - type: precision_at_100 value: 1.115 - type: precision_at_1000 value: 0.11499999999999999 - type: precision_at_3 value: 33.062999999999995 - type: precision_at_5 value: 20.498 - type: recall_at_1 value: 76.893 - type: recall_at_10 value: 94.374 - type: recall_at_100 value: 97.409 - type: recall_at_1000 value: 98.687 - type: recall_at_3 value: 90.513 - type: recall_at_5 value: 92.709 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 20.829 - type: map_at_10 value: 32.86 - type: map_at_100 value: 34.838 - type: map_at_1000 value: 35.006 - type: map_at_3 value: 28.597 - type: map_at_5 value: 31.056 - type: mrr_at_1 value: 41.358 - type: mrr_at_10 value: 49.542 - type: mrr_at_100 value: 50.29900000000001 - type: mrr_at_1000 value: 50.334999999999994 - type: mrr_at_3 value: 46.579 - type: mrr_at_5 value: 48.408 - type: ndcg_at_1 value: 41.358 - type: ndcg_at_10 value: 40.758 - type: ndcg_at_100 value: 47.799 - type: ndcg_at_1000 value: 50.589 - type: ndcg_at_3 value: 36.695 - type: ndcg_at_5 value: 38.193 - type: precision_at_1 value: 41.358 - type: precision_at_10 value: 11.142000000000001 - type: precision_at_100 value: 1.8350000000000002 - type: precision_at_1000 value: 0.234 - type: precision_at_3 value: 24.023 - type: precision_at_5 value: 17.963 - type: recall_at_1 value: 20.829 - type: recall_at_10 value: 47.467999999999996 - type: recall_at_100 value: 73.593 - type: recall_at_1000 value: 90.122 - type: recall_at_3 value: 32.74 - type: recall_at_5 value: 39.608 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 40.324 - type: map_at_10 value: 64.183 - type: map_at_100 value: 65.037 - type: map_at_1000 value: 65.094 - type: map_at_3 value: 60.663 - type: map_at_5 value: 62.951 - type: mrr_at_1 value: 80.648 - type: mrr_at_10 value: 86.005 - type: mrr_at_100 value: 86.157 - type: mrr_at_1000 value: 86.162 - type: mrr_at_3 value: 85.116 - type: mrr_at_5 value: 85.703 - type: ndcg_at_1 value: 80.648 - type: ndcg_at_10 value: 72.351 - type: ndcg_at_100 value: 75.279 - type: ndcg_at_1000 value: 76.357 - type: ndcg_at_3 value: 67.484 - type: ndcg_at_5 value: 70.31500000000001 - type: precision_at_1 value: 80.648 - type: precision_at_10 value: 15.103 - type: precision_at_100 value: 1.7399999999999998 - type: precision_at_1000 value: 0.188 - type: precision_at_3 value: 43.232 - type: precision_at_5 value: 28.165000000000003 - type: recall_at_1 value: 40.324 - type: recall_at_10 value: 75.517 - type: recall_at_100 value: 86.982 - type: recall_at_1000 value: 94.072 - type: recall_at_3 value: 64.848 - type: recall_at_5 value: 70.41199999999999 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 91.4 - type: ap value: 87.4422032289312 - type: f1 value: 91.39249564302281 - task: type: Retrieval dataset: type: msmarco name: MTEB MSMARCO config: default split: dev revision: None metrics: - type: map_at_1 value: 22.03 - type: map_at_10 value: 34.402 - type: map_at_100 value: 35.599 - type: map_at_1000 value: 35.648 - type: map_at_3 value: 30.603 - type: map_at_5 value: 32.889 - type: mrr_at_1 value: 22.679 - type: mrr_at_10 value: 35.021 - type: mrr_at_100 value: 36.162 - type: mrr_at_1000 value: 36.205 - type: mrr_at_3 value: 31.319999999999997 - type: mrr_at_5 value: 33.562 - type: ndcg_at_1 value: 22.692999999999998 - type: ndcg_at_10 value: 41.258 - type: ndcg_at_100 value: 46.967 - type: ndcg_at_1000 value: 48.175000000000004 - type: ndcg_at_3 value: 33.611000000000004 - type: ndcg_at_5 value: 37.675 - type: precision_at_1 value: 22.692999999999998 - type: precision_at_10 value: 6.5089999999999995 - type: precision_at_100 value: 0.936 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 14.413 - type: precision_at_5 value: 10.702 - type: recall_at_1 value: 22.03 - type: recall_at_10 value: 62.248000000000005 - type: recall_at_100 value: 88.524 - type: recall_at_1000 value: 97.714 - type: recall_at_3 value: 41.617 - type: recall_at_5 value: 51.359 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 94.36844505243957 - type: f1 value: 94.12408743818202 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 76.43410852713177 - type: f1 value: 58.501855709435624 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 76.04909213180902 - type: f1 value: 74.1800860395823 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 79.76126429051781 - type: f1 value: 79.85705217473232 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 34.70119520292863 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 32.33544316467486 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 30.75499243990726 - type: mrr value: 31.70602251821063 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 6.451999999999999 - type: map_at_10 value: 13.918 - type: map_at_100 value: 17.316000000000003 - type: map_at_1000 value: 18.747 - type: map_at_3 value: 10.471 - type: map_at_5 value: 12.104 - type: mrr_at_1 value: 46.749 - type: mrr_at_10 value: 55.717000000000006 - type: mrr_at_100 value: 56.249 - type: mrr_at_1000 value: 56.288000000000004 - type: mrr_at_3 value: 53.818 - type: mrr_at_5 value: 55.103 - type: ndcg_at_1 value: 45.201 - type: ndcg_at_10 value: 35.539 - type: ndcg_at_100 value: 32.586 - type: ndcg_at_1000 value: 41.486000000000004 - type: ndcg_at_3 value: 41.174 - type: ndcg_at_5 value: 38.939 - type: precision_at_1 value: 46.749 - type: precision_at_10 value: 25.944 - type: precision_at_100 value: 8.084 - type: precision_at_1000 value: 2.076 - type: precision_at_3 value: 38.7 - type: precision_at_5 value: 33.56 - type: recall_at_1 value: 6.451999999999999 - type: recall_at_10 value: 17.302 - type: recall_at_100 value: 32.14 - type: recall_at_1000 value: 64.12 - type: recall_at_3 value: 11.219 - type: recall_at_5 value: 13.993 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 32.037 - type: map_at_10 value: 46.565 - type: map_at_100 value: 47.606 - type: map_at_1000 value: 47.636 - type: map_at_3 value: 42.459 - type: map_at_5 value: 44.762 - type: mrr_at_1 value: 36.181999999999995 - type: mrr_at_10 value: 49.291000000000004 - type: mrr_at_100 value: 50.059 - type: mrr_at_1000 value: 50.078 - type: mrr_at_3 value: 45.829 - type: mrr_at_5 value: 47.797 - type: ndcg_at_1 value: 36.153 - type: ndcg_at_10 value: 53.983000000000004 - type: ndcg_at_100 value: 58.347 - type: ndcg_at_1000 value: 59.058 - type: ndcg_at_3 value: 46.198 - type: ndcg_at_5 value: 50.022 - type: precision_at_1 value: 36.153 - type: precision_at_10 value: 8.763 - type: precision_at_100 value: 1.123 - type: precision_at_1000 value: 0.11900000000000001 - type: precision_at_3 value: 20.751 - type: precision_at_5 value: 14.646999999999998 - type: recall_at_1 value: 32.037 - type: recall_at_10 value: 74.008 - type: recall_at_100 value: 92.893 - type: recall_at_1000 value: 98.16 - type: recall_at_3 value: 53.705999999999996 - type: recall_at_5 value: 62.495 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 71.152 - type: map_at_10 value: 85.104 - type: map_at_100 value: 85.745 - type: map_at_1000 value: 85.761 - type: map_at_3 value: 82.175 - type: map_at_5 value: 84.066 - type: mrr_at_1 value: 82.03 - type: mrr_at_10 value: 88.115 - type: mrr_at_100 value: 88.21 - type: mrr_at_1000 value: 88.211 - type: mrr_at_3 value: 87.19200000000001 - type: mrr_at_5 value: 87.85 - type: ndcg_at_1 value: 82.03 - type: ndcg_at_10 value: 88.78 - type: ndcg_at_100 value: 89.96300000000001 - type: ndcg_at_1000 value: 90.056 - type: ndcg_at_3 value: 86.051 - type: ndcg_at_5 value: 87.63499999999999 - type: precision_at_1 value: 82.03 - type: precision_at_10 value: 13.450000000000001 - type: precision_at_100 value: 1.5310000000000001 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 37.627 - type: precision_at_5 value: 24.784 - type: recall_at_1 value: 71.152 - type: recall_at_10 value: 95.649 - type: recall_at_100 value: 99.58200000000001 - type: recall_at_1000 value: 99.981 - type: recall_at_3 value: 87.767 - type: recall_at_5 value: 92.233 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 56.48713646277477 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 63.394940772438545 - task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 5.043 - type: map_at_10 value: 12.949 - type: map_at_100 value: 15.146 - type: map_at_1000 value: 15.495000000000001 - type: map_at_3 value: 9.333 - type: map_at_5 value: 11.312999999999999 - type: mrr_at_1 value: 24.9 - type: mrr_at_10 value: 35.958 - type: mrr_at_100 value: 37.152 - type: mrr_at_1000 value: 37.201 - type: mrr_at_3 value: 32.667 - type: mrr_at_5 value: 34.567 - type: ndcg_at_1 value: 24.9 - type: ndcg_at_10 value: 21.298000000000002 - type: ndcg_at_100 value: 29.849999999999998 - type: ndcg_at_1000 value: 35.506 - type: ndcg_at_3 value: 20.548 - type: ndcg_at_5 value: 18.064 - type: precision_at_1 value: 24.9 - type: precision_at_10 value: 10.9 - type: precision_at_100 value: 2.331 - type: precision_at_1000 value: 0.367 - type: precision_at_3 value: 19.267 - type: precision_at_5 value: 15.939999999999998 - type: recall_at_1 value: 5.043 - type: recall_at_10 value: 22.092 - type: recall_at_100 value: 47.323 - type: recall_at_1000 value: 74.553 - type: recall_at_3 value: 11.728 - type: recall_at_5 value: 16.188 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 83.7007085938325 - type: cos_sim_spearman value: 80.0171084446234 - type: euclidean_pearson value: 81.28133218355893 - type: euclidean_spearman value: 79.99291731740131 - type: manhattan_pearson value: 81.22926922327846 - type: manhattan_spearman value: 79.94444878127038 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 85.7411883252923 - type: cos_sim_spearman value: 77.93462937801245 - type: euclidean_pearson value: 83.00858563882404 - type: euclidean_spearman value: 77.82717362433257 - type: manhattan_pearson value: 82.92887645790769 - type: manhattan_spearman value: 77.78807488222115 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 82.04222459361023 - type: cos_sim_spearman value: 83.85931509330395 - type: euclidean_pearson value: 83.26916063876055 - type: euclidean_spearman value: 83.98621985648353 - type: manhattan_pearson value: 83.14935679184327 - type: manhattan_spearman value: 83.87938828586304 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 81.41136639535318 - type: cos_sim_spearman value: 81.51200091040481 - type: euclidean_pearson value: 81.45382456114775 - type: euclidean_spearman value: 81.46201181707931 - type: manhattan_pearson value: 81.37243088439584 - type: manhattan_spearman value: 81.39828421893426 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 85.71942451732227 - type: cos_sim_spearman value: 87.33044482064973 - type: euclidean_pearson value: 86.58580899365178 - type: euclidean_spearman value: 87.09206723832895 - type: manhattan_pearson value: 86.47460784157013 - type: manhattan_spearman value: 86.98367656583076 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 83.55868078863449 - type: cos_sim_spearman value: 85.38299230074065 - type: euclidean_pearson value: 84.64715256244595 - type: euclidean_spearman value: 85.49112229604047 - type: manhattan_pearson value: 84.60814346792462 - type: manhattan_spearman value: 85.44886026766822 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 84.99292526370614 - type: cos_sim_spearman value: 85.58139465695983 - type: euclidean_pearson value: 86.51325066734084 - type: euclidean_spearman value: 85.56736418284562 - type: manhattan_pearson value: 86.48190836601357 - type: manhattan_spearman value: 85.51616256224258 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 64.54124715078807 - type: cos_sim_spearman value: 65.32134275948374 - type: euclidean_pearson value: 67.09791698300816 - type: euclidean_spearman value: 65.79468982468465 - type: manhattan_pearson value: 67.13304723693966 - type: manhattan_spearman value: 65.68439995849283 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 83.4231099581624 - type: cos_sim_spearman value: 85.95475815226862 - type: euclidean_pearson value: 85.00339401999706 - type: euclidean_spearman value: 85.74133081802971 - type: manhattan_pearson value: 85.00407987181666 - type: manhattan_spearman value: 85.77509596397363 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 87.25666719585716 - type: mrr value: 96.32769917083642 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 57.828 - type: map_at_10 value: 68.369 - type: map_at_100 value: 68.83399999999999 - type: map_at_1000 value: 68.856 - type: map_at_3 value: 65.38000000000001 - type: map_at_5 value: 67.06299999999999 - type: mrr_at_1 value: 61 - type: mrr_at_10 value: 69.45400000000001 - type: mrr_at_100 value: 69.785 - type: mrr_at_1000 value: 69.807 - type: mrr_at_3 value: 67 - type: mrr_at_5 value: 68.43299999999999 - type: ndcg_at_1 value: 61 - type: ndcg_at_10 value: 73.258 - type: ndcg_at_100 value: 75.173 - type: ndcg_at_1000 value: 75.696 - type: ndcg_at_3 value: 68.162 - type: ndcg_at_5 value: 70.53399999999999 - type: precision_at_1 value: 61 - type: precision_at_10 value: 9.8 - type: precision_at_100 value: 1.087 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 27 - type: precision_at_5 value: 17.666999999999998 - type: recall_at_1 value: 57.828 - type: recall_at_10 value: 87.122 - type: recall_at_100 value: 95.667 - type: recall_at_1000 value: 99.667 - type: recall_at_3 value: 73.139 - type: recall_at_5 value: 79.361 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.85247524752475 - type: cos_sim_ap value: 96.25640197639723 - type: cos_sim_f1 value: 92.37851662404091 - type: cos_sim_precision value: 94.55497382198953 - type: cos_sim_recall value: 90.3 - type: dot_accuracy value: 99.76138613861386 - type: dot_ap value: 93.40295864389073 - type: dot_f1 value: 87.64267990074441 - type: dot_precision value: 86.99507389162562 - type: dot_recall value: 88.3 - type: euclidean_accuracy value: 99.85049504950496 - type: euclidean_ap value: 96.24254350525462 - type: euclidean_f1 value: 92.32323232323232 - type: euclidean_precision value: 93.26530612244898 - type: euclidean_recall value: 91.4 - type: manhattan_accuracy value: 99.85346534653465 - type: manhattan_ap value: 96.2635334753325 - type: manhattan_f1 value: 92.37899073120495 - type: manhattan_precision value: 95.22292993630573 - type: manhattan_recall value: 89.7 - type: max_accuracy value: 99.85346534653465 - type: max_ap value: 96.2635334753325 - type: max_f1 value: 92.37899073120495 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 65.83905786483794 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 35.031896152126436 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 54.551326709447146 - type: mrr value: 55.43758222986165 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.305688567308874 - type: cos_sim_spearman value: 29.27135743434515 - type: dot_pearson value: 30.336741878796563 - type: dot_spearman value: 30.513365725895937 - task: type: Retrieval dataset: type: trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.245 - type: map_at_10 value: 1.92 - type: map_at_100 value: 10.519 - type: map_at_1000 value: 23.874000000000002 - type: map_at_3 value: 0.629 - type: map_at_5 value: 1.0290000000000001 - type: mrr_at_1 value: 88 - type: mrr_at_10 value: 93.5 - type: mrr_at_100 value: 93.5 - type: mrr_at_1000 value: 93.5 - type: mrr_at_3 value: 93 - type: mrr_at_5 value: 93.5 - type: ndcg_at_1 value: 84 - type: ndcg_at_10 value: 76.447 - type: ndcg_at_100 value: 56.516 - type: ndcg_at_1000 value: 48.583999999999996 - type: ndcg_at_3 value: 78.877 - type: ndcg_at_5 value: 79.174 - type: precision_at_1 value: 88 - type: precision_at_10 value: 80.60000000000001 - type: precision_at_100 value: 57.64 - type: precision_at_1000 value: 21.227999999999998 - type: precision_at_3 value: 82 - type: precision_at_5 value: 83.6 - type: recall_at_1 value: 0.245 - type: recall_at_10 value: 2.128 - type: recall_at_100 value: 13.767 - type: recall_at_1000 value: 44.958 - type: recall_at_3 value: 0.654 - type: recall_at_5 value: 1.111 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 2.5170000000000003 - type: map_at_10 value: 10.915 - type: map_at_100 value: 17.535 - type: map_at_1000 value: 19.042 - type: map_at_3 value: 5.689 - type: map_at_5 value: 7.837 - type: mrr_at_1 value: 34.694 - type: mrr_at_10 value: 49.547999999999995 - type: mrr_at_100 value: 50.653000000000006 - type: mrr_at_1000 value: 50.653000000000006 - type: mrr_at_3 value: 44.558 - type: mrr_at_5 value: 48.333 - type: ndcg_at_1 value: 32.653 - type: ndcg_at_10 value: 26.543 - type: ndcg_at_100 value: 38.946 - type: ndcg_at_1000 value: 49.406 - type: ndcg_at_3 value: 29.903000000000002 - type: ndcg_at_5 value: 29.231 - type: precision_at_1 value: 34.694 - type: precision_at_10 value: 23.265 - type: precision_at_100 value: 8.102 - type: precision_at_1000 value: 1.5 - type: precision_at_3 value: 31.293 - type: precision_at_5 value: 29.796 - type: recall_at_1 value: 2.5170000000000003 - type: recall_at_10 value: 16.88 - type: recall_at_100 value: 49.381 - type: recall_at_1000 value: 81.23899999999999 - type: recall_at_3 value: 6.965000000000001 - type: recall_at_5 value: 10.847999999999999 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 71.5942 - type: ap value: 13.92074156956546 - type: f1 value: 54.671999698839066 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 59.39728353140916 - type: f1 value: 59.68980496759517 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 52.11181870104935 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 86.46957143708649 - type: cos_sim_ap value: 76.16120197845457 - type: cos_sim_f1 value: 69.69919295671315 - type: cos_sim_precision value: 64.94986326344576 - type: cos_sim_recall value: 75.19788918205805 - type: dot_accuracy value: 83.0780234845324 - type: dot_ap value: 64.21717343541934 - type: dot_f1 value: 59.48375497624245 - type: dot_precision value: 57.94345759319489 - type: dot_recall value: 61.108179419525065 - type: euclidean_accuracy value: 86.6543482148179 - type: euclidean_ap value: 76.4527555010203 - type: euclidean_f1 value: 70.10156056477584 - type: euclidean_precision value: 66.05975723622782 - type: euclidean_recall value: 74.67018469656992 - type: manhattan_accuracy value: 86.66030875603504 - type: manhattan_ap value: 76.40304567255436 - type: manhattan_f1 value: 70.05275426328058 - type: manhattan_precision value: 65.4666360926393 - type: manhattan_recall value: 75.32981530343008 - type: max_accuracy value: 86.66030875603504 - type: max_ap value: 76.4527555010203 - type: max_f1 value: 70.10156056477584 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 88.42123646524624 - type: cos_sim_ap value: 85.15431437761646 - type: cos_sim_f1 value: 76.98069301530742 - type: cos_sim_precision value: 72.9314502239063 - type: cos_sim_recall value: 81.50600554357868 - type: dot_accuracy value: 86.70974502270346 - type: dot_ap value: 80.77621563599457 - type: dot_f1 value: 73.87058697285117 - type: dot_precision value: 68.98256396552877 - type: dot_recall value: 79.50415768401602 - type: euclidean_accuracy value: 88.46392672798541 - type: euclidean_ap value: 85.20370297495491 - type: euclidean_f1 value: 77.01372369624886 - type: euclidean_precision value: 73.39052800446397 - type: euclidean_recall value: 81.01324299353249 - type: manhattan_accuracy value: 88.43481973066325 - type: manhattan_ap value: 85.16318289864545 - type: manhattan_f1 value: 76.90884877182597 - type: manhattan_precision value: 74.01737396753062 - type: manhattan_recall value: 80.03541730828458 - type: max_accuracy value: 88.46392672798541 - type: max_ap value: 85.20370297495491 - type: max_f1 value: 77.01372369624886 license: mit language: - en --- **Recommend switching to newest BAAI/bge-base-en-v1.5, which has more reasonable similarity distribution and same method of usage.** <h1 align=\"center\">FlagEmbedding</h1> <h4 align=\"center\"> <p> <a href=#model-list>Model List</a> | <a href=#frequently-asked-questions>FAQ</a> | <a href=#usage>Usage</a> | <a href=\"#evaluation\">Evaluation</a> | <a href=\"#train\">Train</a> | <a href=\"#contact\">Contact</a> | <a href=\"#citation\">Citation</a> | <a href=\"#license\">License</a> <p> </h4> More details please refer to our Github: FlagEmbedding. English | 中文 FlagEmbedding can map any text to a low-dimensional dense vector which can be used for tasks like retrieval, classification, clustering, or semantic search. And it also can be used in vector databases for LLMs. ************* 🌟**Updates**🌟 ************* - 10/12/2023: Release LLM-Embedder, a unified embedding model to support diverse retrieval augmentation needs for LLMs. Paper :fire: - 09/15/2023: The technical report of BGE has been released - 09/15/2023: The masive training data of BGE has been released - 09/12/2023: New models: - **New reranker model**: release cross-encoder models and , which are more powerful than embedding model. We recommend to use/fine-tune them to re-rank top-k documents returned by embedding models. - **update embedding model**: release embedding model to alleviate the issue of the similarity distribution, and enhance its retrieval ability without instruction. <details> <summary>More</summary> <!-- ### More --> - 09/07/2023: Update fine-tune code: Add script to mine hard negatives and support adding instruction during fine-tuning. - 08/09/2023: BGE Models are integrated into **Langchain**, you can use it like this; C-MTEB **leaderboard** is available. - 08/05/2023: Release base-scale and small-scale models, **best performance among the models of the same size 🤗** - 08/02/2023: Release (short for BAAI General Embedding) Models, **rank 1st on MTEB and C-MTEB benchmark!** :tada: :tada: - 08/01/2023: We release the Chinese Massive Text Embedding Benchmark (**C-MTEB**), consisting of 31 test dataset. </details> ## Model List is short for . | Model | Language | | Description | query instruction for retrieval [1] | |:-------------------------------|:--------:| :--------:| :--------:|:--------:| | BAAI/llm-embedder | English | Inference Fine-tune | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See README | | BAAI/bge-reranker-large | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-reranker-base | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-large-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-en | English | Inference Fine-tune | :trophy: rank **1st** in MTEB leaderboard | | | BAAI/bge-base-en | English | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-en | English | Inference Fine-tune |a small-scale model but with competitive performance | | | BAAI/bge-large-zh | Chinese | Inference Fine-tune | :trophy: rank **1st** in C-MTEB benchmark | | | BAAI/bge-base-zh | Chinese | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-zh | Chinese | Inference Fine-tune | a small-scale model but with competitive performance | | [1\\]: If you need to search the relevant passages to a query, we suggest to add the instruction to the query; in other cases, no instruction is needed, just use the original query directly. In all cases, **no instruction** needs to be added to passages. [2\\]: Different from embedding model, reranker uses question and document as input and directly output similarity instead of embedding. To balance the accuracy and time cost, cross-encoder is widely used to re-rank top-k documents retrieved by other simple models. For examples, use bge embedding model to retrieve top 100 relevant documents, and then use bge reranker to re-rank the top 100 document to get the final top-3 results. All models have been uploaded to Huggingface Hub, and you can see them at If you cannot open the Huggingface Hub, you also can download the models at . ## Frequently asked questions <details> <summary>1. How to fine-tune bge embedding model?</summary> <!-- ### How to fine-tune bge embedding model? --> Following this example to prepare data and fine-tune your model. Some suggestions: - Mine hard negatives following this example, which can improve the retrieval performance. - If you pre-train bge on your data, the pre-trained model cannot be directly used to calculate similarity, and it must be fine-tuned with contrastive learning before computing similarity. - If the accuracy of the fine-tuned model is still not high, it is recommended to use/fine-tune the cross-encoder model (bge-reranker) to re-rank top-k results. Hard negatives also are needed to fine-tune reranker. </details> <details> <summary>2. The similarity score between two dissimilar sentences is higher than 0.5</summary> <!-- ### The similarity score between two dissimilar sentences is higher than 0.5 --> **Suggest to use bge v1.5, which alleviates the issue of the similarity distribution.** Since we finetune the models by contrastive learning with a temperature of 0.01, the similarity distribution of the current BGE model is about in the interval \\[0.6, 1\\]. So a similarity score greater than 0.5 does not indicate that the two sentences are similar. For downstream tasks, such as passage retrieval or semantic similarity, **what matters is the relative order of the scores, not the absolute value.** If you need to filter similar sentences based on a similarity threshold, please select an appropriate similarity threshold based on the similarity distribution on your data (such as 0.8, 0.85, or even 0.9). </details> <details> <summary>3. When does the query instruction need to be used</summary> <!-- ### When does the query instruction need to be used --> For the , we improve its retrieval ability when not using instruction. No instruction only has a slight degradation in retrieval performance compared with using instruction. So you can generate embedding without instruction in all cases for convenience. For a retrieval task that uses short queries to find long related documents, it is recommended to add instructions for these short queries. **The best method to decide whether to add instructions for queries is choosing the setting that achieves better performance on your task.** In all cases, the documents/passages do not need to add the instruction. </details> ## Usage ### Usage for Embedding Model Here are some examples for using models with FlagEmbedding, Sentence-Transformers, Langchain, or Huggingface Transformers. #### Using FlagEmbedding If it doesn't work for you, you can see FlagEmbedding for more methods to install FlagEmbedding. For the value of the argument , see Model List. By default, FlagModel will use all available GPUs when encoding. Please set to select specific GPUs. You also can set to make all GPUs unavailable. #### Using Sentence-Transformers You can also use the models with sentence-transformers: For s2p(short query to long passage) retrieval task, each short query should start with an instruction (instructions see Model List). But the instruction is not needed for passages. #### Using Langchain You can use in langchain like this: #### Using HuggingFace Transformers With the transformers package, you can use the model like this: First, you pass your input through the transformer model, then you select the last hidden state of the first token (i.e., [CLS]) as the sentence embedding. ### Usage for Reranker Different from embedding model, reranker uses question and document as input and directly output similarity instead of embedding. You can get a relevance score by inputting query and passage to the reranker. The reranker is optimized based cross-entropy loss, so the relevance score is not bounded to a specific range. #### Using FlagEmbedding Get relevance scores (higher scores indicate more relevance): #### Using Huggingface transformers ## Evaluation models achieve **state-of-the-art performance on both MTEB and C-MTEB leaderboard!** For more details and evaluation tools see our scripts. - **MTEB**: | Model Name | Dimension | Sequence Length | Average (56) | Retrieval (15) |Clustering (11) | Pair Classification (3) | Reranking (4) | STS (10) | Summarization (1) | Classification (12) | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | BAAI/bge-large-en-v1.5 | 1024 | 512 | **64.23** | **54.29** | 46.08 | 87.12 | 60.03 | 83.11 | 31.61 | 75.97 | | BAAI/bge-base-en-v1.5 | 768 | 512 | 63.55 | 53.25 | 45.77 | 86.55 | 58.86 | 82.4 | 31.07 | 75.53 | | BAAI/bge-small-en-v1.5 | 384 | 512 | 62.17 |51.68 | 43.82 | 84.92 | 58.36 | 81.59 | 30.12 | 74.14 | | bge-large-en | 1024 | 512 | 63.98 | 53.9 | 46.98 | 85.8 | 59.48 | 81.56 | 32.06 | 76.21 | | bge-base-en | 768 | 512 | 63.36 | 53.0 | 46.32 | 85.86 | 58.7 | 81.84 | 29.27 | 75.27 | | gte-large | 1024 | 512 | 63.13 | 52.22 | 46.84 | 85.00 | 59.13 | 83.35 | 31.66 | 73.33 | | gte-base | 768 | 512 | 62.39 | 51.14 | 46.2 | 84.57 | 58.61 | 82.3 | 31.17 | 73.01 | | e5-large-v2 | 1024| 512 | 62.25 | 50.56 | 44.49 | 86.03 | 56.61 | 82.05 | 30.19 | 75.24 | | bge-small-en | 384 | 512 | 62.11 | 51.82 | 44.31 | 83.78 | 57.97 | 80.72 | 30.53 | 74.37 | | instructor-xl | 768 | 512 | 61.79 | 49.26 | 44.74 | 86.62 | 57.29 | 83.06 | 32.32 | 61.79 | | e5-base-v2 | 768 | 512 | 61.5 | 50.29 | 43.80 | 85.73 | 55.91 | 81.05 | 30.28 | 73.84 | | gte-small | 384 | 512 | 61.36 | 49.46 | 44.89 | 83.54 | 57.7 | 82.07 | 30.42 | 72.31 | | text-embedding-ada-002 | 1536 | 8192 | 60.99 | 49.25 | 45.9 | 84.89 | 56.32 | 80.97 | 30.8 | 70.93 | | e5-small-v2 | 384 | 512 | 59.93 | 49.04 | 39.92 | 84.67 | 54.32 | 80.39 | 31.16 | 72.94 | | sentence-t5-xxl | 768 | 512 | 59.51 | 42.24 | 43.72 | 85.06 | 56.42 | 82.63 | 30.08 | 73.42 | | all-mpnet-base-v2 | 768 | 514 | 57.78 | 43.81 | 43.69 | 83.04 | 59.36 | 80.28 | 27.49 | 65.07 | | sgpt-bloom-7b1-msmarco | 4096 | 2048 | 57.59 | 48.22 | 38.93 | 81.9 | 55.65 | 77.74 | 33.6 | 66.19 | - **C-MTEB**: We create the benchmark C-MTEB for Chinese text embedding which consists of 31 datasets from 6 tasks. Please refer to C_MTEB for a detailed introduction. | Model | Embedding dimension | Avg | Retrieval | STS | PairClassification | Classification | Reranking | Clustering | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | **BAAI/bge-large-zh-v1.5** | 1024 | **64.53** | 70.46 | 56.25 | 81.6 | 69.13 | 65.84 | 48.99 | | BAAI/bge-base-zh-v1.5 | 768 | 63.13 | 69.49 | 53.72 | 79.75 | 68.07 | 65.39 | 47.53 | | BAAI/bge-small-zh-v1.5 | 512 | 57.82 | 61.77 | 49.11 | 70.41 | 63.96 | 60.92 | 44.18 | | BAAI/bge-large-zh | 1024 | 64.20 | 71.53 | 54.98 | 78.94 | 68.32 | 65.11 | 48.39 | | bge-large-zh-noinstruct | 1024 | 63.53 | 70.55 | 53 | 76.77 | 68.58 | 64.91 | 50.01 | | BAAI/bge-base-zh | 768 | 62.96 | 69.53 | 54.12 | 77.5 | 67.07 | 64.91 | 47.63 | | multilingual-e5-large | 1024 | 58.79 | 63.66 | 48.44 | 69.89 | 67.34 | 56.00 | 48.23 | | BAAI/bge-small-zh | 512 | 58.27 | 63.07 | 49.45 | 70.35 | 63.64 | 61.48 | 45.09 | | m3e-base | 768 | 57.10 | 56.91 | 50.47 | 63.99 | 67.52 | 59.34 | 47.68 | | m3e-large | 1024 | 57.05 | 54.75 | 50.42 | 64.3 | 68.2 | 59.66 | 48.88 | | multilingual-e5-base | 768 | 55.48 | 61.63 | 46.49 | 67.07 | 65.35 | 54.35 | 40.68 | | multilingual-e5-small | 384 | 55.38 | 59.95 | 45.27 | 66.45 | 65.85 | 53.86 | 45.26 | | text-embedding-ada-002(OpenAI) | 1536 | 53.02 | 52.0 | 43.35 | 69.56 | 64.31 | 54.28 | 45.68 | | luotuo | 1024 | 49.37 | 44.4 | 42.78 | 66.62 | 61 | 49.25 | 44.39 | | text2vec-base | 768 | 47.63 | 38.79 | 43.41 | 67.41 | 62.19 | 49.45 | 37.66 | | text2vec-large | 1024 | 47.36 | 41.94 | 44.97 | 70.86 | 60.66 | 49.16 | 30.02 | - **Reranking**: See C_MTEB for evaluation script. | Model | T2Reranking | T2RerankingZh2En\\* | T2RerankingEn2Zh\\* | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | text2vec-base-multilingual | 64.66 | 62.94 | 62.51 | 14.37 | 48.46 | 48.6 | 50.26 | | multilingual-e5-small | 65.62 | 60.94 | 56.41 | 29.91 | 67.26 | 66.54 | 57.78 | | multilingual-e5-large | 64.55 | 61.61 | 54.28 | 28.6 | 67.42 | 67.92 | 57.4 | | multilingual-e5-base | 64.21 | 62.13 | 54.68 | 29.5 | 66.23 | 66.98 | 57.29 | | m3e-base | 66.03 | 62.74 | 56.07 | 17.51 | 77.05 | 76.76 | 59.36 | | m3e-large | 66.13 | 62.72 | 56.1 | 16.46 | 77.76 | 78.27 | 59.57 | | bge-base-zh-v1.5 | 66.49 | 63.25 | 57.02 | 29.74 | 80.47 | 84.88 | 63.64 | | bge-large-zh-v1.5 | 65.74 | 63.39 | 57.03 | 28.74 | 83.45 | 85.44 | 63.97 | | BAAI/bge-reranker-base | 67.28 | 63.95 | 60.45 | 35.46 | 81.26 | 84.1 | 65.42 | | BAAI/bge-reranker-large | 67.6 | 64.03 | 61.44 | 37.16 | 82.15 | 84.18 | 66.09 | \\* : T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks ## Train ### BAAI Embedding We pre-train the models using retromae and train them on large-scale pairs data using contrastive learning. **You can fine-tune the embedding model on your data following our examples.** We also provide a pre-train example. Note that the goal of pre-training is to reconstruct the text, and the pre-trained model cannot be used for similarity calculation directly, it needs to be fine-tuned. More training details for bge see baai_general_embedding. ### BGE Reranker Cross-encoder will perform full-attention over the input pair, which is more accurate than embedding model (i.e., bi-encoder) but more time-consuming than embedding model. Therefore, it can be used to re-rank the top-k documents returned by embedding model. We train the cross-encoder on a multilingual pair data, The data format is the same as embedding model, so you can fine-tune it easily following our example. More details please refer to ./FlagEmbedding/reranker/README.md ## Contact If you have any question or suggestion related to this project, feel free to open an issue or pull request. You also can email Shitao Xiao([email protected]) and Zheng Liu([email protected]). ## Citation If you find this repository useful, please consider giving a star :star: and citation ## License FlagEmbedding is licensed under the MIT License. The released models can be used for commercial purposes free of charge.",
22
- "model_explanation_gemini": "\"BAAI_bge-base-en is a versatile model excelling in classification, retrieval, clustering, reranking, and semantic textual similarity (STS) tasks, as demonstrated by its performance across multiple MTEB benchmarks.\"\n\nModel Features: \n- Classification (e.g., Amazon reviews, Banking77) \n- Retrieval (e.g., ArguAna, CQADupstack) \n- Clustering (e.g., Arxiv/Biorxiv papers) \n- Reranking (e"
 
 
 
 
 
 
23
  }
 
19
  "region:us"
20
  ],
21
  "description": "--- tags: - mteb model-index: - name: bge-base-en results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 75.73134328358209 - type: ap value: 38.97277232632892 - type: f1 value: 69.81740361139785 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 92.56522500000001 - type: ap value: 88.88821771869553 - type: f1 value: 92.54817512659696 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 46.91 - type: f1 value: 46.28536394320311 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 38.834 - type: map_at_10 value: 53.564 - type: map_at_100 value: 54.230000000000004 - type: map_at_1000 value: 54.235 - type: map_at_3 value: 49.49 - type: map_at_5 value: 51.784 - type: mrr_at_1 value: 39.26 - type: mrr_at_10 value: 53.744 - type: mrr_at_100 value: 54.410000000000004 - type: mrr_at_1000 value: 54.415 - type: mrr_at_3 value: 49.656 - type: mrr_at_5 value: 52.018 - type: ndcg_at_1 value: 38.834 - type: ndcg_at_10 value: 61.487 - type: ndcg_at_100 value: 64.303 - type: ndcg_at_1000 value: 64.408 - type: ndcg_at_3 value: 53.116 - type: ndcg_at_5 value: 57.248 - type: precision_at_1 value: 38.834 - type: precision_at_10 value: 8.663 - type: precision_at_100 value: 0.989 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 21.218999999999998 - type: precision_at_5 value: 14.737 - type: recall_at_1 value: 38.834 - type: recall_at_10 value: 86.629 - type: recall_at_100 value: 98.86200000000001 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 63.656 - type: recall_at_5 value: 73.68400000000001 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 48.88475477433035 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 42.85053138403176 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 62.23221013208242 - type: mrr value: 74.64857318735436 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 87.4403443247284 - type: cos_sim_spearman value: 85.5326718115169 - type: euclidean_pearson value: 86.0114007449595 - type: euclidean_spearman value: 86.05979225604875 - type: manhattan_pearson value: 86.05423806568598 - type: manhattan_spearman value: 86.02485170086835 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 86.44480519480518 - type: f1 value: 86.41301900941988 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 40.17547250880036 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 37.74514172687293 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 32.096000000000004 - type: map_at_10 value: 43.345 - type: map_at_100 value: 44.73 - type: map_at_1000 value: 44.85 - type: map_at_3 value: 39.956 - type: map_at_5 value: 41.727 - type: mrr_at_1 value: 38.769999999999996 - type: mrr_at_10 value: 48.742000000000004 - type: mrr_at_100 value: 49.474000000000004 - type: mrr_at_1000 value: 49.513 - type: mrr_at_3 value: 46.161 - type: mrr_at_5 value: 47.721000000000004 - type: ndcg_at_1 value: 38.769999999999996 - type: ndcg_at_10 value: 49.464999999999996 - type: ndcg_at_100 value: 54.632000000000005 - type: ndcg_at_1000 value: 56.52 - type: ndcg_at_3 value: 44.687 - type: ndcg_at_5 value: 46.814 - type: precision_at_1 value: 38.769999999999996 - type: precision_at_10 value: 9.471 - type: precision_at_100 value: 1.4909999999999999 - type: precision_at_1000 value: 0.194 - type: precision_at_3 value: 21.268 - type: precision_at_5 value: 15.079 - type: recall_at_1 value: 32.096000000000004 - type: recall_at_10 value: 60.99099999999999 - type: recall_at_100 value: 83.075 - type: recall_at_1000 value: 95.178 - type: recall_at_3 value: 47.009 - type: recall_at_5 value: 53.348 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 32.588 - type: map_at_10 value: 42.251 - type: map_at_100 value: 43.478 - type: map_at_1000 value: 43.617 - type: map_at_3 value: 39.381 - type: map_at_5 value: 41.141 - type: mrr_at_1 value: 41.21 - type: mrr_at_10 value: 48.765 - type: mrr_at_100 value: 49.403000000000006 - type: mrr_at_1000 value: 49.451 - type: mrr_at_3 value: 46.73 - type: mrr_at_5 value: 47.965999999999994 - type: ndcg_at_1 value: 41.21 - type: ndcg_at_10 value: 47.704 - type: ndcg_at_100 value: 51.916 - type: ndcg_at_1000 value: 54.013999999999996 - type: ndcg_at_3 value: 44.007000000000005 - type: ndcg_at_5 value: 45.936 - type: precision_at_1 value: 41.21 - type: precision_at_10 value: 8.885 - type: precision_at_100 value: 1.409 - type: precision_at_1000 value: 0.189 - type: precision_at_3 value: 21.274 - type: precision_at_5 value: 15.045 - type: recall_at_1 value: 32.588 - type: recall_at_10 value: 56.333 - type: recall_at_100 value: 74.251 - type: recall_at_1000 value: 87.518 - type: recall_at_3 value: 44.962 - type: recall_at_5 value: 50.609 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 40.308 - type: map_at_10 value: 53.12 - type: map_at_100 value: 54.123 - type: map_at_1000 value: 54.173 - type: map_at_3 value: 50.017999999999994 - type: map_at_5 value: 51.902 - type: mrr_at_1 value: 46.394999999999996 - type: mrr_at_10 value: 56.531 - type: mrr_at_100 value: 57.19800000000001 - type: mrr_at_1000 value: 57.225 - type: mrr_at_3 value: 54.368 - type: mrr_at_5 value: 55.713 - type: ndcg_at_1 value: 46.394999999999996 - type: ndcg_at_10 value: 58.811 - type: ndcg_at_100 value: 62.834 - type: ndcg_at_1000 value: 63.849999999999994 - type: ndcg_at_3 value: 53.88699999999999 - type: ndcg_at_5 value: 56.477999999999994 - type: precision_at_1 value: 46.394999999999996 - type: precision_at_10 value: 9.398 - type: precision_at_100 value: 1.2309999999999999 - type: precision_at_1000 value: 0.136 - type: precision_at_3 value: 24.221999999999998 - type: precision_at_5 value: 16.539 - type: recall_at_1 value: 40.308 - type: recall_at_10 value: 72.146 - type: recall_at_100 value: 89.60900000000001 - type: recall_at_1000 value: 96.733 - type: recall_at_3 value: 58.91499999999999 - type: recall_at_5 value: 65.34299999999999 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.383000000000003 - type: map_at_10 value: 35.802 - type: map_at_100 value: 36.756 - type: map_at_1000 value: 36.826 - type: map_at_3 value: 32.923 - type: map_at_5 value: 34.577999999999996 - type: mrr_at_1 value: 29.604999999999997 - type: mrr_at_10 value: 37.918 - type: mrr_at_100 value: 38.732 - type: mrr_at_1000 value: 38.786 - type: mrr_at_3 value: 35.198 - type: mrr_at_5 value: 36.808 - type: ndcg_at_1 value: 29.604999999999997 - type: ndcg_at_10 value: 40.836 - type: ndcg_at_100 value: 45.622 - type: ndcg_at_1000 value: 47.427 - type: ndcg_at_3 value: 35.208 - type: ndcg_at_5 value: 38.066 - type: precision_at_1 value: 29.604999999999997 - type: precision_at_10 value: 6.226 - type: precision_at_100 value: 0.9079999999999999 - type: precision_at_1000 value: 0.11 - type: precision_at_3 value: 14.463000000000001 - type: precision_at_5 value: 10.35 - type: recall_at_1 value: 27.383000000000003 - type: recall_at_10 value: 54.434000000000005 - type: recall_at_100 value: 76.632 - type: recall_at_1000 value: 90.25 - type: recall_at_3 value: 39.275 - type: recall_at_5 value: 46.225 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 17.885 - type: map_at_10 value: 25.724000000000004 - type: map_at_100 value: 26.992 - type: map_at_1000 value: 27.107999999999997 - type: map_at_3 value: 23.04 - type: map_at_5 value: 24.529 - type: mrr_at_1 value: 22.264 - type: mrr_at_10 value: 30.548 - type: mrr_at_100 value: 31.593 - type: mrr_at_1000 value: 31.657999999999998 - type: mrr_at_3 value: 27.756999999999998 - type: mrr_at_5 value: 29.398999999999997 - type: ndcg_at_1 value: 22.264 - type: ndcg_at_10 value: 30.902 - type: ndcg_at_100 value: 36.918 - type: ndcg_at_1000 value: 39.735 - type: ndcg_at_3 value: 25.915 - type: ndcg_at_5 value: 28.255999999999997 - type: precision_at_1 value: 22.264 - type: precision_at_10 value: 5.634 - type: precision_at_100 value: 0.9939999999999999 - type: precision_at_1000 value: 0.13699999999999998 - type: precision_at_3 value: 12.396 - type: precision_at_5 value: 9.055 - type: recall_at_1 value: 17.885 - type: recall_at_10 value: 42.237 - type: recall_at_100 value: 68.489 - type: recall_at_1000 value: 88.721 - type: recall_at_3 value: 28.283 - type: recall_at_5 value: 34.300000000000004 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 29.737000000000002 - type: map_at_10 value: 39.757 - type: map_at_100 value: 40.992 - type: map_at_1000 value: 41.102 - type: map_at_3 value: 36.612 - type: map_at_5 value: 38.413000000000004 - type: mrr_at_1 value: 35.804 - type: mrr_at_10 value: 45.178000000000004 - type: mrr_at_100 value: 45.975 - type: mrr_at_1000 value: 46.021 - type: mrr_at_3 value: 42.541000000000004 - type: mrr_at_5 value: 44.167 - type: ndcg_at_1 value: 35.804 - type: ndcg_at_10 value: 45.608 - type: ndcg_at_100 value: 50.746 - type: ndcg_at_1000 value: 52.839999999999996 - type: ndcg_at_3 value: 40.52 - type: ndcg_at_5 value: 43.051 - type: precision_at_1 value: 35.804 - type: precision_at_10 value: 8.104 - type: precision_at_100 value: 1.256 - type: precision_at_1000 value: 0.161 - type: precision_at_3 value: 19.121 - type: precision_at_5 value: 13.532 - type: recall_at_1 value: 29.737000000000002 - type: recall_at_10 value: 57.66 - type: recall_at_100 value: 79.121 - type: recall_at_1000 value: 93.023 - type: recall_at_3 value: 43.13 - type: recall_at_5 value: 49.836000000000006 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.299 - type: map_at_10 value: 35.617 - type: map_at_100 value: 36.972 - type: map_at_1000 value: 37.096000000000004 - type: map_at_3 value: 32.653999999999996 - type: map_at_5 value: 34.363 - type: mrr_at_1 value: 32.877 - type: mrr_at_10 value: 41.423 - type: mrr_at_100 value: 42.333999999999996 - type: mrr_at_1000 value: 42.398 - type: mrr_at_3 value: 39.193 - type: mrr_at_5 value: 40.426 - type: ndcg_at_1 value: 32.877 - type: ndcg_at_10 value: 41.271 - type: ndcg_at_100 value: 46.843 - type: ndcg_at_1000 value: 49.366 - type: ndcg_at_3 value: 36.735 - type: ndcg_at_5 value: 38.775999999999996 - type: precision_at_1 value: 32.877 - type: precision_at_10 value: 7.580000000000001 - type: precision_at_100 value: 1.192 - type: precision_at_1000 value: 0.158 - type: precision_at_3 value: 17.541999999999998 - type: precision_at_5 value: 12.443 - type: recall_at_1 value: 26.299 - type: recall_at_10 value: 52.256 - type: recall_at_100 value: 75.919 - type: recall_at_1000 value: 93.185 - type: recall_at_3 value: 39.271 - type: recall_at_5 value: 44.901 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.05741666666667 - type: map_at_10 value: 36.086416666666665 - type: map_at_100 value: 37.26916666666667 - type: map_at_1000 value: 37.38191666666666 - type: map_at_3 value: 33.34225 - type: map_at_5 value: 34.86425 - type: mrr_at_1 value: 32.06008333333333 - type: mrr_at_10 value: 40.36658333333333 - type: mrr_at_100 value: 41.206500000000005 - type: mrr_at_1000 value: 41.261083333333325 - type: mrr_at_3 value: 38.01208333333334 - type: mrr_at_5 value: 39.36858333333333 - type: ndcg_at_1 value: 32.06008333333333 - type: ndcg_at_10 value: 41.3535 - type: ndcg_at_100 value: 46.42066666666666 - type: ndcg_at_1000 value: 48.655166666666666 - type: ndcg_at_3 value: 36.78041666666667 - type: ndcg_at_5 value: 38.91783333333334 - type: precision_at_1 value: 32.06008333333333 - type: precision_at_10 value: 7.169833333333332 - type: precision_at_100 value: 1.1395 - type: precision_at_1000 value: 0.15158333333333332 - type: precision_at_3 value: 16.852 - type: precision_at_5 value: 11.8645 - type: recall_at_1 value: 27.05741666666667 - type: recall_at_10 value: 52.64491666666666 - type: recall_at_100 value: 74.99791666666667 - type: recall_at_1000 value: 90.50524999999999 - type: recall_at_3 value: 39.684000000000005 - type: recall_at_5 value: 45.37225 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 25.607999999999997 - type: map_at_10 value: 32.28 - type: map_at_100 value: 33.261 - type: map_at_1000 value: 33.346 - type: map_at_3 value: 30.514999999999997 - type: map_at_5 value: 31.415 - type: mrr_at_1 value: 28.988000000000003 - type: mrr_at_10 value: 35.384 - type: mrr_at_100 value: 36.24 - type: mrr_at_1000 value: 36.299 - type: mrr_at_3 value: 33.717000000000006 - type: mrr_at_5 value: 34.507 - type: ndcg_at_1 value: 28.988000000000003 - type: ndcg_at_10 value: 36.248000000000005 - type: ndcg_at_100 value: 41.034 - type: ndcg_at_1000 value: 43.35 - type: ndcg_at_3 value: 32.987 - type: ndcg_at_5 value: 34.333999999999996 - type: precision_at_1 value: 28.988000000000003 - type: precision_at_10 value: 5.506 - type: precision_at_100 value: 0.853 - type: precision_at_1000 value: 0.11199999999999999 - type: precision_at_3 value: 14.11 - type: precision_at_5 value: 9.417 - type: recall_at_1 value: 25.607999999999997 - type: recall_at_10 value: 45.344 - type: recall_at_100 value: 67.132 - type: recall_at_1000 value: 84.676 - type: recall_at_3 value: 36.02 - type: recall_at_5 value: 39.613 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 18.44 - type: map_at_10 value: 25.651000000000003 - type: map_at_100 value: 26.735 - type: map_at_1000 value: 26.86 - type: map_at_3 value: 23.409 - type: map_at_5 value: 24.604 - type: mrr_at_1 value: 22.195 - type: mrr_at_10 value: 29.482000000000003 - type: mrr_at_100 value: 30.395 - type: mrr_at_1000 value: 30.471999999999998 - type: mrr_at_3 value: 27.409 - type: mrr_at_5 value: 28.553 - type: ndcg_at_1 value: 22.195 - type: ndcg_at_10 value: 30.242 - type: ndcg_at_100 value: 35.397 - type: ndcg_at_1000 value: 38.287 - type: ndcg_at_3 value: 26.201 - type: ndcg_at_5 value: 28.008 - type: precision_at_1 value: 22.195 - type: precision_at_10 value: 5.372 - type: precision_at_100 value: 0.9259999999999999 - type: precision_at_1000 value: 0.135 - type: precision_at_3 value: 12.228 - type: precision_at_5 value: 8.727 - type: recall_at_1 value: 18.44 - type: recall_at_10 value: 40.325 - type: recall_at_100 value: 63.504000000000005 - type: recall_at_1000 value: 83.909 - type: recall_at_3 value: 28.925 - type: recall_at_5 value: 33.641 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.535999999999998 - type: map_at_10 value: 35.358000000000004 - type: map_at_100 value: 36.498999999999995 - type: map_at_1000 value: 36.597 - type: map_at_3 value: 32.598 - type: map_at_5 value: 34.185 - type: mrr_at_1 value: 31.25 - type: mrr_at_10 value: 39.593 - type: mrr_at_100 value: 40.443 - type: mrr_at_1000 value: 40.498 - type: mrr_at_3 value: 37.018 - type: mrr_at_5 value: 38.492 - type: ndcg_at_1 value: 31.25 - type: ndcg_at_10 value: 40.71 - type: ndcg_at_100 value: 46.079 - type: ndcg_at_1000 value: 48.287 - type: ndcg_at_3 value: 35.667 - type: ndcg_at_5 value: 38.080000000000005 - type: precision_at_1 value: 31.25 - type: precision_at_10 value: 6.847 - type: precision_at_100 value: 1.079 - type: precision_at_1000 value: 0.13699999999999998 - type: precision_at_3 value: 16.262 - type: precision_at_5 value: 11.455 - type: recall_at_1 value: 26.535999999999998 - type: recall_at_10 value: 52.92099999999999 - type: recall_at_100 value: 76.669 - type: recall_at_1000 value: 92.096 - type: recall_at_3 value: 38.956 - type: recall_at_5 value: 45.239000000000004 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.691 - type: map_at_10 value: 33.417 - type: map_at_100 value: 35.036 - type: map_at_1000 value: 35.251 - type: map_at_3 value: 30.646 - type: map_at_5 value: 32.177 - type: mrr_at_1 value: 30.04 - type: mrr_at_10 value: 37.905 - type: mrr_at_100 value: 38.929 - type: mrr_at_1000 value: 38.983000000000004 - type: mrr_at_3 value: 35.276999999999994 - type: mrr_at_5 value: 36.897000000000006 - type: ndcg_at_1 value: 30.04 - type: ndcg_at_10 value: 39.037 - type: ndcg_at_100 value: 44.944 - type: ndcg_at_1000 value: 47.644 - type: ndcg_at_3 value: 34.833999999999996 - type: ndcg_at_5 value: 36.83 - type: precision_at_1 value: 30.04 - type: precision_at_10 value: 7.4510000000000005 - type: precision_at_100 value: 1.492 - type: precision_at_1000 value: 0.234 - type: precision_at_3 value: 16.337 - type: precision_at_5 value: 11.897 - type: recall_at_1 value: 24.691 - type: recall_at_10 value: 49.303999999999995 - type: recall_at_100 value: 76.20400000000001 - type: recall_at_1000 value: 93.30000000000001 - type: recall_at_3 value: 36.594 - type: recall_at_5 value: 42.41 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.118 - type: map_at_10 value: 30.714999999999996 - type: map_at_100 value: 31.656000000000002 - type: map_at_1000 value: 31.757 - type: map_at_3 value: 28.355000000000004 - type: map_at_5 value: 29.337000000000003 - type: mrr_at_1 value: 25.323 - type: mrr_at_10 value: 32.93 - type: mrr_at_100 value: 33.762 - type: mrr_at_1000 value: 33.829 - type: mrr_at_3 value: 30.775999999999996 - type: mrr_at_5 value: 31.774 - type: ndcg_at_1 value: 25.323 - type: ndcg_at_10 value: 35.408 - type: ndcg_at_100 value: 40.083 - type: ndcg_at_1000 value: 42.542 - type: ndcg_at_3 value: 30.717 - type: ndcg_at_5 value: 32.385000000000005 - type: precision_at_1 value: 25.323 - type: precision_at_10 value: 5.564 - type: precision_at_100 value: 0.843 - type: precision_at_1000 value: 0.116 - type: precision_at_3 value: 13.001 - type: precision_at_5 value: 8.834999999999999 - type: recall_at_1 value: 23.118 - type: recall_at_10 value: 47.788000000000004 - type: recall_at_100 value: 69.37 - type: recall_at_1000 value: 87.47399999999999 - type: recall_at_3 value: 34.868 - type: recall_at_5 value: 39.001999999999995 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 14.288 - type: map_at_10 value: 23.256 - type: map_at_100 value: 25.115 - type: map_at_1000 value: 25.319000000000003 - type: map_at_3 value: 20.005 - type: map_at_5 value: 21.529999999999998 - type: mrr_at_1 value: 31.401 - type: mrr_at_10 value: 42.251 - type: mrr_at_100 value: 43.236999999999995 - type: mrr_at_1000 value: 43.272 - type: mrr_at_3 value: 39.164 - type: mrr_at_5 value: 40.881 - type: ndcg_at_1 value: 31.401 - type: ndcg_at_10 value: 31.615 - type: ndcg_at_100 value: 38.982 - type: ndcg_at_1000 value: 42.496 - type: ndcg_at_3 value: 26.608999999999998 - type: ndcg_at_5 value: 28.048000000000002 - type: precision_at_1 value: 31.401 - type: precision_at_10 value: 9.536999999999999 - type: precision_at_100 value: 1.763 - type: precision_at_1000 value: 0.241 - type: precision_at_3 value: 19.153000000000002 - type: precision_at_5 value: 14.228 - type: recall_at_1 value: 14.288 - type: recall_at_10 value: 36.717 - type: recall_at_100 value: 61.9 - type: recall_at_1000 value: 81.676 - type: recall_at_3 value: 24.203 - type: recall_at_5 value: 28.793999999999997 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 9.019 - type: map_at_10 value: 19.963 - type: map_at_100 value: 28.834 - type: map_at_1000 value: 30.537999999999997 - type: map_at_3 value: 14.45 - type: map_at_5 value: 16.817999999999998 - type: mrr_at_1 value: 65.75 - type: mrr_at_10 value: 74.646 - type: mrr_at_100 value: 74.946 - type: mrr_at_1000 value: 74.95100000000001 - type: mrr_at_3 value: 72.625 - type: mrr_at_5 value: 74.012 - type: ndcg_at_1 value: 54 - type: ndcg_at_10 value: 42.014 - type: ndcg_at_100 value: 47.527 - type: ndcg_at_1000 value: 54.911 - type: ndcg_at_3 value: 46.586 - type: ndcg_at_5 value: 43.836999999999996 - type: precision_at_1 value: 65.75 - type: precision_at_10 value: 33.475 - type: precision_at_100 value: 11.16 - type: precision_at_1000 value: 2.145 - type: precision_at_3 value: 50.083 - type: precision_at_5 value: 42.55 - type: recall_at_1 value: 9.019 - type: recall_at_10 value: 25.558999999999997 - type: recall_at_100 value: 53.937999999999995 - type: recall_at_1000 value: 77.67399999999999 - type: recall_at_3 value: 15.456 - type: recall_at_5 value: 19.259 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 52.635 - type: f1 value: 47.692783881403926 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 76.893 - type: map_at_10 value: 84.897 - type: map_at_100 value: 85.122 - type: map_at_1000 value: 85.135 - type: map_at_3 value: 83.88 - type: map_at_5 value: 84.565 - type: mrr_at_1 value: 83.003 - type: mrr_at_10 value: 89.506 - type: mrr_at_100 value: 89.574 - type: mrr_at_1000 value: 89.575 - type: mrr_at_3 value: 88.991 - type: mrr_at_5 value: 89.349 - type: ndcg_at_1 value: 83.003 - type: ndcg_at_10 value: 88.351 - type: ndcg_at_100 value: 89.128 - type: ndcg_at_1000 value: 89.34100000000001 - type: ndcg_at_3 value: 86.92 - type: ndcg_at_5 value: 87.78200000000001 - type: precision_at_1 value: 83.003 - type: precision_at_10 value: 10.517999999999999 - type: precision_at_100 value: 1.115 - type: precision_at_1000 value: 0.11499999999999999 - type: precision_at_3 value: 33.062999999999995 - type: precision_at_5 value: 20.498 - type: recall_at_1 value: 76.893 - type: recall_at_10 value: 94.374 - type: recall_at_100 value: 97.409 - type: recall_at_1000 value: 98.687 - type: recall_at_3 value: 90.513 - type: recall_at_5 value: 92.709 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 20.829 - type: map_at_10 value: 32.86 - type: map_at_100 value: 34.838 - type: map_at_1000 value: 35.006 - type: map_at_3 value: 28.597 - type: map_at_5 value: 31.056 - type: mrr_at_1 value: 41.358 - type: mrr_at_10 value: 49.542 - type: mrr_at_100 value: 50.29900000000001 - type: mrr_at_1000 value: 50.334999999999994 - type: mrr_at_3 value: 46.579 - type: mrr_at_5 value: 48.408 - type: ndcg_at_1 value: 41.358 - type: ndcg_at_10 value: 40.758 - type: ndcg_at_100 value: 47.799 - type: ndcg_at_1000 value: 50.589 - type: ndcg_at_3 value: 36.695 - type: ndcg_at_5 value: 38.193 - type: precision_at_1 value: 41.358 - type: precision_at_10 value: 11.142000000000001 - type: precision_at_100 value: 1.8350000000000002 - type: precision_at_1000 value: 0.234 - type: precision_at_3 value: 24.023 - type: precision_at_5 value: 17.963 - type: recall_at_1 value: 20.829 - type: recall_at_10 value: 47.467999999999996 - type: recall_at_100 value: 73.593 - type: recall_at_1000 value: 90.122 - type: recall_at_3 value: 32.74 - type: recall_at_5 value: 39.608 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 40.324 - type: map_at_10 value: 64.183 - type: map_at_100 value: 65.037 - type: map_at_1000 value: 65.094 - type: map_at_3 value: 60.663 - type: map_at_5 value: 62.951 - type: mrr_at_1 value: 80.648 - type: mrr_at_10 value: 86.005 - type: mrr_at_100 value: 86.157 - type: mrr_at_1000 value: 86.162 - type: mrr_at_3 value: 85.116 - type: mrr_at_5 value: 85.703 - type: ndcg_at_1 value: 80.648 - type: ndcg_at_10 value: 72.351 - type: ndcg_at_100 value: 75.279 - type: ndcg_at_1000 value: 76.357 - type: ndcg_at_3 value: 67.484 - type: ndcg_at_5 value: 70.31500000000001 - type: precision_at_1 value: 80.648 - type: precision_at_10 value: 15.103 - type: precision_at_100 value: 1.7399999999999998 - type: precision_at_1000 value: 0.188 - type: precision_at_3 value: 43.232 - type: precision_at_5 value: 28.165000000000003 - type: recall_at_1 value: 40.324 - type: recall_at_10 value: 75.517 - type: recall_at_100 value: 86.982 - type: recall_at_1000 value: 94.072 - type: recall_at_3 value: 64.848 - type: recall_at_5 value: 70.41199999999999 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 91.4 - type: ap value: 87.4422032289312 - type: f1 value: 91.39249564302281 - task: type: Retrieval dataset: type: msmarco name: MTEB MSMARCO config: default split: dev revision: None metrics: - type: map_at_1 value: 22.03 - type: map_at_10 value: 34.402 - type: map_at_100 value: 35.599 - type: map_at_1000 value: 35.648 - type: map_at_3 value: 30.603 - type: map_at_5 value: 32.889 - type: mrr_at_1 value: 22.679 - type: mrr_at_10 value: 35.021 - type: mrr_at_100 value: 36.162 - type: mrr_at_1000 value: 36.205 - type: mrr_at_3 value: 31.319999999999997 - type: mrr_at_5 value: 33.562 - type: ndcg_at_1 value: 22.692999999999998 - type: ndcg_at_10 value: 41.258 - type: ndcg_at_100 value: 46.967 - type: ndcg_at_1000 value: 48.175000000000004 - type: ndcg_at_3 value: 33.611000000000004 - type: ndcg_at_5 value: 37.675 - type: precision_at_1 value: 22.692999999999998 - type: precision_at_10 value: 6.5089999999999995 - type: precision_at_100 value: 0.936 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 14.413 - type: precision_at_5 value: 10.702 - type: recall_at_1 value: 22.03 - type: recall_at_10 value: 62.248000000000005 - type: recall_at_100 value: 88.524 - type: recall_at_1000 value: 97.714 - type: recall_at_3 value: 41.617 - type: recall_at_5 value: 51.359 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 94.36844505243957 - type: f1 value: 94.12408743818202 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 76.43410852713177 - type: f1 value: 58.501855709435624 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 76.04909213180902 - type: f1 value: 74.1800860395823 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 79.76126429051781 - type: f1 value: 79.85705217473232 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 34.70119520292863 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 32.33544316467486 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 30.75499243990726 - type: mrr value: 31.70602251821063 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 6.451999999999999 - type: map_at_10 value: 13.918 - type: map_at_100 value: 17.316000000000003 - type: map_at_1000 value: 18.747 - type: map_at_3 value: 10.471 - type: map_at_5 value: 12.104 - type: mrr_at_1 value: 46.749 - type: mrr_at_10 value: 55.717000000000006 - type: mrr_at_100 value: 56.249 - type: mrr_at_1000 value: 56.288000000000004 - type: mrr_at_3 value: 53.818 - type: mrr_at_5 value: 55.103 - type: ndcg_at_1 value: 45.201 - type: ndcg_at_10 value: 35.539 - type: ndcg_at_100 value: 32.586 - type: ndcg_at_1000 value: 41.486000000000004 - type: ndcg_at_3 value: 41.174 - type: ndcg_at_5 value: 38.939 - type: precision_at_1 value: 46.749 - type: precision_at_10 value: 25.944 - type: precision_at_100 value: 8.084 - type: precision_at_1000 value: 2.076 - type: precision_at_3 value: 38.7 - type: precision_at_5 value: 33.56 - type: recall_at_1 value: 6.451999999999999 - type: recall_at_10 value: 17.302 - type: recall_at_100 value: 32.14 - type: recall_at_1000 value: 64.12 - type: recall_at_3 value: 11.219 - type: recall_at_5 value: 13.993 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 32.037 - type: map_at_10 value: 46.565 - type: map_at_100 value: 47.606 - type: map_at_1000 value: 47.636 - type: map_at_3 value: 42.459 - type: map_at_5 value: 44.762 - type: mrr_at_1 value: 36.181999999999995 - type: mrr_at_10 value: 49.291000000000004 - type: mrr_at_100 value: 50.059 - type: mrr_at_1000 value: 50.078 - type: mrr_at_3 value: 45.829 - type: mrr_at_5 value: 47.797 - type: ndcg_at_1 value: 36.153 - type: ndcg_at_10 value: 53.983000000000004 - type: ndcg_at_100 value: 58.347 - type: ndcg_at_1000 value: 59.058 - type: ndcg_at_3 value: 46.198 - type: ndcg_at_5 value: 50.022 - type: precision_at_1 value: 36.153 - type: precision_at_10 value: 8.763 - type: precision_at_100 value: 1.123 - type: precision_at_1000 value: 0.11900000000000001 - type: precision_at_3 value: 20.751 - type: precision_at_5 value: 14.646999999999998 - type: recall_at_1 value: 32.037 - type: recall_at_10 value: 74.008 - type: recall_at_100 value: 92.893 - type: recall_at_1000 value: 98.16 - type: recall_at_3 value: 53.705999999999996 - type: recall_at_5 value: 62.495 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 71.152 - type: map_at_10 value: 85.104 - type: map_at_100 value: 85.745 - type: map_at_1000 value: 85.761 - type: map_at_3 value: 82.175 - type: map_at_5 value: 84.066 - type: mrr_at_1 value: 82.03 - type: mrr_at_10 value: 88.115 - type: mrr_at_100 value: 88.21 - type: mrr_at_1000 value: 88.211 - type: mrr_at_3 value: 87.19200000000001 - type: mrr_at_5 value: 87.85 - type: ndcg_at_1 value: 82.03 - type: ndcg_at_10 value: 88.78 - type: ndcg_at_100 value: 89.96300000000001 - type: ndcg_at_1000 value: 90.056 - type: ndcg_at_3 value: 86.051 - type: ndcg_at_5 value: 87.63499999999999 - type: precision_at_1 value: 82.03 - type: precision_at_10 value: 13.450000000000001 - type: precision_at_100 value: 1.5310000000000001 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 37.627 - type: precision_at_5 value: 24.784 - type: recall_at_1 value: 71.152 - type: recall_at_10 value: 95.649 - type: recall_at_100 value: 99.58200000000001 - type: recall_at_1000 value: 99.981 - type: recall_at_3 value: 87.767 - type: recall_at_5 value: 92.233 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 56.48713646277477 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 63.394940772438545 - task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 5.043 - type: map_at_10 value: 12.949 - type: map_at_100 value: 15.146 - type: map_at_1000 value: 15.495000000000001 - type: map_at_3 value: 9.333 - type: map_at_5 value: 11.312999999999999 - type: mrr_at_1 value: 24.9 - type: mrr_at_10 value: 35.958 - type: mrr_at_100 value: 37.152 - type: mrr_at_1000 value: 37.201 - type: mrr_at_3 value: 32.667 - type: mrr_at_5 value: 34.567 - type: ndcg_at_1 value: 24.9 - type: ndcg_at_10 value: 21.298000000000002 - type: ndcg_at_100 value: 29.849999999999998 - type: ndcg_at_1000 value: 35.506 - type: ndcg_at_3 value: 20.548 - type: ndcg_at_5 value: 18.064 - type: precision_at_1 value: 24.9 - type: precision_at_10 value: 10.9 - type: precision_at_100 value: 2.331 - type: precision_at_1000 value: 0.367 - type: precision_at_3 value: 19.267 - type: precision_at_5 value: 15.939999999999998 - type: recall_at_1 value: 5.043 - type: recall_at_10 value: 22.092 - type: recall_at_100 value: 47.323 - type: recall_at_1000 value: 74.553 - type: recall_at_3 value: 11.728 - type: recall_at_5 value: 16.188 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 83.7007085938325 - type: cos_sim_spearman value: 80.0171084446234 - type: euclidean_pearson value: 81.28133218355893 - type: euclidean_spearman value: 79.99291731740131 - type: manhattan_pearson value: 81.22926922327846 - type: manhattan_spearman value: 79.94444878127038 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 85.7411883252923 - type: cos_sim_spearman value: 77.93462937801245 - type: euclidean_pearson value: 83.00858563882404 - type: euclidean_spearman value: 77.82717362433257 - type: manhattan_pearson value: 82.92887645790769 - type: manhattan_spearman value: 77.78807488222115 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 82.04222459361023 - type: cos_sim_spearman value: 83.85931509330395 - type: euclidean_pearson value: 83.26916063876055 - type: euclidean_spearman value: 83.98621985648353 - type: manhattan_pearson value: 83.14935679184327 - type: manhattan_spearman value: 83.87938828586304 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 81.41136639535318 - type: cos_sim_spearman value: 81.51200091040481 - type: euclidean_pearson value: 81.45382456114775 - type: euclidean_spearman value: 81.46201181707931 - type: manhattan_pearson value: 81.37243088439584 - type: manhattan_spearman value: 81.39828421893426 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 85.71942451732227 - type: cos_sim_spearman value: 87.33044482064973 - type: euclidean_pearson value: 86.58580899365178 - type: euclidean_spearman value: 87.09206723832895 - type: manhattan_pearson value: 86.47460784157013 - type: manhattan_spearman value: 86.98367656583076 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 83.55868078863449 - type: cos_sim_spearman value: 85.38299230074065 - type: euclidean_pearson value: 84.64715256244595 - type: euclidean_spearman value: 85.49112229604047 - type: manhattan_pearson value: 84.60814346792462 - type: manhattan_spearman value: 85.44886026766822 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 84.99292526370614 - type: cos_sim_spearman value: 85.58139465695983 - type: euclidean_pearson value: 86.51325066734084 - type: euclidean_spearman value: 85.56736418284562 - type: manhattan_pearson value: 86.48190836601357 - type: manhattan_spearman value: 85.51616256224258 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 64.54124715078807 - type: cos_sim_spearman value: 65.32134275948374 - type: euclidean_pearson value: 67.09791698300816 - type: euclidean_spearman value: 65.79468982468465 - type: manhattan_pearson value: 67.13304723693966 - type: manhattan_spearman value: 65.68439995849283 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 83.4231099581624 - type: cos_sim_spearman value: 85.95475815226862 - type: euclidean_pearson value: 85.00339401999706 - type: euclidean_spearman value: 85.74133081802971 - type: manhattan_pearson value: 85.00407987181666 - type: manhattan_spearman value: 85.77509596397363 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 87.25666719585716 - type: mrr value: 96.32769917083642 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 57.828 - type: map_at_10 value: 68.369 - type: map_at_100 value: 68.83399999999999 - type: map_at_1000 value: 68.856 - type: map_at_3 value: 65.38000000000001 - type: map_at_5 value: 67.06299999999999 - type: mrr_at_1 value: 61 - type: mrr_at_10 value: 69.45400000000001 - type: mrr_at_100 value: 69.785 - type: mrr_at_1000 value: 69.807 - type: mrr_at_3 value: 67 - type: mrr_at_5 value: 68.43299999999999 - type: ndcg_at_1 value: 61 - type: ndcg_at_10 value: 73.258 - type: ndcg_at_100 value: 75.173 - type: ndcg_at_1000 value: 75.696 - type: ndcg_at_3 value: 68.162 - type: ndcg_at_5 value: 70.53399999999999 - type: precision_at_1 value: 61 - type: precision_at_10 value: 9.8 - type: precision_at_100 value: 1.087 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 27 - type: precision_at_5 value: 17.666999999999998 - type: recall_at_1 value: 57.828 - type: recall_at_10 value: 87.122 - type: recall_at_100 value: 95.667 - type: recall_at_1000 value: 99.667 - type: recall_at_3 value: 73.139 - type: recall_at_5 value: 79.361 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.85247524752475 - type: cos_sim_ap value: 96.25640197639723 - type: cos_sim_f1 value: 92.37851662404091 - type: cos_sim_precision value: 94.55497382198953 - type: cos_sim_recall value: 90.3 - type: dot_accuracy value: 99.76138613861386 - type: dot_ap value: 93.40295864389073 - type: dot_f1 value: 87.64267990074441 - type: dot_precision value: 86.99507389162562 - type: dot_recall value: 88.3 - type: euclidean_accuracy value: 99.85049504950496 - type: euclidean_ap value: 96.24254350525462 - type: euclidean_f1 value: 92.32323232323232 - type: euclidean_precision value: 93.26530612244898 - type: euclidean_recall value: 91.4 - type: manhattan_accuracy value: 99.85346534653465 - type: manhattan_ap value: 96.2635334753325 - type: manhattan_f1 value: 92.37899073120495 - type: manhattan_precision value: 95.22292993630573 - type: manhattan_recall value: 89.7 - type: max_accuracy value: 99.85346534653465 - type: max_ap value: 96.2635334753325 - type: max_f1 value: 92.37899073120495 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 65.83905786483794 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 35.031896152126436 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 54.551326709447146 - type: mrr value: 55.43758222986165 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.305688567308874 - type: cos_sim_spearman value: 29.27135743434515 - type: dot_pearson value: 30.336741878796563 - type: dot_spearman value: 30.513365725895937 - task: type: Retrieval dataset: type: trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.245 - type: map_at_10 value: 1.92 - type: map_at_100 value: 10.519 - type: map_at_1000 value: 23.874000000000002 - type: map_at_3 value: 0.629 - type: map_at_5 value: 1.0290000000000001 - type: mrr_at_1 value: 88 - type: mrr_at_10 value: 93.5 - type: mrr_at_100 value: 93.5 - type: mrr_at_1000 value: 93.5 - type: mrr_at_3 value: 93 - type: mrr_at_5 value: 93.5 - type: ndcg_at_1 value: 84 - type: ndcg_at_10 value: 76.447 - type: ndcg_at_100 value: 56.516 - type: ndcg_at_1000 value: 48.583999999999996 - type: ndcg_at_3 value: 78.877 - type: ndcg_at_5 value: 79.174 - type: precision_at_1 value: 88 - type: precision_at_10 value: 80.60000000000001 - type: precision_at_100 value: 57.64 - type: precision_at_1000 value: 21.227999999999998 - type: precision_at_3 value: 82 - type: precision_at_5 value: 83.6 - type: recall_at_1 value: 0.245 - type: recall_at_10 value: 2.128 - type: recall_at_100 value: 13.767 - type: recall_at_1000 value: 44.958 - type: recall_at_3 value: 0.654 - type: recall_at_5 value: 1.111 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 2.5170000000000003 - type: map_at_10 value: 10.915 - type: map_at_100 value: 17.535 - type: map_at_1000 value: 19.042 - type: map_at_3 value: 5.689 - type: map_at_5 value: 7.837 - type: mrr_at_1 value: 34.694 - type: mrr_at_10 value: 49.547999999999995 - type: mrr_at_100 value: 50.653000000000006 - type: mrr_at_1000 value: 50.653000000000006 - type: mrr_at_3 value: 44.558 - type: mrr_at_5 value: 48.333 - type: ndcg_at_1 value: 32.653 - type: ndcg_at_10 value: 26.543 - type: ndcg_at_100 value: 38.946 - type: ndcg_at_1000 value: 49.406 - type: ndcg_at_3 value: 29.903000000000002 - type: ndcg_at_5 value: 29.231 - type: precision_at_1 value: 34.694 - type: precision_at_10 value: 23.265 - type: precision_at_100 value: 8.102 - type: precision_at_1000 value: 1.5 - type: precision_at_3 value: 31.293 - type: precision_at_5 value: 29.796 - type: recall_at_1 value: 2.5170000000000003 - type: recall_at_10 value: 16.88 - type: recall_at_100 value: 49.381 - type: recall_at_1000 value: 81.23899999999999 - type: recall_at_3 value: 6.965000000000001 - type: recall_at_5 value: 10.847999999999999 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 71.5942 - type: ap value: 13.92074156956546 - type: f1 value: 54.671999698839066 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 59.39728353140916 - type: f1 value: 59.68980496759517 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 52.11181870104935 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 86.46957143708649 - type: cos_sim_ap value: 76.16120197845457 - type: cos_sim_f1 value: 69.69919295671315 - type: cos_sim_precision value: 64.94986326344576 - type: cos_sim_recall value: 75.19788918205805 - type: dot_accuracy value: 83.0780234845324 - type: dot_ap value: 64.21717343541934 - type: dot_f1 value: 59.48375497624245 - type: dot_precision value: 57.94345759319489 - type: dot_recall value: 61.108179419525065 - type: euclidean_accuracy value: 86.6543482148179 - type: euclidean_ap value: 76.4527555010203 - type: euclidean_f1 value: 70.10156056477584 - type: euclidean_precision value: 66.05975723622782 - type: euclidean_recall value: 74.67018469656992 - type: manhattan_accuracy value: 86.66030875603504 - type: manhattan_ap value: 76.40304567255436 - type: manhattan_f1 value: 70.05275426328058 - type: manhattan_precision value: 65.4666360926393 - type: manhattan_recall value: 75.32981530343008 - type: max_accuracy value: 86.66030875603504 - type: max_ap value: 76.4527555010203 - type: max_f1 value: 70.10156056477584 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 88.42123646524624 - type: cos_sim_ap value: 85.15431437761646 - type: cos_sim_f1 value: 76.98069301530742 - type: cos_sim_precision value: 72.9314502239063 - type: cos_sim_recall value: 81.50600554357868 - type: dot_accuracy value: 86.70974502270346 - type: dot_ap value: 80.77621563599457 - type: dot_f1 value: 73.87058697285117 - type: dot_precision value: 68.98256396552877 - type: dot_recall value: 79.50415768401602 - type: euclidean_accuracy value: 88.46392672798541 - type: euclidean_ap value: 85.20370297495491 - type: euclidean_f1 value: 77.01372369624886 - type: euclidean_precision value: 73.39052800446397 - type: euclidean_recall value: 81.01324299353249 - type: manhattan_accuracy value: 88.43481973066325 - type: manhattan_ap value: 85.16318289864545 - type: manhattan_f1 value: 76.90884877182597 - type: manhattan_precision value: 74.01737396753062 - type: manhattan_recall value: 80.03541730828458 - type: max_accuracy value: 88.46392672798541 - type: max_ap value: 85.20370297495491 - type: max_f1 value: 77.01372369624886 license: mit language: - en --- **Recommend switching to newest BAAI/bge-base-en-v1.5, which has more reasonable similarity distribution and same method of usage.** <h1 align=\"center\">FlagEmbedding</h1> <h4 align=\"center\"> <p> <a href=#model-list>Model List</a> | <a href=#frequently-asked-questions>FAQ</a> | <a href=#usage>Usage</a> | <a href=\"#evaluation\">Evaluation</a> | <a href=\"#train\">Train</a> | <a href=\"#contact\">Contact</a> | <a href=\"#citation\">Citation</a> | <a href=\"#license\">License</a> <p> </h4> More details please refer to our Github: FlagEmbedding. English | 中文 FlagEmbedding can map any text to a low-dimensional dense vector which can be used for tasks like retrieval, classification, clustering, or semantic search. And it also can be used in vector databases for LLMs. ************* 🌟**Updates**🌟 ************* - 10/12/2023: Release LLM-Embedder, a unified embedding model to support diverse retrieval augmentation needs for LLMs. Paper :fire: - 09/15/2023: The technical report of BGE has been released - 09/15/2023: The masive training data of BGE has been released - 09/12/2023: New models: - **New reranker model**: release cross-encoder models and , which are more powerful than embedding model. We recommend to use/fine-tune them to re-rank top-k documents returned by embedding models. - **update embedding model**: release embedding model to alleviate the issue of the similarity distribution, and enhance its retrieval ability without instruction. <details> <summary>More</summary> <!-- ### More --> - 09/07/2023: Update fine-tune code: Add script to mine hard negatives and support adding instruction during fine-tuning. - 08/09/2023: BGE Models are integrated into **Langchain**, you can use it like this; C-MTEB **leaderboard** is available. - 08/05/2023: Release base-scale and small-scale models, **best performance among the models of the same size 🤗** - 08/02/2023: Release (short for BAAI General Embedding) Models, **rank 1st on MTEB and C-MTEB benchmark!** :tada: :tada: - 08/01/2023: We release the Chinese Massive Text Embedding Benchmark (**C-MTEB**), consisting of 31 test dataset. </details> ## Model List is short for . | Model | Language | | Description | query instruction for retrieval [1] | |:-------------------------------|:--------:| :--------:| :--------:|:--------:| | BAAI/llm-embedder | English | Inference Fine-tune | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See README | | BAAI/bge-reranker-large | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-reranker-base | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-large-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-en | English | Inference Fine-tune | :trophy: rank **1st** in MTEB leaderboard | | | BAAI/bge-base-en | English | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-en | English | Inference Fine-tune |a small-scale model but with competitive performance | | | BAAI/bge-large-zh | Chinese | Inference Fine-tune | :trophy: rank **1st** in C-MTEB benchmark | | | BAAI/bge-base-zh | Chinese | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-zh | Chinese | Inference Fine-tune | a small-scale model but with competitive performance | | [1\\]: If you need to search the relevant passages to a query, we suggest to add the instruction to the query; in other cases, no instruction is needed, just use the original query directly. In all cases, **no instruction** needs to be added to passages. [2\\]: Different from embedding model, reranker uses question and document as input and directly output similarity instead of embedding. To balance the accuracy and time cost, cross-encoder is widely used to re-rank top-k documents retrieved by other simple models. For examples, use bge embedding model to retrieve top 100 relevant documents, and then use bge reranker to re-rank the top 100 document to get the final top-3 results. All models have been uploaded to Huggingface Hub, and you can see them at If you cannot open the Huggingface Hub, you also can download the models at . ## Frequently asked questions <details> <summary>1. How to fine-tune bge embedding model?</summary> <!-- ### How to fine-tune bge embedding model? --> Following this example to prepare data and fine-tune your model. Some suggestions: - Mine hard negatives following this example, which can improve the retrieval performance. - If you pre-train bge on your data, the pre-trained model cannot be directly used to calculate similarity, and it must be fine-tuned with contrastive learning before computing similarity. - If the accuracy of the fine-tuned model is still not high, it is recommended to use/fine-tune the cross-encoder model (bge-reranker) to re-rank top-k results. Hard negatives also are needed to fine-tune reranker. </details> <details> <summary>2. The similarity score between two dissimilar sentences is higher than 0.5</summary> <!-- ### The similarity score between two dissimilar sentences is higher than 0.5 --> **Suggest to use bge v1.5, which alleviates the issue of the similarity distribution.** Since we finetune the models by contrastive learning with a temperature of 0.01, the similarity distribution of the current BGE model is about in the interval \\[0.6, 1\\]. So a similarity score greater than 0.5 does not indicate that the two sentences are similar. For downstream tasks, such as passage retrieval or semantic similarity, **what matters is the relative order of the scores, not the absolute value.** If you need to filter similar sentences based on a similarity threshold, please select an appropriate similarity threshold based on the similarity distribution on your data (such as 0.8, 0.85, or even 0.9). </details> <details> <summary>3. When does the query instruction need to be used</summary> <!-- ### When does the query instruction need to be used --> For the , we improve its retrieval ability when not using instruction. No instruction only has a slight degradation in retrieval performance compared with using instruction. So you can generate embedding without instruction in all cases for convenience. For a retrieval task that uses short queries to find long related documents, it is recommended to add instructions for these short queries. **The best method to decide whether to add instructions for queries is choosing the setting that achieves better performance on your task.** In all cases, the documents/passages do not need to add the instruction. </details> ## Usage ### Usage for Embedding Model Here are some examples for using models with FlagEmbedding, Sentence-Transformers, Langchain, or Huggingface Transformers. #### Using FlagEmbedding If it doesn't work for you, you can see FlagEmbedding for more methods to install FlagEmbedding. For the value of the argument , see Model List. By default, FlagModel will use all available GPUs when encoding. Please set to select specific GPUs. You also can set to make all GPUs unavailable. #### Using Sentence-Transformers You can also use the models with sentence-transformers: For s2p(short query to long passage) retrieval task, each short query should start with an instruction (instructions see Model List). But the instruction is not needed for passages. #### Using Langchain You can use in langchain like this: #### Using HuggingFace Transformers With the transformers package, you can use the model like this: First, you pass your input through the transformer model, then you select the last hidden state of the first token (i.e., [CLS]) as the sentence embedding. ### Usage for Reranker Different from embedding model, reranker uses question and document as input and directly output similarity instead of embedding. You can get a relevance score by inputting query and passage to the reranker. The reranker is optimized based cross-entropy loss, so the relevance score is not bounded to a specific range. #### Using FlagEmbedding Get relevance scores (higher scores indicate more relevance): #### Using Huggingface transformers ## Evaluation models achieve **state-of-the-art performance on both MTEB and C-MTEB leaderboard!** For more details and evaluation tools see our scripts. - **MTEB**: | Model Name | Dimension | Sequence Length | Average (56) | Retrieval (15) |Clustering (11) | Pair Classification (3) | Reranking (4) | STS (10) | Summarization (1) | Classification (12) | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | BAAI/bge-large-en-v1.5 | 1024 | 512 | **64.23** | **54.29** | 46.08 | 87.12 | 60.03 | 83.11 | 31.61 | 75.97 | | BAAI/bge-base-en-v1.5 | 768 | 512 | 63.55 | 53.25 | 45.77 | 86.55 | 58.86 | 82.4 | 31.07 | 75.53 | | BAAI/bge-small-en-v1.5 | 384 | 512 | 62.17 |51.68 | 43.82 | 84.92 | 58.36 | 81.59 | 30.12 | 74.14 | | bge-large-en | 1024 | 512 | 63.98 | 53.9 | 46.98 | 85.8 | 59.48 | 81.56 | 32.06 | 76.21 | | bge-base-en | 768 | 512 | 63.36 | 53.0 | 46.32 | 85.86 | 58.7 | 81.84 | 29.27 | 75.27 | | gte-large | 1024 | 512 | 63.13 | 52.22 | 46.84 | 85.00 | 59.13 | 83.35 | 31.66 | 73.33 | | gte-base | 768 | 512 | 62.39 | 51.14 | 46.2 | 84.57 | 58.61 | 82.3 | 31.17 | 73.01 | | e5-large-v2 | 1024| 512 | 62.25 | 50.56 | 44.49 | 86.03 | 56.61 | 82.05 | 30.19 | 75.24 | | bge-small-en | 384 | 512 | 62.11 | 51.82 | 44.31 | 83.78 | 57.97 | 80.72 | 30.53 | 74.37 | | instructor-xl | 768 | 512 | 61.79 | 49.26 | 44.74 | 86.62 | 57.29 | 83.06 | 32.32 | 61.79 | | e5-base-v2 | 768 | 512 | 61.5 | 50.29 | 43.80 | 85.73 | 55.91 | 81.05 | 30.28 | 73.84 | | gte-small | 384 | 512 | 61.36 | 49.46 | 44.89 | 83.54 | 57.7 | 82.07 | 30.42 | 72.31 | | text-embedding-ada-002 | 1536 | 8192 | 60.99 | 49.25 | 45.9 | 84.89 | 56.32 | 80.97 | 30.8 | 70.93 | | e5-small-v2 | 384 | 512 | 59.93 | 49.04 | 39.92 | 84.67 | 54.32 | 80.39 | 31.16 | 72.94 | | sentence-t5-xxl | 768 | 512 | 59.51 | 42.24 | 43.72 | 85.06 | 56.42 | 82.63 | 30.08 | 73.42 | | all-mpnet-base-v2 | 768 | 514 | 57.78 | 43.81 | 43.69 | 83.04 | 59.36 | 80.28 | 27.49 | 65.07 | | sgpt-bloom-7b1-msmarco | 4096 | 2048 | 57.59 | 48.22 | 38.93 | 81.9 | 55.65 | 77.74 | 33.6 | 66.19 | - **C-MTEB**: We create the benchmark C-MTEB for Chinese text embedding which consists of 31 datasets from 6 tasks. Please refer to C_MTEB for a detailed introduction. | Model | Embedding dimension | Avg | Retrieval | STS | PairClassification | Classification | Reranking | Clustering | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | **BAAI/bge-large-zh-v1.5** | 1024 | **64.53** | 70.46 | 56.25 | 81.6 | 69.13 | 65.84 | 48.99 | | BAAI/bge-base-zh-v1.5 | 768 | 63.13 | 69.49 | 53.72 | 79.75 | 68.07 | 65.39 | 47.53 | | BAAI/bge-small-zh-v1.5 | 512 | 57.82 | 61.77 | 49.11 | 70.41 | 63.96 | 60.92 | 44.18 | | BAAI/bge-large-zh | 1024 | 64.20 | 71.53 | 54.98 | 78.94 | 68.32 | 65.11 | 48.39 | | bge-large-zh-noinstruct | 1024 | 63.53 | 70.55 | 53 | 76.77 | 68.58 | 64.91 | 50.01 | | BAAI/bge-base-zh | 768 | 62.96 | 69.53 | 54.12 | 77.5 | 67.07 | 64.91 | 47.63 | | multilingual-e5-large | 1024 | 58.79 | 63.66 | 48.44 | 69.89 | 67.34 | 56.00 | 48.23 | | BAAI/bge-small-zh | 512 | 58.27 | 63.07 | 49.45 | 70.35 | 63.64 | 61.48 | 45.09 | | m3e-base | 768 | 57.10 | 56.91 | 50.47 | 63.99 | 67.52 | 59.34 | 47.68 | | m3e-large | 1024 | 57.05 | 54.75 | 50.42 | 64.3 | 68.2 | 59.66 | 48.88 | | multilingual-e5-base | 768 | 55.48 | 61.63 | 46.49 | 67.07 | 65.35 | 54.35 | 40.68 | | multilingual-e5-small | 384 | 55.38 | 59.95 | 45.27 | 66.45 | 65.85 | 53.86 | 45.26 | | text-embedding-ada-002(OpenAI) | 1536 | 53.02 | 52.0 | 43.35 | 69.56 | 64.31 | 54.28 | 45.68 | | luotuo | 1024 | 49.37 | 44.4 | 42.78 | 66.62 | 61 | 49.25 | 44.39 | | text2vec-base | 768 | 47.63 | 38.79 | 43.41 | 67.41 | 62.19 | 49.45 | 37.66 | | text2vec-large | 1024 | 47.36 | 41.94 | 44.97 | 70.86 | 60.66 | 49.16 | 30.02 | - **Reranking**: See C_MTEB for evaluation script. | Model | T2Reranking | T2RerankingZh2En\\* | T2RerankingEn2Zh\\* | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | text2vec-base-multilingual | 64.66 | 62.94 | 62.51 | 14.37 | 48.46 | 48.6 | 50.26 | | multilingual-e5-small | 65.62 | 60.94 | 56.41 | 29.91 | 67.26 | 66.54 | 57.78 | | multilingual-e5-large | 64.55 | 61.61 | 54.28 | 28.6 | 67.42 | 67.92 | 57.4 | | multilingual-e5-base | 64.21 | 62.13 | 54.68 | 29.5 | 66.23 | 66.98 | 57.29 | | m3e-base | 66.03 | 62.74 | 56.07 | 17.51 | 77.05 | 76.76 | 59.36 | | m3e-large | 66.13 | 62.72 | 56.1 | 16.46 | 77.76 | 78.27 | 59.57 | | bge-base-zh-v1.5 | 66.49 | 63.25 | 57.02 | 29.74 | 80.47 | 84.88 | 63.64 | | bge-large-zh-v1.5 | 65.74 | 63.39 | 57.03 | 28.74 | 83.45 | 85.44 | 63.97 | | BAAI/bge-reranker-base | 67.28 | 63.95 | 60.45 | 35.46 | 81.26 | 84.1 | 65.42 | | BAAI/bge-reranker-large | 67.6 | 64.03 | 61.44 | 37.16 | 82.15 | 84.18 | 66.09 | \\* : T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks ## Train ### BAAI Embedding We pre-train the models using retromae and train them on large-scale pairs data using contrastive learning. **You can fine-tune the embedding model on your data following our examples.** We also provide a pre-train example. Note that the goal of pre-training is to reconstruct the text, and the pre-trained model cannot be used for similarity calculation directly, it needs to be fine-tuned. More training details for bge see baai_general_embedding. ### BGE Reranker Cross-encoder will perform full-attention over the input pair, which is more accurate than embedding model (i.e., bi-encoder) but more time-consuming than embedding model. Therefore, it can be used to re-rank the top-k documents returned by embedding model. We train the cross-encoder on a multilingual pair data, The data format is the same as embedding model, so you can fine-tune it easily following our example. More details please refer to ./FlagEmbedding/reranker/README.md ## Contact If you have any question or suggestion related to this project, feel free to open an issue or pull request. You also can email Shitao Xiao([email protected]) and Zheng Liu([email protected]). ## Citation If you find this repository useful, please consider giving a star :star: and citation ## License FlagEmbedding is licensed under the MIT License. The released models can be used for commercial purposes free of charge.",
22
+ "model_explanation_gemini": "\"BAAI_bge-base-en is a versatile model excelling in classification, retrieval, clustering, reranking, and semantic textual similarity (STS) tasks, as demonstrated by its performance across multiple MTEB benchmarks.\"\n\nModel Features: \n- Classification (e.g., Amazon reviews, Banking77) \n- Retrieval (e.g., ArguAna, CQADupstack) \n- Clustering (e.g., Arxiv/Biorxiv papers) \n- Reranking (e",
23
+ "release_year": "2023",
24
+ "parameter_count": null,
25
+ "is_fine_tuned": false,
26
+ "category": "Embedding",
27
+ "model_family": "BERT",
28
+ "api_enhanced": true
29
  }
model_data_json/BAAI_bge-base-zh-v1.5.json CHANGED
@@ -18,5 +18,11 @@
18
  "region:us"
19
  ],
20
  "description": "--- tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers license: mit language: - zh --- <h1 align=\"center\">FlagEmbedding</h1> <h4 align=\"center\"> <p> <a href=#model-list>Model List</a> | <a href=#frequently-asked-questions>FAQ</a> | <a href=#usage>Usage</a> | <a href=\"#evaluation\">Evaluation</a> | <a href=\"#train\">Train</a> | <a href=\"#contact\">Contact</a> | <a href=\"#citation\">Citation</a> | <a href=\"#license\">License</a> <p> </h4> More details please refer to our Github: FlagEmbedding. English | 中文 FlagEmbedding can map any text to a low-dimensional dense vector which can be used for tasks like retrieval, classification, clustering, or semantic search. And it also can be used in vector databases for LLMs. ************* 🌟**Updates**🌟 ************* - 10/12/2023: Release LLM-Embedder, a unified embedding model to support diverse retrieval augmentation needs for LLMs. Paper :fire: - 09/15/2023: The technical report of BGE has been released - 09/15/2023: The masive training data of BGE has been released - 09/12/2023: New models: - **New reranker model**: release cross-encoder models and , which are more powerful than embedding model. We recommend to use/fine-tune them to re-rank top-k documents returned by embedding models. - **update embedding model**: release embedding model to alleviate the issue of the similarity distribution, and enhance its retrieval ability without instruction. <details> <summary>More</summary> <!-- ### More --> - 09/07/2023: Update fine-tune code: Add script to mine hard negatives and support adding instruction during fine-tuning. - 08/09/2023: BGE Models are integrated into **Langchain**, you can use it like this; C-MTEB **leaderboard** is available. - 08/05/2023: Release base-scale and small-scale models, **best performance among the models of the same size 🤗** - 08/02/2023: Release (short for BAAI General Embedding) Models, **rank 1st on MTEB and C-MTEB benchmark!** :tada: :tada: - 08/01/2023: We release the Chinese Massive Text Embedding Benchmark (**C-MTEB**), consisting of 31 test dataset. </details> ## Model List is short for . | Model | Language | | Description | query instruction for retrieval [1] | |:-------------------------------|:--------:| :--------:| :--------:|:--------:| | BAAI/llm-embedder | English | Inference Fine-tune | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See README | | BAAI/bge-reranker-large | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-reranker-base | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-large-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-en | English | Inference Fine-tune | :trophy: rank **1st** in MTEB leaderboard | | | BAAI/bge-base-en | English | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-en | English | Inference Fine-tune |a small-scale model but with competitive performance | | | BAAI/bge-large-zh | Chinese | Inference Fine-tune | :trophy: rank **1st** in C-MTEB benchmark | | | BAAI/bge-base-zh | Chinese | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-zh | Chinese | Inference Fine-tune | a small-scale model but with competitive performance | | [1\\]: If you need to search the relevant passages to a query, we suggest to add the instruction to the query; in other cases, no instruction is needed, just use the original query directly. In all cases, **no instruction** needs to be added to passages. [2\\]: Different from embedding model, reranker uses question and document as input and directly output similarity instead of embedding. To balance the accuracy and time cost, cross-encoder is widely used to re-rank top-k documents retrieved by other simple models. For examples, use bge embedding model to retrieve top 100 relevant documents, and then use bge reranker to re-rank the top 100 document to get the final top-3 results. All models have been uploaded to Huggingface Hub, and you can see them at If you cannot open the Huggingface Hub, you also can download the models at . ## Frequently asked questions <details> <summary>1. How to fine-tune bge embedding model?</summary> <!-- ### How to fine-tune bge embedding model? --> Following this example to prepare data and fine-tune your model. Some suggestions: - Mine hard negatives following this example, which can improve the retrieval performance. - If you pre-train bge on your data, the pre-trained model cannot be directly used to calculate similarity, and it must be fine-tuned with contrastive learning before computing similarity. - If the accuracy of the fine-tuned model is still not high, it is recommended to use/fine-tune the cross-encoder model (bge-reranker) to re-rank top-k results. Hard negatives also are needed to fine-tune reranker. </details> <details> <summary>2. The similarity score between two dissimilar sentences is higher than 0.5</summary> <!-- ### The similarity score between two dissimilar sentences is higher than 0.5 --> **Suggest to use bge v1.5, which alleviates the issue of the similarity distribution.** Since we finetune the models by contrastive learning with a temperature of 0.01, the similarity distribution of the current BGE model is about in the interval \\[0.6, 1\\]. So a similarity score greater than 0.5 does not indicate that the two sentences are similar. For downstream tasks, such as passage retrieval or semantic similarity, **what matters is the relative order of the scores, not the absolute value.** If you need to filter similar sentences based on a similarity threshold, please select an appropriate similarity threshold based on the similarity distribution on your data (such as 0.8, 0.85, or even 0.9). </details> <details> <summary>3. When does the query instruction need to be used</summary> <!-- ### When does the query instruction need to be used --> For the , we improve its retrieval ability when not using instruction. No instruction only has a slight degradation in retrieval performance compared with using instruction. So you can generate embedding without instruction in all cases for convenience. For a retrieval task that uses short queries to find long related documents, it is recommended to add instructions for these short queries. **The best method to decide whether to add instructions for queries is choosing the setting that achieves better performance on your task.** In all cases, the documents/passages do not need to add the instruction. </details> ## Usage ### Usage for Embedding Model Here are some examples for using models with FlagEmbedding, Sentence-Transformers, Langchain, or Huggingface Transformers. #### Using FlagEmbedding If it doesn't work for you, you can see FlagEmbedding for more methods to install FlagEmbedding. For the value of the argument , see Model List. By default, FlagModel will use all available GPUs when encoding. Please set to select specific GPUs. You also can set to make all GPUs unavailable. #### Using Sentence-Transformers You can also use the models with sentence-transformers: For s2p(short query to long passage) retrieval task, each short query should start with an instruction (instructions see Model List). But the instruction is not needed for passages. #### Using Langchain You can use in langchain like this: #### Using HuggingFace Transformers With the transformers package, you can use the model like this: First, you pass your input through the transformer model, then you select the last hidden state of the first token (i.e., [CLS]) as the sentence embedding. ### Usage for Reranker Different from embedding model, reranker uses question and document as input and directly output similarity instead of embedding. You can get a relevance score by inputting query and passage to the reranker. The reranker is optimized based cross-entropy loss, so the relevance score is not bounded to a specific range. #### Using FlagEmbedding Get relevance scores (higher scores indicate more relevance): #### Using Huggingface transformers ## Evaluation models achieve **state-of-the-art performance on both MTEB and C-MTEB leaderboard!** For more details and evaluation tools see our scripts. - **MTEB**: | Model Name | Dimension | Sequence Length | Average (56) | Retrieval (15) |Clustering (11) | Pair Classification (3) | Reranking (4) | STS (10) | Summarization (1) | Classification (12) | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | BAAI/bge-large-en-v1.5 | 1024 | 512 | **64.23** | **54.29** | 46.08 | 87.12 | 60.03 | 83.11 | 31.61 | 75.97 | | BAAI/bge-base-en-v1.5 | 768 | 512 | 63.55 | 53.25 | 45.77 | 86.55 | 58.86 | 82.4 | 31.07 | 75.53 | | BAAI/bge-small-en-v1.5 | 384 | 512 | 62.17 |51.68 | 43.82 | 84.92 | 58.36 | 81.59 | 30.12 | 74.14 | | bge-large-en | 1024 | 512 | 63.98 | 53.9 | 46.98 | 85.8 | 59.48 | 81.56 | 32.06 | 76.21 | | bge-base-en | 768 | 512 | 63.36 | 53.0 | 46.32 | 85.86 | 58.7 | 81.84 | 29.27 | 75.27 | | gte-large | 1024 | 512 | 63.13 | 52.22 | 46.84 | 85.00 | 59.13 | 83.35 | 31.66 | 73.33 | | gte-base | 768 | 512 | 62.39 | 51.14 | 46.2 | 84.57 | 58.61 | 82.3 | 31.17 | 73.01 | | e5-large-v2 | 1024| 512 | 62.25 | 50.56 | 44.49 | 86.03 | 56.61 | 82.05 | 30.19 | 75.24 | | bge-small-en | 384 | 512 | 62.11 | 51.82 | 44.31 | 83.78 | 57.97 | 80.72 | 30.53 | 74.37 | | instructor-xl | 768 | 512 | 61.79 | 49.26 | 44.74 | 86.62 | 57.29 | 83.06 | 32.32 | 61.79 | | e5-base-v2 | 768 | 512 | 61.5 | 50.29 | 43.80 | 85.73 | 55.91 | 81.05 | 30.28 | 73.84 | | gte-small | 384 | 512 | 61.36 | 49.46 | 44.89 | 83.54 | 57.7 | 82.07 | 30.42 | 72.31 | | text-embedding-ada-002 | 1536 | 8192 | 60.99 | 49.25 | 45.9 | 84.89 | 56.32 | 80.97 | 30.8 | 70.93 | | e5-small-v2 | 384 | 512 | 59.93 | 49.04 | 39.92 | 84.67 | 54.32 | 80.39 | 31.16 | 72.94 | | sentence-t5-xxl | 768 | 512 | 59.51 | 42.24 | 43.72 | 85.06 | 56.42 | 82.63 | 30.08 | 73.42 | | all-mpnet-base-v2 | 768 | 514 | 57.78 | 43.81 | 43.69 | 83.04 | 59.36 | 80.28 | 27.49 | 65.07 | | sgpt-bloom-7b1-msmarco | 4096 | 2048 | 57.59 | 48.22 | 38.93 | 81.9 | 55.65 | 77.74 | 33.6 | 66.19 | - **C-MTEB**: We create the benchmark C-MTEB for Chinese text embedding which consists of 31 datasets from 6 tasks. Please refer to C_MTEB for a detailed introduction. | Model | Embedding dimension | Avg | Retrieval | STS | PairClassification | Classification | Reranking | Clustering | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | **BAAI/bge-large-zh-v1.5** | 1024 | **64.53** | 70.46 | 56.25 | 81.6 | 69.13 | 65.84 | 48.99 | | BAAI/bge-base-zh-v1.5 | 768 | 63.13 | 69.49 | 53.72 | 79.75 | 68.07 | 65.39 | 47.53 | | BAAI/bge-small-zh-v1.5 | 512 | 57.82 | 61.77 | 49.11 | 70.41 | 63.96 | 60.92 | 44.18 | | BAAI/bge-large-zh | 1024 | 64.20 | 71.53 | 54.98 | 78.94 | 68.32 | 65.11 | 48.39 | | bge-large-zh-noinstruct | 1024 | 63.53 | 70.55 | 53 | 76.77 | 68.58 | 64.91 | 50.01 | | BAAI/bge-base-zh | 768 | 62.96 | 69.53 | 54.12 | 77.5 | 67.07 | 64.91 | 47.63 | | multilingual-e5-large | 1024 | 58.79 | 63.66 | 48.44 | 69.89 | 67.34 | 56.00 | 48.23 | | BAAI/bge-small-zh | 512 | 58.27 | 63.07 | 49.45 | 70.35 | 63.64 | 61.48 | 45.09 | | m3e-base | 768 | 57.10 | 56.91 | 50.47 | 63.99 | 67.52 | 59.34 | 47.68 | | m3e-large | 1024 | 57.05 | 54.75 | 50.42 | 64.3 | 68.2 | 59.66 | 48.88 | | multilingual-e5-base | 768 | 55.48 | 61.63 | 46.49 | 67.07 | 65.35 | 54.35 | 40.68 | | multilingual-e5-small | 384 | 55.38 | 59.95 | 45.27 | 66.45 | 65.85 | 53.86 | 45.26 | | text-embedding-ada-002(OpenAI) | 1536 | 53.02 | 52.0 | 43.35 | 69.56 | 64.31 | 54.28 | 45.68 | | luotuo | 1024 | 49.37 | 44.4 | 42.78 | 66.62 | 61 | 49.25 | 44.39 | | text2vec-base | 768 | 47.63 | 38.79 | 43.41 | 67.41 | 62.19 | 49.45 | 37.66 | | text2vec-large | 1024 | 47.36 | 41.94 | 44.97 | 70.86 | 60.66 | 49.16 | 30.02 | - **Reranking**: See C_MTEB for evaluation script. | Model | T2Reranking | T2RerankingZh2En\\* | T2RerankingEn2Zh\\* | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | text2vec-base-multilingual | 64.66 | 62.94 | 62.51 | 14.37 | 48.46 | 48.6 | 50.26 | | multilingual-e5-small | 65.62 | 60.94 | 56.41 | 29.91 | 67.26 | 66.54 | 57.78 | | multilingual-e5-large | 64.55 | 61.61 | 54.28 | 28.6 | 67.42 | 67.92 | 57.4 | | multilingual-e5-base | 64.21 | 62.13 | 54.68 | 29.5 | 66.23 | 66.98 | 57.29 | | m3e-base | 66.03 | 62.74 | 56.07 | 17.51 | 77.05 | 76.76 | 59.36 | | m3e-large | 66.13 | 62.72 | 56.1 | 16.46 | 77.76 | 78.27 | 59.57 | | bge-base-zh-v1.5 | 66.49 | 63.25 | 57.02 | 29.74 | 80.47 | 84.88 | 63.64 | | bge-large-zh-v1.5 | 65.74 | 63.39 | 57.03 | 28.74 | 83.45 | 85.44 | 63.97 | | BAAI/bge-reranker-base | 67.28 | 63.95 | 60.45 | 35.46 | 81.26 | 84.1 | 65.42 | | BAAI/bge-reranker-large | 67.6 | 64.03 | 61.44 | 37.16 | 82.15 | 84.18 | 66.09 | \\* : T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks ## Train ### BAAI Embedding We pre-train the models using retromae and train them on large-scale pairs data using contrastive learning. **You can fine-tune the embedding model on your data following our examples.** We also provide a pre-train example. Note that the goal of pre-training is to reconstruct the text, and the pre-trained model cannot be used for similarity calculation directly, it needs to be fine-tuned. More training details for bge see baai_general_embedding. ### BGE Reranker Cross-encoder will perform full-attention over the input pair, which is more accurate than embedding model (i.e., bi-encoder) but more time-consuming than embedding model. Therefore, it can be used to re-rank the top-k documents returned by embedding model. We train the cross-encoder on a multilingual pair data, The data format is the same as embedding model, so you can fine-tune it easily following our example. More details please refer to ./FlagEmbedding/reranker/README.md ## Contact If you have any question or suggestion related to this project, feel free to open an issue or pull request. You also can email Shitao Xiao([email protected]) and Zheng Liu([email protected]). ## Citation If you find this repository useful, please consider giving a star :star: and citation ## License FlagEmbedding is licensed under the MIT License. The released models can be used for commercial purposes free of charge.",
21
- "model_explanation_gemini": "Generates dense vector embeddings for Chinese text to enable retrieval, classification, clustering, or semantic search tasks. \n\n**Features**: \n- Maps text to low-dimensional dense vectors \n- Optimized for Chinese language \n- Version 1.5 improves similarity distribution and retrieval ability \n- Supports fine-tuning and inference \n\n**Comparison**: \nBGE-base-zh-v1.5 is a Chinese embedding model that outperforms smaller models (e.g., bge-small-zh-v1.5) while"
 
 
 
 
 
 
22
  }
 
18
  "region:us"
19
  ],
20
  "description": "--- tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers license: mit language: - zh --- <h1 align=\"center\">FlagEmbedding</h1> <h4 align=\"center\"> <p> <a href=#model-list>Model List</a> | <a href=#frequently-asked-questions>FAQ</a> | <a href=#usage>Usage</a> | <a href=\"#evaluation\">Evaluation</a> | <a href=\"#train\">Train</a> | <a href=\"#contact\">Contact</a> | <a href=\"#citation\">Citation</a> | <a href=\"#license\">License</a> <p> </h4> More details please refer to our Github: FlagEmbedding. English | 中文 FlagEmbedding can map any text to a low-dimensional dense vector which can be used for tasks like retrieval, classification, clustering, or semantic search. And it also can be used in vector databases for LLMs. ************* 🌟**Updates**🌟 ************* - 10/12/2023: Release LLM-Embedder, a unified embedding model to support diverse retrieval augmentation needs for LLMs. Paper :fire: - 09/15/2023: The technical report of BGE has been released - 09/15/2023: The masive training data of BGE has been released - 09/12/2023: New models: - **New reranker model**: release cross-encoder models and , which are more powerful than embedding model. We recommend to use/fine-tune them to re-rank top-k documents returned by embedding models. - **update embedding model**: release embedding model to alleviate the issue of the similarity distribution, and enhance its retrieval ability without instruction. <details> <summary>More</summary> <!-- ### More --> - 09/07/2023: Update fine-tune code: Add script to mine hard negatives and support adding instruction during fine-tuning. - 08/09/2023: BGE Models are integrated into **Langchain**, you can use it like this; C-MTEB **leaderboard** is available. - 08/05/2023: Release base-scale and small-scale models, **best performance among the models of the same size 🤗** - 08/02/2023: Release (short for BAAI General Embedding) Models, **rank 1st on MTEB and C-MTEB benchmark!** :tada: :tada: - 08/01/2023: We release the Chinese Massive Text Embedding Benchmark (**C-MTEB**), consisting of 31 test dataset. </details> ## Model List is short for . | Model | Language | | Description | query instruction for retrieval [1] | |:-------------------------------|:--------:| :--------:| :--------:|:--------:| | BAAI/llm-embedder | English | Inference Fine-tune | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See README | | BAAI/bge-reranker-large | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-reranker-base | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-large-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-en | English | Inference Fine-tune | :trophy: rank **1st** in MTEB leaderboard | | | BAAI/bge-base-en | English | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-en | English | Inference Fine-tune |a small-scale model but with competitive performance | | | BAAI/bge-large-zh | Chinese | Inference Fine-tune | :trophy: rank **1st** in C-MTEB benchmark | | | BAAI/bge-base-zh | Chinese | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-zh | Chinese | Inference Fine-tune | a small-scale model but with competitive performance | | [1\\]: If you need to search the relevant passages to a query, we suggest to add the instruction to the query; in other cases, no instruction is needed, just use the original query directly. In all cases, **no instruction** needs to be added to passages. [2\\]: Different from embedding model, reranker uses question and document as input and directly output similarity instead of embedding. To balance the accuracy and time cost, cross-encoder is widely used to re-rank top-k documents retrieved by other simple models. For examples, use bge embedding model to retrieve top 100 relevant documents, and then use bge reranker to re-rank the top 100 document to get the final top-3 results. All models have been uploaded to Huggingface Hub, and you can see them at If you cannot open the Huggingface Hub, you also can download the models at . ## Frequently asked questions <details> <summary>1. How to fine-tune bge embedding model?</summary> <!-- ### How to fine-tune bge embedding model? --> Following this example to prepare data and fine-tune your model. Some suggestions: - Mine hard negatives following this example, which can improve the retrieval performance. - If you pre-train bge on your data, the pre-trained model cannot be directly used to calculate similarity, and it must be fine-tuned with contrastive learning before computing similarity. - If the accuracy of the fine-tuned model is still not high, it is recommended to use/fine-tune the cross-encoder model (bge-reranker) to re-rank top-k results. Hard negatives also are needed to fine-tune reranker. </details> <details> <summary>2. The similarity score between two dissimilar sentences is higher than 0.5</summary> <!-- ### The similarity score between two dissimilar sentences is higher than 0.5 --> **Suggest to use bge v1.5, which alleviates the issue of the similarity distribution.** Since we finetune the models by contrastive learning with a temperature of 0.01, the similarity distribution of the current BGE model is about in the interval \\[0.6, 1\\]. So a similarity score greater than 0.5 does not indicate that the two sentences are similar. For downstream tasks, such as passage retrieval or semantic similarity, **what matters is the relative order of the scores, not the absolute value.** If you need to filter similar sentences based on a similarity threshold, please select an appropriate similarity threshold based on the similarity distribution on your data (such as 0.8, 0.85, or even 0.9). </details> <details> <summary>3. When does the query instruction need to be used</summary> <!-- ### When does the query instruction need to be used --> For the , we improve its retrieval ability when not using instruction. No instruction only has a slight degradation in retrieval performance compared with using instruction. So you can generate embedding without instruction in all cases for convenience. For a retrieval task that uses short queries to find long related documents, it is recommended to add instructions for these short queries. **The best method to decide whether to add instructions for queries is choosing the setting that achieves better performance on your task.** In all cases, the documents/passages do not need to add the instruction. </details> ## Usage ### Usage for Embedding Model Here are some examples for using models with FlagEmbedding, Sentence-Transformers, Langchain, or Huggingface Transformers. #### Using FlagEmbedding If it doesn't work for you, you can see FlagEmbedding for more methods to install FlagEmbedding. For the value of the argument , see Model List. By default, FlagModel will use all available GPUs when encoding. Please set to select specific GPUs. You also can set to make all GPUs unavailable. #### Using Sentence-Transformers You can also use the models with sentence-transformers: For s2p(short query to long passage) retrieval task, each short query should start with an instruction (instructions see Model List). But the instruction is not needed for passages. #### Using Langchain You can use in langchain like this: #### Using HuggingFace Transformers With the transformers package, you can use the model like this: First, you pass your input through the transformer model, then you select the last hidden state of the first token (i.e., [CLS]) as the sentence embedding. ### Usage for Reranker Different from embedding model, reranker uses question and document as input and directly output similarity instead of embedding. You can get a relevance score by inputting query and passage to the reranker. The reranker is optimized based cross-entropy loss, so the relevance score is not bounded to a specific range. #### Using FlagEmbedding Get relevance scores (higher scores indicate more relevance): #### Using Huggingface transformers ## Evaluation models achieve **state-of-the-art performance on both MTEB and C-MTEB leaderboard!** For more details and evaluation tools see our scripts. - **MTEB**: | Model Name | Dimension | Sequence Length | Average (56) | Retrieval (15) |Clustering (11) | Pair Classification (3) | Reranking (4) | STS (10) | Summarization (1) | Classification (12) | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | BAAI/bge-large-en-v1.5 | 1024 | 512 | **64.23** | **54.29** | 46.08 | 87.12 | 60.03 | 83.11 | 31.61 | 75.97 | | BAAI/bge-base-en-v1.5 | 768 | 512 | 63.55 | 53.25 | 45.77 | 86.55 | 58.86 | 82.4 | 31.07 | 75.53 | | BAAI/bge-small-en-v1.5 | 384 | 512 | 62.17 |51.68 | 43.82 | 84.92 | 58.36 | 81.59 | 30.12 | 74.14 | | bge-large-en | 1024 | 512 | 63.98 | 53.9 | 46.98 | 85.8 | 59.48 | 81.56 | 32.06 | 76.21 | | bge-base-en | 768 | 512 | 63.36 | 53.0 | 46.32 | 85.86 | 58.7 | 81.84 | 29.27 | 75.27 | | gte-large | 1024 | 512 | 63.13 | 52.22 | 46.84 | 85.00 | 59.13 | 83.35 | 31.66 | 73.33 | | gte-base | 768 | 512 | 62.39 | 51.14 | 46.2 | 84.57 | 58.61 | 82.3 | 31.17 | 73.01 | | e5-large-v2 | 1024| 512 | 62.25 | 50.56 | 44.49 | 86.03 | 56.61 | 82.05 | 30.19 | 75.24 | | bge-small-en | 384 | 512 | 62.11 | 51.82 | 44.31 | 83.78 | 57.97 | 80.72 | 30.53 | 74.37 | | instructor-xl | 768 | 512 | 61.79 | 49.26 | 44.74 | 86.62 | 57.29 | 83.06 | 32.32 | 61.79 | | e5-base-v2 | 768 | 512 | 61.5 | 50.29 | 43.80 | 85.73 | 55.91 | 81.05 | 30.28 | 73.84 | | gte-small | 384 | 512 | 61.36 | 49.46 | 44.89 | 83.54 | 57.7 | 82.07 | 30.42 | 72.31 | | text-embedding-ada-002 | 1536 | 8192 | 60.99 | 49.25 | 45.9 | 84.89 | 56.32 | 80.97 | 30.8 | 70.93 | | e5-small-v2 | 384 | 512 | 59.93 | 49.04 | 39.92 | 84.67 | 54.32 | 80.39 | 31.16 | 72.94 | | sentence-t5-xxl | 768 | 512 | 59.51 | 42.24 | 43.72 | 85.06 | 56.42 | 82.63 | 30.08 | 73.42 | | all-mpnet-base-v2 | 768 | 514 | 57.78 | 43.81 | 43.69 | 83.04 | 59.36 | 80.28 | 27.49 | 65.07 | | sgpt-bloom-7b1-msmarco | 4096 | 2048 | 57.59 | 48.22 | 38.93 | 81.9 | 55.65 | 77.74 | 33.6 | 66.19 | - **C-MTEB**: We create the benchmark C-MTEB for Chinese text embedding which consists of 31 datasets from 6 tasks. Please refer to C_MTEB for a detailed introduction. | Model | Embedding dimension | Avg | Retrieval | STS | PairClassification | Classification | Reranking | Clustering | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | **BAAI/bge-large-zh-v1.5** | 1024 | **64.53** | 70.46 | 56.25 | 81.6 | 69.13 | 65.84 | 48.99 | | BAAI/bge-base-zh-v1.5 | 768 | 63.13 | 69.49 | 53.72 | 79.75 | 68.07 | 65.39 | 47.53 | | BAAI/bge-small-zh-v1.5 | 512 | 57.82 | 61.77 | 49.11 | 70.41 | 63.96 | 60.92 | 44.18 | | BAAI/bge-large-zh | 1024 | 64.20 | 71.53 | 54.98 | 78.94 | 68.32 | 65.11 | 48.39 | | bge-large-zh-noinstruct | 1024 | 63.53 | 70.55 | 53 | 76.77 | 68.58 | 64.91 | 50.01 | | BAAI/bge-base-zh | 768 | 62.96 | 69.53 | 54.12 | 77.5 | 67.07 | 64.91 | 47.63 | | multilingual-e5-large | 1024 | 58.79 | 63.66 | 48.44 | 69.89 | 67.34 | 56.00 | 48.23 | | BAAI/bge-small-zh | 512 | 58.27 | 63.07 | 49.45 | 70.35 | 63.64 | 61.48 | 45.09 | | m3e-base | 768 | 57.10 | 56.91 | 50.47 | 63.99 | 67.52 | 59.34 | 47.68 | | m3e-large | 1024 | 57.05 | 54.75 | 50.42 | 64.3 | 68.2 | 59.66 | 48.88 | | multilingual-e5-base | 768 | 55.48 | 61.63 | 46.49 | 67.07 | 65.35 | 54.35 | 40.68 | | multilingual-e5-small | 384 | 55.38 | 59.95 | 45.27 | 66.45 | 65.85 | 53.86 | 45.26 | | text-embedding-ada-002(OpenAI) | 1536 | 53.02 | 52.0 | 43.35 | 69.56 | 64.31 | 54.28 | 45.68 | | luotuo | 1024 | 49.37 | 44.4 | 42.78 | 66.62 | 61 | 49.25 | 44.39 | | text2vec-base | 768 | 47.63 | 38.79 | 43.41 | 67.41 | 62.19 | 49.45 | 37.66 | | text2vec-large | 1024 | 47.36 | 41.94 | 44.97 | 70.86 | 60.66 | 49.16 | 30.02 | - **Reranking**: See C_MTEB for evaluation script. | Model | T2Reranking | T2RerankingZh2En\\* | T2RerankingEn2Zh\\* | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | text2vec-base-multilingual | 64.66 | 62.94 | 62.51 | 14.37 | 48.46 | 48.6 | 50.26 | | multilingual-e5-small | 65.62 | 60.94 | 56.41 | 29.91 | 67.26 | 66.54 | 57.78 | | multilingual-e5-large | 64.55 | 61.61 | 54.28 | 28.6 | 67.42 | 67.92 | 57.4 | | multilingual-e5-base | 64.21 | 62.13 | 54.68 | 29.5 | 66.23 | 66.98 | 57.29 | | m3e-base | 66.03 | 62.74 | 56.07 | 17.51 | 77.05 | 76.76 | 59.36 | | m3e-large | 66.13 | 62.72 | 56.1 | 16.46 | 77.76 | 78.27 | 59.57 | | bge-base-zh-v1.5 | 66.49 | 63.25 | 57.02 | 29.74 | 80.47 | 84.88 | 63.64 | | bge-large-zh-v1.5 | 65.74 | 63.39 | 57.03 | 28.74 | 83.45 | 85.44 | 63.97 | | BAAI/bge-reranker-base | 67.28 | 63.95 | 60.45 | 35.46 | 81.26 | 84.1 | 65.42 | | BAAI/bge-reranker-large | 67.6 | 64.03 | 61.44 | 37.16 | 82.15 | 84.18 | 66.09 | \\* : T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks ## Train ### BAAI Embedding We pre-train the models using retromae and train them on large-scale pairs data using contrastive learning. **You can fine-tune the embedding model on your data following our examples.** We also provide a pre-train example. Note that the goal of pre-training is to reconstruct the text, and the pre-trained model cannot be used for similarity calculation directly, it needs to be fine-tuned. More training details for bge see baai_general_embedding. ### BGE Reranker Cross-encoder will perform full-attention over the input pair, which is more accurate than embedding model (i.e., bi-encoder) but more time-consuming than embedding model. Therefore, it can be used to re-rank the top-k documents returned by embedding model. We train the cross-encoder on a multilingual pair data, The data format is the same as embedding model, so you can fine-tune it easily following our example. More details please refer to ./FlagEmbedding/reranker/README.md ## Contact If you have any question or suggestion related to this project, feel free to open an issue or pull request. You also can email Shitao Xiao([email protected]) and Zheng Liu([email protected]). ## Citation If you find this repository useful, please consider giving a star :star: and citation ## License FlagEmbedding is licensed under the MIT License. The released models can be used for commercial purposes free of charge.",
21
+ "model_explanation_gemini": "Generates dense vector embeddings for Chinese text to enable retrieval, classification, clustering, or semantic search tasks. \n\n**Features**: \n- Maps text to low-dimensional dense vectors \n- Optimized for Chinese language \n- Version 1.5 improves similarity distribution and retrieval ability \n- Supports fine-tuning and inference \n\n**Comparison**: \nBGE-base-zh-v1.5 is a Chinese embedding model that outperforms smaller models (e.g., bge-small-zh-v1.5) while",
22
+ "release_year": "2023",
23
+ "parameter_count": null,
24
+ "is_fine_tuned": false,
25
+ "category": "Embedding",
26
+ "model_family": "BERT",
27
+ "api_enhanced": true
28
  }
model_data_json/BAAI_bge-base-zh.json CHANGED
@@ -16,5 +16,11 @@
16
  "region:us"
17
  ],
18
  "description": "--- license: mit language: - zh --- **Recommend switching to newest BAAI/bge-base-zh-v1.5, which has more reasonable similarity distribution and same method of usage.** <h1 align=\"center\">FlagEmbedding</h1> <h4 align=\"center\"> <p> <a href=#model-list>Model List</a> | <a href=#frequently-asked-questions>FAQ</a> | <a href=#usage>Usage</a> | <a href=\"#evaluation\">Evaluation</a> | <a href=\"#train\">Train</a> | <a href=\"#contact\">Contact</a> | <a href=\"#citation\">Citation</a> | <a href=\"#license\">License</a> <p> </h4> More details please refer to our Github: FlagEmbedding. English | 中文 FlagEmbedding can map any text to a low-dimensional dense vector which can be used for tasks like retrieval, classification, clustering, or semantic search. And it also can be used in vector databases for LLMs. ************* 🌟**Updates**🌟 ************* - 10/12/2023: Release LLM-Embedder, a unified embedding model to support diverse retrieval augmentation needs for LLMs. Paper :fire: - 09/15/2023: The technical report of BGE has been released - 09/15/2023: The masive training data of BGE has been released - 09/12/2023: New models: - **New reranker model**: release cross-encoder models and , which are more powerful than embedding model. We recommend to use/fine-tune them to re-rank top-k documents returned by embedding models. - **update embedding model**: release embedding model to alleviate the issue of the similarity distribution, and enhance its retrieval ability without instruction. <details> <summary>More</summary> <!-- ### More --> - 09/07/2023: Update fine-tune code: Add script to mine hard negatives and support adding instruction during fine-tuning. - 08/09/2023: BGE Models are integrated into **Langchain**, you can use it like this; C-MTEB **leaderboard** is available. - 08/05/2023: Release base-scale and small-scale models, **best performance among the models of the same size 🤗** - 08/02/2023: Release (short for BAAI General Embedding) Models, **rank 1st on MTEB and C-MTEB benchmark!** :tada: :tada: - 08/01/2023: We release the Chinese Massive Text Embedding Benchmark (**C-MTEB**), consisting of 31 test dataset. </details> ## Model List is short for . | Model | Language | | Description | query instruction for retrieval [1] | |:-------------------------------|:--------:| :--------:| :--------:|:--------:| | BAAI/llm-embedder | English | Inference Fine-tune | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See README | | BAAI/bge-reranker-large | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-reranker-base | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-large-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-en | English | Inference Fine-tune | :trophy: rank **1st** in MTEB leaderboard | | | BAAI/bge-base-en | English | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-en | English | Inference Fine-tune |a small-scale model but with competitive performance | | | BAAI/bge-large-zh | Chinese | Inference Fine-tune | :trophy: rank **1st** in C-MTEB benchmark | | | BAAI/bge-base-zh | Chinese | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-zh | Chinese | Inference Fine-tune | a small-scale model but with competitive performance | | [1\\]: If you need to search the relevant passages to a query, we suggest to add the instruction to the query; in other cases, no instruction is needed, just use the original query directly. In all cases, **no instruction** needs to be added to passages. [2\\]: Different from embedding model, reranker uses question and document as input and directly output similarity instead of embedding. To balance the accuracy and time cost, cross-encoder is widely used to re-rank top-k documents retrieved by other simple models. For examples, use bge embedding model to retrieve top 100 relevant documents, and then use bge reranker to re-rank the top 100 document to get the final top-3 results. All models have been uploaded to Huggingface Hub, and you can see them at If you cannot open the Huggingface Hub, you also can download the models at . ## Frequently asked questions <details> <summary>1. How to fine-tune bge embedding model?</summary> <!-- ### How to fine-tune bge embedding model? --> Following this example to prepare data and fine-tune your model. Some suggestions: - Mine hard negatives following this example, which can improve the retrieval performance. - If you pre-train bge on your data, the pre-trained model cannot be directly used to calculate similarity, and it must be fine-tuned with contrastive learning before computing similarity. - If the accuracy of the fine-tuned model is still not high, it is recommended to use/fine-tune the cross-encoder model (bge-reranker) to re-rank top-k results. Hard negatives also are needed to fine-tune reranker. </details> <details> <summary>2. The similarity score between two dissimilar sentences is higher than 0.5</summary> <!-- ### The similarity score between two dissimilar sentences is higher than 0.5 --> **Suggest to use bge v1.5, which alleviates the issue of the similarity distribution.** Since we finetune the models by contrastive learning with a temperature of 0.01, the similarity distribution of the current BGE model is about in the interval \\[0.6, 1\\]. So a similarity score greater than 0.5 does not indicate that the two sentences are similar. For downstream tasks, such as passage retrieval or semantic similarity, **what matters is the relative order of the scores, not the absolute value.** If you need to filter similar sentences based on a similarity threshold, please select an appropriate similarity threshold based on the similarity distribution on your data (such as 0.8, 0.85, or even 0.9). </details> <details> <summary>3. When does the query instruction need to be used</summary> <!-- ### When does the query instruction need to be used --> For the , we improve its retrieval ability when not using instruction. No instruction only has a slight degradation in retrieval performance compared with using instruction. So you can generate embedding without instruction in all cases for convenience. For a retrieval task that uses short queries to find long related documents, it is recommended to add instructions for these short queries. **The best method to decide whether to add instructions for queries is choosing the setting that achieves better performance on your task.** In all cases, the documents/passages do not need to add the instruction. </details> ## Usage ### Usage for Embedding Model Here are some examples for using models with FlagEmbedding, Sentence-Transformers, Langchain, or Huggingface Transformers. #### Using FlagEmbedding If it doesn't work for you, you can see FlagEmbedding for more methods to install FlagEmbedding. For the value of the argument , see Model List. By default, FlagModel will use all available GPUs when encoding. Please set to select specific GPUs. You also can set to make all GPUs unavailable. #### Using Sentence-Transformers You can also use the models with sentence-transformers: For s2p(short query to long passage) retrieval task, each short query should start with an instruction (instructions see Model List). But the instruction is not needed for passages. #### Using Langchain You can use in langchain like this: #### Using HuggingFace Transformers With the transformers package, you can use the model like this: First, you pass your input through the transformer model, then you select the last hidden state of the first token (i.e., [CLS]) as the sentence embedding. ### Usage for Reranker Different from embedding model, reranker uses question and document as input and directly output similarity instead of embedding. You can get a relevance score by inputting query and passage to the reranker. The reranker is optimized based cross-entropy loss, so the relevance score is not bounded to a specific range. #### Using FlagEmbedding Get relevance scores (higher scores indicate more relevance): #### Using Huggingface transformers ## Evaluation models achieve **state-of-the-art performance on both MTEB and C-MTEB leaderboard!** For more details and evaluation tools see our scripts. - **MTEB**: | Model Name | Dimension | Sequence Length | Average (56) | Retrieval (15) |Clustering (11) | Pair Classification (3) | Reranking (4) | STS (10) | Summarization (1) | Classification (12) | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | BAAI/bge-large-en-v1.5 | 1024 | 512 | **64.23** | **54.29** | 46.08 | 87.12 | 60.03 | 83.11 | 31.61 | 75.97 | | BAAI/bge-base-en-v1.5 | 768 | 512 | 63.55 | 53.25 | 45.77 | 86.55 | 58.86 | 82.4 | 31.07 | 75.53 | | BAAI/bge-small-en-v1.5 | 384 | 512 | 62.17 |51.68 | 43.82 | 84.92 | 58.36 | 81.59 | 30.12 | 74.14 | | bge-large-en | 1024 | 512 | 63.98 | 53.9 | 46.98 | 85.8 | 59.48 | 81.56 | 32.06 | 76.21 | | bge-base-en | 768 | 512 | 63.36 | 53.0 | 46.32 | 85.86 | 58.7 | 81.84 | 29.27 | 75.27 | | gte-large | 1024 | 512 | 63.13 | 52.22 | 46.84 | 85.00 | 59.13 | 83.35 | 31.66 | 73.33 | | gte-base | 768 | 512 | 62.39 | 51.14 | 46.2 | 84.57 | 58.61 | 82.3 | 31.17 | 73.01 | | e5-large-v2 | 1024| 512 | 62.25 | 50.56 | 44.49 | 86.03 | 56.61 | 82.05 | 30.19 | 75.24 | | bge-small-en | 384 | 512 | 62.11 | 51.82 | 44.31 | 83.78 | 57.97 | 80.72 | 30.53 | 74.37 | | instructor-xl | 768 | 512 | 61.79 | 49.26 | 44.74 | 86.62 | 57.29 | 83.06 | 32.32 | 61.79 | | e5-base-v2 | 768 | 512 | 61.5 | 50.29 | 43.80 | 85.73 | 55.91 | 81.05 | 30.28 | 73.84 | | gte-small | 384 | 512 | 61.36 | 49.46 | 44.89 | 83.54 | 57.7 | 82.07 | 30.42 | 72.31 | | text-embedding-ada-002 | 1536 | 8192 | 60.99 | 49.25 | 45.9 | 84.89 | 56.32 | 80.97 | 30.8 | 70.93 | | e5-small-v2 | 384 | 512 | 59.93 | 49.04 | 39.92 | 84.67 | 54.32 | 80.39 | 31.16 | 72.94 | | sentence-t5-xxl | 768 | 512 | 59.51 | 42.24 | 43.72 | 85.06 | 56.42 | 82.63 | 30.08 | 73.42 | | all-mpnet-base-v2 | 768 | 514 | 57.78 | 43.81 | 43.69 | 83.04 | 59.36 | 80.28 | 27.49 | 65.07 | | sgpt-bloom-7b1-msmarco | 4096 | 2048 | 57.59 | 48.22 | 38.93 | 81.9 | 55.65 | 77.74 | 33.6 | 66.19 | - **C-MTEB**: We create the benchmark C-MTEB for Chinese text embedding which consists of 31 datasets from 6 tasks. Please refer to C_MTEB for a detailed introduction. | Model | Embedding dimension | Avg | Retrieval | STS | PairClassification | Classification | Reranking | Clustering | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | **BAAI/bge-large-zh-v1.5** | 1024 | **64.53** | 70.46 | 56.25 | 81.6 | 69.13 | 65.84 | 48.99 | | BAAI/bge-base-zh-v1.5 | 768 | 63.13 | 69.49 | 53.72 | 79.75 | 68.07 | 65.39 | 47.53 | | BAAI/bge-small-zh-v1.5 | 512 | 57.82 | 61.77 | 49.11 | 70.41 | 63.96 | 60.92 | 44.18 | | BAAI/bge-large-zh | 1024 | 64.20 | 71.53 | 54.98 | 78.94 | 68.32 | 65.11 | 48.39 | | bge-large-zh-noinstruct | 1024 | 63.53 | 70.55 | 53 | 76.77 | 68.58 | 64.91 | 50.01 | | BAAI/bge-base-zh | 768 | 62.96 | 69.53 | 54.12 | 77.5 | 67.07 | 64.91 | 47.63 | | multilingual-e5-large | 1024 | 58.79 | 63.66 | 48.44 | 69.89 | 67.34 | 56.00 | 48.23 | | BAAI/bge-small-zh | 512 | 58.27 | 63.07 | 49.45 | 70.35 | 63.64 | 61.48 | 45.09 | | m3e-base | 768 | 57.10 | 56.91 | 50.47 | 63.99 | 67.52 | 59.34 | 47.68 | | m3e-large | 1024 | 57.05 | 54.75 | 50.42 | 64.3 | 68.2 | 59.66 | 48.88 | | multilingual-e5-base | 768 | 55.48 | 61.63 | 46.49 | 67.07 | 65.35 | 54.35 | 40.68 | | multilingual-e5-small | 384 | 55.38 | 59.95 | 45.27 | 66.45 | 65.85 | 53.86 | 45.26 | | text-embedding-ada-002(OpenAI) | 1536 | 53.02 | 52.0 | 43.35 | 69.56 | 64.31 | 54.28 | 45.68 | | luotuo | 1024 | 49.37 | 44.4 | 42.78 | 66.62 | 61 | 49.25 | 44.39 | | text2vec-base | 768 | 47.63 | 38.79 | 43.41 | 67.41 | 62.19 | 49.45 | 37.66 | | text2vec-large | 1024 | 47.36 | 41.94 | 44.97 | 70.86 | 60.66 | 49.16 | 30.02 | - **Reranking**: See C_MTEB for evaluation script. | Model | T2Reranking | T2RerankingZh2En\\* | T2RerankingEn2Zh\\* | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | text2vec-base-multilingual | 64.66 | 62.94 | 62.51 | 14.37 | 48.46 | 48.6 | 50.26 | | multilingual-e5-small | 65.62 | 60.94 | 56.41 | 29.91 | 67.26 | 66.54 | 57.78 | | multilingual-e5-large | 64.55 | 61.61 | 54.28 | 28.6 | 67.42 | 67.92 | 57.4 | | multilingual-e5-base | 64.21 | 62.13 | 54.68 | 29.5 | 66.23 | 66.98 | 57.29 | | m3e-base | 66.03 | 62.74 | 56.07 | 17.51 | 77.05 | 76.76 | 59.36 | | m3e-large | 66.13 | 62.72 | 56.1 | 16.46 | 77.76 | 78.27 | 59.57 | | bge-base-zh-v1.5 | 66.49 | 63.25 | 57.02 | 29.74 | 80.47 | 84.88 | 63.64 | | bge-large-zh-v1.5 | 65.74 | 63.39 | 57.03 | 28.74 | 83.45 | 85.44 | 63.97 | | BAAI/bge-reranker-base | 67.28 | 63.95 | 60.45 | 35.46 | 81.26 | 84.1 | 65.42 | | BAAI/bge-reranker-large | 67.6 | 64.03 | 61.44 | 37.16 | 82.15 | 84.18 | 66.09 | \\* : T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks ## Train ### BAAI Embedding We pre-train the models using retromae and train them on large-scale pairs data using contrastive learning. **You can fine-tune the embedding model on your data following our examples.** We also provide a pre-train example. Note that the goal of pre-training is to reconstruct the text, and the pre-trained model cannot be used for similarity calculation directly, it needs to be fine-tuned. More training details for bge see baai_general_embedding. ### BGE Reranker Cross-encoder will perform full-attention over the input pair, which is more accurate than embedding model (i.e., bi-encoder) but more time-consuming than embedding model. Therefore, it can be used to re-rank the top-k documents returned by embedding model. We train the cross-encoder on a multilingual pair data, The data format is the same as embedding model, so you can fine-tune it easily following our example. More details please refer to ./FlagEmbedding/reranker/README.md ## Contact If you have any question or suggestion related to this project, feel free to open an issue or pull request. You also can email Shitao Xiao([email protected]) and Zheng Liu([email protected]). ## Citation If you find this repository useful, please consider giving a star :star: and citation ## License FlagEmbedding is licensed under the MIT License. The released models can be used for commercial purposes free of charge.",
19
- "model_explanation_gemini": "Generates dense vector embeddings for Chinese text to enable retrieval, classification, clustering, or semantic search tasks, including use in LLM vector databases. \n\n**Features**: \n- Language: Chinese \n- Task: Text embedding for retrieval, classification, clustering, semantic search \n- Usage: Compatible with vector databases for LLMs \n- Comparison: Base-scale model with performance similar to larger BGE models; outperforms same-size alternatives (ranked 1st on C-MTEB benchmark)."
 
 
 
 
 
 
20
  }
 
16
  "region:us"
17
  ],
18
  "description": "--- license: mit language: - zh --- **Recommend switching to newest BAAI/bge-base-zh-v1.5, which has more reasonable similarity distribution and same method of usage.** <h1 align=\"center\">FlagEmbedding</h1> <h4 align=\"center\"> <p> <a href=#model-list>Model List</a> | <a href=#frequently-asked-questions>FAQ</a> | <a href=#usage>Usage</a> | <a href=\"#evaluation\">Evaluation</a> | <a href=\"#train\">Train</a> | <a href=\"#contact\">Contact</a> | <a href=\"#citation\">Citation</a> | <a href=\"#license\">License</a> <p> </h4> More details please refer to our Github: FlagEmbedding. English | 中文 FlagEmbedding can map any text to a low-dimensional dense vector which can be used for tasks like retrieval, classification, clustering, or semantic search. And it also can be used in vector databases for LLMs. ************* 🌟**Updates**🌟 ************* - 10/12/2023: Release LLM-Embedder, a unified embedding model to support diverse retrieval augmentation needs for LLMs. Paper :fire: - 09/15/2023: The technical report of BGE has been released - 09/15/2023: The masive training data of BGE has been released - 09/12/2023: New models: - **New reranker model**: release cross-encoder models and , which are more powerful than embedding model. We recommend to use/fine-tune them to re-rank top-k documents returned by embedding models. - **update embedding model**: release embedding model to alleviate the issue of the similarity distribution, and enhance its retrieval ability without instruction. <details> <summary>More</summary> <!-- ### More --> - 09/07/2023: Update fine-tune code: Add script to mine hard negatives and support adding instruction during fine-tuning. - 08/09/2023: BGE Models are integrated into **Langchain**, you can use it like this; C-MTEB **leaderboard** is available. - 08/05/2023: Release base-scale and small-scale models, **best performance among the models of the same size 🤗** - 08/02/2023: Release (short for BAAI General Embedding) Models, **rank 1st on MTEB and C-MTEB benchmark!** :tada: :tada: - 08/01/2023: We release the Chinese Massive Text Embedding Benchmark (**C-MTEB**), consisting of 31 test dataset. </details> ## Model List is short for . | Model | Language | | Description | query instruction for retrieval [1] | |:-------------------------------|:--------:| :--------:| :--------:|:--------:| | BAAI/llm-embedder | English | Inference Fine-tune | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See README | | BAAI/bge-reranker-large | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-reranker-base | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-large-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-en | English | Inference Fine-tune | :trophy: rank **1st** in MTEB leaderboard | | | BAAI/bge-base-en | English | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-en | English | Inference Fine-tune |a small-scale model but with competitive performance | | | BAAI/bge-large-zh | Chinese | Inference Fine-tune | :trophy: rank **1st** in C-MTEB benchmark | | | BAAI/bge-base-zh | Chinese | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-zh | Chinese | Inference Fine-tune | a small-scale model but with competitive performance | | [1\\]: If you need to search the relevant passages to a query, we suggest to add the instruction to the query; in other cases, no instruction is needed, just use the original query directly. In all cases, **no instruction** needs to be added to passages. [2\\]: Different from embedding model, reranker uses question and document as input and directly output similarity instead of embedding. To balance the accuracy and time cost, cross-encoder is widely used to re-rank top-k documents retrieved by other simple models. For examples, use bge embedding model to retrieve top 100 relevant documents, and then use bge reranker to re-rank the top 100 document to get the final top-3 results. All models have been uploaded to Huggingface Hub, and you can see them at If you cannot open the Huggingface Hub, you also can download the models at . ## Frequently asked questions <details> <summary>1. How to fine-tune bge embedding model?</summary> <!-- ### How to fine-tune bge embedding model? --> Following this example to prepare data and fine-tune your model. Some suggestions: - Mine hard negatives following this example, which can improve the retrieval performance. - If you pre-train bge on your data, the pre-trained model cannot be directly used to calculate similarity, and it must be fine-tuned with contrastive learning before computing similarity. - If the accuracy of the fine-tuned model is still not high, it is recommended to use/fine-tune the cross-encoder model (bge-reranker) to re-rank top-k results. Hard negatives also are needed to fine-tune reranker. </details> <details> <summary>2. The similarity score between two dissimilar sentences is higher than 0.5</summary> <!-- ### The similarity score between two dissimilar sentences is higher than 0.5 --> **Suggest to use bge v1.5, which alleviates the issue of the similarity distribution.** Since we finetune the models by contrastive learning with a temperature of 0.01, the similarity distribution of the current BGE model is about in the interval \\[0.6, 1\\]. So a similarity score greater than 0.5 does not indicate that the two sentences are similar. For downstream tasks, such as passage retrieval or semantic similarity, **what matters is the relative order of the scores, not the absolute value.** If you need to filter similar sentences based on a similarity threshold, please select an appropriate similarity threshold based on the similarity distribution on your data (such as 0.8, 0.85, or even 0.9). </details> <details> <summary>3. When does the query instruction need to be used</summary> <!-- ### When does the query instruction need to be used --> For the , we improve its retrieval ability when not using instruction. No instruction only has a slight degradation in retrieval performance compared with using instruction. So you can generate embedding without instruction in all cases for convenience. For a retrieval task that uses short queries to find long related documents, it is recommended to add instructions for these short queries. **The best method to decide whether to add instructions for queries is choosing the setting that achieves better performance on your task.** In all cases, the documents/passages do not need to add the instruction. </details> ## Usage ### Usage for Embedding Model Here are some examples for using models with FlagEmbedding, Sentence-Transformers, Langchain, or Huggingface Transformers. #### Using FlagEmbedding If it doesn't work for you, you can see FlagEmbedding for more methods to install FlagEmbedding. For the value of the argument , see Model List. By default, FlagModel will use all available GPUs when encoding. Please set to select specific GPUs. You also can set to make all GPUs unavailable. #### Using Sentence-Transformers You can also use the models with sentence-transformers: For s2p(short query to long passage) retrieval task, each short query should start with an instruction (instructions see Model List). But the instruction is not needed for passages. #### Using Langchain You can use in langchain like this: #### Using HuggingFace Transformers With the transformers package, you can use the model like this: First, you pass your input through the transformer model, then you select the last hidden state of the first token (i.e., [CLS]) as the sentence embedding. ### Usage for Reranker Different from embedding model, reranker uses question and document as input and directly output similarity instead of embedding. You can get a relevance score by inputting query and passage to the reranker. The reranker is optimized based cross-entropy loss, so the relevance score is not bounded to a specific range. #### Using FlagEmbedding Get relevance scores (higher scores indicate more relevance): #### Using Huggingface transformers ## Evaluation models achieve **state-of-the-art performance on both MTEB and C-MTEB leaderboard!** For more details and evaluation tools see our scripts. - **MTEB**: | Model Name | Dimension | Sequence Length | Average (56) | Retrieval (15) |Clustering (11) | Pair Classification (3) | Reranking (4) | STS (10) | Summarization (1) | Classification (12) | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | BAAI/bge-large-en-v1.5 | 1024 | 512 | **64.23** | **54.29** | 46.08 | 87.12 | 60.03 | 83.11 | 31.61 | 75.97 | | BAAI/bge-base-en-v1.5 | 768 | 512 | 63.55 | 53.25 | 45.77 | 86.55 | 58.86 | 82.4 | 31.07 | 75.53 | | BAAI/bge-small-en-v1.5 | 384 | 512 | 62.17 |51.68 | 43.82 | 84.92 | 58.36 | 81.59 | 30.12 | 74.14 | | bge-large-en | 1024 | 512 | 63.98 | 53.9 | 46.98 | 85.8 | 59.48 | 81.56 | 32.06 | 76.21 | | bge-base-en | 768 | 512 | 63.36 | 53.0 | 46.32 | 85.86 | 58.7 | 81.84 | 29.27 | 75.27 | | gte-large | 1024 | 512 | 63.13 | 52.22 | 46.84 | 85.00 | 59.13 | 83.35 | 31.66 | 73.33 | | gte-base | 768 | 512 | 62.39 | 51.14 | 46.2 | 84.57 | 58.61 | 82.3 | 31.17 | 73.01 | | e5-large-v2 | 1024| 512 | 62.25 | 50.56 | 44.49 | 86.03 | 56.61 | 82.05 | 30.19 | 75.24 | | bge-small-en | 384 | 512 | 62.11 | 51.82 | 44.31 | 83.78 | 57.97 | 80.72 | 30.53 | 74.37 | | instructor-xl | 768 | 512 | 61.79 | 49.26 | 44.74 | 86.62 | 57.29 | 83.06 | 32.32 | 61.79 | | e5-base-v2 | 768 | 512 | 61.5 | 50.29 | 43.80 | 85.73 | 55.91 | 81.05 | 30.28 | 73.84 | | gte-small | 384 | 512 | 61.36 | 49.46 | 44.89 | 83.54 | 57.7 | 82.07 | 30.42 | 72.31 | | text-embedding-ada-002 | 1536 | 8192 | 60.99 | 49.25 | 45.9 | 84.89 | 56.32 | 80.97 | 30.8 | 70.93 | | e5-small-v2 | 384 | 512 | 59.93 | 49.04 | 39.92 | 84.67 | 54.32 | 80.39 | 31.16 | 72.94 | | sentence-t5-xxl | 768 | 512 | 59.51 | 42.24 | 43.72 | 85.06 | 56.42 | 82.63 | 30.08 | 73.42 | | all-mpnet-base-v2 | 768 | 514 | 57.78 | 43.81 | 43.69 | 83.04 | 59.36 | 80.28 | 27.49 | 65.07 | | sgpt-bloom-7b1-msmarco | 4096 | 2048 | 57.59 | 48.22 | 38.93 | 81.9 | 55.65 | 77.74 | 33.6 | 66.19 | - **C-MTEB**: We create the benchmark C-MTEB for Chinese text embedding which consists of 31 datasets from 6 tasks. Please refer to C_MTEB for a detailed introduction. | Model | Embedding dimension | Avg | Retrieval | STS | PairClassification | Classification | Reranking | Clustering | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | **BAAI/bge-large-zh-v1.5** | 1024 | **64.53** | 70.46 | 56.25 | 81.6 | 69.13 | 65.84 | 48.99 | | BAAI/bge-base-zh-v1.5 | 768 | 63.13 | 69.49 | 53.72 | 79.75 | 68.07 | 65.39 | 47.53 | | BAAI/bge-small-zh-v1.5 | 512 | 57.82 | 61.77 | 49.11 | 70.41 | 63.96 | 60.92 | 44.18 | | BAAI/bge-large-zh | 1024 | 64.20 | 71.53 | 54.98 | 78.94 | 68.32 | 65.11 | 48.39 | | bge-large-zh-noinstruct | 1024 | 63.53 | 70.55 | 53 | 76.77 | 68.58 | 64.91 | 50.01 | | BAAI/bge-base-zh | 768 | 62.96 | 69.53 | 54.12 | 77.5 | 67.07 | 64.91 | 47.63 | | multilingual-e5-large | 1024 | 58.79 | 63.66 | 48.44 | 69.89 | 67.34 | 56.00 | 48.23 | | BAAI/bge-small-zh | 512 | 58.27 | 63.07 | 49.45 | 70.35 | 63.64 | 61.48 | 45.09 | | m3e-base | 768 | 57.10 | 56.91 | 50.47 | 63.99 | 67.52 | 59.34 | 47.68 | | m3e-large | 1024 | 57.05 | 54.75 | 50.42 | 64.3 | 68.2 | 59.66 | 48.88 | | multilingual-e5-base | 768 | 55.48 | 61.63 | 46.49 | 67.07 | 65.35 | 54.35 | 40.68 | | multilingual-e5-small | 384 | 55.38 | 59.95 | 45.27 | 66.45 | 65.85 | 53.86 | 45.26 | | text-embedding-ada-002(OpenAI) | 1536 | 53.02 | 52.0 | 43.35 | 69.56 | 64.31 | 54.28 | 45.68 | | luotuo | 1024 | 49.37 | 44.4 | 42.78 | 66.62 | 61 | 49.25 | 44.39 | | text2vec-base | 768 | 47.63 | 38.79 | 43.41 | 67.41 | 62.19 | 49.45 | 37.66 | | text2vec-large | 1024 | 47.36 | 41.94 | 44.97 | 70.86 | 60.66 | 49.16 | 30.02 | - **Reranking**: See C_MTEB for evaluation script. | Model | T2Reranking | T2RerankingZh2En\\* | T2RerankingEn2Zh\\* | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | text2vec-base-multilingual | 64.66 | 62.94 | 62.51 | 14.37 | 48.46 | 48.6 | 50.26 | | multilingual-e5-small | 65.62 | 60.94 | 56.41 | 29.91 | 67.26 | 66.54 | 57.78 | | multilingual-e5-large | 64.55 | 61.61 | 54.28 | 28.6 | 67.42 | 67.92 | 57.4 | | multilingual-e5-base | 64.21 | 62.13 | 54.68 | 29.5 | 66.23 | 66.98 | 57.29 | | m3e-base | 66.03 | 62.74 | 56.07 | 17.51 | 77.05 | 76.76 | 59.36 | | m3e-large | 66.13 | 62.72 | 56.1 | 16.46 | 77.76 | 78.27 | 59.57 | | bge-base-zh-v1.5 | 66.49 | 63.25 | 57.02 | 29.74 | 80.47 | 84.88 | 63.64 | | bge-large-zh-v1.5 | 65.74 | 63.39 | 57.03 | 28.74 | 83.45 | 85.44 | 63.97 | | BAAI/bge-reranker-base | 67.28 | 63.95 | 60.45 | 35.46 | 81.26 | 84.1 | 65.42 | | BAAI/bge-reranker-large | 67.6 | 64.03 | 61.44 | 37.16 | 82.15 | 84.18 | 66.09 | \\* : T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks ## Train ### BAAI Embedding We pre-train the models using retromae and train them on large-scale pairs data using contrastive learning. **You can fine-tune the embedding model on your data following our examples.** We also provide a pre-train example. Note that the goal of pre-training is to reconstruct the text, and the pre-trained model cannot be used for similarity calculation directly, it needs to be fine-tuned. More training details for bge see baai_general_embedding. ### BGE Reranker Cross-encoder will perform full-attention over the input pair, which is more accurate than embedding model (i.e., bi-encoder) but more time-consuming than embedding model. Therefore, it can be used to re-rank the top-k documents returned by embedding model. We train the cross-encoder on a multilingual pair data, The data format is the same as embedding model, so you can fine-tune it easily following our example. More details please refer to ./FlagEmbedding/reranker/README.md ## Contact If you have any question or suggestion related to this project, feel free to open an issue or pull request. You also can email Shitao Xiao([email protected]) and Zheng Liu([email protected]). ## Citation If you find this repository useful, please consider giving a star :star: and citation ## License FlagEmbedding is licensed under the MIT License. The released models can be used for commercial purposes free of charge.",
19
+ "model_explanation_gemini": "Generates dense vector embeddings for Chinese text to enable retrieval, classification, clustering, or semantic search tasks, including use in LLM vector databases. \n\n**Features**: \n- Language: Chinese \n- Task: Text embedding for retrieval, classification, clustering, semantic search \n- Usage: Compatible with vector databases for LLMs \n- Comparison: Base-scale model with performance similar to larger BGE models; outperforms same-size alternatives (ranked 1st on C-MTEB benchmark).",
20
+ "release_year": "2023",
21
+ "parameter_count": null,
22
+ "is_fine_tuned": false,
23
+ "category": "Embedding",
24
+ "model_family": "BERT",
25
+ "api_enhanced": true
26
  }