aidevhund committed on
Commit 6850f57 · verified · 1 Parent(s): 7c3e962

Update markdowm.py

Files changed (1)
  1. markdowm.py +21 -21
markdowm.py CHANGED
@@ -17,7 +17,7 @@ With QueryVault Chatbot, you can interactively query your document, receive cont
  ## 🚀 **Steps to Use the HundAI QueryVault Chatbot**
 
  1. **Upload Your File**
- Begin by uploading a document. Supported formats include `.pdf`, `.docx`, `.txt`, `.csv`, `.xlsx`, `.pptx`, `.html`, `.jpg`, `.png`, and more.
 
  2. **Select Embedding Model**
  Choose an embedding model to parse and index the document’s contents, then submit. Wait for the confirmation message that the document has been successfully indexed.
@@ -39,30 +39,30 @@ Upon uploading a document, the bot utilizes **LlamaParse** to parse its content.
  ## 🔍 **Available LLMs and Embedding Models**
 
  ### **Embedding Models** (For indexing document content)
- 1. **`BAAI/bge-large-en`**
  - **Size**: 335M parameters
  - **Best For**: Complex, detailed embeddings; slower but yields high accuracy.
- 2. **`BAAI/bge-small-en-v1.5`**
  - **Size**: 33.4M parameters
  - **Best For**: Faster embeddings, ideal for lighter workloads and quick responses.
- 3. **`NeuML/pubmedbert-base-embeddings`**
  - **Size**: 768-dimensional dense vector space
  - **Best For**: Biomedical or medical-related text; highly specialized.
- 4. **`BAAI/llm-embedder`**
  - **Size**: 109M parameters
  - **Best For**: Basic embeddings for straightforward use cases.
 
  ### **LLMs** (For generating answers)
- 1. **`Mixtral-8x7B-Instruct`**
  - **Size**: 46.7B parameters
  - **Purpose**: Demonstrates compelling performance with minimal fine-tuning. Suited for unmoderated or exploratory use.
- 2. **`Meta-Llama-3-8B-Instruct`**
  - **Size**: 8.03B parameters
  - **Purpose**: Optimized for dialogue, emphasizing safety and helpfulness. Excellent for structured, instructive responses.
- 3. **`Mistral-7B`**
  - **Size**: 7.24B parameters
  - **Purpose**: Fine-tuned for effectiveness; lacks moderation, useful for quick demonstration purposes.
- 4. **`HundAI`**
  - **Size**: 7.22B parameters
  - **Purpose**: Robust fine-tuned model for inference, leveraging large-scale data for highly contextual responses.
 
@@ -74,18 +74,18 @@ The choice of embedding models plays a crucial role in determining the speed and
 
  | **Scenario** | **Embedding Model** | **Strengths** | **Trade-Offs** |
  |:-----------------------------:|:------------------------------------:|:--------------------------------------------------:|:------------------------------------:|
- | **Fastest Response** | `BAAI/bge-small-en-v1.5` | Speed-oriented, ideal for high-frequency querying | May miss nuanced details |
- | **High Accuracy for Large Texts** | `BAAI/bge-large-en` | High accuracy, captures complex document structure | Slower response time |
- | **Balanced General Purpose** | `BAAI/llm-embedder` | Reliable, quick response, adaptable across topics | Moderate accuracy, general use case |
- | **Biomedical & Specialized Text** | `NeuML/pubmedbert-base-embeddings` | Optimized for medical and scientific text | Specialized, slightly slower |
 
  ---
 
  ## 📂 **Supported File Formats**
 
  The bot supports a range of document formats, making it versatile for various data sources. Below are the currently supported formats:
- - **Documents**: `.pdf`, `.docx`, `.doc`, `.txt`, `.csv`, `.xlsx`, `.pptx`, `.html`
- - **Images**: `.jpg`, `.jpeg`, `.png`, `.webp`, `.svg`
 
  ---
 
@@ -114,17 +114,17 @@ guide = '''
 
  | **Embedding Model** | **Speed (Vector Index)** | **Advantages** | **Trade-Offs** |
  |-----------------------------|-------------------|-------------------------------------|---------------------------------|
- | `BAAI/bge-small-en-v1.5` | **Fastest** | Ideal for quick indexing | May miss nuanced details |
- | `BAAI/llm-embedder` | **Fast** | Balanced performance and detail | Slightly less precise than large models |
- | `BAAI/bge-large-en` | **Slow** | Best overall precision and detail | Slower due to complexity |
 
 
  ### Language Models (LLMs) and Use Cases
 
  | **LLM** | **Best Use Case** |
  |------------------------------------|-----------------------------------------|
- | `Mixtral-8x7B-Instruct-v0.1` | Works well for **both short and long answers** |
- | `Meta-Llama-3-8B-Instruct` | Ideal for **long-length answers** |
- | `HundAI` | Best suited for **short-length answers** |
 
  '''
 
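Comparing the removed lines above with their added counterparts, the recurring change is that inline-code backticks around model names and file extensions are dropped (the `HundAI` entries are additionally renamed to `HundAI-7B-S`). As a rough illustration only, and not necessarily how the author made the edit, the backtick removal could be reproduced with a small helper. The name `strip_inline_code` is hypothetical and does not appear in `markdowm.py`.

```python
import re

# Matches a single-backtick inline-code span such as `BAAI/bge-large-en`.
_INLINE_CODE = re.compile(r"`([^`\n]+)`")


def strip_inline_code(line: str) -> str:
    """Return the line with inline-code backticks removed, keeping the inner text."""
    return _INLINE_CODE.sub(r"\1", line)


if __name__ == "__main__":
    # Hypothetical usage on one of the removed lines from this commit.
    print(strip_inline_code("1. **`BAAI/bge-large-en`**"))
```

This sketch only handles the backtick removal; the rename of `HundAI` would still be a separate manual change.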
  ## 🚀 **Steps to Use the HundAI QueryVault Chatbot**
 
  1. **Upload Your File**
+ Begin by uploading a document. Supported formats include .pdf, .docx, .txt, .csv, .xlsx, .pptx, .html, .jpg, .png, and more.
 
  2. **Select Embedding Model**
  Choose an embedding model to parse and index the document’s contents, then submit. Wait for the confirmation message that the document has been successfully indexed.
 
  ## 🔍 **Available LLMs and Embedding Models**
 
  ### **Embedding Models** (For indexing document content)
+ 1. **BAAI/bge-large-en**
  - **Size**: 335M parameters
  - **Best For**: Complex, detailed embeddings; slower but yields high accuracy.
+ 2. **BAAI/bge-small-en-v1.5**
  - **Size**: 33.4M parameters
  - **Best For**: Faster embeddings, ideal for lighter workloads and quick responses.
+ 3. **NeuML/pubmedbert-base-embeddings**
  - **Size**: 768-dimensional dense vector space
  - **Best For**: Biomedical or medical-related text; highly specialized.
+ 4. **BAAI/llm-embedder**
  - **Size**: 109M parameters
  - **Best For**: Basic embeddings for straightforward use cases.
 
  ### **LLMs** (For generating answers)
+ 1. **Mixtral-8x7B-Instruct**
  - **Size**: 46.7B parameters
  - **Purpose**: Demonstrates compelling performance with minimal fine-tuning. Suited for unmoderated or exploratory use.
+ 2. **Meta-Llama-3-8B-Instruct**
  - **Size**: 8.03B parameters
  - **Purpose**: Optimized for dialogue, emphasizing safety and helpfulness. Excellent for structured, instructive responses.
+ 3. **Mistral-7B**
  - **Size**: 7.24B parameters
  - **Purpose**: Fine-tuned for effectiveness; lacks moderation, useful for quick demonstration purposes.
+ 4. **HundAI-7B-S**
  - **Size**: 7.22B parameters
  - **Purpose**: Robust fine-tuned model for inference, leveraging large-scale data for highly contextual responses.
 
 
 
  | **Scenario** | **Embedding Model** | **Strengths** | **Trade-Offs** |
  |:-----------------------------:|:------------------------------------:|:--------------------------------------------------:|:------------------------------------:|
+ | **Fastest Response** | BAAI/bge-small-en-v1.5 | Speed-oriented, ideal for high-frequency querying | May miss nuanced details |
+ | **High Accuracy for Large Texts** | BAAI/bge-large-en | High accuracy, captures complex document structure | Slower response time |
+ | **Balanced General Purpose** | BAAI/llm-embedder | Reliable, quick response, adaptable across topics | Moderate accuracy, general use case |
+ | **Biomedical & Specialized Text** | NeuML/pubmedbert-base-embeddings | Optimized for medical and scientific text | Specialized, slightly slower |
 
  ---
 
  ## 📂 **Supported File Formats**
 
  The bot supports a range of document formats, making it versatile for various data sources. Below are the currently supported formats:
+ - **Documents**: .pdf, .docx, .doc, .txt, .csv, .xlsx, .pptx, .html
+ - **Images**: .jpg, .jpeg, .png, .webp, .svg
 
  ---
 
 
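The supported-formats list in the hunk above lends itself to a simple pre-upload check. The sketch below is an assumption about how such a check might look, not code from `markdowm.py`. The names `DOCUMENT_EXTENSIONS`, `IMAGE_EXTENSIONS`, `SUPPORTED_EXTENSIONS`, and `is_supported` are hypothetical, and the sets contain only the extensions listed in the documentation text.

```python
from pathlib import Path

# Extensions copied from the "Supported File Formats" section of the guide.
DOCUMENT_EXTENSIONS = {".pdf", ".docx", ".doc", ".txt", ".csv", ".xlsx", ".pptx", ".html"}
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".svg"}
SUPPORTED_EXTENSIONS = DOCUMENT_EXTENSIONS | IMAGE_EXTENSIONS


def is_supported(filename: str) -> bool:
    """Return True if the file's extension (case-insensitive) is in the documented list."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

Comparing the lowercased suffix means `report.PDF` and `report.pdf` are treated alike; a file with no extension is rejected.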
 
  | **Embedding Model** | **Speed (Vector Index)** | **Advantages** | **Trade-Offs** |
  |-----------------------------|-------------------|-------------------------------------|---------------------------------|
+ | BAAI/bge-small-en-v1.5 | **Fastest** | Ideal for quick indexing | May miss nuanced details |
+ | BAAI/llm-embedder | **Fast** | Balanced performance and detail | Slightly less precise than large models |
+ | BAAI/bge-large-en | **Slow** | Best overall precision and detail | Slower due to complexity |
 
 
  ### Language Models (LLMs) and Use Cases
 
  | **LLM** | **Best Use Case** |
  |------------------------------------|-----------------------------------------|
+ | Mixtral-8x7B-Instruct-v0.1 | Works well for **both short and long answers** |
+ | Meta-Llama-3-8B-Instruct | Ideal for **long-length answers** |
+ | HundAI-7B-S | Best suited for **short-length answers** |
 
  '''
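The two tables in the `guide` string amount to a pair of lookups: embedding model by indexing speed, and LLM by expected answer length. A minimal sketch of those lookups follows, using the post-commit model names. The dictionary and function names are hypothetical, and the app's real selection logic is not shown in this diff.

```python
# Speed labels and model names taken from the "Speed (Vector Index)" table.
EMBEDDING_BY_SPEED = {
    "fastest": "BAAI/bge-small-en-v1.5",
    "fast": "BAAI/llm-embedder",
    "slow": "BAAI/bge-large-en",
}

# Mixtral is listed for both short and long answers, so it appears in both lists.
LLM_BY_ANSWER_LENGTH = {
    "short": ["HundAI-7B-S", "Mixtral-8x7B-Instruct-v0.1"],
    "long": ["Meta-Llama-3-8B-Instruct", "Mixtral-8x7B-Instruct-v0.1"],
}


def suggest_models(speed: str, answer_length: str) -> tuple[str, list[str]]:
    """Return (embedding model, candidate LLMs) for the given preferences."""
    try:
        embedding = EMBEDDING_BY_SPEED[speed.lower()]
        llms = LLM_BY_ANSWER_LENGTH[answer_length.lower()]
    except KeyError as exc:
        raise ValueError(f"unknown option: {exc.args[0]!r}") from exc
    return embedding, llms
```

For example, `suggest_models("fastest", "short")` pairs the small BGE embedder with `HundAI-7B-S` as the first candidate. The ordering inside each LLM list is an arbitrary choice for this sketch.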