Update markdowm.py
markdowm.py CHANGED +21 -21
@@ -17,7 +17,7 @@ With QueryVault Chatbot, you can interactively query your document, receive cont
 ## 🚀 **Steps to Use the HundAI QueryVault Chatbot**

 1. **Upload Your File**
-Begin by uploading a document. Supported formats include

 2. **Select Embedding Model**
 Choose an embedding model to parse and index the document’s contents, then submit. Wait for the confirmation message that the document has been successfully indexed.
@@ -39,30 +39,30 @@ Upon uploading a document, the bot utilizes **LlamaParse** to parse its content.
 ## 🔍 **Available LLMs and Embedding Models**

 ### **Embedding Models** (For indexing document content)
-1.
 - **Size**: 335M parameters
 - **Best For**: Complex, detailed embeddings; slower but yields high accuracy.
-2.
 - **Size**: 33.4M parameters
 - **Best For**: Faster embeddings, ideal for lighter workloads and quick responses.
-3.
 - **Size**: 768-dimensional dense vector space
 - **Best For**: Biomedical or medical-related text; highly specialized.
-4.
 - **Size**: 109M parameters
 - **Best For**: Basic embeddings for straightforward use cases.

 ### **LLMs** (For generating answers)
-1.
 - **Size**: 46.7B parameters
 - **Purpose**: Demonstrates compelling performance with minimal fine-tuning. Suited for unmoderated or exploratory use.
-2.
 - **Size**: 8.03B parameters
 - **Purpose**: Optimized for dialogue, emphasizing safety and helpfulness. Excellent for structured, instructive responses.
-3.
 - **Size**: 7.24B parameters
 - **Purpose**: Fine-tuned for effectiveness; lacks moderation, useful for quick demonstration purposes.
-4.
 - **Size**: 7.22B parameters
 - **Purpose**: Robust fine-tuned model for inference, leveraging large-scale data for highly contextual responses.
@@ -74,18 +74,18 @@ The choice of embedding models plays a crucial role in determining the speed and
 | **Scenario** | **Embedding Model** | **Strengths** | **Trade-Offs** |
 |:-----------------------------:|:------------------------------------:|:--------------------------------------------------:|:------------------------------------:|
-| **Fastest Response** |
-| **High Accuracy for Large Texts** |
-| **Balanced General Purpose** |
-| **Biomedical & Specialized Text** |

 ---

 ## 📂 **Supported File Formats**

 The bot supports a range of document formats, making it versatile for various data sources. Below are the currently supported formats:
-- **Documents**:
-- **Images**:

 ---
@@ -114,17 +114,17 @@ guide = '''
 | **Embedding Model** | **Speed (Vector Index)** | **Advantages** | **Trade-Offs** |
 |-----------------------------|-------------------|-------------------------------------|---------------------------------|
-|
-|
-|


 ### Language Models (LLMs) and Use Cases

 | **LLM** | **Best Use Case** |
 |------------------------------------|-----------------------------------------|
-|
-|
-|

 '''
@@ -17,7 +17,7 @@
 ## 🚀 **Steps to Use the HundAI QueryVault Chatbot**

 1. **Upload Your File**
+Begin by uploading a document. Supported formats include .pdf, .docx, .txt, .csv, .xlsx, .pptx, .html, .jpg, .png, and more.

 2. **Select Embedding Model**
 Choose an embedding model to parse and index the document’s contents, then submit. Wait for the confirmation message that the document has been successfully indexed.
@@ -39,30 +39,30 @@
 ## 🔍 **Available LLMs and Embedding Models**

 ### **Embedding Models** (For indexing document content)
+1. **BAAI/bge-large-en**
 - **Size**: 335M parameters
 - **Best For**: Complex, detailed embeddings; slower but yields high accuracy.
+2. **BAAI/bge-small-en-v1.5**
 - **Size**: 33.4M parameters
 - **Best For**: Faster embeddings, ideal for lighter workloads and quick responses.
+3. **NeuML/pubmedbert-base-embeddings**
 - **Size**: 768-dimensional dense vector space
 - **Best For**: Biomedical or medical-related text; highly specialized.
+4. **BAAI/llm-embedder**
 - **Size**: 109M parameters
 - **Best For**: Basic embeddings for straightforward use cases.

 ### **LLMs** (For generating answers)
+1. **Mixtral-8x7B-Instruct**
 - **Size**: 46.7B parameters
 - **Purpose**: Demonstrates compelling performance with minimal fine-tuning. Suited for unmoderated or exploratory use.
+2. **Meta-Llama-3-8B-Instruct**
 - **Size**: 8.03B parameters
 - **Purpose**: Optimized for dialogue, emphasizing safety and helpfulness. Excellent for structured, instructive responses.
+3. **Mistral-7B**
 - **Size**: 7.24B parameters
 - **Purpose**: Fine-tuned for effectiveness; lacks moderation, useful for quick demonstration purposes.
+4. **HundAI-7B-S**
 - **Size**: 7.22B parameters
 - **Purpose**: Robust fine-tuned model for inference, leveraging large-scale data for highly contextual responses.
@@ -74,18 +74,18 @@
 | **Scenario** | **Embedding Model** | **Strengths** | **Trade-Offs** |
 |:-----------------------------:|:------------------------------------:|:--------------------------------------------------:|:------------------------------------:|
+| **Fastest Response** | BAAI/bge-small-en-v1.5 | Speed-oriented, ideal for high-frequency querying | May miss nuanced details |
+| **High Accuracy for Large Texts** | BAAI/bge-large-en | High accuracy, captures complex document structure | Slower response time |
+| **Balanced General Purpose** | BAAI/llm-embedder | Reliable, quick response, adaptable across topics | Moderate accuracy, general use case |
+| **Biomedical & Specialized Text** | NeuML/pubmedbert-base-embeddings | Optimized for medical and scientific text | Specialized, slightly slower |

 ---

 ## 📂 **Supported File Formats**

 The bot supports a range of document formats, making it versatile for various data sources. Below are the currently supported formats:
+- **Documents**: .pdf, .docx, .doc, .txt, .csv, .xlsx, .pptx, .html
+- **Images**: .jpg, .jpeg, .png, .webp, .svg

 ---
@@ -114,17 +114,17 @@ guide = '''
 | **Embedding Model** | **Speed (Vector Index)** | **Advantages** | **Trade-Offs** |
 |-----------------------------|-------------------|-------------------------------------|---------------------------------|
+| BAAI/bge-small-en-v1.5 | **Fastest** | Ideal for quick indexing | May miss nuanced details |
+| BAAI/llm-embedder | **Fast** | Balanced performance and detail | Slightly less precise than large models |
+| BAAI/bge-large-en | **Slow** | Best overall precision and detail | Slower due to complexity |


 ### Language Models (LLMs) and Use Cases

 | **LLM** | **Best Use Case** |
 |------------------------------------|-----------------------------------------|
+| Mixtral-8x7B-Instruct-v0.1 | Works well for **both short and long answers** |
+| Meta-Llama-3-8B-Instruct | Ideal for **long-length answers** |
+| HundAI-7B-S | Best suited for **short-length answers** |

 '''
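The two tables in the `guide` string can also be read as lookup tables. The sketch below is an illustration under assumptions, not part of this commit: `EMBEDDING_SPEED`, `LLM_BY_ANSWER_LENGTH`, and `pick_models` are hypothetical names, and the mapping of a speed preference to a model is one possible reading of the "Speed (Vector Index)" column.

```python
from typing import Tuple

# Speed tiers copied from the "Speed (Vector Index)" column of the guide table.
EMBEDDING_SPEED = {
    "BAAI/bge-small-en-v1.5": "fastest",
    "BAAI/llm-embedder": "fast",
    "BAAI/bge-large-en": "slow",
}

# Best use cases copied from the "Language Models (LLMs) and Use Cases" table.
LLM_BY_ANSWER_LENGTH = {
    "short": "HundAI-7B-S",
    "long": "Meta-Llama-3-8B-Instruct",
    "both": "Mixtral-8x7B-Instruct-v0.1",
}


def pick_models(prefer_speed: bool, answer_length: str = "both") -> Tuple[str, str]:
    """Return an (embedding model, LLM) pair based on the guide tables."""
    if answer_length not in LLM_BY_ANSWER_LENGTH:
        raise ValueError(f"unknown answer length: {answer_length!r}")
    # Assumption: prefer the "fastest" tier for speed, the "slow" but most
    # precise model otherwise; the guide itself does not prescribe this rule.
    wanted_tier = "fastest" if prefer_speed else "slow"
    embedding = next(m for m, tier in EMBEDDING_SPEED.items() if tier == wanted_tier)
    return embedding, LLM_BY_ANSWER_LENGTH[answer_length]
```

For example, `pick_models(True, "short")` pairs the quickest indexer with the LLM the guide recommends for short answers.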