Mishig, victor (HF Staff) committed

Commit c07e88e · unverified · 1 parent: 3e99a64

Revamp llama.cpp docs (#1214)

* Revamp llama.cpp docs

* format

* update readme

* update index page

* update readme

* better formatting

* Update README.md

Co-authored-by: Victor Muštar <[email protected]>

* Update README.md

Co-authored-by: Victor Muštar <[email protected]>

* fix hashlink

* document llama hf args

* format

---------

Co-authored-by: Victor Muštar <[email protected]>

README.md CHANGED
@@ -20,15 +20,79 @@ load_balancing_strategy: random

A chat interface using open source models, eg OpenAssistant or Llama. It is a SvelteKit app and it powers the [HuggingChat app on hf.co/chat](https://huggingface.co/chat).

- 0. [No Setup Deploy](#no-setup-deploy)
- 1. [Setup](#setup)
- 2. [Launch](#launch)
- 3. [Web Search](#web-search)
- 4. [Text Embedding Models](#text-embedding-models)
- 5. [Extra parameters](#extra-parameters)
- 6. [Common issues](#common-issues)
- 7. [Deploying to a HF Space](#deploying-to-a-hf-space)
- 8. [Building](#building)
+ 0. [Quickstart](#quickstart)
+ 1. [No Setup Deploy](#no-setup-deploy)
+ 2. [Setup](#setup)
+ 3. [Launch](#launch)
+ 4. [Web Search](#web-search)
+ 5. [Text Embedding Models](#text-embedding-models)
+ 6. [Extra parameters](#extra-parameters)
+ 7. [Common issues](#common-issues)
+ 8. [Deploying to a HF Space](#deploying-to-a-hf-space)
+ 9. [Building](#building)
+
+ ## Quickstart
+
+ You can quickly start a locally running chat-ui & LLM text-generation server thanks to chat-ui's [llama.cpp server support](https://huggingface.co/docs/chat-ui/configuration/models/providers/llamacpp).
+
+ **Step 1 (Start llama.cpp server):**
+
+ ```bash
+ # install llama.cpp
+ brew install llama.cpp
+ # start llama.cpp server (using hf.co/microsoft/Phi-3-mini-4k-instruct-gguf as an example)
+ llama-server --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf --hf-file Phi-3-mini-4k-instruct-q4.gguf -c 4096
+ ```
+
+ A local LLaMA.cpp HTTP Server will start on `http://localhost:8080`. Read more [here](https://huggingface.co/docs/chat-ui/configuration/models/providers/llamacpp).
+
+ **Step 2 (tell chat-ui to use local llama.cpp server):**
+
+ Add the following to your `.env.local`:
+
+ ```ini
+ MODELS=`[
+   {
+     "name": "Local microsoft/Phi-3-mini-4k-instruct-gguf",
+     "tokenizer": "microsoft/Phi-3-mini-4k-instruct-gguf",
+     "preprompt": "",
+     "chatPromptTemplate": "<s>{{preprompt}}{{#each messages}}{{#ifUser}}<|user|>\n{{content}}<|end|>\n<|assistant|>\n{{/ifUser}}{{#ifAssistant}}{{content}}<|end|>\n{{/ifAssistant}}{{/each}}",
+     "parameters": {
+       "stop": ["<|end|>", "<|endoftext|>", "<|assistant|>"],
+       "temperature": 0.7,
+       "max_new_tokens": 1024,
+       "truncate": 3071
+     },
+     "endpoints": [{
+       "type" : "llamacpp",
+       "baseURL": "http://localhost:8080"
+     }],
+   },
+ ]`
+ ```
+
+ Read more [here](https://huggingface.co/docs/chat-ui/configuration/models/providers/llamacpp).
+
+ **Step 3 (make sure you have MongoDb running locally):**
+
+ ```bash
+ docker run -d -p 27017:27017 --name mongo-chatui mongo:latest
+ ```
+
+ Read more [here](#database).
+
+ **Step 4 (start chat-ui):**
+
+ ```bash
+ git clone https://github.com/huggingface/chat-ui
+ cd chat-ui
+ npm install
+ npm run dev -- --open
+ ```
+
+ Read more [here](#launch).
+
+ <img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/chat-ui/llamacpp-dark.png" height="auto"/>

## No Setup Deploy

@@ -415,11 +479,14 @@ MODELS=`[{

chat-ui also supports the llama.cpp API server directly without the need for an adapter. You can do this using the `llamacpp` endpoint type.

- If you want to run chat-ui with llama.cpp, you can do the following, using Zephyr as an example model:
-
- 1. Get [the weights](https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/tree/main) from the hub
- 2. Run the server with the following command: `./server -m models/zephyr-7b-beta.Q4_K_M.gguf -c 2048 -np 3`
- 3. Add the following to your `.env.local`:
+ If you want to run Chat UI with llama.cpp, you can do the following, using [microsoft/Phi-3-mini-4k-instruct-gguf](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf) as an example model:
+
+ ```bash
+ # install llama.cpp
+ brew install llama.cpp
+ # start llama.cpp server
+ llama-server --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf --hf-file Phi-3-mini-4k-instruct-q4.gguf -c 4096
+ ```

```env
MODELS=`[
docs/source/configuration/models/providers/llamacpp.md CHANGED
@@ -7,32 +7,43 @@

Chat UI supports the llama.cpp API server directly without the need for an adapter. You can do this using the `llamacpp` endpoint type.

- If you want to run Chat UI with llama.cpp, you can do the following, using Zephyr as an example model:
-
- 1. Get [the weights](https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/tree/main) from the hub
- 2. Run the server with the following command: `./server -m models/zephyr-7b-beta.Q4_K_M.gguf -c 2048 -np 3`
- 3. Add the following to your `.env.local`:
+ If you want to run Chat UI with llama.cpp, you can do the following, using [microsoft/Phi-3-mini-4k-instruct-gguf](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf) as an example model:
+
+ ```bash
+ # install llama.cpp
+ brew install llama.cpp
+ # start llama.cpp server
+ llama-server --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf --hf-file Phi-3-mini-4k-instruct-q4.gguf -c 4096
+ ```
+
+ _note: you can swap the `hf-repo` and `hf-file` with your fav GGUF on the [Hub](https://huggingface.co/models?library=gguf). For example: `--hf-repo TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF` for [this repo](https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF) & `--hf-file tinyllama-1.1b-chat-v1.0.Q4_0.gguf` for [this file](https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/blob/main/tinyllama-1.1b-chat-v1.0.Q4_0.gguf)._
+
+ A local LLaMA.cpp HTTP Server will start on `http://localhost:8080` (to change the port or any other default options, please find [LLaMA.cpp HTTP Server readme](https://github.com/ggerganov/llama.cpp/tree/master/examples/server)).
+
+ Add the following to your `.env.local`:

```ini
MODELS=`[
  {
-     "name": "Local Zephyr",
-     "chatPromptTemplate": "<|system|>\n{{preprompt}}</s>\n{{#each messages}}{{#ifUser}}<|user|>\n{{content}}</s>\n<|assistant|>\n{{/ifUser}}{{#ifAssistant}}{{content}}</s>\n{{/ifAssistant}}{{/each}}",
+     "name": "Local microsoft/Phi-3-mini-4k-instruct-gguf",
+     "tokenizer": "microsoft/Phi-3-mini-4k-instruct-gguf",
+     "preprompt": "",
+     "chatPromptTemplate": "<s>{{preprompt}}{{#each messages}}{{#ifUser}}<|user|>\n{{content}}<|end|>\n<|assistant|>\n{{/ifUser}}{{#ifAssistant}}{{content}}<|end|>\n{{/ifAssistant}}{{/each}}",
    "parameters": {
-       "temperature": 0.1,
-       "top_p": 0.95,
-       "repetition_penalty": 1.2,
-       "top_k": 50,
-       "truncate": 1000,
-       "max_new_tokens": 2048,
-       "stop": ["</s>"]
+       "stop": ["<|end|>", "<|endoftext|>", "<|assistant|>"],
+       "temperature": 0.7,
+       "max_new_tokens": 1024,
+       "truncate": 3071
    },
-     "endpoints": [
-       {
-         "url": "http://127.0.0.1:8080",
-         "type": "llamacpp"
-       }
-     ]
-   }
+     "endpoints": [{
+       "type" : "llamacpp",
+       "baseURL": "http://localhost:8080"
+     }],
+   },
]`
```
+
+ <div class="flex justify-center">
+   <img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/chat-ui/llamacpp-light.png" height="auto"/>
+   <img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/chat-ui/llamacpp-dark.png" height="auto"/>
+ </div>
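
For readers following the note above about swapping `hf-repo`/`hf-file`, here is a minimal sketch of what that swap looks like in practice (not part of the commit). The repo and file names are the ones cited in the note; the context size and the reminder about `.env.local` fields are assumptions you should adapt to your model:

```bash
# Run the same llama.cpp server against a different GGUF from the Hub
# (TinyLlama instead of Phi-3), as described in the note above.
llama-server \
  --hf-repo TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF \
  --hf-file tinyllama-1.1b-chat-v1.0.Q4_0.gguf \
  -c 2048  # context size is model-dependent; 2048 matches TinyLlama's window

# If you swap models, also update the matching MODELS entry in .env.local
# ("tokenizer", "chatPromptTemplate", "stop") so prompts follow the new
# model's chat template.
```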
docs/source/index.md CHANGED
@@ -9,3 +9,69 @@ Open source chat interface with support for tools, web search, multimodal and ma

🐙 **Multimodal**: Accepts image file uploads on supported providers

👤 **OpenID**: Optionally setup OpenID for user authentication
+
+ ## Quickstart Locally
+
+ You can quickly have a locally running chat-ui & LLM text-generation server thanks to chat-ui's [llama.cpp server support](https://huggingface.co/docs/chat-ui/configuration/models/providers/llamacpp).
+
+ **Step 1 (Start llama.cpp server):**
+
+ ```bash
+ # install llama.cpp
+ brew install llama.cpp
+ # start llama.cpp server (using hf.co/microsoft/Phi-3-mini-4k-instruct-gguf as an example)
+ llama-server --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf --hf-file Phi-3-mini-4k-instruct-q4.gguf -c 4096
+ ```
+
+ A local LLaMA.cpp HTTP Server will start on `http://localhost:8080`. Read more [here](https://huggingface.co/docs/chat-ui/configuration/models/providers/llamacpp).
+
+ **Step 2 (tell chat-ui to use local llama.cpp server):**
+
+ Add the following to your `.env.local`:
+
+ ```ini
+ MODELS=`[
+   {
+     "name": "Local microsoft/Phi-3-mini-4k-instruct-gguf",
+     "tokenizer": "microsoft/Phi-3-mini-4k-instruct-gguf",
+     "preprompt": "",
+     "chatPromptTemplate": "<s>{{preprompt}}{{#each messages}}{{#ifUser}}<|user|>\n{{content}}<|end|>\n<|assistant|>\n{{/ifUser}}{{#ifAssistant}}{{content}}<|end|>\n{{/ifAssistant}}{{/each}}",
+     "parameters": {
+       "stop": ["<|end|>", "<|endoftext|>", "<|assistant|>"],
+       "temperature": 0.7,
+       "max_new_tokens": 1024,
+       "truncate": 3071
+     },
+     "endpoints": [{
+       "type" : "llamacpp",
+       "baseURL": "http://localhost:8080"
+     }],
+   },
+ ]`
+ ```
+
+ Read more [here](https://huggingface.co/docs/chat-ui/configuration/models/providers/llamacpp).
+
+ **Step 3 (make sure you have MongoDb running locally):**
+
+ ```bash
+ docker run -d -p 27017:27017 --name mongo-chatui mongo:latest
+ ```
+
+ Read more [here](https://github.com/huggingface/chat-ui?tab=Readme-ov-file#database).
+
+ **Step 4 (start chat-ui):**
+
+ ```bash
+ git clone https://github.com/huggingface/chat-ui
+ cd chat-ui
+ npm install
+ npm run dev -- --open
+ ```
+
+ read more [here](https://github.com/huggingface/chat-ui?tab=readme-ov-file#launch).
+
+ <div class="flex justify-center">
+   <img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/chat-ui/llamacpp-light.png" height="auto"/>
+   <img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/chat-ui/llamacpp-dark.png" height="auto"/>
+ </div>
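
Before running `npm run dev` in the quickstart above, it can save time to confirm that the llama.cpp server (Step 1) and MongoDB (Step 3) are actually reachable. A minimal sketch, not part of the commit, assuming the default port from the docs, llama.cpp's built-in `/health` and `/completion` routes, the `mongo-chatui` container name used above, and the `MONGODB_URL` variable name from chat-ui's setup docs:

```bash
# 1. llama.cpp server from Step 1: liveness check, then a tiny test completion
curl -s http://localhost:8080/health
curl -s http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello", "n_predict": 16}'

# 2. MongoDB from Step 3: container is up and answers a ping
#    (recent official mongo images ship mongosh)
docker ps --filter "name=mongo-chatui"
docker exec mongo-chatui mongosh --quiet --eval 'db.runCommand({ ping: 1 })'

# 3. chat-ui reads its connection string from .env.local; with the container
#    above the local default is:
#      MONGODB_URL=mongodb://localhost:27017
```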