---
license: apache-2.0
language:
- en
tags:
- How to use reasoning models.
- How to use thinking models.
- How to create reasoning models.
- deepseek
- reasoning
- reason
- thinking
- all use cases
- creative
- fiction writing
- plot generation
- sub-plot generation
- story generation
- scene continue
- storytelling
- fiction story
- romance
- all genres
- story
- writing
- vivid writing
- fiction
- roleplaying
- bfloat16
- float32
- float16
- role play
- sillytavern
- backyard
- lmstudio
- Text Generation WebUI
- llama 3
- mistral
- llama 3.1
- qwen 2.5
- context 128k
- mergekit
- merge
pipeline_tag: text-generation
---

<h2>How-To-Use-Reasoning-Thinking-Models-and-Create-Them - DOCUMENT</h2>

This document covers suggestions and methods to get the most out of "Reasoning/Thinking" models, including tips/tricks for generation, parameters/samplers,
System Prompt/Role settings, as well as links to "Reasoning/Thinking" models and how to create your own (via adapters).

This is a live document and updates will occur often.

This document and the information contained in it can be used for ANY "Reasoning/Thinking" model - at my repo and/or other repos.

LINKS to models and adapters:

<B>#1 All Reasoning/Thinking Models - including MOEs - (collection) (GGUF):</b>

[ https://huggingface.co/collections/DavidAU/d-au-reasoning-deepseek-models-with-thinking-reasoning-67a41ec81d9df996fd1cdd60 ]

<B>#2 All Reasoning/Thinking Models - including MOEs - (collection) (Source code to generate GGUF, EXL2, AWQ, GPTQ, HQQ, etc., and for direct usage):</b>

[ https://huggingface.co/collections/DavidAU/d-au-reasoning-source-files-for-gguf-exl2-awq-gptq-67b296c5f09f3b49a6aa2704 ]

<B>#3 All Adapters (collection) - Turn a "regular" model into a "thinking/reasoning" model:</b>

[ https://huggingface.co/collections/DavidAU/d-au-reasoning-adapters-loras-any-model-to-reasoning-67bdb1a7156a97f6ec42ce36 ]

These collections will update over time. Newest items are usually at the bottom of each collection.

---

<B>Support: Document about Parameters, Samplers and How to Set These:</b>

---

For additional generation support, general questions, detailed parameter info, and a lot more, see also:

https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters

---

<B>Support: AI Auto-Correct Engine (software patch for SillyTavern Front End)</b>

---

AI Auto-Correct Engine (built and programmed by DavidAU) auto-corrects AI generation in real time, including modification of the
live generation stream to and from the AI... creating a two-way street of information that operates, changes, and edits automatically.
This system works with all GGUF, EXL2, HQQ, and other quants/compressions, as well as full-precision source models.

Below is an example generation using a standard GGUF (and a standard AI app), auto-corrected via this engine.
The engine is an API-level system.

Software Link:

https://huggingface.co/DavidAU/AI_Autocorrect__Auto-Creative-Enhancement__Auto-Low-Quant-Optimization__gguf-exl2-hqq-SOFTWARE

---

<h2>MAIN: How To Use Reasoning / Thinking Models 101 </h2>

<B>Special Operation Instructions:</B>

---

<B>Template Considerations:</b>

For most reasoning/thinking models your template CHOICE is critical, as are your System Prompt/Role setting(s) - see below.

For most models you will need: Llama 3 Instruct or Chat, ChatML, and/or Command-R, OR the standard "Jinja Autoloaded Template"
(this is contained in the quant and will autoload in SOME AI apps).

The last one is usually the BEST CHOICE for a reasoning/thinking model (and in many cases for other models too).

In LM Studio, this option appears in the lower left: "template to use" -> "Manual" or "Jinja Template".

This option/setting will vary from AI/LLM app to app.

A "Jinja" template usually lives in the model's "source code" / "full precision" version, in the "tokenizer_config.json" file
(usually at the very BOTTOM/END of the file), and is then "copied" into the GGUF quants and made available to AI/LLM apps.

Here is a Qwen 2.5 version example (DO NOT USE: I have added spacing/breaks for readability):

<pre>
<small>
"chat_template": "{% if not add_generation_prompt is defined %}
  {% set add_generation_prompt = false %}
  {% endif %}
  {% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='') %}
  {%- for message in messages %}
  {%- if message['role'] == 'system' %}
  {% set ns.system_prompt = message['content'] %}
  {%- endif %}
  {%- endfor %}
  {{bos_token}}
  {{ns.system_prompt}}
  {%- for message in messages %}
  {%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}
  {{'<|User|>' + message['content']}}
    {%- endif %}
    {%- if message['role'] == 'assistant' and message['content'] is none %}
    {%- set ns.is_tool = false -%}
    {%- for tool in message['tool_calls']%}
    {%- if not ns.is_first %}
    {{'<|Assistant|><|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\\n'
      + '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<|tool▁call▁end|>'}}
        {%- set ns.is_first = true -%}
        {%- else %}
        {{'\\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' 
          + tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n' 
          + '```' + '<|tool▁call▁end|>'}}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}
            {%- endif %}
            {%- endfor %}
            {%- endif %}
            {%- if message['role'] == 'assistant' and message['content'] is not none %}
            {%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + message['content'] + '<|end▁of▁sentence|>'}}
              {%- set ns.is_tool = false -%}
              {%- else %}
              {% set content = message['content'] %}
              {% if '</think>' in content %}
              {% set content = content.split('</think>')[-1] %}
              {% endif %}
              {{'<|Assistant|>' + content + '<|end▁of▁sentence|>'}}
                {%- endif %}{%- endif %}
                {%- if message['role'] == 'tool' %}
                {%- set ns.is_tool = true -%}
                {%- if ns.is_output_first %}
                {{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}
                  {%- set ns.is_output_first = false %}
                  {%- else %}
                  {{'\\n<|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}
                    {%- endif %}
                    {%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}
                      {% endif %}
                      {% if add_generation_prompt and not ns.is_tool %}
                      {{'<|Assistant|>'}}
                        {% endif %}"
</small>
</pre>

In some cases you may need to set a "tokenizer" too - depending on the LLM/AI app - to work with specific reasoning/thinking models. Usually
this is NOT an issue, as it is auto-detected/set, but if you are getting strange results this might be the cause.
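
To check which template a model actually ships with, here is a minimal sketch (Python; assumes the "transformers" library is installed, and the model path is a placeholder) for reading and applying the embedded template:

<PRE>
import json

# The template usually sits at the bottom/end of tokenizer_config.json:
with open("tokenizer_config.json", "r", encoding="utf-8") as f:
    config = json.load(f)
print(config.get("chat_template", "no chat_template found"))

# With the "transformers" library, the same template is applied for you;
# the model path below is a placeholder, not a real repo:
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("path/to/model-source")
messages = [
    {"role": "system", "content": "You are a deep thinking AI."},
    {"role": "user", "content": "Write one scene set in Wales. 800-1000 words."},
]
print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
</PRE>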

An additional section, "General Notes", appears at the end of this document.

GENERATION TIPS:

General:

Here are some example prompts that will "activate" thinking properly; note the length statements.

Science Fiction: The Last Transmission - Write a story that takes place entirely within a spaceship's cockpit as the sole surviving crew member attempts to send a final message back to Earth before the ship's power runs out. The story should explore themes of isolation, sacrifice, and the importance of human connection in the face of adversity. If the situation calls for it, have the character(s) curse and swear to further the reader's emotional connection to them. 800-1000 words.

Romance: Love in the Limelight. Write one scene within a larger story set in Wales. A famous (fictional) actor ducks into a small-town bookstore to escape paparazzi. The scene takes us through the characters meeting in this odd circumstance. Over the course of the scene, the actor and the bookstore owner have a conversation charged by an undercurrent of unspoken chemistry. Write the actor as somewhat of a rogue with a fragile ego, which needs to be fed by having everyone like him. He is thoroughly charming, but the bookstore owner seems (at least superficially) immune to this; which paradoxically provokes a genuine attraction and derails the charm offensive. The bookstore owner, despite the superficial rebuffs of the actor's charm, is inwardly more than a little charmed and flustered despite themselves. Write primarily in dialogue, in the distinct voices of each character. 800-1000 words.

Start a 1000 word scene (vivid, graphic horror in first person) with: The skyscraper swayed, as she watched the window in front of her on the 21st floor explode...

Using insane levels of bravado and self-confidence, tell me in 800-1000 words why I should use you to write my next fictional story. Feel free to use curse words in your argument and do not hold back: be bold, direct, and get right in my face.

Advanced:

You can input just the "thinking" part AS A "prompt" and sometimes get the model to start and process from that point.

Likewise you can EDIT the "thinking" part too -> and change the thought process itself.

Another way: prompt, then copy/paste the "thinking" and output.

New chat -> same prompt -> start generation
-> stop, EDIT the output -> put the "raw thoughts" back in, minus any output (you can edit these too)
-> hit continue.
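
A minimal sketch of this "continue from edited thoughts" trick (Python; assumes a local llama.cpp "llama-server", and the DeepSeek-style tag strings shown are illustrative - match them to your model's actual template):

<PRE>
import requests

# Raw prompt ending inside an open (edited) thinking block; the model
# continues the thoughts from this point:
prefill = (
    "<|User|>Write a 1000 word horror scene set in a lighthouse.<|Assistant|>"
    "&lt;think&gt;\nThe user wants visceral first-person horror. Open mid-action, "
    "keep sentences short, build dread through sound...\n"
)

r = requests.post("http://localhost:8080/completion", json={
    "prompt": prefill,
    "temperature": 0.6,
    "n_predict": 1500,
})
print(r.json()["content"])
</PRE>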

Other option(s):

In the "thoughts" -> change the wording/phrases that trigger thoughts/rethinking - even changing the words themselves.
IE: swapping words such as "alternatively" or "considering this" will have an impact on thinking/reasoning and the "end conclusions".

This is "generational steering", which is covered in this document:

https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters

Really Advanced:

If you are using a frontend like SillyTavern and/or an app like Text Generation WebUI, Llama-Server (Llamacpp) or Koboldcpp, you can change the LOGIT
bias for word(s) and/or phrase(s).

Some of these apps also have "anti-slop" / word/phrase blocking too.

IE: LOWER "alternatively" and RAISE "considering" (you can also BLOCK word(s) and/or phrase(s) directly).

By adjusting these bias(es) and/or adding blocks you can alter how the model thinks too - because reasoning, like normal AI/LLM generation, is all about
prediction.

When you change the "chosen" next word and/or phrase you alter the output AND the generation too. The model chooses a different path - maybe
only slightly different - but each choice is cumulative.

Careful testing and adjustment(s) can vastly alter the reasoning/thinking processes which may assist with your use case(s).
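
A minimal sketch of logit biasing (Python; assumes a local llama.cpp "llama-server" on port 8080 - the logit_bias format differs between apps and server versions, so check your app's docs):

<PRE>
import requests

BASE = "http://localhost:8080"  # assumed local llama-server address

def token_ids(text):
    # llama-server exposes /tokenize for the loaded model's vocabulary
    return requests.post(f"{BASE}/tokenize", json={"content": text}).json()["tokens"]

# LOWER "alternatively", RAISE "considering" (note the leading spaces -
# most tokenizers treat " word" and "word" as different tokens):
bias = [[t, -2.0] for t in token_ids(" alternatively")]
bias += [[t, 2.0] for t in token_ids(" considering")]

r = requests.post(f"{BASE}/completion", json={
    "prompt": "Brainstorm 5 uncommon plot ideas for a ghost story.",
    "temperature": 0.6,
    "repeat_penalty": 1.05,
    "logit_bias": bias,  # [[token_id, bias], ...]
    "n_predict": 512,
})
print(r.json()["content"])
</PRE>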

TEMP/SETTINGS:

1. Set temp between 0 and .8; higher than this, "think" functions will activate differently. The most "stable" temp seems to be .6, with a variance of +/-0.05. Lower it for more "logic" in the reasoning, raise it for more "creative" reasoning (max .8 or so). Also set context to at least 4096, to account for "thoughts" generation.
2. At temps of 1+, 2+, etc., thought(s) will expand, and become deeper and richer.
3. Set "repeat penalty" to 1.02 to 1.07 (recommended).

PROMPTS:

1. If you enter a prompt without implied "step by step" requirements (ie: Generate a scene, write a story, give me 6 plots for xyz), "thinking" (one or more) MAY activate AFTER first generation. (IE: Generate a scene -> scene will generate, followed by suggestions for improvement in "thoughts")
2. If you enter a prompt where "thinking" is stated or implied (ie: puzzle, riddle, solve this, brainstorm this idea, etc), "thoughts" process(es) in Deepseek will activate almost immediately. Sometimes you need to regen for it to activate.
3. You will also get a lot of variations - some will continue the generation, others will talk about how to improve it, and some (ie generation of a scene) will cause the characters to "reason" about this situation. In some cases, the model will ask you to continue generation / thoughts too.
4. In some cases the model's "thoughts" may appear in the generation itself.
5. State the maximum word count IN THE PROMPT for best results, especially for activation of "thinking." (see the examples above)
6. Sometimes the "censorship" (from Deepseek) will activate, regen the prompt to clear it.
7. You may want to try your prompt once at "default" or "safe" temp settings, another at temp 1.2, and a third at 2.5 as an example. This will give you a broad range of "reasoning/thoughts/problem" solving.

GENERATION - THOUGHTS/REASONING:

1. It may take one or more regens for "thinking" to "activate." (depending on the prompt)
2. Model can generate a LOT of "thoughts". Sometimes the most interesting ones are 3,4,5 or more levels deep. 
3. Many times the "thoughts" are unique and very different from one another.
4. Temp/rep pen settings can affect reasoning/thoughts too.
5. Change up or add directives/instructions or increase the detail level(s) in your prompt to improve reasoning/thinking.
6. Adding to your prompt: "think outside the box", "brainstorm X number of ideas", "focus on the most uncommon approaches" can drastically improve your results.

GENERAL SUGGESTIONS:

1. I have found opening a "new chat" per prompt works best for "thinking/reasoning activation", with temp .6, rep pen 1.05 ... THEN "regen" as required.
2. Sometimes the model will get completely unhinged and you will need to stop it manually.
3. Depending on your AI app, "thoughts" may appear with "< THINK >" and "</ THINK >" tags AND/OR the AI will generate "thoughts" directly in the main output or later output(s). (A parsing sketch follows this list.)
4. Although quant Q4_K_M was used for testing/examples, higher quants will provide better generation / more sound "reasoning/thinking".
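
A minimal sketch (Python) for separating "thoughts" from the final answer programmatically; the tag strings vary by model and app, so adjust the pattern to what your model actually emits:

<PRE>
import re

def split_thoughts(text):
    # capture the first &lt;think&gt;...&lt;/think&gt; block, if any
    m = re.search(r"&lt;think&gt;(.*?)&lt;/think&gt;", text, flags=re.DOTALL | re.IGNORECASE)
    if not m:
        return "", text.strip()  # no tags: the whole output is the answer
    thoughts = m.group(1).strip()
    answer = (text[:m.start()] + text[m.end():]).strip()
    return thoughts, answer

demo = "&lt;think&gt;Open mid-action... build dread...&lt;/think&gt;The skyscraper swayed..."
thoughts, answer = split_thoughts(demo)
print("THOUGHTS:", thoughts)
print("ANSWER:", answer)
</PRE>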

ADDITIONAL SUPPORT:

For additional generation support, general questions, detailed parameter info, and a lot more, see also:

https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters

---

<B>Recommended Settings (all) - For usage with "Think" / "Reasoning":</B>

temp: .6 , rep pen: 1.07 (range : 1.02 to 1.12), rep pen range: 64, top_k: 40, top_p: .95, min_p: .05 

Temps of 1+, 2+, 3+ will result in much deeper, richer, and "more interesting" thoughts and reasoning.

Model behaviour may change with other parameter(s) and/or sampler(s) activated - especially the "thinking/reasoning" process.
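
As a concrete sketch, here are those settings passed through the llama-cpp-python bindings (the GGUF filename is a placeholder; parameter names may differ slightly in other apps, and "rep pen range" is usually set in the app's sampler options):

<PRE>
from llama_cpp import Llama

llm = Llama(model_path="model-q4_k_m.gguf", n_ctx=4096)  # 4096+ to fit "thoughts"

out = llm(
    "Give me 3 plots for a mystery set in Wales. 300-500 words.",
    temperature=0.6,      # raise toward 1+ for deeper/richer thoughts
    repeat_penalty=1.07,  # recommended range: 1.02 to 1.12
    top_k=40,
    top_p=0.95,
    min_p=0.05,
    max_tokens=1024,
)
print(out["choices"][0]["text"])
</PRE>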

--- 

<B>System Role / System Prompts - Reasoning On/Off/Variable and Augment The Model's Power:</b>

<small> ( <font color="red">Critical Setting for model operation </font> ) </small>

---

System Role / System Prompt / System Message (called "System Prompt" in this section)
is "root access" to the model and controls its internal workings - both instruction following and output generation - and, in the
case of this model, reasoning control, including turning reasoning on/off.

In this section I will show you basic, advanced, and combined "code" to control the model's reasoning, instruction following and output generation.

If you do not set a "system prompt", reasoning/thinking will be OFF by default
(unless the model invokes it automatically - IE it is always in "thinking mode"), and the model will operate like a normal LLM.

HOW TO SET:

Depending on your AI "app" you may have to copy/paste one of the "codes" below into the
"System Prompt" or "System Role" window to enable reasoning/thinking.

In LM Studio, set/activate "Power User" or "Developer" mode to access it, then copy/paste into the System Prompt box.

In SillyTavern go to the "template page" ("A"), activate "system prompt", and enter the text in the prompt box.

In Ollama see [ https://github.com/ollama/ollama/blob/main/README.md ] regarding setting the "system message".

In Koboldcpp, load the model, start it, go to settings -> select a template, and enter the text in the "sys prompt" box.

SYSTEM PROMPTS AVAILABLE:

When you copy/paste, PRESERVE the formatting, including line breaks.

If you want to edit/adjust these, only do so in NOTEPAD or in the LLM app directly.



SIMPLE:

This is the generic system prompt used for generation and testing: 

<PRE>
You are a helpful, smart, kind, and efficient AI assistant. You always fulfill the user's requests to the best of your ability.
</PRE>

This System Role/Prompt will give you "basic thinking/reasoning": 

<PRE>
You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside &lt;think&gt; &lt;/think&gt; tags, and then provide your solution or response to the problem.
</PRE>
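
If your app has no System Prompt box but exposes an OpenAI-compatible API, here is a minimal sketch (Python; the URL and model name are placeholders) of passing the prompt above programmatically:

<PRE>
import requests

SYSTEM_PROMPT = "You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside &lt;think&gt; &lt;/think&gt; tags, and then provide your solution or response to the problem."

r = requests.post("http://localhost:1234/v1/chat/completions", json={
    "model": "local-model",  # placeholder name
    "messages": [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Give me 6 plots for a ghost story. 500-800 words."},
    ],
    "temperature": 0.6,
})
print(r.json()["choices"][0]["message"]["content"])
</PRE>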

ADVANCED:

Logical and Creative - these will SIGNIFICANTLY alter the output, and many times improve it too.

This will also cause more thoughts, deeper thoughts, and in many cases more detailed/stronger thoughts too.

Keep in mind you may also want to test the model with NO system prompt at all - including the default one.

Special credit to: Eric Hartford, Cognitivecomputations; these are based on his work.

CRITICAL: 

Copy and paste exactly as shown, preserve formatting and line breaks.

SIDE NOTE: 

These can be used in ANY Deepseek / Thinking model, including models not at this repo. 

These, if used in a "non-thinking" model, will also alter model performance.

<PRE>
You are an AI assistant developed by the world wide community of ai experts.

Your primary directive is to provide well-reasoned, structured, and extensively detailed responses.

Formatting Requirements:

1. Always structure your replies using: &lt;think&gt;{reasoning}&lt;/think&gt;{answer}
2. The &lt;think&gt;&lt;/think&gt; block should contain at least six reasoning steps when applicable.
3. If the answer requires minimal thought, the &lt;think&gt;&lt;/think&gt; block may be left empty.
4. The user does not see the &lt;think&gt;&lt;/think&gt; section. Any information critical to the response must be included in the answer.
5. If you notice that you have engaged in circular reasoning or repetition, immediately terminate {reasoning} with a &lt;/think&gt; and proceed to the {answer}

Response Guidelines:

1. Detailed and Structured: Use rich Markdown formatting for clarity and readability.
2. Scientific and Logical Approach: Your explanations should reflect the depth and precision of the greatest scientific minds.
3. Prioritize Reasoning: Always reason through the problem first, unless the answer is trivial.
4. Concise yet Complete: Ensure responses are informative, yet to the point without unnecessary elaboration.
5. Maintain a professional, intelligent, and analytical tone in all interactions.
</PRE>

CREATIVE:

<PRE>
You are an AI assistant developed by a world wide community of ai experts.

Your primary directive is to provide highly creative, well-reasoned, structured, and extensively detailed responses.

Formatting Requirements:

1. Always structure your replies using: &lt;think&gt;{reasoning}&lt;/think&gt;{answer}
2. The &lt;think&gt;&lt;/think&gt; block should contain at least six reasoning steps when applicable.
3. If the answer requires minimal thought, the &lt;think&gt;&lt;/think&gt; block may be left empty.
4. The user does not see the &lt;think&gt;&lt;/think&gt; section. Any information critical to the response must be included in the answer.
5. If you notice that you have engaged in circular reasoning or repetition, immediately terminate {reasoning} with a &lt;/think&gt; and proceed to the {answer}

Response Guidelines:

1. Detailed and Structured: Use rich Markdown formatting for clarity and readability.
2. Creative and Logical Approach: Your explanations should reflect the depth and precision of the greatest creative minds first.
3. Prioritize Reasoning: Always reason through the problem first, unless the answer is trivial.
4. Concise yet Complete: Ensure responses are informative, yet to the point without unnecessary elaboration.
5. Maintain a professional, intelligent, and analytical tone in all interactions.
</PRE>

---

<B>General Notes:</b>

These are general notes collected from my various repos and/or from experience with both specific models
and models in general.

These notes may assist you with other model(s) operation(s).

---

From:

https://huggingface.co/DavidAU/L3.1-MOE-2X8B-Deepseek-DeepHermes-e32-uncensored-abliterated-13.7B-gguf

Due to how this model is configured, I suggest 2-4 generations depending on your use case(s), as each will vary widely in terms of context, thinking/reasoning, and response.

Likewise, again depending on how your prompt is worded, it may take 1-4 regens for "thinking" to engage; however, sometimes the model will generate a response, then think/reason and improve on that response and continue again. This comes in part from the "Deepseek" parts of the model.

If you raise temp over .9, you may want to consider 4+ generations.

Note on "reasoning/thinking": this will activate depending on the wording in your prompt(s) and also the temp selected.

There can also be variations because of how the models interact per generation.

Also, as general note:

If you are getting "long-winded" generation/thinking/reasoning, you may want to break down the "problem(s)" to solve into one or more prompts. This will allow the model to focus more strongly, and in some cases give far better answers.

IE:

Asking it to generate one plot with specific requirements, rather than 6 general plots for a story, may get you better results.

--- 

From:

https://huggingface.co/DavidAU/Qwen2.5-MOE-6x1.5B-DeepSeek-Reasoning-e32-gguf

A temp of .4 to .8 is suggested; however, the model will still operate at much higher temps like 1.8, 2.6, etc.

Depending on your prompt, change temp SLOWLY: IE: .41, .42, .43 ... etc.

Likewise, because these are small models, they may do a tonne of "thinking"/"reasoning" and then "forget" to finish the task(s). In this case, prompt the model to "Complete the task XYZ with the 'reasoning plan' above".

Likewise, the model may function better if you break down the reasoning/thinking task(s) into smaller pieces:

IE: Instead of asking for 6 plots for theme XYZ, ask it for ONE plot for theme XYZ at a time.

Also set the context limit at 4k minimum; 8k+ is suggested.
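
A minimal sketch (Python; hypothetical OpenAI-compatible local endpoint) of that workflow - several generations per prompt, one task at a time, stepping temperature slowly:

<PRE>
import requests

URL = "http://localhost:1234/v1/chat/completions"  # placeholder endpoint
prompt = "Give me ONE plot for theme XYZ."         # one task at a time

for temp in (0.41, 0.42, 0.43, 0.44):
    r = requests.post(URL, json={
        "model": "local-model",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temp,
    })
    print(f"--- temp {temp} ---")
    print(r.json()["choices"][0]["message"]["content"])
</PRE>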

---