Commits · Luigi/ZeroGPU-LLM-Inference

Update README.md

371669a

verified

Luigi committed 3 days ago

update readme

076c1f2

Luigi committed 3 days ago

add Qwen2.5-Omni-3B & MiMo-7B-RL

6a4537b

Luigi committed 4 days ago

disable models unrunnable on hf spaces

7c5f318

Luigi committed 4 days ago

switch "Phi-4-mini-Instruct" from unsloth to microsoft

23b3848

Luigi committed 4 days ago

extend max tokens

d730ffe

Luigi committed 4 days ago

adjust layout

4911925

Luigi committed 4 days ago

add "Granite-4.0-Tiny-Preview" model

09d9700

Luigi committed 4 days ago

add phi-4 reasoning and phi-4-mini reasoning

68e6569

Luigi committed 4 days ago

bugfix: device was set to xpu by mistake

1242438

Luigi committed 4 days ago

add all qwen3 variants

2882063

Luigi committed 4 days ago

user can define search timeout

e2ee907

Luigi committed 4 days ago

give 5 seconds for web search to gather results

c00d442

Luigi committed 4 days ago

support thinking models and display their thoughts as they stream

8c3c2b9

Luigi committed 4 days ago

do not preview prompt at error return from chat response

c09049b

Luigi committed 4 days ago

inject assistant placeholder at right time

12dd3f3

Luigi committed 4 days ago

disable L142 which is not needed

3c176a1

Luigi committed 4 days ago

fix bug in prompt preview display

41ee8bf

Luigi committed 4 days ago

add prompt preview for debug

5fc0117

Luigi committed 4 days ago

fix: prevent self-talking issue by using tokenizer chat_template formatting

960db60

Luigi committed 4 days ago

bugfix to Error: "str" object has no attribute "pad_token_id"

889f080

Luigi committed 6 days ago

add taiwan elm 1.1b & 270m instruct

c8399e3

Luigi committed 6 days ago

add type in qwen3 0.6b repo id

76d4d60

Luigi committed 8 days ago

add qwen3

fe395ab

Luigi committed 8 days ago

Add Smollm2 360m instruct fine-tuned on TaiwanChat

7308211

Luigi committed 9 days ago

keep debug message

37f7787

Luigi committed 14 days ago

add debug to show web search result

a2f07a4

Luigi committed 14 days ago

give 1 second for web search to grab data

9ad3ffd

Luigi committed 14 days ago

inject web search result if web search enabled

bc257ff

Luigi committed 14 days ago

refactor(app): improve streaming, background search, dtype fallback, and cleanup

293686e

Luigi committed 14 days ago

bugfix: not using pipeline for response generation

939895d

Luigi committed 14 days ago

Add original SmolLM2 135M Instruct for comparison

423dc1a

Luigi committed 14 days ago

Add SmolLM2-135M-Instruct-TaiwanChat

38fcc03

Luigi committed 14 days ago

Add SmolLM2-135M TaiwanChat

0d642b7

Luigi committed 14 days ago

Update README.md

34cf84a
verified

Luigi committed 14 days ago

default to gemma-3-4b

88a6a62

Luigi committed 24 days ago

model repo_id typo fix

89372fa

Luigi committed 24 days ago

enable web search by default

6235e63

Luigi committed 24 days ago

remove tinyllama which has bad response quality

a22cf42

Luigi committed 24 days ago

make streaming response

5ea073d

Luigi committed 24 days ago

flatten history before it goes into the prompt

ef361b0

Luigi committed 24 days ago

better management of system prompt

5f6306a

Luigi committed 24 days ago

add accelerate

5ed3cb3

Luigi committed 24 days ago

use chat pipeline instead of model and tokenizer individually

ac8e9cc

Luigi committed 24 days ago

bugfix to padding-related issues

f248fec

Luigi committed 24 days ago

add attention mask

b6b3940

Luigi committed 24 days ago

Clean model description

4731160

Luigi committed 24 days ago

pin torch to 2.4.0

4c6b4c5

Luigi committed 24 days ago

add sentencepiece tokenizer

4afc958

Luigi committed 24 days ago

update requirements

51e3e3c

Luigi committed 24 days ago

