Spaces: yusufs / vllm-inference (Paused)
Files
1 contributor · History: 52 commits

Latest commit 8c5a84b by yusufs (4 months ago):
feat(runner.sh): --enable-chunked-prefill and --enable-prefix-caching for faster generate
.gitignore  (19 Bytes, 6 months ago)
  feat(download_model.py): remove download_model.py during build, it causing big image size

Dockerfile  (1.32 kB, 5 months ago)
  feat(runner.sh): using runner.sh to select llm in the run time

README.md  (1.73 kB, 6 months ago)
  feat(add-model): always download model during build, it will be cached in the consecutive builds

download_model.py  (700 Bytes, 6 months ago)
  feat(add-model): always download model during build, it will be cached in the consecutive builds

main.py  (6.7 kB, 6 months ago)
  feat(parse): parse output

openai_compatible_api_server.py  (24.4 kB, 6 months ago)
  feat(dep_sizes.txt): removes dep_sizes.txt during build, it not needed

poetry.lock  (426 kB, 6 months ago)
  feat(refactor): move the files to root

pyproject.toml  (416 Bytes, 6 months ago)
  feat(refactor): move the files to root

requirements.txt  (9.99 kB, 6 months ago)
  feat(first-commit): follow examples and tutorials

run-llama.sh  (1.51 kB, 4 months ago)
  fix(runner.sh): --enforce-eager not support values

run-sailor.sh  (1.83 kB, 4 months ago)
  fix(runner.sh): --enforce-eager not support values

runner.sh  (1.79 kB, 4 months ago)
  feat(runner.sh): --enable-chunked-prefill and --enable-prefix-caching for faster generate
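Taken together, the commit messages suggest that runner.sh selects a model at run time and starts vLLM's OpenAI-compatible server with eager execution, chunked prefill, and prefix caching enabled. The following is a minimal sketch of such a launch script, assuming those vLLM flags; the MODEL_NAME variable and the default model id are hypothetical, not taken from the repository.

```shell
#!/usr/bin/env sh
# Hypothetical sketch of a run-time launcher for this Space, based only
# on the flags named in the commit messages. MODEL_NAME and its default
# value are assumptions for illustration.
MODEL_NAME="${MODEL_NAME:-meta-llama/Llama-3.2-1B-Instruct}"

# --enforce-eager is a boolean flag and takes no value (per the
# "fix(runner.sh): --enforce-eager not support values" commits).
# --enable-chunked-prefill splits long prompt prefills into chunks, and
# --enable-prefix-caching reuses KV-cache entries for shared prompt
# prefixes; both reduce generation latency.
exec python -m vllm.entrypoints.openai.api_server \
  --model "$MODEL_NAME" \
  --enforce-eager \
  --enable-chunked-prefill \
  --enable-prefix-caching
```

Because `exec` replaces the shell process with the server, the container's main process becomes vLLM itself, which is the usual pattern for a Docker entrypoint.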