Spaces: yusufs / vllm-inference (Paused)
Files
1 contributor · History: 52 commits

Latest commit 8c5a84b by yusufs (4 months ago):
feat(runner.sh): --enable-chunked-prefill and --enable-prefix-caching for faster generate
.gitignore  (19 Bytes, 6 months ago)
  feat(download_model.py): remove download_model.py during build, it causing big image size

Dockerfile  (1.32 kB, 5 months ago)
  feat(runner.sh): using runner.sh to select llm in the run time

README.md  (1.73 kB, 6 months ago)
  feat(add-model): always download model during build, it will be cached in the consecutive builds

download_model.py  (700 Bytes, 6 months ago)
  feat(add-model): always download model during build, it will be cached in the consecutive builds

main.py  (6.7 kB, 6 months ago)
  feat(parse): parse output

openai_compatible_api_server.py  (24.4 kB, 6 months ago)
  feat(dep_sizes.txt): removes dep_sizes.txt during build, it not needed

poetry.lock  (426 kB, 6 months ago)
  feat(refactor): move the files to root

pyproject.toml  (416 Bytes, 6 months ago)
  feat(refactor): move the files to root

requirements.txt  (9.99 kB, 6 months ago)
  feat(first-commit): follow examples and tutorials

run-llama.sh  (1.51 kB, 4 months ago)
  fix(runner.sh): --enforce-eager not support values

run-sailor.sh  (1.83 kB, 4 months ago)
  fix(runner.sh): --enforce-eager not support values

runner.sh  (1.79 kB, 4 months ago)
  feat(runner.sh): --enable-chunked-prefill and --enable-prefix-caching for faster generate
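Taken together, the commit messages suggest that runner.sh selects a model at run time and starts vLLM's OpenAI-compatible server with eager execution, chunked prefill, and prefix caching enabled. The following is a minimal sketch of such a launch script, assuming those vLLM flags; the MODEL_NAME variable and the default model id are hypothetical, not taken from the repository.

```shell
#!/usr/bin/env sh
# Hypothetical sketch of a run-time launcher for this Space, based only
# on the flags named in the commit messages. MODEL_NAME and its default
# value are assumptions for illustration.
MODEL_NAME="${MODEL_NAME:-meta-llama/Llama-3.2-1B-Instruct}"

# --enforce-eager is a boolean flag and takes no value (per the
# "fix(runner.sh): --enforce-eager not support values" commits).
# --enable-chunked-prefill splits long prompt prefills into chunks, and
# --enable-prefix-caching reuses KV-cache entries for shared prompt
# prefixes; both reduce generation latency.
exec python -m vllm.entrypoints.openai.api_server \
  --model "$MODEL_NAME" \
  --enforce-eager \
  --enable-chunked-prefill \
  --enable-prefix-caching
```

Because `exec` replaces the shell process with the server, the container's main process becomes vLLM itself, which is the usual pattern for a Docker entrypoint.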