# Model Card for PrunaAI/tiny-random-llama4-smashed
This model was created using the `pruna` library. Pruna is a model optimization framework built for developers, enabling you to deliver more efficient models with minimal implementation overhead.
## Usage
First, install the pruna library:

```bash
pip install pruna
```
You can then load this model using the following code:

```python
from pruna import PrunaModel

loaded_model = PrunaModel.from_hub("PrunaAI/tiny-random-llama4-smashed")
```
After loading, the smashed model exposes the inference methods of the original model, as in the sketch below.
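For instance, since the underlying model is a transformers causal language model, a standard generation call should work. This is a minimal sketch, not part of the original card: loading the tokenizer from this same repository, and the `generate` call being forwarded by the `PrunaModel` wrapper, are assumptions based on the statement above.

```python
from pruna import PrunaModel
from transformers import AutoTokenizer

loaded_model = PrunaModel.from_hub("PrunaAI/tiny-random-llama4-smashed")

# ASSUMPTION: the tokenizer ships with the same repository; if not,
# load the tokenizer of the matching base checkpoint instead.
tokenizer = AutoTokenizer.from_pretrained("PrunaAI/tiny-random-llama4-smashed")

inputs = tokenizer("Hello, world!", return_tensors="pt")

# The card states that the wrapper exposes the original model's
# inference methods, so the usual generate() call should apply.
outputs = loaded_model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```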
## Smash Configuration
The compression configuration of the model is stored in the `smash_config.json` file:
```json
{
  "batcher": null,
  "cacher": null,
  "compiler": "torch_compile",
  "pruner": null,
  "quantizer": null,
  "torch_compile_backend": "inductor",
  "torch_compile_batch_size": 1,
  "torch_compile_dynamic": null,
  "torch_compile_fullgraph": true,
  "torch_compile_make_portable": false,
  "torch_compile_max_kv_cache_size": 400,
  "torch_compile_mode": "default",
  "torch_compile_seqlen_manual_cuda_graph": 100,
  "max_batch_size": 1,
  "device": "cpu",
  "save_fns": [
    "save_before_apply"
  ],
  "load_fns": [
    "transformers"
  ],
  "reapply_after_load": {
    "pruner": null,
    "quantizer": null,
    "cacher": null,
    "compiler": "torch_compile",
    "batcher": null
  }
}
```
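To reproduce a configuration like this when smashing your own model, pruna exposes a `SmashConfig` object whose entries can be set by item assignment. The sketch below is an assumption-laden example, not the card's own recipe: the base checkpoint path is a placeholder, and mirroring the compiler sub-options (`torch_compile_backend`, `torch_compile_mode`) through the same item-assignment syntax is inferred from the JSON keys above.

```python
from pruna import SmashConfig, smash
from transformers import AutoModelForCausalLM

# PLACEHOLDER: substitute the base checkpoint you want to optimize.
base_model = AutoModelForCausalLM.from_pretrained("path/to/base-model")

# Mirror the settings from smash_config.json above.
smash_config = SmashConfig()
smash_config["compiler"] = "torch_compile"
# ASSUMPTION: compiler sub-options use the same item-assignment syntax.
smash_config["torch_compile_backend"] = "inductor"
smash_config["torch_compile_mode"] = "default"

# smash() applies the configuration and returns a PrunaModel wrapper.
smashed_model = smash(model=base_model, smash_config=smash_config)
```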
## Model Configuration
The configuration of the model is stored in the `config.json` file:

```json
{}
```
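Both configuration files can also be inspected without loading the model by fetching them directly from the Hub. A minimal sketch using `huggingface_hub` (not part of the original card):

```python
import json

from huggingface_hub import hf_hub_download

# Download smash_config.json from the repository and pretty-print it.
path = hf_hub_download(
    repo_id="PrunaAI/tiny-random-llama4-smashed",
    filename="smash_config.json",
)
with open(path) as f:
    print(json.dumps(json.load(f), indent=2))
```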
## Join the Pruna AI community!