Model Card for PrunaAI/tiny-random-llama4-smashed

This model was created using the pruna library. Pruna is a model optimization framework built for developers, enabling you to deliver more efficient models with minimal implementation overhead.

Usage

First things first, you need to install the pruna library:

pip install pruna

You can then load this model using the following code:

from pruna import PrunaModel

loaded_model = PrunaModel.from_hub("PrunaAI/tiny-random-llama4-smashed")

After loading the model, you can use the inference methods of the original model.
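For example, assuming the repository ships standard transformers tokenizer files and wraps a causal language model (both assumptions, not stated in this card), a minimal generation call might look like this:

from pruna import PrunaModel
from transformers import AutoTokenizer

# Load the smashed model; PrunaModel forwards inference methods
# (such as generate) to the wrapped transformers model.
loaded_model = PrunaModel.from_hub("PrunaAI/tiny-random-llama4-smashed")

# Assumes the repo includes tokenizer files alongside the weights.
tokenizer = AutoTokenizer.from_pretrained("PrunaAI/tiny-random-llama4-smashed")

inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = loaded_model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))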

Smash Configuration

The compression configuration of the model is stored in the smash_config.json file.

{
    "batcher": null,
    "cacher": null,
    "compiler": "torch_compile",
    "pruner": null,
    "quantizer": null,
    "torch_compile_backend": "inductor",
    "torch_compile_batch_size": 1,
    "torch_compile_dynamic": null,
    "torch_compile_fullgraph": true,
    "torch_compile_make_portable": false,
    "torch_compile_max_kv_cache_size": 400,
    "torch_compile_mode": "default",
    "torch_compile_seqlen_manual_cuda_graph": 100,
    "max_batch_size": 1,
    "device": "cpu",
    "save_fns": [
        "save_before_apply"
    ],
    "load_fns": [
        "transformers"
    ],
    "reapply_after_load": {
        "pruner": null,
        "quantizer": null,
        "cacher": null,
        "compiler": "torch_compile",
        "batcher": null
    }
}
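If you want to apply a similar compression to your own model, pruna's documented pattern is to build a SmashConfig and pass it to smash. The sketch below mirrors only the compiler choice from the configuration above; the base model name is hypothetical, and the torch_compile_* keys shown in smash_config.json correspond to algorithm options whose exact names may vary between pruna versions:

from pruna import SmashConfig, smash
from transformers import AutoModelForCausalLM

# Hypothetical base model; replace with the model you want to compress.
base_model = AutoModelForCausalLM.from_pretrained("your/base-model")

# Select torch_compile as the compiler, matching smash_config.json above.
smash_config = SmashConfig()
smash_config["compiler"] = "torch_compile"

smashed_model = smash(model=base_model, smash_config=smash_config)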

Model Configuration

The configuration of the model is stored in the config.json file (empty for this model).

{}
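Both configuration files can also be inspected directly from the Hub, for example with huggingface_hub (a generic sketch, not a pruna-specific API):

import json
from huggingface_hub import hf_hub_download

# Download smash_config.json from the repository and pretty-print it.
path = hf_hub_download(repo_id="PrunaAI/tiny-random-llama4-smashed", filename="smash_config.json")
with open(path) as f:
    print(json.dumps(json.load(f), indent=4))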

🌍 Join the Pruna AI community!

Twitter · GitHub · LinkedIn · Discord · Reddit
