---
title: README
emoji: 💻
colorFrom: purple
colorTo: blue
sdk: static
pinned: false
---

# The Future of AI is Open

**If you are looking for compressed models to run with vLLM, they have been moved to the [RedHatAI](https://huggingface.co/RedHatAI) organization. We look forward to continuing to publish optimized models for open source use!**

[Neural Magic](https://neuralmagic.com/) helps developers accelerate deep learning performance through automated model compression technologies and inference engines.
Download our compression-aware inference engines and open source tools for fast model inference:
* [vLLM](https://github.com/vllm-project/vllm/): A high-throughput and memory-efficient inference engine for at-scale deployment of performant open-source LLMs
* [LLM Compressor](https://github.com/vllm-project/llm-compressor/): An HF-native library for applying quantization and sparsity algorithms to LLMs for optimized deployment with vLLM (see the sketch below)
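
To give a feel for the LLM Compressor workflow, here is a minimal one-shot quantization sketch based on the library's published examples; the model, dataset, and exact argument names are assumptions and may vary between releases:

```python
from llmcompressor import oneshot  # older releases: from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

# One-shot W4A16 (GPTQ) quantization; the model and dataset are example choices.
oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    dataset="open_platypus",  # built-in calibration dataset
    recipe=GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
    output_dir="TinyLlama-1.1B-Chat-v1.0-W4A16",
    max_seq_length=2048,
    num_calibration_samples=512,
)
```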

![NM Workflow](https://cdn-uploads.huggingface.co/production/uploads/60466e4b4f40b01b66151416/QacT1zAnoidTKqRTY4NxH.png)

On this profile we provide accurate model checkpoints compressed with SOTA methods, ready to run in vLLM, in schemes such as W4A16, W8A16, W8A8 (INT8 and FP8), and many more! If you would like help quantizing a model, or have a request for us to add a checkpoint, please open an issue at https://github.com/vllm-project/llm-compressor.
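
These checkpoints load in vLLM with no extra conversion step. A minimal sketch, where the repo ID is a placeholder for any compressed checkpoint from the [RedHatAI](https://huggingface.co/RedHatAI) organization:

```python
from vllm import LLM, SamplingParams

# Placeholder repo ID: substitute any compressed checkpoint
# from https://huggingface.co/RedHatAI
llm = LLM(model="RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w8a8")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["What does W8A8 quantization do?"], params)
print(outputs[0].outputs[0].text)
```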