---
license: mit
license_link: https://huggingface.co/microsoft/Phi-3-vision-128k-instruct/resolve/main/LICENSE

language:
- multilingual
pipeline_tag: text-generation
tags:
- nlp
- code
- vision
- DirectML
- ONNX
- DML
- ONNXRuntime
- phi3
- conversational
- custom_code
inference: false

---
# Phi-3-vision-128k-instruct ONNX models for CPU and CUDA
This repository hosts optimized versions of [microsoft/Phi-3-vision-128k-instruct](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct/) to accelerate inference with ONNX Runtime.
It is a clone of [microsoft/Phi-3-vision-128k-instruct-onnx-cpu](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct-onnx-cpu), with the extra files needed to serve the model behind an OpenAI-API-compatible endpoint via the [`embeddedllm`](https://github.com/EmbeddedLLM/embeddedllm) PyPI package.
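
Once a server is up, it speaks the standard OpenAI chat-completions protocol, so any OpenAI-compatible client works. The sketch below assumes an `embeddedllm` server is already running; the base URL, port, and model id are placeholders, so substitute whatever your launch command reports:

```python
# Minimal sketch: query an OpenAI-API-compatible endpoint.
# BASE_URL and MODEL are placeholders -- adjust to match your running server.
import base64

import requests

BASE_URL = "http://localhost:8000/v1"          # placeholder host/port
MODEL = "Phi-3-vision-128k-instruct-onnx"      # placeholder model id

# Encode a local image as a base64 data URL for the vision input.
with open("example.jpg", "rb") as f:
    image_url = "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    json={
        "model": MODEL,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
        "max_tokens": 256,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```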

## Usage on Windows (Intel / AMD / Nvidia / Qualcomm)
```powershell
conda create -n onnx python=3.10
conda activate onnx
winget install -e --id GitHub.GitLFS
pip install huggingface-hub[cli]
huggingface-cli download EmbeddedLLM/Phi-3-vision-128k-instruct-onnx --include='onnx/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/*' --local-dir .\Phi-3-vision-128k-instruct-onnx
pip install numpy==1.26.4
Invoke-WebRequest -Uri "https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/phi3v.py" -OutFile "phi3v.py"
pip install onnxruntime
pip install --pre onnxruntime-genai==0.3.0rc2
# -m must point at the folder that contains genai_config.json; the download keeps the repo's directory layout
python phi3v.py -m .\Phi-3-vision-128k-instruct-onnx\onnx\cpu_and_mobile\cpu-int4-rtn-block-32-acc-level-4
```
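
For reference, `phi3v.py` follows the usual `onnxruntime-genai` multimodal flow: load the model, preprocess a prompt containing an `<|image_1|>` placeholder together with the image, then stream tokens from the generator. A condensed sketch, assuming the 0.3.x API and placeholder file paths:

```python
# Condensed sketch of the phi3v.py generation loop (onnxruntime-genai 0.3.x).
import onnxruntime_genai as og

model = og.Model(r".\Phi-3-vision-128k-instruct-onnx\onnx\cpu_and_mobile\cpu-int4-rtn-block-32-acc-level-4")
processor = model.create_multimodal_processor()
tokenizer_stream = processor.create_stream()

# Phi-3-vision's chat template marks the image position with <|image_1|>.
image = og.Images.open("example.jpg")  # placeholder image path
prompt = "<|user|>\n<|image_1|>\nDescribe this image.<|end|>\n<|assistant|>\n"
inputs = processor(prompt, images=image)

params = og.GeneratorParams(model)
params.set_inputs(inputs)
params.set_search_options(max_length=3072)

generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    # Decode and print each new token as it is produced.
    print(tokenizer_stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
print()
```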

# UPSTREAM README.md

# Phi-3-vision-128k-instruct ONNX

This repository hosts the optimized versions of [microsoft/Phi-3-vision-128k-instruct](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct/) to accelerate inference with DirectML and ONNX Runtime.

Phi-3-Vision-128K-Instruct is a lightweight, state-of-the-art open multimodal model built upon datasets that include synthetic data and filtered publicly available websites, with a focus on very high-quality, reasoning-dense data in both text and vision.
The model belongs to the Phi-3 model family, and the multimodal version supports a context length of up to 128K tokens. It underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization, to ensure precise instruction adherence and robust safety measures.

## Intended Uses

**Primary use cases**

The model is intended for broad commercial and research use in English. It is suited to general-purpose AI systems and applications with visual and text input that require:

1) memory/compute constrained environments;
2) latency bound scenarios;
3) general image understanding;
4) OCR;
5) chart and table understanding.

Our model is designed to accelerate research on efficient language and multimodal models, and to serve as a building block for generative AI-powered features.

**Use case considerations**

Our models are not specifically designed or evaluated for all downstream purposes. Developers should consider common limitations of language models as they select use cases, and evaluate and mitigate for accuracy, safety, and fairness before using within a specific downstream use case, particularly for high-risk scenarios. 
Developers should be aware of and adhere to applicable laws or regulations (including privacy, trade compliance laws, etc.) that are relevant to their use case. 

Nothing contained in this Model Card should be interpreted as or deemed a restriction or modification to the license the model is released under.

## ONNX Models

Here are some of the optimized configurations we have added:
- **ONNX model for int4 DirectML:** ONNX model for AMD, Intel, and NVIDIA GPUs on Windows, quantized to int4 using AWQ.
- **ONNX model for int4 CPU and Mobile:** ONNX model for CPU and mobile using int4 quantization via RTN. There are two versions uploaded to balance latency vs. accuracy. Acc=1 is targeted at improved accuracy, while Acc=4 is for improved performance. For mobile devices, we recommend using the model with acc-level-4.
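
For intuition about the `rtn-block-32` naming: round-to-nearest (RTN) quantization splits each weight tensor into blocks (32 elements here), stores one scale per block, and rounds every weight to the nearest representable 4-bit level. A toy illustration of the arithmetic (not the actual quantizer, which uses a different int4 layout):

```python
# Toy round-to-nearest (RTN) int4 block quantization -- illustration only.
import numpy as np

def rtn_int4_quantize(weights: np.ndarray, block_size: int = 32):
    """Quantize a 1-D float array to uint4 codes with one scale/offset per block."""
    codes, scales, offsets = [], [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        span = float(block.max() - block.min())
        scale = span / 15 if span > 0 else 1.0   # 16 levels -> 15 steps
        offset = float(block.min())
        q = np.clip(np.round((block - offset) / scale), 0, 15).astype(np.uint8)
        codes.append(q); scales.append(scale); offsets.append(offset)
    return codes, scales, offsets

def rtn_int4_dequantize(codes, scales, offsets):
    """Reconstruct approximate floats from the int4 codes."""
    return np.concatenate([q * s + o for q, s, o in zip(codes, scales, offsets)])

w = np.random.randn(64).astype(np.float32)
w_hat = rtn_int4_dequantize(*rtn_int4_quantize(w))
print("max abs error:", np.abs(w - w_hat).max())  # nonzero: quantization is lossy
```

The `acc-level` suffix is a separate knob: it bounds the compute precision the quantized kernels may use at runtime, which is why acc-level-4 trades some accuracy for speed.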

## Usage

### Installation and Setup

To use the Phi-3-vision-128k-instruct ONNX model on Windows with DirectML, follow these steps:

1. **Create and activate a Conda environment:**
```sh
conda create -n onnx python=3.10
conda activate onnx
```

2. **Install Git LFS:**
```sh
winget install -e --id GitHub.GitLFS
```

3. **Install Hugging Face CLI:**
```sh
pip install huggingface-hub[cli]
```

4. **Download the model:**
```sh
huggingface-cli download EmbeddedLLM/Phi-3-vision-128k-instruct-onnx --include="onnx/cpu_and_mobile/*" --local-dir .\Phi-3-vision-128k-instruct
```

5. **Install necessary Python packages:**
```sh
pip install numpy==1.26.4
pip install onnxruntime
pip install --pre onnxruntime-genai==0.3.0rc2
```

6. **Install Visual Studio 2015 runtime:**
```sh
conda install conda-forge::vs2015_runtime
```

7. **Download the example script:**
```sh
Invoke-WebRequest -Uri "https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/phi3v.py" -OutFile "phi3v.py"
```

8. **Run the example script:**
```sh
# Point -m at the specific variant folder that contains genai_config.json
python phi3v.py -m .\Phi-3-vision-128k-instruct\onnx\cpu_and_mobile\cpu-int4-rtn-block-32-acc-level-4
```
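
Note that step 4 downloads both accuracy variants while preserving the repository's folder layout, so `-m` must point at one specific variant folder (the one containing `genai_config.json`). A small sketch to list the candidates:

```python
# List candidate model folders by searching for genai_config.json.
from pathlib import Path

for cfg in Path(r".\Phi-3-vision-128k-instruct").rglob("genai_config.json"):
    print(cfg.parent)  # pass one of these paths to phi3v.py via -m
```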

### Hardware Requirements

**Minimum Configuration:**
- **Windows:** DirectX 12-capable GPU (AMD/Nvidia/Intel)
- **CPU:** x86_64 / ARM64

**Tested Configurations:**
- **GPU:** AMD Ryzen 8000 Series iGPU (DirectML)
- **CPU:** AMD Ryzen CPU

## Hardware Supported

The model has been tested on:
- GPU SKU: RTX 4090 (DirectML)

Minimum Configuration Required:
- Windows: DirectX 12-capable GPU and a minimum of 10GB of combined RAM

### Model Description

- **Developed by:**  Microsoft
- **Model type:** ONNX
- **Language(s) (NLP):** Python, C, C++
- **License:** MIT
- **Model Description:** This is a conversion of the Phi-3 Vision 128K Instruct model for ONNX Runtime inference.

## Additional Details
- [**Phi-3 Small, Medium, and Vision Blog**](https://aka.ms/phi3_ONNXBuild24)
- [**Phi-3 Model Blog Link**](https://aka.ms/phi3blog-april)
- [**Phi-3 Model Card**](https://aka.ms/phi3-medium-4k-instruct)
- [**Phi-3 Technical Report**](https://aka.ms/phi3-tech-report)
- [**Phi-3 on Azure AI Studio**](https://aka.ms/phi3-azure-ai)
  
## License

The model is licensed under the [MIT license](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct/resolve/main/LICENSE).

## Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow [Microsoft’s Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks). Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties’ policies.