whackthejacker committed on
Commit 82433a0 · verified · 1 Parent(s): 9ea0df0

Update README.md

Files changed (1): README.md (+143 -1)
README.md CHANGED
@@ -16,7 +16,149 @@ models:
  [![Python 3.9+](https://img.shields.io/badge/python-%3E%3D3.9-blue.svg)](https://www.python.org/downloads)
  [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

-
  ---

+ # Mixture of Experts
+
+ Welcome to **Mixture of Experts** – a Hugging Face Space built to interact with advanced multimodal conversational AI using Gradio. This Space leverages the Aria-Chat model, which excels at open-ended, multi-round dialog with text and image inputs.
+
+ ## Key Features
+
+ - **Multimodal Interaction:** Seamlessly integrate text and image inputs for rich conversational experiences.
+ - **Advanced Conversational Abilities:** Benefit from Aria-Chat’s fine-tuned performance in generating coherent, context-aware responses.
+ - **Optimized Performance:** Designed for reliable long-form outputs, avoiding common pitfalls such as incomplete markdown or endless lists.
+ - **Multilingual Support:** Optimized to handle multiple languages, including Chinese, Spanish, French, and Japanese.
+
+ ## Quick Start
+
+ ### Installation
+
+ To run the Space locally or integrate it into your workflow, install the following dependencies:
+
+ ```bash
+ pip install transformers==4.45.0 accelerate==0.34.1 sentencepiece==0.2.0 torch torchvision requests Pillow
+ pip install flash-attn --no-build-isolation
+
+ # Optionally, for improved inference performance:
+ pip install grouped_gemm==0.1.6
+ ```
+
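+ Before loading the model, it can help to confirm that a CUDA device with bfloat16 support is visible, since the usage example below loads the weights in `torch.bfloat16`. A quick optional check (not part of the original setup steps):
+
+ ```python
+ import torch
+
+ # Optional sanity check: the usage example loads weights in bfloat16 on a
+ # GPU, so verify that a CUDA device is visible and supports bf16.
+ if torch.cuda.is_available():
+     print(f"GPU: {torch.cuda.get_device_name(0)}")
+     print(f"bf16 supported: {torch.cuda.is_bf16_supported()}")
+ else:
+     print("No CUDA device found; running Aria-Chat on CPU is impractical.")
+ ```
+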
+ ### Usage
+
+ Below is a simple code snippet demonstrating how to interact with the Aria-Chat model. Customize it further to suit your integration needs:
+
+ ```python
+ import requests
+ import torch
+ from PIL import Image
+ from transformers import AutoModelForCausalLM, AutoProcessor
+
+ model_id_or_path = "rhymes-ai/Aria-Chat"
+
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id_or_path,
+     device_map="auto",
+     torch_dtype=torch.bfloat16,
+     trust_remote_code=True,
+ )
+
+ processor = AutoProcessor.from_pretrained(
+     model_id_or_path,
+     trust_remote_code=True,
+ )
+
+ # Example image input
+ image_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png"
+ image = Image.open(requests.get(image_url, stream=True).raw)
+
+ # Prepare a conversation message
+ messages = [
+     {
+         "role": "user",
+         "content": [
+             {"text": None, "type": "image"},
+             {"text": "What is the image?", "type": "text"},
+         ],
+     }
+ ]
+
+ # Format text input with the chat template
+ text = processor.apply_chat_template(messages, add_generation_prompt=True)
+ inputs = processor(text=text, images=image, return_tensors="pt")
+ inputs["pixel_values"] = inputs["pixel_values"].to(model.dtype)
+ inputs = {k: v.to(model.device) for k, v in inputs.items()}
+
+ # Generate the response
+ with torch.inference_mode(), torch.cuda.amp.autocast(dtype=torch.bfloat16):
+     output = model.generate(
+         **inputs,
+         max_new_tokens=500,
+         stop_strings=["<|im_end|>"],
+         tokenizer=processor.tokenizer,
+         do_sample=True,
+         temperature=0.9,
+     )
+     output_ids = output[0][inputs["input_ids"].shape[1]:]
+     result = processor.decode(output_ids, skip_special_tokens=True)
+
+ print(result)
+ ```
+
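+ Since Aria-Chat targets multi-round dialog, the same template can carry a follow-up turn: append the assistant's reply and a new user message to `messages`, then rebuild the prompt and generate again. A minimal sketch (the assistant-turn schema and the follow-up question are illustrative assumptions; it reuses `model`, `processor`, `image`, and `result` from above):
+
+ ```python
+ # Illustrative multi-round continuation (assumed assistant-turn schema):
+ # append the previous reply and a follow-up question, then regenerate.
+ messages += [
+     {"role": "assistant", "content": [{"text": result, "type": "text"}]},
+     {"role": "user", "content": [{"text": "Describe it in one sentence.", "type": "text"}]},
+ ]
+
+ text = processor.apply_chat_template(messages, add_generation_prompt=True)
+ inputs = processor(text=text, images=image, return_tensors="pt")
+ inputs["pixel_values"] = inputs["pixel_values"].to(model.dtype)
+ inputs = {k: v.to(model.device) for k, v in inputs.items()}
+
+ with torch.inference_mode(), torch.cuda.amp.autocast(dtype=torch.bfloat16):
+     output = model.generate(
+         **inputs,
+         max_new_tokens=500,
+         stop_strings=["<|im_end|>"],
+         tokenizer=processor.tokenizer,
+     )
+     follow_up = processor.decode(
+         output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
+     )
+
+ print(follow_up)
+ ```
+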
+ ### Running the Space with Gradio
+
+ Our Space leverages Gradio for an interactive web interface. Once the required dependencies are installed, simply run the Space to:
+
+ - Interact in real time with the multimodal capabilities of Aria-Chat.
+ - Test various inputs, including images and text, for a dynamic conversational experience.
+
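+ The Space's actual app.py is not reproduced in this README; as a rough illustration only, a minimal Gradio wrapper around the Usage snippet could look like the following (the `chat` function and UI labels are hypothetical, and `model`/`processor` are assumed to be loaded as shown above):
+
+ ```python
+ import gradio as gr
+
+ # Hypothetical minimal launcher, not this Space's real app.py: wraps the
+ # generation code from the Usage section in a simple Gradio interface.
+ def chat(image, question):
+     messages = [
+         {
+             "role": "user",
+             "content": [
+                 {"text": None, "type": "image"},
+                 {"text": question, "type": "text"},
+             ],
+         }
+     ]
+     text = processor.apply_chat_template(messages, add_generation_prompt=True)
+     inputs = processor(text=text, images=image, return_tensors="pt")
+     inputs["pixel_values"] = inputs["pixel_values"].to(model.dtype)
+     inputs = {k: v.to(model.device) for k, v in inputs.items()}
+     with torch.inference_mode():
+         output = model.generate(
+             **inputs,
+             max_new_tokens=500,
+             stop_strings=["<|im_end|>"],
+             tokenizer=processor.tokenizer,
+         )
+     return processor.decode(
+         output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
+     )
+
+ demo = gr.Interface(
+     fn=chat,
+     inputs=[gr.Image(type="pil"), gr.Textbox(label="Question")],
+     outputs=gr.Textbox(label="Answer"),
+     title="Mixture of Experts",
+ )
+ demo.launch()
+ ```
+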
+ ## Advanced Usage
+
+ For more complex use cases:
+
+ - **Fine-tuning:** Check out our linked codebase for guidance on fine-tuning Aria-Chat on your custom datasets.
+ - **vLLM Inference:** Explore advanced inference options to optimize latency and throughput, as sketched below.
+
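+ For vLLM, a rough text-only sketch of what serving might look like follows. This is an unverified assumption: check the vLLM documentation for whether your version supports the Aria architecture and for the exact multimodal prompt format it expects.
+
+ ```python
+ from vllm import LLM, SamplingParams
+
+ # Hypothetical sketch: assumes your vLLM build supports the Aria
+ # architecture; the prompt must already be chat-templated text.
+ llm = LLM(model="rhymes-ai/Aria-Chat", trust_remote_code=True, dtype="bfloat16")
+ params = SamplingParams(temperature=0.9, max_tokens=500, stop=["<|im_end|>"])
+ outputs = llm.generate(["<your chat-templated prompt here>"], params)
+ print(outputs[0].outputs[0].text)
+ ```
+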
+ ## Credits & Citation
+
+ If you find this work useful, please consider citing the Aria-Chat model:
+
+ ```bibtex
+ @article{aria,
+   title={Aria: An Open Multimodal Native Mixture-of-Experts Model},
+   author={Dongxu Li and Yudong Liu and Haoning Wu and Yue Wang and Zhiqi Shen and Bowen Qu and Xinyao Niu and Guoyin Wang and Bei Chen and Junnan Li},
+   year={2024},
+   journal={arXiv preprint arXiv:2410.05993},
+ }
+ ```
+
+ ## License
+
+ This project is licensed under the Apache-2.0 License.
+
+ Happy chatting and expert mixing! If you encounter any issues or have suggestions, feel free to open an issue or contribute to the repository.
+
  An example chatbot using [Gradio](https://gradio.app), [`huggingface_hub`](https://huggingface.co/docs/huggingface_hub/v0.22.2/en/index), and the [Hugging Face Inference API](https://huggingface.co/docs/api-inference/index).