prince-canuma committed
Commit 93eb5fb · verified · 1 Parent(s): 8f47f45

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
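The added rule routes `tokenizer.json` through Git LFS alongside the existing patterns. As a rough illustration only, the sketch below checks file names against the four patterns visible in this hunk using Python's `fnmatch`. Real gitattributes matching has extra rules (directory patterns, `**`), so `is_lfs_tracked` is a hypothetical helper, not Git's actual algorithm.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Patterns copied from the hunk above; the rest of .gitattributes is not shown.
LFS_PATTERNS = ["*.zip", "*.zst", "*tfevents*", "tokenizer.json"]


def is_lfs_tracked(path, patterns=LFS_PATTERNS):
    """Approximate check: match the file's base name against each pattern."""
    name = PurePosixPath(path).name
    return any(fnmatch(name, pattern) for pattern in patterns)


if __name__ == "__main__":
    for candidate in ["tokenizer.json", "tokenizer_config.json", "config.json"]:
        print(candidate, is_lfs_tracked(candidate))
```

Under this approximation `tokenizer_config.json` is not matched by the new rule, which fits the fact that only `tokenizer.json` appears as an LFS pointer in this commit.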
README.md ADDED
@@ -0,0 +1,189 @@
+ ---
+ library_name: transformers
+ language:
+ - ar
+ - de
+ - en
+ - es
+ - fr
+ - hi
+ - id
+ - it
+ - pt
+ - th
+ - tl
+ - vi
+ base_model:
+ - meta-llama/Llama-4-Scout-17B-16E
+ tags:
+ - facebook
+ - meta
+ - pytorch
+ - llama
+ - llama-4
+ - mlx
+ extra_gated_prompt: '**LLAMA 4 COMMUNITY LICENSE AGREEMENT**
+
+   Llama 4 Version Effective Date: April 5, 2025
+
+   "**Agreement**" means the terms and conditions for use, reproduction, distribution
+   and modification of the Llama Materials set forth herein.
+
+   "**Documentation**" means the specifications, manuals and documentation accompanying
+   Llama 4 distributed by Meta at [https://www.llama.com/docs/overview](https://llama.com/docs/overview).
+
+   "**Licensee**" or "**you**" means you, or your employer or any other person or entity
+   (if you are entering into this Agreement on such person or entity’s behalf), of
+   the age required under applicable laws, rules or regulations to provide legal consent
+   and that has legal authority to bind your employer or such other person or entity
+   if you are entering in this Agreement on their behalf.
+
+   "**Llama 4**" means the foundational large language models and software and algorithms,
+   including machine-learning model code, trained model weights, inference-enabling
+   code, training-enabling code, fine-tuning enabling code and other elements of the
+   foregoing distributed by Meta at [https://www.llama.com/llama-downloads](https://www.llama.com/llama-downloads).
+
+   "**Llama Materials**" means, collectively, Meta’s proprietary Llama 4 and Documentation
+   (and any portion thereof) made available under this Agreement.
+
+   "**Meta**" or "**we**" means Meta Platforms Ireland Limited (if you are located
+   in or, if you are an entity, your principal place of business is in the EEA or Switzerland)
+   and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland).
+
+   By clicking "I Accept" below or by using or distributing any portion or element
+   of the Llama Materials, you agree to be bound by this Agreement.
+
+   1\. **License Rights and Redistribution**.
+
+   a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable
+   and royalty-free limited license under Meta’s intellectual property or other rights
+   owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy,
+   create derivative works of, and make modifications to the Llama Materials.
+
+   b. Redistribution and Use.
+
+   i. If you distribute or make available the Llama Materials (or any derivative works
+   thereof), or a product or service (including another AI model) that contains any
+   of them, you shall (A) provide a copy of this Agreement with any such Llama Materials;
+   and (B) prominently display "Built with Llama" on a related website, user interface,
+   blogpost, about page, or product documentation. If you use the Llama Materials or
+   any outputs or results of the Llama Materials to create, train, fine tune, or otherwise
+   improve an AI model, which is distributed or made available, you shall also include
+   "Llama" at the beginning of any such AI model name.
+
+   ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee
+   as part of an integrated end user product, then Section 2 of this Agreement will
+   not apply to you.
+
+   iii. You must retain in all copies of the Llama Materials that you distribute the
+   following attribution notice within a "Notice" text file distributed as a part of
+   such copies: "Llama 4 is licensed under the Llama 4 Community License, Copyright
+   © Meta Platforms, Inc. All Rights Reserved."
+
+   iv. Your use of the Llama Materials must comply with applicable laws and regulations
+   (including trade compliance laws and regulations) and adhere to the Acceptable Use
+   Policy for the Llama Materials (available at [https://www.llama.com/llama4/use-policy](https://www.llama.com/llama4/use-policy)),
+   which is hereby incorporated by reference into this Agreement.    2\. **Additional
+   Commercial Terms**. If, on the Llama 4 version release date, the monthly active
+   users of the products or services made available by or for Licensee, or Licensee’s
+   affiliates, is greater than 700 million monthly active users in the preceding calendar
+   month, you must request a license from Meta, which Meta may grant to you in its
+   sole discretion, and you are not authorized to exercise any of the rights under
+   this Agreement unless or until Meta otherwise expressly grants you such rights.
+
+   3**. Disclaimer of Warranty**. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS
+   AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES
+   OF ANY KIND, AND META DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED,
+   INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY,
+   OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING
+   THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY
+   RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS.
+
+   4\. **Limitation of Liability**. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE
+   UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY,
+   OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT,
+   SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META
+   OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.
+
+   5\. **Intellectual Property**.
+
+   a. No trademark licenses are granted under this Agreement, and in connection with
+   the Llama Materials, neither Meta nor Licensee may use any name or mark owned by
+   or associated with the other or any of its affiliates, except as required for reasonable
+   and customary use in describing and redistributing the Llama Materials or as set
+   forth in this Section 5(a). Meta hereby grants you a license to use "Llama" (the
+   "Mark") solely as required to comply with the last sentence of Section 1.b.i. You
+   will comply with Meta’s brand guidelines (currently accessible at [https://about.meta.com/brand/resources/meta/company-brand/](https://about.meta.com/brand/resources/meta/company-brand/)[)](https://en.facebookbrand.com/).
+   All goodwill arising out of your use of the Mark will inure to the benefit of Meta.
+
+   b. Subject to Meta’s ownership of Llama Materials and derivatives made by or for
+   Meta, with respect to any derivative works and modifications of the Llama Materials
+   that are made by you, as between you and Meta, you are and will be the owner of
+   such derivative works and modifications.
+
+   c. If you institute litigation or other proceedings against Meta or any entity (including
+   a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or
+   Llama 4 outputs or results, or any portion of any of the foregoing, constitutes
+   infringement of intellectual property or other rights owned or licensable by you,
+   then any licenses granted to you under this Agreement shall terminate as of the
+   date such litigation or claim is filed or instituted. You will indemnify and hold
+   harmless Meta from and against any claim by any third party arising out of or related
+   to your use or distribution of the Llama Materials.
+
+   6\. **Term and Termination**. The term of this Agreement will commence upon your
+   acceptance of this Agreement or access to the Llama Materials and will continue
+   in full force and effect until terminated in accordance with the terms and conditions
+   herein. Meta may terminate this Agreement if you are in breach of any term or condition
+   of this Agreement. Upon termination of this Agreement, you shall delete and cease
+   use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of
+   this Agreement.
+
+   7\. **Governing Law and Jurisdiction**. This Agreement will be governed and construed
+   under the laws of the State of California without regard to choice of law principles,
+   and the UN Convention on Contracts for the International Sale of Goods does not
+   apply to this Agreement. The courts of California shall have exclusive jurisdiction
+   of any dispute arising out of this Agreement.'
+ extra_gated_fields:
+   First Name: text
+   Last Name: text
+   Date of birth: date_picker
+   Country: country
+   Affiliation: text
+   Job title:
+     type: select
+     options:
+     - Student
+     - Research Graduate
+     - AI researcher
+     - AI developer/engineer
+     - Reporter
+     - Other
+   geo: ip_location
+   ? By clicking Submit below I accept the terms of the license and acknowledge that
+     the information I provide will be collected stored processed and shared in accordance
+     with the Meta Privacy Policy
+   : checkbox
+ extra_gated_description: The information you provide will be collected, stored, processed
+   and shared in accordance with the [Meta Privacy Policy](https://www.facebook.com/privacy/policy/).
+ extra_gated_button_content: Submit
+ extra_gated_heading: Please be sure to provide your full legal name, date of birth,
+   and full organization name with all corporate identifiers. Avoid the use of acronyms
+   and special characters. Failure to follow these instructions may prevent you from
+   accessing this model and others on Hugging Face. You will not have the ability to
+   edit this form after submission, so please ensure all information is accurate.
+ license: other
+ license_name: llama4
+ ---
+
+ # mlx-community/Llama-4-Scout-17B-16E-Instruct-4bit
+ This model was converted to MLX format from [`meta-llama/Llama-4-Scout-17B-16E-Instruct`](https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct) using mlx-vlm version **0.1.21**.
+ Refer to the [original model card](https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct) for more details on the model.
+ ## Use with mlx
+
+ ```bash
+ pip install -U mlx-vlm
+ ```
+
+ ```bash
+ python -m mlx_vlm.generate --model mlx-community/Llama-4-Scout-17B-16E-Instruct-4bit --max-tokens 100 --temperature 0.0 --prompt "Describe this image." --image <path_to_image>
+ ```
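To script the README command, one option is to assemble the argument list in Python and hand it to `subprocess.run`. The sketch below uses only the flags shown in the README; `build_generate_command` is a hypothetical helper and the image path is a placeholder.

```python
import shlex
import sys

MODEL_ID = "mlx-community/Llama-4-Scout-17B-16E-Instruct-4bit"


def build_generate_command(prompt, image, max_tokens=100, temperature=0.0, model=MODEL_ID):
    """Return the argv list for the mlx_vlm.generate call shown in the README."""
    return [
        sys.executable, "-m", "mlx_vlm.generate",
        "--model", model,
        "--max-tokens", str(max_tokens),
        "--temperature", str(temperature),
        "--prompt", prompt,
        "--image", image,
    ]


if __name__ == "__main__":
    argv = build_generate_command("Describe this image.", "path/to/image.jpg")
    # Prints a copy-pastable command. To run it, pass argv to
    # subprocess.run(argv, check=True) on a machine with mlx-vlm installed.
    print(shlex.join(argv))
```

Building the list explicitly avoids shell-quoting problems when the prompt contains spaces or quotes.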
chat_template.json ADDED
@@ -0,0 +1,3 @@
+ {
+ "chat_template": "{{- bos_token }}\n{%- if custom_tools is defined %}\n {%- set tools = custom_tools %}\n{%- endif %}\n{%- if not tools_in_user_message is defined %}\n {%- set tools_in_user_message = true %}\n{%- endif %}\n{%- if not date_string is defined %}\n {%- if strftime_now is defined %}\n {%- set date_string = strftime_now(\"%d %b %Y\") %}\n {%- else %}\n {%- set date_string = \"26 Jul 2024\" %}\n {%- endif %}\n{%- endif %}\n{%- if not tools is defined %}\n {%- set tools = none %}\n{%- endif %}\n\n{#- This block extracts the system message, so we can slot it into the right place. #}\n{%- if messages[0]['role'] == 'system' %} \n {%- if messages[0]['content'] is string %}\n {%- set system_message = messages[0]['content']|trim %}\n {%- else %}\n {#- FIXME: The processor requires an array, always. #}\n {%- set system_message = messages[0]['content'][0]['text']|trim %}\n {%- endif %}\n {%- set messages = messages[1:] %}\n {%- set user_supplied_system_message = true %}\n{%- else %}\n {%- set system_message = \"\" %}\n {%- set user_supplied_system_message = false %}\n{%- endif %}\n\n{#- System message if the user supplied one #}\n{%- if user_supplied_system_message %}\n {{- \"<|header_start|>system<|header_end|>\\n\\n\" }}\n {%- if tools is not none %}\n {{- \"Environment: ipython\\n\" }}\n {%- endif %}\n {%- if tools is not none and not tools_in_user_message %}\n {{- \"You have access to the following functions. To call a function, please respond with JSON for a function call.\" }}\n {{- 'Respond in the format {\"name\": function name, \"parameters\": dictionary of argument name and its value}.' 
}}\n {{- \"Do not use variables.\\n\\n\" }}\n {%- for t in tools %}\n {{- t | tojson(indent=4) }}\n {{- \"\\n\\n\" }}\n {%- endfor %}\n {%- endif %}\n {{- system_message }}\n {{- \"<|eot|>\" }}\n{%- endif %}\n\n{#- Custom tools are passed in a user message with some extra guidance #}\n{%- if tools_in_user_message and not tools is none %}\n {#- Extract the first user message so we can plug it in here #}\n {%- if messages | length != 0 %}\n {%- set first_user_message = messages[0]['content']|trim %}\n {%- set messages = messages[1:] %}\n {%- else %}\n {{- raise_exception(\"Cannot put tools in the first user message when there's no first user message!\") }}\n{%- endif %}\n {{- '<|header_start|>user<|header_end|>\\n\\n' -}}\n {{- \"Given the following functions, please respond with a JSON for a function call \" }}\n {{- \"with its proper arguments that best answers the given prompt.\\n\\n\" }}\n {{- 'Respond in the format {\"name\": function name, \"parameters\": dictionary of argument name and its value}.' 
}}\n {{- \"Do not use variables.\\n\\n\" }}\n {%- for t in tools %}\n {{- t | tojson(indent=4) }}\n {{- \"\\n\\n\" }}\n {%- endfor %}\n {{- first_user_message + \"<|eot|>\"}}\n{%- endif %}\n\n{%- for message in messages %}\n {%- if not (message.role == 'ipython' or message.role == 'tool' or 'tool_calls' in message) %}\n {{- '<|header_start|>' + message['role'] + '<|header_end|>\\n\\n' }}\n {%- if message['content'] is string %}\n {{- message['content'] }}\n {%- else %}\n {%- for content in message['content'] %}\n {%- if content['type'] == 'image' %}\n {{- '<|image|>' }}\n {%- elif content['type'] == 'text' %}\n {{- content['text'] }}\n {%- endif %}\n {%- endfor %}\n {%- endif %}\n {{- \"<|eot|>\" }}\n {%- elif 'tool_calls' in message and message.tool_calls|length > 0 %}\n {{- '<|header_start|>assistant<|header_end|>\\n\\n' -}}\n {{- '<|python_start|>' }}\n {%- if message['content'] is string %}\n {{- message['content'] }}\n {%- else %}\n {%- for content in message['content'] %}\n {%- if content['type'] == 'image' %}\n {{- '<|image|>' }}\n {%- elif content['type'] == 'text' %}\n {{- content['text'] }}\n {%- endif %}\n {%- endfor %}\n {%- endif %}\n {{- '<|python_end|>' }}\n {%- for tool_call in message.tool_calls %}\n {{- '{\"name\": \"' + tool_call.function.name + '\", ' }}\n {{- '\"parameters\": ' }}\n {{- tool_call.function.arguments | tojson }}\n {{- \"}\" }}\n {%- endfor %}\n {{- \"<|eot|>\" }}\n {%- elif message.role == \"tool\" or message.role == \"ipython\" %}\n {{- \"<|header_start|>ipython<|header_end|>\\n\\n\" }}\n {%- if message.content is mapping or message.content is iterable %}\n {{- message.content | tojson }}\n {%- else %}\n {{- message.content }}\n {%- endif %}\n {{- \"<|eot|>\" }}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|header_start|>assistant<|header_end|>\\n\\n' }}\n{%- endif %}\n"
+ }
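The template is dense, so a plain-Python approximation of its simplest path (no tools, no tool calls) may help when checking prompts by hand. This is a sketch written from reading the template above, not the code that `transformers` or mlx-vlm actually runs. `render_chat` and `render_content` are hypothetical names, and the BOS string is taken from `special_tokens_map.json` in this commit.

```python
BOS = "<|begin_of_text|>"  # bos_token from special_tokens_map.json


def render_content(content):
    """Flatten message content: strings pass through, image parts become <|image|>."""
    if isinstance(content, str):
        return content
    parts = []
    for item in content:
        if item["type"] == "image":
            parts.append("<|image|>")
        elif item["type"] == "text":
            parts.append(item["text"])
    return "".join(parts)


def render_chat(messages, add_generation_prompt=True):
    """Approximate the template's no-tools path for a list of chat messages."""
    out = [BOS]
    if messages and messages[0]["role"] == "system":
        system = messages[0]["content"]
        if not isinstance(system, str):
            system = system[0]["text"]  # the template reads the first text part
        out.append("<|header_start|>system<|header_end|>\n\n" + system.strip() + "<|eot|>")
        messages = messages[1:]
    for message in messages:
        out.append("<|header_start|>" + message["role"] + "<|header_end|>\n\n")
        out.append(render_content(message["content"]))
        out.append("<|eot|>")
    if add_generation_prompt:
        out.append("<|header_start|>assistant<|header_end|>\n\n")
    return "".join(out)


if __name__ == "__main__":
    chat = [{"role": "user", "content": [{"type": "image"},
                                         {"type": "text", "text": "Describe this image."}]}]
    print(render_chat(chat))
```

Tool definitions, `tool_calls` and `ipython` messages follow separate branches in the template and are deliberately left out here.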
config.json ADDED
@@ -0,0 +1,360 @@
+ {
+   "_attn_implementation_autoset": false,
+   "add_cross_attention": false,
+   "architectures": [
+     "Llama4ForConditionalGeneration"
+   ],
+   "bad_words_ids": null,
+   "begin_suppress_tokens": null,
+   "boi_token_index": 200080,
+   "bos_token_id": null,
+   "chunk_size_feed_forward": 0,
+   "cross_attention_hidden_size": null,
+   "decoder_start_token_id": null,
+   "diversity_penalty": 0.0,
+   "do_sample": false,
+   "early_stopping": false,
+   "encoder_no_repeat_ngram_size": 0,
+   "eoi_token_index": 200081,
+   "eos_token_id": null,
+   "exponential_decay_length_penalty": null,
+   "finetuning_task": null,
+   "forced_bos_token_id": null,
+   "forced_eos_token_id": null,
+   "id2label": {
+     "0": "LABEL_0",
+     "1": "LABEL_1"
+   },
+   "image_token_index": 200092,
+   "is_decoder": false,
+   "is_encoder_decoder": false,
+   "label2id": {
+     "LABEL_0": 0,
+     "LABEL_1": 1
+   },
+   "length_penalty": 1.0,
+   "max_length": 20,
+   "min_length": 0,
+   "model_type": "llama4",
+   "no_repeat_ngram_size": 0,
+   "num_beam_groups": 1,
+   "num_beams": 1,
+   "num_return_sequences": 1,
+   "output_attentions": false,
+   "output_hidden_states": false,
+   "output_scores": false,
+   "pad_token_id": null,
+   "prefix": null,
+   "problem_type": null,
+   "pruned_heads": {},
+   "quantization": {
+     "group_size": 64,
+     "bits": 4
+   },
+   "remove_invalid_values": false,
+   "repetition_penalty": 1.0,
+   "return_dict": true,
+   "return_dict_in_generate": false,
+   "sep_token_id": null,
+   "suppress_tokens": null,
+   "task_specific_params": null,
+   "temperature": 1.0,
+   "text_config": {
+     "return_dict": true,
+     "output_hidden_states": false,
+     "output_attentions": false,
+     "torchscript": false,
+     "torch_dtype": "bfloat16",
+     "use_bfloat16": false,
+     "tf_legacy_loss": false,
+     "pruned_heads": {},
+     "tie_word_embeddings": false,
+     "chunk_size_feed_forward": 0,
+     "is_encoder_decoder": false,
+     "is_decoder": false,
+     "cross_attention_hidden_size": null,
+     "add_cross_attention": false,
+     "tie_encoder_decoder": false,
+     "max_length": 20,
+     "min_length": 0,
+     "do_sample": false,
+     "early_stopping": false,
+     "num_beams": 1,
+     "num_beam_groups": 1,
+     "diversity_penalty": 0.0,
+     "temperature": 1.0,
+     "top_k": 50,
+     "top_p": 1.0,
+     "typical_p": 1.0,
+     "repetition_penalty": 1.0,
+     "length_penalty": 1.0,
+     "no_repeat_ngram_size": 0,
+     "encoder_no_repeat_ngram_size": 0,
+     "bad_words_ids": null,
+     "num_return_sequences": 1,
+     "output_scores": false,
+     "return_dict_in_generate": false,
+     "forced_bos_token_id": null,
+     "forced_eos_token_id": null,
+     "remove_invalid_values": false,
+     "exponential_decay_length_penalty": null,
+     "suppress_tokens": null,
+     "begin_suppress_tokens": null,
+     "architectures": null,
+     "finetuning_task": null,
+     "id2label": {
+       "0": "LABEL_0",
+       "1": "LABEL_1"
+     },
+     "label2id": {
+       "LABEL_0": 0,
+       "LABEL_1": 1
+     },
+     "tokenizer_class": null,
+     "prefix": null,
+     "bos_token_id": 200000,
+     "pad_token_id": 200018,
+     "eos_token_id": [
+       200001,
+       200007,
+       200008
+     ],
+     "sep_token_id": null,
+     "decoder_start_token_id": null,
+     "task_specific_params": null,
+     "problem_type": null,
+     "_name_or_path": "",
+     "_attn_implementation_autoset": true,
+     "attention_bias": false,
+     "for_llm_compressor": false,
+     "model_type": "llama4_text",
+     "attn_temperature_tuning": 4,
+     "attn_scale": 0.1,
+     "floor_scale": 8192,
+     "vocab_size": 202048,
+     "max_position_embeddings": 10485760,
+     "hidden_size": 5120,
+     "intermediate_size": 8192,
+     "intermediate_size_mlp": 16384,
+     "num_hidden_layers": 48,
+     "num_attention_heads": 40,
+     "rope_scaling": {
+       "factor": 8.0,
+       "high_freq_factor": 4.0,
+       "low_freq_factor": 1.0,
+       "original_max_position_embeddings": 8192,
+       "rope_type": "llama3"
+     },
+     "num_key_value_heads": 8,
+     "hidden_act": "silu",
+     "initializer_range": 0.02,
+     "rms_norm_eps": 1e-05,
+     "use_cache": true,
+     "rope_theta": 500000.0,
+     "attention_dropout": 0.0,
+     "head_dim": 128,
+     "use_qk_norm": true,
+     "num_experts_per_tok": 1,
+     "num_local_experts": 16,
+     "output_router_logits": false,
+     "router_aux_loss_coef": 0.001,
+     "router_jitter_noise": 0.0,
+     "no_rope_layers": [
+       1,
+       1,
+       1,
+       0,
+       1,
+       1,
+       1,
+       0,
+       1,
+       1,
+       1,
+       0,
+       1,
+       1,
+       1,
+       0,
+       1,
+       1,
+       1,
+       0,
+       1,
+       1,
+       1,
+       0,
+       1,
+       1,
+       1,
+       0,
+       1,
+       1,
+       1,
+       0,
+       1,
+       1,
+       1,
+       0,
+       1,
+       1,
+       1,
+       0,
+       1,
+       1,
+       1,
+       0,
+       1,
+       1,
+       1,
+       0
+     ],
+     "interleave_moe_layer_step": 1,
+     "moe_layers": [
+       0,
+       1,
+       2,
+       3,
+       4,
+       5,
+       6,
+       7,
+       8,
+       9,
+       10,
+       11,
+       12,
+       13,
+       14,
+       15,
+       16,
+       17,
+       18,
+       19,
+       20,
+       21,
+       22,
+       23,
+       24,
+       25,
+       26,
+       27,
+       28,
+       29,
+       30,
+       31,
+       32,
+       33,
+       34,
+       35,
+       36,
+       37,
+       38,
+       39,
+       40,
+       41,
+       42,
+       43,
+       44,
+       45,
+       46,
+       47
+     ],
+     "attention_chunk_size": 8192
+   },
+   "tf_legacy_loss": false,
+   "tie_encoder_decoder": false,
+   "tie_word_embeddings": false,
+   "tokenizer_class": null,
+   "top_k": 50,
+   "top_p": 1.0,
+   "torch_dtype": "bfloat16",
+   "torchscript": false,
+   "transformers_version": "4.51.0",
+   "typical_p": 1.0,
+   "use_bfloat16": false,
+   "vision_config": {
+     "hidden_size": 1408,
+     "hidden_act": "gelu",
+     "num_hidden_layers": 34,
+     "num_channels": 3,
+     "intermediate_size": 5632,
+     "image_size": 336,
+     "vision_output_dim": 4096,
+     "patch_size": 14,
+     "norm_eps": 1e-05,
+     "num_attention_heads": 16,
+     "initializer_range": 0.02,
+     "pixel_shuffle_ratio": 0.5,
+     "projector_input_dim": 4096,
+     "projector_output_dim": 4096,
+     "multi_modal_projector_bias": false,
+     "projector_dropout": 0.0,
+     "attention_dropout": 0.0,
+     "vision_feature_layer": -1,
+     "vision_feature_select_strategy": "default",
+     "rope_theta": 10000,
+     "return_dict": true,
+     "output_hidden_states": false,
+     "output_attentions": false,
+     "torchscript": false,
+     "torch_dtype": null,
+     "use_bfloat16": false,
+     "tf_legacy_loss": false,
+     "pruned_heads": {},
+     "tie_word_embeddings": true,
+     "chunk_size_feed_forward": 0,
+     "is_encoder_decoder": false,
+     "is_decoder": false,
+     "cross_attention_hidden_size": null,
+     "add_cross_attention": false,
+     "tie_encoder_decoder": false,
+     "max_length": 20,
+     "min_length": 0,
+     "do_sample": false,
+     "early_stopping": false,
+     "num_beams": 1,
+     "num_beam_groups": 1,
+     "diversity_penalty": 0.0,
+     "temperature": 1.0,
+     "top_k": 50,
+     "top_p": 1.0,
+     "typical_p": 1.0,
+     "repetition_penalty": 1.0,
+     "length_penalty": 1.0,
+     "no_repeat_ngram_size": 0,
+     "encoder_no_repeat_ngram_size": 0,
+     "bad_words_ids": null,
+     "num_return_sequences": 1,
+     "output_scores": false,
+     "return_dict_in_generate": false,
+     "forced_bos_token_id": null,
+     "forced_eos_token_id": null,
+     "remove_invalid_values": false,
+     "exponential_decay_length_penalty": null,
+     "suppress_tokens": null,
+     "begin_suppress_tokens": null,
+     "architectures": null,
+     "finetuning_task": null,
+     "id2label": {
+       "0": "LABEL_0",
+       "1": "LABEL_1"
+     },
+     "label2id": {
+       "LABEL_0": 0,
+       "LABEL_1": 1
+     },
+     "tokenizer_class": null,
+     "prefix": null,
+     "bos_token_id": null,
+     "pad_token_id": null,
+     "eos_token_id": null,
+     "sep_token_id": null,
+     "decoder_start_token_id": null,
+     "task_specific_params": null,
+     "problem_type": null,
+     "_name_or_path": "",
+     "_attn_implementation_autoset": true,
+     "model_type": "llama4_vision_model"
+   }
+ }
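Several values in this config are easier to sanity-check once combined. The sketch below derives a few quantities from numbers copied out of `config.json`. Two lines rest on assumptions that the config itself does not state: that pixel shuffle scales the patch count by the square of `pixel_shuffle_ratio`, and that 4-bit quantization with `group_size` 64 stores one 16-bit scale and one 16-bit bias per group. Treat those results as estimates.

```python
# Values copied from config.json in this commit.
text_config = {
    "hidden_size": 5120,
    "num_attention_heads": 40,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "num_hidden_layers": 48,
}
vision_config = {"image_size": 336, "patch_size": 14, "pixel_shuffle_ratio": 0.5}
quantization = {"group_size": 64, "bits": 4}
# The no_rope_layers list repeats the pattern 1, 1, 1, 0 across all 48 layers.
no_rope_layers = [1, 1, 1, 0] * 12


def derived_quantities():
    """Combine config values into a few derived numbers for sanity checks."""
    heads = text_config["num_attention_heads"]
    kv_heads = text_config["num_key_value_heads"]
    patches_per_side = vision_config["image_size"] // vision_config["patch_size"]
    patches = patches_per_side ** 2
    return {
        "attention_width": heads * text_config["head_dim"],
        "query_heads_per_kv_head": heads // kv_heads,
        "patches_per_tile": patches,
        # Assumption: pixel shuffle scales the patch count by ratio squared.
        "tokens_per_tile_after_shuffle": int(patches * vision_config["pixel_shuffle_ratio"] ** 2),
        # Assumption: one 16-bit scale and one 16-bit bias stored per group.
        "approx_bits_per_weight": quantization["bits"] + 32 / quantization["group_size"],
        "layers_with_flag_0": [i for i, flag in enumerate(no_rope_layers) if flag == 0],
    }


if __name__ == "__main__":
    for name, value in derived_quantities().items():
        print(f"{name}: {value}")
```

The attention width (heads times `head_dim`) coming out equal to `hidden_size` is a quick consistency check on the copied values.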
generation_config.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "bos_token_id": 200000,
+   "do_sample": true,
+   "eos_token_id": [
+     200001,
+     200007,
+     200008
+   ],
+   "pad_token_id": 200018,
+   "temperature": 0.6,
+   "top_p": 0.9,
+   "transformers_version": "4.51.0.dev0"
+ }
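These defaults (`do_sample: true`, `temperature: 0.6`, `top_p: 0.9`) describe temperature-scaled nucleus sampling. The following is a minimal pure-Python sketch of that general technique, not the sampler used by `transformers` or mlx-vlm. `sample_top_p` is a hypothetical name, and whether a given runtime reads this file at all is an assumption.

```python
import math
import random


def sample_top_p(logits, temperature=0.6, top_p=0.9, rng=random):
    """Temperature-scaled nucleus sampling over a list of logits; returns an index."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]  # subtract the max for stability
    total = sum(weights)
    probs = [w / total for w in weights]

    # Keep the smallest set of most-likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Draw one index from the kept set, weighted by probability.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


if __name__ == "__main__":
    print(sample_top_p([2.0, 1.5, 0.3, -1.0], rng=random.Random(0)))
```

A temperature below 1 sharpens the distribution before the top-p cut, so fewer tokens survive than at temperature 1.0.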
model-00001-of-00012.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6cfb79522bf1da7be7e6267654549a2b3e67c904367c8ea74a6aa9220d1cb696
+ size 5204727563
model-00002-of-00012.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a9333f87f8a0a22bacbabff096e4f8aa9c58e7ef20d63c78b96f9370236b7f2b
+ size 5332295001
model-00003-of-00012.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:db0870cd659f0526a88e570094ff4a7caff6a7c675c21c906d81b92eac039076
+ size 5355934969
model-00004-of-00012.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0e98a097aecbcf23366cbb180084e73d0506216bbdff544f92d54cfd9e25c36c
+ size 5037405874
model-00005-of-00012.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:529c5f7b91c16be44e9bde8d23ec96a01788f512956d72b146d706152a1d0c48
+ size 5332295184
model-00006-of-00012.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5cde0ea88a3c910ba2347a5b817bb7614b2ecb281b0dfeafe3e9bc1ac00a24a5
+ size 5332295188
model-00007-of-00012.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:049b36a8cc2f854f59586896c0500e3626c4c7ec72da1304f527ef85ef333ff0
+ size 5355935038
model-00008-of-00012.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:52e540204eaafaa897f998b1e6dc252aa8878e6d15ab756f419d4d84704e8081
+ size 5037405874
model-00009-of-00012.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ee1d32b9a17e34a2026d47b6eab2bffe5906e4d0a989fa5bdb45943d1ecb27ce
+ size 5332295182
model-00010-of-00012.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6756290319fbb6729879fd87905632b9a1a165f34935f37ed0f651c8898b0592
+ size 5332295138
model-00011-of-00012.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0aa091d9b3f81e6e478cc9da5cd0145bc8e90f1811005969ba63ee8d93ccfbce
+ size 5355935048
model-00012-of-00012.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f582c75b0fad85b8f598cf2c5bb4891466eb8265682da98ed2e69962c5d56dce
+ size 3106520161
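Each shard above is committed as a Git LFS pointer: three `key value` lines giving the spec version, a SHA-256 object ID, and the size in bytes. As a small illustration, the sketch below parses one pointer and totals the twelve shard sizes listed in this commit. `parse_lfs_pointer` and `total_bytes` are hypothetical helper names, and the parser expects raw pointer text without the leading `+` diff markers.

```python
def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Shard sizes in bytes, copied from the twelve pointers in this commit.
SHARD_SIZES = [
    5204727563, 5332295001, 5355934969, 5037405874,
    5332295184, 5332295188, 5355935038, 5037405874,
    5332295182, 5332295138, 5355935048, 3106520161,
]


def total_bytes(sizes=SHARD_SIZES):
    """Total download size of all weight shards, in bytes."""
    return sum(sizes)


if __name__ == "__main__":
    pointer = (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:f582c75b0fad85b8f598cf2c5bb4891466eb8265682da98ed2e69962c5d56dce\n"
        "size 3106520161\n"
    )
    print(parse_lfs_pointer(pointer)["size"])
    print(f"{total_bytes() / 1e9:.1f} GB across {len(SHARD_SIZES)} shards")
```

The total is worth checking against free disk space before pulling the repository.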
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
 
preprocessor_config.json ADDED
@@ -0,0 +1,33 @@
+ {
+   "crop_size": null,
+   "data_format": "channels_first",
+   "default_to_square": true,
+   "device": null,
+   "do_center_crop": null,
+   "do_convert_rgb": true,
+   "do_normalize": true,
+   "do_rescale": true,
+   "do_resize": true,
+   "image_mean": [
+     0.5,
+     0.5,
+     0.5
+   ],
+   "image_processor_type": "Llama4ImageProcessorFast",
+   "image_std": [
+     0.5,
+     0.5,
+     0.5
+   ],
+   "input_data_format": null,
+   "max_patches": 16,
+   "processor_class": "Llama4Processor",
+   "resample": 2,
+   "rescale_factor": 0.00392156862745098,
+   "resize_to_max_canvas": false,
+   "return_tensors": null,
+   "size": {
+     "height": 336,
+     "width": 336
+   }
+ }
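With `do_rescale` and `do_normalize` both enabled, the numbers above imply a two-step pixel transform: multiply by `rescale_factor` (1/255), then subtract the mean and divide by the standard deviation. The sketch below applies that arithmetic to one RGB pixel to show the resulting range. It illustrates the math only, not the `Llama4ImageProcessorFast` implementation, and `normalize_pixel` is a hypothetical name.

```python
RESCALE_FACTOR = 0.00392156862745098  # 1/255, from preprocessor_config.json
IMAGE_MEAN = [0.5, 0.5, 0.5]
IMAGE_STD = [0.5, 0.5, 0.5]


def normalize_pixel(rgb):
    """Rescale an 8-bit RGB triple to [0, 1], then normalise each channel."""
    return [
        (value * RESCALE_FACTOR - mean) / std
        for value, mean, std in zip(rgb, IMAGE_MEAN, IMAGE_STD)
    ]


if __name__ == "__main__":
    for pixel in ([0, 0, 0], [255, 255, 255], [64, 128, 192]):
        print(pixel, [round(channel, 3) for channel in normalize_pixel(pixel)])
```

With mean and std both 0.5, black maps to about -1 and white to about +1 in every channel.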
processor_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "fake_image_token": "<|image|>",
+   "image_token": "<|image|>",
+   "patch_size": 14,
+   "processor_class": "Llama4Processor"
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,23 @@
+ {
+   "bos_token": {
+     "content": "<|begin_of_text|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "<|eot|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<|finetune_right_pad_id|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:172c9eb4beafc72601690da3ccfcede5c2e6806a8d5ec1fca33e22acea8023a4
+ size 27948578
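Because the pointer records both a SHA-256 digest and a byte size, a downloaded file can be checked against it. The sketch below shows one way to do that for data already in memory. `matches_pointer` is a hypothetical helper, and for multi-gigabyte shards you would probably hash the file in chunks instead of loading it whole.

```python
import hashlib


def matches_pointer(data, oid, size):
    """Check bytes against an LFS pointer's 'sha256:<hex>' oid and byte size."""
    algorithm, _, expected = oid.partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algorithm}")
    return len(data) == size and hashlib.sha256(data).hexdigest() == expected


if __name__ == "__main__":
    payload = b"example bytes standing in for tokenizer.json"
    oid = "sha256:" + hashlib.sha256(payload).hexdigest()
    print(matches_pointer(payload, oid, len(payload)))
```

Comparing the size first is a cheap guard that catches truncated downloads before any hashing is done.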
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff