alexmarques committed on
Commit f9f848f · verified
1 Parent(s): 2d9653b

Add vision evals

Files changed (1)
  1. README.md +42 -1
README.md CHANGED
@@ -334,6 +334,26 @@ Non-coding tasks were evaluated with [lm-evaluation-harness](https://github.com/
 --batch_size auto
 ```
 
+**MMMU**
+```
+lm_eval \
+--model vllm \
+--model_args pretrained="RedHatAI/Mistral-Small-3.1-24B-Instruct-2503-quantized.w8a8",dtype=auto,gpu_memory_utilization=0.9,max_images=8,enable_chunk_prefill=True,tensor_parallel_size=2 \
+--tasks mmmu \
+--apply_chat_template \
+--batch_size auto
+```
+
+**ChartQA**
+```
+lm_eval \
+--model vllm \
+--model_args pretrained="RedHatAI/Mistral-Small-3.1-24B-Instruct-2503-quantized.w8a8",dtype=auto,gpu_memory_utilization=0.9,max_images=8,enable_chunk_prefill=True,tensor_parallel_size=2 \
+--tasks chartqa \
+--apply_chat_template \
+--batch_size auto
+```
+
 **Coding**
 
 The commands below can be used for mbpp by simply replacing the dataset name.
@@ -366,7 +386,6 @@ evalplus.evaluate \
 
 ### Accuracy
 
-#### Open LLM Leaderboard evaluation scores
 <table>
 <tr>
 <th>Category
@@ -526,5 +545,27 @@ evalplus.evaluate \
 <td>100.7%
 </td>
 </tr>
+<tr>
+<td rowspan="2" ><strong>Vision</strong>
+</td>
+<td>MMMU (0-shot)
+</td>
+<td>52.11
+</td>
+<td>53.11
+</td>
+<td>101.9%
+</td>
+</tr>
+<tr>
+<td>ChartQA (0-shot)
+</td>
+<td>81.36
+</td>
+<td>82.36
+</td>
+<td>101.2%
+</td>
+</tr>
 </table>
 
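
Reviewer note: the percentages in the two new Vision rows can be sanity-checked with a few lines of Python. This is a minimal sketch, not part of the commit. It assumes the first score in each row is the unquantized baseline, the second is the quantized model, and the last cell is the second score expressed as a percentage of the first. The column headers are not visible in this hunk, so that reading is an assumption, and the helper name `recovery` is hypothetical.

```python
def recovery(baseline: float, quantized: float) -> float:
    """Return the quantized score as a percentage of the baseline, to one decimal."""
    return round(100.0 * quantized / baseline, 1)


# Scores copied from the Vision rows added in this commit.
# Assumed order: (baseline, quantized).
vision_scores = {
    "MMMU (0-shot)": (52.11, 53.11),
    "ChartQA (0-shot)": (81.36, 82.36),
}

for task, (baseline, quantized) in vision_scores.items():
    print(f"{task}: {recovery(baseline, quantized)}%")
```

Under that assumption the computed values match the 101.9% and 101.2% cells in the added rows.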