{
"cells": [
{
"cell_type": "markdown",
"id": "00f05a05-d989-4bf7-b1f1-9418e25ecd58",
"metadata": {},
"source": [
"# The Product Pricer Continued\n",
"\n",
"I tested frontier models from OpenAI, Anthropic, and Google, plus several other models served through the Groq API.\n",
"\n",
"The table below shows the results of all tests, including the ones from Day 3, and how the frontier models stacked up.\n",
"\n",
"Rows are ordered by Error ($), from lowest (best) to highest (worst).\n",
"\n",
"I ran each model once on 2025-03-09.\n",
"\n",
"Main repo at [https://github.com/kellewic/llm](https://github.com/kellewic/llm)"
]
},
{
"cell_type": "markdown",
"id": "a69cc81a-e582-4d04-8e12-fd83e120a7d1",
"metadata": {},
"source": [
"| Rank | Model | Error ($) | RMSLE | Hits (%) | Chart Link |\n",
"|------|-----------------------------------|-----------|-------|----------|------------|\n",
"| 1 | **gemini-2.0-flash** | 73.48 | 0.56 | 56.4% | [π](https://github.com/kellewic/llm/blob/main/frontier_model_test/gemini-2.0-flash.png) |\n",
"| 2 | **gpt-4o-2024-08-06** | 75.66 | 0.89 | 57.6% | [π](https://github.com/kellewic/llm/blob/main/frontier_model_test/gpt-4o-2024-08-06.png) |\n",
"| 3 | **gemini-2.0-flash-lite** | 76.42 | 0.61 | 56.0% | [π](https://github.com/kellewic/llm/blob/main/frontier_model_test/gemini-2.0-flash-lite.png) |\n",
"| 4 | **gpt-4o-mini (original)** | 81.61 | 0.60 | 51.6% | [π](https://github.com/kellewic/llm/blob/main/frontier_model_test/gpt-4o-mini.png) |\n",
"| 5 | **claude-3-5-haiku-20241022** | 85.25 | 0.62 | 50.8% | [π](https://github.com/kellewic/llm/blob/main/frontier_model_test/claude-3-5-haiku-20241022.png) |\n",
"| 6 | **claude-3-5-sonnet-20241022** | 88.97 | 0.61 | 49.2% | [π](https://github.com/kellewic/llm/blob/main/frontier_model_test/claude-3-5-sonnet-20241022.png) |\n",
"| 7 | **claude-3-7-sonnet-20250219** | 89.41 | 0.62 | 55.2% | [π](https://github.com/kellewic/llm/blob/main/frontier_model_test/claude-3-7-sonnet-20250219.png) |\n",
"| 8 | **mistral-saba-24b** | 98.02 | 0.82 | 44.8% | [π](https://github.com/kellewic/llm/blob/main/frontier_model_test/mistral-saba-24b.png) |\n",
"| 9 | **llama-3.3-70b-versatile** | 98.24 | 0.70 | 44.8% | [π](https://github.com/kellewic/llm/blob/main/frontier_model_test/llama-3.3-70b-versatile.png) |\n",
"| 10 | **gpt-4o-mini (fine-tuned)** | 101.49 | 0.81 | 41.2% | [π](https://github.com/kellewic/llm/blob/main/frontier_model_tuning/gpt_fine_tuned.png) |\n",
"| 11 | **Random Forest Regressor** | 105.10 | 0.89 | 37.6% | [π](https://github.com/kellewic/llm/blob/main/basic_model_training/random_forest_pricer.png) |\n",
"| 12 | **deepseek-r1-distill-llama-70b** | 109.09 | 0.67 | 48.4% | [π](https://github.com/kellewic/llm/blob/main/frontier_model_test/deepseek-r1-distill-llama-70b.png) |\n",
"| 13 | **Linear SVR** | 110.91 | 0.92 | 29.2% | [π](https://github.com/kellewic/llm/blob/main/basic_model_training/svr_pricer.png) |\n",
"| 14 | **Word2Vec LR** | 113.14 | 1.05 | 22.8% | [π](https://github.com/kellewic/llm/blob/main/basic_model_training/word2vec_lr_pricer.png) |\n",
"| 15 | **Bag of Words LR** | 113.60 | 0.99 | 24.8% | [π](https://github.com/kellewic/llm/blob/main/basic_model_training/bow_lr_pricer.png) |\n",
"| 16 | **Human Performance** | 126.55 | 1.00 | 32.0% | [π](https://github.com/kellewic/llm/blob/main/frontier_model_test/human_pricer.png) |\n",
"| 17 | **Average** | 137.17 | 1.19 | 15.2% | [π](https://github.com/kellewic/llm/blob/main/basic_model_training/average_pricer.png) |\n",
"| 18 | **Linear Regression** | 139.20 | 1.17 | 15.6% | [π](https://github.com/kellewic/llm/blob/main/basic_model_training/linear_regression_pricer.png) |\n",
"| 19 | **deepseek-r1-distill-qwen-32b** | 151.59 | 0.80 | 38.4% | [π](https://github.com/kellewic/llm/blob/main/frontier_model_test/deepseek-r1-distill-qwen-32b.png) |"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}