{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "00f05a05-d989-4bf7-b1f1-9418e25ecd58",
   "metadata": {},
   "source": [
    "# The Product Pricer Continued\n",
    "\n",
    "I tested numerous frontier models from OpenAI, Anthropic, and Google, along with others available through the Groq API.\n",
    "\n",
    "Here are the results of all tests, including the ones from Day 3, showing how the frontier models stacked up.\n",
    "\n",
    "The models are ordered by Error, from best to worst.\n",
    "\n",
    "I ran each model once on 2025-03-09.\n",
    "\n",
    "Main repo at [https://github.com/kellewic/llm](https://github.com/kellewic/llm)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a69cc81a-e582-4d04-8e12-fd83e120a7d1",
   "metadata": {},
   "source": [
    "| Rank | Model                             | Error ($) | RMSLE | Hits (%) | Chart Link |\n",
    "|------|-----------------------------------|-----------|-------|----------|------------|\n",
    "| 1    | **gemini-2.0-flash**              | 73.48     | 0.56  | 56.4%    | [📊](https://github.com/kellewic/llm/blob/main/frontier_model_test/gemini-2.0-flash.png) |\n",
    "| 2    | **gpt-4o-2024-08-06**             | 75.66     | 0.89  | 57.6%    | [📊](https://github.com/kellewic/llm/blob/main/frontier_model_test/gpt-4o-2024-08-06.png) |\n",
    "| 3    | **gemini-2.0-flash-lite**         | 76.42     | 0.61  | 56.0%    | [📊](https://github.com/kellewic/llm/blob/main/frontier_model_test/gemini-2.0-flash-lite.png) |\n",
    "| 4    | **gpt-4o-mini (original)**        | 81.61     | 0.60  | 51.6%    | [📊](https://github.com/kellewic/llm/blob/main/frontier_model_test/gpt-4o-mini.png) |\n",
    "| 5    | **claude-3-5-haiku-20241022**     | 85.25     | 0.62  | 50.8%    | [📊](https://github.com/kellewic/llm/blob/main/frontier_model_test/claude-3-5-haiku-20241022.png) |\n",
    "| 6    | **claude-3-5-sonnet-20241022**    | 88.97     | 0.61  | 49.2%    | [📊](https://github.com/kellewic/llm/blob/main/frontier_model_test/claude-3-5-sonnet-20241022.png) |\n",
    "| 7    | **claude-3-7-sonnet-20250219**    | 89.41     | 0.62  | 55.2%    | [📊](https://github.com/kellewic/llm/blob/main/frontier_model_test/claude-3-7-sonnet-20250219.png) |\n",
    "| 8    | **mistral-saba-24b**              | 98.02     | 0.82  | 44.8%    | [📊](https://github.com/kellewic/llm/blob/main/frontier_model_test/mistral-saba-24b.png) |\n",
    "| 9    | **llama-3.3-70b-versatile**       | 98.24     | 0.70  | 44.8%    | [📊](https://github.com/kellewic/llm/blob/main/frontier_model_test/llama-3.3-70b-versatile.png) |\n",
    "| 10   | **GPT-4o-mini (fine-tuned)**      | 101.49    | 0.81  | 41.2%    | [📊](https://github.com/kellewic/llm/blob/main/frontier_model_tuning/gpt_fine_tuned.png) |\n",
    "| 11   | **Random Forest Regressor**       | 105.10    | 0.89  | 37.6%    | [📊](https://github.com/kellewic/llm/blob/main/basic_model_training/random_forest_pricer.png) |\n",
    "| 12   | **deepseek-r1-distill-llama-70b** | 109.09    | 0.67  | 48.4%    | [📊](https://github.com/kellewic/llm/blob/main/frontier_model_test/deepseek-r1-distill-llama-70b.png) |\n",
    "| 13   | **Linear SVR**                    | 110.91    | 0.92  | 29.2%    | [📊](https://github.com/kellewic/llm/blob/main/basic_model_training/svr_pricer.png) |\n",
    "| 14   | **Word2Vec LR**                   | 113.14    | 1.05  | 22.8%    | [📊](https://github.com/kellewic/llm/blob/main/basic_model_training/word2vec_lr_pricer.png) |\n",
    "| 15   | **Bag of Words LR**               | 113.60    | 0.99  | 24.8%    | [📊](https://github.com/kellewic/llm/blob/main/basic_model_training/bow_lr_pricer.png) |\n",
    "| 16   | **Human Performance**             | 126.55    | 1.00  | 32.0%    | [📊](https://github.com/kellewic/llm/blob/main/frontier_model_test/human_pricer.png) |\n",
    "| 17   | **Average**                       | 137.17    | 1.19  | 15.2%    | [📊](https://github.com/kellewic/llm/blob/main/basic_model_training/average_pricer.png) |\n",
    "| 18   | **Linear Regression**             | 139.20    | 1.17  | 15.6%    | [📊](https://github.com/kellewic/llm/blob/main/basic_model_training/linear_regression_pricer.png) |\n",
    "| 19   | **deepseek-r1-distill-qwen-32b**  | 151.59    | 0.80  | 38.4%    | [📊](https://github.com/kellewic/llm/blob/main/frontier_model_test/deepseek-r1-distill-qwen-32b.png) |"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}