Spaces:

mamogasr
/

llm_engineering

Sleeping

File size: 11,030 Bytes

5fdb69e

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "db8736a7-ed94-441c-9556-831fa57b5a10",
   "metadata": {},
   "source": [
    "# The Product Pricer Continued\n",
    "\n",
    "A model that can estimate how much something costs, from its description.\n",
    "\n",
    "## Enter The Frontier!\n",
    "\n",
    "And now - we put Frontier Models to the test.\n",
    "\n",
    "### 2 important points:\n",
    "\n",
    "It's important to appreciate that we aren't Training the frontier models. We're only providing them with the Test dataset to see how they perform. They don't gain the benefit of the 400,000 training examples that we provided to the Traditional ML models.\n",
    "\n",
    "HAVING SAID THAT...\n",
    "\n",
    "It's entirely possible that in their monstrously large training data, they've already been exposed to all the products in the training AND the test set. So there could be test \"contamination\" here which gives them an unfair advantage. We should keep that in mind."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "681c717b-4c24-4ac3-a5f3-3c5881d6e70a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# imports\n",
    "\n",
    "import os\n",
    "import re\n",
    "import math\n",
    "import json\n",
    "import random\n",
    "from dotenv import load_dotenv\n",
    "from huggingface_hub import login\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import pickle\n",
    "from collections import Counter\n",
    "from openai import OpenAI\n",
    "from anthropic import Anthropic"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "36d05bdc-0155-4c72-a7ee-aa4e614ffd3c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# environment\n",
    "\n",
    "load_dotenv(override=True)\n",
    "os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')\n",
    "os.environ['ANTHROPIC_API_KEY'] = os.getenv('ANTHROPIC_API_KEY', 'your-key-if-not-using-env')\n",
    "os.environ['HF_TOKEN'] = os.getenv('HF_TOKEN', 'your-key-if-not-using-env')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4dd3aad2-6f99-433c-8792-e461d2f06622",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Log in to HuggingFace\n",
    "\n",
    "hf_token = os.environ['HF_TOKEN']\n",
    "login(hf_token, add_to_git_credential=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6985bdc7-fa45-49a3-ae97-84bdeb9b2083",
   "metadata": {},
   "outputs": [],
   "source": [
    "# moved our Tester into a separate package\n",
    "# call it with Tester.test(function_name, test_dataset)\n",
    "\n",
    "from items import Item\n",
    "from testing import Tester"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b0a6fb86-74a4-403c-ab25-6db2d74e9d2b",
   "metadata": {},
   "outputs": [],
   "source": [
    "openai = OpenAI()\n",
    "claude = Anthropic()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c830ed3e-24ee-4af6-a07b-a1bfdcd39278",
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5c9b05f4-c9eb-462c-8d86-de9140a2d985",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Let's avoid curating all our data again! Load in the pickle files:\n",
    "\n",
    "with open('train.pkl', 'rb') as file:\n",
    "    train = pickle.load(file)\n",
    "\n",
    "with open('test.pkl', 'rb') as file:\n",
    "    test = pickle.load(file)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e5856173-e68c-4975-a769-5f1736e227a5",
   "metadata": {},
   "source": [
    "# Before we look at the Frontier\n",
    "\n",
    "## There is one more model we could consider"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f3e81ee0-828a-4af8-9ccf-177af6c78a0c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write the test set to a CSV\n",
    "\n",
    "import csv\n",
    "with open('human_input.csv', 'w', encoding=\"utf-8\") as csvfile:\n",
    "    writer = csv.writer(csvfile)\n",
    "    for t in test[:250]:\n",
    "        writer.writerow([t.test_prompt(), 0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aeafac31-1a10-4029-b190-030378e2fe01",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Read it back in\n",
    "\n",
    "human_predictions = []\n",
    "with open('human_output.csv', 'r', encoding=\"utf-8\") as csvfile:\n",
    "    reader = csv.reader(csvfile)\n",
    "    for row in reader:\n",
    "        human_predictions.append(float(row[1]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a9709da2-28f0-419e-af71-4ef6c02246ad",
   "metadata": {},
   "outputs": [],
   "source": [
    "def human_pricer(item):\n",
    "    idx = test.index(item)\n",
    "    return human_predictions[idx]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e1ba3b3e-4b08-4f0b-9e51-ebb03a86085d",
   "metadata": {},
   "outputs": [],
   "source": [
    "Tester.test(human_pricer, test)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "066fef03-8338-4526-9df3-89b649ad4f0a",
   "metadata": {},
   "source": [
    "## First, the humble but mighty GPT-4o-mini\n",
    "\n",
    "It's called mini, but it packs a punch."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "66ea68e8-ab1b-4f0d-aba4-a59574d8f85e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# First let's work on a good prompt for a Frontier model\n",
    "# Notice that I'm removing the \" to the nearest dollar\"\n",
    "# When we train our own models, we'll need to make the problem as easy as possible, \n",
    "# but a Frontier model needs no such simplification.\n",
    "\n",
    "def messages_for(item):\n",
    "    system_message = \"You estimate prices of items. Reply only with the price, no explanation\"\n",
    "    user_prompt = item.test_prompt().replace(\" to the nearest dollar\",\"\").replace(\"\\n\\nPrice is $\",\"\")\n",
    "    return [\n",
    "        {\"role\": \"system\", \"content\": system_message},\n",
    "        {\"role\": \"user\", \"content\": user_prompt},\n",
    "        {\"role\": \"assistant\", \"content\": \"Price is $\"}\n",
    "    ]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "add7bc0a-71fb-49cc-a49b-9548fd0fe949",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4ff92d61-0d27-4b0d-8b32-c9891016509b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Try this out\n",
    "\n",
    "messages_for(test[0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b1af1888-f94a-4106-b0d8-8a70939eec4e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# A utility function to extract the price from a string\n",
    "\n",
    "def get_price(s):\n",
    "    s = s.replace('$','').replace(',','')\n",
    "    match = re.search(r\"[-+]?\\d*\\.\\d+|\\d+\", s)\n",
    "    return float(match.group()) if match else 0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f138c5b7-bcc1-4085-aced-68dad1bf36b4",
   "metadata": {},
   "outputs": [],
   "source": [
    "get_price(\"The price is roughly $99.99 because blah blah\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "501a2a7a-69c8-451b-bbc0-398bcb9e1612",
   "metadata": {},
   "outputs": [],
   "source": [
    "# The function for gpt-4o-mini\n",
    "\n",
    "def gpt_4o_mini(item):\n",
    "    response = openai.chat.completions.create(\n",
    "        model=\"gpt-4o-mini\", \n",
    "        messages=messages_for(item),\n",
    "        seed=42,\n",
    "        max_tokens=5\n",
    "    )\n",
    "    reply = response.choices[0].message.content\n",
    "    return get_price(reply)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "843d88b4-364a-431b-b48b-8a7c1f68b786",
   "metadata": {},
   "outputs": [],
   "source": [
    "test[0].price"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "36bdd2c9-1859-4f99-a09f-3ec83b845b30",
   "metadata": {},
   "outputs": [],
   "source": [
    "Tester.test(gpt_4o_mini, test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f49e90d6-6749-4eb8-9347-5922b189d379",
   "metadata": {},
   "outputs": [],
   "source": [
    "def gpt_4o_frontier(item):\n",
    "    response = openai.chat.completions.create(\n",
    "        model=\"gpt-4o-2024-08-06\", \n",
    "        messages=messages_for(item),\n",
    "        seed=42,\n",
    "        max_tokens=5\n",
    "    )\n",
    "    reply = response.choices[0].message.content\n",
    "    return get_price(reply)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "766e697e-55bf-4521-b301-3b07d20045e0",
   "metadata": {},
   "outputs": [],
   "source": [
    "# The function for gpt-4o - the August model\n",
    "# Note that it cost me about 1-2 cents to run this (pricing may vary by region)\n",
    "# You can skip this and look at my results instead\n",
    "\n",
    "Tester.test(gpt_4o_frontier, test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "53d941cb-5b73-44ea-b893-3a0ce9997066",
   "metadata": {},
   "outputs": [],
   "source": [
    "def claude_3_point_5_sonnet(item):\n",
    "    messages = messages_for(item)\n",
    "    system_message = messages[0]['content']\n",
    "    messages = messages[1:]\n",
    "    response = claude.messages.create(\n",
    "        model=\"claude-3-5-sonnet-20240620\",\n",
    "        max_tokens=5,\n",
    "        system=system_message,\n",
    "        messages=messages\n",
    "    )\n",
    "    reply = response.content[0].text\n",
    "    return get_price(reply)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "11dba25d-f562-40f9-9855-40b715b7fc86",
   "metadata": {},
   "outputs": [],
   "source": [
    "# The function for Claude 3.5 Sonnet\n",
    "# It also cost me about 1-2 cents to run this (pricing may vary by region)\n",
    "# You can skip this and look at my results instead\n",
    "\n",
    "Tester.test(claude_3_point_5_sonnet, test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "77428dfb-d8f4-4477-8265-77b4b0badd39",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}