Snehil Shah committed
Commit eda15d0 · 1 parent: 1c0f2c7

Set up the dataset, transformer, and the vector db

images.ipynb +236 -33

images.ipynb CHANGED
@@ -1,35 +1,238 @@
 {
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "…"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# hello world\n",
-    "print(\"hello world\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# EON"
-   ]
-  }
- ],
- "metadata": {
-  "language_info": {
-   "name": "python"
-  }
- },
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "view-in-github",
+    "colab_type": "text"
+   },
+   "source": [
+    "<a href=\"https://colab.research.google.com/github/Snehil-Shah/MultiModal-Vector-Semantic-Search-Engine/blob/main/images.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "aH0U6JkEbAcg"
+   },
+   "source": [
+    "# Image to Semantic Embeddings\n",
+    "\n",
+    "**Aim**: Encode around 50k jpg/jpeg images into vector embeddings using a vision transformer model and upsert them into a vector database for clustering and querying."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "CFLaAyqCbAch"
+   },
+   "outputs": [],
+   "source": [
+    "!pip install jupyter pandas qdrant_client pyarrow datasets sentence_transformers"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "j5o4d0jbbAci"
+   },
+   "source": [
+    "# Load Dataset\n",
+    "This is the Open Images Dataset by CVDFoundation, which hosts over 9 million images. We will be working with a smaller subset.\n",
+    "\n",
+    "The dataset is currently a TSV file whose first column is a URL to a hosted jpg/jpeg image."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "source": [
+    "import pandas as pd\n",
+    "data = pd.read_csv('open-images-dataset-validation.tsv', sep='\\t', header=None).reset_index()\n",
+    "print(data.shape, data.head(), sep=\"\\n\")"
+   ],
+   "metadata": {
+    "id": "j97T0MIBeEDe",
+    "outputId": "df823427-2859-40f6-c171-f92b5a84361b",
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    }
+   },
+   "execution_count": 98,
+   "outputs": [
+    {
+     "output_type": "stream",
+     "name": "stdout",
+     "text": [
+      "(41620, 4)\n",
+      " index 0 1 \\\n",
+      "0 0 https://c2.staticflickr.com/6/5606/15611395595... 2038323 \n",
+      "1 1 https://c6.staticflickr.com/3/2808/10351094034... 1762125 \n",
+      "2 2 https://c2.staticflickr.com/9/8089/8416776003_... 9059623 \n",
+      "3 3 https://farm3.staticflickr.com/568/21452126474... 2306438 \n",
+      "4 4 https://farm4.staticflickr.com/1244/677743874_... 6571968 \n",
+      "\n",
+      " 2 \n",
+      "0 I4V4qq54NBEFDwBqPYCkDA== \n",
+      "1 38x6O2LAS75H1vUGVzIilg== \n",
+      "2 4ksF8TuGWGcKul6Z/6pq8g== \n",
+      "3 R+6Cs525mCUT6RovHPWREg== \n",
+      "4 JnkYas7iDJu+pb81tfqVow== \n"
+     ]
+    }
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "## Download the images\n",
+    "We need the image data locally to feed it to the model."
+   ],
+   "metadata": {
+    "id": "M-Esbnhy6KTU"
+   }
+  },
+  {
+   "cell_type": "code",
+   "source": [
+    "import urllib.request\n",
+    "import urllib.error\n",
+    "import os\n",
+    "\n",
+    "# Ensure the local target directory exists\n",
+    "os.makedirs(\"images\", exist_ok=True)\n",
+    "\n",
+    "def download_file(url):\n",
+    "    basename = os.path.basename(url)\n",
+    "    target_path = f\"./images/{basename}\"\n",
+    "    if not os.path.exists(target_path):\n",
+    "        try:\n",
+    "            urllib.request.urlretrieve(url, target_path)\n",
+    "        except urllib.error.HTTPError:\n",
+    "            return None\n",
+    "    return target_path"
+   ],
+   "metadata": {
+    "id": "cK_63ubnieI6"
+   },
+   "execution_count": 99,
+   "outputs": []
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "# The Model\n",
+    "We will be using a pre-trained model. OpenAI's Contrastive Language-Image Pre-training (CLIP) model is a multi-modal Vision Transformer that can encode the visual features of an image into vector embeddings.\n",
+    "\n",
+    "We will store these vector embeddings in a vector database, where images will be clustered by their semantic information, ready for querying."
+   ],
+   "metadata": {
+    "id": "0WrAbzxP6khy"
+   }
+  },
+  {
+   "cell_type": "code",
+   "source": [
+    "from sentence_transformers import SentenceTransformer\n",
+    "model = SentenceTransformer(\"clip-ViT-B-32\")"
+   ],
+   "metadata": {
+    "id": "pHYk-KdmlJxz"
+   },
+   "execution_count": null,
+   "outputs": []
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "# The Vector Database\n",
+    "\n",
+    "Qdrant is an open-source vector database where we can store vector embeddings and query the nearest neighbours of a given embedding, which is the basis of a recommendation/semantic search engine.\n",
+    "\n",
+    "We start by initializing the Qdrant client and connecting to the cluster hosted on Qdrant Cloud."
+   ],
+   "metadata": {
+    "id": "2h7jMch58ADV"
+   }
+  },
+  {
+   "cell_type": "code",
+   "source": [
+    "from qdrant_client import QdrantClient\n",
+    "from qdrant_client.http import models as rest\n",
+    "from google.colab import userdata\n",
+    "\n",
+    "qdrant_client = QdrantClient(\n",
+    "    url=userdata.get('QDRANT_CLUSTER_URL'),\n",
+    "    api_key=userdata.get('QDRANT_CLUSTER_API_KEY'),\n",
+    ")\n",
+    "# 512-dim vectors to match clip-ViT-B-32, compared by cosine similarity\n",
+    "qdrant_client.recreate_collection(\n",
+    "    collection_name=\"images\",\n",
+    "    vectors_config=rest.VectorParams(size=512, distance=rest.Distance.COSINE),\n",
+    ")"
+   ],
+   "metadata": {
+    "id": "nAObCg-yrzpC"
+   },
+   "execution_count": null,
+   "outputs": []
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "A helper function to upsert an embedding into the collection."
+   ],
+   "metadata": {
+    "id": "zGbMrsDL_HH-"
+   }
+  },
+  {
+   "cell_type": "code",
+   "source": [
+    "def upsert_to_db(id, vector, payload):\n",
+    "    qdrant_client.upsert(\n",
+    "        collection_name=\"images\",\n",
+    "        points=[\n",
+    "            rest.PointStruct(\n",
+    "                id=id,\n",
+    "                vector=vector.tolist(),\n",
+    "                payload=payload\n",
+    "            )\n",
+    "        ]\n",
+    "    )"
+   ],
+   "metadata": {
+    "id": "mjTRm85dr13p"
+   },
+   "execution_count": 76,
+   "outputs": []
+  },
+  {
+   "cell_type": "code",
+   "source": [
+    "from PIL import Image\n",
+    "\n",
+    "for i, link in data.iloc[:, :2].iterrows():\n",
+    "    img = download_file(link[0])\n",
+    "    if img:\n",
+    "        # Encode the image itself (not its path string) with CLIP\n",
+    "        embedding = model.encode(Image.open(img))\n",
+    "        upsert_to_db(i, embedding, {\"link\": link[0]})\n",
+    "        print(f\"upserted {i}\")"
+   ],
+   "metadata": {
+    "id": "MvFEc4MgwSLW"
+   },
+   "execution_count": null,
+   "outputs": []
+  }
+ ],
+ "metadata": {
+  "language_info": {
+   "name": "python"
+  },
+  "colab": {
+   "provenance": [],
+   "include_colab_link": true
+  },
+  "kernelspec": {
+   "name": "python3",
+   "display_name": "Python 3"
+  }
 },
- "nbformat": 4,
- "nbformat_minor": 2
-}
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
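
The markdown cells above describe the collection as ready for querying, but the commit itself stops at ingestion. Below is a minimal sketch of what a query could look like, reusing the model and qdrant_client objects defined in the notebook; the prompt string and result limit are illustrative assumptions, not part of the commit.

# Sketch (not in this commit): text-to-image search over the "images" collection.
# CLIP encodes text and images into the same 512-dim space, so a text embedding
# can be matched directly against the stored image embeddings.
hits = qdrant_client.search(
    collection_name="images",
    query_vector=model.encode("a dog playing in the snow").tolist(),
    limit=5,
)
for hit in hits:
    print(hit.score, hit.payload["link"])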
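
The ingestion loop upserts one point per image across all ~41k rows, paying one network round-trip per image. Below is a sketch of a batched alternative, assuming the same data, download_file, model, rest, and qdrant_client as in the notebook; the batch size of 64 is an arbitrary choice.

# Sketch (not in this commit): batch upserts to reduce round-trips to the cluster.
from PIL import Image

BATCH_SIZE = 64  # arbitrary; tune for throughput vs. memory
points = []
for i, link in data.iloc[:, :2].iterrows():
    img = download_file(link[0])
    if img:
        points.append(rest.PointStruct(
            id=i,
            vector=model.encode(Image.open(img)).tolist(),
            payload={"link": link[0]},
        ))
    if len(points) >= BATCH_SIZE:
        qdrant_client.upsert(collection_name="images", points=points)
        points = []
if points:  # flush the final partial batch
    qdrant_client.upsert(collection_name="images", points=points)

After ingestion, qdrant_client.count(collection_name="images") can confirm how many points actually landed in the collection.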