Update app.py
app.py CHANGED
@@ -1,4 +1,4 @@
-"""
+"""PIVOT Demo."""
 
 import gradio as gr
 import numpy as np
@@ -89,10 +89,10 @@ examples = [
 
 with gr.Blocks() as demo:
     gr.Markdown("""
-#
-The demo below showcases the 
+# PIVOT: Prompting with Iterative Visual Optimization
+The demo below showcases a version of the PIVOT algorithm, which uses iterative visual prompts to optimize and guide the reasoning of Vision-Language Models (VLMs).
 Given an image and a description of an object or region,
-
+PIVOT iteratively searches for the point in the image that best corresponds to the description.
 This is done through visual prompting, where instead of reasoning with text, the VLM reasons over images annotated with sampled points,
 in order to pick the best points.
 In each iteration, we take the points previously selected by the VLM, resample new points around their mean, and repeat the process.
@@ -104,16 +104,16 @@ This demo uses GPT-4V, so it requires an OpenAI API key.
 To use the provided example images, you can right click on the image -> copy image, then click the clipboard icon in the Input Image box.
 
 Hyperparameters to set:
-* N Samples for Initialization - how many initial points are sampled for the first 
+* N Samples for Initialization - how many initial points are sampled for the first PIVOT iteration.
 * N Samples for Optimization - how many points are sampled for subsequent iterations.
 * N Iterations - how many optimization iterations to perform.
-* N Ensemble Recursions - how many ensembles for recursive 
+* N Ensemble Recursions - how many ensembles for recursive PIVOT.
 
 Note that each iteration takes about 10s, and each additional ensemble adds another multiple of N Iterations.
 
-After 
-There are two images for each iteration - the first one shows all the sampled points, and the second one shows which one 
-The Info textbox will show the final selected pixel coordinate that 
+After PIVOT finishes, the image gallery below will visualize PIVOT results throughout all the iterations.
+There are two images for each iteration - the first one shows all the sampled points, and the second one shows which one PIVOT picked.
+The Info textbox will show the final selected pixel coordinate that PIVOT converged to.
 """.strip())
 
     gr.Markdown(
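For context on what the new Markdown describes: PIVOT repeatedly annotates the image with candidate points, asks the VLM to pick the best ones, and resamples new candidates around their mean. Below is a minimal Python sketch of such a loop, showing where the hyperparameters named above (N Samples for Initialization, N Samples for Optimization, N Iterations, N Ensemble Recursions) would plug in. The names `pivot_search`, `pivot_ensemble`, `query_vlm`, and `annotate`, as well as the halving of the sampling spread, are illustrative assumptions rather than the app's actual implementation (the app itself queries GPT-4V).

```python
import numpy as np


def pivot_search(image, description, query_vlm, annotate,
                 n_samples_init=25, n_samples_opt=10, n_iters=3, rng=None):
    """Illustrative sketch of one PIVOT-style search; helper names are assumptions."""
    rng = rng if rng is not None else np.random.default_rng()
    h, w = image.shape[:2]

    # Initialization: scatter candidate points over the whole image.
    points = rng.uniform(low=[0.0, 0.0], high=[w, h], size=(n_samples_init, 2))
    spread = np.array([w, h], dtype=float) / 4.0  # assumed initial sampling spread

    for _ in range(n_iters):
        # Visual prompting: the VLM sees the image annotated with the candidate
        # points and returns the indices of the points matching the description.
        chosen = query_vlm(annotate(image, points), description)
        selected = points[np.asarray(chosen)]

        # Resample new candidates around the mean of the selected points and
        # shrink the spread so the search concentrates with each iteration.
        spread = spread / 2.0  # assumed shrink factor
        points = rng.normal(loc=selected.mean(axis=0), scale=spread,
                            size=(n_samples_opt, 2))
        points = np.clip(points, [0, 0], [w - 1, h - 1])

    return points.mean(axis=0)  # final (x, y) pixel estimate


def pivot_ensemble(image, description, query_vlm, annotate, n_ensemble=1, **kwargs):
    # One reading of "N Ensemble Recursions": run several independent searches
    # and average their results; the app may implement the recursion differently.
    runs = [pivot_search(image, description, query_vlm, annotate, **kwargs)
            for _ in range(max(1, n_ensemble))]
    return np.mean(runs, axis=0)
```

Under this reading, the number of VLM queries scales roughly as N Ensemble Recursions × N Iterations, which is consistent with the note in the demo text that each iteration takes about 10s and each additional ensemble adds another multiple of N Iterations.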