pivot-iterative-visual-optimization committed on
Commit 2a9234c · verified · 1 Parent(s): 3ced320

Update app.py

Files changed (1)
  1. app.py +9 -9
app.py CHANGED
@@ -1,4 +1,4 @@
-"""Visual Iterative Prompting Demo."""
+"""PIVOT Demo."""

 import gradio as gr
 import numpy as np
@@ -89,10 +89,10 @@ examples = [

 with gr.Blocks() as demo:
   gr.Markdown("""
-# Visual Iterative Prompting Demo
-The demo below showcases the Visual Iterative Prompting (VIP) algorithm.
+# PIVOT: Prompting with Iterative Visual Optimization
+The demo below showcases a version of the PIVOT algorithm, which uses iterative visual prompts to optimize and guide the reasoning of Vision-Language Models (VLMs).
 Given an image and a description of an object or region,
-VIP leverages a Vision-Language Model (VLM) to iteratively search for the point in the image that best corresponds to the description.
+PIVOT iteratively searches for the point in the image that best corresponds to the description.
 This is done through visual prompting, where instead of reasoning with text, the VLM reasons over images annotated with sampled points,
 in order to pick the best points.
 In each iteration, we take the points previously selected by the VLM, resample new points around their mean, and repeat the process.
@@ -104,16 +104,16 @@ This demo uses GPT-4V, so it requires an OpenAI API key.
 To use the provided example images, you can right click on the image -> copy image, then click the clipboard icon in the Input Image box.

 Hyperparameters to set:
-* N Samples for Initialization - how many initial points are sampled for the first VIP iteration.
+* N Samples for Initialization - how many initial points are sampled for the first PIVOT iteration.
 * N Samples for Optimization - how many points are sampled for subsequent iterations.
 * N Iterations - how many optimization iterations to perform.
-* N Ensemble Recursions - how many ensembles for recursive VIP.
+* N Ensemble Recursions - how many ensembles for recursive PIVOT.

 Note that each iteration takes about 10s, and each additional ensemble adds a multiple of N Iterations.

-After VIP finishes, the image gallery below will visualize VIP results throughout all the iterations.
-There are two images for each iteration - the first one shows all the sampled points, and the second one shows which one VIP picked.
-The Info textbox will show the final selected pixel coordinate that VIP converged to.
+After PIVOT finishes, the image gallery below will visualize PIVOT results throughout all the iterations.
+There are two images for each iteration - the first one shows all the sampled points, and the second one shows which one PIVOT picked.
+The Info textbox will show the final selected pixel coordinate that PIVOT converged to.
 """.strip())

   gr.Markdown(
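
Reviewer note: the docstring in this diff describes a sample, select, resample loop. The sketch below is only a reading aid for that description and is not the code in app.py. Everything in it is an assumption made for illustration: the function names, the hypothetical `pick_best` stand-in that replaces the GPT-4V call with a distance check against a known target, the Gaussian resampling, and the halving of the spread each round. The parameters `n_init`, `n_opt`, and `n_iters` are meant to mirror the "N Samples for Initialization", "N Samples for Optimization", and "N Iterations" settings.

```python
import numpy as np


def pick_best(points, target, k):
    """Hypothetical stand-in for the VLM: keep the k points nearest a known target."""
    dists = np.linalg.norm(points - target, axis=1)
    return points[np.argsort(dists)[:k]]


def iterative_point_search(width, height, target, n_init=25, n_opt=10,
                           n_iters=3, k=3, seed=0):
    """Sample points, select the best, resample around their mean, repeat."""
    rng = np.random.default_rng(seed)
    # First iteration: sample candidate pixels uniformly over the image.
    points = rng.uniform([0, 0], [width, height], size=(n_init, 2))
    # Assumed schedule: start with a wide spread and halve it every round.
    std = np.array([width, height]) / 4.0
    history = []
    for _ in range(n_iters):
        selected = pick_best(points, target, k)  # the demo asks the VLM here
        mean = selected.mean(axis=0)
        history.append(mean)
        std = std / 2.0
        # Resample new candidates around the mean of the selected points.
        points = rng.normal(mean, std, size=(n_opt, 2))
        points = np.clip(points, [0, 0], [width - 1, height - 1])
    return mean, history


final, history = iterative_point_search(640, 480, np.array([320.0, 240.0]))
print("final estimate:", final)
```

In the demo the selection step is a model query over an annotated image, so each pass through the loop corresponds to one of the roughly 10s iterations mentioned in the docstring.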