Commit 7234ad2 · Parent(s): 26718bc
update links

app.py CHANGED
@@ -209,7 +209,7 @@ with demo:
     gr.Markdown(
         "#### LLMs are pretty good at reporting their uncertainty. We just need to ask the right way.")
     gr.Markdown("Using our uncertainty metric informed by applying causal inference techniques in \
-        [
+        [Exploiting Selection Bias on Underspecified Tasks in Large Language Models](https://arxiv.org/abs/2210.00131 ), \
         we are able to identify likely spurious correlations and exploit them in \
         the scenario of gender underspecified tasks. (Note that introspecting softmax probabilities alone is insufficient, as in the sentences \
         below, LLMs may report a softmax prob of ~0.9 despite the task being underspecified.)")
@@ -224,7 +224,7 @@ with demo:
     model_name = gr.Radio(
         MODEL_NAMES,
         type="value",
-        label="Pick a preloaded BERT-like model for uncertainty evaluation (note:
+        label="Pick a preloaded BERT-like model for uncertainty evaluation (note: RoBERTa-large performance is best)...",
     )
     own_model_name = gr.Textbox(
         label=f"...Or, if you selected an '{OWN_MODEL_NAME}' model, put any Hugging Face pipeline model name \
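
The Markdown text in the first hunk notes that a model may report a softmax probability of about 0.9 even when the task is gender-underspecified. The sketch below shows one way such probabilities might be read off, assuming predictions arrive in the list-of-dicts shape that a Hugging Face `fill-mask` pipeline returns (`token_str` and `score` keys). The helper name `gendered_mass`, the pronoun sets, and the scores are hypothetical and for illustration only; this is not the uncertainty metric from the linked paper.

```python
from typing import Dict, List

# Hypothetical pronoun sets, chosen only for this illustration.
FEMALE = {"she"}
MALE = {"he"}


def gendered_mass(predictions: List[Dict]) -> Dict[str, float]:
    """Renormalize softmax scores over gendered pronouns only.

    `predictions` is assumed to look like fill-mask pipeline output:
    a list of dicts that each carry a `token_str` and a `score`.
    """
    female = 0.0
    male = 0.0
    for pred in predictions:
        # strip() guards against leading whitespace in the decoded token.
        token = pred["token_str"].strip().lower()
        if token in FEMALE:
            female += pred["score"]
        elif token in MALE:
            male += pred["score"]
    total = female + male
    if total == 0:
        # No gendered pronoun among the predictions.
        return {"female": 0.0, "male": 0.0}
    return {"female": female / total, "male": male / total}


# Made-up scores for a gender-underspecified sentence; not real model output.
example = [
    {"token_str": " he", "score": 0.90},
    {"token_str": " she", "score": 0.08},
    {"token_str": " they", "score": 0.02},
]
result = gendered_mass(example)
print(result)
```

With scores like these, the renormalized mass leans heavily toward one pronoun even though nothing in the input determines gender, which is the reason the app text calls softmax introspection alone insufficient.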