# Working with Advanced Pipeline Examples

This guide demonstrates how to load, modify, and run an existing advanced pipeline example, focusing on the two-step justified confidence model for tossup questions.

## Loading the Two-Step Justified Confidence Example

1. Navigate to the "Tossup Agents" tab at the top of the interface.

2. Click the "Select Pipeline to Import..." dropdown and choose "two-step-justified-confidence.yaml".

3. Click "Import Pipeline" to load the example into the interface.

## Understanding the Two-Step Pipeline Structure

The loaded pipeline has two distinct steps:

1. **Step A: Answer Generator**
   - Uses OpenAI/gpt-4o-mini
   - Takes question text as input
   - Generates an answer candidate
   - Uses a focused system prompt for answer generation only

2. **Step B: Confidence Evaluator**
   - Uses Cohere/command-r-plus
   - Takes the question text AND the generated answer from Step A
   - Evaluates confidence and provides justification
   - Uses a specialized system prompt for confidence evaluation

This separation of concerns allows each model to focus on a specific task:
- The first model concentrates solely on generating the most accurate answer
- The second model evaluates how confident we should be in that answer
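The data flow between the two steps can be sketched in code. This is a minimal sketch, not the pipeline's actual implementation. Several pieces are assumptions: the `call_model` signature, the prompt texts (they are not the ones shipped in `two-step-justified-confidence.yaml`), and the requirement that Step B reply in JSON. The model names come from the example above. A stand-in model is injected so the sketch runs offline.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Hypothetical model-call signature: (model_name, system_prompt, user_prompt) -> response text.
# A real pipeline would wrap a provider client here; injecting it keeps the sketch runnable offline.
ModelFn = Callable[[str, str, str], str]

# Illustrative prompts only, not the ones in two-step-justified-confidence.yaml.
ANSWER_PROMPT = "You are a quizbowl player. Reply with your single best answer only."
CONFIDENCE_PROMPT = (
    "Given a quizbowl question and a proposed answer, reply with JSON containing "
    '"justification" (a short explanation) and "confidence" (a number from 0 to 1).'
)


@dataclass
class PipelineOutput:
    answer: str
    confidence: float
    justification: str


def run_two_step(question_text: str, call_model: ModelFn) -> PipelineOutput:
    # Step A: the answer generator sees only the question text.
    answer = call_model("OpenAI/gpt-4o-mini", ANSWER_PROMPT, question_text).strip()

    # Step B: the confidence evaluator sees the question AND Step A's answer.
    evaluator_input = f"Question: {question_text}\nProposed answer: {answer}"
    raw = call_model("Cohere/command-r-plus", CONFIDENCE_PROMPT, evaluator_input)
    parsed = json.loads(raw)  # assumes Step B returns valid JSON
    return PipelineOutput(answer, float(parsed["confidence"]), parsed["justification"])


# Usage with a stand-in model so the example runs without API keys.
def fake_model(model: str, system_prompt: str, user_prompt: str) -> str:
    if model.startswith("OpenAI/"):
        return "Charles Darwin"
    return json.dumps({"justification": "The clues mention HMS Beagle.", "confidence": 0.9})


result = run_two_step("This naturalist sailed aboard HMS Beagle...", fake_model)
print(result.answer, result.confidence)  # Charles Darwin 0.9
```

Keeping the two calls separate means you can swap either model or prompt without touching the other step. This is the same separation of concerns the interface gives you.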

## Modifying the Pipeline for Better Performance

Here are some ways to enhance the pipeline:

1. **Upgrade the Answer Generator**:
   - Click on Step A in the interface
   - Change the model from gpt-4o-mini to a more powerful model like gpt-4o
   - Modify the system prompt to include more specific instructions about quizbowl answer formatting

2. **Improve the Confidence Evaluator**:
   - Click on Step B
   - Add specific domain knowledge to the system prompt
   - For example, add: "Consider question length when evaluating confidence. Shorter, incomplete questions with less information revealed typically result in lower confidence scores."
   - Change the order of the output variables so that the model produces its justification before its confidence score; the confidence score is then conditioned on the justification.
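Ordering matters because the model generates its response left to right. Whichever output field comes first is written first, and every later field can draw on it. The sketch below assumes Step B's outputs are requested as a JSON object with keys in a fixed order. The interface may format its output instructions differently, and the `output_format_instructions` helper is hypothetical.

```python
import json
from typing import List, Tuple

# Hypothetical (name, description) pairs for Step B's output variables.
CONFIDENCE_FIRST: List[Tuple[str, str]] = [
    ("confidence", "a number between 0 and 1"),
    ("justification", "a brief explanation of that confidence"),
]
JUSTIFICATION_FIRST = list(reversed(CONFIDENCE_FIRST))


def output_format_instructions(fields: List[Tuple[str, str]]) -> str:
    """Render instructions asking the model to emit JSON keys in the given order."""
    lines = ["Respond with a JSON object containing these keys, in this order:"]
    lines += [f'- "{name}": {description}' for name, description in fields]
    return "\n".join(lines)


# With justification listed first, the model writes its reasoning before it commits to a score.
instructions = output_format_instructions(JUSTIFICATION_FIRST)
print(instructions)

# json.loads preserves key order (dicts keep insertion order in Python 3.7+),
# so you can check whether a response actually followed the requested order.
raw = '{"justification": "Only the first clue has been read.", "confidence": 0.35}'
print(list(json.loads(raw)))  # ['justification', 'confidence']
```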

## Running and Testing Your Modified Pipeline

1. After making your modifications, scroll down to adjust the buzzer settings:
   - Consider changing the confidence threshold based on the performance of your enhanced model
   - You might want to lower it slightly if you've improved the confidence evaluator

2. Test your modified pipeline:
   - Select a Question ID or use the provided sample question
   - Click "Run on Tossup Question"
   - Observe the answer, confidence score, and justification

3. Check the "Buzz Confidence" chart to see how confidence evolved during question processing
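To see why the threshold setting matters, you can simulate buzzing offline. The sketch below assumes the pipeline is rerun on progressively longer prefixes of the question, with a buzz as soon as confidence reaches the threshold. It is an illustration of the idea, not a description of how the interface computes its chart. `simulate_buzz`, the five-word step size, and the stand-in pipeline are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical: a pipeline maps the question text revealed so far to (answer, confidence).
Pipeline = Callable[[str], Tuple[str, float]]


@dataclass
class BuzzResult:
    trace: List[Tuple[int, float]]  # (words revealed, confidence): the data a confidence chart would plot
    buzz_position: Optional[int]    # words revealed at the buzz, or None if never confident enough
    answer: Optional[str]


def simulate_buzz(question: str, pipeline: Pipeline, threshold: float, step: int = 5) -> BuzzResult:
    words = question.split()
    trace: List[Tuple[int, float]] = []
    # Reveal `step` more words at a time, always ending on the full question.
    for end in range(step, len(words) + step, step):
        end = min(end, len(words))
        answer, confidence = pipeline(" ".join(words[:end]))
        trace.append((end, confidence))
        if confidence >= threshold:  # buzz as soon as confidence reaches the threshold
            return BuzzResult(trace, end, answer)
    return BuzzResult(trace, None, None)


# Usage with a stand-in pipeline whose confidence grows as more words are revealed.
def fake_pipeline(text: str) -> Tuple[str, float]:
    return "Charles Darwin", min(1.0, len(text.split()) / 20)


question = " ".join(f"word{i}" for i in range(20))
print(simulate_buzz(question, fake_pipeline, threshold=0.6).buzz_position)  # 15
print(simulate_buzz(question, fake_pipeline, threshold=0.5).buzz_position)  # 10
```

Lowering the threshold from 0.6 to 0.5 moves the buzz five words earlier. With a real model, that earlier buzz is only worthwhile if the answers given at lower confidence are still usually correct.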

## Advantages of Multi-Step Pipelines

Multi-step pipelines offer several benefits:

1. **Specialized Models**: Use different models for different tasks (e.g., GPT for general knowledge, Claude for reasoning)

2. **Focused Prompting**: Each step can have a targeted system prompt optimized for its specific task

3. **Chain of Thought**: Later steps build on the outputs of earlier ones (as Step B does with Step A's answer), so reasoning can be composed in a logical sequence

4. **Better Confidence Calibration**: A dedicated confidence-evaluation step can make confidence scores, and therefore buzzing decisions, more reliable

5. **Transparency**: The justification output helps you understand why the model made certain decisions
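Calibration is something you can check rather than assume. One way is to collect `(confidence, was_correct)` pairs by running the pipeline on questions with known answers. You then compare reported confidence with actual accuracy, and measure how often buzzes at a given threshold would be correct. The sketch below makes several assumptions: the record format, the equal-width binning, and the use of expected calibration error as the summary metric. The sample records are illustrative, not real results.

```python
from typing import List, Sequence, Tuple

# Each record: (confidence the pipeline reported, whether its answer was correct).
Record = Tuple[float, bool]


def reliability_table(records: Sequence[Record], n_bins: int = 5) -> List[Tuple[float, float, int]]:
    """Bin records by confidence; return (mean confidence, accuracy, count) per non-empty bin."""
    bins: List[List[Record]] = [[] for _ in range(n_bins)]
    for confidence, correct in records:
        index = min(int(confidence * n_bins), n_bins - 1)  # confidence 1.0 falls in the top bin
        bins[index].append((confidence, correct))
    table = []
    for bucket in bins:
        if bucket:
            mean_conf = sum(c for c, _ in bucket) / len(bucket)
            accuracy = sum(ok for _, ok in bucket) / len(bucket)
            table.append((mean_conf, accuracy, len(bucket)))
    return table


def expected_calibration_error(records: Sequence[Record], n_bins: int = 5) -> float:
    """Count-weighted average gap between mean confidence and accuracy across bins."""
    total = len(records)
    return sum(n / total * abs(acc - conf) for conf, acc, n in reliability_table(records, n_bins))


def buzz_accuracy(records: Sequence[Record], threshold: float) -> float:
    """Fraction of correct answers among runs that would have buzzed at this threshold."""
    buzzed = [ok for conf, ok in records if conf >= threshold]
    return sum(buzzed) / len(buzzed) if buzzed else float("nan")


# Illustrative records only; collect real ones by running the pipeline on questions with known answers.
records = [(0.95, True), (0.9, True), (0.85, False), (0.3, False), (0.1, False)]
for threshold in (0.5, 0.9):
    print(threshold, buzz_accuracy(records, threshold))
print(round(expected_calibration_error(records), 3))  # 0.22
```

A lower calibration error after you modify the confidence evaluator is evidence that the change helped. Comparing `buzz_accuracy` across thresholds gives you a concrete basis for the threshold adjustment described earlier.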