# Working with Advanced Pipeline Examples
This guide demonstrates how to load, modify, and run an existing advanced pipeline example, focusing on the two-step justified confidence model for tossup questions.
## Loading the Two-Step Justified Confidence Example
1. Navigate to the "Tossup Agents" tab at the top of the interface.
2. Click the "Select Pipeline to Import..." dropdown and choose "two-step-justified-confidence.yaml".
3. Click "Import Pipeline" to load the example into the interface.
## Understanding the Two-Step Pipeline Structure
The loaded pipeline has two distinct steps:
1. **Step A: Answer Generator**
- Uses OpenAI/gpt-4o-mini
- Takes question text as input
- Generates an answer candidate
- Uses a focused system prompt for answer generation only
2. **Step B: Confidence Evaluator**
- Uses Cohere/command-r-plus
- Takes the question text AND the generated answer from Step A
- Evaluates confidence and provides justification
- Uses a specialized system prompt for confidence evaluation
This separation of concerns allows each model to focus on a specific task:
- The first model concentrates solely on generating the most accurate answer
- The second model evaluates how confident we should be in that answer
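
The data flow between the two steps can be sketched in a few lines of Python. This is only an illustration of the structure described above, not the interface's actual implementation: `run_two_step`, `PipelineResult`, and both stub models are hypothetical names, and the stubs stand in for real calls to the answer and confidence models.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class PipelineResult:
    """Hypothetical container for the final pipeline outputs."""
    answer: str
    confidence: float
    justification: str


def run_two_step(
    question: str,
    answer_model: Callable[[str], str],
    confidence_model: Callable[[str, str], Tuple[float, str]],
) -> PipelineResult:
    # Step A: sees only the question text and returns an answer candidate.
    answer = answer_model(question)
    # Step B: sees the question text AND the answer produced by Step A.
    confidence, justification = confidence_model(question, answer)
    return PipelineResult(answer, confidence, justification)


# Stub models (assumptions for illustration; real steps would call an LLM).
def stub_answer_model(question: str) -> str:
    if "first president" in question.lower():
        return "George Washington"
    return "unknown"


def stub_confidence_model(question: str, answer: str) -> Tuple[float, str]:
    if answer == "unknown":
        return 0.1, "No recognizable clue in the question text."
    return 0.9, "The clue 'first president' points directly to the answer."


result = run_two_step(
    "This first president of the United States led the Continental Army.",
    stub_answer_model,
    stub_confidence_model,
)
print(result)
```

Because each step is an ordinary callable here, swapping the model behind Step A does not require touching Step B, which mirrors the separation of concerns in the loaded pipeline.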
## Modifying the Pipeline for Better Performance
Here are some ways to enhance the pipeline:
1. **Upgrade the Answer Generator**:
- Click on Step A in the interface
- Change the model from gpt-4o-mini to a more powerful model like gpt-4o
- Modify the system prompt to include more specific instructions about quizbowl answer formatting
2. **Improve the Confidence Evaluator**:
- Click on Step B
- Add specific domain knowledge to the system prompt
- For example, add: "Consider question length when evaluating confidence. Shorter, incomplete questions with less information revealed typically result in lower confidence scores."
- Change the order of input variables so that the model produces the justification before the confidence score, and therefore conditions its confidence score on the justification.
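
The reordering idea in the last bullet, having the model write its justification before its score, can be sketched as follows. The field names, the JSON response format, and both helper functions are assumptions made for illustration; the interface may represent and parse step outputs differently.

```python
import json
from typing import List, Tuple

# Hypothetical field list for Step B, in the order the model should emit them.
# Placing "justification" first means the score is generated after the reasoning.
FIELD_ORDER = ["justification", "confidence"]


def build_format_instruction(fields: List[str]) -> str:
    """Build a prompt suffix that asks for the fields in a fixed order."""
    numbered = [f'{i}. "{name}"' for i, name in enumerate(fields, start=1)]
    return (
        "Respond with a JSON object whose keys appear in this order:\n"
        + "\n".join(numbered)
    )


def parse_step_b_output(raw: str) -> Tuple[str, float]:
    """Parse the (assumed) JSON response and clamp confidence to [0, 1]."""
    data = json.loads(raw)
    confidence = min(1.0, max(0.0, float(data["confidence"])))
    return data["justification"], confidence


instruction = build_format_instruction(FIELD_ORDER)

# A made-up model response, used only to exercise the parser.
sample_response = (
    '{"justification": "Only one early clue has been revealed.", '
    '"confidence": 0.35}'
)
justification, confidence = parse_step_b_output(sample_response)
print(instruction)
print(justification, confidence)
```

Since language models generate text left to right, asking for the justification first means the confidence value is produced with that reasoning already in context.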
## Running and Testing Your Modified Pipeline
1. After making your modifications, scroll down to adjust the buzzer settings:
- Consider changing the confidence threshold based on the performance of your enhanced model
- You might want to lower it slightly if you've improved the confidence evaluator
2. Test your modified pipeline:
- Select a Question ID or use the provided sample question
- Click "Run on Tossup Question"
- Observe the answer, confidence score, and justification
3. Check the "Buzz Confidence" chart to see how confidence evolved during question processing
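
To reason about how the confidence threshold interacts with the "Buzz Confidence" chart, it can help to model the buzzer as a simple rule: buzz at the first point where confidence reaches the threshold. The sketch below assumes exactly that rule; `first_buzz` and the confidence trace are hypothetical, and the real buzzer settings may combine confidence with other signals.

```python
from typing import Optional, Sequence


def first_buzz(confidences: Sequence[float], threshold: float) -> Optional[int]:
    """Return the index of the first confidence >= threshold, or None if never reached."""
    for position, confidence in enumerate(confidences):
        if confidence >= threshold:
            return position
    return None


# Made-up confidence values, one per progressively longer question prefix.
trace = [0.12, 0.30, 0.55, 0.81, 0.93]

buzz_strict = first_buzz(trace, threshold=0.85)   # waits for more of the question
buzz_relaxed = first_buzz(trace, threshold=0.80)  # buzzes one position earlier
never = first_buzz(trace, threshold=0.99)         # threshold is never reached

print(buzz_strict, buzz_relaxed, never)
```

Lowering the threshold moves the buzz earlier in the question, which only pays off if the confidence scores at that point are trustworthy.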
## Advantages of Multi-Step Pipelines
Multi-step pipelines offer several benefits:
1. **Specialized Models**: Use different models for different tasks (e.g., GPT for general knowledge, Claude for reasoning)
2. **Focused Prompting**: Each step can have a targeted system prompt optimized for its specific task
3. **Chain of Thought**: Build sophisticated reasoning by connecting steps in a logical sequence
4. **Better Confidence Calibration**: A dedicated confidence evaluation step can make buzzing decisions more reliable
5. **Transparency**: The justification output helps you understand why the model made certain decisions