# Working with Advanced Pipeline Examples

This guide demonstrates how to load, modify, and run an existing advanced pipeline example, focusing on the two-step justified confidence model for tossup questions.

## Loading the Two-Step Justified Confidence Example

1. Navigate to the "Tossup Agents" tab at the top of the interface.
2. Click the "Select Pipeline to Import..." dropdown and choose "two-step-justified-confidence.yaml".
3. Click "Import Pipeline" to load the example into the interface.

## Understanding the Two-Step Pipeline Structure

The loaded pipeline has two distinct steps:

1. **Step A: Answer Generator**
   - Uses OpenAI/gpt-4o-mini
   - Takes question text as input
   - Generates an answer candidate
   - Uses a focused system prompt for answer generation only

2. **Step B: Confidence Evaluator**
   - Uses Cohere/command-r-plus
   - Takes the question text AND the generated answer from Step A
   - Evaluates confidence and provides justification
   - Uses a specialized system prompt for confidence evaluation

This separation of concerns allows each model to focus on a specific task:

- The first model concentrates solely on generating the most accurate answer
- The second model evaluates how confident we should be in that answer

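The data flow between the two steps can be sketched in a few lines of Python. This is an illustrative sketch only, not the interface's actual implementation: `Step`, `run_pipeline`, the variable names, and the two stub functions are hypothetical, and the stubs stand in for real calls to gpt-4o-mini and command-r-plus.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical model of a pipeline step: a name, the model it uses,
# and a callable that maps input variables to output variables.
@dataclass
class Step:
    name: str
    model: str
    run: Callable[[Dict[str, object]], Dict[str, object]]


def stub_answer_generator(inputs: Dict[str, object]) -> Dict[str, object]:
    # Stand-in for a model call that sees only the question text.
    return {"answer": "Isaac Newton"}


def stub_confidence_evaluator(inputs: Dict[str, object]) -> Dict[str, object]:
    # Stand-in for a model call that sees the question AND Step A's answer.
    answer = inputs["answer"]
    return {
        "justification": f"The clues point to {answer}.",
        "confidence": 0.8,
    }


def run_pipeline(steps: List[Step], question_text: str) -> Dict[str, object]:
    # Variables accumulate across steps, so a later step can read
    # the outputs of an earlier one.
    variables: Dict[str, object] = {"question_text": question_text}
    for step in steps:
        variables.update(step.run(variables))
    return variables


steps = [
    Step("A", "OpenAI/gpt-4o-mini", stub_answer_generator),
    Step("B", "Cohere/command-r-plus", stub_confidence_evaluator),
]
result = run_pipeline(steps, "This scientist formulated three laws of motion.")
print(result["answer"], result["confidence"])
```

Keeping the variables in one shared dictionary is only one way to model the hand-off; the point is that Step B cannot run until Step A has produced an answer.
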
## Modifying the Pipeline for Better Performance

Here are some ways to enhance the pipeline:

1. **Upgrade the Answer Generator**:
   - Click on Step A in the interface
   - Change the model from gpt-4o-mini to a more powerful model such as gpt-4o
   - Modify the system prompt to include more specific instructions about quizbowl answer formatting

2. **Improve the Confidence Evaluator**:
   - Click on Step B
   - Add specific domain knowledge to the system prompt. For example, add: "Consider question length when evaluating confidence. Shorter, incomplete questions with less information revealed typically result in lower confidence scores."
   - Reorder the step's output variables so that the model produces the justification before the confidence score and therefore conditions the score on the justification

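The changes to Step B can be pictured as edits to a small configuration object. The sketch below is hypothetical: the field names (`model`, `system_prompt`, `outputs`) and the placeholder prompt are assumptions for illustration and are not taken from the real YAML schema. It appends the extra guidance to the prompt and moves the justification ahead of the confidence score.

```python
import copy

# Hypothetical in-memory view of Step B. The field names and the
# placeholder prompt are assumptions, not the real YAML schema.
step_b = {
    "model": "Cohere/command-r-plus",
    "system_prompt": "Evaluate how confident we should be in the answer.",
    "outputs": ["confidence", "justification"],
}

EXTRA_GUIDANCE = (
    "Consider question length when evaluating confidence. Shorter, "
    "incomplete questions with less information revealed typically "
    "result in lower confidence scores."
)


def improve_confidence_step(step):
    # Work on a copy so the original configuration is left untouched.
    updated = copy.deepcopy(step)
    updated["system_prompt"] = step["system_prompt"] + " " + EXTRA_GUIDANCE
    # Put the justification first so the score is generated after it.
    updated["outputs"] = ["justification", "confidence"]
    return updated


improved = improve_confidence_step(step_b)
print(improved["outputs"])
```

Because a model generates its response in order, listing the justification first means the confidence score is written with the justification already in context.
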
## Running and Testing Your Modified Pipeline

1. After making your modifications, scroll down to adjust the buzzer settings:
   - Consider changing the confidence threshold based on the performance of your enhanced model
   - You might want to lower it slightly if you've improved the confidence evaluator

2. Test your modified pipeline:
   - Select a Question ID or use the provided sample question
   - Click "Run on Tossup Question"
   - Observe the answer, confidence score, and justification

3. Check the "Buzz Confidence" chart to see how confidence evolved during question processing.

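To see why the threshold matters, consider a minimal sketch of a threshold-based buzzer. It assumes the buzzer fires the first time the confidence reaches the threshold as more of the question is revealed; the function name `first_buzz` and the confidence values are made up for illustration, and the interface's real buzzer logic may differ.

```python
from typing import List, Optional


def first_buzz(confidences: List[float], threshold: float) -> Optional[int]:
    """Return the index of the first confidence that meets the threshold."""
    for i, confidence in enumerate(confidences):
        if confidence >= threshold:
            return i
    # The threshold was never reached, so the agent does not buzz.
    return None


# Made-up confidence scores, one per progressively longer question prefix.
confidences = [0.15, 0.30, 0.55, 0.78, 0.92]

print(first_buzz(confidences, threshold=0.85))  # buzzes on the last prefix
print(first_buzz(confidences, threshold=0.75))  # buzzes one prefix earlier
```

Lowering the threshold moves the buzz earlier in the question, which only helps if the confidence scores at that point are trustworthy.
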
## Advantages of Multi-Step Pipelines

Multi-step pipelines offer several benefits:

1. **Specialized Models**: Use different models for different tasks (e.g., GPT for general knowledge, Claude for reasoning)
2. **Focused Prompting**: Each step can have a targeted system prompt optimized for its specific task
3. **Chain of Thought**: Build sophisticated reasoning by connecting steps in a logical sequence
4. **Better Confidence Calibration**: Dedicated confidence evaluation typically results in more reliable buzzing
5. **Transparency**: The justification output helps you understand why the model made certain decisions