Working with Advanced Pipeline Examples

This guide demonstrates how to load, modify, and run an existing advanced pipeline example, focusing on the two-step justified confidence model for tossup questions.

Loading the Two-Step Justified Confidence Example

Navigate to the "Tossup Agents" tab at the top of the interface.
Click the "Select Pipeline to Import..." dropdown and choose "two-step-justified-confidence.yaml".
Click "Import Pipeline" to load the example into the interface.

Understanding the Two-Step Pipeline Structure

The loaded pipeline has two distinct steps:

Step A: Answer Generator
- Uses OpenAI/gpt-4o-mini
- Takes question text as input
- Generates an answer candidate
- Uses a focused system prompt for answer generation only
Step B: Confidence Evaluator
- Uses Cohere/command-r-plus
- Takes the question text AND the generated answer from Step A
- Evaluates confidence and provides justification
- Uses a specialized system prompt for confidence evaluation

This separation of concerns allows each model to focus on a specific task:

The first model concentrates solely on generating the most accurate answer
The second model evaluates how confident we should be in that answer
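
The data flow between the two steps can be sketched in Python as follows. This is only an illustration, not the interface's actual implementation: `call_model`, `answer_step`, `confidence_step`, and `run_pipeline` are hypothetical names, the prompts are placeholders, and the model call is stubbed with canned replies so the example runs without API access.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PipelineResult:
    answer: str
    confidence: float
    justification: str


def call_model(model: str, system_prompt: str, user_input: str) -> str:
    """Hypothetical stand-in for a real LLM call; returns canned text."""
    if model == "OpenAI/gpt-4o-mini":
        return "Marie Curie"
    # Canned evaluator reply in the form "<confidence>|<justification>".
    return "0.85|The clues about two Nobel Prizes match this answer."


def answer_step(question_text: str) -> str:
    # Step A: sees only the question text and returns an answer candidate.
    return call_model(
        "OpenAI/gpt-4o-mini",
        "Answer the quizbowl question. Reply with the answer only.",
        question_text,
    )


def confidence_step(question_text: str, answer: str) -> Tuple[float, str]:
    # Step B: sees the question text AND the answer produced by Step A.
    reply = call_model(
        "Cohere/command-r-plus",
        "Rate how likely the answer is correct and justify the rating.",
        f"Question: {question_text}\nProposed answer: {answer}",
    )
    confidence, justification = reply.split("|", 1)
    return float(confidence), justification


def run_pipeline(question_text: str) -> PipelineResult:
    answer = answer_step(question_text)
    confidence, justification = confidence_step(question_text, answer)
    return PipelineResult(answer, confidence, justification)


result = run_pipeline("This scientist won Nobel Prizes in both physics and chemistry.")
print(result.answer, result.confidence)
```

Keeping each step behind its own function mirrors the separation described above: the answer step never sees confidence instructions, and the evaluator receives the candidate answer as an explicit input.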

Modifying the Pipeline for Better Performance

Here are some ways to enhance the pipeline:

Upgrade the Answer Generator:
- Click on Step A in the interface
- Change the model from gpt-4o-mini to a more capable model such as gpt-4o
- Modify the system prompt to include more specific instructions about quizbowl answer formatting
Improve the Confidence Evaluator:
- Click on Step B
- Add specific domain knowledge to the system prompt
- For example, add: "Consider question length when evaluating confidence. Shorter, incomplete questions with less information revealed typically result in lower confidence scores."
- Change the order of the output variables so that the model produces the justification before the confidence score, which conditions the score on the justification.
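
Why the ordering matters: a language model generates its output left to right, so a field that appears later is produced with the earlier fields already in context. The sketch below shows one way to express that ordering in a prompt. It assumes a JSON-style output format; `format_instruction` and the sample reply are hypothetical and are not taken from the interface.

```python
import json
from typing import List


def format_instruction(output_fields: List[str]) -> str:
    """Build the part of a prompt that fixes the order of the output fields."""
    # Python dicts keep insertion order, so the JSON template keeps it too.
    template = {name: f"<{name}>" for name in output_fields}
    return (
        "Respond with a JSON object in exactly this key order:\n"
        + json.dumps(template)
    )


# Justification first, so the score is generated after the reasoning it depends on.
instruction = format_instruction(["justification", "confidence"])
print(instruction)

# A hypothetical model reply that follows the requested order.
reply = '{"justification": "Only the first clue is revealed.", "confidence": 0.4}'
parsed = json.loads(reply)
print(parsed["confidence"])
```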

Running and Testing Your Modified Pipeline

After making your modifications, scroll down to adjust the buzzer settings:
- Consider changing the confidence threshold based on the performance of your enhanced model
- You might want to lower it slightly if you've improved the confidence evaluator
Test your modified pipeline:
- Select a Question ID or use the provided sample question
- Click "Run on Tossup Question"
- Observe the answer, confidence score, and justification
Check the "Buzz Confidence" chart to see how confidence evolved during question processing
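
To see how the confidence threshold interacts with the confidence curve, consider the minimal sketch below. It assumes the simplest possible buzzer rule (buzz as soon as confidence reaches the threshold); the interface's buzzer may combine signals differently. The function name `first_buzz_position` and the confidence values are made up for illustration.

```python
from typing import List, Optional


def first_buzz_position(confidences: List[float], threshold: float) -> Optional[int]:
    """Return the index of the first run whose confidence reaches the threshold."""
    for position, confidence in enumerate(confidences):
        if confidence >= threshold:
            return position
    return None  # The model never became confident enough to buzz.


# Hypothetical confidence scores after each additional chunk of the question.
confidences = [0.10, 0.25, 0.55, 0.80, 0.95]

print(first_buzz_position(confidences, threshold=0.85))  # prints 4
print(first_buzz_position(confidences, threshold=0.75))  # prints 3
```

Under this assumed rule, lowering the threshold moves the buzz earlier in the question. That only pays off if the evaluator's scores are reliable at that point.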

Advantages of Multi-Step Pipelines

Multi-step pipelines offer several benefits:

Specialized Models: Use different models for different tasks (e.g., GPT for general knowledge, Claude for reasoning)
Focused Prompting: Each step can have a targeted system prompt optimized for its specific task
Chain of Thought: Build sophisticated reasoning by connecting steps in a logical sequence
Better Confidence Calibration: Dedicated confidence evaluation typically results in more reliable buzzing
Transparency: The justification output helps you understand why the model made certain decisions