Maharshi Gor

Working with Advanced Pipeline Examples

This guide demonstrates how to load, modify, and run an existing advanced pipeline example, focusing on the two-step justified confidence model for tossup questions.

Loading the Two-Step Justified Confidence Example

  1. Navigate to the "Tossup Agents" tab at the top of the interface.

  2. Click the "Select Pipeline to Import..." dropdown and choose "two-step-justified-confidence.yaml".

  3. Click "Import Pipeline" to load the example into the interface.

Understanding the Two-Step Pipeline Structure

The loaded pipeline has two distinct steps:

  1. Step A: Answer Generator

    • Uses OpenAI/gpt-4o-mini
    • Takes the question text as input
    • Generates a candidate answer
    • Uses a focused system prompt for answer generation only
  2. Step B: Confidence Evaluator

    • Uses Cohere/command-r-plus
    • Takes the question text AND the generated answer from Step A
    • Estimates how confident to be in that answer and provides a justification
    • Uses a specialized system prompt for confidence evaluation
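The data flow between the two steps can be sketched in Python. This is a minimal illustration under stated assumptions, not the interface's YAML schema or API: the `Step` dataclass, its field names, the variable names `question_text`, `answer`, `confidence` and `justification`, and the two stub functions are all hypothetical stand-ins for real calls to OpenAI/gpt-4o-mini and Cohere/command-r-plus.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Vars = Dict[str, Any]


@dataclass
class Step:
    """Hypothetical description of one pipeline step (not the real YAML schema)."""
    name: str
    model: str
    inputs: List[str]   # variables the step reads
    outputs: List[str]  # variables the step writes
    run: Callable[[Vars], Vars]


def answer_generator_stub(inputs: Vars) -> Vars:
    # Hypothetical stand-in for calling OpenAI/gpt-4o-mini with an answer-only prompt.
    return {"answer": "Isaac Newton"}


def confidence_evaluator_stub(inputs: Vars) -> Vars:
    # Hypothetical stand-in for calling Cohere/command-r-plus; it receives
    # both the question text and Step A's answer.
    return {
        "confidence": 0.85,
        "justification": f"The clues point to {inputs['answer']}.",
    }


def run_pipeline(steps: List[Step], question_text: str) -> Vars:
    available: Vars = {"question_text": question_text}
    for step in steps:
        missing = [v for v in step.inputs if v not in available]
        if missing:
            raise ValueError(f"{step.name} is missing inputs: {missing}")
        # Each step sees only the variables it declares as inputs.
        result = step.run({v: available[v] for v in step.inputs})
        for v in step.outputs:
            available[v] = result[v]
    return available


pipeline = [
    Step("A: Answer Generator", "OpenAI/gpt-4o-mini",
         ["question_text"], ["answer"], answer_generator_stub),
    Step("B: Confidence Evaluator", "Cohere/command-r-plus",
         ["question_text", "answer"], ["confidence", "justification"],
         confidence_evaluator_stub),
]

out = run_pipeline(pipeline, "This scientist formulated three laws of motion.")
print(out["answer"], out["confidence"], out["justification"])
```

Because Step B declares `answer` as an input, running it without Step A fails fast with a `ValueError`, which mirrors the dependency described in the step list.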

This separation of concerns allows each model to focus on a specific task:

  • The first model concentrates solely on generating the most accurate answer
  • The second model evaluates how confident we should be in that answer

Modifying the Pipeline for Better Performance

Here are some ways to enhance the pipeline:

  1. Upgrade the Answer Generator:

    • Click on Step A in the interface
    • Change the model from gpt-4o-mini to a more powerful model like gpt-4o
    • Modify the system prompt to include more specific instructions about quizbowl answer formatting
  2. Improve the Confidence Evaluator:

    • Click on Step B
    • Add specific domain knowledge to the system prompt
    • For example, add: "Consider question length when evaluating confidence. Shorter, incomplete questions with less information revealed typically result in lower confidence scores."
    • Change the order of the output variables so that the model produces its justification before the confidence score, which lets the confidence score be conditioned on the justification.
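Output order matters because models generate text left to right. Anything produced earlier in a response becomes part of the context for what follows. The hypothetical helper below is not part of the interface, and its prompt wording is an assumption. It builds a formatting instruction from an ordered list of output fields and shows the two possible orderings for Step B. The interface may turn output variables into a prompt differently.

```python
from typing import List


def output_format_instruction(output_fields: List[str]) -> str:
    """Build a formatting instruction that asks the model to emit the given
    fields one per line, in the given order (hypothetical wording)."""
    lines = ["Respond with exactly these fields, in this order:"]
    lines += [f"{field}: <{field}>" for field in output_fields]
    return "\n".join(lines)


# Confidence first: the score is written before any justification exists,
# so it cannot be conditioned on it.
before = output_format_instruction(["confidence", "justification"])

# Justification first: when the model writes the score, its own
# justification is already in the context it generates from.
after = output_format_instruction(["justification", "confidence"])

print(after)
```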

Running and Testing Your Modified Pipeline

  1. After making your modifications, scroll down to adjust the buzzer settings:

    • Consider changing the confidence threshold based on the performance of your enhanced model
    • If the improved evaluator produces more reliable confidence scores, you might lower the threshold slightly so the agent buzzes earlier
  2. Test your modified pipeline:

    • Select a Question ID or use the provided sample question
    • Click "Run on Tossup Question"
    • Observe the answer, confidence score, and justification
  3. Check the "Buzz Confidence" chart to see how the confidence score evolved as more of the question text was revealed
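The link between the threshold and the buzz point can be sketched as a small simulation. It assumes the question is revealed one word at a time and that the agent buzzes the first time confidence reaches the threshold. The real tool may reveal text at a different granularity, and `toy_confidence` is a hypothetical stand-in for Step B, not real model output. The recorded trace corresponds roughly to what a confidence chart plots.

```python
from typing import Callable, List, Optional, Tuple


def simulate_buzz(
    question: str,
    confidence_fn: Callable[[str], float],
    threshold: float,
) -> Tuple[Optional[int], List[float]]:
    """Reveal the question one word at a time and record the confidence after
    each word. Return (words revealed at the first buzz, confidence trace).
    The first element is None if confidence never reaches the threshold."""
    words = question.split()
    trace: List[float] = []
    buzz_at: Optional[int] = None
    for i in range(1, len(words) + 1):
        conf = confidence_fn(" ".join(words[:i]))
        trace.append(conf)
        if buzz_at is None and conf >= threshold:
            buzz_at = i
    return buzz_at, trace


question = "This English scientist wrote the Principia and described universal gravitation"
total = len(question.split())


def toy_confidence(prefix: str) -> float:
    # Hypothetical stand-in for Step B: confidence simply grows with the
    # share of the question revealed so far.
    return len(prefix.split()) / total


buzz_high, trace = simulate_buzz(question, toy_confidence, threshold=0.75)
buzz_low, _ = simulate_buzz(question, toy_confidence, threshold=0.55)
print(f"threshold 0.75 buzzes after {buzz_high} words; 0.55 after {buzz_low}")
# threshold 0.75 buzzes after 8 words; 0.55 after 6
```

A lower threshold buzzes earlier. This is only a gain if the confidence scores are reliable enough that early buzzes are still mostly correct.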

Advantages of Multi-Step Pipelines

Multi-step pipelines offer several benefits:

  1. Specialized Models: Use different models for different tasks (e.g., GPT for general knowledge, Claude for reasoning)

  2. Focused Prompting: Each step can have a targeted system prompt optimized for its specific task

  3. Chain of Thought: Build sophisticated reasoning by connecting steps in a logical sequence

  4. Better Confidence Calibration: A dedicated confidence-evaluation step can make buzzing more reliable

  5. Transparency: The justification output helps you understand why the model made certain decisions
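The calibration benefit is worth verifying on your own runs rather than assuming. One common check is not built into the interface and is named here plainly. It groups predictions into confidence bins and compares each bin's average confidence with its accuracy. The count-weighted average gap is the expected calibration error (ECE). The sketch assumes you have already collected, per question, the final confidence score and whether the answer was correct. The numbers in it are toy values, not results.

```python
from typing import List, Tuple

Row = Tuple[float, float, int, float, float]


def calibration_table(
    confidences: List[float], correct: List[bool], n_bins: int = 5
) -> List[Row]:
    """Group predictions into equal-width confidence bins and return
    (bin_low, bin_high, count, mean_confidence, accuracy) per non-empty bin."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # put c == 1.0 in the top bin
        bins[idx].append((c, ok))
    rows: List[Row] = []
    for i, items in enumerate(bins):
        if items:
            mean_conf = sum(c for c, _ in items) / len(items)
            accuracy = sum(ok for _, ok in items) / len(items)
            rows.append((i / n_bins, (i + 1) / n_bins, len(items), mean_conf, accuracy))
    return rows


def expected_calibration_error(
    confidences: List[float], correct: List[bool], n_bins: int = 5
) -> float:
    """Count-weighted average gap between mean confidence and accuracy."""
    n = len(confidences)
    return sum(
        count / n * abs(mean_conf - accuracy)
        for _, _, count, mean_conf, accuracy in calibration_table(confidences, correct, n_bins)
    )


# Toy values, not real results: four answers at 0.9 confidence, one correct.
overconfident = expected_calibration_error([0.9] * 4, [True, False, False, False])
print(f"ECE: {overconfident:.2f}")
```

If ECE drops after you change Step B, its scores track correctness more closely. A confidence threshold relies on exactly that.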