
quizbowl-submission / docs /advanced-pipeline-examples.md

Maharshi Gor


	# Working with Advanced Pipeline Examples

	This guide demonstrates how to load, modify, and run an existing advanced pipeline example, focusing on the two-step justified confidence model for tossup questions.

	## Loading the Two-Step Justified Confidence Example

	1. Navigate to the "Tossup Agents" tab at the top of the interface.

	2. Click the "Select Pipeline to Import..." dropdown and choose "two-step-justified-confidence.yaml".

	3. Click "Import Pipeline" to load the example into the interface.

	## Understanding the Two-Step Pipeline Structure

	The loaded pipeline has two distinct steps:

	1. Step A: Answer Generator
	- Uses OpenAI/gpt-4o-mini
	- Takes question text as input
	- Generates an answer candidate
	- Uses a focused system prompt for answer generation only

	2. Step B: Confidence Evaluator
	- Uses Cohere/command-r-plus
	- Takes the question text AND the generated answer from Step A
	- Evaluates confidence and provides justification
	- Uses a specialized system prompt for confidence evaluation

	This separation of concerns allows each model to focus on a specific task:
	- The first model concentrates solely on generating the most accurate answer
	- The second model evaluates how confident we should be in that answer
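
The data flow between the two steps can be sketched in Python as follows. This is a minimal illustration, not the app's implementation: `run_two_step`, `TossupResult`, and both stub functions are hypothetical names, and the stubs stand in for real calls to the two models.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class TossupResult:
    answer: str
    confidence: float
    justification: str


def run_two_step(
    question_text: str,
    generate_answer: Callable[[str], str],
    evaluate_confidence: Callable[[str, str], Tuple[float, str]],
) -> TossupResult:
    # Step A sees only the question text and returns an answer candidate.
    answer = generate_answer(question_text)
    # Step B sees the question text AND the answer produced by Step A.
    confidence, justification = evaluate_confidence(question_text, answer)
    return TossupResult(answer, confidence, justification)


# Stub steps standing in for the real model calls (hypothetical behavior).
def stub_generator(question_text: str) -> str:
    return "Marie Curie"


def stub_evaluator(question_text: str, answer: str) -> Tuple[float, str]:
    return 0.8, f"The clues about polonium and radium point to {answer}."


result = run_two_step(
    "This scientist discovered polonium and radium.",
    stub_generator,
    stub_evaluator,
)
print(result)
```

Because each step is passed in as a plain callable, either one can be swapped out without touching the other, which mirrors changing the model or prompt of a single step in the interface.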

	## Modifying the Pipeline for Better Performance

	Here are some ways to enhance the pipeline:

	1. Upgrade the Answer Generator:
	- Click on Step A in the interface
	- Change the model from gpt-4o-mini to a more powerful model like gpt-4o
	- Modify the system prompt to include more specific instructions about quizbowl answer formatting

	2. Improve the Confidence Evaluator:
	- Click on Step B
	- Add specific domain knowledge to the system prompt
	- For example, add: "Consider question length when evaluating confidence. Shorter, incomplete questions with less information revealed typically result in lower confidence scores."
	- Change the order of the output variables so that the model produces the justification before the confidence score, and therefore conditions its confidence score on the justification.
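
Why the ordering matters: a language model generates its reply left to right, so whatever it writes first becomes context for what follows. The sketch below illustrates the idea with a hypothetical JSON output format; `build_output_instruction` and `parse_step_output` are illustrative helpers, not part of the app, and the reply string is made up.

```python
import json
from typing import Dict, List


def build_output_instruction(output_fields: List[str]) -> str:
    """Ask the model for a JSON object whose keys appear in the given order."""
    keys = ", ".join(f'"{name}"' for name in output_fields)
    return f"Respond with a JSON object containing these keys, in this order: {keys}."


def parse_step_output(raw: str, output_fields: List[str]) -> Dict[str, object]:
    """Parse the model's JSON reply and check that every expected key is present."""
    data = json.loads(raw)
    missing = [name for name in output_fields if name not in data]
    if missing:
        raise ValueError(f"missing output fields: {missing}")
    return {name: data[name] for name in output_fields}


# Justification first, so the confidence is generated after the reasoning.
fields = ["justification", "confidence"]
instruction = build_output_instruction(fields)

# A made-up reply in the requested order.
raw_reply = '{"justification": "Only one early clue has been read.", "confidence": 0.35}'
parsed = parse_step_output(raw_reply, fields)
print(instruction)
print(parsed)
```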

	## Running and Testing Your Modified Pipeline

	1. After making your modifications, scroll down to adjust the buzzer settings:
	- Consider changing the confidence threshold based on the performance of your enhanced model
	- You might want to lower it slightly if you've improved the confidence evaluator

	2. Test your modified pipeline:
	- Select a Question ID or use the provided sample question
	- Click "Run on Tossup Question"
	- Observe the answer, confidence score, and justification

	3. Check the "Buzz Confidence" chart to see how confidence evolved during question processing
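
To see how the confidence threshold interacts with the confidence curve, consider the simplified sketch below. It assumes the agent buzzes the first time its confidence reaches the threshold; the app's buzzer may combine additional signals, and both `first_buzz_index` and the confidence values are hypothetical.

```python
from typing import List, Optional


def first_buzz_index(confidences: List[float], threshold: float) -> Optional[int]:
    """Return the index of the first partial run whose confidence meets the threshold."""
    for i, confidence in enumerate(confidences):
        if confidence >= threshold:
            return i
    return None


# Made-up confidence scores after each successive portion of a question.
confidences = [0.10, 0.25, 0.55, 0.80, 0.95]

print(first_buzz_index(confidences, threshold=0.85))  # buzzes at index 4
print(first_buzz_index(confidences, threshold=0.75))  # buzzes at index 3
print(first_buzz_index(confidences, threshold=0.99))  # never buzzes: None
```

Lowering the threshold moves the buzz earlier in the question, which only pays off if the confidence scores at that point are trustworthy. A threshold that is too high can prevent the agent from buzzing at all.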

	## Advantages of Multi-Step Pipelines

	Multi-step pipelines offer several benefits:

	1. Specialized Models: Use different models for different tasks (e.g., GPT for general knowledge, Claude for reasoning)

	2. Focused Prompting: Each step can have a targeted system prompt optimized for its specific task

	3. Chain of Thought: Build sophisticated reasoning by connecting steps in a logical sequence

	4. Better Confidence Calibration: Dedicated confidence evaluation typically results in more reliable buzzing

	5. Transparency: The justification output helps you understand why the model made certain decisions