# Quizbowl Agent Goals and Evaluation

## Objectives

### Tossup Agents

- Respond to each question with the best guess and a calibrated confidence
- Buzz at the earliest moment at which there is sufficient information
- Avoid incorrect buzzes
- Maintain consistent performance across topics

### Bonus Agents

- Answer each part correctly, with accurate confidence estimation
- Provide a clear explanation of the reasoning, which human team members will use to validate or choose the suggested answer
- Adapt to varying difficulty levels (easy, medium, hard)

## Performance Metrics

### Tossup Metrics

- **Accuracy**: Percentage of correct answers
- **Average Buzz Position**: How early in the question the agent buzzes (earlier is better)
- **Confidence Calibration**: How well the confidence score matches actual performance
- **Score**: Points earned based on buzz position and correctness

### Bonus Metrics

- **Accuracy**: Percentage of correct answers across all parts
- **Confidence Calibration**: How well the confidence score matches actual performance
- **Explanation Quality**: Relevance and clarity of the reasoning

## Evaluating Your Agent

### Testing Baseline Performance

1. Run the default agent configuration
2. Record metrics (accuracy, confidence, buzz position)
3. Identify specific weaknesses in performance

### Validating Improvements

After each enhancement:

1. Run the agent on the same development set of questions
2. Compare metrics to the previous version
3. Check for improvements in weak areas

### Final Evaluation Criteria

Your final agent will be evaluated on:

1. Overall accuracy across diverse questions
2. Optimal buzz timing (neither too early nor too late)
3. Confidence threshold calibration
4. Explanation quality (for bonus agents)
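
As a rough illustration of the baseline-then-compare workflow described above, the following sketch computes three of the tossup metrics from a list of per-question results and reports the change between two runs. Everything here is an assumption made for illustration: the `TossupResult` record layout, the function names, and the choice of expected calibration error (ECE) as one possible way to measure confidence calibration. The actual evaluation harness may define these differently. The **Score** metric is left out because its point formula is not specified here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TossupResult:
    """One agent run on one tossup (hypothetical record layout)."""
    correct: bool          # whether the guess at buzz time was right
    confidence: float      # agent confidence at buzz time, in [0, 1]
    buzz_position: float   # fraction of the question revealed at the buzz


def accuracy(results: List[TossupResult]) -> float:
    """Fraction of questions answered correctly."""
    return sum(r.correct for r in results) / len(results)


def average_buzz_position(results: List[TossupResult]) -> float:
    """Mean fraction of the question seen before buzzing (lower is earlier)."""
    return sum(r.buzz_position for r in results) / len(results)


def expected_calibration_error(results: List[TossupResult], n_bins: int = 10) -> float:
    """Size-weighted mean gap between average confidence and accuracy per bin.

    ECE is an assumed stand-in for "confidence calibration"; 0.0 means the
    stated confidence matches the observed accuracy in every bin.
    """
    bins: List[List[TossupResult]] = [[] for _ in range(n_bins)]
    for r in results:
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(r.confidence * n_bins), n_bins - 1)
        bins[idx].append(r)

    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(r.confidence for r in bucket) / len(bucket)
        acc = sum(r.correct for r in bucket) / len(bucket)
        ece += (len(bucket) / len(results)) * abs(avg_conf - acc)
    return ece


def summarize(results: List[TossupResult]) -> Dict[str, float]:
    """Collect the metrics for one run on the development set."""
    if not results:
        raise ValueError("need at least one result to compute metrics")
    return {
        "accuracy": accuracy(results),
        "avg_buzz_position": average_buzz_position(results),
        "ece": expected_calibration_error(results),
    }


def compare(baseline: Dict[str, float], candidate: Dict[str, float]) -> Dict[str, float]:
    """Per-metric change (candidate minus baseline) between two runs."""
    return {name: candidate[name] - baseline[name] for name in baseline}


if __name__ == "__main__":
    # Made-up results on the same four questions, before and after a change.
    baseline_run = [
        TossupResult(True, 0.9, 0.5),
        TossupResult(False, 0.9, 0.7),
        TossupResult(True, 0.3, 1.0),
        TossupResult(False, 0.3, 0.8),
    ]
    improved_run = [
        TossupResult(True, 0.9, 0.5),
        TossupResult(True, 0.8, 0.6),
        TossupResult(True, 0.6, 0.9),
        TossupResult(False, 0.2, 0.8),
    ]
    for name, delta in compare(summarize(baseline_run), summarize(improved_run)).items():
        print(f"{name}: {delta:+.3f}")
```

When reading the deltas, higher is better for `accuracy`, while lower is better for `ece`. A lower `avg_buzz_position` is only an improvement if accuracy holds up, since buzzing earlier at the cost of more incorrect buzzes works against the stated objectives. Comparing runs is only meaningful when both use the same development set.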