# SPO | Self-Supervised Prompt Optimization <img src="../../docs/resources/spo/SPO-logo.png" width="60" height="60" style="vertical-align: middle; margin-left: 10px; position: relative; top: -5px;">


An automated prompt engineering tool for Large Language Models (LLMs), designed for universal domain adaptation.

A next-generation prompt engineering system implementing **Self-Supervised Prompt Optimization (SPO)**. Achieves state-of-the-art performance with 17.8–90.9× higher cost efficiency than conventional methods. 🚀

<p align="center">
<a href=""><img src="../../docs/resources/spo/SPO-method.png" alt="Framework of SPO" title="Framework of SPO <sub>1</sub>" width="80%"></a>
</p>

## ✨ Core Advantages

- 💸 **Ultra-Low Cost** - _$0.15 per task optimization_
- 🏷️ **Zero Supervision** - _No ground truth/human feedback required_
- ⚡ **Universal Adaptation** - _Closed & open-ended tasks supported_
- 🔄 **Self-Evolving** - _Auto-optimization via LLM-as-judge mechanism_

[Read our paper on arXiv](https://arxiv.org/pdf/2502.06855)

## 📊 Experiments

### Closed Tasks
<p align="center">
<a href=""><img src="../../docs/resources/spo/SPO-closed_task_table.png" alt="SPO closed task table" title="SPO closed task table <sub>1</sub>" width="80%"></a>
<a href=""><img src="../../docs/resources/spo/SPO-closed_task_figure.png" alt="SPO closed task figure" title="SPO closed task figure <sub>1</sub>" width="80%"></a>
</p>

*SPO demonstrates superior cost efficiency, requiring only 1.1% to 5.6% of the cost of state-of-the-art methods while maintaining competitive performance.*

### Open-ended Tasks
<p align="center">
<a href=""><img src="../../docs/resources/spo/SPO-open_ended_task_figure.png" alt="Open-ended task figure" title="Open-ended task figure <sub>1</sub>" width="80%"></a>
</p>

*SPO significantly improves model performance across all model configurations in open-ended tasks.*

## 🚀 Quick Start

### 1. Configure Your API Key ⚙️

Configure LLM parameters in `config/config2.yaml` (see `examples/spo/config2.example.yaml` for reference).
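
For example, a minimal `config/config2.yaml` might look like the sketch below; the model name and key are placeholders, so consult the example file for the full set of options:

```yaml
llm:
  api_type: "openai"      # LLM provider type
  model: "gpt-4o-mini"    # placeholder model name
  base_url: "https://api.openai.com/v1"
  api_key: "sk-..."       # your API key
```
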
### 2. Define Your Iteration Template 📝

Create an iteration template file `metagpt/ext/spo/settings/task_name.yaml`:
```yaml
prompt: |
  Please solve the following problem.

requirements: |
  ...

count: None

faq:
  - question: |
      ...
    answer: |
      ...

  - question: |
      ...
    answer: |
      ...
```

Notes:
- `prompt`: Initial prompt for iteration
- `requirements`: Desired effects/outcomes (e.g., generate more thinking, use more humorous language)
- `count`: Target word count for the generated prompt (e.g., 50). Set to `None` for no limit
- `faq`: QA pairs used for iteration; include an appropriate number of pairs (typically 3). A filled-in example is sketched below
  - `question`: Questions from the dataset used for iteration
  - `answer`: Corresponding answers. These can contain desired thinking patterns or responses rather than exact answers, or can be left empty. See `metagpt/ext/spo/settings/Navigate.yaml` for reference
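
For instance, a hypothetical template for a poem-writing task (all field contents below are invented for illustration) might look like:

```yaml
prompt: |
  Please write a poem on the given topic.

requirements: |
  Use vivid imagery and end with an unexpected twist.

count: None

faq:
  - question: |
      Write a poem about the ocean at dawn.
    answer: |
      The response should evoke color and sound, and close on a surprising image.
```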

### 3. Implement the PromptOptimizer 🔧

You have three ways to run the PromptOptimizer:

#### Option 1: Python Script

```python
from metagpt.ext.spo.components.optimizer import PromptOptimizer
from metagpt.ext.spo.utils.llm_client import SPO_LLM

if __name__ == "__main__":
    # Initialize LLM settings: a stronger model generates candidate prompts,
    # while cheaper models handle evaluation and execution
    SPO_LLM.initialize(
        optimize_kwargs={"model": "claude-3-5-sonnet-20240620", "temperature": 0.7},
        evaluate_kwargs={"model": "gpt-4o-mini", "temperature": 0.3},
        execute_kwargs={"model": "gpt-4o-mini", "temperature": 0},
    )

    # Create and run the optimizer
    optimizer = PromptOptimizer(
        optimized_path="workspace",  # Output directory
        initial_round=1,             # Starting round
        max_rounds=10,               # Maximum optimization rounds
        template="Poem.yaml",        # Template file
        name="Poem",                 # Project name
    )

    optimizer.optimize()
```

#### Option 2: Command Line Interface

```bash
python -m examples.spo.optimize
```

Available command line options:
```
--opt-model        Model for optimization (default: claude-3-5-sonnet-20240620)
--opt-temp         Temperature for optimization (default: 0.7)
--eval-model       Model for evaluation (default: gpt-4o-mini)
--eval-temp        Temperature for evaluation (default: 0.3)
--exec-model       Model for execution (default: gpt-4o-mini)
--exec-temp        Temperature for execution (default: 0)
--workspace        Output directory path (default: workspace)
--initial-round    Initial round number (default: 1)
--max-rounds       Maximum number of rounds (default: 10)
--template         Template file name (default: Poem.yaml)
--name             Project name (default: Poem)
```
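
For example, to run a shorter optimization with a custom template (the flag values here are illustrative):

```bash
python -m examples.spo.optimize --opt-model claude-3-5-sonnet-20240620 \
    --eval-model gpt-4o-mini --max-rounds 5 \
    --template Poem.yaml --name Poem
```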

For help:
```bash
python -m examples.spo.optimize --help
```

#### Option 3: Streamlit Web Interface

For a more user-friendly experience, you can use the Streamlit web interface to configure and run the optimizer.

First, install Streamlit:
```bash
pip install "streamlit~=1.42.0"
```

Then run the web interface:
```bash 
python -m streamlit run metagpt/ext/spo/app.py
```

### 4. View Results
```
workspace
  └── Project_name
      └── prompts
          ├── results.json
          ├── round_1
          │   ├── answers.txt
          │   └── prompt.txt
          ├── round_2
          │   ├── answers.txt
          │   └── prompt.txt
          ├── round_3
          │   ├── answers.txt
          │   └── prompt.txt
          ├── ...
          └── round_n
              ├── answers.txt
              └── prompt.txt
```

- `results.json`: Records whether each optimization round was judged successful, along with other related information (see the sketch below)
- `prompt.txt`: The optimized prompt produced in the corresponding round
- `answers.txt`: The outputs generated with that round's prompt
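
As a convenience, here is a minimal Python sketch for browsing these outputs. It assumes only the directory layout above; the exact schema of `results.json` may vary between versions:

```python
import json
from pathlib import Path


def show_results(workspace: str, name: str) -> None:
    """Print the per-round judgments and each round's optimized prompt."""
    prompts_dir = Path(workspace) / name / "prompts"

    # Per-round success judgments and related information
    results = json.loads((prompts_dir / "results.json").read_text())
    print(json.dumps(results, indent=2))

    # The optimized prompt produced in each round
    for round_dir in sorted(prompts_dir.glob("round_*")):
        prompt = (round_dir / "prompt.txt").read_text()
        print(f"--- {round_dir.name} ---\n{prompt}")


show_results("workspace", "Poem")
```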

## Citation

If you use SPO in your research, please cite our paper:

```
@misc{xiang2025spo,
      title={Self-Supervised Prompt Optimization}, 
      author={Jinyu Xiang and Jiayi Zhang and Zhaoyang Yu and Fengwei Teng and Jinhao Tu and Xinbing Liang and Sirui Hong and Chenglin Wu and Yuyu Luo},
      year={2025},
      eprint={2502.06855},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.06855}, 
}
```