# Quick Demo Guide

This guide walks you through the main features of VenusFactory by fine-tuning, evaluating, and running predictions with a model on a demo dataset for protein solubility prediction.

## 1. Environment Preparation

Before you start, make sure that **VenusFactory** is installed and that its environment and Python dependencies are configured. If it is not installed yet, follow the installation instructions in the **✈️ Requirements** section of [README.md](README.md).

## 2. Launch Web Interface

Run the following command from the repository root to launch the Web UI:

```bash
python src/webui.py
```

## 3. Training (Training Tab)

### 3.1 Select Pre-trained Model

Choose a pre-trained model from the **Protein Language Model** dropdown. We recommend starting with ESM2-8M, whose low computational cost makes it a good choice for a first run.

### 3.2 Select Dataset

In the **Dataset Configuration** section, select the **Demo_Solubility** dataset (the default option), then click **Preview Dataset** to inspect its contents.

### 3.3 Set Task Parameters

- **Problem Type**, **Number of Labels**, and **Metrics** are filled in automatically when you select a **Pre-defined Dataset**.

- For **Batch Processing Mode**, we recommend **Batch Token Mode**, which limits each batch by its total number of tokens rather than by its number of sequences. Because protein sequence lengths vary widely, this keeps batches more evenly sized than a fixed sequence count would.

- Set **Batch Token** to 4000 as a starting point. If you run into CUDA out-of-memory errors, lower this value.
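
To see why a token budget helps, here is a minimal sketch of token-based batching. It is not VenusFactory's implementation: the function name `batch_by_tokens` is hypothetical, and the sketch assumes one token per residue and counts the padded size of each batch (number of sequences × longest sequence), which is roughly what drives GPU memory use. VenusFactory may count tokens differently, for example by including special tokens.

```python
from typing import List


def batch_by_tokens(seqs: List[str], max_tokens: int = 4000) -> List[List[str]]:
    """Group sequences so each padded batch holds at most max_tokens tokens.

    Hypothetical helper for illustration only. Assumes one token per residue
    and ignores special tokens. A single sequence longer than max_tokens
    still gets a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    # Sorting by length groups similar lengths together, which reduces padding.
    for seq in sorted(seqs, key=len):
        # Sequences are visited shortest-first, so seq is the longest in the
        # batch; the padded cost after adding it is (batch size + 1) * len(seq).
        if current and (len(current) + 1) * len(seq) > max_tokens:
            batches.append(current)
            current = []
        current.append(seq)
    if current:
        batches.append(current)
    return batches


# Many short sequences share a batch; long ones end up in small batches.
seqs = ["M" * 100] * 50 + ["M" * 1500] * 3
for batch in batch_by_tokens(seqs):
    print(len(batch), "sequences, longest", max(map(len, batch)))
```

With the default budget of 4000, the 50 short sequences are split into batches of 40 and 10, while the three long sequences go into batches of 2 and 1. Lowering `max_tokens` shrinks every batch, which is why reducing **Batch Token** helps with out-of-memory errors.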

### 3.4 Choose Training Method

In the **Training Parameters** section:

- **Training Method** is the key choice. The demo dataset does not currently support the **SES-Adapter** method, because it lacks the structural sequence information that method requires.

- Choose **Freeze** to fine-tune only the classification head, or **LoRA** for parameter-efficient fine-tuning.
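
The difference between the two methods comes down to how many parameters are trained. The arithmetic below is a sketch based on the standard LoRA formulation, in which a frozen d × k weight matrix gets two trainable low-rank factors of shapes d × r and r × k. The hidden size (320), rank (8), and two-label linear head are illustrative assumptions rather than VenusFactory's settings, and the helper names are hypothetical.

```python
def full_params(d: int, k: int) -> int:
    """Trainable parameters when a d x k weight matrix is fully fine-tuned."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters under LoRA: factors B (d x r) and A (r x k).

    The original d x k weight stays frozen; only B and A are trained.
    """
    return r * (d + k)


def linear_head_params(hidden: int, num_labels: int) -> int:
    """Weights plus biases of a single linear classification layer.

    Stands in for the head trained under Freeze; VenusFactory's real head
    may contain more layers.
    """
    return hidden * num_labels + num_labels


hidden, rank, num_labels = 320, 8, 2  # illustrative values only
print(f"full fine-tuning, one {hidden}x{hidden} matrix: {full_params(hidden, hidden)}")
print(f"LoRA (r={rank}), same matrix: {lora_params(hidden, hidden, rank)}")
print(f"linear head, {num_labels} labels (Freeze): {linear_head_params(hidden, num_labels)}")
```

With these numbers, fully fine-tuning one 320 × 320 matrix trains 102,400 parameters, LoRA at rank 8 trains 5,120 for the same matrix, and a two-label linear head trained under Freeze has 642. LoRA is usually applied to several matrices, so its total scales with how many are adapted.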

### 3.5 Start Training

- Click **Preview Command** to preview the command-line script.

- Click **Start** to begin training. The Web UI displays model statistics and monitors training in real time.

- When training finishes, the interface reports the model's **Metrics** on the test set so you can evaluate its performance.

## 4. Evaluation (Evaluation Tab)

### 4.1 Select Model Path

In the **Model Path** field, enter the path to the trained model (under the `ckpt` root directory). Make sure the selected **PLM** and **method** match those used during training.
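
If you are unsure which path to enter, a short script can list candidate checkpoint files. This sketch assumes trained models are saved as `.pt` files somewhere under `ckpt` (an assumption suggested by the provided `demo_provided.pt` model) and that you run it from the directory containing `ckpt`; the function name `find_checkpoints` is hypothetical.

```python
from pathlib import Path
from typing import List


def find_checkpoints(root: str = "ckpt", pattern: str = "*.pt") -> List[Path]:
    """Return all files under root matching pattern, sorted by path.

    Hypothetical helper: assumes checkpoints are saved as .pt files.
    Returns an empty list if root does not exist.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    # rglob searches every subdirectory, in case runs are stored in nested folders.
    return sorted(p for p in root_path.rglob(pattern) if p.is_file())


for path in find_checkpoints():
    print(path)
```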

### 4.2 Evaluation Dataset Loading Rules

- The evaluation system automatically loads the test set of the selected dataset.

- If no test set is found, it falls back to the **validation set**, and then to the **training set**.

- For custom datasets uploaded to Hugging Face:

  - **If only a single CSV file is uploaded**, the evaluation system loads that file automatically, regardless of its name.

  - **If training, validation, and test sets are uploaded**, make sure each file is named accurately so that its split can be identified.
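
The loading rules above amount to a simple priority order. The sketch below expresses them as a hypothetical function `pick_eval_file`; the split keywords it matches in file names (`test`, `valid`, `train`) are assumptions for illustration, not the exact names VenusFactory looks for.

```python
from typing import List, Optional

# Priority order from the rules: test set first, then validation, then training.
# The keywords are assumptions; VenusFactory's own matching may differ.
SPLIT_PRIORITY = ["test", "valid", "train"]


def pick_eval_file(csv_files: List[str]) -> Optional[str]:
    """Hypothetical helper: choose which CSV file to evaluate on."""
    if len(csv_files) == 1:
        # A single uploaded CSV is used as-is, whatever its name.
        return csv_files[0]
    for keyword in SPLIT_PRIORITY:
        for name in csv_files:
            if keyword in name.lower():
                return name
    return None  # no file name matches a known split


print(pick_eval_file(["train.csv", "valid.csv", "test.csv"]))  # test.csv
print(pick_eval_file(["train.csv", "validation.csv"]))          # validation.csv
print(pick_eval_file(["my_proteins.csv"]))                      # my_proteins.csv
```

The sketch also shows why naming matters once several files are uploaded: a file whose name matches none of the split keywords is never selected.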

### 4.3 Start Evaluation

Click **Start Evaluation** to begin the evaluation.

> **Example Model**
>
> This project provides **demo_provided.pt**, a model already trained on the **Demo_Solubility** dataset with the **Freeze** method, which you can use directly for evaluation.

## 5. Prediction (Prediction Tab)

### 5.1 Single Sequence Prediction

Enter a single amino acid sequence to predict its solubility directly.
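
Before pasting a sequence, it can help to check that it contains only amino acid letters. This is a minimal sketch assuming the 20 standard one-letter codes; whether VenusFactory accepts nonstandard or ambiguous codes such as `X`, `U`, or `B` is not covered here, so the allowed alphabet is an assumption. The name `clean_sequence` is hypothetical.

```python
# Assumed alphabet: the 20 standard one-letter amino acid codes.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw: str) -> str:
    """Hypothetical helper: strip whitespace, uppercase, and reject odd letters.

    Raises ValueError if the result is empty or contains characters
    outside the assumed alphabet.
    """
    seq = "".join(raw.split()).upper()
    if not seq:
        raise ValueError("empty sequence")
    bad = sorted(set(seq) - STANDARD_AA)
    if bad:
        raise ValueError(f"unexpected characters: {''.join(bad)}")
    return seq


print(clean_sequence("mkvl aag\nlltl"))  # MKVLAAGLLTL
```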

### 5.2 Batch Prediction

- Upload a CSV file to predict the solubility of many proteins in one batch, then download the results as a CSV file.
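
A batch-prediction input file can be prepared with Python's standard `csv` module. The column names `name` and `aa_seq` below are assumptions for illustration, as is the helper name `write_batch_csv`; check the Prediction tab for the header it actually expects.

```python
import csv
from typing import Dict


def write_batch_csv(path: str, sequences: Dict[str, str], seq_column: str = "aa_seq") -> None:
    """Hypothetical helper: write one row per protein (identifier + sequence).

    The column names ("name", "aa_seq") are assumptions, not confirmed
    VenusFactory requirements.
    """
    # newline="" lets the csv module control line endings, as its docs recommend.
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", seq_column])
        writer.writeheader()
        for name, seq in sequences.items():
            writer.writerow({"name": name, seq_column: seq})


write_batch_csv("batch_input.csv", {"protein_1": "MKVLAAGLLTL", "protein_2": "MSTNPKPQRKT"})
```

The resulting `batch_input.csv` can then be uploaded in the batch prediction panel, once its header matches what the tab expects.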

## 6. Download (Download Tab)

For detailed instructions and examples for the **Download Tab**, see the **Download** section of the **Manual Tab**.