VenusFactory / src /web /manual /DownloadManual_EN.md
2dogey's picture
Upload folder using huggingface_hub
8918ac7 verified
# VenusFactory Download Tab User Guide
## InterPro Metadata
**Description**: Downloads protein domain information from InterPro database.
**Source**: [InterPro Database](https://www.ebi.ac.uk/interpro/)
**Download Options**:
- Single ID: Download data for a specific InterPro domain (e.g., IPR000001)
- From JSON: Batch download using a JSON file containing multiple InterPro entries
**Output Format**:
```
download/interpro_domain/
└── IPR000001/
β”œβ”€β”€ detail.json # Detailed protein information
β”œβ”€β”€ meta.json # Metadata including accession and protein count
└── uids.txt # List of UniProt IDs associated with this domain
```
## RCSB Metadata
**Description**: Downloads structural metadata from the RCSB Protein Data Bank.
**Source**: [RCSB PDB](https://www.rcsb.org/)
**Download Options**:
- Single ID: Download metadata for a specific PDB entry (e.g., 1a0j)
- From File: Batch download using a text file containing PDB IDs
**Output Format**:
```
download/rcsb_metadata/
└── 1a0j.json # Contains structure metadata including:
# - Resolution
# - Experimental method
# - Publication info
# - Chain information
```
## UniProt Sequences
**Description**: Downloads protein sequences from UniProt database.
**Source**: [UniProt](https://www.uniprot.org/)
**Download Options**:
- Single ID: Download sequence for a specific UniProt entry (e.g., P00734)
- From File: Batch download using a text file containing UniProt IDs
- Merge Option: Combine all sequences into a single FASTA file
**Output Format**:
```
download/uniprot_sequences/
β”œβ”€β”€ P00734.fasta # Individual FASTA files (when not merged)
└── merged.fasta # Combined sequences (when merge option is selected)
```
## RCSB Structures
**Description**: Downloads 3D structure files from RCSB Protein Data Bank.
**Source**: [RCSB PDB](https://www.rcsb.org/)
**Download Options**:
- Single ID: Download structure for a specific PDB entry
- From File: Batch download using a text file containing PDB IDs
- File Types:
* cif: mmCIF format (recommended)
* pdb: Legacy PDB format
* xml: PDBML/XML format
* sf: Structure factors
* mr: NMR restraints
- Unzip Option: Automatically decompress downloaded files
**Output Format**:
```
download/rcsb_structures/
β”œβ”€β”€ 1a0j.pdb # Uncompressed structure file (with unzip)
└── 1a0j.pdb.gz # Compressed structure file (without unzip)
```
## AlphaFold2 Structures
**Description**: Downloads predicted protein structures from AlphaFold Protein Structure Database.
**Source**: [AlphaFold DB](https://alphafold.ebi.ac.uk/)
**Download Options**:
- Single ID: Download structure for a specific UniProt entry
- From File: Batch download using a text file containing UniProt IDs
- Index Level: Organize files in subdirectories based on ID prefix
**Output Format**:
```
download/alphafold2_structures/
└── P/ # With index_level=1
└── P0/ # With index_level=2
└── P00734.pdb # AlphaFold predicted structure
```
## Common Features
- **Error Handling**: All components support error file generation
- **Output Directory**: Customizable output paths
- **Batch Processing**: Support for multiple IDs via file input
- **Progress Tracking**: Real-time download progress and status updates
## Input File Formats
1. **PDB ID List** (for RCSB downloads):
```
1a0j
4hhb
1hho
```
2. **UniProt ID List** (for UniProt and AlphaFold):
```
P00734
P61823
Q8WZ42
```
3. **InterPro JSON** (for batch InterPro downloads):
```json
[
{
"metadata": {
"accession": "IPR000001"
}
},
{
"metadata": {
"accession": "IPR000002"
}
}
]
```
## Error Files
When enabled, failed downloads are logged to `failed.txt` in the output directory:
```
P00734 - Download failed: 404 Not Found
1a0j - Connection timeout
```