Spaces:
Runtime error
Runtime error
# VenusFactory Download Tab User Guide | |
## InterPro Metadata | |
**Description**: Downloads protein domain information from InterPro database. | |
**Source**: [InterPro Database](https://www.ebi.ac.uk/interpro/) | |
**Download Options**: | |
- Single ID: Download data for a specific InterPro domain (e.g., IPR000001) | |
- From JSON: Batch download using a JSON file containing multiple InterPro entries | |
**Output Format**: | |
``` | |
download/interpro_domain/ | |
βββ IPR000001/ | |
βββ detail.json # Detailed protein information | |
βββ meta.json # Metadata including accession and protein count | |
βββ uids.txt # List of UniProt IDs associated with this domain | |
``` | |
## RCSB Metadata | |
**Description**: Downloads structural metadata from the RCSB Protein Data Bank. | |
**Source**: [RCSB PDB](https://www.rcsb.org/) | |
**Download Options**: | |
- Single ID: Download metadata for a specific PDB entry (e.g., 1a0j) | |
- From File: Batch download using a text file containing PDB IDs | |
**Output Format**: | |
``` | |
download/rcsb_metadata/ | |
βββ 1a0j.json # Contains structure metadata including: | |
# - Resolution | |
# - Experimental method | |
# - Publication info | |
# - Chain information | |
``` | |
## UniProt Sequences | |
**Description**: Downloads protein sequences from UniProt database. | |
**Source**: [UniProt](https://www.uniprot.org/) | |
**Download Options**: | |
- Single ID: Download sequence for a specific UniProt entry (e.g., P00734) | |
- From File: Batch download using a text file containing UniProt IDs | |
- Merge Option: Combine all sequences into a single FASTA file | |
**Output Format**: | |
``` | |
download/uniprot_sequences/ | |
βββ P00734.fasta # Individual FASTA files (when not merged) | |
βββ merged.fasta # Combined sequences (when merge option is selected) | |
``` | |
## RCSB Structures | |
**Description**: Downloads 3D structure files from RCSB Protein Data Bank. | |
**Source**: [RCSB PDB](https://www.rcsb.org/) | |
**Download Options**: | |
- Single ID: Download structure for a specific PDB entry | |
- From File: Batch download using a text file containing PDB IDs | |
- File Types: | |
* cif: mmCIF format (recommended) | |
* pdb: Legacy PDB format | |
* xml: PDBML/XML format | |
* sf: Structure factors | |
* mr: NMR restraints | |
- Unzip Option: Automatically decompress downloaded files | |
**Output Format**: | |
``` | |
download/rcsb_structures/ | |
βββ 1a0j.pdb # Uncompressed structure file (with unzip) | |
βββ 1a0j.pdb.gz # Compressed structure file (without unzip) | |
``` | |
## AlphaFold2 Structures | |
**Description**: Downloads predicted protein structures from AlphaFold Protein Structure Database. | |
**Source**: [AlphaFold DB](https://alphafold.ebi.ac.uk/) | |
**Download Options**: | |
- Single ID: Download structure for a specific UniProt entry | |
- From File: Batch download using a text file containing UniProt IDs | |
- Index Level: Organize files in subdirectories based on ID prefix | |
**Output Format**: | |
``` | |
download/alphafold2_structures/ | |
βββ P/ # With index_level=1 | |
βββ P0/ # With index_level=2 | |
βββ P00734.pdb # AlphaFold predicted structure | |
``` | |
## Common Features | |
- **Error Handling**: All components support error file generation | |
- **Output Directory**: Customizable output paths | |
- **Batch Processing**: Support for multiple IDs via file input | |
- **Progress Tracking**: Real-time download progress and status updates | |
## Input File Formats | |
1. **PDB ID List** (for RCSB downloads): | |
``` | |
1a0j | |
4hhb | |
1hho | |
``` | |
2. **UniProt ID List** (for UniProt and AlphaFold): | |
``` | |
P00734 | |
P61823 | |
Q8WZ42 | |
``` | |
3. **InterPro JSON** (for batch InterPro downloads): | |
```json | |
[ | |
{ | |
"metadata": { | |
"accession": "IPR000001" | |
} | |
}, | |
{ | |
"metadata": { | |
"accession": "IPR000002" | |
} | |
} | |
] | |
``` | |
## Error Files | |
When enabled, failed downloads are logged to `failed.txt` in the output directory: | |
``` | |
P00734 - Download failed: 404 Not Found | |
1a0j - Connection timeout | |
``` |