# VenusFactory Download Tab User Guide
## InterPro Metadata
**Description**: Downloads protein domain information from the InterPro database.
**Source**: [InterPro Database](https://www.ebi.ac.uk/interpro/)
**Download Options**:
- Single ID: Download data for a specific InterPro domain (e.g., IPR000001)
- From JSON: Batch download using a JSON file containing multiple InterPro entries
**Output Format**:
```
download/interpro_domain/
└── IPR000001/
    ├── detail.json   # Detailed protein information
    ├── meta.json     # Metadata including accession and protein count
    └── uids.txt      # List of UniProt IDs associated with this domain
```
## RCSB Metadata
**Description**: Downloads structural metadata from the RCSB Protein Data Bank.
**Source**: [RCSB PDB](https://www.rcsb.org/)
**Download Options**:
- Single ID: Download metadata for a specific PDB entry (e.g., 1a0j)
- From File: Batch download using a text file containing PDB IDs
**Output Format**:
```
download/rcsb_metadata/
└── 1a0j.json   # Contains structure metadata including:
                # - Resolution
                # - Experimental method
                # - Publication info
                # - Chain information
```
## UniProt Sequences
**Description**: Downloads protein sequences from the UniProt database.
**Source**: [UniProt](https://www.uniprot.org/)
**Download Options**:
- Single ID: Download sequence for a specific UniProt entry (e.g., P00734)
- From File: Batch download using a text file containing UniProt IDs
- Merge Option: Combine all sequences into a single FASTA file
**Output Format**:
```
download/uniprot_sequences/
├── P00734.fasta   # Individual FASTA files (when not merged)
└── merged.fasta   # Combined sequences (when merge option is selected)
```
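The merge option amounts to concatenating the per-entry FASTA files into one. The following is a minimal sketch of that idea, assuming the individual files are named `<ID>.fasta` and sit in a single directory. The function name `merge_fasta_files` is hypothetical and is not taken from VenusFactory.

```python
from pathlib import Path


def merge_fasta_files(input_dir, output_name="merged.fasta"):
    """Concatenate every *.fasta file in input_dir into one file.

    Hypothetical helper for illustration; not VenusFactory's own code.
    Returns the number of files merged.
    """
    input_dir = Path(input_dir)
    output_path = input_dir / output_name
    # Collect sources first so the output file is never merged into itself.
    sources = sorted(p for p in input_dir.glob("*.fasta") if p.name != output_name)
    with output_path.open("w") as out:
        for path in sources:
            text = path.read_text().strip()
            if text:
                # Write each file's records followed by exactly one newline.
                out.write(text + "\n")
    return len(sources)
```

Calling `merge_fasta_files("download/uniprot_sequences")` would write `merged.fasta` next to the individual files. Excluding the output name from the sources keeps repeated runs from duplicating records.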
## RCSB Structures
**Description**: Downloads 3D structure files from the RCSB Protein Data Bank.
**Source**: [RCSB PDB](https://www.rcsb.org/)
**Download Options**:
- Single ID: Download structure for a specific PDB entry
- From File: Batch download using a text file containing PDB IDs
- File Types:
  * cif: mmCIF format (recommended)
  * pdb: Legacy PDB format
  * xml: PDBML/XML format
  * sf: Structure factors
  * mr: NMR restraints
- Unzip Option: Automatically decompress downloaded files
**Output Format**:
```
download/rcsb_structures/
├── 1a0j.pdb      # Uncompressed structure file (with unzip)
└── 1a0j.pdb.gz   # Compressed structure file (without unzip)
```
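If files were downloaded without the unzip option, they can be decompressed afterwards with the Python standard library. This is a sketch only: `gunzip_file` is a hypothetical helper, and deleting the `.gz` file after decompression is an assumption about the desired behavior, not a description of what VenusFactory does.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_file(gz_path, keep_original=False):
    """Decompress name.ext.gz to name.ext and return the new path.

    Hypothetical helper; removing the .gz afterwards is an assumption.
    """
    gz_path = Path(gz_path)
    if gz_path.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {gz_path.name}")
    out_path = gz_path.with_suffix("")  # strips only the trailing .gz
    # Stream the data so large structure files are not loaded into memory.
    with gzip.open(gz_path, "rb") as src, out_path.open("wb") as dst:
        shutil.copyfileobj(src, dst)
    if not keep_original:
        gz_path.unlink()
    return out_path
```

For example, `gunzip_file("download/rcsb_structures/1a0j.pdb.gz")` would produce `1a0j.pdb` in the same directory.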
## AlphaFold2 Structures
**Description**: Downloads predicted protein structures from the AlphaFold Protein Structure Database.
**Source**: [AlphaFold DB](https://alphafold.ebi.ac.uk/)
**Download Options**:
- Single ID: Download structure for a specific UniProt entry
- From File: Batch download using a text file containing UniProt IDs
- Index Level: Organize files in subdirectories based on ID prefix
**Output Format**:
```
download/alphafold2_structures/
└── P/                   # With index_level=1
    └── P0/              # With index_level=2
        └── P00734.pdb   # AlphaFold predicted structure
```
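One way to read the index level is that each level adds a subdirectory named after a progressively longer prefix of the ID. The sketch below computes the resulting path under that assumption. The nesting rule and the helper name `indexed_path` are illustrative guesses based on the tree above, not VenusFactory's documented behavior.

```python
from pathlib import Path


def indexed_path(base_dir, uniprot_id, index_level=0, suffix=".pdb"):
    """Build base_dir/<prefix 1>/<prefix 2>/.../<id><suffix>.

    Assumes level k contributes a directory named after the first k
    characters of the ID; hypothetical helper, not VenusFactory's API.
    """
    path = Path(base_dir)
    for k in range(1, index_level + 1):
        path = path / uniprot_id[:k]
    return path / f"{uniprot_id}{suffix}"
```

With `index_level=2`, the ID `P00734` maps to `P/P0/P00734.pdb` under the base directory. With `index_level=0`, the file sits directly in the base directory. Prefix directories like these keep any single folder from holding too many files in large batch downloads.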
## Common Features
- **Error Handling**: Every download component can log failed IDs to an error file
- **Output Directory**: Customizable output paths
- **Batch Processing**: Support for multiple IDs via file input
- **Progress Tracking**: Real-time download progress and status updates
## Input File Formats
1. **PDB ID List** (for RCSB downloads):
```
1a0j
4hhb
1hho
```
2. **UniProt ID List** (for UniProt and AlphaFold):
```
P00734
P61823
Q8WZ42
```
3. **InterPro JSON** (for batch InterPro downloads):
```json
[
{
"metadata": {
"accession": "IPR000001"
}
},
{
"metadata": {
"accession": "IPR000002"
}
}
]
```
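The three input formats above are simple enough to validate before starting a batch download. As a rough sketch, the helpers below read a one-ID-per-line text file and pull the accessions out of an InterPro JSON list shaped like the example. Both function names are hypothetical, and skipping blank lines is an assumption about how lenient the reader should be.

```python
import json
from pathlib import Path


def read_id_list(path):
    """Return IDs from a text file with one ID per line, skipping blank lines."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def read_interpro_accessions(path):
    """Return the metadata.accession value of each entry in an InterPro JSON list."""
    entries = json.loads(Path(path).read_text())
    return [entry["metadata"]["accession"] for entry in entries]
```

Running these on an input file before uploading it is a quick way to confirm that the file parses and contains the expected number of IDs.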
## Error Files
When enabled, failed downloads are logged to `failed.txt` in the output directory:
```
P00734 - Download failed: 404 Not Found
1a0j - Connection timeout
```
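Since each line of the error file starts with the failed ID, the file can be turned back into an ID list for a retry. The sketch below assumes every line follows the `ID - reason` pattern shown above. The function names are hypothetical.

```python
def parse_failed_line(line):
    """Split 'ID - reason' into (id, reason); reason is '' if absent."""
    identifier, _, reason = line.strip().partition(" - ")
    return identifier, reason


def failed_ids(lines):
    """Return the IDs from an iterable of failed.txt lines, skipping blanks."""
    return [parse_failed_line(line)[0] for line in lines if line.strip()]
```

Writing the result of `failed_ids(open("failed.txt"))` to a new text file, one ID per line, gives an input suitable for the "From File" batch option.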