# VenusFactory Download Tab User Guide

## InterPro Metadata
**Description**: Downloads protein domain information from the InterPro database.

**Source**: [InterPro Database](https://www.ebi.ac.uk/interpro/)

**Download Options**:
- Single ID: Download data for a specific InterPro domain (e.g., IPR000001)
- From JSON: Batch download using a JSON file containing multiple InterPro entries

**Output Format**:
```
download/interpro_domain/
└── IPR000001/
    ├── detail.json    # Detailed protein information
    ├── meta.json      # Metadata including accession and protein count
    └── uids.txt       # List of UniProt IDs associated with this domain
```
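
As a minimal sketch of how the downloaded `uids.txt` might be consumed afterwards, the snippet below reads the UniProt IDs for one domain directory. It assumes the file holds one ID per line, which this guide does not state explicitly, and `read_uids` is a hypothetical helper, not part of VenusFactory.

```python
from pathlib import Path


def read_uids(domain_dir):
    """Return the UniProt IDs listed in <domain_dir>/uids.txt.

    Assumption: one ID per line; blank lines are skipped.
    """
    path = Path(domain_dir) / "uids.txt"
    return [line.strip() for line in path.read_text().splitlines() if line.strip()]
```

For example, `read_uids("download/interpro_domain/IPR000001")` would return the IDs as a list that can be passed on to the UniProt or AlphaFold downloads.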

## RCSB Metadata
**Description**: Downloads structural metadata from the RCSB Protein Data Bank.

**Source**: [RCSB PDB](https://www.rcsb.org/)

**Download Options**:
- Single ID: Download metadata for a specific PDB entry (e.g., 1a0j)
- From File: Batch download using a text file containing PDB IDs

**Output Format**:
```
download/rcsb_metadata/
└── 1a0j.json         # Contains structure metadata including:
                      # - Resolution
                      # - Experimental method
                      # - Publication info
                      # - Chain information
```

## UniProt Sequences
**Description**: Downloads protein sequences from the UniProt database.

**Source**: [UniProt](https://www.uniprot.org/)

**Download Options**:
- Single ID: Download sequence for a specific UniProt entry (e.g., P00734)
- From File: Batch download using a text file containing UniProt IDs
- Merge Option: Combine all sequences into a single FASTA file

**Output Format**:
```
download/uniprot_sequences/
├── P00734.fasta      # Individual FASTA files (when not merged)
└── merged.fasta      # Combined sequences (when merge option is selected)
```
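
If the sequences were downloaded without the merge option, a combined file can be produced afterwards. The sketch below is one possible approach, not the tool's own implementation: `merge_fasta` is a hypothetical helper that concatenates every `*.fasta` file in a directory into `merged.fasta`.

```python
from pathlib import Path


def merge_fasta(directory, merged_name="merged.fasta"):
    """Concatenate all individual FASTA files in `directory` into one file."""
    directory = Path(directory)
    target = directory / merged_name
    # Collect inputs first so an existing merged file is never read back in.
    parts = sorted(p for p in directory.glob("*.fasta") if p.name != merged_name)
    with target.open("w") as out:
        for part in parts:
            text = part.read_text()
            if not text:
                continue
            # Make sure each record ends with a newline before the next header.
            out.write(text if text.endswith("\n") else text + "\n")
    return target
```

Files are merged in sorted filename order, so repeated runs over the same directory give the same result.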

## RCSB Structures
**Description**: Downloads 3D structure files from the RCSB Protein Data Bank.

**Source**: [RCSB PDB](https://www.rcsb.org/)

**Download Options**:
- Single ID: Download structure for a specific PDB entry
- From File: Batch download using a text file containing PDB IDs
- File Types:
    * cif: mmCIF format (recommended)
    * pdb: Legacy PDB format
    * xml: PDBML/XML format
    * sf: Structure factors
    * mr: NMR restraints
- Unzip Option: Automatically decompress downloaded files

**Output Format**:
```
download/rcsb_structures/
├── 1a0j.pdb          # Uncompressed structure file (with unzip)
└── 1a0j.pdb.gz       # Compressed structure file (without unzip)
```
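
Files downloaded without the unzip option can be decompressed later with the Python standard library. This is a sketch under the assumption that the files are ordinary gzip archives, as the `.gz` extension suggests; `gunzip` is a hypothetical helper name.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(path, keep_original=False):
    """Decompress e.g. 1a0j.pdb.gz to 1a0j.pdb next to the original file."""
    src = Path(path)
    dst = src.with_suffix("")  # drops only the trailing ".gz"
    with gzip.open(src, "rb") as fin, dst.open("wb") as fout:
        shutil.copyfileobj(fin, fout)
    if not keep_original:
        src.unlink()
    return dst
```

Calling `gunzip("download/rcsb_structures/1a0j.pdb.gz")` would leave `1a0j.pdb` in the same directory and remove the compressed copy unless `keep_original=True` is passed.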

## AlphaFold2 Structures
**Description**: Downloads predicted protein structures from the AlphaFold Protein Structure Database.

**Source**: [AlphaFold DB](https://alphafold.ebi.ac.uk/)

**Download Options**:
- Single ID: Download structure for a specific UniProt entry
- From File: Batch download using a text file containing UniProt IDs
- Index Level: Organize files in subdirectories based on ID prefix

**Output Format**:
```
download/alphafold2_structures/
└── P/               # With index_level=1
    └── P0/          # With index_level=2
        └── P00734.pdb  # AlphaFold predicted structure
```
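
The layout above suggests that each index level adds one subdirectory named after a progressively longer prefix of the ID. The following sketch reproduces that layout so a downloaded file can be located from its ID. `indexed_path` is a hypothetical helper, and the prefix rule is inferred from the tree shown here rather than from the tool's source.

```python
from pathlib import Path


def indexed_path(uniprot_id, index_level=0, root="download/alphafold2_structures"):
    """Build the expected file path for an ID at a given index level."""
    path = Path(root)
    # Level 1 adds "P", level 2 adds "P0", and so on (assumed from the tree above).
    for depth in range(1, index_level + 1):
        path /= uniprot_id[:depth]
    return path / f"{uniprot_id}.pdb"
```

With `index_level=0` the file sits directly in the root directory; with `index_level=2` the ID `P00734` maps to `P/P0/P00734.pdb` under the root.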

## Common Features
- **Error Handling**: All components support error file generation
- **Output Directory**: Customizable output paths
- **Batch Processing**: Support for multiple IDs via file input
- **Progress Tracking**: Real-time download progress and status updates

## Input File Formats
1. **PDB ID List** (for RCSB downloads):
```
1a0j
4hhb
1hho
```

2. **UniProt ID List** (for UniProt and AlphaFold):
```
P00734
P61823
Q8WZ42
```

3. **InterPro JSON** (for batch InterPro downloads):
```json
[
    {
        "metadata": {
            "accession": "IPR000001"
        }
    },
    {
        "metadata": {
            "accession": "IPR000002"
        }
    }
]
```
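
To check an input file before starting a batch download, the three formats above can be parsed with a few lines of Python. This is only an illustrative sketch: `parse_id_list` and `parse_interpro_accessions` are hypothetical helpers, and the JSON parser assumes every entry has the `metadata.accession` field shown in the example.

```python
import json


def parse_id_list(text):
    """Return the IDs from a plain-text list, one per line, skipping blanks."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def parse_interpro_accessions(text):
    """Return the accession of every entry in an InterPro JSON array."""
    return [entry["metadata"]["accession"] for entry in json.loads(text)]
```

An entry without a `metadata.accession` field raises `KeyError`, which makes malformed input visible before any download is attempted.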

## Error Files
When enabled, failed downloads are logged to `failed.txt` in the output directory:
```
P00734 - Download failed: 404 Not Found
1a0j - Connection timeout
```
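
The failed IDs can be extracted from this log to build a retry list. The sketch below assumes every line follows the `ID - reason` pattern shown in the example; `parse_failed` is a hypothetical helper, not a VenusFactory function.

```python
def parse_failed(text):
    """Map each failed ID to its error message from failed.txt content."""
    failures = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first " - " only, so reasons may contain dashes.
        identifier, _, reason = line.partition(" - ")
        failures[identifier] = reason
    return failures
```

Writing the keys of the returned dictionary to a new text file, one per line, gives an ID list in the batch input format described under Input File Formats.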