VenusFactory Download Tab User Guide

InterPro Metadata

Description: Downloads protein domain information from the InterPro database.

Source: InterPro Database

Download Options:

  • Single ID: Download data for a specific InterPro domain (e.g., IPR000001)
  • From JSON: Batch download using a JSON file containing multiple InterPro entries

Output Format:

download/interpro_domain/
└── IPR000001/
    ├── detail.json    # Detailed protein information
    ├── meta.json      # Metadata including accession and protein count
    └── uids.txt       # List of UniProt IDs associated with this domain

RCSB Metadata

Description: Downloads structural metadata from the RCSB Protein Data Bank.

Source: RCSB PDB

Download Options:

  • Single ID: Download metadata for a specific PDB entry (e.g., 1a0j)
  • From File: Batch download using a text file containing PDB IDs

Output Format:

download/rcsb_metadata/
└── 1a0j.json         # Contains structure metadata including:
                     # - Resolution
                     # - Experimental method
                     # - Publication info
                     # - Chain information

UniProt Sequences

Description: Downloads protein sequences from the UniProt database.

Source: UniProt

Download Options:

  • Single ID: Download sequence for a specific UniProt entry (e.g., P00734)
  • From File: Batch download using a text file containing UniProt IDs
  • Merge Option: Combine all sequences into a single FASTA file

Output Format:

download/uniprot_sequences/
├── P00734.fasta      # Individual FASTA files (when not merged)
└── merged.fasta      # Combined sequences (when merge option is selected)
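The merge option can be pictured as a plain concatenation of the individual FASTA files. The following is a minimal sketch of that idea, not VenusFactory's own implementation; the function name `merge_fasta` and its parameters are hypothetical.

```python
from pathlib import Path


def merge_fasta(fasta_dir: str, merged_name: str = "merged.fasta") -> Path:
    """Concatenate every *.fasta file in fasta_dir into a single file.

    Hypothetical helper for illustration; the tool's real merge logic may differ.
    """
    directory = Path(fasta_dir)
    merged_path = directory / merged_name
    # Collect the inputs first and skip the output file itself,
    # so that re-running the merge does not duplicate records.
    inputs = sorted(p for p in directory.glob("*.fasta") if p.name != merged_name)
    with merged_path.open("w", encoding="utf-8") as out:
        for path in inputs:
            text = path.read_text(encoding="utf-8").strip()
            if text:
                # Ensure each record ends with exactly one newline.
                out.write(text + "\n")
    return merged_path
```

Calling `merge_fasta("download/uniprot_sequences")` would write `merged.fasta` next to the individual files, assuming they were downloaded without the merge option first.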

RCSB Structures

Description: Downloads 3D structure files from the RCSB Protein Data Bank.

Source: RCSB PDB

Download Options:

  • Single ID: Download structure for a specific PDB entry
  • From File: Batch download using a text file containing PDB IDs
  • File Types:
    • cif: mmCIF format (recommended)
    • pdb: Legacy PDB format
    • xml: PDBML/XML format
    • sf: Structure factors
    • mr: NMR restraints
  • Unzip Option: Automatically decompress downloaded .gz files

Output Format:

download/rcsb_structures/
├── 1a0j.pdb          # Uncompressed structure file (with unzip)
└── 1a0j.pdb.gz       # Compressed structure file (without unzip)
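If files were downloaded without the unzip option, they can be decompressed afterwards with the Python standard library. This is a sketch under the assumption that the files are ordinary gzip archives, as the `.gz` suffix suggests; `unzip_structure` is a hypothetical helper name, not part of VenusFactory.

```python
import gzip
import shutil
from pathlib import Path


def unzip_structure(gz_path: str, keep_original: bool = False) -> Path:
    """Decompress a .gz structure file next to the original.

    Hypothetical helper; assumes a standard gzip archive.
    """
    source = Path(gz_path)
    # Dropping the last suffix turns 1a0j.pdb.gz into 1a0j.pdb.
    target = source.with_suffix("")
    with gzip.open(source, "rb") as src, target.open("wb") as dst:
        shutil.copyfileobj(src, dst)
    if not keep_original:
        source.unlink()
    return target
```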

AlphaFold2 Structures

Description: Downloads predicted protein structures from the AlphaFold Protein Structure Database.

Source: AlphaFold DB

Download Options:

  • Single ID: Download structure for a specific UniProt entry
  • From File: Batch download using a text file containing UniProt IDs
  • Index Level: Organize files into nested subdirectories named after the leading characters of the ID, one directory level per index level

Output Format:

download/alphafold2_structures/
└── P/               # With index_level=1
    └── P0/          # With index_level=2
        └── P00734.pdb  # AlphaFold predicted structure
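The directory layout above can be reproduced with a few lines of path arithmetic. The sketch below infers the rule from the tree (level 1 uses the first character of the ID, level 2 the first two characters, and so on); the function `indexed_path` and its parameters are hypothetical, and treating level 0 as "no subdirectories" is an assumption.

```python
from pathlib import Path


def indexed_path(out_dir: str, uniprot_id: str, index_level: int = 0,
                 ext: str = ".pdb") -> Path:
    """Build the output path for an ID under a prefix-indexed layout.

    Hypothetical helper inferred from the example tree, not the tool's code.
    """
    path = Path(out_dir)
    # Each level adds a directory named after a one-character-longer prefix.
    for level in range(1, index_level + 1):
        path = path / uniprot_id[:level]
    return path / f"{uniprot_id}{ext}"


# With index_level=2 this mirrors the tree shown above.
print(indexed_path("download/alphafold2_structures", "P00734", 2).as_posix())
```

Splitting by prefix keeps any single directory from accumulating an unwieldy number of files during large batch downloads.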

Common Features

  • Error Handling: Every download component can record failed IDs in an error file (see Error Files)
  • Output Directory: Customizable output paths
  • Batch Processing: Support for multiple IDs via file input
  • Progress Tracking: Real-time download progress and status updates

Input File Formats

  1. PDB ID List (for RCSB downloads):
1a0j
4hhb
1hho
  2. UniProt ID List (for UniProt and AlphaFold):
P00734
P61823
Q8WZ42
  3. InterPro JSON (for batch InterPro downloads):
[
    {
        "metadata": {
            "accession": "IPR000001"
        }
    },
    {
        "metadata": {
            "accession": "IPR000002"
        }
    }
]
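To check an input file before starting a batch download, the formats above can be parsed with the standard library. This is a minimal sketch assuming one ID per line for the text lists and the `metadata.accession` layout shown in the JSON example; the function names are hypothetical.

```python
import json
from pathlib import Path


def read_id_list(path: str) -> list[str]:
    """Read a PDB or UniProt ID list: one ID per line, blank lines ignored."""
    ids = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line:
            ids.append(line)
    return ids


def read_interpro_accessions(path: str) -> list[str]:
    """Extract the accession of every entry in an InterPro JSON file."""
    entries = json.loads(Path(path).read_text(encoding="utf-8"))
    # Each entry is expected to look like {"metadata": {"accession": "IPR..."}}.
    return [entry["metadata"]["accession"] for entry in entries]
```

A `KeyError` from `read_interpro_accessions` indicates an entry that lacks the `metadata.accession` field.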

Error Files

When enabled, failed downloads are logged to failed.txt in the output directory:

P00734 - Download failed: 404 Not Found
1a0j - Connection timeout
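Because each line starts with the failed ID, the error file can be turned back into an ID list for a retry. The sketch below assumes the `<ID> - <reason>` layout shown in the example; `read_failed_ids` is a hypothetical helper, not a VenusFactory function.

```python
from pathlib import Path


def read_failed_ids(failed_file: str) -> list[str]:
    """Return the unique IDs listed in a failed.txt file, in order."""
    ids = []
    for line in Path(failed_file).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is assumed to look like "<ID> - <reason>"; keep only the ID.
        identifier = line.split(" - ", 1)[0].strip()
        if identifier and identifier not in ids:
            ids.append(identifier)
    return ids
```

Writing the returned IDs to a text file, one per line, yields a list in the same format as the From File inputs described earlier.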