==================================
Model Weights Download
==================================
Before using the PDF-Extract-Kit, we need to download the required model weights. You can download all models or specific model files (e.g., formula detection MFD) according to your needs.

[Recommended] Method 1: ``snapshot_download``
=============================================

HuggingFace
------------
``huggingface_hub.snapshot_download`` supports downloading specific model weights from the HuggingFace Hub and allows multithreading. You can use the following code to download model weights in parallel:

.. code:: python

   from huggingface_hub import snapshot_download

   snapshot_download(repo_id='opendatalab/pdf-extract-kit-1.0', local_dir='./', max_workers=20)

If you want to download a single algorithm model (e.g., the YOLO model for the formula detection task), use the following code:

.. code:: python

   from huggingface_hub import snapshot_download

   snapshot_download(repo_id='opendatalab/pdf-extract-kit-1.0', local_dir='./', allow_patterns='models/MFD/YOLO/*')


.. note::

   Here, ``repo_id`` is the name of the model repository on the HuggingFace Hub, ``local_dir`` is the desired local storage path, ``max_workers`` is the maximum number of parallel downloads, and ``allow_patterns`` selects the files you want to download.

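To see which files a pattern such as ``models/MFD/YOLO/*`` would select, the sketch below filters a file listing with glob-style matching. It approximates the filtering with Python's ``fnmatch``, which is an assumption about the exact matching semantics. The file names in ``repo_files`` are hypothetical and do not reflect the real repository contents.

```python
from fnmatch import fnmatch


def select_files(paths, allow_patterns):
    """Return the paths that match at least one glob-style pattern."""
    # Accept a single pattern string or a list of patterns.
    if isinstance(allow_patterns, str):
        allow_patterns = [allow_patterns]
    return [p for p in paths if any(fnmatch(p, pat) for pat in allow_patterns)]


# Hypothetical repository listing, for illustration only.
repo_files = [
    "README.md",
    "models/MFD/YOLO/weights.pt",
    "models/MFR/unimernet/model.bin",
]

selected = select_files(repo_files, "models/MFD/YOLO/*")
print(selected)
```

Only the entry under ``models/MFD/YOLO/`` is kept. Passing a list of patterns keeps any file that matches at least one of them.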

.. tip::

   If ``local_dir`` is not specified, the weights are downloaded to the default HuggingFace cache path (``~/.cache/huggingface/hub``). To change the default cache path, set the relevant environment variable:

   .. code:: console

      $ # Default is `~/.cache/huggingface/`
      $ export HF_HOME=XXXX

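The relationship between ``HF_HOME`` and the hub cache directory can be sketched as follows. This is a simplified approximation, not the library's actual resolution code. It assumes that ``HF_HUB_CACHE`` (a variable not mentioned above) takes precedence when set, and it ignores other variables such as ``XDG_CACHE_HOME``. The function name ``resolve_hf_hub_cache`` and the path ``/data/hf`` are hypothetical.

```python
import os


def resolve_hf_hub_cache(env=None):
    """Approximate where hub downloads land when no local_dir is given."""
    env = os.environ if env is None else env
    # Assumption: an explicit hub cache variable wins over HF_HOME.
    if env.get("HF_HUB_CACHE"):
        return os.path.expanduser(env["HF_HUB_CACHE"])
    # Otherwise the hub cache is the "hub" subdirectory of HF_HOME.
    hf_home = env.get("HF_HOME") or os.path.join("~", ".cache", "huggingface")
    return os.path.join(os.path.expanduser(hf_home), "hub")


# Hypothetical setting: HF_HOME pointing at a data disk.
custom_cache = resolve_hf_hub_cache({"HF_HOME": "/data/hf"})
print(custom_cache)
```

When setting the variable from Python rather than the shell, it is safest to set it before importing ``huggingface_hub``, since the library may read it at import time.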

.. tip::

   If the download is slow (e.g., it cannot reach your maximum bandwidth), try setting ``export HF_HUB_ENABLE_HF_TRANSFER=1`` for higher download speeds. This requires the ``hf_transfer`` package to be installed.

ModelScope
-----------
``modelscope.snapshot_download`` supports downloading specified model weights. You can use the following code to download all model weights:

.. code:: python

   from modelscope import snapshot_download

   snapshot_download(model_id='opendatalab/pdf-extract-kit-1.0', cache_dir='./')

If you want to download a single algorithm model (e.g., the YOLO model for the formula detection task), use the following code:

.. code:: python

   from modelscope import snapshot_download

   snapshot_download(model_id='opendatalab/pdf-extract-kit-1.0', cache_dir='./', allow_patterns='models/MFD/YOLO/*')


.. note::

   Here, ``model_id`` is the name of the model in the ModelScope library, ``cache_dir`` is the desired local storage path, and ``allow_patterns`` selects the files you want to download.

.. note::

   ``modelscope.snapshot_download`` does not support multithreaded parallel downloads.

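Because the two hubs name their arguments differently, a small wrapper can keep calling code uniform. The following is a minimal sketch under the parameter names described in the notes above. ``build_download_kwargs`` and ``download`` are hypothetical helper names, not part of either library.

```python
REPO = "opendatalab/pdf-extract-kit-1.0"


def build_download_kwargs(source, target_dir="./", pattern=None, max_workers=None):
    """Map common options onto the keyword names each hub expects."""
    if source == "huggingface":
        kwargs = {"repo_id": REPO, "local_dir": target_dir}
        if max_workers is not None:
            kwargs["max_workers"] = max_workers
    elif source == "modelscope":
        # max_workers is dropped here, following the note above.
        kwargs = {"model_id": REPO, "cache_dir": target_dir}
    else:
        raise ValueError(f"unknown source: {source!r}")
    if pattern is not None:
        kwargs["allow_patterns"] = pattern
    return kwargs


def download(source, **options):
    """Download from the chosen hub; the library is imported only when needed."""
    kwargs = build_download_kwargs(source, **options)
    if source == "huggingface":
        from huggingface_hub import snapshot_download
    else:
        from modelscope import snapshot_download
    return snapshot_download(**kwargs)


hf_kwargs = build_download_kwargs("huggingface", pattern="models/MFD/YOLO/*", max_workers=20)
ms_kwargs = build_download_kwargs("modelscope", pattern="models/MFD/YOLO/*", max_workers=20)
print(hf_kwargs)
print(ms_kwargs)
```

Calling ``download("modelscope", pattern="models/MFD/YOLO/*")`` would then fetch only the formula detection weights, provided the ``modelscope`` package is installed.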

.. tip::

   If ``cache_dir`` is not specified, the weights are downloaded to the default ModelScope cache path (``~/.cache/modelscope/hub``).

   To change the default cache path, set the relevant environment variable:

   .. code:: console

      $ # Default is ~/.cache/modelscope/hub/
      $ export MODELSCOPE_CACHE=XXXX

Method 2: Git LFS
===================
The remote model repositories of HuggingFace and ModelScope are Git repositories managed by Git LFS. Therefore, we can use ``git clone`` to download the weights:

.. code:: console

   $ git lfs install
   $ # From HuggingFace
   $ git clone https://huggingface.co/opendatalab/pdf-extract-kit-1.0
   $ # From ModelScope
   $ git clone https://www.modelscope.cn/opendatalab/pdf-extract-kit-1.0.git

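If a repository is cloned without Git LFS set up, the weight files are left as small text pointers rather than the real binaries. The sketch below scans a directory for such pointer files by checking for the header line of the Git LFS pointer format. The helper names and the demo file names are hypothetical. The demo runs against a temporary directory, not a real clone.

```python
import tempfile
from pathlib import Path

# First line of every Git LFS pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file starts with the Git LFS pointer header."""
    with open(path, "rb") as fh:
        return fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_lfs_pointers(root):
    """List files under root (skipping .git) that are still LFS pointers."""
    return sorted(
        str(p)
        for p in Path(root).rglob("*")
        if p.is_file() and ".git" not in p.parts and is_lfs_pointer(p)
    )


# Demo on a temporary directory with one fake pointer and one ordinary file.
with tempfile.TemporaryDirectory() as tmp:
    pointer_text = LFS_POINTER_PREFIX + b"\noid sha256:" + b"0" * 64 + b"\nsize 12345\n"
    (Path(tmp) / "weights.pt").write_bytes(pointer_text)
    (Path(tmp) / "README.md").write_bytes(b"# readme\n")
    pointer_names = [Path(p).name for p in find_lfs_pointers(tmp)]

print(pointer_names)
```

If this reports any files in a real clone, running ``git lfs pull`` inside the repository should replace the pointers with the actual weights.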