metadata
title: PDF to Markdown Converter
emoji: 📄
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
app_port: 7860

PDF to Markdown Converter API

A FastAPI-based service that converts PDF documents to Markdown format using the marker library.

Features

  • Convert PDF files to Markdown format
  • GPU-accelerated processing with CUDA support
  • Simple RESTful API
  • Docker containerization

Setup and Installation

Prerequisites

  • Docker
  • Docker Compose
  • NVIDIA Container Toolkit (for GPU support)

Building and Running the Container

  1. Clone this repository:
     git clone <repository-url>
     cd docker_mineru
  2. Build and start the container:
     docker-compose up -d
  3. The API will be available at http://localhost:7860

API Usage

Health Check

GET /health

Returns the current status of the service and whether CUDA is available.
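
A minimal Python sketch of calling this endpoint with only the standard library. The field names in the health response are not documented here, so the sketch returns the decoded JSON as-is. `check_health` and its `opener` parameter are hypothetical names introduced for illustration; the base URL is the default from the setup steps.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://localhost:7860"  # default address from the setup steps


def check_health(base_url=BASE_URL, opener=urlopen, timeout=10):
    """GET /health and return the decoded JSON body.

    `opener` is injectable (an assumption of this sketch) so the function
    can be exercised without a running server; it defaults to urlopen.
    """
    url = f"{base_url.rstrip('/')}/health"
    with opener(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `check_health()` against a running container returns whatever the service reports as a Python dict. If the container is not up yet, `urlopen` raises `urllib.error.URLError`.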

Convert PDF to Markdown

POST /convert

Upload a PDF file as multipart/form-data, in a form field named file, to convert it to Markdown.

Example cURL request:

curl -X POST "http://localhost:7860/convert" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@your_file.pdf"

Example response:

{
  "filename": "your_file.pdf",
  "status": "success",
  "markdown_content": "# Your PDF content in Markdown...",
  "output_file": "/output/your_file.md"
}
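
The same request can be sketched in Python using only the standard library. This is an illustrative client, not part of the service: `encode_pdf_upload`, `convert_pdf`, the `opener` parameter, and the 300-second timeout are assumptions of the sketch. The form field name `file` and the `markdown_content` key come from the cURL request and example response above.

```python
import json
import uuid
from pathlib import Path
from urllib.request import Request, urlopen


def encode_pdf_upload(filename, data, field="file"):
    """Build a multipart/form-data body holding a single file part.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def convert_pdf(pdf_path, base_url="http://localhost:7860", opener=urlopen, timeout=300):
    """POST a PDF to /convert and return the decoded JSON response.

    The timeout is an arbitrary generous value; conversion time is not
    documented.
    """
    path = Path(pdf_path)
    body, content_type = encode_pdf_upload(path.name, path.read_bytes())
    request = Request(
        f"{base_url.rstrip('/')}/convert",
        data=body,
        headers={"Content-Type": content_type, "Accept": "application/json"},
        method="POST",
    )
    with opener(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the container running, `convert_pdf("your_file.pdf")["markdown_content"]` would hold the converted text, assuming the response has the shape shown above.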

Accessing the API Documentation

Once the API is running, interactive documentation is available at:

  • Swagger UI: http://localhost:7860/docs
  • ReDoc: http://localhost:7860/redoc
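
Both pages are generated from an OpenAPI schema, which FastAPI serves at /openapi.json by default. The sketch below assumes this app keeps that default path and lists the routes the schema declares; `list_routes` and `fetch_schema` are hypothetical helper names.

```python
import json
from urllib.request import urlopen

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def list_routes(schema):
    """Return sorted (METHOD, path) pairs from an OpenAPI schema dict."""
    routes = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            # Path items may also carry non-method keys such as "parameters".
            if method in HTTP_METHODS:
                routes.append((method.upper(), path))
    return sorted(routes)


def fetch_schema(base_url="http://localhost:7860", opener=urlopen, timeout=10):
    """Download the schema (assumes the default /openapi.json path)."""
    url = f"{base_url.rstrip('/')}/openapi.json"
    with opener(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the service running, `list_routes(fetch_schema())` should include the /health and /convert endpoints described above.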

Hugging Face Spaces Deployment

This application is also deployed on Hugging Face Spaces. You can access it at: https://huggingface.co/spaces/marcosremar2/docker_mineru