GLM-OCR Docker Image for RunPod Serverless

Docker image for running GLM-OCR (a 0.9B-parameter OCR model) on RunPod Serverless using vLLM.

Model weights are baked into the image at build time for fast cold starts.

What's included

  • Base image: vllm/vllm-openai:nightly
  • Model: zai-org/GLM-OCR (MIT License)
  • Transformers: v5+ dev branch (required by GLM-OCR)
  • Serving: vLLM on port 8080
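
The repository's Dockerfile ties these pieces together. As a rough sketch of the approach (not the repo's actual file; the install commands, build-arg handling, and serve flags below are assumptions, and the RunPod queue-handler glue is omitted), the build installs the dev-branch Transformers, downloads the weights at build time, and serves on port 8080:

# Sketch only: see the Dockerfile in this repository for the real build.
FROM vllm/vllm-openai:nightly

# GLM-OCR requires the Transformers v5+ dev branch.
RUN pip install --no-cache-dir git+https://github.com/huggingface/transformers.git

# Bake the model weights into the image for fast cold starts.
# Passing HF_TOKEN as a build argument can speed up the download.
ARG HF_TOKEN
RUN HF_TOKEN=${HF_TOKEN} huggingface-cli download zai-org/GLM-OCR

# The base image's entrypoint starts the OpenAI-compatible vLLM server;
# point it at the baked model and port 8080.
EXPOSE 8080
CMD ["--model", "zai-org/GLM-OCR", "--port", "8080"]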

Deploy on RunPod Serverless

  1. Create a new Serverless endpoint on RunPod.
  2. Select Build from GitHub repo and point it to this repository.
  3. No container start command is needed — the CMD in the Dockerfile handles it.
  4. (Optional) Set HF_TOKEN as an environment variable in RunPod's UI for faster model downloads during builds.
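
Once the endpoint is built and deployed, RunPod's health route is a quick sanity check that workers can start; <ENDPOINT_ID> and <RUNPOD_API_KEY> are placeholders:

curl "https://api.runpod.ai/v2/<ENDPOINT_ID>/health" \
  -H "Authorization: Bearer <RUNPOD_API_KEY>"

The response reports job and worker counts, which is usually enough to tell whether the image built and started correctly.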

Usage

GLM-OCR supports two prompt types:

Document parsing

Extract raw content from documents using these prompts:

Task      Prompt
Text      Text Recognition:
Formula   Formula Recognition:
Table     Table Recognition:

Information extraction

Extract structured data by providing a JSON schema as the prompt. Example:

Please output the information in the image in the following JSON format:
{
    "name": "",
    "date": "",
    "total": ""
}
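
In a chat request, that schema prompt goes in the text part of the user message, next to the image, as a single escaped string (the image URL below is just a placeholder):

{
  "role": "user",
  "content": [
    {"type": "image_url", "image_url": {"url": "https://example.com/receipt.jpg"}},
    {"type": "text", "text": "Please output the information in the image in the following JSON format:\n{\n    \"name\": \"\",\n    \"date\": \"\",\n    \"total\": \"\"\n}"}
  ]
}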

API example (RunPod Queue endpoint)

This worker runs as a RunPod Serverless queue worker, so requests must be sent to RunPod's /run or /runsync endpoint with the OpenAI-style payload wrapped in an input object.

If you send a raw OpenAI payload without the input wrapper, the worker rejects it and logs: Job has missing field(s): id or input.

curl -X POST "https://api.runpod.ai/v2/<ENDPOINT_ID>/runsync" \
  -H "Authorization: Bearer <RUNPOD_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "model": "zai-org/GLM-OCR",
      "messages": [
        {
          "role": "user",
          "content": [
            {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/9/99/ReceiptSwiss.jpg"}},
            {"type": "text", "text": "Text Recognition:"}
          ]
        }
      ]
    }
  }'
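
If a document takes longer than /runsync is willing to wait, the same payload can be submitted asynchronously to /run and polled via /status; <JOB_ID> comes from the id field of the /run response:

# Submit the job; the response contains an id and an initial status such as IN_QUEUE.
curl -X POST "https://api.runpod.ai/v2/<ENDPOINT_ID>/run" \
  -H "Authorization: Bearer <RUNPOD_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"input": {"model": "zai-org/GLM-OCR", "messages": [{"role": "user", "content": [{"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/9/99/ReceiptSwiss.jpg"}}, {"type": "text", "text": "Text Recognition:"}]}]}}'

# Poll until status is COMPLETED; the model response is returned in the output field.
curl "https://api.runpod.ai/v2/<ENDPOINT_ID>/status/<JOB_ID>" \
  -H "Authorization: Bearer <RUNPOD_API_KEY>"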

Build locally

docker build -t glm-ocr .
docker run --gpus all -p 8080:8080 glm-ocr
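
With the container running locally there is no RunPod queue in front of it, so, assuming the image exposes vLLM's standard OpenAI-compatible routes on port 8080, you can call the chat completions endpoint directly without the input wrapper:

curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zai-org/GLM-OCR",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/9/99/ReceiptSwiss.jpg"}},
          {"type": "text", "text": "Text Recognition:"}
        ]
      }
    ]
  }'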

License

This Dockerfile is provided as-is. GLM-OCR is released under the MIT License. The vLLM base image has its own license terms.
