Docker image for running GLM-OCR (0.9B parameter OCR model) on RunPod Serverless using vLLM.
Model weights are baked into the image at build time for fast cold starts.
- Base image: `vllm/vllm-openai:nightly`
- Model: `zai-org/GLM-OCR` (MIT License)
- Transformers: v5+ dev branch (required by GLM-OCR)
- Serving: vLLM on port 8080
- Create a new Serverless endpoint on RunPod.
- Select Build from GitHub repo and point it to this repository.
- No container start command is needed; the `CMD` in the Dockerfile handles it.
- (Optional) Set `HF_TOKEN` as an environment variable in RunPod's UI for faster model downloads during builds.
GLM-OCR supports two prompt types:
Extract raw content from documents using these prompts:
| Task | Prompt |
|---|---|
| Text | `Text Recognition:` |
| Formula | `Formula Recognition:` |
| Table | `Table Recognition:` |
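The task-to-prompt mapping above can be sketched as a small helper that builds the OpenAI-style message content for one image. This is illustrative only; `ocr_content` and `TASK_PROMPTS` are hypothetical names, not part of this repo:

```python
# Hypothetical helper mapping the task names from the table above to their
# GLM-OCR prompts, building the `content` list for a chat message.
TASK_PROMPTS = {
    "text": "Text Recognition:",
    "formula": "Formula Recognition:",
    "table": "Table Recognition:",
}

def ocr_content(task: str, image_url: str) -> list:
    """Return chat-message content: the image part first, then the task prompt."""
    return [
        {"type": "image_url", "image_url": {"url": image_url}},
        {"type": "text", "text": TASK_PROMPTS[task]},
    ]
```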
Extract structured data by providing a JSON schema as the prompt. Example:
```
Please output the information in the image in the following JSON format:
{
    "name": "",
    "date": "",
    "total": ""
}
```
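A prompt like the one above can be generated from a plain schema dict. A minimal sketch; `schema_prompt` is an illustrative name, not something the repo provides:

```python
import json

def schema_prompt(schema: dict) -> str:
    """Build a structured-extraction prompt from a dict of fields to extract."""
    return (
        "Please output the information in the image in the following JSON format:\n"
        + json.dumps(schema, indent=4)
    )
```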
This worker is a RunPod Serverless Queue worker, so requests must be sent to RunPod's `/run` or `/runsync` endpoint and wrapped in `input`. If you send raw OpenAI payloads directly, the worker logs `Job has missing field(s): id or input.`
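In Python, the wrapping amounts to nesting the usual OpenAI chat payload under an `input` key. A sketch, with `runsync_body` as an illustrative helper name:

```python
import json

def runsync_body(openai_payload: dict) -> str:
    """Wrap an OpenAI-style chat-completions payload in the `input` envelope
    that RunPod's /run and /runsync endpoints expect, serialized as JSON."""
    return json.dumps({"input": openai_payload})

# The same payload the curl example sends:
payload = {
    "model": "zai-org/GLM-OCR",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/9/99/ReceiptSwiss.jpg"}},
            {"type": "text", "text": "Text Recognition:"},
        ],
    }],
}
body = runsync_body(payload)
```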
```bash
curl -X POST "https://api.runpod.ai/v2/<ENDPOINT_ID>/runsync" \
  -H "Authorization: Bearer <RUNPOD_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "model": "zai-org/GLM-OCR",
      "messages": [
        {
          "role": "user",
          "content": [
            {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/9/99/ReceiptSwiss.jpg"}},
            {"type": "text", "text": "Text Recognition:"}
          ]
        }
      ]
    }
  }'
```

To build and run the image locally:

```bash
docker build -t glm-ocr .
docker run --gpus all -p 8080:8080 glm-ocr
```

This Dockerfile is provided as-is. GLM-OCR is released under the MIT License. The vLLM base image has its own license terms.
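When the container is running locally, vLLM's OpenAI-compatible server answers on port 8080 without the RunPod `input` envelope. A minimal sketch of a local request, assuming the standard chat-completions path:

```python
# Sketch of querying the locally running container (assumes `docker run` above
# succeeded and vLLM's OpenAI-compatible server is listening on port 8080).
import json
import urllib.request

URL = "http://localhost:8080/v1/chat/completions"

def build_request(image_url: str, prompt: str = "Text Recognition:") -> urllib.request.Request:
    """Build a plain chat-completions request for the local server."""
    payload = {
        "model": "zai-org/GLM-OCR",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": prompt},
            ],
        }],
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# To actually send (requires the container to be up):
# with urllib.request.urlopen(build_request("https://upload.wikimedia.org/wikipedia/commons/9/99/ReceiptSwiss.jpg")) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```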