Docker image for running GLM-OCR (0.9B parameter OCR model) on RunPod Serverless using vLLM.
Model weights are baked into the image at build time for fast cold starts.
- Base image: `vllm/vllm-openai:nightly`
- Model: `zai-org/GLM-OCR` (MIT License)
- Transformers: v5+ dev branch (required by GLM-OCR)
- Serving: vLLM on port 8080
- Create a new Serverless endpoint on RunPod.
- Select Build from GitHub repo and point it to this repository.
- No container start command is needed; the `CMD` in the Dockerfile handles it.
- (Optional) Set `HF_TOKEN` as an environment variable in RunPod's UI for faster model downloads during builds.
GLM-OCR supports two prompt types:
Extract raw content from documents using these prompts:
| Task | Prompt |
|---|---|
| Text | `Text Recognition:` |
| Formula | `Formula Recognition:` |
| Table | `Table Recognition:` |
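The task-to-prompt mapping above can be sketched as a small helper that builds the OpenAI-style message content for one image. This is illustrative only; `ocr_content` and `TASK_PROMPTS` are hypothetical names, not part of this repo:

```python
# Hypothetical helper mapping the task names from the table above to their
# GLM-OCR prompts, building the `content` list for a chat message.
TASK_PROMPTS = {
    "text": "Text Recognition:",
    "formula": "Formula Recognition:",
    "table": "Table Recognition:",
}

def ocr_content(task: str, image_url: str) -> list:
    """Return chat-message content: the image part first, then the task prompt."""
    return [
        {"type": "image_url", "image_url": {"url": image_url}},
        {"type": "text", "text": TASK_PROMPTS[task]},
    ]
```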
Extract structured data by providing a JSON schema as the prompt. Example:
```
Please output the information in the image in the following JSON format:
{
    "name": "",
    "date": "",
    "total": ""
}
```
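A prompt like the one above can be generated from a plain schema dict. A minimal sketch; `schema_prompt` is an illustrative name, not something the repo provides:

```python
import json

def schema_prompt(schema: dict) -> str:
    """Build a structured-extraction prompt from a dict of fields to extract."""
    return (
        "Please output the information in the image in the following JSON format:\n"
        + json.dumps(schema, indent=4)
    )
```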
This worker is a RunPod Serverless Queue worker, so requests must be sent to RunPod's `/run` or `/runsync` endpoint and wrapped in `input`. If you send raw OpenAI payloads directly, the worker logs `Job has missing field(s): id or input.`
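In Python, the wrapping amounts to nesting the usual OpenAI chat payload under an `input` key. A sketch, with `runsync_body` as an illustrative helper name:

```python
import json

def runsync_body(openai_payload: dict) -> str:
    """Wrap an OpenAI-style chat-completions payload in the `input` envelope
    that RunPod's /run and /runsync endpoints expect, serialized as JSON."""
    return json.dumps({"input": openai_payload})

# The same payload the curl example sends:
payload = {
    "model": "zai-org/GLM-OCR",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/9/99/ReceiptSwiss.jpg"}},
            {"type": "text", "text": "Text Recognition:"},
        ],
    }],
}
body = runsync_body(payload)
```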
```bash
curl -X POST "https://api.runpod.ai/v2/<ENDPOINT_ID>/runsync" \
  -H "Authorization: Bearer <RUNPOD_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "model": "zai-org/GLM-OCR",
      "messages": [
        {
          "role": "user",
          "content": [
            {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/9/99/ReceiptSwiss.jpg"}},
            {"type": "text", "text": "Text Recognition:"}
          ]
        }
      ]
    }
  }'
```

To build and run the image locally:

```bash
docker build -t glm-ocr .
docker run --gpus all -p 8080:8080 glm-ocr
```

This Dockerfile is provided as-is. GLM-OCR is released under the MIT License. The vLLM base image has its own license terms.
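When the container is running locally, vLLM's OpenAI-compatible server answers on port 8080 without the RunPod `input` envelope. A minimal sketch of a local request, assuming the standard chat-completions path:

```python
# Sketch of querying the locally running container (assumes `docker run` above
# succeeded and vLLM's OpenAI-compatible server is listening on port 8080).
import json
import urllib.request

URL = "http://localhost:8080/v1/chat/completions"

def build_request(image_url: str, prompt: str = "Text Recognition:") -> urllib.request.Request:
    """Build a plain chat-completions request for the local server."""
    payload = {
        "model": "zai-org/GLM-OCR",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": prompt},
            ],
        }],
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# To actually send (requires the container to be up):
# with urllib.request.urlopen(build_request("https://upload.wikimedia.org/wikipedia/commons/9/99/ReceiptSwiss.jpg")) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```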