A Docker image for serving Voxtral-Mini-4B-Realtime via vLLM, with an OpenAI-compatible API.
- `vllm/vllm-openai` as the base image
- Extra dependencies: `soxr`, `librosa`, `soundfile`, `transformers`
- Serves `mistralai/Voxtral-Mini-4B-Realtime-2602` on port `8000`
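Since the container exposes the standard OpenAI-compatible HTTP routes, a quick way to verify the server is up is to query `/v1/models`. A minimal stdlib-only Python sketch, assuming the default `http://localhost:8000` base URL:

```python
import json
import urllib.request


def parse_model_ids(body: str) -> list[str]:
    """Extract the served model IDs from a /v1/models JSON response body."""
    return [m["id"] for m in json.loads(body).get("data", [])]


def list_served_models(base_url: str = "http://localhost:8000") -> list[str]:
    """Query the OpenAI-compatible /v1/models route of the running container."""
    with urllib.request.urlopen(f"{base_url}/v1/models") as resp:
        return parse_model_ids(resp.read().decode())


# Offline example with the response shape an OpenAI-compatible server returns:
sample = '{"object": "list", "data": [{"id": "mistralai/Voxtral-Mini-4B-Realtime-2602"}]}'
print(parse_model_ids(sample))  # → ['mistralai/Voxtral-Mini-4B-Realtime-2602']
```

With the container running, `list_served_models()` should report the Voxtral model.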
Pull and run the latest image:

```sh
docker run -p 8000:8000 ghcr.io/virtuos/vllm-voxtral:latest
```

Or run it with Docker Compose:

```yaml
services:
  voxtral:
    image: ghcr.io/virtuos/vllm-voxtral:latest
    entrypoint:
      - vllm
      - serve
      - mistralai/Voxtral-Mini-4B-Realtime-2602
      - --compilation_config
      - '{"cudagraph_mode": "PIECEWISE"}'
    ports:
      - "8000:8000"
    environment:
      - VLLM_DISABLE_COMPILE_CACHE=1
    volumes:
      - huggingface-cache:/root/.cache/huggingface
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

volumes:
  huggingface-cache:
```

To build the image yourself:

```sh
# with the default (latest) vLLM version
docker build -t voxtral-mini .

# with a specific vLLM version
docker build --build-arg VLLM_VERSION=v0.17.1 -t voxtral-mini .
```

Daily at 6 am UTC, or on manual trigger, the workflow `.github/workflows/build-push-image.yml` checks for a new vLLM release and rebuilds the image if needed.
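The rebuild check amounts to comparing the latest published vLLM release tag against the tag the image was last built from. A hypothetical sketch of that logic (the actual implementation lives in the workflow file), using GitHub's public releases API:

```python
import json
import urllib.request


def latest_vllm_tag() -> str:
    """Fetch vLLM's latest release tag from the GitHub API (network call)."""
    url = "https://api.github.com/repos/vllm-project/vllm/releases/latest"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["tag_name"]


def needs_rebuild(built_tag: str, latest_tag: str) -> bool:
    """Rebuild whenever the published release differs from the last-built tag."""
    return built_tag != latest_tag


# Offline example: a newer release than the one we built triggers a rebuild.
print(needs_rebuild("v0.17.0", "v0.17.1"))  # → True
```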
Requirements:

- Docker with GPU support (`nvidia-container-toolkit`)
License: MIT

Maintained by virtUOS, Osnabrueck University.