Background
previously, vulkan in docker showed how to run Vulkan-based apps in Docker; this post is about how to deploy a GPU-based app in docker swarm. Docker swarm can deploy apps (services) with scalability across nodes.
Docker registry
Docker Registry acts as a local Docker Hub, so the nodes in the LAN can share images.
update docker daemon with insecure-registries
on the worker node, modify /etc/docker/daemon.json to add:
"insecure-registries": ["192.168.0.10:5000"]
then restart the daemon:
systemctl restart docker
start registry service in manager node
docker service create --name registry --publish published=5000,target=5000 registry:2
access the docker registry from both the manager node and the worker node:
$ curl http://192.168.0.10:5000/v2/ #on manager node
$ curl http://192.168.0.10:5000/v2/ #on worker node
an insecure registry is only for testing; production requires a secure connection, check the official doc about deploy a registry server
upload images to this local registry
docker tag stackdemo 192.168.0.10:5000/stackdemo
docker push 192.168.0.10:5000/stackdemo:latest
curl http://192.168.0.10:5000/v2/_catalog
on the worker node, run:
docker pull 192.168.0.10:5000/stackdemo
docker run -p 8000:8000 192.168.0.10:5000/stackdemo
the purpose of the local registry
is to provide a local docker image file server, so images can be shared across the cluster.
Deploy compose
docker-compose build
is used to build the images. docker-compose up
runs the containers, and builds the images first if they don't exist yet. for the lgsvl app, running requires a few parameters, so directly running docker-compose up
reports a no protocol
error.
run vkcube in docker-compose
docker-compose v2 does support runtime: nvidia
, after appending the following to /etc/docker/daemon.json
:
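The original snippet was lost in conversion; a minimal daemon.json sketch that registers the nvidia runtime, assuming nvidia-docker2 (which provides nvidia-container-runtime) is installed:

```json
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
```

restart the daemon afterwards with systemctl restart docker; adding "default-runtime": "nvidia" makes every container use it without specifying runtime explicitly.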
then run vkcube in compose:
xhost +si:localuser:root
docker-compose up
the docker-compose.yml is :
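The original compose file was lost in conversion; a sketch, assuming the image is named vkcube as in the docker service command later in this post (runtime: requires compose file format 2.3+):

```yaml
version: "2.3"
services:
  vkcube:
    image: vkcube
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=0
      - DISPLAY=unix$DISPLAY
    volumes:
      - /tmp/.X11-unix:/tmp/.X11-unix
```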
however, compose v3 currently doesn't support the NVIDIA runtime, which is required by docker stack deploy.
support v3 compose with nvidia runtime
as discussed in the issue support for NVIDIA GPUs under docker compose:
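The quoted snippet was lost in conversion; the workaround discussed in that issue is roughly to advertise each GPU as a generic resource in /etc/docker/daemon.json (the GPU UUID below is a placeholder), then reserve it from the compose file. A sketch:

```json
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia",
  "node-generic-resources": ["NVIDIA-GPU=GPU-xxxxxxxx"]
}
```

```yaml
version: "3.8"
services:
  nvidia-smi-test:
    image: nvidia/cuda
    deploy:
      resources:
        reservations:
          generic_resources:
            - discrete_resource_spec:
                kind: "NVIDIA-GPU"
                value: 1
```

note that generic_resources is only honored by docker stack deploy, not by standalone docker-compose, which likely explains the error below.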
after updating daemon.json with node-generic-resources
, the official sample of compose resource reservations can be tried, but so far it only reports an error:
ERROR: The Compose file './docker-compose.yml' is invalid because:
services.nvidia-smi-test.deploy.resources.reservations value Additional properties are not allowed ('generic_resources' was unexpected)
deploy compose_V3 to swarm
docker compose v3 has two run modes. if triggered by docker-compose up
, it runs in standalone mode and all services in the stack are hosted on the current node; if triggered through docker stack deploy
and the current node is a manager of the swarm cluster, the services are scheduled across the swarm. btw, docker compose v2
only supports standalone mode.
take an example from the official doc: deploy a stack to swarm:
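The snippet was lost in conversion; a sketch following that tutorial (a Flask web service plus redis), with the local registry address from earlier in this post substituted:

```yaml
version: "3"
services:
  web:
    image: 192.168.0.10:5000/stackdemo
    build: .
    ports:
      - "8000:8000"
  redis:
    image: redis:alpine
```

then push the image to the local registry and deploy on the manager node:

```shell
docker-compose up -d       # verify locally first
docker-compose push        # push the web image to 192.168.0.10:5000
docker stack deploy --compose-file docker-compose.yml stackdemo
```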
after deploying stackdemo
to the swarm, check on both the manager node and the worker node:
curl http://192.168.0.13:8000
curl http://192.168.0.10:8000
docker service runtime
docker run
can select the NVIDIA runtime and pass environment variables through -e
in the CLI or an env-file
, but docker service
has no runtime option. docker compose v3
gives the possibility to configure the runtime and deploy the service to clusters, but so far v3 compose doesn't support runtime: nvidia
, so it doesn't help.
I tried to run vkcube, lgsvl with docker service:
docker service create --name vkcc --env NVIDIA_VISIBLE_DEVICES=0 --env DISPLAY=unix:$DISPLAY --mount type=bind,src=/tmp/.X11-unix,dst=/tmp/.X11-unix vkcube
docker service create --name lgsvl -p 8080:8080 --env NVIDIA_VISIBLE_DEVICES=0 --env DISPLAY=unix:$DISPLAY --mount type=bind,src=/tmp/.X11-unix,dst=/tmp/.X11-unix lgsvl
for vkcube
, the service converged, but with no GUI display; for lgsvl
, the service failed.
Docker deploy
docker deploy is used to deploy a complete application stack to the swarm; it accepts the stack definition as a compose file. docker deploy is experimental and can be enabled in /etc/docker/daemon.json
, check to enable experimental features
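A minimal daemon.json fragment enabling experimental features (restart the daemon afterwards):

```json
{
  "experimental": true
}
```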
a sample docker-compose.yml
can be found on jianshu.
a few commands to look into swarm services:
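The original list was lost in conversion; a sketch of common inspection commands, using the stack and service names (stackdemo, vkcc) from earlier in this post:

```shell
docker stack ls                        # stacks deployed on this swarm
docker stack services stackdemo        # services belonging to a stack
docker service ls                      # all services in the swarm
docker service ps vkcc                 # tasks of a service and their node placement
docker service inspect --pretty vkcc   # service configuration
docker service logs vkcc               # aggregated logs from all tasks
docker node ls                         # nodes in the swarm
```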
summary
at this moment, it’s not possible to use a v3 compose.yml with runtime: nvidia
, so using a v3 compose.yml to deploy a GPU-based service in swarm is blocked. the native swarm way may be the right solution.
refer
https configure for docker registry in LAN
alex: deploy compose(v3) to swarm
swarm mode with docker service
inspect a service on the swarm
enable compose for nvidia-docker
compose issue: support for NVIDIA GPUs under docker compose