deploy lgsvl in docker swarm

Background

previously, vulkan in docker showed how to run Vulkan-based apps in Docker; this post is about how to deploy a GPU-based app in Docker swarm. Docker swarm has the ability to deploy apps (services) at scale.

Docker registry

Docker Registry acts as a local Docker Hub, so the nodes in the LAN can share images.

update docker daemon with insecure-registries

  • modify /etc/docker/daemon.json on the worker node:

    "insecure-registries": ["192.168.0.10:5000"] 
    
  • systemctl restart docker

  • start registry service in manager node

    docker service create --name registry --publish published=5000,target=5000 registry:2

access the docker registry on both the manager node and the worker node:

$ curl http://192.168.0.10:5000/v2/   #on manager node 
$ curl http://192.168.0.10:5000/v2/   #on worker node 

an insecure registry is only for testing; in production it has to use a secure connection, check the official doc about deploying a registry server
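for reference, a minimal complete /etc/docker/daemon.json carrying the setting above could look like this (the registry address is the example manager IP used in this post):

```json
{
    "insecure-registries": ["192.168.0.10:5000"]
}
```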

upload images to this local registry hub

docker tag  stackdemo  192.168.0.10:5000/stackdemo
docker push  192.168.0.10:5000/stackdemo:latest
curl  http://192.168.0.10:5000/v2/_catalog

on the worker node, run:

docker pull 192.168.0.10:5000/stackdemo 
docker run -p 8000:8000  192.168.0.10:5000/stackdemo  

the purpose of the local registry is to act as a local docker image file server, shared across the cluster.

Deploy compose

docker-compose build

docker-compose build is used to build the images. docker-compose up runs the images; if they do not exist, it builds them first. the lgsvl app needs a few runtime parameters, so directly running docker-compose up reports a no protocol error.
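for illustration, a minimal docker-compose.yml where docker-compose up can build the image first might look like this (names and ports follow the stackdemo example used later in this post; treat it as a sketch, not the exact lgsvl file):

```yaml
version: '3'
services:
  stackdemo:
    build: .                            # build from the Dockerfile in this directory
    image: 192.168.0.10:5000/stackdemo  # tag so `docker-compose push` targets the local registry
    ports:
      - "8000:8000"
```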

run vkcube in docker-compose

docker-compose v2 does support runtime: nvidia, after appending the following to /etc/docker/daemon.json:

"runtimes": {
    "nvidia": {
        "path": "nvidia-container-runtime",
        "runtimeArgs": []
    }
}

then run vkcube in compose:

xhost +si:localuser:root
docker-compose up

the docker-compose.yml is:

version: '2.3'
services:
  vkcube-test:
    runtime: nvidia
    volumes:
      - /tmp/.X11-unix:/tmp/.X11-unix
    environment:
      - NVIDIA_VISIBLE_DEVICES=0
      - DISPLAY
    # image: nvidia/cuda:9.0-base
    image: vkcube
    # build: .

however, compose v3, which is required by docker stack deploy, currently doesn't support the NVIDIA runtime.

support v3 compose with nvidia runtime

as discussed in the issue support for NVIDIA GPUs under docker compose:

services:
  my_app:
    deploy:
      resources:
        reservations:
          generic_resources:
            - discrete_resource_spec:
                kind: 'gpu'
                value: 2

update daemon.json with node-generic-resources; an official sample of compose resources can be reviewed. but so far, it only reports an error:

ERROR: The Compose file './docker-compose.yml' is invalid because:
services.nvidia-smi-test.deploy.resources.reservations value Additional properties are not allowed ('generic_resources' was unexpected)
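for reference, node-generic-resources is a daemon-level setting; a sketch of the /etc/docker/daemon.json fragment (the GPU UUID here is a placeholder, the real one comes from nvidia-smi -a):

```json
{
    "node-generic-resources": [
        "NVIDIA-GPU=GPU-xxxxxxxx"
    ]
}
```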

deploy compose v3 to swarm

docker compose v3 has two run options: if triggered by docker-compose up, it runs in standalone mode, where all services in the stack are hosted on the current node; if triggered through docker stack deploy and the current node is a manager of the swarm cluster, the services are hosted across the swarm. btw, docker compose v2 only supports standalone mode.

take an example from the official doc: deploy a stack to swarm:

docker service create --name registry --publish published=5000,target=5000 registry:2
docker-compose up -d
docker-compose ps
docker-compose down --volumes
docker-compose push   # push to local registry
docker stack deploy -c docker-compose.yml stackdemo
docker stack services stackdemo
docker stack rm stackdemo
docker service rm registry

after deploying stackdemo in the swarm, check on both the manager node and the worker node:

curl http://192.168.0.13:8000
curl http://192.168.0.10:8000

docker service runtime

docker run can select the runtime through --runtime in the CLI and pass env variables through -e or an env file, but docker service has no runtime option. docker compose v3 gives the possibility to configure the service and deploy it to clusters, but so far v3 compose doesn't support runtime: nvidia, so it's not helpful.

I tried to run vkcube and lgsvl with docker service:

docker service create --name vkcc --env NVIDIA_VISIBLE_DEVICES=0 --env DISPLAY=unix:$DISPLAY --mount src="/.X11-unix",dst="/tmp/.X11-unix"  vkcube
docker service create --name lgsvl  -p 8080:8080 --env NVIDIA_VISIBLE_DEVICES=0 --env DISPLAY=unix$DISPLAY --mount src="X11-unix",dst="/tmp/.X11-unix"  lgsvl

for vkcube, the service converged, but there was no GUI display; for lgsvl, the service failed.

Docker deploy

docker deploy is used to deploy a complete application stack to the swarm; it accepts the stack definition in a compose file. docker deploy is experimental and can be enabled in /etc/docker/daemon.json, check how to enable experimental features
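enabling experimental features in the daemon is a one-line setting in /etc/docker/daemon.json (restart the daemon afterwards):

```json
{
    "experimental": true
}
```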

a sample docker-compose.yml from jianshu:

version: "3"
services:
  nginx:
    image: nginx:alpine
    ports:
      - 80:80
    deploy:
      mode: replicated
      replicas: 4
  visualizer:
    image: dockersamples/visualizer
    ports:
      - "9001:8080"
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock"
    deploy:
      replicas: 1
      placement:
        constraints: [node.role == manager]
  portainer:
    image: portainer/portainer
    ports:
      - "9000:9000"
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock"
    deploy:
      replicas: 1
      placement:
        constraints: [node.role == manager]

a few commands to look into swarm services:

docker stack deploy -c docker-compose.yml stack-demo
docker stack services stack-demo
docker service inspect --pretty stack-demo # inspect service in the swarm
docker service ps <service-id> # check which nodes are running the service
docker ps #on the special node where the task is running, to see details about the container

summary

at this moment, it's not possible for a v3 compose.yml to support runtime: nvidia, so using a v3 compose.yml to deploy a GPU-based service in swarm is blocked. the native swarm way (generic resources) may be the right solution.

references

run as an insecure registry

https configure for docker registry in LAN

a docker proxy for your LAN

alex: deploy compose(v3) to swarm

monitor docker swarm

docker swarm visualizer

swarm mode with docker service

inspect a service on the swarm

voting example

enable compose for nvidia-docker

nvidia-docker-compose

compose issue: to support nvidia under Docker compose

potential solution for composev3 with runtime

swarmkit: generic_resources

Docker ARG, ENV, .env – a complete guide