background
previously I tried to run Vulkan in a virtual display, which failed; as I understand it, the virtual display configuration doesn't fit well with Vulkan. so this solution uses direct display, where each node has a monitor plugged in (which is called a PC cluster). for future cloud support, the current solution won't work. earlier, I also tried to deploy lgsvl in docker swarm, which so far can work with Vulkan as well, after understanding X11 a little bit.
a few demo tests can run as follows:
deploy glxgears/OpenGL in PC cluster
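a minimal sketch of what that looks like, assuming a hypothetical image named glxgears-image with glxgears installed, and an Xserver running on DISPLAY=:0 on the target node:

# share the node's X11 socket with the container and point the app at the local display
docker service create --name glxgears \
  --env DISPLAY=:0 \
  --mount type=bind,src=/tmp/.X11-unix,dst=/tmp/.X11-unix \
  glxgears-image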
deploy vkcube/Vulkan in PC cluster
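the Vulkan case looks almost the same; vkcube-image is again a hypothetical image with the Vulkan loader and vkcube inside, and GPU access on each node (e.g. the nvidia runtime set as the default runtime) is assumed to be configured already:

docker service create --name vkcube \
  --env DISPLAY=:0 \
  --mount type=bind,src=/tmp/.X11-unix,dst=/tmp/.X11-unix \
  vkcube-image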
deploy service with “node.role==worker”
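pinning the service to worker nodes is a standard placement constraint:

docker service create --name vkcube-worker \
  --constraint 'node.role==worker' \
  --env DISPLAY=:0 \
  --mount type=bind,src=/tmp/.X11-unix,dst=/tmp/.X11-unix \
  vkcube-image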
deploy service in whole swarm
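to cover the whole swarm, drop the constraint and ask for one task per node; with a two-node swarm, --replicas 2 spreads the tasks onto both nodes:

docker service create --name vkcube-all \
  --replicas 2 \
  --env DISPLAY=:0 \
  --mount type=bind,src=/tmp/.X11-unix,dst=/tmp/.X11-unix \
  vkcube-image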
which deploys the vkcube service on both the manager and the worker node:
overall progress: 2 out of 2 tasks
1/2: running [==================================================>]
2/2: running [==================================================>]
verify: Service converged
understand .Xauthority
as docker service can run with --mount arguments, the first try was to copy .Xauthority to the manager node; but .X11-unix is not copyable, since it is not a normal file but a socket.
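so the workaround is to bind-mount rather than copy: the cookie file as a regular file, and the socket directory as a mount. a sketch, assuming both the host session and the container run as root so the cookie lives under /root:

docker service create --name vkcube-auth \
  --env DISPLAY=:0 \
  --mount type=bind,src=/root/.Xauthority,dst=/root/.Xauthority \
  --mount type=bind,src=/tmp/.X11-unix,dst=/tmp/.X11-unix \
  vkcube-image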
in docker service create, when creating an OpenGL/Vulkan service on one remote worker node with $DISPLAY=:0, the display happens at the remote worker node. so in this way, the remote worker node plays the Xserver role; and since the vulkan service actually runs on the remote worker node, the remote worker node is the Xclient too?
assuming the lower-level implementation of docker swarm service is based on ssh, then when the manager node starts the service, it builds an ssh tunnel to the remote worker node with the $DISPLAY variable as null; even if docker swarm could start the ssh tunnel with -X, that would by default use the manager node's $DISPLAY=localhost:10.0.
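this DISPLAY behavior is easy to observe with plain ssh, outside of swarm; worker here is a hypothetical ssh host name:

ssh -X worker 'echo $DISPLAY'
# with X11 forwarding enabled on the server side, this prints a proxy
# display such as localhost:10.0 that tunnels back to the caller's Xserver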
the Xauthority cookie is used to grant access to the Xserver, so first make sure which machine is the Xserver; the Xauthority should then live on that Xserver host machine. a few tests:
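the cookies can be listed with xauth; the output below is illustrative, not the real cookie values:

xauth list
# ubuntu/unix:0   MIT-MAGIC-COOKIE-1  <hex cookie for the local display>
# ubuntu/unix:10  MIT-MAGIC-COOKIE-1  <hex cookie for the forwarded display>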
so which cookie is for this $DISPLAY? it should be the one on ubuntu/unix:10.
then, for the service running on the worker with $DISPLAY=:0, it uses the cookie: worker/unix:0.
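if a cookie ever needs to move between machines, xauth can extract and merge it; a sketch, assuming ssh access from the manager to a host named worker:

# copy the cookie for display :0 on this machine into the worker's .Xauthority
xauth extract - :0 | ssh worker 'xauth merge -'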
deploy lgsvl service in swarm
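the service was created roughly like this; lgsvl-image is a placeholder for the simulator image, and 8080 is assumed as the webUI port. note it both publishes a port (which lands in the ingress network by default) and asks for host network:

docker service create --name lgsvl \
  --replicas 2 \
  --network host \
  --publish published=8080,target=8080 \
  --env DISPLAY=:0 \
  --mount type=bind,src=/tmp/.X11-unix,dst=/tmp/.X11-unix \
  lgsvl-image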
which gives:
overall progress: 0 out of 2 tasks
1/2: container cannot be disconnected from host network or connected to host network
2/2: container cannot be disconnected from host network or connected to host network
basically, the service is deployed in the ingress network by default, but the service is also configured with host network, so they conflict.
swarm network
the routing mesh is the default internal load balancer in the swarm network; the other choice is to deploy the service directly on the node, namely bypassing the routing mesh, which requires the service to run in global mode with the published port set as mode=host; this should be the same as --network host in replicas mode.
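a sketch of the bypass, combining global mode with host-mode port publishing (same placeholder image as above):

docker service create --name lgsvl \
  --mode global \
  --publish published=8080,target=8080,mode=host \
  --env DISPLAY=:0 \
  --mount type=bind,src=/tmp/.X11-unix,dst=/tmp/.X11-unix \
  lgsvl-image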
the limitation of bypassing the routing mesh is that there is only one task per node, and the published port can only be reached through that specific node, which doesn't make sense in a cloud env.
tips: --mount src="X11-unix",dst="/tmp/.X11-unix" is kind of cached. so once the docker image has run on the worker node, it doesn't need this parameter again; but once the worker node restarts, it needs this parameter again.
in summary about swarm network, the routing mesh should be the right solution for cloud deployment. so how to get rid of --network host?
the reason for host network is that the lgsvl server and the webUI must work on the same host; if not, there are cross-domain security failures, which actually is another topic, namely how to host lgsvl and the webUI/React on different hosts.
need some study of the webUI in the next post.