redis task queue (2)

background

currently, we add the job_queue list inside the Dockerfile by COPYing the job_queue folder from the local host into the docker image, which is not dynamic and can’t support additional scenarios.

to design a redisJQ service that can be used in a swarm/k8s environment, we need to consider DNS, to make the service discoverable, and a shared volume, to share the job-queue data with other services.

ceph rbd driver for docker

ceph can store files in three ways:

  • rbd, block storage, which is usually used with virtualization (e.g. kvm)

  • object storage, through the radosgw api, or accessed via boto3 APIs

  • cephfs, which mounts ceph as a file system

the first idea is to move from a local-host mount to a remote-volume mount (e.g. ceph storage). there are a few popular rbd-driver plugins.

check ceph rbd driver to understand more details.

to support the rbd-driver plugin in docker, the ceph server also needs to support the block device driver, which is sometimes not available, as most small ceph teams support only one storage type or another, either object storage or block storage. that’s our situation, so we can’t go with the rbd-driver plugin.

another way is the docker volume cephfs plugin; for a similar reason, our ceph team doesn’t support cephfs.

ceph object storage access

as the ceph team does support the boto3 API to access ceph, that gives us the one and only way to access scenarios: boto3.

basically, redis_JQ first downloads all scenario files from remote ceph through boto3 APIs, then scans the downloaded files into the job queue, and finally feeds them to the python executors at the front.
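as a minimal sketch of what the boto3 side looks like; the radosgw endpoint, credentials and bucket/prefix names below are placeholders, not the project’s real values:

import boto3

# placeholders: point the S3 client at the ceph radosgw endpoint instead of AWS
s3_client = boto3.client(
    "s3",
    endpoint_url="http://ceph-radosgw.example.com:7480",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# list the scenario objects under a "folder" (prefix) in the bucket
resp = s3_client.list_objects_v2(Bucket="scenario-bucket", Prefix="scenarios/")
for obj in resp.get("Contents", []):
    print(obj["Key"])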

s3 client

curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
/usr/local/bin/aws --version

access files in folders in s3 bucket

def download_files(self, bucket_name, folder_name):
    files_with_prefix = self.s3_client.list_objects_v2(Bucket=bucket_name, Prefix=folder_name)
    scenario_basename = "/pythonAPI/job_queue/scenario"
    i = 0
    for file_ in files_with_prefix["Contents"]:
        scenario_name = scenario_basename + "%d" % i + ".py"
        print(scenario_name)
        self.download_file(bucket_name, file_['Key'], scenario_name, False)
        time.sleep(0.01)
        i += 1
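the self.download_file helper called above is not shown here; a minimal sketch of what it could look like, assuming the class keeps a boto3 client in self.s3_client and the last argument is an overwrite flag (both are assumptions, not the project’s actual code):

import os

def download_file(self, bucket_name, key, local_path, overwrite=False):
    # hypothetical helper: skip the download if the file already exists and overwrite is False
    if os.path.exists(local_path) and not overwrite:
        return
    # boto3's download_file fetches a single object from the bucket to a local path
    self.s3_client.download_file(bucket_name, key, local_path)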

manage python modules

during the project, we really need to take care of python packages installed by apt-get, pip and conda; if not, there will be conflicts among different versions of modules:

import websockets
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/websockets/compatibility.py", line 8
    asyncio_ensure_future = asyncio.async  # Python < 3.5
SyntaxError: invalid syntax

so it’s better to use conda or a python virtual-env to separate the different running environments, and installing packages with conda install is a better choice than the global apt-get install.
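a quick way to check which copy of a package is actually being imported (the conda env copy vs the system-wide apt-get one), as a small sketch:

import sys
import websockets

# a conda env shows an interpreter path like .../miniconda3/envs/<name>/bin/python
print(sys.executable)

# if this prints /usr/lib/python3/dist-packages/..., the apt-get copy is shadowing
# the conda/pip installed one
print(websockets.__file__)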

basics of python import

  • module, any *.py file; its name is the file name

  • package, any folder containing a file named __init__.py in it; its name is the name of the folder.

When a module named module1 is imported, the interpreter first searches for a built-in module with that name. If not found, it then searches for a file named module1.py or a folder named module1 in a list of directories given by the variable sys.path.

sys.path is initialized from 3 locations:

  • the directory containing the input script, or the current directory

  • PYTHONPATH

  • the installation-dependent default

if using export PYTHONPATH directly in the shell, it works, but once defined in ~/.bashrc it doesn’t actually take effect in a conda env.
it’s simpler to add the root directory of the project to the PYTHONPATH environment variable, run all the scripts from that directory’s level, and change the import statements accordingly. import searches for your packages in the specific places listed in sys.path, and the directory containing the running script is always added to this list.
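a small sketch of inspecting and extending the search path at runtime; /pythonAPI is used as a placeholder for the project root:

import sys

# show where the interpreter looks for modules and packages
for p in sys.path:
    print(p)

# placeholder project root; adding it here has the same effect as putting it on PYTHONPATH
project_root = "/pythonAPI"
if project_root not in sys.path:
    sys.path.insert(0, project_root)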

redis service

the common error: redis.exceptions.ConnectionError: Error 111 connecting to 10.20.181.132:6379. Connection refused., which basically means the system can’t connect to the redis server, because by default redis only allows access from localhost. so we need to configure redis to accept connections from non-localhost IPs.

  • check redis-server running status
ps aux | grep redis-server
netstat -tunple | grep 6379
redis-cli info
  • shutdown redis-server
sudo kill -9 $pid

redis-server & redis-cli

redis-server starts the redis server with a default config file at /etc/redis/redis.conf

a few items in the config file need attention:

the default setting is bind 127.0.0.1, which means the redis db can only be accessed through localhost. for our case, to allow the host IP (10.20.181.132), or even any IP, to access it, we need:

bind 0.0.0.0
  • redis log, by default at /var/log/redis/redis-server.log
  • requirepass, consider setting this for security

  • login client with hostIP

redis-cli -h 10.20.181.132

log into redis-cli first, then run the following:

LPUSH your_list_name item1
LPUSH your_list_name item2
LLEN your_list_name
EXISTS your_list_name
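the same operations from python, using the redis-py package; the host IP and list name follow the examples above, and the blocking pop at the end is only a sketch of how a worker could consume jobs, not the project’s actual worker code:

import redis

# connect to the redis server on the host IP used above
r = redis.Redis(host="10.20.181.132", port=6379, db=0)

# producer side: push items into the list
r.lpush("your_list_name", "item1")
r.lpush("your_list_name", "item2")
print(r.llen("your_list_name"))    # length of the list
print(r.exists("your_list_name"))  # 1 if the list exists

# worker side: block until an item is available, then pop it
_, item = r.brpop("your_list_name")
print(item.decode())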

redis service in docker

the following is an example based on create a redis service

  • connect to the redis container directly
docker run -it redis-image /usr/bin/redis-server /etc/redis/myconfig.conf

in this way, the redis service will use its docker VIP, which can be checked with:

docker ps
docker inspect <container_id>

which will give something like:

"bridge": {
    "Gateway": "172.17.0.1",
    "IPAddress": "172.17.0.2",

then the redis server can be connected to with:

redis-cli -h 172.17.0.2
  • connect through the host os
docker run -it -p 6379 redis-image /usr/bin/redis-server /etc/redis/myconfig.conf

the redis container has exposed 6379, which may be mapped to a different port on the host os; check:

docker ps
docker port <container_id> 6379 # gives the <external_port> on host
redis-cli -h 10.20.181.132 -p <external_port>
  • run redis service with host network
docker run -it --network=host redis-image /usr/bin/redis-server /etc/redis/myconfig.conf

in this way, there is no bridge network or docker VIP; the host IP and port are used directly, so the following works:

redis-cli -h 10.20.181.132 -p 6379

a good way now is to map the host redis_port to the container redis_port, and use the second way to access redis.

docker run -it -p 6379:6379 redisjq /bin/bash

tip: confirm that port 6379 on the host machine is free.
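a quick connectivity check from python once the port mapping is in place; the host IP follows the examples above:

import redis

try:
    r = redis.Redis(host="10.20.181.132", port=6379, socket_connect_timeout=2)
    r.ping()
    print("redis is reachable through the mapped host port")
except redis.exceptions.ConnectionError as err:
    # the same Error 111 / connection refused seen earlier appears if the mapping or bind is wrong
    print("cannot reach redis:", err)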

share volumes among multiple containers

the problem is that the redisjq service downloads all scenario scripts into its own docker container, and only stores the scenario names in the redis db. when a redis_worker accesses the db, there are no real python scripts there. so we need to share this job_queue with all redis_workers.

mount volume

docker run -it -p 6379:6379 --mount source=jq-vol,target=/job_queue redisjq /bin/bash

start pythonapi to access the shared volume

docker run -it --mount source=jq-vol,target=/pythonAPI/job_queue redispythonapi /bin/bash
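with jq-vol mounted in both containers, a redis_worker can pop a scenario name from the db and read the actual script from the shared mount; a rough sketch (the queue name, paths and the hand-off to the executor are assumptions, not the project’s actual code):

import os
import redis

r = redis.Redis(host="10.20.181.132", port=6379)

# block until redisjq pushes a scenario name, e.g. "scenario0.py"
_, name = r.brpop("your_list_name")
script_path = os.path.join("/pythonAPI/job_queue", name.decode())

# the file is visible here because the same jq-vol volume is mounted in this container
with open(script_path) as f:
    source = f.read()
# hand script_path / source to the python executor (execution details omitted)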

refer

qemu/kvm & ceph: rbd driver in qemu

distributed storage for a Docker cluster based on Ceph RBD

rexray/rbd reference

access cephFS inside docker container without mounting cephFS in host

how to use folders in s3 bucket

the definitive guide to python import statements