ceph & boto3 introduction

ceph intro

A Ceph Storage Cluster requires at least one Ceph Monitor, one Ceph Manager, and one Ceph OSD (Object Storage Daemon).

ceph storage cluster

The Ceph File System, Ceph Object Storage and Ceph Block Devices read data from and write data to the Ceph Storage Cluster.

cluster operations

data placement

  • pools, which are logical groups for storing objects

  • Placement Groups (PGs), which are fragments of a logical object pool

  • CRUSH maps, which provide the physical topology of the cluster to the CRUSH algorithm, so it can determine both where the data for an object and its replicas should be stored and how to place them across failure domains for added data safety (see the toy sketch after this list)

  • Balancer, a feature that will automatically optimize the distribution of PGs across devices to achieve a balanced data distribution
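
To make the PG idea concrete, here is a toy sketch of how an object name maps deterministically to a PG within a pool. This is conceptual only: real Ceph uses the rjenkins hash (not crc32), and CRUSH then maps the PG onto OSDs.

import zlib

def toy_pg_for_object(object_name, pool_id, pg_num):
    # Conceptual sketch only: Ceph hashes the object name to choose a PG
    # in the pool; CRUSH then maps that PG onto OSDs across failure domains.
    pg_seed = zlib.crc32(object_name.encode()) % pg_num
    return "{}.{:x}".format(pool_id, pg_seed)

print(toy_pg_for_object("hw", pool_id=1, pg_num=128))  # e.g. '1.2a'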

Librados APIs workflow

The librados APIs can interact with both Ceph Monitors and OSDs.

configuring a cluster handle

  • the client app must invoke librados and connect to a Ceph Monitor
  • librados retrieves the cluster map
  • when the client app wants to read or write data, it creates an I/O context and binds it to a pool
  • with the I/O context, the client provides the object name to librados, which uses it to locate the data
  • then the client application can read or write data

Thus, the first steps in using the cluster from your app are to 1) create a cluster handle that your app will use to connect to the storage cluster, and then 2) use that handle to connect. To connect to the cluster, the app must supply a monitor address, a username, and an authentication key (cephx authentication is enabled by default).

An easy way is to provide them in the Ceph configuration file:

[global]
mon host = 192.168.0.1
keyring = /etc/ceph/ceph.client.admin.keyring
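
Alternatively, the same settings can be passed programmatically when creating the cluster handle; a minimal sketch, where the monitor address and keyring path are placeholders:

import rados

# Sketch: pass monitor address and keyring directly instead of a conf file.
# The mon_host value and keyring path below are placeholders.
cluster = rados.Rados(
    name='client.admin',
    conffile='',                 # do not read a configuration file
    conf={
        'mon_host': '192.168.0.1',
        'keyring': '/etc/ceph/ceph.client.admin.keyring',
    },
)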


import rados

try:
    cluster = rados.Rados(conffile='')
except TypeError as e:
    print('Argument validation error: ', e)
    raise e

print("Created cluster handle.")

try:
    cluster.connect()
except Exception as e:
    print("connection error: ", e)
    raise e
else:
    print("Connected to the cluster.")

The Python API uses admin as the default id, ceph as the default cluster name, and ceph.conf as the default conffile value.
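
Spelled out explicitly, those defaults correspond to a call like this (a sketch; the keyword names follow the python-rados binding):

cluster = rados.Rados(conffile='ceph.conf', rados_id='admin', clustername='ceph')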

creating an I/O context

RADOS enables you to interact both synchronously and asynchronously. Once your app has an I/O Context, read/write operations only require you to know the object/xattr name.

print "\n\nI/O Context and Object Operations"
print "================================="
print "\nCreating a context for the 'data' pool"
if not cluster.pool_exists('data'):
raise RuntimeError('No data pool exists')
ioctx = cluster.open_ioctx('data')
print "\nWriting object 'hw' with contents 'Hello World!' to pool 'data'."
ioctx.write("hw", "Hello World!")
print "Writing XATTR 'lang' with value 'en_US' to object 'hw'"
ioctx.set_xattr("hw", "lang", "en_US")
print "\nWriting object 'bm' with contents 'Bonjour tout le monde!' to pool 'data'."
ioctx.write("bm", "Bonjour tout le monde!")
print "Writing XATTR 'lang' with value 'fr_FR' to object 'bm'"
ioctx.set_xattr("bm", "lang", "fr_FR")
print "\nContents of object 'hw'\n------------------------"
print ioctx.read("hw")
print "\n\nGetting XATTR 'lang' from object 'hw'"
print ioctx.get_xattr("hw", "lang")
print "\nContents of object 'bm'\n------------------------"
print ioctx.read("bm")
print "Getting XATTR 'lang' from object 'bm'"
print ioctx.get_xattr("bm", "lang")
print "\nRemoving object 'hw'"
ioctx.remove_object("hw")
print "Removing object 'bm'"
ioctx.remove_object("bm")

closing sessions

print "\nClosing the connection."
ioctx.close()
print "Shutting down the handle."
cluster.shutdown()

librados in Python

data level operations

  • configure a cluster handle

To connect to the Ceph Storage Cluster, your application needs to know where to find the Ceph Monitor. Provide this information to your application by specifying the path to your Ceph configuration file, which contains the location of the initial Ceph monitors.

import rados, sys
cluster = rados.Rados(conffile='ceph.conf')
  • connect to the cluster
cluster.connect()
print("\nCluster ID: {}".format(cluster.get_fsid()))
print("\n\nCluster Statistics")
print("==================")
cluster_stats = cluster.get_cluster_stats()
for key, value in cluster_stats.items():
    print(key, value)
  • manage pools
cluster.create_pool('test')
pools = cluster.list_pools()
for pool in pools:
    print(pool)
cluster.delete_pool('test')
  • I/O context

Reading from or writing to the Ceph Storage Cluster requires an I/O context (ioctx).

ioctx = cluster.open_ioctx($ioctx_name)
# or, open by pool ID:
ioctx = cluster.open_ioctx2($pool_id)

where $ioctx_name is the name of the pool and $pool_id is the ID of the pool.

  • read, write, remove objects
ioctx.write_full("hw", b"hello")
ioctx.read("hw")
ioctx.remove_object("hw")
  • with extended attributes (xattrs)
ioctx.set_xattr("hw", "lang" "en_US")
ioctx.get_xattr("hw", "lang")
```
* list objs
```python
obj_itr = ioctx.list_objects()
while True:
try:
rados_obj = obj_itr.next()
print(rados_obj.read())

RADOS S3 api

Ceph supports a RESTful API that is compatible with the basic data access model of the Amazon S3 API.

PUT /{bucket}/{object} HTTP/1.1
DELETE /{bucket}/{object} HTTP/1.1
GET /{bucket}/{object} HTTP/1.1
HEAD /{bucket}/{object} HTTP/1.1    (returns object metadata only)
GET /{bucket}/{object}?acl HTTP/1.1
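
Because the API is S3-compatible, a standard S3 client such as boto3 can talk to a Ceph RADOS Gateway simply by overriding the endpoint; a sketch, where the endpoint URL and credentials are placeholders:

import boto3

# Placeholders: point endpoint_url at your RADOS Gateway
# and use the access/secret keys issued by RGW
s3 = boto3.client(
    's3',
    endpoint_url='http://rgw.example.com:7480',
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
)
print(s3.list_buckets())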

Amazon S3

Simple Storage Service (S3) is a file/object storage system used to store and share files across the Internet. It can store any type of object, addressed by a simple key-value model. Elastic Compute Cloud (EC2) is Amazon's computing service. There are three classes in S3: service, bucket, and object.

There are two ways to access S3: through an SDK (boto), or through the raw RESTful API (GET, PUT). The following uses the SDK.

create a bucket

A bucket is a storage location that holds files (objects); it supports operations such as Put() and Get(), and can return all the objects it contains.

import logging
import boto3
from botocore.exceptions import ClientError

def create_bucket(bucket_name, region=None):
    try:
        if region is None:
            s3_client = boto3.client('s3')
            s3_client.create_bucket(Bucket=bucket_name)
        else:
            s3_client = boto3.client('s3', region_name=region)
            location = {'LocationConstraint': region}
            s3_client.create_bucket(Bucket=bucket_name,
                                    CreateBucketConfiguration=location)
    except ClientError as e:
        logging.error(e)
        return False
    return True

# List all buckets owned by the credentials in use
s3_client = boto3.client('s3')
response = s3_client.list_buckets()
for bucket in response['Buckets']:
    print(bucket)

upload files

The basic task is uploading a file to an S3 bucket:

def upload_file(file_name, bucket, object_name=None):
    # If no S3 object name was given, use the local file name
    if object_name is None:
        object_name = file_name

    # Upload the file
    s3_client = boto3.client('s3')
    try:
        response = s3_client.upload_file(file_name, bucket, object_name)
    except ClientError as e:
        logging.error(e)
        return False
    return True

upload_file() handles large files by splitting them into smaller chunks and uploading each chunk in parallel, the same idea Ceph uses to hide the lower-level details of data splitting and transfer.
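
The chunking behaviour can be tuned through a TransferConfig; a sketch with arbitrary example values:

from boto3.s3.transfer import TransferConfig

# Files above multipart_threshold are split into multipart_chunksize parts
# and uploaded by up to max_concurrency threads (values are examples only)
config = TransferConfig(multipart_threshold=8 * 1024 * 1024,
                        multipart_chunksize=8 * 1024 * 1024,
                        max_concurrency=4)
s3_client = boto3.client('s3')
s3_client.upload_file('FILE_NAME', 'BUCKET_NAME', 'OBJECT_NAME', Config=config)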

upload_fileobj() accepts a readable file-like object, which should be opened in binary mode, not text mode:

s3 = boto3.client('s3')
with open("FILE_NAME", "rb") as f:
    s3.upload_fileobj(f, "BUCKET_NAME", "OBJECT_NAME")

The Client, Bucket, and Object classes all provide these two methods.
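
For example, the same upload can be issued from a Bucket resource instead of the low-level client:

s3 = boto3.resource('s3')
# Same operation, via the Bucket abstraction
s3.Bucket('BUCKET_NAME').upload_file('FILE_NAME', 'OBJECT_NAME')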

download files

download_file() is the counterpart of the upload methods; it accepts the names of the bucket and object to download, and the filename to save the downloaded file to.

s3 = boto3.client('s3')
s3.download_file('BUCKET_NAME', 'OBJECT_NAME', 'FILE_NAME')

As with uploads, there is a file-object counterpart, download_fileobj(), which writes into a file-like object opened in binary write mode.
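
A minimal sketch of download_fileobj():

s3 = boto3.client('s3')
with open('FILE_NAME', 'wb') as f:
    s3.download_fileobj('BUCKET_NAME', 'OBJECT_NAME', f)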

bucket policy

import json

s3 = boto3.client('s3')
result = s3.get_bucket_policy(Bucket='BUCKET_NAME')       # returns {'Policy': '<json string>'}
bucket_policy = json.dumps(json.loads(result['Policy']))  # policy document as a JSON string
s3.put_bucket_policy(Bucket='BUCKET_NAME', Policy=bucket_policy)
s3.delete_bucket_policy(Bucket='BUCKET_NAME')
control_list = s3.get_bucket_acl(Bucket='BUCKET_NAME')

get objects

obj_list = s3.list_objects(Bucket='bucket_name')
obj_cont = s3.get_object(Bucket='bucket_name', Key='file_key')

If the requested key does not exist, get_object raises: botocore.errorfactory.NoSuchKey: An error occurred (NoSuchKey) when calling the GetObject operation: Unknown
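
One way to guard against that error is to catch the modeled exception on the client (a sketch):

try:
    obj_cont = s3.get_object(Bucket='bucket_name', Key='file_key')
    print(obj_cont['Body'].read())
except s3.exceptions.NoSuchKey:
    print('object key not found')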

read objects through url (restful api)

A file/object can be fetched directly through its URL path: https://<bucket-name>.s3.amazonaws.com/<key>
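
That plain URL only works for publicly readable objects; for private ones, a presigned URL can be generated instead (a sketch, valid for one hour):

url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'BUCKET_NAME', 'Key': 'OBJECT_NAME'},
    ExpiresIn=3600,  # seconds
)
print(url)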

refer

  • ceph official doc
  • botocore official doc
  • boto3 github
  • amazon S3 example
  • GUI: create an S3 bucket and store objects in AWS
  • boto API access S3 object