ceph intro
A Ceph Storage Cluster requires at least one Ceph Monitor, one Ceph Manager, and one Ceph OSD (Object Storage Daemon).
ceph storage cluster
The Ceph File System, Ceph Object Storage and Ceph Block Devices read data from and write data to the Ceph Storage Cluster.
- pools, which are logical groups for storing objects
- Placement Groups (PGs), which are fragments of a logical object pool
- CRUSH maps, which provide the physical topology of the cluster to the CRUSH algorithm, so it can determine where the data for an object and its replicas should be stored, and how to store it across failure domains for added data safety, among other things
- Balancer, a feature that automatically optimizes the distribution of PGs across devices to achieve a balanced data distribution
Librados APIs workflow
The librados APIs can interact with both the Ceph Monitors and the OSDs.
configure a cluster handle
- the client app invokes librados and connects to a Ceph Monitor
- librados retrieves the cluster map
- when the client app wants to read or write data, it creates an I/O context and binds it to a pool
- with the I/O context, the client provides the object name to librados, which locates the data
- then the client application can read or write data
Thus, the first steps in using the cluster from your app are to 1) create a cluster handle that your app will use to connect to the storage cluster, and then 2) use that handle to connect. To connect to the cluster, the app must supply a monitor address, a username, and an authentication key (cephx is enabled by default).
An easy way is to supply them in the Ceph configuration file:
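For example, a minimal ceph.conf might look like this (the monitor address and keyring path are placeholders, not values from a real cluster):

```ini
[global]
# address of an initial monitor and the keyring holding the client key
mon host = 10.0.0.1
keyring = /etc/ceph/ceph.client.admin.keyring
```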
The Python API uses admin as the default id, ceph as the default cluster name, and ceph.conf as the default conffile value.
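A minimal sketch of creating a handle with those defaults written out explicitly:

```python
import rados

# Equivalent to rados.Rados(conffile='ceph.conf'); the defaults are spelled out here.
cluster = rados.Rados(conffile='ceph.conf', rados_id='admin', clustername='ceph')
```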
creating an I/O context
RADOS enables you to interact both synchronously and asynchronously. Once your app has an I/O Context, read/write operations only require you to know the object/xattr name.
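A sketch of both styles, assuming the cluster handle above is connected and a pool named data exists:

```python
ioctx = cluster.open_ioctx('data')

# Synchronous: the call returns once the operation completes.
ioctx.write_full('hw', b'Hello World!')
print(ioctx.read('hw'))

# Asynchronous: the call returns immediately; the callback fires on completion.
completion = ioctx.aio_write_full('hw', b'Hello again!',
                                  oncomplete=lambda comp: print('write acked'))
completion.wait_for_complete()
```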
closing sessions
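Closing the I/O context and shutting down the cluster handle ends the session:

```python
ioctx.close()        # release the I/O context
cluster.shutdown()   # disconnect from the cluster
```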
librados in Python
data level operations
- configure a cluster handle
To connect to the Ceph Storage Cluster, your application needs to know where to find the Ceph Monitor. Provide this information to your application by specifying the path to your Ceph configuration file, which contains the location of the initial Ceph monitors.
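A sketch, assuming the configuration file sits next to the script:

```python
import rados

# conffile points librados at the monitors listed in ceph.conf
cluster = rados.Rados(conffile='ceph.conf')
```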
- connect to the cluster
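Connecting uses the handle created above:

```python
cluster.connect()
print("Connected; cluster fsid:", cluster.get_fsid())
```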
- manage pools
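A sketch of basic pool management (the pool name test is a placeholder):

```python
cluster.create_pool('test')
print(cluster.pool_exists('test'))   # True
print(cluster.list_pools())          # includes 'test'
cluster.delete_pool('test')
```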
- I/O context
Reading from or writing to the Ceph Storage Cluster requires an I/O context (ioctx).
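An I/O context can be opened by pool name, or by pool ID in newer bindings:

```python
ioctx = cluster.open_ioctx('data')      # by pool name
# ioctx = cluster.open_ioctx2(pool_id)  # by pool ID, where the binding supports it
```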
ioctx_name is the name of the pool; pool_id is the ID of the pool.
- read, write, remove objects
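A sketch of the basic object operations on the ioctx opened above:

```python
ioctx.write('hw', b'Hello')              # write bytes at offset 0
ioctx.write_full('hw', b'Hello World!')  # replace the entire object
print(ioctx.read('hw'))                  # b'Hello World!'
ioctx.remove_object('hw')                # delete the object
```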
- with extended attributes (xattrs)
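Extended attributes attach key/value metadata to an object:

```python
ioctx.write_full('hw', b'Hello World!')  # the object must exist first
ioctx.set_xattr('hw', 'lang', b'en_US')
print(ioctx.get_xattr('hw', 'lang'))     # b'en_US'
```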
RADOS S3 api
Ceph supports a RESTful API that is compatible with the basic data access model of the Amazon S3 API.
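With boto3, pointing the client at a RADOS Gateway endpoint is enough to use the S3-compatible API; the endpoint and credentials below are placeholders:

```python
import boto3

s3 = boto3.client(
    's3',
    endpoint_url='http://rgw.example.com:7480',  # RADOS Gateway endpoint (placeholder)
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
)
```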
Amazon S3
Simple Storage Service (S3) is used as a file/object storage system to store and share files across the Internet. It can store any type of object with a simple key-value model. Elastic Compute Cloud (EC2) is Amazon's computing service. There are three classes in S3: service, bucket, and object.
There are two ways to access S3: through an SDK (boto) or the raw RESTful API (GET, PUT). The following uses the SDK.
create a bucket
A bucket is a storage location that holds files (objects); it supports Put() and Get() operations and can return all objects it contains.
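A sketch using the boto3 client from above (my-bucket is a placeholder name):

```python
s3.create_bucket(Bucket='my-bucket')
print([b['Name'] for b in s3.list_buckets()['Buckets']])
```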
upload files
The basic operation is to upload a file to an S3 bucket.
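For example (file name, bucket, and key are placeholders):

```python
# upload local_file.txt to my-bucket under the key remote_key.txt
s3.upload_file('local_file.txt', 'my-bucket', 'remote_key.txt')
```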
upload_file() handles large files by splitting them into smaller chunks and uploading each chunk in parallel, the same logic Ceph uses to hide the lower-level details of data splitting and transferring.
upload_fileobj() accepts a readable file-like object, which should be opened in binary mode, not text mode:
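For example:

```python
with open('local_file.txt', 'rb') as f:  # binary mode, as required
    s3.upload_fileobj(f, 'my-bucket', 'remote_key.txt')
```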
Client, Bucket, and Object all have these two methods.
download files
download_file() is the counterpart of the upload methods; it accepts the names of the bucket and object to download, and the filename to save the downloaded file to.
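For example, saving the object back to a local file:

```python
s3.download_file('my-bucket', 'remote_key.txt', 'downloaded.txt')
```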
As with upload, there is a file-like-object variant: download_fileobj(), which takes a file-like object opened in binary write mode.
bucket policy
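A sketch that sets and reads back a policy allowing public reads (the policy contents are illustrative):

```python
import json

policy = {
    'Version': '2012-10-17',
    'Statement': [{
        'Sid': 'PublicRead',
        'Effect': 'Allow',
        'Principal': '*',
        'Action': ['s3:GetObject'],
        'Resource': 'arn:aws:s3:::my-bucket/*',
    }],
}
s3.put_bucket_policy(Bucket='my-bucket', Policy=json.dumps(policy))
print(s3.get_bucket_policy(Bucket='my-bucket')['Policy'])
```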
get objects
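For example:

```python
resp = s3.get_object(Bucket='my-bucket', Key='remote_key.txt')
print(resp['Body'].read())
```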
If the key does not exist, the call fails with an error like: botocore.errorfactory.NoSuchKey: An error occurred (NoSuchKey) when calling the GetObject operation: Unknown
read objects through url (restful api)
Get a file/object through its URL path: https://<bucket-name>.s3.amazonaws.com/<key>
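A sketch using requests (the bucket and key are placeholders, and the object must be publicly readable, e.g. via the bucket policy above):

```python
import requests

url = 'https://my-bucket.s3.amazonaws.com/remote_key.txt'
resp = requests.get(url)
print(resp.status_code)
print(resp.content)
```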