OpenShift Container Storage (OCS) from Red Hat deploys Ceph in your OpenShift cluster (or allows you to integrate with an external Ceph cluster). In addition to the file- and block-based volume services provided by Ceph, OCS includes two S3 API-compatible object storage implementations.

The first option is the Ceph Object Gateway (radosgw), Ceph’s native object storage interface. The second option, called the “Multicloud Object Gateway”, is in fact a piece of software named Noobaa, a storage abstraction layer that Red Hat acquired in 2018. In this article I’d like to demonstrate how to take advantage of these storage options.

What is object storage?

The storage we interact with regularly on our local computers is block storage: data is stored as a collection of blocks on some sort of storage device. Additional layers – such as a filesystem driver – are responsible for assembling those blocks into something useful.

Object storage, on the other hand, manages data as objects: a single unit of data and associated metadata (such as access policies). An object is identified by some sort of unique id. Object storage generally provides an API that is largely independent of the physical storage layer; data may live on a variety of devices attached to a variety of systems, and you don’t need to know any of those details in order to access the data.

The best-known example of an object storage service is Amazon’s S3 (“Simple Storage Service”), first introduced in 2006. The S3 API has become a de facto standard for object storage implementations. The two services we’ll be discussing in this article both provide S3-compatible APIs.
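
To make that a bit more concrete, here is a minimal sketch of what talking to an S3-compatible API looks like using Python’s boto3 library. The endpoint, credentials, bucket, and key here are all placeholders; we’ll see where real values come from later in this article.

import boto3

# Any S3-compatible endpoint will do; these values are placeholders.
s3 = boto3.client(
    's3',
    endpoint_url='https://s3.example.com',
    aws_access_key_id='...',
    aws_secret_access_key='...',
)

# Store an object: a blob of data plus some metadata, identified by a key.
s3.put_object(
    Bucket='mybucket',
    Key='greetings/hello.txt',
    Body=b'Hello, object storage!',
    Metadata={'owner': 'oddbit'},
)

# Retrieve it later by the same key, from anywhere that can reach the API.
obj = s3.get_object(Bucket='mybucket', Key='greetings/hello.txt')
print(obj['Body'].read())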

Creating buckets

The fundamental unit of object storage is called a “bucket”.

Creating a bucket with OCS works a bit like creating a persistent volume, although instead of starting with a PersistentVolumeClaim you start with an ObjectBucketClaim ("OBC"). An OBC looks something like this when using RGW:

apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: example-rgw
spec:
  generateBucketName: example-rgw
  storageClassName: ocs-storagecluster-ceph-rgw

Or like this when using Noobaa (note the different value for storageClassName):

apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: example-noobaa
spec:
  generateBucketName: example-noobaa
  storageClassName: openshift-storage.noobaa.io

With OCS 4.5, your out-of-the-box choices for storageClassName are ocs-storagecluster-ceph-rgw if you want Ceph Radosgw, or openshift-storage.noobaa.io if you want the Noobaa S3 endpoint.

Before we continue, I’m going to go ahead and create these resources in my OpenShift environment. To do so, I’m going to use Kustomize to deploy the resources described in the following kustomization.yml file:

namespace: oddbit-ocs-example

resources:
  - obc-noobaa.yml
  - obc-rgw.yml

Running kustomize build | oc apply -f- from the directory containing this file populates the specified namespace with the two ObjectBucketClaims mentioned above:

$ kustomize build | oc apply -f-
objectbucketclaim.objectbucket.io/example-noobaa created
objectbucketclaim.objectbucket.io/example-rgw created

Verifying that things seem healthy:

$ oc get objectbucketclaim
NAME             STORAGE-CLASS                 PHASE   AGE
example-noobaa   openshift-storage.noobaa.io   Bound   2m59s
example-rgw      ocs-storagecluster-ceph-rgw   Bound   2m59s

Each ObjectBucketClaim will result in OpenShift creating a new ObjectBucket resource (which, like PersistentVolume resources, is not namespaced). The ObjectBucket resource will be named obc-<namespace-name>-<objectbucketclaim-name>.

$ oc get objectbucket obc-oddbit-ocs-example-example-rgw obc-oddbit-ocs-example-example-noobaa
NAME                                    STORAGE-CLASS                 CLAIM-NAMESPACE      CLAIM-NAME       RECLAIM-POLICY   PHASE   AGE
obc-oddbit-ocs-example-example-rgw      ocs-storagecluster-ceph-rgw   oddbit-ocs-example   example-rgw      Delete           Bound   67m
obc-oddbit-ocs-example-example-noobaa   openshift-storage.noobaa.io   oddbit-ocs-example   example-noobaa   Delete           Bound   67m

Each ObjectBucket resource corresponds to a bucket in the selected object storage backend.

Because buckets exist in a flat namespace, the OCS documentation recommends always using generateBucketName in the claim, rather than explicitly setting bucketName, in order to avoid unexpected conflicts. This means that the generated buckets will have a name prefixed by the value of generateBucketName, followed by a random string:

$ oc get objectbucketclaim example-rgw -o jsonpath='{.spec.bucketName}'
example-rgw-425d7193-ae3a-41d9-98e3-9d07b82c9661

$ oc get objectbucketclaim example-noobaa -o jsonpath='{.spec.bucketName}'
example-noobaa-2e087028-b3a4-475b-ae83-a4fa80d9e3ef

Along with the bucket itself, OpenShift will create a Secret and a ConfigMap resource – named after your OBC – with the metadata necessary to access the bucket.

The Secret contains AWS-style credentials for authenticating to the S3 API:

$ oc get secret example-rgw -o yaml | oc neat
apiVersion: v1
data:
  AWS_ACCESS_KEY_ID: ...
  AWS_SECRET_ACCESS_KEY: ...
kind: Secret
metadata:
  labels:
    bucket-provisioner: openshift-storage.ceph.rook.io-bucket
  name: example-rgw
  namespace: oddbit-ocs-example
type: Opaque

(I’m using the neat filter here to remove extraneous metadata that OpenShift returns when you request a resource.)

The ConfigMap contains a number of keys that provide you (or your code) with the information necessary to access the bucket. For the RGW bucket:

$ oc get configmap example-rgw -o yaml | oc neat
apiVersion: v1
data:
  BUCKET_HOST: rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc.cluster.local
  BUCKET_NAME: example-rgw-425d7193-ae3a-41d9-98e3-9d07b82c9661
  BUCKET_PORT: "80"
  BUCKET_REGION: us-east-1
kind: ConfigMap
metadata:
  labels:
    bucket-provisioner: openshift-storage.ceph.rook.io-bucket
  name: example-rgw
  namespace: oddbit-ocs-example

And for the Noobaa bucket:

$ oc get configmap example-noobaa -o yaml | oc neat
apiVersion: v1
data:
  BUCKET_HOST: s3.openshift-storage.svc
  BUCKET_NAME: example-noobaa-2e087028-b3a4-475b-ae83-a4fa80d9e3ef
  BUCKET_PORT: "443"
kind: ConfigMap
metadata:
  labels:
    app: noobaa
    bucket-provisioner: openshift-storage.noobaa.io-obc
    noobaa-domain: openshift-storage.noobaa.io
  name: example-noobaa
  namespace: oddbit-ocs-example

Note that BUCKET_HOST contains the internal S3 API endpoint. You won’t be able to reach this from outside the cluster. We’ll tackle that in just a bit.

Accessing a bucket from a pod

The easiest way to expose the credentials in a pod is to map the keys from both the ConfigMap and Secret as environment variables using the envFrom directive, like this:

apiVersion: v1
kind: Pod
metadata:
  name: bucket-example
spec:
  containers:
    - image: myimage
      env:
        - name: AWS_CA_BUNDLE
          value: /run/secrets/kubernetes.io/serviceaccount/service-ca.crt
      envFrom:
        - configMapRef:
            name: example-rgw
        - secretRef:
            name: example-rgw
      [...]

Note that we’re also setting AWS_CA_BUNDLE here, which you’ll need if the internal endpoint referenced by $BUCKET_HOST is using SSL.

Inside the pod, we can run, for example, aws commands, as long as we provide an appropriate S3 endpoint. We can inspect the value of BUCKET_PORT to determine whether we need http or https:

$ [ "$BUCKET_PORT" = 80 ] && schema=http || schema=https
$ aws s3 --endpoint $schema://$BUCKET_HOST ls
2021-02-10 04:30:31 example-rgw-8710aa46-a47a-4a8b-8edd-7dabb7d55469

Python’s boto3 module can also make use of the same environment variables:

>>> import boto3
>>> import os
>>> bucket_host = os.environ['BUCKET_HOST']
>>> schema = 'http' if os.environ['BUCKET_PORT'] == '80' else 'https'
>>> s3 = boto3.client('s3', endpoint_url=f'{schema}://{bucket_host}')
>>> s3.list_buckets()['Buckets']
[{'Name': 'example-noobaa-...', 'CreationDate': datetime.datetime(...)}]
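
Listing buckets makes a good smoke test, but most applications will want to read and write objects. Here is a minimal sketch along the same lines, using the BUCKET_NAME value from the ConfigMap (the local file names are just placeholders):

import os
import boto3

bucket = os.environ['BUCKET_NAME']
schema = 'http' if os.environ['BUCKET_PORT'] == '80' else 'https'
s3 = boto3.client('s3', endpoint_url=f"{schema}://{os.environ['BUCKET_HOST']}")

# Upload a local file to the bucket provisioned by the ObjectBucketClaim...
s3.upload_file('/tmp/example.txt', bucket, 'example.txt')

# ...list the bucket contents...
for obj in s3.list_objects_v2(Bucket=bucket).get('Contents', []):
    print(obj['Key'], obj['Size'])

# ...and download the object again.
s3.download_file(bucket, 'example.txt', '/tmp/example-copy.txt')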

External connections to S3 endpoints

External access to services in OpenShift is often managed via routes. If you look at the routes available in your openshift-storage namespace, you’ll find the following:

$ oc -n openshift-storage get route
NAME          HOST/PORT                                               PATH   SERVICES                                           PORT         TERMINATION   WILDCARD
noobaa-mgmt   noobaa-mgmt-openshift-storage.apps.example.com          noobaa-mgmt                                        mgmt-https   reencrypt     None
s3            s3-openshift-storage.apps.example.com                   s3                                                 s3-https     reencrypt     None

The s3 route provides external access to your Noobaa S3 endpoint. You’ll note that in the list above there is no route registered for radosgw [1]. There is a service registered for Radosgw named rook-ceph-rgw-ocs-storagecluster-cephobjectstore, so we can expose that service to create an external route by running something like:

oc create route edge rgw --service rook-ceph-rgw-ocs-storagecluster-cephobjectstore

This will create a route with “edge” encryption (TLS termination is handled by the default ingress router):

$ oc -n openshift-storage get route
NAME          HOST/PORT                                               PATH   SERVICES                                           PORT         TERMINATION   WILDCARD
noobaa-mgmt   noobaa-mgmt-openshift-storage.apps.example.com          noobaa-mgmt                                        mgmt-https   reencrypt     None
rgw           rgw-openshift-storage.apps.example.com                  rook-ceph-rgw-ocs-storagecluster-cephobjectstore   http         edge          None
s3            s3-openshift-storage.apps.example.com                   s3                                                 s3-https     reencrypt     None

Accessing a bucket from outside the cluster

Once we know the Route to our S3 endpoint, we can use the information in the Secret and ConfigMap created for us when we provisioned the storage. We just need to replace the BUCKET_HOST with the hostname in the route, and we need to use SSL over port 443 regardless of what BUCKET_PORT tells us.

We can extract the values into variables using something like the following shell script, which takes care of getting the appropriate route from the openshift-storage namespace, base64-decoding the values in the Secret, and replacing the BUCKET_HOST value:

#!/bin/sh

bucket_host=$(oc get configmap $1 -o json | jq -r .data.BUCKET_HOST)
service_name=$(cut -f1 -d. <<<$bucket_host)
service_ns=$(cut -f2 -d. <<<$bucket_host)

# get the externally visible hostname provided by the route
public_bucket_host=$(
  oc -n $service_ns get route -o json |
    jq -r  '.items[]|select(.spec.to.name=="'"$service_name"'")|.spec.host'
)

# dump configmap and secret as shell variables, replacing the
# value of BUCKET_HOST in the process.
(
  oc get configmap $1 -o json |
    jq -r '.data as $data|.data|keys[]|"\(.)=\($data[.])"'
  oc get secret $1 -o json |
    jq -r '.data as $data|.data|keys[]|"\(.)=\($data[.]|@base64d)"'
) | sed -e 's/^/export /' -e '/BUCKET_HOST/ s/=.*/='$public_bucket_host'/'

If we call the script getenv.sh and run it like this:

$ sh getenv.sh example-noobaa

It will produce output like this:

export BUCKET_HOST="s3-openshift-storage.apps.cnv.massopen.cloud"
export BUCKET_NAME="example-noobaa-2e1bca2f-ff49-431a-99b8-d7d63a8168b0"
export BUCKET_PORT="443"
export BUCKET_REGION=""
export BUCKET_SUBREGION=""
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."

We could accomplish something similar in Python with the following, which shows how to use the OpenShift dynamic client to interact with OpenShift:

import argparse
import base64

import kubernetes
import openshift.dynamic


def parse_args():
    p = argparse.ArgumentParser()
    p.add_argument('-n', '--namespace', required=True)
    p.add_argument('obcname')
    return p.parse_args()


args = parse_args()
k8s_client = kubernetes.config.new_client_from_config()
dyn_client = openshift.dynamic.DynamicClient(k8s_client)

v1_configmap = dyn_client.resources.get(api_version='v1', kind='ConfigMap')
v1_secret = dyn_client.resources.get(api_version='v1', kind='Secret')
v1_service = dyn_client.resources.get(api_version='v1', kind='Service')
v1_route = dyn_client.resources.get(api_version='route.openshift.io/v1', kind='Route')

configmap = v1_configmap.get(name=args.obcname, namespace=args.namespace)
secret = v1_secret.get(name=args.obcname, namespace=args.namespace)

env = dict(configmap.data)
env.update({k: base64.b64decode(v).decode() for k, v in secret.data.items()})

svc_name, svc_ns = env['BUCKET_HOST'].split('.')[:2]
routes = v1_route.get(namespace=svc_ns)
for route in routes.items:
    if route.spec.to.name == svc_name:
        break

env['BUCKET_PORT'] = 443
env['BUCKET_HOST'] = route['spec']['host']

for k, v in env.items():
    print(f'export {k}="{v}"')

If we run it like this:

python genenv.py -n oddbit-ocs-example example-noobaa

It will produce output largely identical to what we saw above with the shell script.

If we load those variables into the environment:

$ eval $(sh getenv.sh example-rgw)

We can perform the same operations we executed earlier from inside the pod:

$ aws s3 --endpoint https://$BUCKET_HOST ls
2021-02-10 14:34:12 example-rgw-425d7193-ae3a-41d9-98e3-9d07b82c9661
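
The same environment works with boto3 from outside the cluster as well. A short sketch, assuming you have already loaded the variables with eval $(sh getenv.sh example-rgw) as above (depending on your cluster’s ingress certificate, you may also need to point verify= at an appropriate CA bundle):

import os
import boto3

# BUCKET_HOST now points at the externally visible route, which terminates
# TLS at the cluster's ingress router.
s3 = boto3.client('s3', endpoint_url=f"https://{os.environ['BUCKET_HOST']}")
print([b['Name'] for b in s3.list_buckets()['Buckets']])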

[1] Note that this may have changed in the recent OCS 4.6 release.