Persistent Storage with OpenEBS on Kubernetes

Today we will explore persistent storage for Cassandra on Kubernetes with OpenEBS.

If you want to follow along, you can sign up with Civo to get your account created.

What we will be doing

We will be deploying a k3s distribution of Kubernetes on Civo, deploying a Cassandra cluster, writing some data to the cluster, and then testing persistence by deleting all the Cassandra pods.

About OpenEBS

Taken from their GitHub page, OpenEBS describes itself as the leading open-source container attached storage, built using cloud native architecture, which simplifies running stateful applications on Kubernetes.

Deploy Kubernetes

Create a new Civo k3s cluster with 3 nodes:

$ civo kubernetes create demo-cluster --size=g2.small --nodes=3 --wait

Merge the cluster's config into your kubeconfig file:

$ civo kubernetes config demo-cluster --save
Merged config into ~/.kube/config

Switch the context to the new cluster (here using the kubectx utility):

$ kubectx demo-cluster
Switched to context "demo-cluster".

Install OpenEBS

Deploy OpenEBS to your Kubernetes cluster by applying the operator manifest, which can be found on their GitHub page:

$ kubectl apply -f https://openebs.github.io/charts/openebs-operator.yaml

Give it some time for all the pods to check in, and then have a look to see whether all the pods in the openebs namespace are ready:

$ kubectl get pods -n openebs
NAME                                           READY   STATUS    RESTARTS   AGE
openebs-provisioner-c68bfd6d4-5n7kk            1/1     Running   0          5m52s
openebs-ndm-df25v                              1/1     Running   0          5m51s
openebs-snapshot-operator-7ffd685677-h8jpf     2/2     Running   0          5m52s
openebs-admission-server-889d78f96-t64m6       1/1     Running   0          5m50s
openebs-ndm-44k6n                              1/1     Running   0          5m51s
openebs-ndm-hpmg8                              1/1     Running   0          5m51s
openebs-localpv-provisioner-67bddc8568-shkqr   1/1     Running   0          5m49s
openebs-ndm-operator-5db67cd5bb-mfg5m          1/1     Running   0          5m50s
maya-apiserver-7f664b95bb-l87bm                1/1     Running   0          5m53s

We can see the list of storage classes that come with OpenEBS:

$ kubectl get sc
NAME                        PROVISIONER                                                AGE
local-path (default)        rancher.io/local-path                                      28m
openebs-jiva-default        openebs.io/provisioner-iscsi                               5m43s
openebs-snapshot-promoter   volumesnapshot.external-storage.k8s.io/snapshot-promoter   5m41s
openebs-hostpath            openebs.io/local                                           5m41s
openebs-device              openebs.io/local                                           5m41s
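
To put these storage classes to use, a workload simply requests a PersistentVolumeClaim against one of them. As a minimal sketch (the claim name and size here are hypothetical, not part of this walkthrough), a Jiva-backed claim could look like this:

```yaml
# Hypothetical PVC sketch: requests a 5G Jiva-backed volume.
# The claim name and size are illustrative only.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-jiva-claim
spec:
  storageClassName: openebs-jiva-default
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5G
```

The Cassandra StatefulSet we deploy below generates claims like this automatically from its volume claim template.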

Deploy Cassandra

Deploy a Cassandra Service to your Kubernetes Cluster:

$ kubectl apply -f https://raw.githubusercontent.com/ruanbekker/blog-assets/master/civo.com-openebs-kubernetes/manifests/cassandra/cassandra-service.yaml

The Cassandra StatefulSet has 3 replicas, defined in the manifest as replicas: 3. I am also using the Jiva storage engine, defined as volume.beta.kubernetes.io/storage-class: openebs-jiva-default, which is well suited for running replicated block storage on Kubernetes worker nodes that only have ephemeral storage.
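
For reference, the relevant part of the StatefulSet manifest looks roughly like this (trimmed down to the replica count and the volume claim template; treat it as a sketch rather than the exact manifest):

```yaml
# Sketch of the StatefulSet, showing only the storage-related parts.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra
spec:
  serviceName: cassandra
  replicas: 3
  # ... pod template and container spec omitted ...
  volumeClaimTemplates:
    - metadata:
        name: cassandra-data
        annotations:
          volume.beta.kubernetes.io/storage-class: openebs-jiva-default
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 5G
```

Each replica gets its own claim stamped out from this template (cassandra-data-cassandra-0, and so on), which is what we will see in the PVC listing further down.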

Now let's deploy the Cassandra StatefulSet to the cluster:

$ kubectl apply -f https://raw.githubusercontent.com/ruanbekker/blog-assets/master/civo.com-openebs-kubernetes/manifests/cassandra/cassandra-statefulset.yaml

Give it some time for all the pods to check in, then have a look at the pods:

$ kubectl get pods
NAME                                                             READY   STATUS    RESTARTS   AGE
pvc-b29a4e4b-20d7-4fd4-83d9-a155ab5c1a93-rep-ffb5b9f9-7nwfl      1/1     Running   0          5m39s
pvc-b29a4e4b-20d7-4fd4-83d9-a155ab5c1a93-rep-ffb5b9f9-wr4hl      1/1     Running   0          5m39s
pvc-b29a4e4b-20d7-4fd4-83d9-a155ab5c1a93-ctrl-6f95fd555-ldd86    2/2     Running   0          5m53s
pvc-b29a4e4b-20d7-4fd4-83d9-a155ab5c1a93-rep-ffb5b9f9-2v75w      1/1     Running   1          5m39s
cassandra-0                                                      1/1     Running   0          5m53s
pvc-2fe5a138-6c1d-4bcd-a0e4-e653793abea2-ctrl-bc4769cb4-c8stc    2/2     Running   0          3m40s
pvc-2fe5a138-6c1d-4bcd-a0e4-e653793abea2-rep-7565997776-w2gqs    1/1     Running   0          3m37s
pvc-2fe5a138-6c1d-4bcd-a0e4-e653793abea2-rep-7565997776-jk6c2    1/1     Running   0          3m37s
pvc-2fe5a138-6c1d-4bcd-a0e4-e653793abea2-rep-7565997776-qq9bb    1/1     Running   1          3m37s
cassandra-1                                                      1/1     Running   0          3m40s
pvc-a1ff3db9-d2fd-43a8-9095-c3529fe2e456-ctrl-55887dc66f-zjbx8   2/2     Running   0          117s
pvc-a1ff3db9-d2fd-43a8-9095-c3529fe2e456-rep-7fbc8785b8-cqvdb    1/1     Running   0          114s
pvc-a1ff3db9-d2fd-43a8-9095-c3529fe2e456-rep-7fbc8785b8-hclgs    1/1     Running   0          114s
pvc-a1ff3db9-d2fd-43a8-9095-c3529fe2e456-rep-7fbc8785b8-85nqt    1/1     Running   1          114s
cassandra-2                                                      0/1     Running   0          118s

When we look at our persistent volumes:

$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                STORAGECLASS           REASON   AGE
pvc-b29a4e4b-20d7-4fd4-83d9-a155ab5c1a93   5G         RWO            Delete           Bound    default/cassandra-data-cassandra-0   openebs-jiva-default            12m
pvc-2fe5a138-6c1d-4bcd-a0e4-e653793abea2   5G         RWO            Delete           Bound    default/cassandra-data-cassandra-1   openebs-jiva-default            10m
pvc-a1ff3db9-d2fd-43a8-9095-c3529fe2e456   5G         RWO            Delete           Bound    default/cassandra-data-cassandra-2   openebs-jiva-default            8m49s

And our persistent volume claims:

$ kubectl get pvc
NAME                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS           AGE
cassandra-data-cassandra-0   Bound    pvc-b29a4e4b-20d7-4fd4-83d9-a155ab5c1a93   5G         RWO            openebs-jiva-default   13m
cassandra-data-cassandra-1   Bound    pvc-2fe5a138-6c1d-4bcd-a0e4-e653793abea2   5G         RWO            openebs-jiva-default   10m
cassandra-data-cassandra-2   Bound    pvc-a1ff3db9-d2fd-43a8-9095-c3529fe2e456   5G         RWO            openebs-jiva-default   9m11s

Interact with Cassandra

View the cluster status:

$ kubectl exec cassandra-0 -- nodetool status
Datacenter: dc1-civo-k3s
========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens       Owns (effective)  Host ID                               Rack
UN  192.168.1.9   83.16 KiB  32           63.4%             f3debec3-77c7-438e-bf86-38630fa01c14  rack1-civo-k3s
UN  192.168.0.10  65.6 KiB   32           66.1%             3f78ec71-7485-477a-a095-44aa8fade0bd  rack1-civo-k3s
UN  192.168.2.12  84.73 KiB  32           70.5%             0160da34-5975-43d9-ab89-4c92cfe4ceeb  rack1-civo-k3s

Now let's connect to our Cassandra cluster:

$ kubectl exec -it cassandra-0 -- cqlsh cassandra 9042 --cqlversion="3.4.2"
Connected to civo-k3s at cassandra:9042.
[cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh>

Then create a keyspace and a table, and write some dummy data to our cluster:

cqlsh> create keyspace test with replication = {'class': 'SimpleStrategy', 'replication_factor': 2 };
cqlsh> use test;

cqlsh:test> create table people (id uuid, name varchar, age int, PRIMARY KEY ((id), name));

cqlsh:test> insert into people (id, name, age) values(uuid(), 'ruan', 28);
cqlsh:test> insert into people (id, name, age) values(uuid(), 'samantha', 25);
cqlsh:test> insert into people (id, name, age) values(uuid(), 'stefan', 29);
cqlsh:test> insert into people (id, name, age) values(uuid(), 'james', 25);
cqlsh:test> insert into people (id, name, age) values(uuid(), 'michelle', 30);
cqlsh:test> insert into people (id, name, age) values(uuid(), 'tim', 32);

Now let's read the data:

cqlsh:test> select * from people;

 id                                   | name     | age
--------------------------------------+----------+-----
 e3a2261d-77d3-4ed6-9daf-51e65ccf618f |      tim |  32
 594b24ee-e5bc-44cf-87cf-afed48a74df9 | samantha |  25
 ca65c207-8dd0-4dc0-99e3-1a0301128921 | michelle |  30
 1eeb2b77-57d0-4e10-b5d7-6dc58c43006e |    james |  25
 3cc12f26-395d-41d1-a954-f80f2c2ff88d |     ruan |  28
 755d52f2-5a27-4431-8591-7d34bacf7bee |   stefan |  29

(6 rows)

Test Data Persistence

Now it's time to delete our pods and see whether our data persists. First we will look at our pods to determine how long they have been running and on which nodes they are scheduled:

$ kubectl get pods --selector app=cassandra -o wide
NAME          READY   STATUS    RESTARTS   AGE   IP             NODE               NOMINATED NODE   READINESS GATES
cassandra-0   1/1     Running   0          35m   192.168.0.10   kube-master-75a2   <none>           <none>
cassandra-1   1/1     Running   0          33m   192.168.1.9    kube-node-5d0a     <none>           <none>
cassandra-2   1/1     Running   0          30m   192.168.2.12   kube-node-aca8     <none>           <none>

Let's delete all our cassandra pods:

$ kubectl delete pod/cassandra-{0..2}
pod "cassandra-0" deleted
pod "cassandra-1" deleted
pod "cassandra-2" deleted
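
As a side note, the {0..2} in the delete command is plain bash brace expansion: the shell expands it to the three pod names before kubectl ever runs. You can see the expansion with a simple echo:

```shell
# bash expands the {0..2} range before the command executes
echo pod/cassandra-{0..2}
# → pod/cassandra-0 pod/cassandra-1 pod/cassandra-2
```

An equivalent, shell-agnostic alternative is to delete by label, e.g. kubectl delete pods --selector app=cassandra.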

As we can see, the first pod is busy starting:

$ kubectl get pods --selector app=cassandra -o wide
NAME          READY   STATUS    RESTARTS   AGE   IP             NODE             NOMINATED NODE   READINESS GATES
cassandra-0   0/1     Running   0          37s   192.168.2.13   kube-node-aca8   <none>           <none>

Give it some time to allow all three pods to check in:

$ kubectl get pods --selector app=cassandra -o wide
NAME          READY   STATUS    RESTARTS   AGE     IP             NODE               NOMINATED NODE   READINESS GATES
cassandra-0   1/1     Running   0          4m36s   192.168.2.13   kube-node-aca8     <none>           <none>
cassandra-1   1/1     Running   0          3m24s   192.168.0.17   kube-master-75a2   <none>           <none>
cassandra-2   1/1     Running   0          2m3s    192.168.1.12   kube-node-5d0a     <none>           <none>

Right, now that all the pods have checked in, let's connect to our cluster and see if the data is still there:

$ kubectl exec -it cassandra-0 -- cqlsh cassandra 9042 --cqlversion="3.4.2"
Connected to civo-k3s at cassandra:9042.
[cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.

cqlsh> use test;
cqlsh:test> select * from people;

 id                                   | name     | age
--------------------------------------+----------+-----
 e3a2261d-77d3-4ed6-9daf-51e65ccf618f |      tim |  32
 594b24ee-e5bc-44cf-87cf-afed48a74df9 | samantha |  25
 ca65c207-8dd0-4dc0-99e3-1a0301128921 | michelle |  30
 1eeb2b77-57d0-4e10-b5d7-6dc58c43006e |    james |  25
 3cc12f26-395d-41d1-a954-f80f2c2ff88d |     ruan |  28
 755d52f2-5a27-4431-8591-7d34bacf7bee |   stefan |  29

(6 rows)
cqlsh:test> exit;

And as you can see, the data persisted: when the StatefulSet rescheduled each pod, it reattached the pod to its original persistent volume.

Thank You

I am super impressed with OpenEBS. Have a look at their docs, and also at the examples on their GitHub page.

Let me know what you think. If you liked my content, feel free to visit me at ruan.dev or follow me on Twitter at @ruanbekker