Kubernetes Storage Primitives for Stateful Workloads
Oct 28, 2017
This article is based on the presentation “Introduction to Kubernetes Storage Primitives for Stateful Workloads” given by the {Code} team at the OSS Convention Prague 2017. So, let’s start: what is Kubernetes?
Kubernetes
Kubernetes is the Greek word for “helmsman”; it is also the root of the word “governor”.
What Kubernetes is/does:
- Container orchestrator
- Supports multiple container runtimes (including runC from Docker)
- Supports cloud and bare-metal clusters
- Inspired and informed by Google’s experience
- Open source, written in Go
Kubernetes manages applications, not machines!
Separation of Concerns
You can separate an information system into four layers:
- Application
- Cluster (Kubernetes is here!)
- Kernel/OS
- Hardware
Ideally, each layer should be replaceable in a transparent way. Kubernetes embraces this philosophy by being heavily based on APIs.
Kubernetes Goals
- Open API and implementation
- Modular/replaceable
- Don’t force apps to know about concepts that are:
  - cloud-provider specific
  - Kubernetes specific
- Enable users to:
  - write once, run anywhere
  - avoid vendor lock-in
  - avoid coupling apps to infrastructure
Now let’s dig into the “pod” concept: the smallest deployable unit in Kubernetes, roughly comparable to a “task” in Docker Swarm.
Pods
A pod is the atomic unit of deployment. It is composed of a small set of tightly coupled containers and volumes.
Some of its main properties are:
- A shared namespace
  - containers share an IP address & localhost
  - they share IPC, etc.
- A managed lifecycle
  - a pod is bound to a node; it restarts in place
  - a pod can die and cannot be reborn with the same ID
Example:
# pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
    - name: filepuller
      image: saadali/filepuller:v2
    - name: webserver
      image: saadali/webserver:v3
File modifications in a container are bound to that container instance only; a container’s termination or crash therefore results in data loss. This is particularly problematic for stateful apps, or when containers need to share files.
The Kubernetes Volume abstraction solves both of these problems.
Kubernetes Volumes
Kubernetes Volumes differ from Docker Volumes. In Docker, a volume is simply a directory on disk or in another container. Lifetimes are not managed and until very recently there were only local-disk-backed volumes. A Kubernetes volume, on the other hand, has an explicit lifetime.
A Kubernetes volume is:
- A directory, possibly with some data in it
- Accessible by all containers in a pod
Volume plugins define:
- How directory is setup
- Medium that backs it
- Contents of the directory
A volume’s lifetime is the same as the pod’s, or longer.
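To make these properties concrete, here is a minimal sketch (pod and container names are invented for this example) of a pod whose two containers share an emptyDir volume, one of the plugin types listed in the next section; the directory lives exactly as long as the pod:
# shared-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-data
spec:
  volumes:
    # created empty when the pod starts, deleted when the pod dies
    - name: scratch
      emptyDir: {}
  containers:
    # the writer drops a file into the shared directory...
    - name: writer
      image: gcr.io/google_containers/busybox
      command: ["sh", "-c", "echo hello > /scratch/msg && sleep 6000"]
      volumeMounts:
        - name: scratch
          mountPath: /scratch
    # ...and the reader sees it through its own mount of the same volume
    - name: reader
      image: gcr.io/google_containers/busybox
      command: ["sh", "-c", "sleep 5 && cat /scratch/msg && sleep 6000"]
      volumeMounts:
        - name: scratch
          mountPath: /scratch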
More importantly, Kubernetes supports many types of volumes.
Kubernetes Volume plugins
Kubernetes has many volume plugins:
- Remote Storage:
  - GCE Persistent Disk
  - AWS EBS
  - Azure (FS & Data Disk)
  - Dell EMC ScaleIO
  - iSCSI
  - Flocker
  - NFS
  - vSphere
  - GlusterFS
  - Ceph File and RBD
  - Cinder
  - Quobyte Volume
  - Fibre Channel
  - VMware Photon PD
- Ephemeral Storage:
  - Empty dir (tmpfs)
- Volumes exposing the Kubernetes API (see the ConfigMap sketch after this list):
  - Secret
  - ConfigMap
  - DownwardAPI
- Local Storage (alpha)
- Containers exposing software-based storage
- Out-of-Tree:
  - Flex (executes a binary, allowing the use of external drivers)
  - CSI (Container Storage Interface, a generic API specification for container storage access, coming in a future release)
- Other:
  - Host path
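As an illustration of the volumes exposing the Kubernetes API, here is a sketch of a ConfigMap mounted as a volume; the ConfigMap name, key, and mount path are invented for the example. Each key of the ConfigMap appears as a file in the mount directory:
# configmap-pod.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  app.properties: |
    log.level=debug
---
apiVersion: v1
kind: Pod
metadata:
  name: configured-pod
spec:
  volumes:
    # exposes the keys of app-config as files
    - name: config
      configMap:
        name: app-config
  containers:
    - name: app
      image: gcr.io/google_containers/busybox
      # prints the mounted file, then idles
      command: ["sh", "-c", "cat /etc/config/app.properties && sleep 6000"]
      volumeMounts:
        - name: config
          mountPath: /etc/config
          readOnly: true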
Since Kubernetes is open, third-party storage can be supported through out-of-tree plugins.
To ensure interoperability between cluster orchestrators, CloudFoundry, Mesos, and Kubernetes are working on a standard out-of-tree API for “universal” container storage: the CSI.
GCE PD Example
A volume can be referenced directly in the pod definition, for example:
# pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: sleepypod
spec:
  volumes:
    - name: data
      gcePersistentDisk:
        pdName: panda-disk
        fsType: ext4
  containers:
    - name: sleepycontainer
      image: ...
      command:
        - sleep
        - "6000"
      volumeMounts:
        - name: data
          mountPath: /data
          readOnly: false
However, directly referencing a volume is “like tattooing the name of your girlfriend on your arm when you’re 16”: it may look like a good idea because you think it will last forever, but it generally doesn’t.
Persistent Volumes & Claims (PVC)
The main principle is to separate the persistent volume declaration from the pod.
First, we declare the persistent volumes through a dedicated process. Then we bind a pod to an available volume through a persistent volume claim.
PV Example
Let’s create two persistent volumes, pv1 with 10 GiB and pv2 with 100 GiB. Here is pv2’s definition as an example:
# pv2.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv2
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 100Gi
  persistentVolumeReclaimPolicy: Retain
  gcePersistentDisk:
    fsType: ext4
    pdName: panda-disk2
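The talk only shows pv2; assuming pv1 follows the same pattern with 10 GiB and its own disk, its definition would look like this (the pdName is a guess):
# pv1.yaml (assumed analogous to pv2)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv1
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 10Gi
  persistentVolumeReclaimPolicy: Retain
  gcePersistentDisk:
    fsType: ext4
    pdName: panda-disk1   # hypothetical disk name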
And here is how we create them:
$ kubectl create -f pv1.yaml
persistentvolume "pv1" created
$ kubectl create -f pv2.yaml
persistentvolume "pv2" created
$ kubectl get pv
NAME   CAPACITY   ACCESSMODES   STATUS      CLAIM   REASON   AGE
pv1    10Gi       RWO           Available                    1m
pv2    100Gi      RWO           Available                    1m
PVC Example
Now that we have unused persistent volumes, we can claim one through a PersistentVolumeClaim (PVC):
# pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mypvc
  namespace: testns
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
When a claim is created, Kubernetes binds it to an available persistent volume:
$ kubectl create -f pvc.yaml
persistentvolumeclaim "mypvc" created
$ kubectl get pv
NAME   CAPACITY   ACCESSMODES   STATUS      CLAIM          REASON   AGE
pv1    10Gi       RWO           Available                           3m
pv2    100Gi      RWO           Bound       testns/mypvc            3m
You can then reference the PVC directly in the pod declaration:
# pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: sleepypod
spec:
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: mypvc
  containers:
    - name: sleepycontainer
      image: gcr.io/google_containers/busybox
      command:
        - sleep
        - "6000"
      volumeMounts:
        - name: data
          mountPath: /data
          readOnly: false
Dynamic Provisioning and Storage Classes
- Allows storage to be created on-demand (when requested by user).
- Eliminates need for cluster administrators to pre-provision storage.
- Cluster/storage admins “enable” dynamic provisioning by creating a StorageClass object.
- StorageClass defines the parameters used during creation.
- StorageClass parameters are opaque to Kubernetes so storage providers can expose any number of custom parameters for the cluster admin to use.
Here’s how you declare a StorageClass:
# sc.yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: slow
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fast
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
- Users consume storage the same way with a PVC
- “Selecting” a storage class in PVC triggers dynamic provisioning
Here’s how to create a PVC with a StorageClass:
# pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mypvc
  namespace: testns
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  storageClassName: fast
$ kubectl create -f sc.yaml
storageclass "slow" created
storageclass "fast" created
$ kubectl create -f pvc.yaml
persistentvolumeclaim "mypvc" created
$ kubectl get pvc --all-namespaces
NAMESPACE   NAME    STATUS   VOLUME                                     CAPACITY   ACCESSMODES   AGE
testns      mypvc   Bound    pvc-331d7407-fe18-11e6-b7cd-42010a8000cd   100Gi      RWO           6s
$ kubectl get pv pvc-331d7407-fe18-11e6-b7cd-42010a8000cd
NAME                                       CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS   CLAIM          REASON   AGE
pvc-331d7407-fe18-11e6-b7cd-42010a8000cd   100Gi      RWO           Delete          Bound    testns/mypvc            13m
The user then references the volume via the PVC:
# pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: sleepypod
spec:
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: mypvc
  containers:
    - name: sleepycontainer
      image: gcr.io/google_containers/busybox
      command:
        - sleep
        - "6000"
      volumeMounts:
        - name: data
          mountPath: /data
          readOnly: false
Default Storage Class
A default storage class allows dynamic provisioning even when no StorageClass is specified in the PVC.
Pre-installed Default Storage Classes:
- Amazon AWS - EBS volume
- Google Cloud (GCE/GKE) - GCE PD
- OpenStack - Cinder volume
The Default Storage Class feature was introduced as alpha in Kubernetes 1.2 and is GA as of 1.6.
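As a sketch, a cluster admin can also mark any class as the default via an annotation (the class below is illustrative); a PVC that omits storageClassName is then provisioned from it:
# default-sc.yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: standard
  annotations:
    # marks this class as the one used when a PVC omits storageClassName
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard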
What’s Next for Kubernetes Storage?
The Kubernetes storage team is investing in:
- Container Storage Interface (CSI)
- Standardized Out-of-Tree File and Block Volume Plugins
- Local Storage
- Making node-local storage available as persistent volumes
- Capacity Isolation
- Setting limits so that a single pod can’t consume all available node storage via the overlay FS, logs, etc.
Impressions
Kubernetes provides, through APIs and plugins/drivers, a clean, agnostic, standardized way to declare and use volumes in your containers. With this feature, you can migrate your storage backend cleanly and easily. Convergence and standardization of these plugins across solutions like CloudFoundry, Docker Swarm, and Mesos appears to be in progress, but little information is available yet.