Run etcd clusters as a Kubernetes StatefulSet

Running etcd as a Kubernetes StatefulSet

The following demonstrates how to perform the static bootstrap process for an etcd cluster running as a Kubernetes StatefulSet.

Example Manifest

This manifest contains a Service and a StatefulSet for deploying a static etcd cluster on Kubernetes.

Copy the contents of the manifest into a file named etcd.yaml; it can then be applied to a cluster with this command.

$ kubectl apply --filename etcd.yaml

Once the manifest is applied, wait for the pods to become ready.

$ kubectl get pods
NAME     READY   STATUS    RESTARTS   AGE
etcd-0   1/1     Running   0          24m
etcd-1   1/1     Running   0          24m
etcd-2   1/1     Running   0          24m
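If you would rather block until the pods report ready than poll kubectl get pods, the following is one way to do it (the five-minute timeout is an arbitrary choice for this example; adjust it to your environment):

$ kubectl wait pod --selector app=etcd --for=condition=Ready --timeout=300s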

The container image used in the example includes etcdctl, which can be invoked directly inside the pods.

$ kubectl exec -it etcd-0 -- etcdctl member list -wtable
+------------------+---------+--------+-------------------------+-------------------------+------------+
|        ID        | STATUS  |  NAME  |        PEER ADDRS       |       CLIENT ADDRS      | IS LEARNER |
+------------------+---------+--------+-------------------------+-------------------------+------------+
| 4f98c3545405a0b0 | started | etcd-2 | http://etcd-2.etcd:2380 | http://etcd-2.etcd:2379 |      false |
| a394e0ee91773643 | started | etcd-0 | http://etcd-0.etcd:2380 | http://etcd-0.etcd:2379 |      false |
| d10297b8d2f01265 | started | etcd-1 | http://etcd-1.etcd:2380 | http://etcd-1.etcd:2379 |      false |
+------------------+---------+--------+-------------------------+-------------------------+------------+
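Because the manifest sets ETCDCTL_ENDPOINTS inside the container, basic reads and writes can also be exercised directly. This is only a quick smoke test; the key name foo is an arbitrary example:

$ kubectl exec -it etcd-0 -- etcdctl put foo bar
OK
$ kubectl exec -it etcd-1 -- etcdctl get foo
foo
bar
$ kubectl exec -it etcd-0 -- etcdctl endpoint health --cluster -wtable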

To deploy with a self-signed certificate, refer to the commented configuration headings starting with ## TLS to find values that you can uncomment. Additional instructions for generating certificates with cert-manager are included in a section below.

# file: etcd.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: etcd
  namespace: default
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: etcd
  ##
  ## Ideally we would use SRV records to do peer discovery for initialization.
  ## Unfortunately discovery will not work without logic to wait for these to
  ## populate in the container. This problem is relatively easy to overcome by
  ## making changes to prevent the etcd process from starting until the records
  ## have populated. The documentation on StatefulSets briefly talks about it.
  ## https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#stable-network-id
  publishNotReadyAddresses: true
  ##
  ## The naming scheme of the client and server ports matches the scheme that
  ## etcd uses when doing discovery with SRV records.
  ports:
  - name: etcd-client
    port: 2379
  - name: etcd-server
    port: 2380
  - name: etcd-metrics
    port: 8080
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  namespace: default
  name: etcd
spec:
  ##
  ## The service name is being set to leverage the service headlessly.
  ## https://kubernetes.io/docs/concepts/services-networking/service/#headless-services
  serviceName: etcd
  ##
  ## If you are increasing the replica count of an existing cluster, you should
  ## also update the --initial-cluster-state flag as noted further down in the
  ## container configuration.
  replicas: 3
  ##
  ## For initialization, the etcd pods must be available to each other before
  ## they are "ready" for traffic. The "Parallel" policy makes this possible.
  podManagementPolicy: Parallel
  ##
  ## To ensure availability of the etcd cluster, the rolling update strategy
  ## is used. For availability, a quorum (more than half) of the etcd nodes
  ## must be online at any given time.
  updateStrategy:
    type: RollingUpdate
  ##
  ## This is a label query over pods that should match the replica count.
  ## It must match the pod template's labels. For more information, see the
  ## following documentation:
  ## https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors
  selector:
    matchLabels:
      app: etcd
  ##
  ## Pod configuration template.
  template:
    metadata:
      ##
      ## The labeling here is tied to the "matchLabels" of this StatefulSet and
      ## the "affinity" configuration of the pod that will be created.
      ##
      ## This example's labeling scheme is fine for one etcd cluster per
      ## namespace, but should you desire multiple clusters per namespace, you
      ## will need to update the labeling schema to be unique per etcd cluster.
      labels:
        app: etcd
      annotations:
        ##
        ## This gets referenced in the etcd container's configuration as part of
        ## the DNS name. It must match the service name created for the etcd
        ## cluster. The choice to place it in an annotation instead of the env
        ## settings is because there should only be 1 service per etcd cluster.
        serviceName: etcd
    spec:
      ##
      ## Configuring the node affinity is necessary to prevent etcd servers from
      ## ending up on the same hardware.
      ##
      ## See the scheduling documentation for more information about this:
      ## https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#node-affinity
      affinity:
        ## The podAntiAffinity is a set of rules for scheduling that describe
        ## when NOT to place a pod from this StatefulSet on a node.
        podAntiAffinity:
          ##
          ## When preparing to place the pod on a node, the scheduler will check
          ## for other pods matching the rules described by the labelSelector,
          ## separated by the chosen topology key.
          requiredDuringSchedulingIgnoredDuringExecution:
          ## This label selector is looking for app=etcd
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - etcd
            ## This topology key denotes a common label used on nodes in the
            ## cluster. The podAntiAffinity configuration essentially states
            ## that if another pod has a label of app=etcd on the node, the
            ## scheduler should not place another pod on the node.
            ## https://kubernetes.io/docs/reference/labels-annotations-taints/#kubernetesiohostname
            topologyKey: "kubernetes.io/hostname"
      ##
      ## Containers in the pod
      containers:
      ## This example only has this etcd container.
      - name: etcd
        image: quay.io/coreos/etcd:v3.5.15
        imagePullPolicy: IfNotPresent
        ports:
        - name: etcd-client
          containerPort: 2379
        - name: etcd-server
          containerPort: 2380
        - name: etcd-metrics
          containerPort: 8080
        ##
        ## These probes will fail over TLS with self-signed certificates, so
        ## etcd is configured to deliver metrics over port 8080 further down.
        ##
        ## As mentioned in the "Monitoring etcd" page, /readyz and /livez were
        ## added in v3.5.12. Prior to this, monitoring required extra tooling
        ## inside the container to make these probes work.
        ##
        ## The values in this readiness probe should be further validated; it
        ## is only an example configuration.
        readinessProbe:
          httpGet:
            path: /readyz
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 30
        ## The values in this liveness probe should be further validated; it
        ## is only an example configuration.
        livenessProbe:
          httpGet:
            path: /livez
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        env:
        ##
        ## Environment variables defined here can be used by other parts of the
        ## container configuration. They are interpreted by Kubernetes, instead
        ## of in the container environment.
        ##
        ## These env vars pass along information about the pod.
        - name: K8S_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: SERVICE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.annotations['serviceName']
        ##
        ## Configuring etcdctl inside the container to connect to the etcd node
        ## in the container reduces confusion when debugging.
        - name: ETCDCTL_ENDPOINTS
          value: $(HOSTNAME).$(SERVICE_NAME):2379
        ##
        ## TLS client configuration for etcdctl in the container.
        ## These file paths are part of the "etcd-client-tls" volume mount.
        # - name: ETCDCTL_KEY
        #   value: /etc/etcd/certs/client/tls.key
        # - name: ETCDCTL_CERT
        #   value: /etc/etcd/certs/client/tls.crt
        # - name: ETCDCTL_CACERT
        #   value: /etc/etcd/certs/client/ca.crt
        ##
        ## Use this URI_SCHEME value for non-TLS clusters.
        - name: URI_SCHEME
          value: "http"
        ## TLS: Use this URI_SCHEME for TLS clusters.
        # - name: URI_SCHEME
        #   value: "https"
        ##
        ## If you're using a different container, the executable may be in a
        ## different location. This example uses the full path to help remove
        ## ambiguity for you, the reader.
        ## Often you can just use "etcd" instead of "/usr/local/bin/etcd" and it
        ## will work because the $PATH includes a directory containing "etcd".
        command:
        - /usr/local/bin/etcd
        ##
        ## Arguments used with the etcd command inside the container.
        args:
        ##
        ## Configure the name of the etcd server.
        - --name=$(HOSTNAME)
        ##
        ## Configure etcd to use the persistent storage configured below.
        - --data-dir=/data
        ##
        ## In this example the WAL shares space with the data directory. This is
        ## not ideal for production environments, where the WAL should be placed
        ## in its own volume.
        - --wal-dir=/data/wal
        ##
        ## URL configurations are parameterized here and you shouldn't need to
        ## do anything with these.
        - --listen-peer-urls=$(URI_SCHEME)://0.0.0.0:2380
        - --listen-client-urls=$(URI_SCHEME)://0.0.0.0:2379
        - --advertise-client-urls=$(URI_SCHEME)://$(HOSTNAME).$(SERVICE_NAME):2379
        ##
        ## This must be set to "new" for initial cluster bootstrapping. To scale
        ## the cluster up, this should be changed to "existing" when the replica
        ## count is increased. If set incorrectly, etcd will attempt to start
        ## but fail safely.
        - --initial-cluster-state=new
        ##
        ## Token used for cluster initialization. The recommendation for this is
        ## to use a unique token for every cluster. This example is parameterized
        ## to be unique to the namespace, but if you are deploying multiple etcd
        ## clusters in the same namespace, you should do something extra to
        ## ensure uniqueness amongst clusters.
        - --initial-cluster-token=etcd-$(K8S_NAMESPACE)
        ##
        ## The initial cluster flag needs to be updated to match the number of
        ## replicas configured. When combined, these are a little hard to read.
        ## Here is what a single parameterized peer looks like:
        ## etcd-0=$(URI_SCHEME)://etcd-0.$(SERVICE_NAME):2380
        - --initial-cluster=etcd-0=$(URI_SCHEME)://etcd-0.$(SERVICE_NAME):2380,etcd-1=$(URI_SCHEME)://etcd-1.$(SERVICE_NAME):2380,etcd-2=$(URI_SCHEME)://etcd-2.$(SERVICE_NAME):2380
        ##
        ## The peer urls flag should be fine as-is.
        - --initial-advertise-peer-urls=$(URI_SCHEME)://$(HOSTNAME).$(SERVICE_NAME):2380
        ##
        ## This avoids probe failure if you opt to configure TLS.
        - --listen-metrics-urls=http://0.0.0.0:8080
        ##
        ## These are some configurations you may want to consider enabling, but
        ## should look into further to identify what settings are best for you.
        # - --auto-compaction-mode=periodic
        # - --auto-compaction-retention=10m
        ##
        ## TLS client configuration for etcd, reusing the etcdctl env vars.
        # - --client-cert-auth
        # - --trusted-ca-file=$(ETCDCTL_CACERT)
        # - --cert-file=$(ETCDCTL_CERT)
        # - --key-file=$(ETCDCTL_KEY)
        ##
        ## TLS server (peer) configuration for etcd.
        ## These file paths are part of the "etcd-server-tls" volume mount.
        # - --peer-client-cert-auth
        # - --peer-trusted-ca-file=/etc/etcd/certs/server/ca.crt
        # - --peer-cert-file=/etc/etcd/certs/server/tls.crt
        # - --peer-key-file=/etc/etcd/certs/server/tls.key
        ##
        ## This is the mount configuration.
        volumeMounts:
        - name: etcd-data
          mountPath: /data
        ##
        ## TLS client configuration for etcdctl
        # - name: etcd-client-tls
        #   mountPath: "/etc/etcd/certs/client"
        #   readOnly: true
        ##
        ## TLS server configuration
        # - name: etcd-server-tls
        #   mountPath: "/etc/etcd/certs/server"
        #   readOnly: true
      volumes:
      ##
      ## TLS client configuration
      # - name: etcd-client-tls
      #   secret:
      #     secretName: etcd-client-tls
      #     optional: false
      ##
      ## TLS server configuration
      # - name: etcd-server-tls
      #   secret:
      #     secretName: etcd-server-tls
      #     optional: false
  ##
  ## This StatefulSet uses the volumeClaimTemplates field to create a PVC in
  ## the cluster for each replica. These PVCs cannot be easily resized later.
  volumeClaimTemplates:
  - metadata:
      name: etcd-data
    spec:
      accessModes: ["ReadWriteOnce"]
      ##
      ## In some clusters, it is necessary to explicitly set the storage class.
      ## This example will end up using the default storage class.
      # storageClassName: ""
      resources:
        requests:
          storage: 1Gi
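As the comments in the manifest note, growing an existing cluster takes more than raising the replica count. The sketch below is a rough, untested illustration of the fields that would change when going from 3 to 5 replicas; etcd additionally expects each new member to be registered (for example with etcdctl member add) before its pod starts, which is not shown here.

# Sketch only: fields in etcd.yaml that change when scaling from 3 to 5 replicas.
  replicas: 5
        ## New members are joining a cluster that already exists.
        - --initial-cluster-state=existing
        ## The peer list grows to cover all five members.
        - --initial-cluster=etcd-0=$(URI_SCHEME)://etcd-0.$(SERVICE_NAME):2380,etcd-1=$(URI_SCHEME)://etcd-1.$(SERVICE_NAME):2380,etcd-2=$(URI_SCHEME)://etcd-2.$(SERVICE_NAME):2380,etcd-3=$(URI_SCHEME)://etcd-3.$(SERVICE_NAME):2380,etcd-4=$(URI_SCHEME)://etcd-4.$(SERVICE_NAME):2380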

Generating Certificates

In this section, we use Helm to install an operator called cert-manager.

With cert-manager installed, self-signed certificates can be generated inside the cluster. The generated certificates are placed in Secret objects, which can be mounted as files in containers.

This is the Helm command to install cert-manager.

$ helm upgrade --install --create-namespace --namespace cert-manager cert-manager cert-manager --repo https://charts.jetstack.io --set crds.enabled=true
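Before moving on, it is worth confirming that the cert-manager pods came up, for example:

$ kubectl get pods --namespace cert-manager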

This is an example ClusterIssuer configuration for generating self-signed certificates.

# file: issuer.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned
spec:
  selfSigned: {}
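Assuming the configuration is saved to a file named issuer.yaml, it can be applied and checked like the other manifests; the READY column should report True once cert-manager has registered the issuer:

$ kubectl apply --filename issuer.yaml
$ kubectl get clusterissuer selfsigned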

This manifest creates Certificate objects for the client and server certs, referencing the ClusterIssuer “selfsigned”. The dnsNames should be an exhaustive list of valid hostnames for the certificates that cert-manager creates.

# file: certificates.yaml
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: etcd-server
  namespace: default
spec:
  secretName: etcd-server-tls
  issuerRef:
    name: selfsigned
    kind: ClusterIssuer
  commonName: etcd
  dnsNames:
  - etcd
  - etcd.default
  - etcd.default.svc.cluster.local
  - etcd-0
  - etcd-0.etcd
  - etcd-0.etcd.default
  - etcd-0.etcd.default.svc
  - etcd-0.etcd.default.svc.cluster.local
  - etcd-1
  - etcd-1.etcd
  - etcd-1.etcd.default
  - etcd-1.etcd.default.svc
  - etcd-1.etcd.default.svc.cluster.local
  - etcd-2
  - etcd-2.etcd
  - etcd-2.etcd.default
  - etcd-2.etcd.default.svc
  - etcd-2.etcd.default.svc.cluster.local
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: etcd-client
  namespace: default
spec:
  secretName: etcd-client-tls
  issuerRef:
    name: selfsigned
    kind: ClusterIssuer
  commonName: etcd
  dnsNames:
  - etcd
  - etcd.default
  - etcd.default.svc.cluster.local
  - etcd-0
  - etcd-0.etcd
  - etcd-0.etcd.default
  - etcd-0.etcd.default.svc
  - etcd-0.etcd.default.svc.cluster.local
  - etcd-1
  - etcd-1.etcd
  - etcd-1.etcd.default
  - etcd-1.etcd.default.svc
  - etcd-1.etcd.default.svc.cluster.local
  - etcd-2
  - etcd-2.etcd
  - etcd-2.etcd.default
  - etcd-2.etcd.default.svc
  - etcd-2.etcd.default.svc.cluster.local
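Assuming the manifest is saved as certificates.yaml, applying it should result in two Secrets, etcd-server-tls and etcd-client-tls, each containing the ca.crt, tls.crt, and tls.key files that the commented ## TLS sections of etcd.yaml expect to mount:

$ kubectl apply --filename certificates.yaml
$ kubectl get certificates
$ kubectl get secret etcd-server-tls etcd-client-tls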