Run etcd clusters as a Kubernetes StatefulSet

Running etcd as a Kubernetes StatefulSet

The following demonstrates how to perform the static bootstrap process for an etcd cluster running as a Kubernetes StatefulSet.

Example Manifest

This manifest contains a Service and a StatefulSet for deploying a static etcd cluster on Kubernetes.

Copy the contents of the manifest into a file named etcd.yaml; it can then be applied to a cluster with this command.

$ kubectl apply --filename etcd.yaml

Once the manifest is applied, wait for the pods to become ready.

$ kubectl get pods
NAME     READY   STATUS    RESTARTS   AGE
etcd-0   1/1     Running   0          24m
etcd-1   1/1     Running   0          24m
etcd-2   1/1     Running   0          24m
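If you would rather block until the pods report ready than poll kubectl get pods, the following is one way to do it (the five-minute timeout is an arbitrary choice for this example; adjust it to your environment):

$ kubectl wait pod --selector app=etcd --for=condition=Ready --timeout=300s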

The container image used in the example includes etcdctl, which can be invoked directly inside the pods.

$ kubectl exec -it etcd-0 -- etcdctl member list -wtable
+------------------+---------+--------+-------------------------+-------------------------+------------+
|        ID        | STATUS  |  NAME  |        PEER ADDRS       |       CLIENT ADDRS      | IS LEARNER |
+------------------+---------+--------+-------------------------+-------------------------+------------+
| 4f98c3545405a0b0 | started | etcd-2 | http://etcd-2.etcd:2380 | http://etcd-2.etcd:2379 |      false |
| a394e0ee91773643 | started | etcd-0 | http://etcd-0.etcd:2380 | http://etcd-0.etcd:2379 |      false |
| d10297b8d2f01265 | started | etcd-1 | http://etcd-1.etcd:2380 | http://etcd-1.etcd:2379 |      false |
+------------------+---------+--------+-------------------------+-------------------------+------------+
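Because the manifest sets ETCDCTL_ENDPOINTS inside the container, basic reads and writes can also be exercised directly. This is only a quick smoke test; the key name foo is an arbitrary example:

$ kubectl exec -it etcd-0 -- etcdctl put foo bar
OK
$ kubectl exec -it etcd-1 -- etcdctl get foo
foo
bar
$ kubectl exec -it etcd-0 -- etcdctl endpoint health --cluster -wtable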

To deploy with a self-signed certificate, refer to the commented configuration headings starting with ## TLS to find values that you can uncomment. Additional instructions for generating certificates with cert-manager are included in a section below.

# file: etcd.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: etcd
  namespace: default
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: etcd
  ##
  ## Ideally we would use SRV records to do peer discovery for initialization.
  ## Unfortunately discovery will not work without logic to wait for these to
  ## populate in the container. This problem is relatively easy to overcome by
  ## making changes to prevent the etcd process from starting until the records
  ## have populated. The documentation on StatefulSets briefly talks about it.
  ## https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#stable-network-id
  publishNotReadyAddresses: true
  ##
  ## The naming scheme of the client and server ports matches the scheme that
  ## etcd uses when doing discovery with SRV records.
  ports:
  - name: etcd-client
    port: 2379
  - name: etcd-server
    port: 2380
  - name: etcd-metrics
    port: 8080
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  namespace: default
  name: etcd
spec:
  ##
  ## The service name is being set to leverage the service headlessly.
  ## https://kubernetes.io/docs/concepts/services-networking/service/#headless-services
  serviceName: etcd
  ##
  ## If you are increasing the replica count of an existing cluster, you should
  ## also update the --initial-cluster-state flag as noted further down in the
  ## container configuration.
  replicas: 3
  ##
  ## For initialization, the etcd pods must be available to each other before
  ## they are "ready" for traffic. The "Parallel" policy makes this possible.
  podManagementPolicy: Parallel
  ##
  ## To ensure availability of the etcd cluster, the rolling update strategy
  ## is used. For availability, a quorum (more than half) of the etcd nodes
  ## must be online at any given time.
  updateStrategy:
    type: RollingUpdate
  ##
  ## This is a label query over pods that should match the replica count.
  ## It must match the pod template's labels. For more information, see the
  ## following documentation:
  ## https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors
  selector:
    matchLabels:
      app: etcd
  ##
  ## Pod configuration template.
  template:
    metadata:
      ##
      ## The labeling here is tied to the "matchLabels" of this StatefulSet and
      ## the "affinity" configuration of the pod that will be created.
      ##
      ## This example's labeling scheme is fine for one etcd cluster per
      ## namespace, but should you desire multiple clusters per namespace, you
      ## will need to update the labeling schema to be unique per etcd cluster.
      labels:
        app: etcd
      annotations:
        ##
        ## This gets referenced in the etcd container's configuration as part of
        ## the DNS name. It must match the service name created for the etcd
        ## cluster. The choice to place it in an annotation instead of the env
        ## settings is because there should only be 1 service per etcd cluster.
        serviceName: etcd
    spec:
      ##
      ## Configuring the node affinity is necessary to prevent etcd servers from
      ## ending up on the same hardware.
      ##
      ## See the scheduling documentation for more information about this:
      ## https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#node-affinity
      affinity:
        ## The podAntiAffinity is a set of rules for scheduling that describe
        ## when NOT to place a pod from this StatefulSet on a node.
        podAntiAffinity:
          ##
          ## When preparing to place the pod on a node, the scheduler will check
          ## for other pods matching the rules described by the labelSelector,
          ## separated by the chosen topology key.
          requiredDuringSchedulingIgnoredDuringExecution:
          ## This label selector is looking for app=etcd
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - etcd
            ## This topology key denotes a common label used on nodes in the
            ## cluster. The podAntiAffinity configuration essentially states
            ## that if another pod has a label of app=etcd on the node, the
            ## scheduler should not place another pod on the node.
            ## https://kubernetes.io/docs/reference/labels-annotations-taints/#kubernetesiohostname
            topologyKey: "kubernetes.io/hostname"
      ##
      ## Containers in the pod
      containers:
      ## This example only has this etcd container.
      - name: etcd
        image: quay.io/coreos/etcd:v3.5.15
        imagePullPolicy: IfNotPresent
        ports:
        - name: etcd-client
          containerPort: 2379
        - name: etcd-server
          containerPort: 2380
        - name: etcd-metrics
          containerPort: 8080
        ##
        ## These probes will fail over TLS with self-signed certificates, so
        ## etcd is configured to deliver metrics over port 8080 further down.
        ##
        ## As mentioned in the "Monitoring etcd" page, /readyz and /livez were
        ## added in v3.5.12. Prior to this, monitoring required extra tooling
        ## inside the container to make these probes work.
        ##
        ## The values in this readiness probe should be further validated; it
        ## is only an example configuration.
        readinessProbe:
          httpGet:
            path: /readyz
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 30
        ## The values in this liveness probe should be further validated; it
        ## is only an example configuration.
        livenessProbe:
          httpGet:
            path: /livez
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        env:
        ##
        ## Environment variables defined here can be used by other parts of the
        ## container configuration. They are interpreted by Kubernetes, instead
        ## of in the container environment.
        ##
        ## These env vars pass along information about the pod.
        - name: K8S_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: SERVICE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.annotations['serviceName']
        ##
        ## Configuring etcdctl inside the container to connect to the etcd node
        ## in the container reduces confusion when debugging.
        - name: ETCDCTL_ENDPOINTS
          value: $(HOSTNAME).$(SERVICE_NAME):2379
        ##
        ## TLS client configuration for etcdctl in the container.
        ## These file paths are part of the "etcd-client-tls" volume mount.
        # - name: ETCDCTL_KEY
        #   value: /etc/etcd/certs/client/tls.key
        # - name: ETCDCTL_CERT
        #   value: /etc/etcd/certs/client/tls.crt
        # - name: ETCDCTL_CACERT
        #   value: /etc/etcd/certs/client/ca.crt
        ##
        ## Use this URI_SCHEME value for non-TLS clusters.
        - name: URI_SCHEME
          value: "http"
        ## TLS: Use this URI_SCHEME for TLS clusters.
        # - name: URI_SCHEME
        #   value: "https"
        ##
        ## If you're using a different container, the executable may be in a
        ## different location. This example uses the full path to help remove
        ## ambiguity for you, the reader.
        ## Often you can just use "etcd" instead of "/usr/local/bin/etcd" and it
        ## will work because the $PATH includes a directory containing "etcd".
        command:
        - /usr/local/bin/etcd
        ##
        ## Arguments used with the etcd command inside the container.
        args:
        ##
        ## Configure the name of the etcd server.
        - --name=$(HOSTNAME)
        ##
        ## Configure etcd to use the persistent storage configured below.
        - --data-dir=/data
        ##
        ## In this example the WAL shares space with the data directory. This is
        ## not ideal for production environments, where the WAL should be placed
        ## in its own volume.
        - --wal-dir=/data/wal
        ##
        ## URL configurations are parameterized here and you shouldn't need to
        ## do anything with these.
        - --listen-peer-urls=$(URI_SCHEME)://0.0.0.0:2380
        - --listen-client-urls=$(URI_SCHEME)://0.0.0.0:2379
        - --advertise-client-urls=$(URI_SCHEME)://$(HOSTNAME).$(SERVICE_NAME):2379
        ##
        ## This must be set to "new" for initial cluster bootstrapping. To scale
        ## the cluster up, this should be changed to "existing" when the replica
        ## count is increased. If set incorrectly, etcd will attempt to start
        ## but fail safely.
        - --initial-cluster-state=new
        ##
        ## Token used for cluster initialization. The recommendation for this is
        ## to use a unique token for every cluster. This example is parameterized
        ## to be unique to the namespace, but if you are deploying multiple etcd
        ## clusters in the same namespace, you should do something extra to
        ## ensure uniqueness amongst clusters.
        - --initial-cluster-token=etcd-$(K8S_NAMESPACE)
        ##
        ## The initial cluster flag needs to be updated to match the number of
        ## replicas configured. When combined, these are a little hard to read.
        ## Here is what a single parameterized peer looks like:
        ## etcd-0=$(URI_SCHEME)://etcd-0.$(SERVICE_NAME):2380
        - --initial-cluster=etcd-0=$(URI_SCHEME)://etcd-0.$(SERVICE_NAME):2380,etcd-1=$(URI_SCHEME)://etcd-1.$(SERVICE_NAME):2380,etcd-2=$(URI_SCHEME)://etcd-2.$(SERVICE_NAME):2380
        ##
        ## The peer urls flag should be fine as-is.
        - --initial-advertise-peer-urls=$(URI_SCHEME)://$(HOSTNAME).$(SERVICE_NAME):2380
        ##
        ## This avoids probe failure if you opt to configure TLS.
        - --listen-metrics-urls=http://0.0.0.0:8080
        ##
        ## These are some configurations you may want to consider enabling, but
        ## should look into further to identify what settings are best for you.
        # - --auto-compaction-mode=periodic
        # - --auto-compaction-retention=10m
        ##
        ## TLS client configuration for etcd, reusing the etcdctl env vars.
        # - --client-cert-auth
        # - --trusted-ca-file=$(ETCDCTL_CACERT)
        # - --cert-file=$(ETCDCTL_CERT)
        # - --key-file=$(ETCDCTL_KEY)
        ##
        ## TLS server (peer) configuration for etcd.
        ## These file paths are part of the "etcd-server-tls" volume mount.
        # - --peer-client-cert-auth
        # - --peer-trusted-ca-file=/etc/etcd/certs/server/ca.crt
        # - --peer-cert-file=/etc/etcd/certs/server/tls.crt
        # - --peer-key-file=/etc/etcd/certs/server/tls.key
        ##
        ## This is the mount configuration.
        volumeMounts:
        - name: etcd-data
          mountPath: /data
        ##
        ## TLS client configuration for etcdctl
        # - name: etcd-client-tls
        #   mountPath: "/etc/etcd/certs/client"
        #   readOnly: true
        ##
        ## TLS server configuration
        # - name: etcd-server-tls
        #   mountPath: "/etc/etcd/certs/server"
        #   readOnly: true
      volumes:
      ##
      ## TLS client configuration
      # - name: etcd-client-tls
      #   secret:
      #     secretName: etcd-client-tls
      #     optional: false
      ##
      ## TLS server configuration
      # - name: etcd-server-tls
      #   secret:
      #     secretName: etcd-server-tls
      #     optional: false
  ##
  ## This StatefulSet uses the volumeClaimTemplates field to create a PVC in
  ## the cluster for each replica. These PVCs cannot be easily resized later.
  volumeClaimTemplates:
  - metadata:
      name: etcd-data
    spec:
      accessModes: ["ReadWriteOnce"]
      ##
      ## In some clusters, it is necessary to explicitly set the storage class.
      ## This example will end up using the default storage class.
      # storageClassName: ""
      resources:
        requests:
          storage: 1Gi
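As the comments in the manifest note, growing an existing cluster takes more than raising the replica count. The sketch below is a rough, untested illustration of the fields that would change when going from 3 to 5 replicas; etcd additionally expects each new member to be registered (for example with etcdctl member add) before its pod starts, which is not shown here.

# Sketch only: fields in etcd.yaml that change when scaling from 3 to 5 replicas.
  replicas: 5
        ## New members are joining a cluster that already exists.
        - --initial-cluster-state=existing
        ## The peer list grows to cover all five members.
        - --initial-cluster=etcd-0=$(URI_SCHEME)://etcd-0.$(SERVICE_NAME):2380,etcd-1=$(URI_SCHEME)://etcd-1.$(SERVICE_NAME):2380,etcd-2=$(URI_SCHEME)://etcd-2.$(SERVICE_NAME):2380,etcd-3=$(URI_SCHEME)://etcd-3.$(SERVICE_NAME):2380,etcd-4=$(URI_SCHEME)://etcd-4.$(SERVICE_NAME):2380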

Generating Certificates

In this section, we use Helm to install an operator called cert-manager.

With cert-manager installed, self-signed certificates can be generated inside the cluster. The generated certificates are placed in Secret objects, which can be mounted as files in containers.

This is the Helm command to install cert-manager.

$ helm upgrade --install --create-namespace --namespace cert-manager cert-manager cert-manager --repo https://charts.jetstack.io --set crds.enabled=true
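Before moving on, it is worth confirming that the cert-manager pods came up, for example:

$ kubectl get pods --namespace cert-manager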

This is an example ClusterIssuer configuration for generating self-signed certificates.

# file: issuer.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned
spec:
  selfSigned: {}
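Assuming the configuration is saved to a file named issuer.yaml, it can be applied and checked like the other manifests; the READY column should report True once cert-manager has registered the issuer:

$ kubectl apply --filename issuer.yaml
$ kubectl get clusterissuer selfsigned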

This manifest creates Certificate objects for the client and server certs, referencing the ClusterIssuer “selfsigned”. The dnsNames should be an exhaustive list of valid hostnames for the certificates that cert-manager creates.

# file: certificates.yaml
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: etcd-server
  namespace: default
spec:
  secretName: etcd-server-tls
  issuerRef:
    name: selfsigned
    kind: ClusterIssuer
  commonName: etcd
  dnsNames:
  - etcd
  - etcd.default
  - etcd.default.svc.cluster.local
  - etcd-0
  - etcd-0.etcd
  - etcd-0.etcd.default
  - etcd-0.etcd.default.svc
  - etcd-0.etcd.default.svc.cluster.local
  - etcd-1
  - etcd-1.etcd
  - etcd-1.etcd.default
  - etcd-1.etcd.default.svc
  - etcd-1.etcd.default.svc.cluster.local
  - etcd-2
  - etcd-2.etcd
  - etcd-2.etcd.default
  - etcd-2.etcd.default.svc
  - etcd-2.etcd.default.svc.cluster.local
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: etcd-client
  namespace: default
spec:
  secretName: etcd-client-tls
  issuerRef:
    name: selfsigned
    kind: ClusterIssuer
  commonName: etcd
  dnsNames:
  - etcd
  - etcd.default
  - etcd.default.svc.cluster.local
  - etcd-0
  - etcd-0.etcd
  - etcd-0.etcd.default
  - etcd-0.etcd.default.svc
  - etcd-0.etcd.default.svc.cluster.local
  - etcd-1
  - etcd-1.etcd
  - etcd-1.etcd.default
  - etcd-1.etcd.default.svc
  - etcd-1.etcd.default.svc.cluster.local
  - etcd-2
  - etcd-2.etcd
  - etcd-2.etcd.default
  - etcd-2.etcd.default.svc
  - etcd-2.etcd.default.svc.cluster.local
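Assuming the manifest is saved as certificates.yaml, applying it should result in two Secrets, etcd-server-tls and etcd-client-tls, each containing the ca.crt, tls.crt, and tls.key files that the commented ## TLS sections of etcd.yaml expect to mount:

$ kubectl apply --filename certificates.yaml
$ kubectl get certificates
$ kubectl get secret etcd-server-tls etcd-client-tls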