Kafka Procedures

Intended audience: Anyone who is administering Kafka at the USDF.

Upgrading Strimzi Operator

Kafka clusters deployed with the Strimzi Phalanx app use the Strimzi version specified in the chart. The Renovate bot in the Phalanx repo updates the Strimzi version automatically. To upgrade the Strimzi operator in ArgoCD, follow the instructions below.

  1. Before upgrading Strimzi, ensure that the latest version of the operator is compatible with the Kubernetes and Kafka versions running in your cluster.

  2. After validating compatibility, Refresh the Strimzi app in Phalanx and Sync.

  3. Observe the logs for any issues with the upgrade. If the currently deployed Kafka version is not supported by the latest operator, the operator will fail to initiate a Kafka rollout and will display an error. See Upgrading Kafka for instructions.
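
The checks in steps 1 and 3 can be sketched with kubectl as shown here. This is a minimal sketch, not the USDF procedure: the function names are hypothetical, and it assumes the operator runs as a Deployment named strimzi-cluster-operator, which may differ in your deployment.

```shell
# Hypothetical helpers; names and argument order are illustrative only.

# Print the Kubernetes client/server versions and the Kafka version
# currently requested in the Kafka custom resource.
# $1: Kafka cluster name, $2: namespace of the Kafka cluster
strimzi_preupgrade_check() {
  kubectl version
  kubectl get kafka "$1" -n "$2" -o jsonpath='{.spec.kafka.version}'
  echo  # jsonpath output has no trailing newline
}

# Follow the operator logs while the Sync runs.
# Assumes the operator Deployment is named strimzi-cluster-operator.
# $1: namespace of the Strimzi operator
strimzi_operator_logs() {
  kubectl logs deployment/strimzi-cluster-operator -n "$1" --follow
}
```

For example, strimzi_preupgrade_check <Kafka cluster name> <namespace> prints the values to compare against the supported versions listed for the new operator release.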

Upgrading Kafka

The Kafka upgrade process in Phalanx is detailed below.

  1. Before upgrading Kafka, ensure that the deployed version of the Strimzi operator supports the target Kafka version and is compatible with the Kubernetes version running in your cluster.

  2. Update the values file for the environment to be upgraded, as in the example below.

    kafka:
        version: "3.9.0"
    
  3. Submit the change as a pull request to Phalanx. Once the change is merged, Refresh the ArgoCD app for your Kafka instance in Phalanx and Sync.
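
After the Sync, the rollout can be followed from the command line. The sketch below is a hedged example with hypothetical function names; it assumes the deployed Strimzi version reports the running Kafka version in status.kafkaVersion of the Kafka resource and labels its pods with strimzi.io/cluster.

```shell
# Hypothetical helpers; $1 is the Kafka cluster name, $2 the namespace.

# Watch the pods of the cluster as the operator rolls them.
kafka_watch_pods() {
  kubectl get pods -n "$2" -l "strimzi.io/cluster=$1" --watch
}

# Print the Kafka version the operator reports as currently running.
# Assumes the status.kafkaVersion field is populated.
kafka_running_version() {
  kubectl get kafka "$1" -n "$2" -o jsonpath='{.status.kafkaVersion}'
  echo  # jsonpath output has no trailing newline
}
```

Once all pods have restarted, kafka_running_version <Kafka cluster name> <namespace> should print the version set in the values file.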

Restart Kafka

A rolling update is used to restart Kafka. To restart, follow the instructions here to add an annotation that triggers a rolling restart. If the rolling update does not work, the pods can be deleted. The PVCs will not be deleted, and the StrimziPodSet will recreate the pods with the same PVCs.
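
A rolling restart with an annotation might look like the sketch below. It assumes the strimzi.io/manual-rolling-update annotation that Strimzi documents for StrimziPodSet resources; the function names are hypothetical, and the StrimziPodSet and pod names depend on how the node pools of the cluster are named. The operator acts on the annotation at its next reconciliation, so the restart may not begin immediately.

```shell
# Hypothetical helpers; $1 is the resource name, $2 the namespace.

# Request a rolling restart of every pod managed by a StrimziPodSet.
kafka_rolling_restart() {
  kubectl annotate strimzipodset "$1" strimzi.io/manual-rolling-update="true" -n "$2"
}

# Fallback: delete a single pod. The StrimziPodSet recreates it
# and the pod reattaches to its existing PVC.
kafka_delete_pod() {
  kubectl delete pod "$1" -n "$2"
}
```

Use kubectl get strimzipodset -n <namespace> to find the resource names before calling either helper.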

Shutdown Kafka Gracefully

Kafka can be shut down gracefully if needed by following the steps below. Replace the Kafka cluster name and namespace placeholders with your own values.

  1. Pause reconciliation of Strimzi resources. This will prevent the operator from restarting the pods after they are deleted.

    kubectl annotate --overwrite Kafka <replace with Kafka cluster name> strimzi.io/pause-reconciliation="true" -n <replace with namespace>
    
  2. Terminate the Kafka Controller and Broker Pods.

    kubectl delete StrimziPodSet <replace with Kafka cluster name>-controller <replace with Kafka cluster name> -n <replace with namespace>
    
  3. After the intervention, resume reconciliation of Strimzi resources. This will trigger the operator to start the Pods again.

    kubectl annotate --overwrite Kafka <replace with Kafka cluster name> strimzi.io/pause-reconciliation="false" -n <replace with namespace>
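
Between steps 1 and 2 it can be useful to confirm that the pause has taken effect before deleting anything. The following is a minimal sketch, assuming the operator reports a ReconciliationPaused condition in the status of the Kafka resource while reconciliation is paused; the function name is hypothetical.

```shell
# Hypothetical helper; $1 is the Kafka cluster name, $2 the namespace.
# Succeeds (exit status 0) only if the Kafka resource reports a
# ReconciliationPaused condition whose status is True.
kafka_is_paused() {
  status="$(kubectl get kafka "$1" -n "$2" \
    -o jsonpath='{.status.conditions[?(@.type=="ReconciliationPaused")].status}')"
  [ "$status" = "True" ]
}
```

For example, kafka_is_paused <Kafka cluster name> <namespace> && echo "paused" prints paused only once the operator has acknowledged the annotation.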
    

Add or Remove Kafka Cluster to Strimzi Operator

Each time a Kafka instance is added or removed, the watchNamespaces configuration in Strimzi should be updated. The example below is from the S3-File-Notifications Phalanx Strimzi app. Follow the normal Phalanx and ArgoCD process to apply the change: submit a pull request, then Refresh and Sync.

watchNamespaces:
    - "prompt-kafka"
    - "s3-file-notifications"
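
To see which namespaces currently contain Kafka resources, and so should appear in watchNamespaces, something like the following can help. It is a sketch that assumes your account is allowed to list Kafka resources across all namespaces; the function name is hypothetical.

```shell
# Hypothetical helper: print each namespace that holds at least one
# Kafka custom resource, one per line, sorted and without duplicates.
kafka_namespaces() {
  kubectl get kafka --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"\n"}{end}' | sort -u
}
```

Comparing this output with the watchNamespaces list shows whether an entry is missing or stale.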

Configuring Kafka Networking

A load balancer needs to be configured if a Kafka cluster needs to be accessible from outside the vCluster. The service type needs to be changed to loadbalancer, and allocateLoadBalancerNodePorts needs to be set to false for security. The example below uses the Strimzi Helm chart and Phalanx. Note that if loadbalancer services were already provisioned with allocateLoadBalancerNodePorts set to true, the services need to be deleted to remove the node ports.

- name: external
  type: loadbalancer
  configuration:
    allocateLoadBalancerNodePorts: false
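
To check whether already provisioned loadbalancer services still hold node ports, the services can be inspected with a JSONPath query. This is a minimal sketch with a hypothetical function name; a service without node ports prints only its name.

```shell
# Hypothetical helper; $1 is the namespace of the Kafka cluster.
# Prints each LoadBalancer service followed by any node ports it holds.
kafka_lb_nodeports() {
  kubectl get services -n "$1" \
    -o jsonpath='{range .items[?(@.spec.type=="LoadBalancer")]}{.metadata.name}{"\t"}{.spec.ports[*].nodePort}{"\n"}{end}'
}
```

Any service that still lists node ports can then be deleted with kubectl delete service <service name> -n <namespace>.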

An address pool has to be assigned to the loadbalancer service. The sdf-rubin-ingest address pool is used for services that should be accessible inside S3DF only. When a Kafka cluster is provisioned with loadbalancer services, IP addresses are assigned to them. Obtain these IPs with kubectl get services -n <replace with namespace of cluster>. Add the metallb.io/loadBalancerIPs annotation to the Helm values file in Phalanx for the bootstrap and the brokers, then deploy. An example is shown below. Note that this may differ if you are not using Phalanx.

externalListener:
  bootstrap:
    annotations:
      metallb.io/address-pool: sdf-rubin-ingest
      metallb.io/loadBalancerIPs: xxx.xxx.xxx.xxx
  brokers:
    - broker: 0
      annotations:
        metallb.io/address-pool: sdf-rubin-ingest
        metallb.io/loadBalancerIPs: xxx.xxx.xxx.xxx
    - broker: 1
      annotations:
        metallb.io/address-pool: sdf-rubin-ingest
        metallb.io/loadBalancerIPs: xxx.xxx.xxx.xxx
    - broker: 2
      annotations:
        metallb.io/address-pool: sdf-rubin-ingest
        metallb.io/loadBalancerIPs: xxx.xxx.xxx.xxx
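
The assigned addresses can also be extracted directly rather than read from the kubectl get services table. This is a sketch with a hypothetical function name; it prints each LoadBalancer service next to the first IP listed in its status, which is the value to copy into the matching metallb.io/loadBalancerIPs annotation.

```shell
# Hypothetical helper; $1 is the namespace of the Kafka cluster.
# Prints "<service name><TAB><first assigned IP>" for each LoadBalancer service.
kafka_lb_ips() {
  kubectl get services -n "$1" \
    -o jsonpath='{range .items[?(@.spec.type=="LoadBalancer")]}{.metadata.name}{"\t"}{.status.loadBalancer.ingress[0].ip}{"\n"}{end}'
}
```

A service whose IP has not been assigned yet prints its name with an empty second column.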

The sdf-dmz address pool is used for services that need to be accessible outside the USDF. Services need approval before using the sdf-dmz address pool. Open a ServiceNow ticket to submit a DMZ Services Cyber Exemption Request.

Once approved, configure the load balancer. When a Kafka cluster is provisioned with loadbalancer services, IP addresses are assigned to them. Obtain these IPs with kubectl get services -n <replace with namespace of cluster>. Add the metallb.io/loadBalancerIPs annotation to the Helm values file in Phalanx for the bootstrap and the brokers, then deploy. An example is shown below. Note that this may differ if you are not using Phalanx.

externalListener:
  bootstrap:
    annotations:
      metallb.io/address-pool: sdf-dmz
      metallb.io/loadBalancerIPs: xxx.xxx.xxx.xxx
  brokers:
    - broker: 0
      annotations:
        metallb.io/address-pool: sdf-dmz
        metallb.io/loadBalancerIPs: xxx.xxx.xxx.xxx
    - broker: 1
      annotations:
        metallb.io/address-pool: sdf-dmz
        metallb.io/loadBalancerIPs: xxx.xxx.xxx.xxx
    - broker: 2
      annotations:
        metallb.io/address-pool: sdf-dmz
        metallb.io/loadBalancerIPs: xxx.xxx.xxx.xxx