Wednesday, July 02, 2025

🧭 Our Kafka Kubernetes Setup Journey

From CrashLoops to Clarity: A Kafka Story with Punchlines and Pods


🕒 The Timeline (Kafka Chronicles)

  • Day 1: 

    • Installed Lens, deployed initial YAMLs, faced mysterious pod crash loops with no logs — panic set in.
    • Started brute-force debugging using kubectl, then switched to manual container testing via ctr.
  • Day 2: 

    • Narrowed down the issue to Bitnami image defaults, fought KRaft mode, fixed image version.
    • Built dynamic environment variable injection for broker.id, node.id, and advertised listeners.
  • Day 3: 

    • Introduced authentication via secrets, tested producers/consumers.
    • Added REST proxy, tested with cluster ID and REST API.
    • Finalized all YAML files and ran a celebratory kafka-topics.sh --create like champions.

🛫 Phase 0: The Glorious (and Naive) Start

Goal: Spin up a 3-node Apache Kafka cluster using Kubernetes (via Lens), Zookeeper-based (no KRaft), no PVCs, exposed via services and REST proxy.

Initial Stack:

  • Bitnami Kafka Docker image

  • Bitnami Zookeeper image

  • Kubernetes Lens

  • REST Proxy image (confluentinc)

  • Secrets via Kubernetes

Expectation: Easy.

Reality:

"Back-off restarting failed container kafka"

That one line, with zero log output behind it, haunted us for hours. It forced us to roll up our sleeves, dive into the container runtime, and take control like mad engineers on a runaway train.
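
For anyone hitting the same wall, this was our first round of triage before dropping down to the container runtime (pod name and namespace assumed):

# Surface pod events: image pull problems, OOM kills, failing probes
kubectl -n kafka-server describe pod kafka-0

# Logs from the previous (crashed) container instance, if it wrote any
kubectl -n kafka-server logs kafka-0 --previous

# Exit code and reason of the last crash
kubectl -n kafka-server get pod kafka-0 \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'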


๐Ÿ” Phase 1: Let the Logs Speak (and Yell)

Kafka wasn't happy. Errors included:

  • Kafka requires at least one process role to be set

  • KRaft mode requires a unique node.id

  • KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: unbound variable

🤦 Root Causes:

  • Bitnami defaults to KRaft, even if we want Zookeeper.

  • KAFKA_ENABLE_KRAFT=no was ignored in early attempts.

  • We forgot to set KAFKA_CFG_NODE_ID and KAFKA_CFG_PROCESS_ROLES.

We started building dynamic broker/node IDs using this trick:

ordinal=${HOSTNAME##*-}
export KAFKA_CFG_NODE_ID=$ordinal
export KAFKA_BROKER_ID=$ordinal
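
The ${HOSTNAME##*-} expansion strips everything up to the last dash, which is exactly the ordinal a StatefulSet appends to each pod name:

# In pod kafka-2, Kubernetes sets HOSTNAME=kafka-2
HOSTNAME=kafka-2
echo "${HOSTNAME##*-}"   # prints: 2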

๐Ÿ” Phase 2: Debugging the Bitnami Beast

We went full bare-metal using ctr to run Kafka containers manually with all debug flags:

ctr --debug run --rm -t \
  --hostname kafka-test-0 \
  --label name=kafka-test \
  --env BITNAMI_DEBUG=true \
  --env KAFKA_LOG4J_ROOT_LOGLEVEL=DEBUG \
  --env KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181 \
  --env KAFKA_BROKER_ID=0 \
  --env ALLOW_PLAINTEXT_LISTENER=yes \
  --env KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT \
  --env KAFKA_CFG_PROCESS_ROLES=broker \
  --env KAFKA_ENABLE_KRAFT=false \
  --env KAFKA_CLUSTER_ID=kafka-test-cluster-1234 \
  --env KAFKA_CFG_NODE_ID=0 \
  docker.io/bitnami/kafka:latest kafka-test
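
One ctr gotcha: unlike docker run, it won't pull the image for you, so fetch it into containerd's store first:

ctr images pull docker.io/bitnami/kafka:latest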

Based on the errors it generated, we realized that no matter what we did, KRaft stayed enabled, so we needed an older bitnami/kafka image. The Bitnami Helm chart docs list the Kafka version behind each image tag (scroll down past the flags), which pointed us to kafka:3.9.0-debian-12-r13.

✅ Switching to Kafka 3.9.0 fixed the issue: Kafka 4.0 enforces KRaft and drops Zookeeper compatibility.
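
If you want to double-check what a given tag actually ships before deploying it, the image can tell you itself (assuming Docker locally; the same works through ctr):

docker run --rm bitnami/kafka:3.9.0-debian-12-r13 kafka-topics.sh --version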


๐Ÿ” Phase 3: Securing It with Secrets (and Sanity)

We created a Kubernetes Secret with the following mapping:

User      Password
producer  producer@kafkaserver
consumer  consumer@kafkaserver

apiVersion: v1
kind: Secret
metadata:
  name: kafka-auth-secret
  namespace: kafka-server
  labels:
    app: kafka
    component: auth
type: Opaque
data:
  KAFKA_CLIENT_USERS: cHJvZHVjZXIsY29uc3VtZXI=
  KAFKA_CLIENT_PASSWORDS: cHJvZHVjZXJAa2Fma2FzZXJ2ZXIsY29uc3VtZXJAa2Fma2FzZXJ2ZXI=

Then injected into the StatefulSet via env.valueFrom.secretKeyRef.

✅ To generate the data values for KAFKA_CLIENT_USERS and KAFKA_CLIENT_PASSWORDS, use the base64 command:

echo -n 'producer@kafkaserver,consumer@kafkaserver' | base64

echo -n 'producer,consumer' | base64 
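
Alternatively, kubectl can do the encoding for you; this sketch produces an equivalent Secret (minus the labels):

kubectl -n kafka-server create secret generic kafka-auth-secret \
  --from-literal=KAFKA_CLIENT_USERS='producer,consumer' \
  --from-literal=KAFKA_CLIENT_PASSWORDS='producer@kafkaserver,consumer@kafkaserver'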


🧪 Phase 4: Testing Inside the Pod

To validate authentication, we ran:

Producer

kafka-console-producer.sh \
  --broker-list kafka-0.kafka.kafka-server.svc.cluster.local:9092 \
  --topic auth-test-topic \
  --producer.config <(echo "...with producer JAAS config...")

Consumer

kafka-console-consumer.sh \
  --bootstrap-server kafka-0.kafka.kafka-server.svc.cluster.local:9092 \
  --topic auth-test-topic \
  --from-beginning \
  --consumer.config <(echo "...with consumer JAAS config...")
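
The elided configs above are plain Kafka client properties files. A minimal sketch of the producer side, assuming SASL/PLAIN is enabled on the listener and using the credentials from our Secret (the consumer file is identical apart from the username and password):

# Hypothetical client config for the 'producer' user
cat > /tmp/producer.properties <<'EOF'
security.protocol=SASL_PLAINTEXT
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
  username="producer" \
  password="producer@kafkaserver";
EOF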

๐ŸŒ Phase 5: REST Proxy Integration

We deployed Kafka REST Proxy and queried the cluster ID:

curl http://kafka-rest-proxy.kafka-server.svc.cluster.local:8082/v3/clusters
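
The v3 response wraps everything in a data array; the cluster_id inside it is what the topic endpoint needs. With jq available, grabbing it looks like:

CLUSTER_ID=$(curl -s http://kafka-rest-proxy.kafka-server.svc.cluster.local:8082/v3/clusters \
  | jq -r '.data[0].cluster_id')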

Then we created a topic:

curl -X POST -H "Content-Type: application/vnd.kafka.v3+json" \
  --data '{"topic_name": "test-topic", "partitions": 1, "replication_factor": 1}' \
  http://kafka-rest-proxy.kafka-server.svc.cluster.local:8082/v3/clusters/<cluster-id>/topics
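
And to confirm the topic actually landed:

curl http://kafka-rest-proxy.kafka-server.svc.cluster.local:8082/v3/clusters/<cluster-id>/topics/test-topic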

It worked. Eventually.


🧠 Technical Findings

  • Bitnami Kafka 4.0+ forces KRaft — use 3.9.0 for Zookeeper.

  • Set KAFKA_ENABLE_KRAFT=no, but also avoid any KRaft-only vars.

  • Use ordinal to dynamically assign broker.id and node.id.

  • Zookeeper must be deployed before Kafka.

  • REST Proxy works fine once the cluster ID is fetched and the correct Content-Type is used.

  • The Back-off restarting failed container error with no logs can often mean missing required env vars.


✅ Final Working YAMLs

Kafka StatefulSet (3 Nodes)

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
  namespace: kafka-server
spec:
  serviceName: kafka
  replicas: 3
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - name: kafka
          image: bitnami/kafka:3.9.0-debian-12-r13
          ports:
            - containerPort: 9092
          env:
            - name: KAFKA_CFG_ZOOKEEPER_CONNECT
              value: "zookeeper.kafka-server.svc.cluster.local:2181"
            - name: KAFKA_CFG_LISTENERS
              value: "PLAINTEXT://:9092"
            - name: KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP
              value: "PLAINTEXT:PLAINTEXT"
            - name: KAFKA_CFG_INTER_BROKER_LISTENER_NAME
              value: "PLAINTEXT"
            - name: KAFKA_ENABLE_KRAFT
              value: "no"
            - name: ALLOW_PLAINTEXT_LISTENER
              value: "yes"
            - name: KAFKA_CLIENT_USERS
              valueFrom:
                secretKeyRef:
                  name: kafka-auth-secret
                  key: KAFKA_CLIENT_USERS
            - name: KAFKA_CLIENT_PASSWORDS
              valueFrom:
                secretKeyRef:
                  name: kafka-auth-secret
                  key: KAFKA_CLIENT_PASSWORDS
            - name: BITNAMI_DEBUG
              value: "true"
            - name: KAFKA_LOG4J_ROOT_LOGLEVEL
              value: "DEBUG"
          command:
            - bash
            - -c
            - |
              ordinal=${HOSTNAME##*-}
              export KAFKA_CFG_NODE_ID=$ordinal
              export KAFKA_BROKER_ID=$ordinal
              export KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://${HOSTNAME}.kafka.kafka-server.svc.cluster.local:9092
              exec /opt/bitnami/scripts/kafka/entrypoint.sh /opt/bitnami/scripts/kafka/run.sh

✅ For production, make sure to remove the DEBUG-related configuration (BITNAMI_DEBUG and KAFKA_LOG4J_ROOT_LOGLEVEL).

Zookeeper StatefulSet

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: zookeeper
  namespace: kafka-server
spec:
  serviceName: zookeeper
  replicas: 1
  selector:
    matchLabels:
      app: zookeeper
  template:
    metadata:
      labels:
        app: zookeeper
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - name: zookeeper
          image: bitnami/zookeeper:latest
          ports:
            - containerPort: 2181
              name: client
          env:
            - name: ALLOW_ANONYMOUS_LOGIN
              value: "yes"

REST Proxy Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-rest-proxy
  namespace: kafka-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kafka-rest-proxy
  template:
    metadata:
      labels:
        app: kafka-rest-proxy
    spec:
      containers:
        - name: rest-proxy
          image: confluentinc/cp-kafka-rest:7.4.0
          ports:
            - containerPort: 8082
          env:
            - name: KAFKA_REST_BOOTSTRAP_SERVERS
              value: PLAINTEXT://kafka.kafka-server.svc.cluster.local:9092
            - name: KAFKA_REST_HOST_NAME
              value: kafka-rest-proxy

Services

---
apiVersion: v1
kind: Service
metadata:
  name: kafka
  namespace: kafka-server
spec:
  clusterIP: None
  selector:
    app: kafka
  ports:
    - name: kafka
      port: 9092
      targetPort: 9092
---
apiVersion: v1
kind: Service
metadata:
  name: zookeeper
  namespace: kafka-server
spec:
  clusterIP: None
  selector:
    app: zookeeper
  ports:
    - port: 2181
      targetPort: 2181
      name: client
---
apiVersion: v1
kind: Service
metadata:
  name: kafka-rest-proxy
  namespace: kafka-server
spec:
  type: ClusterIP
  selector:
    app: kafka-rest-proxy
  ports:
    - port: 8082
      targetPort: 8082
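
For completeness, this is the order we applied everything in (filenames are ours; remember Zookeeper must be up before Kafka):

kubectl apply -f services.yaml
kubectl apply -f zookeeper.yaml
kubectl -n kafka-server rollout status statefulset/zookeeper

kubectl apply -f kafka.yaml
kubectl -n kafka-server rollout status statefulset/kafka

kubectl apply -f rest-proxy.yaml
kubectl -n kafka-server rollout status deployment/kafka-rest-proxy

# The celebratory moment (topic name is just an example):
kubectl -n kafka-server exec -it kafka-0 -- kafka-topics.sh \
  --bootstrap-server localhost:9092 \
  --create --topic victory --partitions 3 --replication-factor 3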