Wednesday, July 02, 2025

🧭 Our Kafka Kubernetes Setup Journey

From CrashLoops to Clarity: A Kafka Story with Punchlines and Pods


🕒 The Timeline (Kafka Chronicles)

  • Day 1: 

    • Installed Lens, deployed initial YAMLs, faced mysterious pod crash loops with no logs — panic set in.
    • Started brute-force debugging using kubectl, then switched to manual container testing via ctr.
  • Day 2: 

    • Narrowed down the issue to Bitnami image defaults, fought KRaft mode, fixed image version.
    • Built dynamic environment variable injection for broker.id, node.id, and advertised listeners.
  • Day 3: 

    • Introduced authentication via secrets, tested producers/consumers.
    • Added REST proxy, tested with cluster ID and REST API.
    • Finalized all YAML files and ran celebratory kafka-topics.sh --create like champions.

🛫 Phase 0: The Glorious (and Naive) Start

Goal: Spin up a 3-node Apache Kafka cluster using Kubernetes (via Lens), Zookeeper-based (no KRaft), no PVCs, exposed via services and REST proxy.

Initial Stack:

  • Bitnami Kafka Docker image

  • Bitnami Zookeeper image

  • Kubernetes Lens

  • REST Proxy image (confluentinc)

  • Secrets via Kubernetes

Expectation: Easy.

Reality:

"Back-off restarting failed container kafka"

That one line haunted us for hours, with no log output to explain it. It forced us to roll up our sleeves, dive into the container runtime, and take control like mad engineers on a runaway train.


🔍 Phase 1: Let the Logs Speak (and Yell)

Kafka wasn't happy. Errors included:

  • Kafka requires at least one process role to be set

  • KRaft mode requires a unique node.id

  • KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: unbound variable

🤦 Root Causes:

  • Bitnami defaults to KRaft, even if we want Zookeeper.

  • KAFKA_ENABLE_KRAFT=no was ignored in early attempts.

  • Forgot to set KAFKA_CFG_NODE_ID and KAFKA_CFG_PROCESS_ROLES in the early manifests.

We started building dynamic broker/node IDs using this trick:

ordinal=${HOSTNAME##*-}
export KAFKA_CFG_NODE_ID=$ordinal
export KAFKA_BROKER_ID=$ordinal
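This works because a StatefulSet names its pods with a trailing replica ordinal (kafka-0, kafka-1, kafka-2), and the ${HOSTNAME##*-} expansion strips everything up to the last dash. A quick sanity check in any shell, using a hypothetical hostname:

# hypothetical hostname, the way a StatefulSet would assign it
HOSTNAME=kafka-2
ordinal=${HOSTNAME##*-}
echo "broker.id / node.id would be: ${ordinal}"   # prints 2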

🔍 Phase 2: Debugging the Bitnami Beast

We went down to the container runtime itself, using ctr to run the Kafka container manually with every debug flag enabled:

ctr --debug run --rm -t \
  --hostname kafka-test-0 \
  --label name=kafka-test \
  --env BITNAMI_DEBUG=true \
  --env KAFKA_LOG4J_ROOT_LOGLEVEL=DEBUG \
  --env KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181 \
  --env KAFKA_BROKER_ID=0 \
  --env ALLOW_PLAINTEXT_LISTENER=yes \
  --env KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT \
  --env KAFKA_CFG_PROCESS_ROLES=broker \
  --env KAFKA_ENABLE_KRAFT=false \
  --env KAFKA_CLUSTER_ID=kafka-test-cluster-1234 \
  --env KAFKA_CFG_NODE_ID=0 \
  docker.io/bitnami/kafka:latest kafka-test
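One gotcha with ctr: unlike docker run, it does not pull images on demand, so if the command above fails with an image not found error, the image has to be pulled into containerd's store first. Something along these lines should do it (on a Kubernetes node you may also need the -n k8s.io namespace flag):

ctr image pull docker.io/bitnami/kafka:latest

# confirm the image landed in the local store
ctr image ls | grep bitnami/kafka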

Based on the generated errors, we realized that no matter what we set, KRaft was always enabled. The fix was to pin an older bitnami/kafka image: the Helm chart documentation lists the Kafka version behind each image tag (scroll down past the flags), and that pointed us to kafka:3.9.0-debian-12-r13.

✅ Switching to Kafka 3.9.0 fixed the issue; Kafka 4.0 enforces KRaft and drops Zookeeper compatibility.
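For reference, one way to see which Kafka version hides behind a given image tag, assuming the Bitnami Helm repository is added locally, is to inspect the chart's default values:

# add the Bitnami repo once, then look at the default image tag in the chart values
helm repo add bitnami https://charts.bitnami.com/bitnami
helm show values bitnami/kafka | grep -A 3 'image:'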


🔐 Phase 3: Securing It with Secrets (and Sanity)

We created a Kubernetes Secret with the following mapping:

User      Password
producer  producer@kafkaserver
consumer  consumer@kafkaserver

apiVersion: v1
kind: Secret
metadata:
  name: kafka-auth-secret
  namespace: kafka-server
  labels:
    app: kafka
    component: auth
type: Opaque
data:
  KAFKA_CLIENT_USERS: cHJvZHVjZXIsY29uc3VtZXI=
  KAFKA_CLIENT_PASSWORDS: cHJvZHVjZXJAa2Fma2FzZXJ2ZXIsY29uc3VtZXJAa2Fma2FzZXJ2ZXI=

Then injected into the StatefulSet via env.valueFrom.secretKeyRef.

 ✅ To generate the data values for KAFKA_CLIENT_USERS and KAFKA_CLIENT_PASSWORDS, run the comma-separated users and passwords through the base64 command:

echo -n 'producer,consumer' | base64

echo -n 'producer@kafkaserver,consumer@kafkaserver' | base64
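Alternatively, kubectl can build the same Secret and handle the base64 encoding for you; a minimal sketch, assuming the kafka-server namespace already exists (the app/component labels from the YAML above can be added afterwards with kubectl label):

kubectl create secret generic kafka-auth-secret \
  --namespace kafka-server \
  --from-literal=KAFKA_CLIENT_USERS='producer,consumer' \
  --from-literal=KAFKA_CLIENT_PASSWORDS='producer@kafkaserver,consumer@kafkaserver'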


🧪 Phase 4: Testing Inside the Pod

To validate authentication, we ran:

Producer

kafka-console-producer.sh \
  --broker-list kafka-0.kafka.kafka-server.svc.cluster.local:9092 \
  --topic auth-test-topic \
  --producer.config <(echo "...with producer JAAS config...")

Consumer

kafka-console-consumer.sh \
  --bootstrap-server kafka-0.kafka.kafka-server.svc.cluster.local:9092 \
  --topic auth-test-topic \
  --from-beginning \
  --consumer.config <(echo "...with consumer JAAS config...")
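The actual client configs are elided above; if the client listener is switched to SASL/PLAIN, a producer config would look roughly like the sketch below. The file path and the SASL_PLAINTEXT protocol are assumptions for illustration, not taken from the final YAML (which keeps the listener on PLAINTEXT):

# hypothetical /tmp/producer.properties for a SASL/PLAIN producer test
cat > /tmp/producer.properties <<'EOF'
security.protocol=SASL_PLAINTEXT
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
  username="producer" \
  password="producer@kafkaserver";
EOF

kafka-console-producer.sh \
  --broker-list kafka-0.kafka.kafka-server.svc.cluster.local:9092 \
  --topic auth-test-topic \
  --producer.config /tmp/producer.properties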

🌐 Phase 5: REST Proxy Integration

We deployed Kafka REST Proxy and queried the cluster ID:

curl http://kafka-rest-proxy.kafka-server.svc.cluster.local:8082/v3/clusters

Then used:

curl -X POST -H "Content-Type: application/vnd.kafka.v3+json" \
  --data '{"topic_name": "test-topic", "partitions": 1, "replication_factor": 1}' \
  http://kafka-rest-proxy.kafka-server.svc.cluster.local:8082/v3/clusters/<cluster-id>/topics
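To avoid copying the <cluster-id> by hand, it can be pulled out of the first call with something like this (assuming jq is available wherever curl is being run):

CLUSTER_ID=$(curl -s http://kafka-rest-proxy.kafka-server.svc.cluster.local:8082/v3/clusters \
  | jq -r '.data[0].cluster_id')
echo "${CLUSTER_ID}"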

It worked -- eventually.


🧠 Technical Findings

  • Bitnami Kafka 4.0+ forces KRaft — use 3.9.0 for Zookeeper.

  • Set KAFKA_ENABLE_KRAFT=no, but also avoid any KRaft-only vars.

  • Use the StatefulSet pod ordinal to dynamically assign broker.id and node.id.

  • Zookeeper must be deployed before Kafka.

  • REST Proxy works fine once cluster ID is fetched and correct Content-Type is used.

  • The Back-off restarting failed container error with no logs often means a required env var is missing (a quick triage sketch follows below).
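When that error shows up and kubectl logs prints nothing, the commands below are the quickest first pass; the pod name is just an example:

# the Events section at the bottom of describe usually names the real failure
kubectl describe pod kafka-0 -n kafka-server

# logs from the previous (crashed) container instance, if it managed to write any
kubectl logs kafka-0 -n kafka-server --previous

# watch the restart counter climb in real time
kubectl get pods -n kafka-server -w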


✅ Final Working YAMLs

Kafka StatefulSet (3 Nodes)

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
  namespace: kafka-server
spec:
  serviceName: kafka
  replicas: 3
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - name: kafka
          image: bitnami/kafka:3.9.0-debian-12-r13
          ports:
            - containerPort: 9092
          env:
            - name: KAFKA_CFG_ZOOKEEPER_CONNECT
              value: "zookeeper.kafka-server.svc.cluster.local:2181"
            - name: KAFKA_CFG_LISTENERS
              value: "PLAINTEXT://:9092"
            - name: KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP
              value: "PLAINTEXT:PLAINTEXT"
            - name: KAFKA_CFG_INTER_BROKER_LISTENER_NAME
              value: "PLAINTEXT"
            - name: KAFKA_ENABLE_KRAFT
              value: "no"
            - name: ALLOW_PLAINTEXT_LISTENER
              value: "yes"
            - name: KAFKA_CLIENT_USERS
              valueFrom:
                secretKeyRef:
                  name: kafka-auth-secret
                  key: KAFKA_CLIENT_USERS
            - name: KAFKA_CLIENT_PASSWORDS
              valueFrom:
                secretKeyRef:
                  name: kafka-auth-secret
                  key: KAFKA_CLIENT_PASSWORDS
            - name: BITNAMI_DEBUG
              value: "true"
            - name: KAFKA_LOG4J_ROOT_LOGLEVEL
              value: "DEBUG"
          command:
            - bash
            - -c
            - |
              ordinal=${HOSTNAME##*-}
              export KAFKA_CFG_NODE_ID=$ordinal
              export KAFKA_BROKER_ID=$ordinal
              export KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://${HOSTNAME}.kafka.kafka-server.svc.cluster.local:9092
              exec /opt/bitnami/scripts/kafka/entrypoint.sh /opt/bitnami/scripts/kafka/run.sh

✅ For production, make sure to remove the DEBUG-related configuration (BITNAMI_DEBUG and KAFKA_LOG4J_ROOT_LOGLEVEL).

Zookeeper StatefulSet

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: zookeeper
  namespace: kafka-server
spec:
  serviceName: zookeeper
  replicas: 1
  selector:
    matchLabels:
      app: zookeeper
  template:
    metadata:
      labels:
        app: zookeeper
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - name: zookeeper
          image: bitnami/zookeeper:latest
          ports:
            - containerPort: 2181
              name: client
          env:
            - name: ALLOW_ANONYMOUS_LOGIN
              value: "yes"

REST Proxy Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-rest-proxy
  namespace: kafka-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kafka-rest-proxy
  template:
    metadata:
      labels:
        app: kafka-rest-proxy
    spec:
      containers:
        - name: rest-proxy
          image: confluentinc/cp-kafka-rest:7.4.0
          ports:
            - containerPort: 8082
          env:
            - name: KAFKA_REST_BOOTSTRAP_SERVERS
              value: PLAINTEXT://kafka.kafka-server.svc.cluster.local:9092
            - name: KAFKA_REST_HOST_NAME
              value: kafka-rest-proxy

Services

---
apiVersion: v1
kind: Service
metadata:
  name: kafka
  namespace: kafka-server
spec:
  clusterIP: None
  selector:
    app: kafka
  ports:
    - name: kafka
      port: 9092
      targetPort: 9092
---
apiVersion: v1
kind: Service
metadata:
  name: zookeeper
  namespace: kafka-server
spec:
  clusterIP: None
  selector:
    app: zookeeper
  ports:
    - port: 2181
      targetPort: 2181
      name: client
---
apiVersion: v1
kind: Service
metadata:
  name: kafka-rest-proxy
  namespace: kafka-server
spec:
  type: ClusterIP
  selector:
    app: kafka-rest-proxy
  ports:
    - port: 8082
      targetPort: 8082
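With the manifests above saved to files, the whole stack comes up with a handful of commands; the file names are placeholders, and the final kafka-topics.sh call is the celebratory topic creation from the timeline (topic name is hypothetical):

# order matters: namespace, secret and Zookeeper first, then Kafka, then the REST proxy
kubectl create namespace kafka-server
kubectl apply -f kafka-auth-secret.yaml -f zookeeper.yaml -f services.yaml
kubectl apply -f kafka-statefulset.yaml -f kafka-rest-proxy.yaml

# wait for the brokers to go Ready, then create a topic from inside one of them
kubectl get pods -n kafka-server -w
kubectl exec -it kafka-0 -n kafka-server -- kafka-topics.sh \
  --create --topic celebration-topic \
  --bootstrap-server kafka-0.kafka.kafka-server.svc.cluster.local:9092 \
  --partitions 3 --replication-factor 3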

Sunday, July 11, 2021

git_find_big Contrib Back

I am contributing back some modifications as my way of thanking Antony Stubbs for the git_find_big.sh script.

As per Antony Stubbs' recommendations, I have added some documentation for the changes I made, along with a few more changes, so this should be the latest version.

#!/bin/bash
#set -x

# Shows you the largest objects in your repo's pack file.
# Written for osx.
#
# @see https://stubbisms.wordpress.com/2009/07/10/git-script-to-show-largest-pack-objects-and-trim-your-waist-line/
# @author Antony Stubbs

# Did some modifications on the script - 08-July-2021 @author Khamis Siksek
# [KS] changed the size to kilobytes
# [KS] added KB=1024 constant to be used later in size calculations
# [KS] changed " to ' where applicable
# [KS] added a check for the pack file if it exists or not
# [KS] made the number of returned big files a parameter that can be passed in
# [KS] used topBigFilesNo=10 as the default value if not passed
# [KS] changed `command` to $(command) where applicable
# [KS] put the output in formattedOutput and echo that variable
# [KS] added exit 0 in case of success and exit -1 in case of an error
# [KS] packFile might hold multiple idx files, that's why I used $(echo ${packFile}) in verify-pack
# [KS] added a check on the size and compressedSize since if they are too small they will show wrong output
# [KS] changed the variable "y" to "object" to make it more readable
# [KS] enclosed all variables with {} wherever applicable
# [KS] changed sort to regular sort instead of reverse and used tail instead of head
# [KS] added more types to grep -v in objects (was only chain, now it contains commit and tree)
# [KS] added an informative message for the user that this may take a few minutes

# make the number of returned big files configurable; it can be passed as a parameter
topBigFilesNo=${1};
[[ -z "${1}" ]] && topBigFilesNo=10;

# check if the pack file exists or not
packFile=$(ls -1S .git/objects/pack/pack-*.idx 2> /dev/null);
[[ $? != 0 ]] && echo "index pack file(s) in .git do not exist" && exit -1;

# informative message for the user
echo 'This may take a few seconds (or minutes) depending on the size of the repository, please wait ...';

objects=$(git verify-pack -v $(echo "${packFile}") | grep -v 'chain\|commit\|tree' | sort -k3n | tail -"${topBigFilesNo}");

# as they are big files it's more reasonable to show the size in KiB
echo 'All sizes are in KiBs. The pack column is the size of the object, compressed, inside the pack file.';

# constant
KB=1024;

# set the internal field separator to line break, to iterate easily over the verify-pack output
IFS=$'\n';

# preparing the header of the output
output='Size,Pack,SHA,Location';

# loop goes through the objects to check their sizes
for object in $objects
do
    # extract the size in kilobytes
    size=$(echo ${object} | cut -f 5 -d ' ');
    [[ ! -z ${size} ]] && size=$((${size}/${KB})) || size=0;

    # extract the compressed size in kilobytes
    compressedSize=$(echo ${object} | cut -f 6 -d ' ');
    [[ ! -z ${compressedSize} ]] && compressedSize=$((${compressedSize}/${KB})) || compressedSize=0;

    # extract the SHA
    sha=$(echo ${object} | cut -f 1 -d ' ');

    # find the object's location in the repository tree
    other=$(git rev-list --all --objects | grep ${sha});
    
    #lineBreak=$(echo -e "\n")
    output="${output}\n${size},${compressedSize},${other}";
done

formattedOutput=$(echo -e ${output} | column -t -s ', ');
echo "${formattedOutput}";

exit 0;
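Usage is just running the script from the repository root, with an optional count for how many large objects to list; a quick example, assuming it is saved as git_find_big.sh and made executable:

chmod +x git_find_big.sh

# default: top 10 largest objects in the pack file
./git_find_big.sh

# or ask for the top 25
./git_find_big.sh 25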