Kafka — Durable Log Transport

Buffer logs in Kafka for reliability and fan-out to multiple sinks without data loss

10m 10m reading Lab included

The Problem

If Elasticsearch goes down, logs are lost. If the network between Vector and Elasticsearch drops, logs are lost. You need a buffer between collection and storage — something durable that holds logs until the downstream system recovers.

Why Kafka

Kafka is a distributed commit log. Once a message is written, it’s persisted to disk and replicated. It can buffer millions of messages while Elasticsearch is offline.

Vector Agent → Kafka → Vector Aggregator → Elasticsearch
                 ↓
            Persists to disk
            Retains for 7 days
            Multiple consumers

Kafka in the Pipeline

# infrastructure/docker-compose.kafka.yml
services:
  kafka:
    image: bitnami/kafka:3.7
    container_name: kafka
    environment:
      KAFKA_CFG_NODE_ID: 0
      KAFKA_CFG_PROCESS_ROLES: controller,broker
      KAFKA_CFG_CONTROLLER_QUORUM_VOTERS: 0@kafka:9093
      KAFKA_CFG_LISTENERS: PLAINTEXT://:9092,CONTROLLER://:9093
      KAFKA_CFG_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
      KAFKA_CFG_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_CFG_AUTO_CREATE_TOPICS_ENABLE: "true"
      KAFKA_CFG_LOG_RETENTION_HOURS: 168  # 7 days
    ports:
      - "9092:9092"
    volumes:
      - kafka-data:/bitnami/kafka

KRaft Mode (No ZooKeeper)

This uses Kafka’s built-in KRaft consensus — no ZooKeeper dependency. Simpler to deploy and operate.

How Kafka Helps the Log Pipeline

Scenario Without Kafka With Kafka
ES goes down 1 hour 1 hour of logs lost 0 logs lost (buffered)
Spike in log volume Vector overwhelms ES Kafka absorbs the spike
Add second consumer Change Vector config Add consumer group
Network partition Logs dropped Kafka retries

Log Retention

KAFKA_CFG_LOG_RETENTION_HOURS: 168  # 7 days

Kafka retains messages for 7 days regardless of consumption. If your aggregator is down for a weekend, all logs are still there on Monday.

Topic: app-logs

Vector Agent writes to the app-logs topic. Vector Aggregator reads from it. The topic is auto-created on first write.

For production, pre-create topics with explicit partition counts:

kafka-topics.sh --create \
  --topic app-logs \
  --partitions 6 \
  --replication-factor 3 \
  --bootstrap-server kafka:9092

Consumer Groups

# Vector Aggregator config
[sources.kafka]
type = "kafka"
group_id = "vector-aggregator"

The group_id enables Kafka to track what’s been consumed. If Vector Aggregator restarts, it resumes from the last committed offset — no duplicates, no gaps.

Next Step

In the next lesson, we build the Vector Aggregator — reading from Kafka, transforming events, and shipping to Elasticsearch.