Vector Agent — Lightweight Log Collection

Collect logs from files with Vector — no heavyweight agents, low resource usage, built in Rust

10m 10m reading Lab included

The Problem

Your app writes structured JSON logs to /var/log/app/app.log. These files sit on individual servers. You need to collect them, parse them, and send them to a central store — without installing a heavy agent (like Logstash) that competes for CPU and memory with your application.

Why Vector

Vector is written in Rust — it uses ~10MB of memory and minimal CPU. Compare with Logstash (JVM, 256MB+ heap) or Fluentd (Ruby, significant overhead).

Agent	Language	Memory	Config
Vector	Rust	~10MB	TOML
Logstash	Java (JVM)	256MB+	Custom DSL
Fluentd	Ruby	60MB+	XML-like
Filebeat	Go	~30MB	YAML

Vector Agent Configuration

# infrastructure/vector/vector-agent.toml

# SOURCES: Read log files
[sources.app_logs]
type = "file"
include = ["/var/log/app/*.log"]
read_from = "beginning"
fingerprint.strategy = "device_and_inode"

# TRANSFORMS: Parse JSON logs
[transforms.parse_json]
type = "remap"
inputs = ["app_logs"]
source = '''
parsed, err = parse_json(.message)
if err == null {
  . = merge(., parsed)
  del(.message)
} else {
  .raw_message = .message
  del(.message)
}
.source = "vector-agent"
.pipeline = "file-to-kafka"
.log_file = .file
'''

# SINKS: Forward to Kafka
[sinks.kafka]
type = "kafka"
inputs = ["parse_json"]
bootstrap_servers = "kafka:9092"
topic = "app-logs"
encoding.codec = "json"

[sinks.kafka.buffer]
type = "disk"
max_size = 268435456  # 256MB
when_full = "block"

Architecture — Sources → Transforms → Sinks

[File Source] → [VRL Transform] → [Kafka Sink]
    ↓                ↓                 ↓
 Read files     Parse JSON        Send to Kafka

Source: File

[sources.app_logs]
type = "file"
include = ["/var/log/app/*.log"]
read_from = "beginning"
fingerprint.strategy = "device_and_inode"

Tails all .log files in /var/log/app/
device_and_inode fingerprinting survives log rotation (file renamed, new file created)
Tracks read position in a checkpoint file — restarts pick up where they left off

Transform: VRL (Vector Remap Language)

source = '''
parsed, err = parse_json(.message)
if err == null {
  . = merge(., parsed)
  del(.message)
}
'''

VRL is Vector’s built-in language for transforming events:

Try to parse the log line as JSON
If successful, merge all fields into the top-level event
If not JSON, preserve as raw_message

Sink: Kafka

[sinks.kafka]
type = "kafka"
inputs = ["parse_json"]
bootstrap_servers = "kafka:9092"
topic = "app-logs"
encoding.codec = "json"

Forward processed events to Kafka topic app-logs.

Disk Buffer — No Lost Logs

[sinks.kafka.buffer]
type = "disk"
max_size = 268435456  # 256MB
when_full = "block"

If Kafka is temporarily unavailable, Vector buffers to disk. When Kafka recovers, buffered events are sent. when_full = "block" pauses reading (backpressure) rather than dropping logs.

Why File-Based Collection (Not Docker Socket)

Approach	Docker Socket	File-Based
Marathon	Unpredictable container names	Works via host mount
Docker Swarm	Socket access limited	Works via bind mounts
Kubernetes	Needs DaemonSet or sidecar	Works with emptyDir/hostPath
Security	Requires Docker socket access	Read-only file access

File-based collection is universal — it works the same way on every orchestrator.

Running Vector Agent

# infrastructure/docker-compose.vector.yml
services:
  vector-agent:
    image: timberio/vector:0.35.0-alpine
    container_name: vector-agent
    volumes:
      - ./vector/vector-agent.toml:/etc/vector/vector.toml:ro
      - app-logs:/var/log/app:ro
      - vector-data:/var/lib/vector

Next Step

In the next lesson, we add Kafka as a durable log transport — buffering logs for reliability.

Previous Lesson Prometheus Metrics — RED Method Next Lesson Kafka — Durable Log Transport