SNMPzapp
High-throughput SNMP poller. Sharded collection, Avro-over-Kafka, OpenTelemetry, React control plane.
Go 1.22+ · Kafka · Schema Registry · OTel · React + Vite
How it flows
Devices in. Avro on Kafka out. Telemetry on the side. Sharded across pods.
[Flow diagram. Legend: data, telemetry, control, downstream]
What it does
- Polls SNMP v1/v2c/v3 devices on a configurable cadence.
- Resolves OIDs to MIB names via a built-in table plus loadable text files.
- Serialises readings as Avro and produces them to Kafka with Schema Registry.
- Emits librdkafka stats, OTel metrics, and structured logs.
- Shards work across replicas using DNS-based peer discovery + FNV ownership.
- Ships a React UI for device + OID management, live readings, and producer health.
Quick start
1. Clone
git clone https://github.com/advayaflow/snmpzapp.git
cd snmpzapp
2. Run the full stack with Docker Compose
Brings up SNMPzapp, Kafka + Schema Registry, OTel collector, Prometheus, Loki, and Grafana.
docker compose up -d
- UI: http://localhost:8080
- Grafana: http://localhost:3000 (admin / admin)
- Schema Registry: http://localhost:8081
3. Or run the binary directly
go build -o snmpzapp ./cmd/snmpzapp
./snmpzapp -config config.yaml
Minimal config
poll_interval_seconds: 30
devices:
  - name: router1
    host: 10.0.0.1
    version: "2c"
    community: public
    oids:
      - 1.3.6.1.2.1.1.1.0 # sysDescr
      - 1.3.6.1.2.1.1.3.0 # sysUpTime
kafka:
  bootstrap_servers: localhost:9092
  schema_registry_url: http://localhost:8081
  default_topic: snmp.readings
  client_id: snmpzapp
  acks: all
  compression_type: zstd
  linger_ms: 20
shard:
  service: snmpzapp.local
  self_index: 0
  count: 1
Architecture
Collector
gosnmp-based GET/WALK loop per device, MIB-resolved on emit.
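One way to picture the MIB resolution step is a longest-prefix lookup over a table of known OIDs. The sketch below is illustrative only: `builtinMIB` and `resolveOID` are hypothetical names, the table holds just three standard MIB-II entries, and SNMPzapp's actual resolver may work differently.

```go
package main

import (
	"fmt"
	"strings"
)

// builtinMIB is a tiny, hypothetical stand-in for a built-in OID table.
var builtinMIB = map[string]string{
	"1.3.6.1.2.1.1.1":      "sysDescr",
	"1.3.6.1.2.1.1.3":      "sysUpTime",
	"1.3.6.1.2.1.2.2.1.10": "ifInOctets",
}

// resolveOID maps a numeric OID to "name.suffix" using the longest
// matching prefix in the table. OIDs with no match are returned in
// numeric form (without a leading dot).
func resolveOID(table map[string]string, oid string) string {
	oid = strings.TrimPrefix(oid, ".")
	prefix := oid
	for prefix != "" {
		if name, ok := table[prefix]; ok {
			// The remainder is either empty or an instance suffix like ".0".
			return name + strings.TrimPrefix(oid, prefix)
		}
		i := strings.LastIndexByte(prefix, '.')
		if i < 0 {
			break
		}
		prefix = prefix[:i] // drop the last sub-identifier and retry
	}
	return oid
}

func main() {
	fmt.Println(resolveOID(builtinMIB, "1.3.6.1.2.1.1.1.0"))      // sysDescr.0
	fmt.Println(resolveOID(builtinMIB, "1.3.6.1.2.1.2.2.1.10.3")) // ifInOctets.3
	fmt.Println(resolveOID(builtinMIB, "1.3.6.1.4.1.9.1"))        // 1.3.6.1.4.1.9.1
}
```

Entries loaded from text files could be merged into the same map before lookup, so built-in and user-supplied names resolve through one path.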
Sharding
DNS lookup of peer service + FNV-32a hash of device name mod N.
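The ownership rule (FNV-32a of the device name, mod N) is small enough to sketch with the standard library. This is a minimal illustration, assuming the shard count and this replica's index are already known (for example from `shard.count` and `shard.self_index`); the DNS peer lookup is left out, and `ownerIndex` and `owns` are hypothetical names.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// ownerIndex returns the shard index that owns a device:
// FNV-32a hash of the device name, modulo the shard count.
func ownerIndex(device string, count int) int {
	if count <= 1 {
		return 0 // a single replica owns everything
	}
	h := fnv.New32a()
	h.Write([]byte(device)) // Write on a hash.Hash never returns an error
	return int(h.Sum32() % uint32(count))
}

// owns reports whether the replica at selfIndex should poll the device.
func owns(device string, selfIndex, count int) bool {
	return ownerIndex(device, count) == selfIndex
}

func main() {
	devices := []string{"router1", "router2", "switch-a", "switch-b"}
	const count = 3
	for _, d := range devices {
		fmt.Printf("%s -> shard %d (owned by shard 0: %v)\n",
			d, ownerIndex(d, count), owns(d, 0, count))
	}
}
```

Because the hash depends only on the device name, every replica computes the same owner without coordination. The trade-off of plain modulo hashing is that changing N reassigns most devices.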
Kafka producer pool
One librdkafka producer per unique (acks, compression, linger) tuple. Per-topic overrides spawn extras only when needed.
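The pooling idea can be sketched as a map keyed by the settings tuple. In the sketch below, `producer` is a placeholder struct rather than a real librdkafka handle, and `producerKey`, `pool`, and `forTopic` are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// producerKey is the settings tuple that distinguishes producers.
type producerKey struct {
	Acks        string
	Compression string
	LingerMs    int
}

// producer is a placeholder for a real Kafka client handle.
type producer struct{ key producerKey }

// pool lazily creates one producer per unique producerKey.
type pool struct {
	mu        sync.Mutex
	defaults  producerKey
	overrides map[string]producerKey // topic -> per-topic settings
	producers map[producerKey]*producer
}

func newPool(defaults producerKey, overrides map[string]producerKey) *pool {
	return &pool{
		defaults:  defaults,
		overrides: overrides,
		producers: map[producerKey]*producer{},
	}
}

// forTopic returns the producer for a topic. Topics whose effective
// settings are identical share a single producer.
func (p *pool) forTopic(topic string) *producer {
	key := p.defaults
	if o, ok := p.overrides[topic]; ok {
		key = o
	}
	p.mu.Lock()
	defer p.mu.Unlock()
	if pr, ok := p.producers[key]; ok {
		return pr
	}
	pr := &producer{key: key}
	p.producers[key] = pr
	return pr
}

// size reports how many distinct producers exist.
func (p *pool) size() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return len(p.producers)
}

func main() {
	defaults := producerKey{Acks: "all", Compression: "zstd", LingerMs: 20}
	p := newPool(defaults, map[string]producerKey{
		"snmp.traps": {Acks: "1", Compression: "lz4", LingerMs: 5},
		"snmp.audit": defaults, // override equal to the defaults: no extra producer
	})
	for _, t := range []string{"snmp.readings", "snmp.traps", "snmp.audit"} {
		p.forTopic(t)
	}
	fmt.Println(p.size()) // 2
}
```

Using a comparable struct as the map key means an override that happens to match existing settings reuses the existing producer instead of spawning another.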
Avro schema
Typed value union (value_long / value_double / value_string) keeps queries cheap downstream.
Telemetry
librdkafka stats parsed into OTel gauges; structured logs via OTLP to Loki.
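librdkafka emits its statistics as a JSON document, so the parsing half of this step can be sketched with `encoding/json`. The fields read below (`msg_cnt`, `txmsgs`, and per-broker `outbuf_cnt` and `rtt.avg`) are part of librdkafka's statistics output; the gauge names and the `gauge` type are hypothetical, and registering the values with an OTel meter is omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// brokerStats holds a small subset of the per-broker statistics.
type brokerStats struct {
	State     string `json:"state"`
	OutbufCnt int64  `json:"outbuf_cnt"` // requests waiting to be sent
	RTT       struct {
		Avg int64 `json:"avg"` // average round-trip time, microseconds
	} `json:"rtt"`
}

// producerStats holds a small subset of the top-level statistics.
type producerStats struct {
	Name    string                 `json:"name"`
	MsgCnt  int64                  `json:"msg_cnt"` // messages in producer queues
	TxMsgs  int64                  `json:"txmsgs"`  // messages transmitted to brokers
	Brokers map[string]brokerStats `json:"brokers"`
}

// gauge is a stand-in for one OTel gauge observation.
type gauge struct {
	Name   string // hypothetical metric name
	Broker string // empty for client-level gauges
	Value  float64
}

// gaugesFromStats parses a stats payload and flattens it into gauges.
func gaugesFromStats(raw []byte) ([]gauge, error) {
	var s producerStats
	if err := json.Unmarshal(raw, &s); err != nil {
		return nil, fmt.Errorf("parse librdkafka stats: %w", err)
	}
	out := []gauge{
		{Name: "kafka.producer.queue.messages", Value: float64(s.MsgCnt)},
		{Name: "kafka.producer.tx.messages", Value: float64(s.TxMsgs)},
	}
	for name, b := range s.Brokers {
		out = append(out,
			gauge{Name: "kafka.broker.outbuf.count", Broker: name, Value: float64(b.OutbufCnt)},
			gauge{Name: "kafka.broker.rtt.avg_us", Broker: name, Value: float64(b.RTT.Avg)},
		)
	}
	return out, nil
}

func main() {
	sample := []byte(`{"name":"snmpzapp#producer-1","msg_cnt":12,"txmsgs":3400,
		"brokers":{"localhost:9092/1":{"state":"UP","outbuf_cnt":2,"rtt":{"avg":850}}}}`)
	gs, err := gaugesFromStats(sample)
	if err != nil {
		panic(err)
	}
	for _, g := range gs {
		fmt.Printf("%s broker=%q value=%v\n", g.Name, g.Broker, g.Value)
	}
}
```

Unknown JSON fields are ignored by `json.Unmarshal`, so the struct can stay limited to the handful of values worth exporting.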
Control plane
React + Vite UI for device + OID CRUD, live readings, producer health.
Stress + scaling notes
- Single replica polls ~10k devices at 30s cadence on 4 vCPU.
- Scale horizontally: bump shard.count and deploy N pods behind a headless Service. Each pod owns ~devices / N.
- Kafka throughput follows linger + compression tuning. The default zstd + linger_ms=20 is a sane start.
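As a rough sizing aid, the figures above can be turned into a back-of-the-envelope calculation. The sketch assumes the ~10k devices per replica figure holds for your OID count and hardware, and that hashing spreads devices evenly; `podsNeeded` and `pollsPerSecond` are hypothetical helpers, not part of SNMPzapp.

```go
package main

import "fmt"

// podsNeeded returns the replica count that keeps each pod at or
// below perPod devices (ceiling division).
func podsNeeded(devices, perPod int) int {
	if devices <= 0 || perPod <= 0 {
		return 0
	}
	return (devices + perPod - 1) / perPod
}

// pollsPerSecond is the average device poll rate for one pod.
func pollsPerSecond(devicesOnPod, intervalSeconds int) float64 {
	return float64(devicesOnPod) / float64(intervalSeconds)
}

func main() {
	const fleet, perPod, interval = 45000, 10000, 30
	n := podsNeeded(fleet, perPod)
	each := fleet / n
	fmt.Printf("pods=%d devices/pod=%d polls/s/pod=%.0f\n",
		n, each, pollsPerSecond(each, interval))
	// prints: pods=5 devices/pod=9000 polls/s/pod=300
}
```

Real distribution will be uneven by a few percent because ownership comes from a hash, so leaving some headroom below the per-replica ceiling is prudent.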
Source
Repo: github.com/advayaflow/snmpzapp
Issues + roadmap live on GitHub. PRs welcome.