HA and redundancy for Chirpstack

Thanks for the help guys :smiling_face_with_tear:

I’m still open to any improvements, but I’ll leave my plan here for anyone else researching Chirpstack redundancy and high availability in the future:

General Network Flow

Overview:

  • Chirpstack depends on two databases: PostgreSQL for persistent storage and Redis for caching and temporary data. It uses an MQTT broker (EMQX in this plan) to communicate with gateways.
  • HAProxy will use Let’s Encrypt certificates from certbot to terminate TLS on WAN connections and proxy the traffic either to Chirpstack (for clients connecting to the web interface) or to the EMQX nodes (for gateways connecting to Chirpstack via MQTT).

Server Deployment - Redundancy and HA

General:

  • The stack runs on 3 Ubuntu LXC containers in Proxmox, distributed across 3 hosts.
  • Regular backups are taken so that if a host fails, its container can be restored and run on a new host. The services will then automatically rejoin their clusters.
  • Proxmox HA doesn’t restore containers from backups, so I will restore backups manually rather than use HA mode. Starting from a recent backup means the databases don’t need a full replication from the master and the services don’t need large updates.

Networking:

  • The DNS address that clients and gateways connect to should load-balance between the healthy HAProxies.
  • Requires 2 network interfaces: WAN interface and a private bridge between the three containers. HAProxy will expose itself on the WAN interface for client and gateway connections; other services can communicate with their clusters (on separate hosts) over the private bridge.

EMQX Cluster:

  • The three EMQX nodes will form one cluster: any message published on one node is routed to matching subscribers on the others.
  • Since the MQTT messages are stateless, it doesn’t matter which broker Chirpstack or the gateways connect to at any time (sticky sessions are not required).

Postgres + Repmgr Cluster:

  • 1 master database, 2 standby databases.
  • Repmgr will handle the replication of the master database to standbys, as well as promoting a new master if the current master fails.

Redis Cluster:

  • Redis Cluster shards the keys across the three masters by hash slot (CRC16 of the key modulo 16384, with each master owning a range of slots).
  • It automatically handles replication of the masters to their replicas.
  • The cluster automatically detects if one of the masters goes down and will promote the associated replica.
  • Chirpstack natively supports Redis Cluster and will detect the masters automatically, so HAProxy is not necessary for this connection.
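
To make the sharding bullet concrete, here is a minimal sketch of how a key maps to a slot. It follows the Redis Cluster rule as I understand it (CRC16/XMODEM of the key modulo 16384, hashing only the `{...}` hash tag when one is present). The function name `hash_slot` is my own; the sample keys are only illustrations.

```python
import binascii

SLOT_COUNT = 16384  # fixed number of hash slots in Redis Cluster


def hash_slot(key: str) -> int:
    """Return the Redis Cluster hash slot for a key."""
    data = key.encode()
    # Hash tags: if the key has a non-empty {...} section, only that part is hashed,
    # which lets related keys be forced onto the same slot.
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    # binascii.crc_hqx with an initial value of 0 is the CRC16/XMODEM variant.
    return binascii.crc_hqx(data, 0) % SLOT_COUNT


if __name__ == "__main__":
    for key in ("foo", "{user1000}.following", "{user1000}.followers"):
        print(key, hash_slot(key))
```

The two `{user1000}` keys land on the same slot, so they always live on the same master. The slot a real cluster reports can be cross-checked with `CLUSTER KEYSLOT <key>`.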

HAProxy:

  • Incoming WAN connections should be load-balanced between the three proxies.
  • HAProxy will load-balance (round-robin) MQTT traffic between the three EMQX nodes, prioritizing its own host. MQTT traffic will either be gateways connecting over WAN or Chirpstack connecting over the private bridge.
  • HAProxy will load-balance clients connecting to the web interface (or API requests, as they are the same endpoint) between the three Chirpstack instances, prioritizing its own host.
  • For PSQL, custom health checks can be configured to poll Repmgr’s API for the current PostgreSQL master, ensuring that database queries are always directed to the master.
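
As a rough sketch of the "round-robin but prefer my own host" idea, the snippet below renders an HAProxy backend where the local server gets a much higher weight than the two remote ones. The weights (100 vs 10), the backend name, and port 1883 for plain MQTT on the private bridge are my assumptions, not tested values; the host names are the placeholders from this plan.

```python
HOSTS = ["lorawan-host-1", "lorawan-host-2", "lorawan-host-3"]  # placeholder names


def render_backend(name: str, local_host: str, port: int,
                   local_weight: int = 100, remote_weight: int = 10) -> str:
    """Render an HAProxy TCP backend that favours the server on the local host."""
    lines = [
        f"backend {name}",
        "    mode tcp",
        "    balance roundrobin",
    ]
    for host in HOSTS:
        # Weighted round-robin: the local node receives most connections,
        # but remote nodes still take a share and take over if it fails its check.
        weight = local_weight if host == local_host else remote_weight
        lines.append(f"    server {host} {host}:{port} check weight {weight}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_backend("emqx_mqtt", "lorawan-host-1", 1883))
```

If strict local preference is wanted instead of a weighted share, marking the two remote servers with HAProxy's `backup` keyword is the other option I am considering.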

Certbot:

  • Only a single instance of certbot, to reduce Let’s Encrypt requests.
  • A cron job will automate certificate generation and renewal.
  • Use lsyncd on the two hosts without certbot to synchronize certificates and have the post_cmd trigger a graceful reload of HAProxy.
  • Certbot should renew the certs early so that, if the certbot host fails, there is time to restore it from backup before the certs expire and TLS fails.
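
One detail worth sketching for the certbot side: HAProxy's `crt` option expects the certificate chain and private key in a single PEM file, while certbot writes them as separate `fullchain.pem` and `privkey.pem` files in the lineage directory. Below is a minimal helper that could run as a certbot deploy hook before lsyncd picks the file up. The function name and output path are hypothetical.

```python
from pathlib import Path


def build_haproxy_pem(lineage_dir: str, out_path: str) -> None:
    """Concatenate certbot's fullchain.pem and privkey.pem into one PEM for HAProxy."""
    lineage = Path(lineage_dir)
    fullchain = (lineage / "fullchain.pem").read_text()
    privkey = (lineage / "privkey.pem").read_text()
    if not fullchain.endswith("\n"):
        fullchain += "\n"  # keep the PEM blocks on separate lines

    out = Path(out_path)
    tmp = out.with_suffix(".tmp")
    tmp.write_text(fullchain + privkey)
    tmp.chmod(0o600)   # the combined file contains the private key
    tmp.replace(out)   # atomic swap so HAProxy never reads a half-written file
```

When certbot runs a deploy hook it exports the lineage directory as `RENEWED_LINEAGE`, so the hook could call something like `build_haproxy_pem(os.environ["RENEWED_LINEAGE"], "/etc/haproxy/certs/site.pem")`, where the output path is only an example.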

This plan allows for 1 of the 3 servers to fail while still maintaining high availability and service continuity.
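
The "1 of 3" figure is just majority arithmetic, which applies wherever a cluster needs more than half of its voting members to agree (for example, Redis Cluster needs a majority of masters to authorize a replica promotion). A tiny sketch, with function names of my own choosing:

```python
def majority(nodes: int) -> int:
    """Smallest number of nodes that is more than half of the cluster."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes can fail while a majority is still reachable."""
    return nodes - majority(nodes)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, majority(n), tolerated_failures(n))
```

Going from 3 to 4 hosts does not buy an extra tolerated failure; the next step up would be 5.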

Implementation

Prerequisites:

  • 3 Proxmox LXC containers running Ubuntu 22.04 LTS, distributed across 3 hosts. Each has 3 GB memory and a 16 GB disk, which can be scaled up later if needed.

  • A WAN interface and private bridge interface for the three containers

  • The HAProxies will expose themselves on the WAN interface and proxy traffic to the backend. The only ports that need to be open for this are 443 (HTTPS), 80 (HTTP), 8883 (MQTT over TLS) and 22 (SSH for me).

  • The private bridge should allow the three Proxmox containers to communicate with each other so clusters can talk.

  • Each container should have a static IP or DNS name on the private bridge so they can reference each other, something like lorawan-host-1, lorawan-host-2, lorawan-host-3. Similarly, on the WAN interface they can have individual DNS names for now, but once I am finished we will have to set up a single DNS name that load-balances between the three containers.

Setup Steps:

Set up lsyncd

  • Set up lsyncd on the two hosts that will not have certbot on them. It will be used for configurations (Chirpstack/MQTT/repmgr) as well as certificates.

Set up EMQX (MQTT) Cluster

  • Install one node on each host and join them into one cluster so they mirror each other
  • Test mirroring functionality

Set up HAProxy

  • Install HAProxy on each host.
  • Set up HAProxy to load-balance between the EMQX cluster, prioritizing its own host.
  • Test HAProxy load-balancing and health-checks for EMQX cluster

Set up PostgreSQL cluster

  • Install PSQL and repmgr on each host
  • Set up repmgr for PSQL health checks, standby replication, and master promotion
  • Test repmgr for the above
  • Set up HAProxy with a custom health check to proxy Chirpstack queries to the master
  • Test HAProxy for the above
  • Delete the current database and run the PSQL setup commands for Chirpstack on the master.
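
For the custom health check, one possible shape (a sketch, not what repmgr itself provides) is a tiny HTTP endpoint on each host that returns 200 only when the local PostgreSQL is the master and 503 otherwise, which HAProxy can poll with `option httpchk`. Here the role check is injected as a callable; in practice I would expect it to run `SELECT pg_is_in_recovery()` against the local database. Port 8008 and all function names are assumptions.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable


def status_for(in_recovery: bool) -> int:
    """200 for the master (not in recovery), 503 for a standby."""
    return 503 if in_recovery else 200


def make_handler(is_in_recovery: Callable[[], bool]):
    """Build a request handler that reports the local PostgreSQL role."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            try:
                code = status_for(is_in_recovery())
            except Exception:
                code = 503  # treat an unreachable database as unhealthy
            self.send_response(code)
            self.send_header("Content-Length", "0")
            self.end_headers()

        # HAProxy's httpchk sends OPTIONS unless told otherwise.
        do_OPTIONS = do_GET

        def log_message(self, *args):
            pass  # keep health-check polling out of the logs

    return Handler


def serve(is_in_recovery: Callable[[], bool], port: int = 8008) -> None:
    """Serve the health endpoint until interrupted (port is a placeholder)."""
    HTTPServer(("0.0.0.0", port), make_handler(is_in_recovery)).serve_forever()
```

With this in place, only the host whose check returns 200 would be marked up in the PostgreSQL backend, so Chirpstack's queries follow the master after a promotion.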

Set up Redis Cluster

  • Install Redis on each host with one master node and one replica node, offset so that each replica sits on a different host from its master.
  • Test promotion of replicas.
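
The "offset" layout can be sketched as follows: master i lives on host i and its replica on the next host around the ring, so losing one host never takes out a master together with its own replica. The ports 6379 and 6380 and the function name are my assumptions.

```python
from typing import Dict, List


def plan_nodes(hosts: List[str], master_port: int = 6379,
               replica_port: int = 6380) -> List[Dict[str, str]]:
    """Pair each master with a replica on the next host (offset by one)."""
    plan = []
    for i, host in enumerate(hosts):
        replica_host = hosts[(i + 1) % len(hosts)]  # wrap around at the end
        plan.append({
            "master": f"{host}:{master_port}",
            "replica": f"{replica_host}:{replica_port}",
        })
    return plan


if __name__ == "__main__":
    for pair in plan_nodes(["lorawan-host-1", "lorawan-host-2", "lorawan-host-3"]):
        print(pair["master"], "->", pair["replica"])
```

After creating the cluster, the actual master/replica pairing should still be checked with `CLUSTER NODES`, since the tooling may assign replicas differently from this plan.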

Set up Chirpstack

  • Install one instance on each host
  • Connect them to the Redis cluster, Postgres, and EMQX
  • Test functionality from UI
  • Set HAProxy to load-balance and health check instances, prioritizing its own host.

Set up Certbot

  • Set up certbot on a single host
  • Automate with a cron job
  • Configure HAProxy to terminate TLS from WAN and pass unencrypted traffic to Chirpstack/EMQX.
  • Sync certificates across hosts with lsyncd, using post_cmd to reload HAProxy
  • Test HAProxy TLS and certificate generation

Post Requisites:

  • Add DNS load balancer / health checks to balance WAN traffic between the three hosts.
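
Whatever DNS load balancer ends up being used, the health check it performs is roughly "can I open the public ports on this host". A minimal sketch of that probe, using ports 443 and 8883 from the prerequisites; the function names and the 2 second timeout are assumptions.

```python
import socket
from typing import Iterable, List


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name did not resolve
        return False


def healthy_hosts(hosts: Iterable[str], ports: Iterable[int] = (443, 8883)) -> List[str]:
    """Keep only the hosts where every required port accepts connections."""
    ports = tuple(ports)
    return [h for h in hosts if all(port_open(h, p) for p in ports)]
```

A TCP connect only proves HAProxy is listening. It does not prove that Chirpstack or EMQX behind it is healthy, so HAProxy's own backend checks are still needed.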