I’m looking to add high availability and redundancy to my Postgres, Redis and MQTT setup, and was looking for suggestions on what people are currently using and what is supported.
For Redis I am looking at using Redis Sentinel, although I feel I’ve seen comments in the past that Sentinel is not supported. Is Redis Cluster the typical implementation instead?
For Postgres I am currently looking at using Patroni to handle failover to standby databases. Chirpstack’s PostgreSQL endpoint would then be an HAProxy instance with custom health checks that query the Patroni REST API for the master database and proxy Chirpstack’s queries to it.
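HAProxy would normally do this itself with an HTTP health check against each node's Patroni API. The sketch below expresses the same check in Python, which can be handy for testing the logic by hand. It assumes Patroni's REST API is on its default port 8008 and that the /primary endpoint answers HTTP 200 only on the leader (older Patroni releases use /master instead); the helper names are illustrative.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional

# Assumption: Patroni REST API on its default port, /primary returns 200 only on the leader.
PATRONI_PORT = 8008


def is_primary(host: str, port: int = PATRONI_PORT, timeout: float = 2.0) -> bool:
    """Return True if the Patroni node at host:port reports itself as the primary."""
    url = f"http://{host}:{port}/primary"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # HTTPError (e.g. 503 on a replica) is a subclass of URLError, so
        # replicas and unreachable nodes are both treated as "not primary".
        return False


def pick_primary(hosts: Iterable[str],
                 check: Callable[[str], bool] = is_primary) -> Optional[str]:
    """Return the first host whose primary check passes, or None if none do."""
    for host in hosts:
        if check(host):
            return host
    return None
```

For example, pick_primary(["lorawan-host-1", "lorawan-host-2", "lorawan-host-3"]) returns whichever node currently holds the leader lock, or None during a failover.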
As for the MQTT broker, I’ve seen a lot of people using EMQX. If I go with an EMQX cluster, how do I specify that in the Chirpstack backend? Does EMQX provide a single endpoint for the cluster that I can specify in Chirpstack, or would I need to use HAProxy here as well so that Chirpstack connects to whichever broker is up?
Any comments or suggestions on what people are currently using would be greatly appreciated!
Chirpstack depends on two databases: PostgreSQL for persistent storage and Redis for caching and temporary data. It uses the MQTT broker (EMQX) to communicate with gateways.
HAProxy will use Let’s Encrypt certificates from certbot to terminate TLS from WAN connections and proxy the traffic to either Chirpstack (for clients connecting to the web interface) or the EMQX nodes (for gateways connecting via MQTT to Chirpstack).
The stack runs on three Proxmox Ubuntu LXC containers distributed across three hosts.
Regular backups are taken so that if a host fails, its container can be restored and run on a new host, where the services automatically rejoin their clusters.
Proxmox HA doesn’t restore containers from backups, so restoring a recent backup manually (rather than using HA mode) means the databases only have to catch up on recent changes instead of doing a full replication from the master, and the services don’t need large updates.
Networking:
The DNS address that clients and gateways connect to should load-balance between the healthy HAProxies.
Each container requires two network interfaces: a WAN interface and a private bridge between the three containers. HAProxy will expose itself on the WAN interface for client and gateway connections, while the other services communicate with their cluster peers (on separate hosts) over the private bridge.
EMQX Cluster:
The three EMQX nodes form a cluster that shares subscription routes, so a message published on one node is delivered to subscribers connected to any of the others.
Because of this, it doesn’t matter which broker Chirpstack or the gateways connect to at any given time (sticky sessions are not required).
Postgres + Repmgr Cluster:
1 master database, 2 standby databases.
Repmgr will set up and monitor streaming replication from the master to the standbys, and (through repmgrd) promote a new master if the current master fails.
Redis Cluster:
Redis Cluster shards the keys across the three masters by hashing each key into one of 16384 hash slots.
It automatically handles replication of the masters to their replicas.
The cluster automatically detects if one of the masters goes down and will promote the associated replica.
Chirpstack natively supports Redis Cluster and discovers the masters automatically, so HAProxy is not needed for this connection.
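To make the sharding concrete: Redis Cluster hashes each key with CRC16 (the XMODEM variant) modulo 16384, and if the key contains a non-empty {...} hash tag, only the tag is hashed, so related keys can be forced onto the same master. The sketch below reimplements that rule in Python. The slot ranges are an assumption: they are the even split that redis-cli --cluster create produces for three masters.

```python
from typing import List, Tuple

NUM_SLOTS = 16384

# Assumption: slots were assigned by `redis-cli --cluster create` with three masters.
DEFAULT_RANGES: List[Tuple[int, int]] = [(0, 5460), (5461, 10922), (10923, 16383)]


def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: bytes) -> int:
    """Hash slot for a key, honouring {hash tags} the way Redis Cluster does."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end > start + 1:  # only a non-empty tag replaces the key
            key = key[start + 1:end]
    return crc16_xmodem(key) % NUM_SLOTS


def master_index(key: bytes, ranges: List[Tuple[int, int]] = DEFAULT_RANGES) -> int:
    """Index of the master (0, 1 or 2) whose slot range contains the key."""
    slot = key_slot(key)
    for i, (lo, hi) in enumerate(ranges):
        if lo <= slot <= hi:
            return i
    raise ValueError(f"slot {slot} is not covered by any range")
```

For example, key_slot(b"foo") is 12182, which lands on the third master; because the clients compute this themselves and follow the cluster's redirects, no proxy is needed in front of Redis.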
HAProxy:
Incoming WAN connections should be load-balanced between the three proxies.
HAProxy will balance MQTT traffic across the three EMQX nodes, prioritizing the node on its own host and falling back to the other two (round-robin) when it is unavailable. MQTT traffic will either be gateways connecting over the WAN or Chirpstack connecting over the private bridge.
HAProxy will balance clients connecting to the web interface (or API requests, which use the same endpoint) across the three Chirpstack instances, likewise prioritizing its own host.
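For PostgreSQL, custom health checks can identify the current master so that database queries are always directed to it. Repmgr does not expose a REST API the way Patroni does, so each node would need a small check service (for example, one that reports whether SELECT pg_is_in_recovery() returns false) for HAProxy to poll.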
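One way to build that check service is a tiny HTTP endpoint on each container that HAProxy polls with an HTTP health check, answering 200 only on the master. A minimal sketch, assuming psql is on PATH and can reach the local PostgreSQL (via the usual PG* environment variables or by running as the postgres user); port 8009 and all function names are hypothetical.

```python
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

CHECK_PORT = 8009  # hypothetical port for HAProxy's HTTP health check


def parse_in_recovery(output: str) -> Optional[bool]:
    """psql -tA prints a boolean as 't' or 'f'; anything else is treated as unknown."""
    value = output.strip()
    if value == "t":
        return True
    if value == "f":
        return False
    return None


def local_is_primary() -> bool:
    """True only if the local server answers and is not in recovery (i.e. is the master)."""
    try:
        result = subprocess.run(
            ["psql", "-tAc", "SELECT pg_is_in_recovery()"],
            capture_output=True, text=True, timeout=3, check=True,
        )
    except (OSError, subprocess.SubprocessError):
        return False  # psql missing, timed out, or the query failed
    return parse_in_recovery(result.stdout) is False


def status_for(is_primary: bool) -> int:
    """HTTP status HAProxy sees: 200 marks the server up, 503 marks it down."""
    return 200 if is_primary else 503


class PrimaryCheckHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        code = status_for(local_is_primary())
        self.send_response(code)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"primary\n" if code == 200 else b"not primary\n")


def main(port: int = CHECK_PORT) -> None:
    """Start the check service; in practice bind to the private bridge address only."""
    HTTPServer(("0.0.0.0", port), PrimaryCheckHandler).serve_forever()
```

Calling main() on each container starts the service; HAProxy then sees exactly one PostgreSQL backend as up. Failing closed (reporting 503 on any error) avoids ever sending writes to a standby.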
Certbot:
Only a single instance of certbot, to reduce Let’s Encrypt requests.
A cron job will automate certificate renewal.
Run lsyncd on the certbot host to push the certificates to the other two hosts (lsyncd watches local directories and syncs them out to remote targets with rsync), and have the post_cmd trigger a graceful reload of HAProxy.
Certbot should renew the certs well before expiry so that, if the certbot host fails, there is time to restore it from backup before the certs expire and TLS stops working.
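By default certbot renews once fewer than 30 days remain on a 90-day Let’s Encrypt certificate; the window can be widened per certificate with renew_before_expiry in its renewal configuration. The sketch below checks, from any host, how much time the served certificate has left, which makes the "time to restore" margin easy to monitor. The 30-day default and the helper names are assumptions for illustration.

```python
import socket
import ssl
import time
from typing import Optional

RENEW_WINDOW_DAYS = 30.0  # assumed window; set it to match renew_before_expiry


def days_remaining(not_after: str, now: Optional[float] = None) -> float:
    """Days until expiry; not_after uses getpeercert()'s format, e.g. 'Jun  1 12:00:00 2025 GMT'."""
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400


def should_renew(not_after: str, now: Optional[float] = None,
                 window_days: float = RENEW_WINDOW_DAYS) -> bool:
    """True once the certificate is inside the renewal window."""
    return days_remaining(not_after, now) < window_days


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Read the notAfter field of the certificate a TLS server presents."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

Running days_remaining(fetch_not_after(host)) against each HAProxy also confirms that the synced certificates were actually reloaded, since all three should report the same expiry.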
This plan allows for 1 of the 3 servers to fail while still maintaining high availability and service continuity.
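The one-failure figure follows from majority quorums. For a component that needs a majority of its n members to agree before failing over (for example, Redis Cluster requires a majority of masters to authorize a replica's promotion), and assuming one member per host, the number of host failures it survives is:

```latex
f = \left\lfloor \frac{n-1}{2} \right\rfloor, \qquad n = 3 \;\Rightarrow\; f = \left\lfloor \tfrac{2}{2} \right\rfloor = 1
```

A fourth host would not raise f (⌊3/2⌋ = 1); tolerating two failures would take five hosts (⌊4/2⌋ = 2).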
Implementation
Prerequisites:
3 Proxmox LXC containers running Ubuntu 22.04 LTS, distributed across 3 hosts, each with 3 GB memory and 16 GB disk (can scale later if needed).
A WAN interface and a private bridge interface for the three containers.
The HAProxies will expose themselves on the WAN interface and proxy traffic to the backend. The only ports that need to be open are 443 (HTTPS), 80 (HTTP), 8883 (MQTT over TLS) and 22 (SSH, for my own access).
The private bridge should allow the three Proxmox containers to communicate with each other so clusters can talk.
Each container should have a static IP or DNS name on the private bridge so the containers can reference each other, e.g. lorawan-host-1, lorawan-host-2, lorawan-host-3. On the WAN interface they can have individual DNS names for now, but once setup is finished we will need a single DNS name that load-balances between the three containers.
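Once the WAN side is up, a quick way to confirm that only the intended ports are reachable is a TCP connect scan from a machine outside the network. A minimal sketch; the hostname in the usage note is hypothetical, and the extra ports are just common defaults worth confirming are closed from the WAN (1883 plain MQTT, 5432 PostgreSQL, 6379 Redis, 18083 the EMQX dashboard).

```python
import socket
from typing import Dict, Iterable, List, Set, Tuple

EXPECTED_OPEN: Set[int] = {22, 80, 443, 8883}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, filtered (timeout) or unresolvable
        return False


def scan(host: str, ports: Iterable[int]) -> Dict[int, bool]:
    """Map each port to whether it accepted a connection."""
    return {port: port_open(host, port) for port in ports}


def unexpected(results: Dict[int, bool],
               expected: Set[int] = EXPECTED_OPEN) -> Tuple[List[int], List[int]]:
    """Return (open but not expected, expected but closed)."""
    opened = {port for port, ok in results.items() if ok}
    return sorted(opened - expected), sorted(expected - opened)
```

For example, unexpected(scan("lorawan-host-1.example.com", [22, 80, 443, 1883, 5432, 6379, 8883, 18083])) should return ([], []) when only the four intended ports are exposed.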
Setup Steps:
Setup lsyncd
Set up lsyncd on the host that will run certbot to push configurations (Chirpstack/MQTT/repmgr) and certificates to the two hosts without certbot.
Setup EMQX (MQTT) Cluster
Install one node on each host and join them into a single cluster
Test that messages published on one node reach subscribers on the others
Setup HAProxy
Install HAProxy on each host.
Set up HAProxy to load-balance between the EMQX cluster, prioritizing its own host.
Test HAProxy load-balancing and health-checks for EMQX cluster
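HAProxy's plain TCP check only proves that a port is open, not that EMQX is answering MQTT; HAProxy's tcp-check send-binary/expect binary rules can go further, and the same idea is easy to test by hand. The sketch below sends a raw MQTT 3.1.1 CONNECT to a node and reads the CONNACK, assuming the nodes listen for plain MQTT on EMQX's default port 1883 over the private bridge. A return code of 5 (not authorized) still shows the broker is alive if anonymous logins are disabled. The client ID is hypothetical; give each prober a unique one so concurrent probes don't take over each other's session.

```python
import socket
import struct
from typing import Optional


def encode_remaining_length(n: int) -> bytes:
    """MQTT variable-length integer: 7 bits per byte, high bit set if more bytes follow."""
    out = bytearray()
    while True:
        byte = n % 128
        n //= 128
        if n > 0:
            byte |= 0x80
        out.append(byte)
        if n == 0:
            return bytes(out)


def build_connect(client_id: str, keepalive: int = 30) -> bytes:
    """MQTT 3.1.1 CONNECT packet with clean session and no credentials."""
    cid = client_id.encode("utf-8")
    variable_header = (
        struct.pack("!H", 4) + b"MQTT"   # protocol name
        + bytes([4])                      # protocol level 4 = MQTT 3.1.1
        + bytes([0x02])                   # connect flags: clean session only
        + struct.pack("!H", keepalive)    # keep-alive in seconds
    )
    body = variable_header + struct.pack("!H", len(cid)) + cid
    return bytes([0x10]) + encode_remaining_length(len(body)) + body


def parse_connack(data: bytes) -> int:
    """Return the CONNACK return code (0 = accepted); raise on anything else."""
    if len(data) != 4 or data[0] != 0x20 or data[1] != 0x02:
        raise ValueError("not an MQTT 3.1.1 CONNACK")
    return data[3]


def recv_exactly(sock: socket.socket, n: int) -> bytes:
    """Read up to n bytes, stopping early only if the peer closes the connection."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            break
        buf += chunk
    return buf


def mqtt_probe(host: str, port: int = 1883, timeout: float = 3.0,
               client_id: str = "ha-probe") -> Optional[int]:
    """Broker's CONNACK return code, or None if it did not answer with a valid CONNACK."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(build_connect(client_id))
            code = parse_connack(recv_exactly(sock, 4))
            if code == 0:
                sock.sendall(b"\xe0\x00")  # DISCONNECT
            return code
    except (OSError, ValueError):
        return None
```

Probing each node directly and then through HAProxy while stopping one EMQX instance shows whether the health checks take the dead node out of rotation.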
Setup Postgresql cluster
Install PSQL and repmgr on each host
Setup repmgr for PSQL health checks, slave replication, and master promotion
Test repmgr for the above
Setup HAProxy with custom health check to proxy Chirpstack queries to master
Test HAProxy for the above
Delete the current database and run the PSQL setup commands for Chirpstack on the master.
Setup Redis Cluster
Install Redis on each host with one master node and one replica node, each replica following the master on a different host (offset).
Test promotion of replicas.
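After a failover and a restore, replicas can drift so that a master's only replica sits on the same host, meaning that host's failure would lose those slots. A sketch that parses the output of redis-cli cluster nodes and flags that situation; the node IDs and 10.0.0.x addresses in the test data are made up (real node IDs are 40-character hex strings).

```python
from typing import List, NamedTuple, FrozenSet

UNHEALTHY = {"fail", "fail?"}


class Node(NamedTuple):
    node_id: str
    host: str
    port: int
    flags: FrozenSet[str]
    master_id: str  # "-" for masters, otherwise the ID of the master being replicated


def parse_cluster_nodes(text: str) -> List[Node]:
    """Parse CLUSTER NODES lines: <id> <ip:port@cport[,hostname]> <flags> <master> ..."""
    nodes = []
    for line in text.strip().splitlines():
        fields = line.split()
        addr = fields[1].split("@")[0]  # drop the cluster bus port (and any hostname)
        host, port = addr.rsplit(":", 1)
        nodes.append(Node(fields[0], host, int(port),
                          frozenset(fields[2].split(",")), fields[3]))
    return nodes


def replica_placement_problems(nodes: List[Node]) -> List[str]:
    """Masters with no healthy replica, or whose healthy replicas share the master's host."""
    problems = []
    masters = [n for n in nodes if "master" in n.flags and not (n.flags & UNHEALTHY)]
    for master in masters:
        replicas = [n for n in nodes
                    if n.master_id == master.node_id and not (n.flags & UNHEALTHY)]
        label = f"{master.host}:{master.port}"
        if not replicas:
            problems.append(f"{label} has no healthy replica")
        elif all(r.host == master.host for r in replicas):
            problems.append(f"{label} only has replicas on the same host")
    return problems
```

An empty list means every master can lose its host and still be replaced by a replica elsewhere.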
Setup Chirpstack
Install one instance on each host
Connect them to redis cluster, postgres, EMQX
Test functionality from UI
Set HAProxy to load-balance and health check instances, prioritizing its own host.
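In HAProxy, "prefer my own host" is usually expressed by listing the local server normally and the remote ones with the backup keyword; option allbackups then spreads load across all healthy backups instead of using only the first. The hypothetical Python model below captures that selection policy, which can help reason about failover behaviour for both the EMQX and Chirpstack backends (the class and health function are illustrative, not HAProxy APIs).

```python
import itertools
from typing import Callable, List, Optional


class PreferLocalBalancer:
    """Use the local instance while it is healthy; otherwise round-robin
    across the healthy remote instances (like backup servers with allbackups)."""

    def __init__(self, local: str, remotes: List[str],
                 healthy: Callable[[str], bool]) -> None:
        self.local = local
        self.remotes = remotes
        self.healthy = healthy
        self._rr = itertools.cycle(remotes)

    def pick(self) -> Optional[str]:
        if self.healthy(self.local):
            return self.local
        # Try each remote at most once per pick, continuing the rotation.
        for _ in range(len(self.remotes)):
            candidate = next(self._rr)
            if self.healthy(candidate):
                return candidate
        return None  # nothing healthy: HAProxy would return 503
```

With lorawan-host-1 down, its HAProxy alternates between lorawan-host-2 and lorawan-host-3, and returns to the local instance as soon as it passes health checks again.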
Setup Certbot
Setup certbot on a single host
Automate with cron job
Configure HAProxy to terminate TLS from WAN and pass unencrypted traffic to Chirpstack/EMQX.
Sync certificates across hosts with lsyncd, using post_cmd to reload HAProxy
Test HAProxy TLS and certificate generation
Post Requisites:
Add DNS load balancer / health checks to balance WAN traffic between the three hosts.