I was assigned to help solve a deployment problem with ChirpStack at work. ChirpStack runs within Kubernetes in an on-prem installation. The cluster has no ingress controller, so the Gateway Bridge (GWB) service is exposed via NodePort. All gateways use the Semtech UDP Packet Forwarder.
Normally, the gateway traffic goes through an Nginx proxy, which routes to the machines hosting the Kubernetes cluster. But the problem persists even if we point a gateway directly at one of the VMs running Kubernetes.
Problem: with a single GWB instance, downlinks work fine. With multiple GWB instances, downlinks mostly fail - they usually are not logged in the LoRaWAN frames view, and the gateway does not receive them. It is not a 100% failure rate; with enough attempts, a downlink eventually goes through.
Unfortunately, I do not own this system, so I cannot just collect log messages and configuration files from it. My relationship with that team is also rather complex, so the best I can do is to try things out myself and ask on the community forum. (I belong to the parent company of this team's.)
So I would like to ask for tips: what Kubernetes settings might be required to ensure that downlinks work properly when the cluster contains multiple GWB instances? I cannot directly change the shape of this deployment either, but if there is a strong reason to change it, I could make a suggestion.
Since this problem affects downlinks, and I also saw errors logged about the GWB not having the downlink frame in its cache for the TX_ACK it receives, I once suggested that they enable ClientIP sessionAffinity on the GWB service, to ensure that the TX_ACK goes back to the GWB instance that triggered the downlink. It seemed to work for a short period, until they rebooted the services while testing whether the problem was truly fixed. The change has been rolled back ever since.
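For reference, this is roughly what I suggested, expressed as a Kubernetes Service manifest. All names, ports, and labels here are placeholders of mine, not the team's actual configuration:

```yaml
# Hypothetical GWB Service with ClientIP session affinity.
# The name, selector, and port numbers are illustrative only.
apiVersion: v1
kind: Service
metadata:
  name: chirpstack-gateway-bridge
spec:
  type: NodePort
  selector:
    app: chirpstack-gateway-bridge
  # Pin each source IP to one backing pod, so the TX_ACK returns
  # to the same GWB instance that sent the downlink.
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800  # the default; affinity does not survive pod restarts
  ports:
    - name: semtech-udp
      protocol: UDP
      port: 1700
      targetPort: 1700
      nodePort: 31700
```

Note that when traffic arrives via a NodePort behind a proxy, the "client IP" the affinity is keyed on may be the proxy's address rather than the gateway's, depending on `externalTrafficPolicy`.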
I believe I have a deep understanding of LoRaWAN and the gateway backhaul protocols, and some understanding of ChirpStack's data flows from studying its logs and code. But I am quite clueless when it comes to Kubernetes.
I would be grateful for any help and tips, especially since I have very little information to begin with. Thank you in advance!
Not a direct answer, but please note that UDP is stateless. What you want to avoid is each UDP packet from the gateway being routed to a different GW Bridge instance within the cluster, as this will break the downlink routing for sure.
Based on your description, they are just using a K8s Service, which load balances requests across the available pods - that would explain the dropped downlinks.
You are right; they will need to set up sessionAffinity based on ClientIP. We were looking to go this route with our setup, but switched to the BasicStation packet forwarder instead.
For sessionAffinity you will need an ingress controller in Kubernetes. Make sure the ingress controller you use supports sessionAffinity based on client IP addresses, as not all of them do.
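One caveat: the standard Ingress resource only covers HTTP, so UDP has to be exposed through a controller-specific mechanism. With ingress-nginx, for example, UDP services are declared in a dedicated ConfigMap that the controller is pointed at via its `--udp-services-configmap` flag. A sketch, assuming a GWB Service named `chirpstack-gateway-bridge` in a `chirpstack` namespace (both names are assumptions):

```yaml
# Hypothetical ingress-nginx udp-services ConfigMap.
# Maps external UDP port 1700 to the (assumed) GWB Service,
# in "namespace/service:port" form.
apiVersion: v1
kind: ConfigMap
metadata:
  name: udp-services
  namespace: ingress-nginx
data:
  "1700": "chirpstack/chirpstack-gateway-bridge:1700"
```

Whether the controller then keeps a given gateway pinned to one GWB pod depends on its upstream load-balancing settings, so check that part of the docs for whichever controller you pick.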
Thank you all for the help rendered so far.
I have two more questions, just to get a better understanding:
- Does this mean that using NodePort in this way is just unworkable?
- What would be the difference between using an ingress controller (with session affinity configured) and just setting “service.spec.sessionAffinity” to “ClientIP” (as described here) with the current method of using NodePort?
It all depends on how you are sending traffic into the cluster. There is a chance Kubernetes has added support for this, but I can't find anything.
This GitHub issue discusses the challenge of sessionAffinity at the Service level and why an ingress controller may be needed.
I think it all depends on how you send traffic into your cluster. If you use a load balancer like an AWS ALB, I think you will need an ingress controller to handle the PROXY protocol / X-Forwarded-For header.
@sp193 did you manage to get this set up in the end?
Struggling with the exact same issue here.
Just wondering if you ever actually got this to work?
Even with session affinity, it seems the return packets are lost by the ingress controller.
However, if you got it to work before, maybe it's just a configuration issue on my end.
Thanks in advance
No, we never ended up going this route. We switched to BasicStation, and we are able to run multiple instances of that.
I would recommend using BasicStation over the UDP packet forwarder. The UDP packet forwarder hasn't been updated since 2017 and, according to The Things Stack, has scalability and security drawbacks (Semtech UDP Packet Forwarder | The Things Stack for LoRaWAN).
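For anyone evaluating the switch: the ChirpStack Gateway Bridge can speak Basic Station over WebSockets instead of Semtech UDP. A rough sketch of the relevant `chirpstack-gateway-bridge.toml` fragment, with key names from memory of the v3 config layout - verify them against the documentation for your version:

```toml
# Assumed chirpstack-gateway-bridge.toml fragment; check the exact
# key names against your Gateway Bridge version's docs.
[backend]
type="basic_station"

[backend.basic_station]
# Basic Station connects over WebSockets (TCP), so each gateway holds
# a long-lived connection to one GWB pod, and ordinary K8s Service
# load balancing no longer scatters packets across instances.
bind="0.0.0.0:3001"
```

This is the property that makes multiple GWB instances "just work" with Basic Station: the affinity comes from the TCP connection itself rather than from per-packet routing.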
Thanks for the quick reply @Greg_Bird !
Unfortunately we don't have a choice, because this is an implementation using the Helium Network, and their HPR sends the Semtech UDP format…
I guess we need them to change that, but for the time being we will just have to use node affinity or something.
We have 3 GWBs running in our K8s cluster and it just works. I don't know what your problem is, @sp193. The UDP flow is steady: once a gateway reaches a GWB, it keeps hitting the same instance. On the other side, that GWB subscribes/publishes on the topics specific to the gateway's EUI. So uplinks and downlinks always reach the right GWB, because only one of the 3 GWBs is subscribed to that GW EUI topic.
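For context, the per-gateway topic layout referred to here looks roughly like the following (based on the ChirpStack v3 Gateway Bridge defaults; the topic templates are configurable, and the exact event/command names may differ by version):

```text
gateway/{gateway_eui}/event/up      # uplink frames, published by the GWB
gateway/{gateway_eui}/event/ack     # TX acknowledgements
gateway/{gateway_eui}/event/stats   # gateway statistics
gateway/{gateway_eui}/command/down  # downlinks; subscribed to only by the
                                    # GWB instance serving that gateway
```

This is why only one GWB instance handles a given gateway's downlinks: the subscription follows the gateway's UDP traffic, so the routing problem only appears when packets from one gateway get spread across instances.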