I was assigned to help solve a deployment problem with ChirpStack at work. ChirpStack runs within Kubernetes in an on-prem installation. The cluster has no ingress controller, so the Gateway Bridge (GWB) service is exposed via NodePort. All gateways use the Semtech UDP Packet Forwarder.
Normally, the gateway traffic goes through an Nginx proxy, which routes to the machines hosting the Kubernetes cluster. But the problem persists even if we point a gateway directly at one of the VMs running Kubernetes.
Problem: with a single GWB instance, downlinks work fine. With multiple GWB instances, downlinks mostly fail - they usually are not logged in the LoRaWAN frames view, and the gateway does not receive them. It is not a 100% failure rate; with enough attempts, a downlink eventually goes through.
Unfortunately, I do not own this system, so I cannot just collect log messages and configuration files from it. My relationship with that team is also rather complex, so the best I can do is to try things out myself and ask on the community forum. (I belong to the parent company of this team's.)
So I would like to ask for tips: what Kubernetes settings might be required to ensure that downlinks work properly when the cluster contains multiple GWB instances? I cannot directly change the shape of this deployment either, but if there is a strong reason to change it, I could make a suggestion.
Since this problem affects downlinks, and I also saw errors logged about the GWB not having the downlink frame in its cache for the TX_ACK it receives, I once suggested that they enable ClientIP sessionAffinity on the GWB service, to ensure that the TX_ACK goes back to the GWB instance that triggered the downlink. It seemed to work for a short period, until they rebooted the services while testing whether the problem was truly fixed. The change has been rolled back ever since.
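For reference, this is roughly what I suggested, expressed as a Kubernetes Service manifest. All names, ports, and labels here are placeholders of mine, not the team's actual configuration:

```yaml
# Hypothetical GWB Service with ClientIP session affinity.
# The name, selector, and port numbers are illustrative only.
apiVersion: v1
kind: Service
metadata:
  name: chirpstack-gateway-bridge
spec:
  type: NodePort
  selector:
    app: chirpstack-gateway-bridge
  # Pin each source IP to one backing pod, so the TX_ACK returns
  # to the same GWB instance that sent the downlink.
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800  # the default; affinity does not survive pod restarts
  ports:
    - name: semtech-udp
      protocol: UDP
      port: 1700
      targetPort: 1700
      nodePort: 31700
```

Note that when traffic arrives via a NodePort behind a proxy, the "client IP" the affinity is keyed on may be the proxy's address rather than the gateway's, depending on `externalTrafficPolicy`.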
I believe I have a deep understanding of LoRaWAN and the gateway backhaul protocols, and some understanding of ChirpStack's data flows from studying its logs and code. But I am quite clueless when it comes to Kubernetes.
I would be grateful for any help and tips, especially since I have very little information to begin with. Thank you in advance!
Not a direct answer, but please note that UDP is stateless. What you want to avoid is each UDP packet from the gateway being routed to a different GW Bridge instance within the cluster, as this will break the downlink routing for sure.
Based on your description, they are just using a K8s Service, which load balances requests across the available pods - that would explain the dropped downlinks.
You are right; they will need to set up sessionAffinity based on ClientIP. We were looking to go this route with our setup, but switched to the BasicStation packet forwarder instead.
For sessionAffinity you will need an ingress controller in Kubernetes. Make sure the ingress controller you use supports sessionAffinity based on client IP addresses, as not all of them do.
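One caveat: the standard Ingress resource only covers HTTP, so UDP has to be exposed through a controller-specific mechanism. With ingress-nginx, for example, UDP services are declared in a dedicated ConfigMap that the controller is pointed at via its `--udp-services-configmap` flag. A sketch, assuming a GWB Service named `chirpstack-gateway-bridge` in a `chirpstack` namespace (both names are assumptions):

```yaml
# Hypothetical ingress-nginx udp-services ConfigMap.
# Maps external UDP port 1700 to the (assumed) GWB Service,
# in "namespace/service:port" form.
apiVersion: v1
kind: ConfigMap
metadata:
  name: udp-services
  namespace: ingress-nginx
data:
  "1700": "chirpstack/chirpstack-gateway-bridge:1700"
```

Whether the controller then keeps a given gateway pinned to one GWB pod depends on its upstream load-balancing settings, so check that part of the docs for whichever controller you pick.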
Thank you all for the help rendered so far.
I have two more questions, just to get a better understanding:
- Does this mean that using NodePort in this way is just unworkable?
- What would be the difference between using an ingress controller (with session affinity configured) and just setting “service.spec.sessionAffinity” to “ClientIP” (as described here) with the current method of using NodePort?
It all depends on how you are sending traffic into the cluster. There is a chance Kubernetes has added support for this, but I can't find anything.
This GitHub issue discusses the challenge of sessionAffinity at the Service level and why an ingress controller may be needed.
I think it all depends on how you send traffic into your cluster. If you use a load balancer like an AWS ALB, I think you will need an ingress controller to handle the PROXY protocol / X-Forwarded-For header.
@sp193 did you manage to get this set up in the end?
Struggling with the exact same issue here.
Just wondering if you ever actually got this to work?
Even with session affinity, it seems the return packets are lost by the ingress controller.
However, if you got it to work before, maybe it's just a configuration issue on my end.
Thanks in advance
No, we never ended up going this route. We switched to BasicStation, and we are able to run multiple instances of that.
I would recommend using BasicStation over the UDP packet forwarder. The UDP packet forwarder hasn't been updated since 2017 and, according to The Things Stack, has scalability and security drawbacks (Semtech UDP Packet Forwarder | The Things Stack for LoRaWAN).
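For anyone evaluating the switch: the ChirpStack Gateway Bridge can speak Basic Station over WebSockets instead of Semtech UDP. A rough sketch of the relevant `chirpstack-gateway-bridge.toml` fragment, with key names from memory of the v3 config layout - verify them against the documentation for your version:

```toml
# Assumed chirpstack-gateway-bridge.toml fragment; check the exact
# key names against your Gateway Bridge version's docs.
[backend]
type="basic_station"

[backend.basic_station]
# Basic Station connects over WebSockets (TCP), so each gateway holds
# a long-lived connection to one GWB pod, and ordinary K8s Service
# load balancing no longer scatters packets across instances.
bind="0.0.0.0:3001"
```

This is the property that makes multiple GWB instances "just work" with Basic Station: the affinity comes from the TCP connection itself rather than from per-packet routing.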
Thanks for the quick reply @Greg_Bird !
Unfortunately we don't have a choice, because this is an implementation using the Helium Network, and their HPR sends the Semtech UDP format…
I guess we need them to change that, but for the time being we will just have to use node affinity or something.
We have 3 GWBs running in our K8s cluster and it just works. I don't know what your problem is, @sp193. The UDP flow is steady: once a gateway reaches a GWB, it keeps hitting the same instance. On the other side, that GWB subscribes/publishes on the topics specific to the gateway's EUI. So uplinks and downlinks always reach the right GWB, because only one of the 3 GWBs is subscribed to that GW EUI topic.
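For context, the per-gateway topic layout referred to here looks roughly like the following (based on the ChirpStack v3 Gateway Bridge defaults; the topic templates are configurable, and the exact event/command names may differ by version):

```text
gateway/{gateway_eui}/event/up      # uplink frames, published by the GWB
gateway/{gateway_eui}/event/ack     # TX acknowledgements
gateway/{gateway_eui}/event/stats   # gateway statistics
gateway/{gateway_eui}/command/down  # downlinks; subscribed to only by the
                                    # GWB instance serving that gateway
```

This is why only one GWB instance handles a given gateway's downlinks: the subscription follows the gateway's UDP traffic, so the routing problem only appears when packets from one gateway get spread across instances.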