Handle gateway stats error

Hello

I'm getting a lot of errors in my logs:

ERROR chirpstack::uplink::stats: Handle gateway stats error error=Update gateway state

How can I fix this? In the context of Helium, gateways are not directly connected, and I assume that is the reason why. If this is the case, can we disable this?

Which version are you using, and do you have the following in your chirpstack.toml (ChirpStack v4)?

[gateway]
  # Allow unknown gateways.
  #
  # If set to true, then uplinks received from gateways not configured in
  # ChirpStack will be allowed.
  allow_unknown_gateways=false

Hi,
the version is 4.1.1
and this setting is set to true (Helium context).

I’m also trying to send unknown gateways as a test to another ChirpStack server I have (I have tried it on my M1 MBP and currently on a cloud VPS → using chirpstack-docker).

I have enabled that in chirpstack.toml and am forwarding a few gateways as a test using the ChirpStack packet forwarder on the gateway. This is the error I get, which seems to show up on each uplink:

chirpstack-gateway-bridge-au915_1  | time="2023-06-23T04:46:53.887629174Z" level=error msg="backend/semtechudp: could not handle packet" addr="xx.xx.xx.xx:46401" data_base64=AqbfBU3nXU4CB3M6 error="no internal frame cache for token 57254"
...
chirpstack-gateway-bridge-au915_1  | time="2023-06-23T04:47:10.74867755Z" level=error msg="backend/semtechudp: could not handle packet" addr="xx.xx.xx.xx:47494" data_base64=AlmJBdw6Pf7oaHdH error="no internal frame cache for token 35161"
...
chirpstack-gateway-bridge-au915_1  | time="2023-06-23T04:47:20.737450222Z" level=error msg="backend/semtechudp: could not handle packet" addr="xx.xx.xx.xx:47494" data_base64=Ah6cBdw6Pf7oaHdH error="no internal frame cache for token 39966"

Our main instance sees the gateway traffic, joins, etc. fine, but nothing is picked up in the test instance, just the error above from the same traffic.

I am not sure what exactly I did, but after a few Docker rebuilds the error seems to have stopped and it is currently working for the test device at the moment… but I am unsure why exactly it started working.

Is UDP routed to multiple instances? Then this might be your issue. The ChirpStack Gateway Bridge keeps some state about each gateway, and if UDP data is distributed over multiple instances then you get issues. In this case it seems your ChirpStack Gateway Bridge instance is receiving responses for tokens that are unknown to this instance.

In general UDP data of a single gateway should always be routed to the same ChirpStack Gateway Bridge instance.

In the context of Helium, you have multiple LNS contexts, so the traffic is sent to multiple ChirpStack instances.

Hi Orne, this is Victor from Nova Labs. Trying to understand the issue here.

Are you saying that a single gateway (Helium Hotspot) should NOT be routed to different ChirpStack Gateway Bridge instances? Otherwise it causes that error?

What’s the consequence of this error? Does it affect data transfer? Thanx!

Once the ChirpStack Gateway Bridge receives a UDP packet from a gateway, it subscribes to the MQTT topic of the corresponding gateway, e.g. eu868/gateway/0102030405060708/command/#.

This means that if you load-balance the UDP data over multiple ChirpStack Gateway Bridge instances, then each instance will subscribe to the gateway MQTT topic, and downlink data might be sent to the gateway through multiple ChirpStack Gateway Bridge instances.
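As an illustrative sketch of that per-gateway topic (Python here purely for illustration; the actual Gateway Bridge is not implemented this way), the subscription filter is built from the region prefix and the gateway ID:

```python
# Illustrative sketch: the per-gateway MQTT command topic the Gateway Bridge
# subscribes to after seeing UDP traffic from that gateway. The region prefix
# ("eu868") and gateway ID are taken from the example topic in the text.
def command_topic(region_prefix: str, gateway_id: str) -> str:
    """Build the MQTT topic filter for a gateway's downlink commands."""
    return f"{region_prefix}/gateway/{gateway_id}/command/#"

print(command_topic("eu868", "0102030405060708"))
# eu868/gateway/0102030405060708/command/#
```

With multiple bridge instances behind a load-balancer, each instance that sees UDP traffic for the same gateway ends up subscribed to this same filter, which is exactly the duplication problem described above.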

ChirpStack Gateway Bridge instances might also start to subscribe and unsubscribe constantly, as after X time without receiving UDP data, the Gateway Bridge will consider the gateway offline.

It will also most likely break the downlink in another way. The ChirpStack Gateway Bridge keeps some temporary state to handle the RX1 and RX2 downlinks. In case the UDP packet-forwarder sends back a negative acknowledgement, the Gateway Bridge will retry using the RX2 parameters. If the UDP data is load-balanced across multiple Gateway Bridge instances, then the acknowledgement (TX_ACK) is most likely sent to a Gateway Bridge instance which is not aware of this downlink (with token Z) and thus is not able to handle the TX_ACK correctly.
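As a minimal sketch (Python, not the actual Go implementation), the per-instance frame cache keyed by the UDP token, and the failure mode from the logs above, can be modeled like this:

```python
# Illustrative model of the per-instance downlink state. Each Gateway Bridge
# instance remembers the frames it sent, keyed by the random UDP token. A
# TX_ACK that lands on a different instance hits an unknown token, mirroring
# the "no internal frame cache for token ..." log lines above.
class FrameCache:
    def __init__(self):
        self._frames = {}  # token -> downlink frame state

    def store(self, token: int, frame: dict) -> None:
        self._frames[token] = frame

    def handle_tx_ack(self, token: int) -> dict:
        if token not in self._frames:
            raise LookupError(f"no internal frame cache for token {token}")
        return self._frames.pop(token)

bridge_a = FrameCache()
bridge_b = FrameCache()

# Instance A sends the downlink (token 57254, as in the logs above) ...
bridge_a.store(57254, {"rx_window": 1})

# ... but the load-balancer delivers the TX_ACK to instance B:
try:
    bridge_b.handle_tx_ack(57254)
except LookupError as err:
    print(err)  # no internal frame cache for token 57254
```

The real bridge keeps more state than this (RX1/RX2 retry parameters, timeouts), but the key point is that the state lives in one process, so the TX_ACK must return to that same process.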

Long story short, you can set up multiple GW Bridge instances, but the UDP data of a single gateway should always be routed to a single GW Bridge instance.

By multiple GW bridge instances, do you mean multiple GW bridge instances under the same region?

If I have 1 US915 GW bridge, 1 EU868 GW bridge, I’m fine right?

Correct, that is fine :slight_smile: But a single gateway would in such case only communicate with either the US915 GW Bridge or EU868 GW Bridge, never both…

What I mean is the following situation:


                                          / [GW Bridge 1]
[GW 1 + GW 2] --> UDP --> [load-balancer] 
                                          \ [GW Bridge 2]

What you want to avoid is that UDP data of [GW 1] is randomly distributed over GW Bridge 1 and GW Bridge 2. All data should either go to GW Bridge 1 or GW Bridge 2.

Thus once the first UDP packet of GW 1 was sent to GW Bridge 2, all following data needs to go to GW Bridge 2 as well. Some load-balancers might not handle this well as UDP has no connection state like TCP.
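One way a load-balancer (or a custom relay in front of the bridges) can achieve that stickiness is to hash the gateway's source address, so the same gateway always maps to the same instance. A minimal Python sketch, with hypothetical instance names:

```python
# Hypothetical sketch: pin a gateway's UDP traffic to one Gateway Bridge
# instance by hashing its source address. Instance names are placeholders.
import hashlib

BRIDGES = ["gw-bridge-1:1700", "gw-bridge-2:1700"]

def pick_bridge(gateway_addr: str) -> str:
    """Deterministically map a gateway source address to one instance."""
    digest = hashlib.sha256(gateway_addr.encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(BRIDGES)
    return BRIDGES[index]

# The same gateway address always resolves to the same instance:
assert pick_bridge("203.0.113.10") == pick_bridge("203.0.113.10")
```

Note that hashing on source address assumes the gateway keeps a stable public IP; if the gateway's NAT address changes, its traffic may move to the other instance (which behaves like a gateway restart).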

One (simple) way might be to use round robin DNS load-balancing. E.g. the UDP Forwarder resolves gw-bridge.example.com and the DNS server returns one of the IP addresses behind that hostname. As the UDP Forwarder will only resolve the hostname on start, it will then route all UDP data to the same IP for the lifetime of the process.
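A small Python sketch of that "resolve once at start" behavior (the hostname is a placeholder; localhost is used here so the lookup works without a real round-robin DNS setup):

```python
# Sketch of resolving the bridge hostname once at process start and reusing
# the result for the lifetime of the process, as the UDP Forwarder does.
import socket

def resolve_once(hostname: str, port: int):
    """Resolve the hostname a single time and keep the first address."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_UDP)
    return infos[0][4]  # (address, port, ...) tuple for the first result

# Resolved once; every later sendto() uses this same address, so all UDP
# data from this process goes to a single Gateway Bridge instance.
BRIDGE_ADDR = resolve_once("localhost", 1700)
```

With round-robin DNS, different forwarder processes get different addresses on start, spreading gateways across instances while keeping each gateway pinned to one.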

