Need help in understanding errors

I have some new devices installed today. Everything seemed to be working pretty well, then one device started logging these events, taken from the device event page in ChirpStack. I don’t see any errors in the ChirpStack logs, and right now I have the log level set to TRACE.

level:"ERROR"
code:"DOWNLINK_GATEWAY"
description:"TOO_LATE"
level:"WARNING"
code:"UPLINK_F_CNT_RETRANSMISSION"
description:"Uplink was flagged as re-transmission / frame-counter did not increment"

I have 5 devices that are exactly the same, all configured similarly. The other 4 are working as expected. I don’t have direct access to this device, as it’s at a customer site. I’ll be heading back there tomorrow to at least restart it. After a couple of these errors the device stopped communicating. There are two gateways in the system, both in close proximity and with a good signal.

I have seen the occasional uplink re-transmission warning in other devices but not the TOO_LATE error. What more can I look at to figure out what is happening?


Update: Now a second device has stopped communicating with similar messages in the event page just prior to stopping.

Doug

These are two independent issues, which might be related:

The gateway TOO_LATE error is similar to TOO_EARLY. In most cases it means that the gateway received the downlink too late to transmit it at the requested timestamp. Most likely this is caused by high network latency (e.g. cellular). The best solution would be to increase rx1_delay, for example to rx1_delay=3 (the default is 1 second). The RX1 receive window then opens 3 seconds after the uplink, which leaves more time to get the downlink to the gateway.
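To make the timing argument concrete, here is a minimal sketch of the latency budget. It is not ChirpStack code: the function name, the gateway scheduling margin, and the latency figures are all assumptions chosen for illustration.

```python
def downlink_arrives_in_time(rx1_delay_s, uplink_latency_s,
                             server_processing_s, downlink_latency_s,
                             gateway_margin_s=0.1):
    """Return True if the downlink can reach the gateway before RX1 opens.

    All parameters are hypothetical, in seconds. gateway_margin_s is an
    assumed lead time the gateway needs to schedule the transmission.
    """
    # Time spent from the uplink being received until the downlink
    # is back at the gateway.
    elapsed = uplink_latency_s + server_processing_s + downlink_latency_s
    # Latest moment the gateway can still accept the downlink.
    deadline = rx1_delay_s - gateway_margin_s
    return elapsed <= deadline


# Assumed cellular link: 400 ms up, 200 ms server time, 500 ms back down.
print(downlink_arrives_in_time(1, 0.4, 0.2, 0.5))
print(downlink_arrives_in_time(3, 0.4, 0.2, 0.5))
```

With these assumed figures the round trip takes about 1.1 seconds, which misses a 1 second RX1 window but fits comfortably inside a 3 second one.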

The UPLINK_F_CNT_RETRANSMISSION might or might not be related. For example when the uplink is confirmed and the NBTrans is set > 1 and the device does not receive the downlink (because of the above issue), then it will re-transmit. However, as the previous uplink with the same frame-counter was already seen by ChirpStack it will be rejected with the above retransmission warning.

It could also be that the device is configured with an NBTrans > 1 and since there is no downlink, it will re-transmit the uplink X times as configured by the NBTrans value.

The NBTrans value might be controlled by the ChirpStack ADR algorithm, based on the detected packet-loss.
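The re-transmission case can be illustrated with a simplified model. This is only a sketch of the idea (an uplink whose frame-counter equals the last accepted one is flagged), not ChirpStack's actual validation logic; the function names are made up for the example, and counter resets and roll-over are ignored.

```python
def classify_uplink(last_f_cnt, f_cnt):
    """Flag an uplink as a re-transmission if its frame-counter did not increment."""
    if last_f_cnt is not None and f_cnt == last_f_cnt:
        return "retransmission"
    return "new"


def replay(f_cnts):
    """Classify a sequence of received frame-counters in arrival order."""
    last = None
    results = []
    for f_cnt in f_cnts:
        kind = classify_uplink(last, f_cnt)
        results.append((f_cnt, kind))
        if kind == "new":
            # Only an accepted uplink advances the stored frame-counter.
            last = f_cnt
    return results


# A device that sends frame 11 three times (e.g. NBTrans = 3, or a
# confirmed uplink that never received its downlink).
for f_cnt, kind in replay([10, 11, 11, 11, 12]):
    print(f_cnt, kind)
```

In this model the second and third copies of frame 11 are flagged, which matches the flurry of warnings seen when a device repeats an uplink it believes was not acknowledged.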

The rx1_delay=3 adjustment seems to have eliminated both the TOO_LATE and TOO_EARLY errors. I also increased the speed plan of the cellular connection for both gateways. This is the first installation I have done that uses only cellular connected gateways. Previously I have set up hybrid systems where the primary connection is Ethernet, with one or two gateways also having a cellular connection.

I am still seeing an occasional flurry of UPLINK_F_CNT_RETRANSMISSION warnings, but this is so far not permanently disrupting communication. These are Dragino LA66 v2 devices. The default max NBTrans value is one, and the fcnt is set to not change for each NBTrans. I have not adjusted these.

Is there a way to see exactly what the ADR algorithm is sending to the device?

Is the node too close to the gateway?

Thanks for the question. I don’t think this is the case, but the notion of “too close” needs to be defined a bit more for me.

I will say this:

  • There are 5 devices in this environment, all between 20-30 meters from the closest gateway
  • There is a second gateway on a different floor (testing both LTE connection as well as LoRa signal strength)
  • In my development/test environment, I have gateways and devices both really close together and far apart (on the same desk, and a gateway at least 100 meters away in a metal building), and I don’t see this issue

Is there a specific discussion or relevant information on close proximity interfering with communication you can point me to? I will look into this further.

After working fine for more than 24 hours, I have a couple devices that are indicating multiple re-transmission warnings, and one stopped communicating again.

This image is showing:

  • A good “up” frame at 12:38:58, after an OTAA error (DevNonce already used)
  • I flushed OTAA device nonces for this device at 12:53
  • Another valid “up” frame is sent at 12:53:03 (these should occur every 5 minutes)
  • Followed by a bunch of re-transmission warnings

Eventually, the application using the device (Dragino LA66 v2 USB device) notices the low data rate and re-initializes the device. It re-joins and then communicates normally for a while. This happened a few minutes after I grabbed this screenshot and the device is now communicating normally.

This is a (potential) production environment and I cannot access any of the devices. I am using RAK7268V2 LTE gateways (OS 2.2.2) – 2 in this deployment – running the ChirpStack MQTT Forwarder installed as described in the documentation.

I have not seen this behavior in a dev/test environment, using a similar setup but with a combination of LTE and Ethernet connected gateways. If I can somehow recreate the environment and behavior, I’m sure I can make the proper adjustments to ensure this either doesn’t happen or can be more effectively dealt with at the device level.

Right now I’m still stumped on why this is happening. I don’t know what else to look at in order to diagnose the problem.

I would be very interested in knowing what ADR adjustments are being sent to the device, even if it takes querying the database manually. I can see in the ChirpStack logs there are a ton of “Requesting ADR change” entries.

You can get this information from the LoRaWAN frames tab in the web-interface. When you click on one of the blue [up] / [down] buttons you get the full LoRaWAN payload as a JSON tree. This also exposes the LinkADRReq / LinkADRAns mac-command payloads. There you will find the data-rate, tx-power, NBTrans and channel-mask that ChirpStack is requesting.
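If you copy or save that JSON tree, a small script can pull out the mac-commands without relying on the exact nesting. The sketch below searches the tree for any object that mentions LinkADRReq. The key names in the sample (cid, payload, data_rate, tx_power, nb_trans) are assumptions for illustration and may not match the real frame JSON exactly.

```python
import json


def find_mac_commands(node, name="LinkADRReq"):
    """Recursively collect every JSON object that has a value equal to `name`."""
    found = []
    if isinstance(node, dict):
        if name in node.values():
            found.append(node)
        for value in node.values():
            found.extend(find_mac_commands(value, name))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_mac_commands(item, name))
    return found


# Hypothetical frame; the field names are illustrative only.
sample = json.loads("""
{"phy_payload": {"payload": {"fhdr": {"f_opts": [
  {"cid": "LinkADRReq",
   "payload": {"data_rate": 5, "tx_power": 2, "nb_trans": 1}}
]}}}}
""")

for cmd in find_mac_commands(sample):
    print(cmd)
```

Passing name="LinkADRAns" finds the device's answers in the same way, so requests and answers can be compared side by side.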

Is there any way to query this from the API? I couldn’t find any reference to ADR outside of what algorithm is being used.

There is an “internal” API for this, but it is not recommended to use this, as it might change over time. The internal API is intended to facilitate the web-interface and should not be used by other applications. See also: chirpstack/api/proto/api/internal.proto at master · chirpstack/chirpstack · GitHub