Troubleshooting Downlink failures

Hi

Firstly, thank you for an awesome piece of software. I’ve been using ChirpStack for a couple of years without any problems.

Recently one of my gateways went down, I suspect because of a power spike, and I’ve been struggling to get it back up and working correctly. The gateway is a Raspberry Pi 4 running ChirpStack Gateway OS. The LoRa concentrator is a RAK2245 cape.

Downlinks, whether added through the Application Server’s external API or through the built-in web forms, are enqueued but never sent to the device. The Application Server logs show the following entry:

chirpstack-application-server[641]: time="2022-05-29T16:08:15.461294188+02:00" level=info msg="downlink device-queue item handled" confirmed=true dev_eui=e0bbd2773c6eb564 f_cnt=196

Note the message "downlink device-queue item handled". I never get "downlink device-queue item acknowledged".
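To check this across a longer stretch of logs (for example the output of journalctl), a small script can list every queue item that was logged as handled but never as acknowledged. This is a minimal sketch, assuming the logrus key=value format shown above and assuming the acknowledgement line uses exactly the message text quoted here; the function names are hypothetical.

```python
import re
from collections import defaultdict

# logfmt-style key=value pairs; values may be wrapped in double quotes.
PAIR_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_logfmt(line):
    """Parse the key=value fields of one log line into a dict."""
    # Forum software often turns straight quotes into curly ones; undo that.
    line = line.replace("\u201c", '"').replace("\u201d", '"')
    fields = {}
    for key, value in PAIR_RE.findall(line):
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        fields[key] = value
    return fields


def unacknowledged_items(lines):
    """Return sorted (dev_eui, f_cnt) pairs handled but never acknowledged."""
    messages = defaultdict(set)
    for line in lines:
        f = parse_logfmt(line)
        if "dev_eui" in f and "f_cnt" in f:
            messages[(f["dev_eui"], f["f_cnt"])].add(f.get("msg", ""))
    # ASSUMPTION: the acknowledgement message text matches the one quoted above.
    return sorted(
        key for key, msgs in messages.items()
        if "downlink device-queue item handled" in msgs
        and "downlink device-queue item acknowledged" not in msgs
    )
```

Feeding it the Application Server entry above would report ("e0bbd2773c6eb564", "196") as a downlink that never got confirmed by the device.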

The Network Server logs the corresponding downlink entry:

chirpstack-network-server[642]: time="2022-05-29T16:08:15.458419952+02:00" level=info msg="device-queue item created" ctx_id=3dd1a9cb-0745-4da1-b1f7-c265d446e4a5 dev_eui=e0bbd2773c6eb564 f_cnt=196

On the chirpstack-gateway-bridge side I also see downlink activity, but these entries are less clear:

chirpstack-gateway-bridge[643]: time="2022-05-29T16:08:18.311404745+02:00" level=info msg="integration/mqtt: downlink frame received" downlink_id=a5affa55-5cd9-4112-a172-2592aaba46fe gateway_id=dca632fffe784db0
May 29 16:08:18 NXGW0003D chirpstack-gateway-bridge[643]: time="2022-05-29T16:08:18.313278626+02:00" level=info msg="integration/mqtt: publishing event" downlink_id=a5affa55-5cd9-4112-a172-2592aaba46fe event=ack qos=0 topic=gateway/dca632fffe784db0/event/ack
May 29 16:08:20 NXGW0003D chirpstack-gateway-bridge[643]: time="2022-05-29T16:08:20.108038152+02:00" level=info msg="integration/mqtt: publishing event" event=up qos=0 topic=gateway/dca632fffe784db0/event/up uplink_id=4c7e24e5-366f-4f35-846b-1f0668c1f299
May 29 16:08:20 NXGW0003D chirpstack-gateway-bridge[643]: time="2022-05-29T16:08:20.441996181+02:00" level=info msg="integration/mqtt: downlink frame received" downlink_id=89bfcaa2-554f-48e6-a11c-97172a778dc8 gateway_id=dca632fffe784db0
May 29 16:08:20 NXGW0003D chirpstack-gateway-bridge[643]: time="2022-05-29T16:08:20.443426971+02:00" level=info msg="integration/mqtt: publishing event" downlink_id=89bfcaa2-554f-48e6-a11c-97172a778dc8 event=ack qos=0 topic=gateway/dca632fffe784db0/event/ack

Monitoring the live LoRaWAN frames and the device data, I never see the downlink being transmitted, and the item is never cleared from the queue.

This gateway serves three devices. One works perfectly and receives its downlinks. For the other two devices, downlinks are enqueued but never transmitted.

Please help me resolve this issue.

While looking into the problem, I noticed that chirpstack-network-server reports the following warning:

May 29 16:15:41 NXGW0003D chirpstack-network-server[642]: time="2022-05-29T16:15:41.505957407+02:00" level=warning msg="get device-session for devaddr error" ctx_id=18691613-7099-4fc3-99b1-2c7920b5ebe0 dev_addr=007c65e4 dev_eui=37373ca9d6131fd7 error="object does not exist"

This device address is the same as that of the device in my post above, but the dev_eui is different. The EUI shown here is the original EUI associated with that device address. During troubleshooting I deleted the device and the application numerous times. The EUI in my earlier post (e0bbd2773c6eb564) is the new EUI, whereas the EUI in the warning above (37373ca9d6131fd7) is the old one.
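One way to spot this kind of leftover association in the logs is to compare each dev_addr/dev_eui pair that the Network Server logs against the devices that currently exist. The sketch below does that, assuming you supply the current EUI-to-DevAddr mapping by hand (for example 007c65e4 for e0bbd2773c6eb564, as described above); the function name is hypothetical.

```python
import re

# Hex DevAddr (4 bytes) and DevEUI (8 bytes) as they appear in the log lines.
ADDR_RE = re.compile(r"\bdev_addr=([0-9a-fA-F]{8})\b")
EUI_RE = re.compile(r"\bdev_eui=([0-9a-fA-F]{16})\b")


def stale_sessions(lines, current_devices):
    """Find log lines pairing a current device's DevAddr with a different EUI.

    current_devices: dict mapping dev_eui -> dev_addr for devices that exist now.
    Returns sorted (dev_addr, logged_dev_eui) pairs that look like leftovers.
    """
    owner = {addr.lower(): eui.lower() for eui, addr in current_devices.items()}
    stale = set()
    for line in lines:
        addr, eui = ADDR_RE.search(line), EUI_RE.search(line)
        if addr and eui:
            a, e = addr.group(1).lower(), eui.group(1).lower()
            # Same address, different EUI: the old device is still referenced.
            if a in owner and owner[a] != e:
                stale.add((a, e))
    return sorted(stale)
```

Run against the warning above with {"e0bbd2773c6eb564": "007c65e4"}, it flags the old EUI 37373ca9d6131fd7 as still tied to 007c65e4.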

Could the old EUI still be associated with the device address, causing downlinks to fail now that the address belongs to the new EUI?

I have tried to delete the old EUIs using the chirpstack-application-server API, to no avail.

I managed to clear the error message by flushing the Redis store. However, doing so also removes all the keys, including the device address, the Network Session Key and the Application Key.
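A less drastic alternative to flushing all of Redis is to remove only the stale reference. The sketch below generates redis-cli commands for that. It rests on an assumption that should be verified before deleting anything: that ChirpStack Network Server v3 keeps a set of DevEUIs per address under lora:ns:devaddr:<devaddr> and each session under lora:ns:device:<deveui> (optionally behind a configured key prefix). Listing keys first with redis-cli --scan --pattern 'lora:ns:*' shows the real layout on your install. The function name is hypothetical.

```python
def stale_session_cleanup(dev_addr, old_dev_eui, key_prefix=""):
    """Build redis-cli commands that drop one stale DevAddr -> DevEUI reference.

    ASSUMED key layout (ChirpStack Network Server v3, to be verified):
      lora:ns:devaddr:<devaddr>  set of DevEUIs using that DevAddr
      lora:ns:device:<deveui>    device session of that DevEUI
    """
    # Keys are assumed to use lowercase hex.
    addr, eui = dev_addr.lower(), old_dev_eui.lower()
    addr_key = f"{key_prefix}lora:ns:devaddr:{addr}"
    session_key = f"{key_prefix}lora:ns:device:{eui}"
    return [
        f"redis-cli SMEMBERS {addr_key}",    # inspect the set before changing it
        f"redis-cli SREM {addr_key} {eui}",  # remove only the old EUI from the set
        f"redis-cli DEL {session_key}",      # drop the old session, if any is left
    ]
```

For the warning above, stale_session_cleanup("007c65e4", "37373ca9d6131fd7") leaves the session of the new EUI untouched, unlike a full flush.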

In the process I also caused the one device that was working to stop working. None of the three devices is now receiving any downlink messages.

Second, the logs now show that the previously working device also gets only the following message, and nothing else:

msg="downlink device-queue item handled"

After I left the gateway alone for about an hour, the unit that had previously been receiving downlinks started doing so again.

Any help, advice or ideas would be appreciated.

Could this be a hardware issue?

Lastly, the quickest fix would be to replace the gateway (RPi and concentrator), but this requires me to get on a plane and physically go there, which is the most expensive option. I am also considering re-installing all the software if that might fix the issue.

Please help.

Further to my previous messages, I’ve noticed that the frame counter does not increase for uplink messages. I suspect this is part of the problem, as the FCnt of an ABP device should increase with every uplink. In the past I’ve sidestepped this issue with the "Disable frame-counter validation" checkbox, but that is not working at the moment.
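To make the frame-counter symptom concrete, the sketch below checks a series of uplink FCnt values (for example copied from the live LoRaWAN frames view) and reports every step where the counter did not move forward. Only the 16 least-significant bits of the counter are sent in the frame header, so the comparison is done modulo 2^16 so that a normal wrap from 65535 to 0 is not flagged. The 0x8000 cut-off for "went backwards" is an arbitrary choice for this sketch, not a value taken from ChirpStack, and the function name is hypothetical.

```python
def fcnt_problems(fcnts):
    """Return (index, previous, current) for each uplink whose 16-bit FCnt
    did not move forward relative to the previous uplink."""
    problems = []
    for i in range(1, len(fcnts)):
        prev, cur = fcnts[i - 1], fcnts[i]
        # Forward distance modulo 2**16, so 65535 -> 0 counts as +1.
        delta = (cur - prev) & 0xFFFF
        # delta == 0: repeated counter.
        # delta >= 0x8000: treated as a jump backwards (e.g. a counter reset).
        if delta == 0 or delta >= 0x8000:
            problems.append((i, prev, cur))
    return problems
```

A device stuck on one value, e.g. fcnt_problems([5, 5, 5]), is flagged at every step, which is the pattern that frame-counter validation would reject for an ABP device.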

Hi All

Here is an update for anyone who runs into this in the future. I noticed that the network server acknowledges the downlink packets every time I (re)activate the unit. I then used the REST API to automate the (re)activation (ONLY as a band-aid) until I understand the issue better. This turned out to solve the problem, and the units now accept downlink messages as before. I suspect the additional fields used in the API might be the issue.
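For readers who want to script a similar re-activation, here is a minimal sketch that builds (but does not send) an ABP activation request. It assumes the ChirpStack Application Server v3 REST endpoint POST /api/devices/{devEUI}/activate with a Grpc-Metadata-Authorization bearer token; the field names should be checked against the API console of your own server, and the exact fields used in the workaround above are not known. For a LoRaWAN 1.0.x device the single NwkSKey is assumed to fill all three network session key fields. The function name is hypothetical.

```python
import json
import urllib.request


def build_abp_activation_request(base_url, api_token, dev_eui, dev_addr,
                                 nwk_s_key, app_s_key,
                                 f_cnt_up=0, n_f_cnt_down=0, a_f_cnt_down=0):
    """Build an ABP (re)activation request without sending it.

    ASSUMPTION: ChirpStack Application Server v3 REST API. Resetting the
    counters to 0 only makes sense if the device also restarts from 0 or
    frame-counter validation is disabled.
    """
    body = {
        "deviceActivation": {
            "devEUI": dev_eui,
            "devAddr": dev_addr,
            "appSKey": app_s_key,
            # LoRaWAN 1.0.x: the one NwkSKey is reused for all three fields.
            "nwkSEncKey": nwk_s_key,
            "sNwkSIntKey": nwk_s_key,
            "fNwkSIntKey": nwk_s_key,
            "fCntUp": f_cnt_up,
            "nFCntDown": n_f_cnt_down,
            "aFCntDown": a_f_cnt_down,
        }
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/devices/{dev_eui}/activate",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Grpc-Metadata-Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )
```

Sending it is then a matter of urllib.request.urlopen(req); keeping the builder separate makes it easy to inspect the payload before touching a live server.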