Downlink Queue Issues and Errors in ChirpStack v4

josealun · November 23, 2023, 9:14am

Hello ChirpStack Community,

I’m currently facing a challenge with downlink message handling in my ChirpStack environment. I am using ChirpStack version 4 with a Mikrotik ltAp gateway. The issue involves downlink messages getting stalled in the queue, and I’m encountering specific errors that I’m hoping to get help with.

Environment:

ChirpStack version: v4
Gateway model: Mikrotik ltAp
UDP packet forwarder

Issue Description: New devices can join the network and receive their first downlink message successfully. However, subsequent downlink messages get queued and are not sent. The chirpstack-gateway-bridge logs and chirpstack logs reveal two distinct error messages:

nov 23 09:54:46 chirpstack-gateway-bridge[xxxxxx]: time="2023-11-23T09:54:46.690713473+01:00" level=error msg="backend/semtechudp: could not handle packet" addr="[IP Address]:[Port]" data_base64=Aj4IBUg2NyATAE8A error="no internal frame cache for token 7848"

ERROR chirpstack::downlink::scheduler: Schedule next queue-item for device failed error=Get device lock

Does anyone have an idea what could be causing this? I’ve flushed the queues of every device, but this behaviour repeats on any new downlink after the first one. Uplinks are received without issues.

sp193 · December 28, 2023, 2:54pm

Downlink messages getting “stuck” means that the gateway's acknowledgement of the downlinks is somehow not getting through. So as far as ChirpStack is concerned, the downlinks were never successfully sent.

Since you’re using the UDP Packet Forwarder, which speaks GWMP, the downlink workflow is defined as:

1. The gateway periodically sends PULL_DATA as a heartbeat message that opens the downlink channel. The server answers each one with PULL_ACK.
2. The LNS (ChirpStack) may send PULL_RESP at any time to issue a downlink, including a token.
3. The gateway sends TX_ACK to acknowledge the downlink, passing back the token included in (2).

After (3), ChirpStack checks the token and updates the downlink queue if the downlink was successfully transmitted.
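
To see which step of this exchange a logged packet belongs to, the data_base64 field from a Gateway Bridge log line can be decoded by hand. Below is a minimal Python sketch of that; the function name decode_gwmp_header is made up for this example, and the header layout (version byte, 2-byte token, identifier byte, then the 8-byte gateway ID on gateway-originated packets) is taken from the Semtech UDP protocol description. The token is shown as raw hex so that no byte order is assumed.

```python
import base64

# GWMP packet identifiers (byte 3 of every packet).
GWMP_TYPES = {
    0x00: "PUSH_DATA",
    0x01: "PUSH_ACK",
    0x02: "PULL_DATA",
    0x03: "PULL_RESP",
    0x04: "PULL_ACK",
    0x05: "TX_ACK",
}

# Packet types sent by the gateway that carry its 8-byte ID after the header.
TYPES_WITH_GATEWAY_ID = (0x00, 0x02, 0x05)


def decode_gwmp_header(data_base64):
    """Decode the fixed GWMP header from a base64-encoded UDP payload."""
    raw = base64.b64decode(data_base64)
    if len(raw) < 4:
        raise ValueError("a GWMP packet needs at least 4 header bytes")

    header = {
        "version": raw[0],
        "token_hex": raw[1:3].hex(),  # raw token bytes, no byte order assumed
        "type": GWMP_TYPES.get(raw[3], "UNKNOWN"),
    }
    if raw[3] in TYPES_WITH_GATEWAY_ID and len(raw) >= 12:
        header["gateway_id"] = raw[4:12].hex()
    return header


if __name__ == "__main__":
    # Payload copied from the gateway bridge error in the first post.
    print(decode_gwmp_header("Aj4IBUg2NyATAE8A"))
```

Run against the payload from the error in the first post, this reports protocol version 2 and the identifier TX_ACK, so the packet the bridge could not match to its frame cache was the gateway's acknowledgement of a downlink.
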
If you cannot see the downlinks being issued at all (not even on the gateway’s side), perhaps there is a problem with keeping the downlink channel open. I would compare the logs of ChirpStack, the Gateway Bridge and the gateway:

ChirpStack logs whenever it tries to issue a downlink.
Gateway Bridge logs whenever it receives a message from ChirpStack on the “/down” MQTT topic. If you have verbosity turned up, it will also log the GWMP message exchanges with the LoRa gateway.
The LoRa gateway should log the PULL_RESP message and what happened to the transmission.

On the gateway’s end, PULL_DATA must be sent frequently enough to keep the session alive. This includes keeping the path open through all firewalls in between, and UDP hole-punching must work. The default interval of 10 s used by the reference Semtech software from GitHub has been sufficient in my experience. Some gateways use a different default.
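
If you want to check the UDP path to the Gateway Bridge independently of the gateway, something like the following Python sketch can send a single PULL_DATA and wait for the matching PULL_ACK. The helper names (build_pull_data, is_matching_pull_ack, probe) are hypothetical, and the host, port 1700 and gateway ID in the usage lines are placeholders that I am assuming for illustration. Use a made-up test gateway ID here: a PULL_DATA carrying a real gateway's ID would tell the bridge to send that gateway's downlinks to the machine running the probe.

```python
import os
import socket

PROTOCOL_VERSION = 2
PULL_DATA = 0x02
PULL_ACK = 0x04


def build_pull_data(gateway_id_hex, token=None):
    """Build a 12-byte PULL_DATA packet: version, token, identifier, gateway ID."""
    gateway_id = bytes.fromhex(gateway_id_hex)
    if len(gateway_id) != 8:
        raise ValueError("gateway ID must be 8 bytes (16 hex characters)")
    if token is None:
        token = os.urandom(2)  # random token, echoed back in the PULL_ACK
    if len(token) != 2:
        raise ValueError("token must be exactly 2 bytes")
    return bytes([PROTOCOL_VERSION]) + token + bytes([PULL_DATA]) + gateway_id


def is_matching_pull_ack(reply, token):
    """True if reply is a PULL_ACK that echoes the given 2-byte token."""
    return (
        len(reply) >= 4
        and reply[0] == PROTOCOL_VERSION
        and reply[1:3] == token
        and reply[3] == PULL_ACK
    )


def probe(host, port, gateway_id_hex, timeout=3.0):
    """Send one PULL_DATA and report whether a matching PULL_ACK came back."""
    packet = build_pull_data(gateway_id_hex)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (host, port))
        try:
            reply, _ = sock.recvfrom(1024)
        except socket.timeout:
            return False
    return is_matching_pull_ack(reply, packet[1:3])


if __name__ == "__main__":
    # Placeholder address, port and test gateway ID; adjust to your setup.
    print(probe("127.0.0.1", 1700, "0102030405060708"))
```

A False result only says that no matching PULL_ACK arrived within the timeout. It does not tell you whether the request or the reply was lost, so it is best combined with the Gateway Bridge logs.
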

If you have more than one Gateway Bridge serving the same port, have you tried scaling back to just one instance?

Finally, please ensure that you selected the correct device class for the device. A Class A device can only receive a downlink right after it has sent an uplink. Class C devices can have messages sent to them at any time, but only if ChirpStack has been told that the device is a Class C device.

josealun · February 14, 2024, 2:59pm

Hi @sp193, thank you for your reply.
It turned out to be a bug on the Mikrotik gateway’s end, which was fixed in their next firmware release. But your explanation was great.