Device_queue filling up

The device queue (downlink messages) keeps filling up. We discovered this when the server stopped responding after several days. The downlink messages are enqueued via the REST API /api/devices/<dev_eui>/queue. The device_queue table in the chirpstack_ns database ends up containing a large number of messages. Deleting these messages and restarting the application server and the network server brings the system back up again.
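
For reference, a minimal sketch of how we enqueue such a downlink. The host, API token, DevEUI, fPort, and payload below are placeholders, and the JSON body assumes the v3 REST API shape (a deviceQueueItem object with confirmed, fPort, and base64-encoded data):

package main

import (
	"bytes"
	"encoding/base64"
	"fmt"
	"net/http"
)

func main() {
	// Placeholder values: adjust host, API token and DevEUI for your setup.
	const (
		server = "http://localhost:8080"
		apiKey = "<application-server-JWT>"
		devEUI = "0102030405060708"
	)

	// The payload must be base64 encoded; confirmed=false matches the
	// unconfirmed downlinks described below.
	payload := base64.StdEncoding.EncodeToString([]byte{0x01, 0x02})
	body := fmt.Sprintf(`{"deviceQueueItem": {"confirmed": false, "fPort": 10, "data": %q}}`, payload)

	req, err := http.NewRequest("POST", server+"/api/devices/"+devEUI+"/queue", bytes.NewBufferString(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+apiKey)

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("enqueue status:", resp.Status)
}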

This happens on the queues of offline devices. As we are in a test situation, many devices are turned off, but our software keeps sending downlink messages to them. I am not sure whether this is a configuration error, a bug in our system, or a bug in the ChirpStack Network Server logic.

Versions
chirpstack-application-server/stable,now 3.17.6 amd64 [installed]
chirpstack-gateway-bridge/stable,now 3.13.3 amd64 [installed]
chirpstack-network-server/stable,now 3.16.1 amd64 [installed]

Characteristics:

  • Devices are offline
  • Class C devices (obviously)
  • OTAA
  • Messages NOT confirmed

Questions

  • Have we done a configuration error?
  • Should our software stop sending downlink messages when devices are detected as offline?
  • Is there a configuration setting to limit the maximum number of downlink messages in the queue?

Should our software stop sending downlink messages when devices are detected as offline?

For Class-A devices, it is expected that the queue will fill up in this case, as an uplink is required to trigger a downlink.

Is there a configuration setting to limit the maximum number of downlink messages in the queue?

There isn’t.

@brocaar Thanks for the response.
This is a Class-C device. I guess the queue needs to maintain the FCnt, but I don't understand why ChirpStack / the gateway isn't ACKing the downlink regardless of the actual device status… these are Class-C, unconfirmed messages.

In that case it might be the scheduler batch size, if there are too many Class-C downlinks in the queue. The downlink fails because the device-to-gateway association has been invalidated in the database. If too many devices are failing because of this, then at some point a scheduler batch contains only failures, blocking the other items from being sent. There are two things you can do (this will be improved in the next version):

  • Increase the batch size (see chirpstack-network-server.toml config)
  • Stop enqueueing downlinks for inactive devices / flush these queues (see the sketch below)

Again, this will be improved in a next version, but the above will hopefully solve your issue until then :slight_smile:
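
A minimal sketch of the second option, flushing the queue of a device that is known to be offline. The host, API token, and DevEUI are placeholders, and it assumes the v3 REST API maps the queue flush to DELETE /api/devices/<dev_eui>/queue:

package main

import (
	"fmt"
	"net/http"
)

// flushQueue removes all pending downlink items for the given device.
func flushQueue(server, apiKey, devEUI string) error {
	req, err := http.NewRequest("DELETE", server+"/api/devices/"+devEUI+"/queue", nil)
	if err != nil {
		return err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("flush failed for %s: %s", devEUI, resp.Status)
	}
	return nil
}

func main() {
	// Placeholder values for illustration only.
	if err := flushQueue("http://localhost:8080", "<application-server-JWT>", "0102030405060708"); err != nil {
		fmt.Println(err)
	}
}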

Hi @brocaar,

As per your suggestion, we can do the second option, but we cannot increase the batch size in the chirpstack-network-server.toml config, because the scheduler batch size is hard-coded in the Go source code:

chirpstack-network-server/downlink.go at 3971570b77c79c1cfd184b6f06a4f1770b5a0db0 · brocaar/chirpstack-network-server (github.com)

var (
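	// schedulerBatchSize limits how many Class-C device-queue items the
	// scheduler picks up per run; it is set here in code, not in
	// chirpstack-network-server.toml, so changing it requires a rebuild.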
	schedulerBatchSize = 100
	schedulerInterval  time.Duration
)

Thanks