Should invalid queue items be deleted?

Network Server v3.13 added a feature that re-queues items that are not acknowledged by the gateway. Looking at the network server logs, I have noticed a build-up of queued items that have become invalid. I see two errors, each with a slightly different cause. Both errors repeat at every scheduler interval, whenever the scheduler tries to send the downlink, so they spam the logs and waste system resources. I would like some feedback on possible solutions.

Scenario 1

Mar 28, 2022 @ 19:24:12.846	time="2022-03-28T19:24:12.846384252Z" level=error msg="get device-session error" ctx_id=dd378d3b-e896-4875-9219-616c4c808174 dev_eui=xxxxxxxxxxxxxxxx error="object does not exist"

A frame is stuck in the device queue for a dev_eui that has no device-session. The device does exist in the application server; it just has not joined yet (so it has no DevAddr). How could this occur? If you attempt to enqueue a downlink to a device that has never joined, does it not error out right away, before the frame gets into the queue? The only way I can see this happening is if the device was active, its gateway went down so the packet was not ack’d, the dev_eui was then removed from the app server and re-added, and the device has not been on since.

Scenario 2

Mar 28, 2022 @ 19:24:12.339	time="2022-03-28T19:24:12.338932294Z" level=error msg="schedule next device-queue item error" ctx_id=0de8c17e-d547-4afb-8be4-b54aec2be8ab dev_eui=yyyyyyyyyyyyyyyyyyy error="get device gateway RXInfoSet error: object does not exist"

A frame is in the device queue, but this device has no routing information to a gateway. Maybe this dev_eui joined but has never sent an uplink, or its gateway routing info has expired – can that occur?
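To get a feel for how many queue items are affected, the two log lines above can be tallied per dev_eui. This is only a rough sketch: `parse_logfmt` and `count_errors` are helper names I made up, and it assumes the key=value layout shown in the two samples (it would need adjusting if a field contained an unbalanced quote or apostrophe).

```python
import shlex
from collections import Counter


def parse_logfmt(line):
    """Split a key=value log line into a dict; tokens without '=' are ignored."""
    fields = {}
    for token in shlex.split(line):  # shlex keeps quoted values together
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def count_errors(lines):
    """Count error-level lines per (dev_eui, msg) pair."""
    counts = Counter()
    for line in lines:
        fields = parse_logfmt(line)
        if fields.get("level") == "error" and "dev_eui" in fields:
            counts[(fields["dev_eui"], fields.get("msg", ""))] += 1
    return counts
```

Feeding it a day of logs and dividing by the number of scheduler runs should give a rough count of stuck items per device.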

Solution A

I would propose dropping the queue item in at least one of these scenarios (Scenario 1). In Scenario 2 you probably don’t want to drop it: it makes sense to keep it in the queue until the device sends an uplink, especially after a fresh join, when a downlink may be queued before the device has had a chance to send one.
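As a rough illustration of the policy I have in mind (not how the network server is actually structured), the decision could look like the sketch below. `Action` and `classify_queue_error` are hypothetical names, and matching on the error text from the two log lines is purely an assumption for the example; a real implementation would presumably check error types instead.

```python
from enum import Enum


class Action(Enum):
    DROP = "drop"    # remove the item from the device queue
    KEEP = "keep"    # leave it queued until the device sends an uplink
    RETRY = "retry"  # unknown error: keep the current behaviour


def classify_queue_error(msg: str, error: str) -> Action:
    """Map the two errors seen in the logs to a proposed queue action."""
    # Scenario 1: no device-session exists for the dev_eui.
    if msg == "get device-session error" and error == "object does not exist":
        return Action.DROP
    # Scenario 2: device-session exists, but there is no gateway routing info yet.
    if error.startswith("get device gateway RXInfoSet error") and error.endswith(
        "object does not exist"
    ):
        return Action.KEEP
    return Action.RETRY
```

Even with KEEP for Scenario 2, the error could probably be logged less often than once per scheduler interval.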

Solution B

Time-based flush. Perhaps this is an application responsibility, but maybe there is a way to flush old queue items? (It looks like we’ve had many packets stuck in the queue for 6+ months, each of them printing that error every second.) I think the API allows querying queue items, but with potentially thousands of packets being returned and parsed, maybe that is asking too much of the API.
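A minimal sketch of what I mean by a time-based flush is below. It assumes each queue item carries a creation timestamp, which I have not verified; `QueueItem`, its fields, and `split_stale` are hypothetical names for illustration, not the real API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class QueueItem:
    dev_eui: str
    f_cnt: int
    created_at: datetime  # assumed field; must be timezone-aware


def split_stale(items, max_age, now=None):
    """Return (fresh, stale), where stale items are older than max_age."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - max_age
    fresh = [item for item in items if item.created_at >= cutoff]
    stale = [item for item in items if item.created_at < cutoff]
    return fresh, stale
```

The stale list would then be deleted (or at least logged once) instead of being retried on every scheduler run; something like `split_stale(items, timedelta(days=30))` would already clear out the 6+ month old packets.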

Any thoughts, corrections, or other solutions?