Questions on buffering of queued downlink messages

eric24 · July 17, 2018, 1:53pm

A few questions about downlink messages:

Where are the messages actually stored (on the AS or NS; in memory, SQL, etc.)?
I assume they are queued for transmission in FIFO order?
How many can be buffered in the queue at once per device (i.e. is there a limit to the number of downlink messages that can be queued for a given device)?
Does use of the ‘reference’ property (i.e. sending a second message with the same reference value) allow me to overwrite a message that is already in the queue (and has therefore not yet been sent)? And if so, does the “updated” message retain its position in the queue or is it moved to the end?
Is there a way to remove/cancel a message from the queue?

PS - In this case, we are sending downlink messages via MQTT, but I assume that none of these answers would change for someone using the REST API?

brocaar · July 17, 2018, 3:07pm

They are stored in the LoRa Server database (SQL). Note that the payload is stored encrypted.
Correct (as this is also important for the downlink frame-counter).
This is not limited by LoRa Server. Note that when a device re-activates, the device-queue is flushed, as the security context has changed.
Please note that the reference field will be removed in the next major version. When scheduling downlink messages using the REST API, you will get an FCnt in the API response. When an acknowledgement is received, the ACK contains the acknowledged FCnt value. Unfortunately this is not possible when scheduling over MQTT, as there is no direct response to sending an MQTT message (one option could be to keep the reference in the MQTT message and then send an FCnt back paired with the given reference).

See also: https://github.com/brocaar/lora-app-server/blob/master/docs/content/overview/changelog.md

And https://forum.loraserver.io/t/help-testing-the-next-lora-app-server-release-with-lorawan-1-1-support/1610 if you’re interested in testing the next major version.

eric24 · July 17, 2018, 3:25pm

Thanks for the information.

But after ‘reference’ is removed, there does not then seem to be any way to facilitate a “replacement” message or message cancellation. I have found that having this ability is important, especially for devices that communicate very infrequently (i.e. the user may change their mind about a downlink command that was sent two hours ago that won’t be delivered for another two hours, based on the uplink schedule).

Do you have any thoughts about this? Maybe API calls to access the queued messages by devEui and FCnt?

brocaar · July 17, 2018, 4:13pm

Note that the device-queue can already be retrieved by:

GET /api/devices/{dev_eui}/queue

And flushed by:

DELETE /api/devices/{dev_eui}/queue
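As a rough illustration of calling these two endpoints from a script, here is a minimal Python sketch that only builds the requests. The base URL, the `queue_request` helper and the `Grpc-Metadata-Authorization` header (and its `Bearer` prefix) are assumptions, not taken from this thread, so check them against the API documentation of the version you run.

```python
import urllib.request

# Assumption: adjust to wherever your LoRa App Server REST API is exposed.
BASE_URL = "https://localhost:8080"


def queue_request(dev_eui, method, token, base_url=BASE_URL):
    """Build (but do not send) a request against the device-queue endpoint.

    method is "GET" to retrieve the queue or "DELETE" to flush it.
    """
    if method not in ("GET", "DELETE"):
        raise ValueError("only GET (retrieve) and DELETE (flush) are covered here")
    return urllib.request.Request(
        f"{base_url}/api/devices/{dev_eui}/queue",
        method=method,
        # Assumption: header name and value format may differ per version.
        headers={"Grpc-Metadata-Authorization": f"Bearer {token}"},
    )


# Usage: pass the result to urllib.request.urlopen(...) to actually send it.
# flush = queue_request("0102030405060708", "DELETE", "<jwt>")
```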

Please note that the reference was never intended to “swap” queue items. It was more a reference given to a frame which could be used for acknowledgements. When the device-queue was moved from LoRa App Server to LoRa Server, this added some complexity that I preferred to remove rather than keep for the next major version, as it was prone to errors.

eric24 · July 17, 2018, 4:39pm

Ah, OK. So those API calls could also be extended, fairly easily I think, by just adding an optional FCnt:

GET /api/devices/{dev_eui}/queue/{FCnt}

And (to delete an individual message):

DELETE /api/devices/{dev_eui}/queue/{FCnt}

And even (to update an existing message):

PUT /api/devices/{dev_eui}/queue/{FCnt}

The least useful of these is the GET, I suppose, but the other two would be very powerful. As it stands, I realize it would be possible to GET the list of queued messages, DELETE the entire queue, and then resubmit the new “set” of messages. That might be fine for most common use-cases, but adding the FCnt key and the PUT option would be something to consider.
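The "GET the list, DELETE the entire queue, resubmit" workaround could look roughly like the sketch below. The `client` object and its `get_queue`, `flush_queue` and `enqueue` methods are hypothetical wrappers around the REST calls, and the item keys (`fCnt`, `fPort`, `data`) are assumed names. It also assumes the queue listing returns payloads in a form that can be resubmitted as-is; if that is not the case, keep your own copy of pending commands on the application side.

```python
def replace_queued(client, dev_eui, fcnt, new_item=None):
    """Rebuild a device queue with the item at `fcnt` replaced or dropped.

    client   -- hypothetical wrapper exposing get_queue / flush_queue / enqueue
    fcnt     -- frame-counter of the queued item to replace
    new_item -- replacement item, or None to cancel the item entirely
    """
    rebuilt = []
    for item in client.get_queue(dev_eui):
        if item["fCnt"] == fcnt:
            if new_item is not None:
                rebuilt.append(new_item)  # keeps the original queue position
        else:
            # Drop the old FCnt: a fresh one is assigned on re-enqueue.
            rebuilt.append({k: v for k, v in item.items() if k != "fCnt"})

    client.flush_queue(dev_eui)
    for item in rebuilt:
        client.enqueue(dev_eui, item)
    return rebuilt
```

Note that this sequence is not atomic: if the device sends an uplink between the flush and the re-enqueue, it may find an empty or partial queue.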

brocaar · July 18, 2018, 6:23am

That could definitely be something to consider. However, note that a DELETE {FCnt} has the side-effect that you will get a gap in your queue, or LoRa App Server has to flush the whole remaining queue (after the deleted frame-counter) and re-encrypt these. (Note that the FCnt is an argument to the encryption function.)
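To make the re-encryption point concrete: in LoRaWAN 1.0.x, the payload is XORed with a keystream produced by AES-encrypting a sequence of 16-byte blocks A_i, and the frame-counter is one of the fields in each block. The sketch below only builds that block (the AES step is left out because the Python standard library has no AES); the function name `a_block` is made up for illustration.

```python
import struct


def a_block(dev_addr: int, fcnt: int, i: int, downlink: bool = True) -> bytes:
    """Build block A_i used to derive the FRMPayload keystream (LoRaWAN 1.0.x).

    Layout: 0x01 | 4 x 0x00 | Dir | DevAddr (LE) | FCnt (LE) | 0x00 | i
    """
    direction = 1 if downlink else 0
    # "<" = little-endian without padding; "4x" emits four zero bytes.
    return struct.pack("<B4xBIIBB", 0x01, direction, dev_addr, fcnt, 0x00, i)


# The same payload queued under two different FCnt values starts from
# different A_i blocks, so its ciphertext differs as well.
block_fcnt_5 = a_block(0x26011234, 5, 1)
block_fcnt_6 = a_block(0x26011234, 6, 1)
```

Since the keystream depends on the FCnt, moving an already-encrypted queue item to a different frame-counter means encrypting it again.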

eric24 · July 18, 2018, 12:59pm

Yep, I hadn’t considered that; it’s a good point. But wouldn’t the current approach of “flush everything and resend the entire queue” also cause a FCnt gap?

brocaar · July 18, 2018, 1:46pm

It wouldn’t. When enqueuing data, LoRa App Server requests the FCnt to use from LoRa Server. LoRa Server uses the FCnt from the device-session (which contains the FCnt for the next downlink message) or, when there are items in the queue for the given device, max(FCnt) from the queue + 1.

The device-session FCnt (down) is only incremented on an actual transmission. So when flushing the queue, LoRa Server automatically falls-back onto the device-session FCnt.
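The allocation rule described above can be modelled in a few lines. This is a toy in-memory sketch, not LoRa Server code; the class and method names are invented for illustration, and the queue holds only FCnt values.

```python
from dataclasses import dataclass, field


@dataclass
class DeviceState:
    """Toy model of one device's downlink frame-counter bookkeeping."""
    session_fcnt_down: int = 0                 # FCnt for the next downlink
    queue: list = field(default_factory=list)  # FCnt values of queued items

    def next_fcnt(self) -> int:
        # Queue non-empty: continue after the highest queued FCnt.
        # Queue empty: fall back onto the device-session FCnt.
        return max(self.queue) + 1 if self.queue else self.session_fcnt_down

    def enqueue(self) -> int:
        fcnt = self.next_fcnt()
        self.queue.append(fcnt)
        return fcnt

    def transmit(self) -> int:
        # Only an actual transmission advances the session counter.
        fcnt = self.queue.pop(0)
        self.session_fcnt_down = fcnt + 1
        return fcnt

    def flush(self) -> None:
        self.queue.clear()


dev = DeviceState(session_fcnt_down=10)
queued = [dev.enqueue() for _ in range(3)]  # items get FCnt 10, 11, 12
sent = dev.transmit()                       # FCnt 10 goes out
dev.flush()                                 # 11 and 12 are discarded
resumed = dev.enqueue()                     # picks up at 11, so no gap
```

Because the flushed items were never transmitted, the session counter never moved past them, which is why flushing and re-enqueuing leaves no gap.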

eric24 · July 18, 2018, 2:05pm

Very good. Makes sense. So “flush everything and resend” is a workable solution (and based on the way FCnt works in this scenario, it may be the only workable solution without significant changes). All in all, for most use cases, it’s probably a reasonable solution, as the number of messages queued for any given device should be fairly small.

niau · July 14, 2021, 4:22am

@brocaar @eric24 I am sorry for spamming this thread, but I found it very useful. What I would like to know is whether there is a way to avoid flushing the network queue on device re-activation.

bconway · July 15, 2021, 1:07am

Not currently, but I believe there is work being done on this.

brocaar · July 21, 2021, 12:26pm

Yes, I’m working on making this possible.

eric24 · October 13, 2021, 2:32pm

If you do this, it should definitely be a configurable behavior, maybe as part of the device or device profile configuration. I can see it being useful, but not in all cases; sometimes you want to start from a known point on a device join.

brocaar · November 1, 2021, 9:31am

Something like “Flush downlink queue on OTAA” ?

eric24 · November 1, 2021, 4:06pm

Exactly (OTAA being when a new join request/accept occurs). I feel like this belongs in the device profile, because it’s unlikely that a specific instance of a device would need to behave differently than other devices of that type. And if you ever needed to flush the downlink queue of any specific device, that’s already possible via the API.