Device losing channel mask / how to resend chMask

np0 · December 12, 2022, 5:31pm

Hi all,

We have our network server running in US915 sub-band 4. Everything works great: the modules join and communicate with the LNS on their intended channels on an hourly basis with close to zero packet loss. The end devices follow the LoRaWAN 1.0.4 spec using an off-the-shelf stack.

The problem is that when the gateways are offline for an extended period, the modules revert to a mode where they transmit on all channels, and as a consequence we see massive packet loss. I’m not 100% sure whether this is the fault of our module (we are looking into that), but more importantly, I would like to know whether there is any way to force the module to recover the correct channel mask without forcing a rejoin on the device itself.

I suspect this may be possible via a custom ADR plugin, for example one that looks at both short-term and long-term packet loss and decides whether to send a new channel mask. Is this possible? Is the channel mask sent as part of an ADR response, or only in a JoinAccept? Are there any other mechanisms we could use to remotely get the devices to behave this way?

Has anyone experienced this before? I can potentially force an ADR command on at least some of the modules by changing the max/min data rate in the service profile, but that’s a stopgap measure, not a long-term solution.

The behaviour of the device seems consistent with pages 19-20 of the LoRaWAN 1.0.4 spec, specifically “Furthermore, if at any point during the backoff the resulting configuration results in an invalid combination of TX power, data rate or channel mask, the end-device SHALL immediately re-enable all default channels and use the maximum TX power permissible for and available to this end-device.” and the following table.
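
A simplified sketch of the backoff described in the quoted passage may help show why the device ends up on all channels. The constants (ADR_ACK_LIMIT = 64, ADR_ACK_DELAY = 32), the order of the backoff steps and the `DeviceState` class are assumptions for illustration only; in particular, re-enabling the default channels is modelled as the last step once nothing else can be lowered. Check the backoff table in the 1.0.4 spec and the regional parameters for the exact behaviour.

```python
from dataclasses import dataclass

# Assumed regional defaults; check the regional parameters for the region in use.
ADR_ACK_LIMIT = 64
ADR_ACK_DELAY = 32
MIN_DR = 0
DEFAULT_125KHZ = frozenset(range(64))  # US915 default 125 kHz uplink channels only


@dataclass
class DeviceState:
    """Hypothetical, simplified model of an end-device's ADR backoff."""
    dr: int
    tx_power_max: bool
    channels: frozenset
    adr_ack_cnt: int = 0

    def uplink_without_downlink(self) -> bool:
        """Count one uplink that received no downlink.

        Returns True when the next uplink should carry ADRACKReq."""
        self.adr_ack_cnt += 1
        over = self.adr_ack_cnt - ADR_ACK_LIMIT
        # Every ADR_ACK_DELAY uplinks past the limit, take one backoff step.
        if over > 0 and over % ADR_ACK_DELAY == 0:
            self._backoff_step()
        return self.adr_ack_cnt >= ADR_ACK_LIMIT

    def downlink_received(self) -> None:
        # Any downlink resets the counter, which also clears ADRACKReq.
        self.adr_ack_cnt = 0

    def _backoff_step(self) -> None:
        # Assumed order: max TX power first, then lower the data rate one
        # step at a time, then re-enable all default channels.
        if not self.tx_power_max:
            self.tx_power_max = True
        elif self.dr > MIN_DR:
            self.dr -= 1
        else:
            self.channels = DEFAULT_125KHZ
```

With these assumed constants, a device starting at DR3 on sub-band 4 re-enables all 64 channels on the 224th consecutive uplink without a downlink.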

brocaar · December 22, 2022, 4:01pm

ChirpStack keeps an internal state of every “connected” device in a device-session. Based on what is stored there vs. what is configured (.toml files), it sends a LinkADRReq mac-command to re-configure the device channel-mask.

I’m not sure what the best approach would be. For example, ChirpStack could detect that the device reverted to DR0 and then re-send the channel-mask to the device. But that would not work for devices that are already on DR0; for those, the only indication would be a sudden increase in packet loss.
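
One way to turn “a sudden increase in packet loss” into a number is to compare the frame counters actually received against the span they cover. The `packet_loss` helper below is a hypothetical sketch, not a ChirpStack API, and assumes FCnt increments by one per uplink with no counter reset or rollover inside the window.

```python
def packet_loss(fcnts: list[int]) -> float:
    """Estimate uplink loss from the frame counters seen by the NS.

    Assumes FCnt increases by one per uplink and does not reset or roll
    over within the window (hypothetical helper, not a ChirpStack API)."""
    if len(fcnts) < 2:
        return 0.0
    received = sorted(set(fcnts))          # drop duplicates from multiple gateways
    expected = received[-1] - received[0] + 1
    return 1.0 - len(received) / expected
```

A device hopping over all 64 channels behind gateways that only listen on one sub-band would be expected to trend toward roughly 7/8 loss on this measure, so a threshold such as the 80% used later in the thread would flag it.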

np0 · December 22, 2022, 7:19pm

Thanks for the response. In this case it appears that during a prolonged gateway outage, our module backs off, correctly following the LoRaWAN spec, and enables all channels for transmission. The problem is that we then receive only roughly 1 in 8 packets, since we’re operating on a single sub-band (US915 sub-band 4).
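
The “1 in 8” figure follows from the US915 channel plan: 64 uplink 125 kHz channels split into 8 sub-bands of 8, plus 8 uplink 500 kHz channels (64 to 71), one per sub-band. The sketch below assumes the device hops uniformly over all 64 125 kHz channels while the gateways listen only on sub-band 4; the function names are illustrative.

```python
from fractions import Fraction

US915_125KHZ_CHANNELS = 64  # uplink channels 0..63, 8 sub-bands of 8


def subband_125khz(sb: int) -> list[int]:
    """125 kHz uplink channel indices for 1-based sub-band `sb` (1..8)."""
    if not 1 <= sb <= 8:
        raise ValueError("US915 has sub-bands 1..8")
    start = (sb - 1) * 8
    return list(range(start, start + 8))


def subband_500khz(sb: int) -> int:
    """The 500 kHz uplink channel (64..71) paired with sub-band `sb`."""
    return 63 + sb


def expected_delivery(listened: list[int]) -> Fraction:
    """Fraction of uplinks heard if the device hops uniformly over all 64
    125 kHz channels but the gateways only listen on `listened` (assumption)."""
    return Fraction(len(listened), US915_125KHZ_CHANNELS)


sb4 = subband_125khz(4)
print(sb4, subband_500khz(4), expected_delivery(sb4))
# [24, 25, 26, 27, 28, 29, 30, 31] 67 1/8
```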

When the gateway is back online, the device sets the ADRACKReq bit and ChirpStack does send a channel mask to the device, but only with chMaskCntl 1, enabling sub-band 4 as per our .toml file; it does not send chMaskCntl 7, which would disable the rest of the unused channels. So the modules continue to transmit on all channels.
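
Why chMaskCntl 1 alone does not help can be checked with a small model of the US915 ChMaskCntl semantics: values 0 to 3 each address one block of 16 125 kHz channels, while 6 and 7 switch all 125 kHz channels on or off and apply ChMask to the 500 kHz channels 64 to 71. Sub-band 4 (channels 24 to 31) sits in bits 8 to 15 of block 1, i.e. ChMask 0xFF00. `apply_link_adr` is a hypothetical helper that models only ChMaskCntl 0-3, 6 and 7.

```python
def apply_link_adr(enabled: set[int], ch_mask_cntl: int, ch_mask: int) -> set[int]:
    """Apply one LinkADRReq (ChMaskCntl, ChMask) pair to the set of enabled
    US915 uplink channels 0..71 and return the new set.

    Only ChMaskCntl 0-3, 6 and 7 are modelled here."""
    result = set(enabled)
    if 0 <= ch_mask_cntl <= 3:
        base, width = 16 * ch_mask_cntl, 16  # one block of 125 kHz channels
    elif ch_mask_cntl in (6, 7):
        all_125khz = set(range(64))
        if ch_mask_cntl == 6:
            result |= all_125khz   # all 125 kHz channels ON
        else:
            result -= all_125khz   # all 125 kHz channels OFF
        base, width = 64, 8        # ChMask then applies to 500 kHz channels
    else:
        raise NotImplementedError(f"ChMaskCntl {ch_mask_cntl} not modelled")
    for bit in range(width):
        if (ch_mask >> bit) & 1:
            result.add(base + bit)
        else:
            result.discard(base + bit)
    return result


SB4_MASK = 0xFF00        # channels 24..31 are bits 8..15 of the 16..31 block
all_on = set(range(72))  # state after the device re-enabled everything

only_cntl1 = apply_link_adr(all_on, 1, SB4_MASK)
with_cntl7 = apply_link_adr(apply_link_adr(all_on, 7, 0x0000), 1, SB4_MASK)
print(len(only_cntl1), sorted(with_cntl7))
# 64 [24, 25, 26, 27, 28, 29, 30, 31]
```

With only chMaskCntl 1, the channels outside block 1 (0 to 15 and 32 to 71) stay enabled; only channels 16 to 23 are switched off.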

We inquired with other network server vendors (Tektelic, for example, who provides our gateway) and confirmed that on their network server, the LinkADRReq response does in fact include chMaskCntl 7 to clear the other channels. ChirpStack seems to do this only on the initial JoinAccept.

This is described in more detail in issue 558, opened about a year ago by another user (LinkAdrReq command to use ChMaskCntl 5 (US915 Region) · Issue #558 · brocaar/chirpstack-network-server · GitHub). They also tested with other network servers, and all of them seem to send chMaskCntl 7 before setting the correct channel masks in the LinkADRReq downlink. The GitHub issue shows some back-and-forth about the interpretation of the LoRaWAN spec and what (if anything) the NS should do in this case. We are fairly convinced that, as with other network servers, the NS should clear the channels and then set the correct ones from the .toml file in the LinkADRReq downlink.

As you mentioned, the only way to detect this issue is an increase in packet loss. We took a loss rate of 80% as the threshold and assumed that modules above it were affected. We then used the NS API to manually push two MAC commands to the downlink queue (chMaskCntl 7, and chMaskCntl 1 with 0x00FF, while also setting DR0) and successfully brought these modules back to 0 packet loss. However, continuously checking for this case isn’t an ideal long-term solution. I was hoping we could quickly fix this as part of an ADR plugin, but it doesn’t look like the ADR plugin has access to that kind of low-level MAC command. Ideally, the behaviour would be corrected in ChirpStack’s LinkADRReq downlink, and I’m hoping this can be done as part of an official release rather than a custom branch.
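
For reference, the two MAC commands described above can be encoded by hand. A LinkADRReq is CID 0x03 followed by DataRate_TXPower (DR in bits 7:4, TXPower in bits 3:0), a 16-bit ChMask sent least-significant byte first, and Redundancy (ChMaskCntl in bits 6:4, NbTrans in bits 3:0). The `link_adr_req` helper is hypothetical, and TXPower index 0, NbTrans 1 and ChMask 0x0000 for the chMaskCntl 7 command are assumed values. Because ChMask is little-endian, a sub-band 4 mask of 0xFF00 appears on air as the bytes 00 FF, which may be what “0x00FF” refers to.

```python
import struct

LINK_ADR_REQ_CID = 0x03


def link_adr_req(dr: int, tx_power: int, ch_mask: int,
                 ch_mask_cntl: int, nb_trans: int) -> bytes:
    """Encode one LinkADRReq MAC command (CID + 4-byte payload)."""
    data_rate_tx_power = ((dr & 0x0F) << 4) | (tx_power & 0x0F)
    redundancy = ((ch_mask_cntl & 0x07) << 4) | (nb_trans & 0x0F)  # bit 7 RFU
    # ChMask is a 16-bit field sent least-significant byte first.
    return struct.pack("<BBHB", LINK_ADR_REQ_CID, data_rate_tx_power,
                       ch_mask, redundancy)


# Assumed values: DR0, TXPower index 0 (max), NbTrans 1.
clear_all = link_adr_req(dr=0, tx_power=0, ch_mask=0x0000, ch_mask_cntl=7, nb_trans=1)
enable_sb4 = link_adr_req(dr=0, tx_power=0, ch_mask=0xFF00, ch_mask_cntl=1, nb_trans=1)
print((clear_all + enable_sb4).hex(" "))
# 03 00 00 00 71 03 00 00 ff 11
```

Order matters: the chMaskCntl 7 command goes first so that the chMaskCntl 1 command re-enables sub-band 4 afterwards. If the 500 kHz channel paired with sub-band 4 (channel 67) should stay enabled, the chMaskCntl 7 ChMask would be 0x0008 instead of 0x0000.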

Thanks again for your assistance.

brocaar · January 9, 2023, 2:39pm

I do understand the issue, and I’m not arguing that ChirpStack shouldn’t handle this; I’m trying to find out what should trigger ChirpStack to re-send the full channel-mask. The problem is that the device doesn’t tell the NS that it reset its channel-mask, so the NS can only assume. So the questions are:

What parameters should the NS use as indicators that the device has reset its channel-mask?
Or should the NS send the channel-mask to the device periodically (which is a dumb solution and adds overhead)?

np0 · January 9, 2023, 6:36pm

In our case, and I believe this would be the case with most devices, once the device has backed off, it will set the ADRACKReq bit in subsequent uplink packets (as per Semtech’s ADR backoff flow guidelines).
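
On the NS side, ADRACKReq is bit 6 of the uplink FCtrl byte (ADR is bit 7, ACK bit 5, FOptsLen bits 3:0). In an uplink data frame FCtrl follows MHDR (1 byte) and DevAddr (4 bytes), so it sits at offset 5 of the PHYPayload. The parser below is a hypothetical sketch that ignores the remaining FCtrl bit; the example frame is made up and truncated after FCnt.

```python
from typing import NamedTuple


class UplinkFCtrl(NamedTuple):
    adr: bool           # bit 7
    adr_ack_req: bool   # bit 6
    ack: bool           # bit 5
    f_opts_len: int     # bits 3..0


def parse_uplink_fctrl(phy_payload: bytes) -> UplinkFCtrl:
    """Read FCtrl from an uplink data frame: MHDR (1) + DevAddr (4) + FCtrl."""
    fctrl = phy_payload[5]
    return UplinkFCtrl(
        adr=bool(fctrl & 0x80),
        adr_ack_req=bool(fctrl & 0x40),
        ack=bool(fctrl & 0x20),
        f_opts_len=fctrl & 0x0F,
    )


# Made-up, truncated frame: MHDR 0x40 (unconfirmed data up), DevAddr,
# FCtrl 0xC0 (ADR + ADRACKReq), FCnt.
frame = bytes([0x40, 0x01, 0x02, 0x03, 0x04, 0xC0, 0x00, 0x00])
print(parse_uplink_fctrl(frame))
```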

ChirpStack does respond in this situation with the enabled channels, but not with the full channel mask. I agree that it’s silly to send it periodically if it’s not needed.

I would suggest that ChirpStack respond with the full channel mask (setting chMaskCntl=7 first, then the enabled channels) any time the device sends an uplink with ADRACKReq set. This should not cause unnecessary overhead, as it’s basically ChirpStack’s existing behaviour; we are just adding the chMaskCntl=7 command to the downlink. It should be up to the device to know when to set ADRACKReq (i.e. when it needs this information, not on every uplink). The NS then doesn’t need to send anything periodically; it acts as it already does unless the device specifically requests an ADR ack.

I have a ChirpStack frame log of device uplinks and gateway responses from when we simulated this scenario. I’m happy to share it if it’s helpful.

brocaar · January 19, 2023, 9:10am

Actually, this might not be such a bad approach.

There might be some overhead: depending on the channel configuration, it might need to send more than one LinkADRReq mac-command in the downlink, whereas for ADR only one LinkADRReq is needed.

In essence, the implementation would be:

If the ADRACKReq bit is set in the uplink, enable all uplink channels of the device internally (in the device-session). ChirpStack will then automatically send the LinkADRReq command(s) to re-configure the channel-mask to the channels set in the region_xxxxx.toml configuration, since on every downlink it detects which mac-commands are needed to bring the device into the desired state.
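
A minimal sketch of that approach, assuming a US915 device-session that tracks enabled uplink channels as a set of indices 0 to 71. `plan_ch_masks` and `on_uplink` are hypothetical names, not ChirpStack code, and the planner always emits the “all 125 kHz off, then re-enable blocks” sequence rather than searching for the shortest one. It also makes the overhead mentioned above concrete: a full reset costs one LinkADRReq for chMaskCntl 7 plus one per 16-channel block that contains enabled channels.

```python
def _mask(desired: set[int], base: int, width: int) -> int:
    """Bit i is set when channel base + i is in `desired`."""
    return sum(1 << i for i in range(width) if base + i in desired)


def plan_ch_masks(session: set[int], desired: set[int]) -> list[tuple[int, int]]:
    """(ChMaskCntl, ChMask) pairs that move a US915 device from the channel
    set in `session` to `desired`; empty if nothing needs to change."""
    if session == desired:
        return []
    # ChMaskCntl 7: all 125 kHz channels off, ChMask sets 500 kHz channels 64..71.
    plan = [(7, _mask(desired, 64, 8))]
    # Then re-enable each 16-channel block (ChMaskCntl 0..3) that is in use.
    for cntl in range(4):
        block = _mask(desired, 16 * cntl, 16)
        if block:
            plan.append((cntl, block))
    return plan


def on_uplink(session: set[int], desired: set[int],
              adr_ack_req: bool) -> list[tuple[int, int]]:
    if adr_ack_req:
        # Assume the device may have re-enabled all default channels.
        session = set(range(72))
    return plan_ch_masks(session, desired)


desired = set(range(24, 32)) | {67}  # assumed config: sub-band 4 + its 500 kHz channel
print(on_uplink(desired, desired, adr_ack_req=False))
print(on_uplink(desired, desired, adr_ack_req=True))
```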

np0 · January 19, 2023, 10:07pm

Sounds great @brocaar - looking forward to seeing this in action. Once again, much appreciated. Hopefully this will resolve some similar concerns that others on the forum were having as well.

np0 · January 22, 2023, 11:38pm

Just noticed that you mentioned region_xxxxx.toml. I should probably mention that we’re using v3, not v4; we are holding off on v4 pending Azure IoT Hub support.