LinkADRReq / LinkADRAns in endless loop

I have been lurking for the last couple of years and finally got my LoraServer up and running. I apologize for the long post, but since I have all the troubleshooting information, I figured it would be useful to include it.

I am using the RAK2245 Pi HAT with the RAK image configured to use ChirpStack (ChStk), together with Moteino Mega nodes. The versions are:
MCCI LMIC 3.2.0 library on a Moteino mega
chirpstack-network-server 3.4.1
chirpstack-application-server 3.5.1
chirpstack-gateway-bridge 3.4.1

For my testing I am using the OTAA-ttn example sketch

I already have 2 networks up and running:
The first network is using Loriot with several devices running ABP on subband 1
The second network is TTN with several devices using a mix of OTAA and ABP on subband 2

ChirpStack is set up to use subband 3 in the global-config and in the gateway config (IF freq), gateway profile (16,17,…,23,66) and device profile (freq 905.5 - 906.9 and 906.2)

The problem is that the device transmits successfully on subband 3 and the JoinReq and JoinAccept happen properly, but after a packet or two the LinkADRReq from the GW and the LinkADRAns from the device go back and forth in an infinite loop.

Here are the details:
After several TX in random subbands, the JoinReq eventually hits a subband 3 freq that the gateway is listening on
(confirmed from LMIC debug)

I grabbed the raw packet data using
tail -n2000 /var/log/syslog | grep rxpk
and
tail -n2000 /var/log/syslog | grep txpk

Packets decoded using
https://lorawan-packet-decoder-0ta6puiniaut.runkit.sh/

Join Req
Assuming base64-encoded packet
AAAAAAAAAAAA12a0Zpd+uC5JwSitAy4=

Message Type = Join Request
PHYPayload = 000000000000000000D766B466977EB82E49C128AD032E

( PHYPayload = MHDR[1] | MACPayload[…] | MIC[4] )
MHDR = 00
MACPayload = 0000000000000000D766B466977EB82E49C1
MIC = 28AD032E

( MACPayload = AppEUI[8] | DevEUI[8] | DevNonce[2] )
AppEUI = 0000000000000000
DevEUI = 2EB87E9766B466D7
DevNonce = C149

The packet decoder is not consistent about little endian and big endian: some fields are shown in transmitted byte order and others are byte swapped. For example, the MACPayload is shown in transmitted byte order, but the DevEUI is byte swapped (the end device transmits it little endian).
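
To double check the byte order by hand, here is a minimal Python sketch that splits the Join Request above into its fields and reverses the little endian ones. The function name parse_join_request is my own and not part of any library; it only assumes the LoRaWAN 1.0.x Join Request layout shown in the decoder output.

```python
import base64

def parse_join_request(b64: str) -> dict:
    """Split a LoRaWAN 1.0.x Join Request into its fields.

    On the air the EUIs and the DevNonce are little endian, so each
    field is reversed to get the usual big-endian display form.
    """
    raw = base64.b64decode(b64)
    # MHDR[1] + AppEUI[8] + DevEUI[8] + DevNonce[2] + MIC[4] = 23 bytes
    if len(raw) != 23 or (raw[0] >> 5) != 0b000:
        raise ValueError("not a Join Request")
    return {
        "AppEUI": raw[1:9][::-1].hex().upper(),
        "DevEUI": raw[9:17][::-1].hex().upper(),
        "DevNonce": raw[17:19][::-1].hex().upper(),
        "MIC": raw[19:23].hex().upper(),  # shown in transmitted order
    }

fields = parse_join_request("AAAAAAAAAAAA12a0Zpd+uC5JwSitAy4=")
for name, value in fields.items():
    print(name, "=", value)
```

The DevEUI comes out as 2EB87E9766B466D7 and the DevNonce as C149, which matches the byte-swapped values the decoder prints.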

The gateway responds with
Assuming base64-encoded packet
IN3oLeZpOAWlJsOgVhfkaUY=

      Message Type = Join Accept -- WARNING: The values below have not been decrypted
        PHYPayload = 20DDE82DE6693805A526C3A05617E46946

      ( PHYPayload = MHDR[1] | MACPayload[..] | MIC[4] )
              MHDR = 20
        MACPayload = DDE82DE6693805A526C3A056
               MIC = 17E46946 (from packet) INVALID (tried MSB 0000-FFFF)
                   = CAE84E38 (expected, assuming 32 bits frame counter with MSB 0000)

      ( MACPayload = AppNonce[3] | NetID[3] | DevAddr[4] | DLSettings[1] | RxDelay[1] | CFList[0|15] )
          AppNonce = 2DE8DD
             NetID = 3869E6
           DevAddr = C326A505
        DLSettings = A0
           RxDelay = 56
            CFList = 

DLSettings.RX1DRoffset = 2
DLSettings.RX2DataRate = 0
RxDelay.Del = 6

I decrypted the Join Accept message with the AppKey and DevNonce, following the note at the bottom of the decoder page:

“OTAA Join Accepts cannot be verified or decrypted above, as those need some additional data to validate their MIC and derive the secret session keys.”

Using the link at https://runkit.com/avbentem/deciphering-a-lorawan-otaa-join-accept I cloned the script and modified it to remove all references to CFList as it is empty in the join accept and breaks the script if you don’t remove them.

This gives me the following for the keys and device ID
Payload = 20dde82de6693805a526c3a05617e46946
MHDR = 20
Join Accept = 01000000000043afd0000801d2faae7f
AppNonce = 000001
NetID = 000000
DevAddr = 00d0af43
DLSettings = 08
RXDelay = 01
message MIC = d2faae7f
verified MIC = d2faae7f
NwkSKey = 52a41d9ece92608069874f2e16035210
AppSKey = 6b46618a21b61c21da93cedc607346c3

Using the NwkSKey and AppSKey, I decoded the rest of the packets.
The next packet is a data uplink from the node (on subband 3) with ADR set to true

Assuming base64-encoded packet
QEOv0ACAAAABA487QcS2dlDE/DmEm1YawNM=

Message Type = Data
PHYPayload = 4043AFD00080000001038F3B41C4B67650C4FC39849B561AC0D3

( PHYPayload = MHDR[1] | MACPayload[…] | MIC[4] )
MHDR = 40
MACPayload = 43AFD00080000001038F3B41C4B67650C4FC39849B
MIC = 561AC0D3 (from packet)
= 561AC0D3 (expected, assuming 32 bits frame counter with MSB 0000)

( MACPayload = FHDR | FPort | FRMPayload )
FHDR = 43AFD000800000
FPort = 01
FRMPayload = 038F3B41C4B67650C4FC39849B (from packet, encrypted)
= 48656C6C6F2C20776F726C6421 (decrypted)

  ( FHDR = DevAddr[4] | FCtrl[1] | FCnt[2] | FOpts[0..15] )
 DevAddr = 00D0AF43 (Big Endian)
   FCtrl = 80
    FCnt = 0000 (Big Endian)
   FOpts = 

Message Type = Unconfirmed Data Up
Direction = up
FCnt = 0 (from packet, 16 bits)
= 0 (32 bits, assuming MSB 0x0000)
FCtrl.ACK = false
FCtrl.ADR = true
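
For anyone who wants to reproduce the field split without the online decoder, this is a rough Python sketch of the same breakdown. It does not check the MIC or decrypt anything; parse_data_frame is just a name I picked, and it only assumes the LoRaWAN 1.0.x data frame layout shown above.

```python
import base64

def parse_data_frame(b64: str) -> dict:
    """Split a LoRaWAN 1.0.x data frame into header fields (no crypto)."""
    raw = base64.b64decode(b64)
    mhdr, body, mic = raw[0], raw[1:-4], raw[-4:]
    fctrl = body[4]
    fopts_len = fctrl & 0x0F          # low nibble of FCtrl
    fhdr_len = 7 + fopts_len          # DevAddr[4] + FCtrl[1] + FCnt[2] + FOpts
    rest = body[fhdr_len:]            # FPort + FRMPayload, may be empty
    return {
        "MType": mhdr >> 5,           # 2 = unconfirmed up, 3 = unconfirmed down
        "DevAddr": body[0:4][::-1].hex().upper(),   # little endian on air
        "ADR": bool(fctrl & 0x80),
        "ACK": bool(fctrl & 0x20),
        "FOptsLen": fopts_len,
        "FCnt": int.from_bytes(body[5:7], "little"),
        "FOpts": body[7:fhdr_len].hex().upper(),
        "FPort": rest[0] if rest else None,
        "FRMPayload": rest[1:].hex().upper(),       # still encrypted
        "MIC": mic.hex().upper(),
    }

uplink = parse_data_frame("QEOv0ACAAAABA487QcS2dlDE/DmEm1YawNM=")
for name, value in uplink.items():
    print(name, "=", value)
```

The same function can be pointed at any of the other base64 packets in this post, since the FOpts length is taken from FCtrl.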

The gateway responds with a LinkADRReq
Assuming base64-encoded packet
YEOv0AAKAAADAAAAcAMQAP8BjqNqNg==

Message Type = Data
PHYPayload = 6043AFD0000A00000300000070031000FF018EA36A36

( PHYPayload = MHDR[1] | MACPayload[…] | MIC[4] )
MHDR = 60
MACPayload = 43AFD0000A00000300000070031000FF01
MIC = 8EA36A36 (from packet)
= 8EA36A36 (expected, assuming 32 bits frame counter with MSB 0000)

( MACPayload = FHDR | FPort | FRMPayload )
FHDR = 43AFD0000A00000300000070031000FF01
FPort =
FRMPayload =

  ( FHDR = DevAddr[4] | FCtrl[1] | FCnt[2] | FOpts[0..15] )
 DevAddr = 00D0AF43 (Big Endian)
   FCtrl = 0A
    FCnt = 0000 (Big Endian)
   FOpts = 0300000070031000FF01

Message Type = Unconfirmed Data Down
Direction = down
FCnt = 0 (from packet, 16 bits)
= 0 (32 bits, assuming MSB 0x0000)
FCtrl.ACK = false
FCtrl.ADR = false

The Fopts 0300000070031000FF01 breaks down to
Fctrl = 0A 0000 1010 (ADR=false, AdrReq=false, ACK=false, Fpend=false) 10B Fopt

Fopts (big end) 03 00 0000 70 03 10 00FF 01
0x03 LinkADR req 1:2:1 bytes payload
0x00 Data Rate 0 ( SF10 / 125 kHz) TX power 00
0x0000 ch mask 0000 0000 0000 0000 No channels

0x70 0111 0000 (1:3:4 bits) 0 111 0000 RFU=false ChMaskCntl=111(125kHz OFF) NBtx=0

Fopts big end
0x03 LinkADR req 1:2:1 bytes
0x10 Data rate 1 (LoRa: SF9 / 125 kHz) TxPow 0 (30dBm)
0x00FF ChMask 0000 0000 1111 1111
0x01 RFU=False ChMaskCntl=000 (0 to 15) NBtx=1
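
The manual breakdown above can be reproduced with a short Python sketch. It only understands FOpts that consist entirely of LinkADRReq commands (CID 0x03), the function name is my own, and the dBm value assumes the US915 rule of 30 dBm minus 2 dB per TXPower step that the annotations above already use.

```python
def parse_link_adr_reqs(fopts_hex: str) -> list:
    """Decode FOpts made up only of 5-byte LinkADRReq commands."""
    data = bytes.fromhex(fopts_hex)
    cmds = []
    for i in range(0, len(data), 5):
        chunk = data[i:i + 5]
        if len(chunk) != 5 or chunk[0] != 0x03:
            raise ValueError("expected only LinkADRReq commands (CID 0x03)")
        tx_power = chunk[1] & 0x0F
        cmds.append({
            "DataRate": chunk[1] >> 4,
            "TXPower": tx_power,
            "TXPower_dBm": 30 - 2 * tx_power,            # US915 assumption
            "ChMask": int.from_bytes(chunk[2:4], "little"),  # sent LSB first
            "ChMaskCntl": (chunk[4] >> 4) & 0x07,
            "NbTrans": chunk[4] & 0x0F,
        })
    return cmds

for cmd in parse_link_adr_reqs("0300000070031000FF01"):
    print(cmd)
```

For this packet the first command has ChMask 0x0000 with ChMaskCntl 7, and the second has DataRate 1, TXPower 0, ChMask 0xFF00 with ChMaskCntl 0 and NbTrans 1.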

The node responds with
Assuming base64-encoded packet
QEOv0ACEAQADBgMGRcu/0g==

Message Type = Data
PHYPayload = 4043AFD0008401000306030645CBBFD2

( PHYPayload = MHDR[1] | MACPayload[…] | MIC[4] )
MHDR = 40
MACPayload = 43AFD00084010003060306
MIC = 45CBBFD2 (from packet)
= 45CBBFD2 (expected, assuming 32 bits frame counter with MSB 0000)

( MACPayload = FHDR | FPort | FRMPayload )
FHDR = 43AFD00084010003060306
FPort =
FRMPayload =

  ( FHDR = DevAddr[4] | FCtrl[1] | FCnt[2] | FOpts[0..15] )
 DevAddr = 00D0AF43 (Big Endian)
   FCtrl = 84
    FCnt = 0001 (Big Endian)
   FOpts = 03060306

Message Type = Unconfirmed Data Up
Direction = up
FCnt = 1 (from packet, 16 bits)
= 1 (32 bits, assuming MSB 0x0000)
FCtrl.ACK = false
FCtrl.ADR = true

Fopts 03060306 breaks down to
Fopts (big end) 03 06 03 06
0x03 LinkADR ans 1 bytes payload
0x06 0000 0110 RFU 0 PowerACK=True, DataRateACK=True, ChMask=False

0x03 LinkADR ans 1 bytes payload
0x06 0000 0110 RFU 0 PowerACK=True, DataRateACK=True, ChMask=False

The device accepts the ADR power and DataRate, but rejects the ChMask.
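
The status byte can be unpacked with a few lines of Python. This is only a sketch with my own function name; the bit positions are the ones used in the breakdown above (bit 2 PowerACK, bit 1 DataRateACK, bit 0 ChMaskACK).

```python
def parse_link_adr_ans(status: int) -> dict:
    """Unpack the one-byte LinkADRAns status field."""
    return {
        "PowerACK": bool(status & 0x04),
        "DataRateACK": bool(status & 0x02),
        "ChannelMaskACK": bool(status & 0x01),
    }

fopts = bytes.fromhex("03060306")
# Each answer is CID 0x03 followed by one status byte.
answers = [parse_link_adr_ans(fopts[i + 1])
           for i in range(0, len(fopts), 2) if fopts[i] == 0x03]
for ans in answers:
    print(ans)
```

Both answers carry the same status, with only the ChannelMaskACK bit cleared.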

From here on, the gateway and device go into an endless loop:

The GW packet with Fopts of 0300000070031000FF01 and the device reply with Fopts of 03060306 bounce back and forth forever after this at full speed.

Since I have 2 other gateways running I pulled the ADRreq packets for those to compare:

Loriot:

Fctrl = 85 1000 0101 (ADR=true, AdrReq= false, ACK=False, Fpend= false) 5B Fopt

Fopts (big end) 03 11 FF00 00
0x03 LinkADR req 1:2:1 bytes payload
0x11 Data Rate 1 ( SF9 / 125 kHz) TX power 01
0xFF00 ch mask 1111 1111 0000 0000 MSB LSB=0000 0000 1111 1111 (Ch 0-7)
0x00 0000 0000 (1:3:4 bits) 0 000 0000 RFU=false ChMaskCntl=000 (0-15 125kHz) NBtx=0

TTN:
Fctrl = 8A 1000 1010 (ADR=True, AdrReq= false, ACK=False, Fpend= false) 10B Fopt

Fopts (big end) 03 40 0200 71 03 35 00FF 01
0x03 LinkADR req 1:2:1 bytes payload
0x40 Data Rate 4 ( SF8 / 500 kHz) TX power 00
0x0200 ch mask 0000 0010 0000 0000 MSB LSB=0000 0000 0000 0010 (Ch65)
0x71 0111 0001 (1:3:4 bits) 0 111 0001 RFU=false ChMaskCntl=111(125kHz OFF) NBtx=1

Fopts big end
0x03 LinkADR req 1:2:1 bytes
0x35 Data rate 3 (LoRa: SF7 / 125 kHz) TxPow 5 (20dBm)
0x00FF ChMask 0000 0000 1111 1111 MSB LSB=1111 1111 0000 0000 (Ch 8-15)
0x01 RFU=False ChMaskCntl=000 (0 to 15) NBtx=1

ChStk:
Fctrl = 0A 0000 1010 (ADR=false, AdrReq=false, ACK=false, Fpend=false) 10B Fopt

Fopts (big end) 03 00 0000 70 03 10 00FF 01
0x03 LinkADR req 1:2:1 bytes payload
0x00 Data Rate 0 ( SF10 / 125 kHz) TX power 00
0x0000 ch mask 0000 0000 0000 0000 MSB LSB=0000 0000 0000 0000 (No channels)
0x70 0111 0000 (1:3:4 bits) 0 111 0000 RFU=false ChMaskCntl=111(125kHz OFF) NBtx=0

Fopts big end
0x03 LinkADR req 1:2:1 bytes
0x10 Data rate 1 (LoRa: SF9 / 125 kHz) TxPow 0 (30dBm)
0x00FF ChMask 0000 0000 1111 1111 MSB LSB=1111 1111 0000 0000 (Ch8-15)
0x01 RFU=False ChMaskCntl=000 (0 to 15) NBtx=1

To add to the confusion, the Fopts fields appear in transmission order, but the multi-byte ChMask is sent LSB first and has to be byte swapped to match the spec and the actual frequencies.
As an additional wrinkle that the LoRaWAN spec does not spell out, only the LSB byte of the ChMask is used for the 500kHz channels when ChMaskCntl is set to 6 or 7.
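
To keep the byte swapping straight, here is a small Python sketch that turns an already byte-swapped ChMask plus ChMaskCntl into US915 channel numbers. The function name is mine, ChMaskCntl=5 is left out because I did not use it, and ignoring the upper 8 mask bits for the 500kHz channels is my assumption based on the observation above.

```python
def us915_channels(ch_mask: int, ch_mask_cntl: int):
    """Map a ChMask (after little-endian decoding) to US915 channel numbers.

    Returns (bulk_125khz, channels): bulk_125khz is None, "all on" or
    "all off", and channels lists the channels the mask bits switch on.
    """
    bits = [b for b in range(16) if (ch_mask >> b) & 1]
    if 0 <= ch_mask_cntl <= 3:
        # 125 kHz channels in blocks of 16: 0-15, 16-31, 32-47, 48-63
        return None, [16 * ch_mask_cntl + b for b in bits]
    if ch_mask_cntl in (4, 6, 7):
        # Mask applies to the 500 kHz channels 64-71 (low byte only, assumed)
        bulk = {4: None, 6: "all on", 7: "all off"}[ch_mask_cntl]
        return bulk, [64 + b for b in bits if b < 8]
    raise ValueError("ChMaskCntl value not handled in this sketch")

print(us915_channels(0x0002, 7))   # TTN first command
print(us915_channels(0xFF00, 0))   # TTN / ChStk second command
print(us915_channels(0x0000, 7))   # ChStk first command
```

With these inputs the TTN first command selects channel 65, the second commands select channels 8 to 15, and the ChStk first command selects no channel at all.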

Loriot does not turn any channels off; instead it sends a single request that turns on subband 1 (channels 0,1,…,6,7) at 125kHz. Interestingly, after a few dozen packets it switches the ADR to ch64 with a separate message (the nodes are close to the GW and transmit often)

TTN turns off all 125kHz channels while turning on ch65 and then turns on the B2 channels (8,9,…,14,15)

The one thing that sticks out is that ChStk does not turn on any 500kHz channels when it turns off the 125kHz channels using ChMaskCntl=7

Per the 1.0.2 spec, the ChMaskACK bit is cleared (the request is rejected) for the following reasons, one of which is a ChMask that requires all channels to be disabled:

Channel mask ACK Bit = 0
The channel mask sent enables a yet undefined channel or the channel mask required all channels to be disabled. The command was discarded and the end device state was not changed.

I have seen other posts in the forums with the ADR request that has all channels off
tektelic-home-sensor-missing-7-of-8-frames

linkadrreq-issues-overloads-data-payload
adr-chmask-all-false

My behavior is similar to the description in the post
downlink-messages-after-unconfirmed-uplink

The spec is not clear on how the ChMaskACK is supposed to handle piggybacked requests such as the ones from TTN and ChirpStack. Perhaps LMIC does not handle the piggybacked requests atomically and rejects them if either part leaves 0 channels enabled?
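
To make that guess concrete, here is a toy Python model of the two ways a device could evaluate a block of LinkADRReq commands. This is NOT LMIC's actual code, just my own sketch: it assumes all 72 US915 channels start enabled and that ChMaskCntl 7 clears channels 0-63 and applies the low mask byte to channels 64-71.

```python
def apply_cmd(enabled: set, ch_mask: int, cntl: int) -> set:
    """Return the channel set after one LinkADRReq (toy US915 model)."""
    new = set(enabled)
    bits = [b for b in range(16) if (ch_mask >> b) & 1]
    if 0 <= cntl <= 3:
        base = 16 * cntl
        new -= set(range(base, base + 16))
        new |= {base + b for b in bits}
    elif cntl in (6, 7):
        new -= set(range(64, 72))
        new |= {64 + b for b in bits if b < 8}
        if cntl == 6:
            new |= set(range(64))      # all 125 kHz channels on
        else:
            new -= set(range(64))      # all 125 kHz channels off
    else:
        raise ValueError("ChMaskCntl value not handled in this sketch")
    return new

def ack_checking_each_step(cmds) -> bool:
    """Reject the whole block if any intermediate state has no channels."""
    state = set(range(72))
    for mask, cntl in cmds:
        state = apply_cmd(state, mask, cntl)
        if not state:
            return False
    return True

def ack_checking_final_state(cmds) -> bool:
    """Apply the block atomically and only check the end result."""
    state = set(range(72))
    for mask, cntl in cmds:
        state = apply_cmd(state, mask, cntl)
    return bool(state)

chstk = [(0x0000, 7), (0xFF00, 0)]   # (ChMask, ChMaskCntl) pairs from above
ttn = [(0x0002, 7), (0xFF00, 0)]
for name, cmds in (("ChStk", chstk), ("TTN", ttn)):
    print(name, ack_checking_each_step(cmds), ack_checking_final_state(cmds))
```

In this model the ChStk block is rejected when every step is checked but accepted when only the final state is checked, while the TTN block passes both ways because ch65 stays on after the first command. That would be consistent with what I see on the air, but it is only a model of my guess.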

I have also tried to change the rx_delay to larger values per “downlink-every-after-uplink-with-mic-linkadrreq” but the behavior did not change.

Since the device hardware is known good (works on TTN/Loriot with different code), the MCCI LMIC is latest and using the example code, and the gateway seems to Join and JoinAccept correctly, I am hoping it is some configuration that I missed.