My setup is currently running ChirpStack v3 with the application server and network server (AU915). Compatibility reasons prevent us from migrating to v4.
Our Chirpstack setup already has lots of devices correctly sending and receiving traffic. All of the devices use OTAA to join the network and have ADR enabled. So far, so good. All of the working devices are ESP32 chips using SX1262 transceivers with the SX126X-Arduino library.
Now we are testing an experimental device whose architecture and chipset are completely different from the earlier ones. The new device is ARM-based and uses a “Seeed Studio WM1110-A” SoC. The original example code is supposed to connect to The Things Network, and we have modified it to use our own DevEUI and AppKey as generated by our ChirpStack. However, the Join procedure fails.
For debugging, I am using a RAK Raspberry Pi 4 gateway and two devices: an ESP32-based one and the experimental one. Both devices are less than one meter from the gateway. I am also simultaneously monitoring the http://{SERVER}/#/organizations/{ORGID}/applications/{APPID}/devices/{DEVEUI}/frames URLs for both devices, as well as the MQTT topics under application/{APPID}/device/{DEVID}/event/XXXX.
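To make the MQTT side concrete, here is a minimal sketch of how the topics I am watching can be built and matched. It only uses the topic layout quoted above; the function names `event_topic` and `topic_matches` and the sample IDs are hypothetical, and the matcher is a simplified version of the standard MQTT `+` and `#` wildcards.

```python
def event_topic(app_id, dev_id, event="+"):
    """Build the event topic for one device; '+' matches any single level."""
    return f"application/{app_id}/device/{dev_id}/event/{event}"


def topic_matches(topic_filter, topic):
    """Simplified MQTT filter match supporting '+' and a trailing '#'."""
    f_parts = topic_filter.split("/")
    t_parts = topic.split("/")
    for i, f in enumerate(f_parts):
        if f == "#":
            return True  # multi-level wildcard matches everything below
        if i >= len(t_parts):
            return False  # filter is longer than the topic
        if f != "+" and f != t_parts[i]:
            return False  # literal level differs
    return len(f_parts) == len(t_parts)


if __name__ == "__main__":
    # Sample IDs are placeholders, not real devices.
    all_events = event_topic("5", "0102030405060708")
    for t in (event_topic("5", "0102030405060708", "join"),
              event_topic("5", "0102030405060708", "error")):
        print(t, topic_matches(all_events, t))
```

Subscribing with the `+` event wildcard for each device is what lets me see that one device produces a join event while the other only ever produces error events.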
The behavior I see with the working (ESP32) device is:
- Device sends JoinRequest
- JoinRequest appears in the ChirpStack monitoring. This JoinRequest has a devNonce field with some random integer value, always different between tests
- A JoinAccept is generated and it appears in the Chirpstack monitoring
- At around the same time, there is a message posted in application/{APPID}/device/{DEVID}/event/join
- Data starts being exchanged correctly. No more JoinRequest or JoinAccept messages appear in Chirpstack
On the other hand, the experimental device behaves like this:
- Device sends JoinRequest
- JoinRequest appears in the Chirpstack monitoring. Unlike the other device, this packet has a devNonce field that is always 1.
- A JoinAccept is generated, and it appears in the Chirpstack monitoring.
- No message is published on the corresponding application/{APPID}/device/{DEVID}/event/join topic.
- The device apparently never receives the message. Eventually it sends another join request packet.
- This second JoinRequest also appears in the Chirpstack monitoring. However the devNonce field is again 1.
- After this, Chirpstack never emits a JoinAccept packet in the monitoring or any payload at the corresponding MQTT topic.
- Further join attempts instead result in a message on the application/{APPID}/device/{DEVID}/event/error topic, with the ominous text “error: validate dev-nonce error”.
- This cycle repeats over and over until I turn the device off.
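The sequence above looks consistent with a server that remembers which devNonce values a device has already used and rejects any repeat. The sketch below is only a toy model of that idea under this assumption, not ChirpStack's actual code; the class and method names are hypothetical.

```python
class JoinNonceModel:
    """Toy model: accept a devNonce once per device, reject repeats."""

    def __init__(self):
        # dev_eui -> set of devNonce values already accepted
        self.used = {}

    def handle_join_request(self, dev_eui, dev_nonce):
        seen = self.used.setdefault(dev_eui, set())
        if dev_nonce in seen:
            # Same text as the error event I see on MQTT
            return "error: validate dev-nonce error"
        seen.add(dev_nonce)
        return "JoinAccept"

    def recreate_device(self, dev_eui):
        # Deleting and recreating the device forgets its nonce history
        self.used.pop(dev_eui, None)


if __name__ == "__main__":
    server = JoinNonceModel()
    dev = "0102030405060708"  # placeholder DevEUI
    print(server.handle_join_request(dev, 1))  # first attempt
    print(server.handle_join_request(dev, 1))  # retry with the same nonce
    server.recreate_device(dev)
    print(server.handle_join_request(dev, 1))  # after delete + recreate
```

In this model the first JoinRequest with devNonce 1 is accepted, every retry with devNonce 1 is rejected, and recreating the device makes devNonce 1 acceptable again, which matches what I observe.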
The only way I can reset the behavior for the experimental device is by deleting and recreating the device, with the exact same devid and appkey values. I have found a few topics that mention deleting a row in the device_activation table of the PostgreSQL database, but when I do that without restarting the network server, it has no effect. I cannot freely restart the network server because it is a production system.
From what I have read on various topics, I get the idea that the device is not supposed to reuse the same devNonce value in the join request. However, I am at a loss as to why the device does not receive the JoinAccept packet at all, or why the JoinAccept that appears exactly once is shown in the ChirpStack monitoring without a corresponding join event on the MQTT topic.
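To double-check the devNonce independently of the web UI, the raw JoinRequest can be decoded by hand. In LoRaWAN a JoinRequest PHYPayload is 23 bytes: MHDR (1), JoinEUI (8), DevEUI (8), DevNonce (2) and MIC (4), with the multi-byte fields sent little-endian. The sketch below assumes the frame is available as a base64 string (for example from a gateway-level capture); the function name `parse_join_request` is hypothetical.

```python
import base64
import struct


def parse_join_request(phy_payload_b64):
    """Decode a base64 JoinRequest PHYPayload into its fields."""
    raw = base64.b64decode(phy_payload_b64)
    if len(raw) != 23:
        raise ValueError(f"JoinRequest must be 23 bytes, got {len(raw)}")
    mhdr = raw[0]
    if mhdr >> 5 != 0b000:  # MType 000 = JoinRequest
        raise ValueError("MHDR does not indicate a JoinRequest")
    # EUIs are little-endian on the wire; reverse them for display
    join_eui = raw[1:9][::-1].hex()
    dev_eui = raw[9:17][::-1].hex()
    (dev_nonce,) = struct.unpack("<H", raw[17:19])
    mic = raw[19:23].hex()
    return {"join_eui": join_eui, "dev_eui": dev_eui,
            "dev_nonce": dev_nonce, "mic": mic}
```

Running this on two consecutive JoinRequests from the same device would show directly whether the devNonce changes between attempts.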
Is the above scenario familiar to you? Do you have tips on where to debug this issue? Which logs and messages should I look for? A hardware failure could be involved, but it does not explain the lack of a join event visible at the MQTT topic. BTW, at which point is the join event actually posted to MQTT (as opposed to the monitoring), and by which component?