JoinRequest from new device triggers JoinAccept on Chirpstack but not MQTT

My Chirpstack configuration is currently running Chirpstack v3 with application-server, network-server (AU915). Some compatibility reasons stop us from migrating to v4.

Our Chirpstack setup already has lots of devices correctly sending and receiving traffic. All of the devices use OTAA to join the network and have ADR enabled. So far, so good. All of the working devices are ESP32 chips using SX1262 transceivers with the SX126X-Arduino library.

Now, we are doing tests with an experimental device using an architecture and chipset completely different from the ones before. The new device is ARM-based and uses a “seeed studio WM1110-A” SOC. The original example code is supposed to connect to The Things Network, and we have modified it to use our own DevEUI and AppKey as generated by our Chirpstack. However, the Join procedure fails.

For debugging, I am using a RAK Raspberry Pi 4 gateway and two devices, an ESP32-based one, and the experimental one. Both devices are less than one meter from the rpi4 gateway. I am also simultaneously monitoring the http://{SERVER}/#/organizations/{ORGID}/applications/{APPID}/devices/{DEVEUI}/frames urls for both devices, as well as the MQTT topics under application/{APPID}/device/{DEVID}/event/XXXX .

The behavior I see with the working (ESP32) device is:

  • Device sends JoinRequest
  • JoinRequest appears in the ChirpStack monitoring. This JoinRequest has a devNonce field with some random integer value, always different between tests
  • A JoinAccept is generated and it appears in the Chirpstack monitoring
  • At around the same time, there is a message posted in application/{APPID}/device/{DEVID}/event/join
  • Data starts being exchanged correctly. No more JoinRequest or JoinAccept messages appear in Chirpstack

On the other hand, the experimental device behaves like this:

  • Device sends JoinRequest
  • JoinRequest appears in the Chirpstack monitoring. Unlike the other device, this packet has a devNonce field that is always 1.
  • A JoinAccept is generated, and it appears in the Chirpstack monitoring.
  • There is no corresponding message published at the corresponding application/{APPID}/device/{DEVID}/event/join topic.
  • The device apparently never receives the message. Eventually it sends another join request packet.
  • This second JoinRequest also appears in the Chirpstack monitoring. However the devNonce field is again 1.
  • After this, Chirpstack never emits a JoinAccept packet in the monitoring or any payload at the corresponding MQTT topic.
  • Repeated attempts to send the join request instead end up with publishing on the application/{APPID}/device/{DEVID}/event/error topic, with the ominous text “error: validate dev-nonce error”.
  • This cycle repeats over and over until I turn the device off.

The only way I can reset the behavior for the experimental device is by deleting and recreating the device, with the exact same devid and appkey values. I have found a few topics that mention deleting a row in the device_activation table of the postgresql database, but when I do it, without restarting the networkserver, it does nothing. I cannot freely restart the networkserver because it is a production system.

From what I have read on various topics, I get the idea that the device is no supposed to reuse the same value for devNonce in the join request. However I am at a loss as to why the device does not receive the join accept packet at all, or why the JoinAccept that appears exactly once is shown in the ChirpStack monitoring without a corresponding join event at the MQTT topic.

Is the above scenario familiar to you? Do you have tips on where to debug this issue? Which logs and messages should I look for? A hardware failure could be involved, but it does not explain the lack of a join event visible at the MQTT topic. BTW, at which point is the join event actually posted to MQTT (as opposed to the monitoring), and by which component?

There no mention of the Chirpstack logs here. I doubt I can help you as I’ve never used V3 but if you haven’t checked yet there is likely errors in the logs each time the device attempts to join. Hopefully those might provide some more context for you.

I do not see anything that really stands out in the logs:

Sorry for the weird format, since these are extracts from the docker container logs containing stderr from the networkserver.

These are the logs for a device that joins correctly to the network:

{"log":"time=\"2024-10-08T20:03:39Z\" level=info msg=\"backend/joinserver: request received\" message_type=JoinReq receiver_id=0000000000000000 sender_id=000000 transaction_id=2910269231\n","stream":"stderr","time":"2024-10-08T20:03:39.87
0329874Z"}
{"log":"time=\"2024-10-08T20:03:39Z\" level=info msg=\"device-keys updated\" ctx_id=\"\u003cnil\u003e\" dev_eui=ecda3b6203dc0000\n","stream":"stderr","time":"2024-10-08T20:03:39.873406345Z"}
{"log":"time=\"2024-10-08T20:03:39Z\" level=info msg=\"backend/joinserver: sending response\" dev_eui=ecda3b6203dc0000 message_type=JoinAns receiver_id=000000 result_code=Success sender_id=0000000000000000 transaction_id=2910269231\n","stream":"stderr","time":"2024-10-08T20:03:39.873448365Z"}
...
{"log":"time=\"2024-10-08T20:03:54Z\" level=info msg=\"device last-seen and dr updated\" ctx_id=e71eb67c-0b56-40d8-9e92-7efa276b15a1 dev_eui=ecda3b6203dc0000\n","stream":"stderr","time":"2024-10-08T20:03:54.611098019Z"}
{"log":"time=\"2024-10-08T20:03:54Z\" level=info msg=\"device activation updated\" ctx_id=e71eb67c-0b56-40d8-9e92-7efa276b15a1 dev_addr=002a7a3b dev_eui=ecda3b6203dc0000\n","stream":"stderr","time":"2024-10-08T20:03:54.613434969Z"}
{"log":"time=\"2024-10-08T20:03:54Z\" level=info msg=\"finished unary call with code OK\" ctx_id=e71eb67c-0b56-40d8-9e92-7efa276b15a1 grpc.code=OK grpc.method=HandleUplinkData grpc.request.deadline=\"2024-10-08T20:03:55Z\" grpc.service=as.ApplicationServerService grpc.start_time=\"2024-10-08T20:03:54Z\" grpc.time_ms=9.279 peer.address=\"172.19.0.8:41312\" span.kind=server system=grpc\n","stream":"stderr","time":"2024-10-08T20:03:54.614141397Z"}
{"log":"time=\"2024-10-08T20:03:54Z\" level=info msg=\"integration/logger: logging event\" ctx_id=e71eb67c-0b56-40d8-9e92-7efa276b15a1 dev_eui=ecda3b6203dc0000 type=join\n","stream":"stderr","time":"2024-10-08T20:03:54.614164108Z"}
{"log":"time=\"2024-10-08T20:03:54Z\" level=info msg=\"integration/logger: logging event\" ctx_id=e71eb67c-0b56-40d8-9e92-7efa276b15a1 dev_eui=ecda3b6203dc0000 type=up\n","stream":"stderr","time":"2024-10-08T20:03:54.614981634Z"}
{"log":"time=\"2024-10-08T20:03:54Z\" level=info msg=\"integration/mqtt: publishing event\" ctx_id=e71eb67c-0b56-40d8-9e92-7efa276b15a1 dev_eui=ecda3b6203dc0000 qos=0 retain=false topic=application/34/device/ecda3b6203dc0000/event/join\n","stream":"stderr","time":"2024-10-08T20:03:54.615046478Z"}
{"log":"time=\"2024-10-08T20:03:54Z\" level=info msg=\"integration/mqtt: publishing event\" ctx_id=e71eb67c-0b56-40d8-9e92-7efa276b15a1 dev_eui=ecda3b6203dc0000 qos=0 retain=false topic=application/34/device/ecda3b6203dc0000/event/up\n","stream":"stderr","time":"2024-10-08T20:03:54.615603628Z"}

These are the logs for the device that fails to join:

{"log":"time=\"2024-10-07T18:42:35Z\" level=info msg=\"backend/joinserver: request received\" message_type=JoinReq receiver_id=52231036a77e1cad sender_id=000000 transaction_id=3807718115\n","stream":"stderr","time":"2024-10-07T18:42:35.250515593Z"}
{"log":"time=\"2024-10-07T18:42:35Z\" level=info msg=\"device-keys updated\" ctx_id=\"\u003cnil\u003e\" dev_eui=52231036a77e1cad\n","stream":"stderr","time":"2024-10-07T18:42:35.255460558Z"}
{"log":"time=\"2024-10-07T18:42:35Z\" level=info msg=\"backend/joinserver: sending response\" dev_eui=52231036a77e1cad message_type=JoinAns receiver_id=000000 result_code=Success sender_id=52231036a77e1cad transaction_id=3807718115\n","stream":"stderr","time":"2024-10-07T18:42:35.255594746Z"}
...
{"log":"time=\"2024-10-07T19:31:16Z\" level=error msg=\"validate dev-nonce error\" dev_eui=52231036a77e1cad type=OTAA\n","stream":"stderr","time":"2024-10-07T19:31:16.289337139Z"}
{"log":"time=\"2024-10-07T19:31:16Z\" level=info msg=\"finished unary call with code OK\" ctx_id=08bba10d-d115-4a09-ac73-02271d5deb4c grpc.code=OK grpc.method=HandleError grpc.service=as.ApplicationServerService grpc.start_time=\"2024-10-07T19:31:16Z\" grpc.time_ms=2.947 peer.address=\"172.19.0.8:41312\" span.kind=server system=grpc\n","stream":"stderr","time":"2024-10-07T19:31:16.290318229Z"}
{"log":"time=\"2024-10-07T19:31:16Z\" level=info msg=\"integration/logger: logging event\" ctx_id=08bba10d-d115-4a09-ac73-02271d5deb4c dev_eui=52231036a77e1cad type=error\n","stream":"stderr","time":"2024-10-07T19:31:16.290357583Z"}
{"log":"time=\"2024-10-07T19:31:16Z\" level=info msg=\"integration/mqtt: publishing event\" ctx_id=08bba10d-d115-4a09-ac73-02271d5deb4c dev_eui=52231036a77e1cad qos=0 retain=false topic=application/19/device/52231036a77e1cad/event/error\n","stream":"stderr","time":"2024-10-07T19:31:16.290365271Z"}

It appears that the join event is not published when the device has been authenticated by the networkserver, but only after the first uplink packet has been received from the just-joined device.

1 Like

Update: it turned out that, since this was an experimental device, the pin wireup was made using a reference design that differed from the one expected by the example LoRaWAN code, but the differences were not pointed out until now. Specifically, the IRQ pin was not the one expected by the library. Once the IRQ pin number was fixed in the library code, all downlink packets could be correctly received, as expected. Sorry for the inconvenience.