Intermittent Device Disconnects After Months of Normal Operation (Heartbeat Once per Day)

Hello everyone,

I am experiencing an intermittent issue with some of my LoRaWAN sensors on ChirpStack.
Here is the scenario:

  • The device performs a join request, receives the join accept, and starts communicating normally (sending at least one heartbeat message per day).
  • After several days, weeks, or even months of flawless operation, the device suddenly sends its last uplink and then never communicates with the network again, for no apparent reason.
  • This issue is random: it does not affect all devices, nor does it happen systematically.

Some additional details:

  • Devices are configured to send at least one message per day.
  • The gateway (iFemtocell Evolution) and ChirpStack LNS receive all messages normally until the disconnect.
  • There is no network or firmware change when the issue occurs.
  • Restarting or resetting the device usually restores communication, but the problem can reappear months later.
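
To make "stops communicating" concrete, here is a minimal sketch of how silent devices could be flagged from last-seen timestamps. It assumes the timestamps are exported from wherever uplinks are logged. The DevEUIs, the `find_silent_devices` helper, and the 36-hour threshold (one daily heartbeat plus some margin) are illustrative assumptions, not part of ChirpStack.

```python
from datetime import datetime, timedelta, timezone


def find_silent_devices(last_seen, now, max_gap=timedelta(hours=36)):
    """Return DevEUIs whose latest uplink is older than max_gap.

    last_seen maps a DevEUI string to the timezone-aware datetime of its
    most recent uplink. The result is ordered longest-silent first.
    """
    silent = [
        (now - ts, dev_eui)
        for dev_eui, ts in last_seen.items()
        if now - ts > max_gap
    ]
    return [dev_eui for _, dev_eui in sorted(silent, reverse=True)]


# Hypothetical example data: three devices with a daily heartbeat.
now = datetime(2024, 6, 10, 12, 0, tzinfo=timezone.utc)
last_seen = {
    "0000000000000001": now - timedelta(hours=3),   # healthy
    "0000000000000002": now - timedelta(days=4),    # missed several heartbeats
    "0000000000000003": now - timedelta(days=40),   # silent for weeks
}
print(find_silent_devices(last_seen, now))
```

Running a check like this on a schedule would at least show when each device went quiet, which can then be compared against weather, gateway, or battery events.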

Has anyone encountered this kind of behavior before?
What troubleshooting steps would you recommend for diagnosing such a random issue (coverage problems, ADR, RX window configuration, frequency plan, etc.)?

Thank you in advance for your feedback and suggestions!

Best regards,

The only time I have ever seen an issue similar to this was when there was a hardware or firmware bug in the end device I was working with. Are all the devices you are testing the same type (manufacturer/model)? And can you tell us what the device is?

Debugging device firmware falls well outside what most people here would do. But in my testing of end devices that failed and that I was able to get working again, the issue usually revolved around temperature.

  1. The device gets too hot and stops working. Depending on the firmware, it may recover on its own or require a reset to get it out of the error state.
  2. The device gets too cold and the battery voltage drops, causing it to lose power. It usually recovers on its own when the temperature goes back up.

When a device stops reporting, what do its RSSI and data rate look like in the last uplinks before it goes silent?
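
If you have the uplink history logged somewhere, a rough sketch like the one below could summarize the last few uplinks before the silence. The record fields (`time`, `rssi`, `snr`, `dr`), the `summarize_last_uplinks` helper, and the weak-link thresholds are all assumptions for illustration; they are not ChirpStack's event schema, so map them to whatever your logs actually contain.

```python
from statistics import mean


def summarize_last_uplinks(uplinks, n=5, weak_rssi_dbm=-115, weak_snr_db=-10):
    """Summarize signal quality over the last n uplinks of one device.

    uplinks is a list of dicts with the hypothetical keys "time" (any
    sortable value), "rssi" (dBm), "snr" (dB) and "dr" (data rate index).
    The thresholds are rough assumptions, not values from any specification.
    """
    if not uplinks:
        raise ValueError("no uplinks to summarize")
    recent = sorted(uplinks, key=lambda u: u["time"])[-n:]
    mean_rssi = mean(u["rssi"] for u in recent)
    mean_snr = mean(u["snr"] for u in recent)
    return {
        "count": len(recent),
        "mean_rssi": mean_rssi,
        "mean_snr": mean_snr,
        "last_dr": recent[-1]["dr"],
        # True if the link already looked marginal before the device went silent.
        "weak_link": mean_rssi < weak_rssi_dbm or mean_snr < weak_snr_db,
    }


# Hypothetical history for one device, deliberately out of order.
history = [
    {"time": 3, "rssi": -122, "snr": -12, "dr": 0},
    {"time": 1, "rssi": -118, "snr": -8, "dr": 2},
    {"time": 2, "rssi": -120, "snr": -10, "dr": 1},
]
print(summarize_last_uplinks(history, n=3))
```

A weak and falling RSSI/SNR with the data rate dropping toward DR0 would point at coverage or ADR; strong, stable values right up to the last uplink would point more toward the device itself.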