Ways to handle errors sent from devices (energy meters)


I need some input on ways to handle error details provided by the LoRaWAN device/node.

First, a short description of what we are dealing with before getting to the core of the issue.
We’ve got 50+ energy meters with LoRaWAN installed countrywide.

Internally, the devices use a microcontroller with LoRaWAN which talks to the Modbus interface of the actual energy meter. This is an all-in-one device, not a separate, accessible Modbus-to-LoRaWAN adapter that you have to connect to the energy meter.

We have the requirement that the devices report the meter reading every 15 minutes, and this has to happen “automatically”. We can’t manually query each of the 50+ devices every 15 minutes. I think this is quite clear.

The first hardware version simply stopped sending data after a couple of weeks (this could have been an internal communication issue; see the note on hardware version 2).
So we quickly saw in ChirpStack that a device went “offline” when it didn’t send any data within the defined interval of 35 minutes (2x 15 min reporting interval + 5 min extra delay).
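
To make the 35-minute rule explicit, here is a minimal Python sketch of it. The names `INACTIVITY_THRESHOLD` and `is_inactive` are made up for illustration and are not part of ChirpStack.

```python
from datetime import datetime, timedelta

REPORTING_INTERVAL = timedelta(minutes=15)  # one meter reading every 15 min
EXTRA_DELAY = timedelta(minutes=5)          # additional grace period
# Two missed reporting intervals plus the grace period: 35 minutes.
INACTIVITY_THRESHOLD = 2 * REPORTING_INTERVAL + EXTRA_DELAY


def is_inactive(last_seen, now):
    """Return True if the device has been silent longer than the threshold."""
    return now - last_seen > INACTIVITY_THRESHOLD


now = datetime(2024, 1, 1, 12, 0)
print(is_inactive(now - timedelta(minutes=20), now))  # False
print(is_inactive(now - timedelta(minutes=40), now))  # True
```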

This was easy to check, as we just had to ask ChirpStack for the number of inactive devices and show it in our own dashboard. Still, fixing this by driving 400+ km (one way) when nobody is near the device is bad, as you can imagine.

The second hardware version, which is installed now, does keep sending data, but instead of the meter reading it might send an error message, or not even that (as I just saw). In case of a problem the device only reports its serial number plus some additional error detail (e.g. a Modbus communication error indicated by a special error code). We can now still communicate with the device and ask it to reboot. That is still not a great solution, but better than driving 400+ km, as long as the reboot works.

The problem now is that we don’t see that there is a problem, because we only get an alert for devices that are not sending data. In this case the device IS still sending data every ~15 min and is not going completely offline.
So in ChirpStack it is always online but still doesn’t work as intended.

The data we get from the device is not processed directly but stored in the attached SQL database (PostgreSQL). We have different users drawing energy for anywhere from ~2 hours up to several days or even multiple weeks, so we just ask the database for the energy meter readings and see the consumption over time.

To check whether something is wrong with a device, we could now create a service that checks the database every ~30 minutes to see if we have received data that does not include the meter reading.
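
A minimal sketch of what such a check could look like, in Python. It works on rows that have already been fetched from the database; the row layout `(dev_eui, received_at, payload)` and the `meter_reading` and `error_code` keys are assumptions for illustration, not our actual schema or payload format.

```python
from datetime import datetime, timedelta


def devices_without_reading(rows, now, window=timedelta(minutes=30)):
    """Return the devices whose latest uplink in the window has no reading.

    `rows` is an iterable of (dev_eui, received_at, payload) tuples.
    `payload` is the decoded uplink as a dict; the "meter_reading" key
    is hypothetical.
    """
    latest = {}  # dev_eui -> (received_at, payload) of the newest uplink
    for dev_eui, received_at, payload in rows:
        if received_at < now - window:
            continue  # older than the check window
        if dev_eui not in latest or received_at > latest[dev_eui][0]:
            latest[dev_eui] = (received_at, payload)
    return sorted(
        dev_eui
        for dev_eui, (_, payload) in latest.items()
        if payload.get("meter_reading") is None
    )


now = datetime(2024, 1, 1, 12, 0)
rows = [
    ("meter-a", now - timedelta(minutes=20), {"serial": "A1", "meter_reading": 1234.5}),
    ("meter-a", now - timedelta(minutes=5), {"serial": "A1", "error_code": 7}),
    ("meter-b", now - timedelta(minutes=5), {"serial": "B1", "meter_reading": 88.0}),
]
print(devices_without_reading(rows, now))  # ['meter-a']
```

Only the newest uplink per device is looked at, so a device that recovered on its own within the window is not reported.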

That is again an external “active” task, and it would be great if there were an alternative.
In the best case it is something we could do with the existing ChirpStack API, which we already use (maybe there is something we are missing; the production system is currently still running the old ChirpStack v3 stack).

I assume we are not the first and only ones with this requirement, so there should be a solution.
It would help if ChirpStack had another “error” status besides active/inactive/never seen. That status could be triggered while decoding the transmitted data to raise an error alert.
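
To illustrate what I mean, here is a rough Python sketch of such a decode-time check. It is not an existing ChirpStack feature; `UplinkStatus`, `classify_uplink`, `handle_uplink` and the payload keys `meter_reading` and `error_code` are all hypothetical names, and the code would have to run wherever the decoded uplinks pass through.

```python
from enum import Enum


class UplinkStatus(Enum):
    OK = "ok"        # payload carries a meter reading
    ERROR = "error"  # payload carries only serial number / error details


def classify_uplink(payload):
    """Classify one decoded uplink; the "meter_reading" key is hypothetical."""
    if payload.get("meter_reading") is not None:
        return UplinkStatus.OK
    return UplinkStatus.ERROR


def handle_uplink(dev_eui, payload, alert):
    """Call `alert` as soon as an uplink without a reading is decoded."""
    status = classify_uplink(payload)
    if status is UplinkStatus.ERROR:
        alert(dev_eui, payload.get("error_code"))
    return status


# Example: collect alerts in a list instead of sending a notification.
alerts = []
handle_uplink("meter-a", {"serial": "A1", "error_code": 7},
              lambda dev, code: alerts.append((dev, code)))
print(alerts)  # [('meter-a', 7)]
```

The difference from a periodic database check is that the alert is raised the moment the faulty uplink arrives, without a separate polling task.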

So how are you handling this kind of issue?

Many thanks in advance!