Restore devices in application deleted by mistake

Jerome · July 11, 2024, 5:15pm

I deleted an application by accident.

I have a midnight backup of my VM. Rather than restoring that backup and losing the data between the backup and the mistake, I dumped the DB from the backup to get the information about the devices. I identified the device and device_keys tables.

I recreated the application and very manually recreated the devices using EUI, name and description from the device table, and the app key from the device_keys table.

The devices send regularly, about every 10 minutes, but I don’t see any data coming in. There is nothing in the LORAWAN FRAMES tab since the deletion: I see the frames from before the deletion but nothing after it, not even join attempts.

Is there something I did wrong (apart from the initial mistake…) or something else I should do?

What could explain the fact that I don’t get any frames?

Will I have better luck if I restore the whole VM backup? (I’d lose data from other applications but at this point it is better than having to get physical access to the sensors.)

Did the sensors even “notice” the app not existing anymore during the outage? The frames they usually send are “UnconfirmedDataUp” but I’m not sure what that means.

Thanks for any help.

(Chirpstack v3)

datnus · July 12, 2024, 2:36am

ChirpStack v3, and v4 before 4.7, store device-session keys in Redis.
The device-session keys were likely deleted when you deleted the devices.

When you restore the database, the device-session keys in Redis are not restored.
So the device is no longer accepted by the server.

The fastest way is to reboot the device to force a re-join.
Some newer devices re-join on their own if they get no downlink from the server for a period.

Jerome · July 12, 2024, 9:25am

Thanks for answering.

Right. Device sessions are stored in Redis so I need to get that back from my backup as well. IIUC, the default TTL is 30 days so they are still valid.

My options are:

1. Read device sessions in backup redis and write them in production redis.
2. Restore pre-delete backup of the VM. This makes me lose data on other services stored on the same VM for the period after the deletion. And that’s assuming the restore will work and I won’t lose all connections in the process. I’m not 100% sure about this.

I’m pretty illiterate about redis. I have the feeling it shouldn’t be too complicated but I’m struggling. I can use redis-cli. Python is fine as well if required. What would be the commands I’d need to run?

What are the names of the keys I need to get?
Does it hurt if I get all device sessions (without filtering to get only the ones that were deleted)?
What command should I run on the backup to get the device sessions?
What command should I run on production instance to write those device sessions back?

It’s only 18 devices so I don’t mind doing things manually.
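
A minimal, untested sketch of what option 1 could look like in Python with the redis-py client, assuming the backup Redis can be started next to the production one on a second port. It relies only on the standard Redis DUMP, PTTL and RESTORE commands, which copy a key with its remaining TTL. The copy_keys helper, the ports and the lora:ns:device:<DevEUI> key name shape are assumptions for illustration and are not confirmed anywhere in this thread; the real key names should be checked first (for instance with redis-cli --scan --pattern '*device*'), and ChirpStack may rely on additional lookup keys besides the session itself.

```python
def copy_keys(src, dst, keys, overwrite=False):
    """Copy the given keys from src to dst with DUMP/RESTORE, keeping the TTL.

    src and dst are redis-py style clients (they must NOT use
    decode_responses=True, since DUMP payloads are binary).
    Returns (copied, skipped) lists of key names.
    """
    copied, skipped = [], []
    for key in keys:
        payload = src.dump(key)  # None if the key does not exist in src
        if payload is None:
            skipped.append(key)
            continue
        if dst.exists(key) and not overwrite:
            # Do not clobber a session that is live in production.
            skipped.append(key)
            continue
        ttl_ms = src.pttl(key)  # -1 means "no expiry"
        # RESTORE takes a TTL in milliseconds; 0 means "no expiry".
        dst.restore(key, max(ttl_ms, 0), payload, replace=overwrite)
        copied.append(key)
    return copied, skipped


# Hypothetical usage (ports, DevEUI and key names are assumptions):
#   import redis
#   backup = redis.Redis(port=6380)      # Redis loaded from the backup dump
#   production = redis.Redis(port=6379)  # live ChirpStack Redis
#   keys = ["lora:ns:device:" + eui for eui in ["0102030405060708"]]
#   print(copy_keys(backup, production, keys))
```

With overwrite left at False, keys that already exist in production are skipped, so devices that have already rejoined keep their new session.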

If I go for option 2 (full VM backup restore), will it work or is there a chance that for some reason (device sessions shouldn’t be too old if TTL is 30 days but maybe another reason) I lose connection to all devices (including those from other projects that were not deleted)? I always assumed full restore was the silver bullet but now I’m in doubt.

Thanks.

The good news is that the EM300 sensors have already rejoined. The Dragino SHT31 doesn’t seem to rejoin automatically, however.

abaena · July 12, 2024, 9:50am

Hi Jerome,

If you are already struggling with device rejoins, I think this is the best time to migrate to Chirpstack 4.7+, because sessions are now stored in Postgres instead of Redis, which makes it easier to back up and restore when things go wrong. I don’t know how difficult that would be for you (you said you’re running Chirpstack v3 right now), but you can start with a local “migration” to see whether anything goes wrong.

But as far as I know, devices need to rejoin, so I don’t know if restoring the sessions will work at all. Some models rejoin automatically once a day. Check the Dragino documentation to see whether they rejoin periodically, in which case you can wait for them to rejoin on their own. Otherwise you’ll need to trigger the rejoin manually by removing the battery. Some devices stop sending data to save battery when they are not joined to any LoRaWAN server, so the only way is to force the rejoin manually.

Good luck

Jerome · July 12, 2024, 9:56am

Thanks. I intend to migrate to newer Chirpstack some day. I thought I’d run both in parallel and use v4 for new projects, letting old projects die on v3. Anyway, migrating won’t solve my current issue, just facilitate restoration next time I screw up (which I hope won’t happen).

Are you saying that even restoring device sessions from redis won’t fix the communication issue?

I assumed either the device doesn’t notice the outage and keeps sending as usual and restoring the session will work, or the device notices and rejoins like the EM300 did.

abaena · July 12, 2024, 10:11am

Nice!

Yeah, LoRaWAN devices in general are a bit tricky. Each manufacturer implements them differently.

I don’t know if this is your device:

But it says they have a reset button. Can you ask someone to press it for you, maybe?

It seems that’s the only way to get your devices to rejoin. I don’t know how to restore sessions, or whether it’s possible at all. Maybe someone else knows, but I would recommend doing it manually if that’s possible for you.

Check this also:

You may want to find out how to enable auto-rejoin on your Dragino devices for the future.

One last question: how many days have passed since your devices went offline?

Adrián

Jerome · July 12, 2024, 10:22am

Unfortunately, the devices are in a remote area where access is complicated. Physical access is something I’d really like to avoid.

I deleted the application yesterday so it’s less than two days. IIUC, this should be shorter than the session TTL.

Is there any reason restoring the device session in redis wouldn’t work?

I just thought of a third and intermediate way: full redis backup restore. Dump all redis DB from VM backup and load it into production redis DB. It would be the same as copying keys except full Redis backup/restore is probably easier to achieve.

Redis is only used for Chirpstack on this VM. What I don’t like about this is that it will also affect other devices sessions, but if the TTL is 30 days, losing the last 2 days of device sessions refreshes in redis shouldn’t be an issue, right?

Would it make sense to try that? Are there risks I didn’t think of?

Jerome · July 15, 2024, 7:53am

Good news. The devices all rejoined automatically in less than 2 days.

I’m still interested in the answers to the following questions:

Assuming the devices don’t auto rejoin and the session TTL is not reached, would restoring the device sessions (from redis, or postgresql for v4.7+) do the trick?
Would it be an issue to reload the whole redis dump from before the deletion rather than cherry-picking lost sessions? I mean can it break the ongoing sessions of other devices? (Assuming no new device session was created after the dump as those would obviously be lost.)

I’ve seen on the forums other people get caught by the app delete button. I just checked and there is indeed a confirmation modal. This didn’t stop me from getting caught. I was operating in bad network conditions, bad screen resolution, holding the laptop in my hands, with limited time, etc. I understand the logic of this button being here, but considering the possibility of mistake and the potential damage, I still think it could be “hidden” a bit more. I’d put it at the bottom of the “application configuration” page. See what GitHub does before deleting a repo or organization. All bordered in red + they ask to type (you may copy-paste) the repo/orga name. I don’t know if this was modified in v4, I need to carve out some time for the migration.

datnus · July 15, 2024, 10:23am

Yes, Chirpstack v4 has this.

I would try to restore only the lost sessions in Redis, if possible.
BTW, I have never done this for Redis myself.
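
To pick out only the lost sessions, one possible approach is to list the keys that exist in the backup Redis but no longer exist in production. The sketch below is untested against a real ChirpStack setup; find_missing_keys is a hypothetical helper written for redis-py style clients, and the lora:ns:device:* pattern in the usage comment is an assumption about the v3 key names that should be verified against the actual instance.

```python
def find_missing_keys(backup, production, pattern):
    """Return, sorted, the keys matching pattern that are in backup
    but absent from production.

    backup and production are redis-py style clients exposing
    scan_iter(match=...) and exists(key).
    """
    missing = []
    # SCAN iterates incrementally and does not block the server like KEYS.
    for key in backup.scan_iter(match=pattern):
        if not production.exists(key):
            missing.append(key)
    return sorted(missing)


# Hypothetical usage (ports and key pattern are assumptions):
#   import redis
#   backup = redis.Redis(port=6380)
#   production = redis.Redis(port=6379)
#   for key in find_missing_keys(backup, production, "lora:ns:device:*"):
#       print(key)
```

Only the keys in the returned list would then need to be copied over with DUMP and RESTORE, which leaves the sessions of devices that were never deleted untouched.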

datnus · July 15, 2024, 10:24am

Are you using the Dragino LSN50v2-S31?
Yes, these nodes have a rejoin function by default.

Jerome · July 15, 2024, 12:31pm

Thanks for answering.

Monitoring is only a small part of our activity and we’ve been using LoRa for a few years now, but there are still obscure parts for me. We’re happy it just works. But when it doesn’t, be it range issues, gateway downtime or human error like this one, well… it’s kind of a hard time. Because physical access may be difficult and expensive.

I’m glad I didn’t panic and full-restore the VM right away since ultimately I managed to get everything back in little time.

Maybe an export feature would have helped. I had to grab the EUIs and app keys from a DB dump (manual and error prone). This was asked already and it was suggested to use the API.

I didn’t choose those sensors myself, but if I have to choose other sensors some day, I’ll try to choose devices with rejoin. This is a lifesaver.

Glad to see Chirpstack v4 addresses the “easy delete” issue. One more reason to find time to make the move. Congratulations, and thanks again for developing it and making it available. Chirpstack is part of why it “just works”.

abaena · July 16, 2024, 7:02am

I’m happy to hear that. Yeah, with LoRa you should always wait 1 to 3 days before you start panicking, hahaha. The devices usually rejoin automatically, but it depends on the manufacturer.

Regards,

Adrián