I have a production v3 stack running in AWS with separate Postgresql DB server and redis server and mqtt broker. The v3 stack is running the AS, NS and gateway-bridge in docker containers on a single EC2 instance. I have my gateways connected to the gateway-bridge via basic-station.
I have a test v4 stack running with a new Postgresql DB and new mqtt broker. I’m using the same redis server in both instances. I did have a redis-root-key set in my v3 instance, so I set the redis-root-key to be the same value in the v4 stack.
I successfully migrated my v3 DB to my v4 DB. My hope was to update my gateways to point to the new v4 gateway-bridge (via a DNS update) and have all the devices reconnect to the new v4 stack without any intervention. I see my gateway connected, but my devices do not connect to the v4 stack. This is with some test devices and a test gateway. I have hundreds of gateways and thousands of devices in my production v3 system, so I’m hoping for a very simple migration path.
A couple of questions:
- Is there a timeliness factor in the v3 to v4 DB migration? I.e., do I need to run the migration just before I switch the gateways so that the timestamps in the DB are up to date? I ran the migration a day or two before I tried the switch-over.
- Is it safe to run the v3 to v4 DB migration multiple times? What if some of the devices have re-connected in the v4 DB? That means some entries would be out of date in the v3 DB.
- Is my expectation that the devices should connect to the new stack correct?
- Is there a better way to migrate all my devices to my new instance?
- What is the best way to debug this?
- What messages should I look for in my logs? (v3 vs v4)
- Do I need to increase the log levels? (I’m set at info currently)
Thanks in advance for any help!
Kevin
Kevin
After you migrated your databases to your V4 instance, does the “activation” tab of one of your devices show the OTAA keys? If not, you likely missed the 4.7 redis → postgres device session key migration: [release] ChirpStack v4.7
After you run that on your V4 instance, the keys should come back in the activation tab and the devices should come online.
I can’t answer many of your specific questions though as I have never done the migration myself.
Thanks for the clue, I missed that step. When I try to run the migration, I hit a timeout error:
~ $ chirpstack -c /etc/chirpstack migrate-device-sessions-to-postgres
2024-10-08T04:59:33.239502Z INFO chirpstack::storage: Setting up PostgreSQL connection pool
2024-10-08T04:59:33.239882Z INFO chirpstack::storage: Applying schema migrations
2024-10-08T04:59:33.263242Z INFO chirpstack::storage: Setting up Redis client
2024-10-08T04:59:33.263404Z INFO chirpstack::storage: Setting Redis prefix prefix=lorawan-prod01:
2024-10-08T04:59:33.263458Z INFO chirpstack::cmd::migrate_ds_to_pg: Migrating device-sessions from Redis to PostgreSQL
2024-10-08T04:59:33.263476Z INFO chirpstack::cmd::migrate_ds_to_pg: Getting DevEUIs from PostgreSQL without device-session
2024-10-08T04:59:33.267930Z INFO chirpstack::cmd::migrate_ds_to_pg: There are 519 devices in PostgreSQL without device-session set
Error: Error occurred while creating a new object: Operation timed out (os error 110)
Caused by:
Operation timed out (os error 110)
Stack backtrace:
0: anyhow::error::<impl anyhow::Error>::new
1: chirpstack::main::{{closure}}
2: chirpstack::main
3: std::sys_common::backtrace::__rust_begin_short_backtrace
4: main
Any ideas of what might be the cause? Thanks!
Kevin
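On Linux, os error 110 is ETIMEDOUT, i.e. a connection attempt that never got an answer. Since the log shows the PostgreSQL query completing (the count of 519 devices) before the failure, one plausible reading is that the timeout comes from the Redis connection rather than from the data itself, but that is an assumption, not something the log proves. A minimal Rust sketch for checking plain TCP reachability from the machine (or container) where the chirpstack command runs; the addresses in main are hypothetical placeholders and check_reachable is an illustrative helper, not part of ChirpStack:

```rust
use std::io;
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Try to open a plain TCP connection to `addr` ("host:port") within `timeout`.
/// Returns Ok(()) as soon as any resolved address accepts the connection,
/// otherwise the last connection error (or a resolution error).
fn check_reachable(addr: &str, timeout: Duration) -> io::Result<()> {
    let mut last_err = io::Error::new(io::ErrorKind::NotFound, "address resolved to nothing");
    for sock_addr in addr.to_socket_addrs()? {
        match TcpStream::connect_timeout(&sock_addr, timeout) {
            Ok(_) => return Ok(()),
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}

fn main() {
    // Hypothetical endpoints: replace with the Redis and PostgreSQL
    // host:port values from your own ChirpStack configuration.
    for addr in ["127.0.0.1:6379", "127.0.0.1:5432"] {
        match check_reachable(addr, Duration::from_secs(3)) {
            Ok(()) => println!("{addr}: reachable"),
            Err(e) => println!("{addr}: NOT reachable ({e})"),
        }
    }
}
```

This only tests that a TCP handshake completes; it says nothing about authentication or TLS settings. If the Redis endpoint times out here as well, security groups or container networking would be the first things to look at.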
I enabled debug logging and got the following line: DEBUG chirpstack::cmd::migrate_ds_to_pg: Migrating device-session dev_eui=c45757ffff3e7acc
Looking at the rust code, it appears it’s trying to retrieve a redis key that doesn’t exist in my redis, that is formatted like this: device:{c45757ffff3e7acc}:ds
If I run redis-cli and enter the command keys *ds, I get no entries returned. So it appears my redis cache doesn’t contain any device-session data. How is that possible? Thanks in advance.
Kevin
Looking at the v3 network-server source code, it appears the device session is stored at {REDIS_KEY}:lora:ns:device:{DEVICE_EUI}. This redis key does not seem to match what the v4 ‘migrate-device-sessions-to-postgres’ command is looking for and that is why the command is not working for me. The key paths look totally different to me. What am I missing?
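To make the mismatch concrete, here is a small Rust sketch that builds both key strings for one device so they can be compared against what redis-cli returns. The two layouts are taken only from the descriptions in this thread (the v3 layout from the post above, the v4 layout from the debug output), and the assumption that the configured prefix is simply prepended to the v4 key is mine; neither function is ChirpStack code.

```rust
/// v3 device-session key, following the layout described in this thread:
/// {prefix}lora:ns:device:{DevEUI}. Assumption, not checked against the v3 source.
fn v3_session_key(prefix: &str, dev_eui: &str) -> String {
    format!("{prefix}lora:ns:device:{dev_eui}")
}

/// v4 device-session key, following the layout quoted in this thread:
/// device:{DevEUI}:ds, with literal braces around the DevEUI.
/// Prepending the prefix is an assumption.
fn v4_session_key(prefix: &str, dev_eui: &str) -> String {
    format!("{prefix}device:{{{dev_eui}}}:ds")
}

fn main() {
    // Prefix and DevEUI as they appear in the log output earlier in the thread.
    let prefix = "lorawan-prod01:";
    let dev_eui = "c45757ffff3e7acc";

    println!("v3: {}", v3_session_key(prefix, dev_eui));
    // prints "v3: lorawan-prod01:lora:ns:device:c45757ffff3e7acc"
    println!("v4: {}", v4_session_key(prefix, dev_eui));
    // prints "v4: lorawan-prod01:device:{c45757ffff3e7acc}:ds"
}
```

A pattern such as keys *ds can only match the second form, so an empty result does not by itself show that v3-style session keys are absent.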
Ok, looking at the query that the command ‘migrate-device-sessions-to-postgres’ runs, the devices are ones that have never connected or had sessions over a month old, so the device session has expired. I don’t think there should be a device session to migrate in that case.
If that is the case, then I’m back to where I started. I’m not sure why my migration is not working and could use some help. Thanks!
Here is the code for the migration:
use anyhow::Result;
use diesel::prelude::*;
use diesel_async::RunQueryDsl;
use tracing::{debug, info};

use crate::storage::{self, device_session, error::Error, get_async_db_conn, schema::device};
use lrwn::{DevAddr, EUI64};

pub async fn run() -> Result<()> {
    storage::setup().await?;

    info!("Migrating device-sessions from Redis to PostgreSQL");
    info!("Getting DevEUIs from PostgreSQL without device-session");

    let dev_euis: Vec<EUI64> = device::dsl::device
        .select(device::dsl::dev_eui)
        .filter(device::dsl::device_session.is_null())
        .load(&mut get_async_db_conn().await?)
        .await?;

    info!(
        "There are {} devices in PostgreSQL without device-session set",
        dev_euis.len()
    );

    for dev_eui in &dev_euis {
        debug!(dev_eui = %dev_eui, "Migrating device-session");

        let ds = match device_session::get(dev_eui).await {
            Ok(v) => v,
            Err(e) => match e {
                Error::NotFound(_) => {
                    debug!(dev_eui = %dev_eui, "Device does not have a device-session");
                    continue;
                }
                _ => {
                    return Err(anyhow::Error::new(e));
                }
            },
        };

        storage::device::partial_update(
            *dev_eui,
            &storage::device::DeviceChangeset {
                dev_addr: Some(Some(DevAddr::from_slice(&ds.dev_addr)?)),
                device_session: Some(Some(ds)),
                ..Default::default()
            },
        )
        .await?;

        debug!(dev_eui = %dev_eui, "Device-session migrated");
    }

    Ok(())
}
Even without being able to read Rust, it is pretty clear there’s a section that skips devices without device-sessions. So perhaps that is not your issue? A couple of shots in the dark:
- Did all of your devices migrate over okay? I.e., do you have the same number of devices on V3 as on V4? I think a missing or deleted device could cause the code to fail.
- Was your V3 on the latest V3 release before you ran the V3 to V4 migration? If not, your redis keys might not be what it expects.
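The control flow being discussed can be shown with a self-contained Rust sketch: a "not found" lookup is skipped, while any other lookup error aborts the whole run. The types and functions below (LookupError, get_session, migrate) are hypothetical stand-ins using an in-memory map, not the ChirpStack storage API.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum LookupError {
    NotFound(String),
    Backend(String),
}

/// Stand-in for a session lookup. A missing key means "no session";
/// a key mapped to `None` simulates a backend failure such as a timeout.
fn get_session(
    store: &HashMap<&str, Option<&str>>,
    dev_eui: &str,
) -> Result<String, LookupError> {
    match store.get(dev_eui) {
        None => Err(LookupError::NotFound(dev_eui.to_string())),
        Some(None) => Err(LookupError::Backend("timed out".to_string())),
        Some(Some(session)) => Ok(session.to_string()),
    }
}

/// Same shape as the loop in the quoted code: skip NotFound, abort on anything else.
/// Returns the DevEUIs whose sessions were found.
fn migrate(
    store: &HashMap<&str, Option<&str>>,
    dev_euis: &[&str],
) -> Result<Vec<String>, LookupError> {
    let mut migrated = Vec::new();
    for dev_eui in dev_euis {
        match get_session(store, dev_eui) {
            Ok(_session) => migrated.push(dev_eui.to_string()),
            Err(LookupError::NotFound(_)) => continue, // no session: skip this device
            Err(e) => return Err(e),                   // any other error: stop the run
        }
    }
    Ok(migrated)
}

fn main() {
    let mut store = HashMap::new();
    store.insert("aa", Some("session-aa"));
    store.insert("cc", None); // simulated backend failure

    println!("{:?}", migrate(&store, &["aa", "bb"]));
    println!("{:?}", migrate(&store, &["bb", "cc", "aa"]));
}
```

In this model a device without a session never stops the run, so a run that dies on the very first device is more consistent with the lookup itself failing (for example, a connection problem) than with missing session data.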