Migrating v3 to v4 questions

happidad · October 7, 2024, 4:56pm

I have a production v3 stack running in AWS with a separate Postgresql DB server, redis server, and mqtt broker. The v3 stack runs the AS, NS and gateway-bridge in docker containers on a single EC2 instance. My gateways connect to the gateway-bridge via basic-station.

I have a test v4 stack running with a new Postgresql DB and new mqtt broker. I’m using the same redis server in both instances. I did have a redis-root-key set in my v3 instance, so I set the redis-root-key to be the same value in the v4 stack.

I successfully migrated my v3 DB to my v4 DB. My hope was to update my gateways to point to the new v4 gateway-bridge (via a DNS update) and have all the devices reconnect to the new v4 stack without any intervention. I see my gateway connected, but my devices do not connect to the v4 stack. This is with some test devices and a test gateway. I have hundreds of gateways and thousands of devices in my production v3 system, so I’m hoping for a very simple migration path.

A couple of questions:

Is there a timeliness factor in the v3 to v4 DB migration? I.e., do I need to run the migration just before I switch the gateways so that the timestamps in the DB are up to date? I ran the migration a day or two before I tried the switch-over.
Is it safe to run the v3 to v4 DB migration multiple times? What if some of the devices have re-connected in the v4 DB? That means some entries would be out of date in the v3 DB.
Is my expectation that the devices should connect to the new stack correct?
Is there a better way to migrate all my devices to my new instance?
What is the best way to debug this?
What messages should I look for in my logs? (v3 vs v4)
Do I need to increase the log levels? (I’m set at info currently)

Thanks in advance for any help!
Kevin

Liam_Philipp · October 7, 2024, 7:00pm

After you migrated your databases to your V4 instance, when you look at the “activation” tab of one of your devices, does it show the OTAA keys? If not, you likely missed the 4.7 redis → postgres device session key migration: [release] ChirpStack v4.7

After you do that on your V4 instance the keys should come back in the activation tab and the devices should come online.

I can’t answer many of your specific questions though as I have never done the migration myself.

happidad · October 8, 2024, 5:06am

Thanks for the clue, I missed that step. When I try to run the migration, I hit a timeout error:

~ $ chirpstack -c /etc/chirpstack migrate-device-sessions-to-postgres
2024-10-08T04:59:33.239502Z INFO chirpstack::storage: Setting up PostgreSQL connection pool
2024-10-08T04:59:33.239882Z INFO chirpstack::storage: Applying schema migrations
2024-10-08T04:59:33.263242Z INFO chirpstack::storage: Setting up Redis client
2024-10-08T04:59:33.263404Z INFO chirpstack::storage: Setting Redis prefix prefix=lorawan-prod01:
2024-10-08T04:59:33.263458Z INFO chirpstack::cmd::migrate_ds_to_pg: Migrating device-sessions from Redis to PostgreSQL
2024-10-08T04:59:33.263476Z INFO chirpstack::cmd::migrate_ds_to_pg: Getting DevEUIs from PostgreSQL without device-session
2024-10-08T04:59:33.267930Z INFO chirpstack::cmd::migrate_ds_to_pg: There are 519 devices in PostgreSQL without device-session set
Error: Error occurred while creating a new object: Operation timed out (os error 110)

Caused by:
Operation timed out (os error 110)

Stack backtrace:
0: anyhow::error::<impl anyhow::Error>::new
1: chirpstack::main::{{closure}}
2: chirpstack::main
3: std::sys_common::backtrace::__rust_begin_short_backtrace
4: main

Any ideas of what might be the cause? Thanks!

Kevin

happidad · October 8, 2024, 4:56pm

I enabled debug logging and got the following line: DEBUG chirpstack::cmd::migrate_ds_to_pg: Migrating device-session dev_eui=c45757ffff3e7acc

Looking at the rust code, it appears it’s trying to retrieve a redis key that doesn’t exist in my redis, that is formatted like this: device:{c45757ffff3e7acc}:ds

If I run redis-cli and enter the command keys *ds, I get no entries returned. So it appears my redis cache doesn’t contain any device-session data. How is that possible? Thanks in advance.

Kevin

happidad · October 11, 2024, 1:24am

Looking at the v3 network-server source code, it appears the device session is stored at {REDIS_KEY}:lora:ns:device:{DEVICE_EUI}. This redis key does not seem to match what the v4 ‘migrate-device-sessions-to-postgres’ command is looking for and that is why the command is not working for me. The key paths look totally different to me. What am I missing?
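For reference, the two key shapes mentioned in this thread can be put side by side. This is only a minimal sketch: the function names are hypothetical and not part of ChirpStack, and it assumes the configured prefix (including its trailing colon, as in the `prefix=lorawan-prod01:` log line) is prepended verbatim to both shapes. The shapes themselves are taken from the posts above, not verified against the source.

```rust
/// Key shape the v4 `migrate-device-sessions-to-postgres` command appeared
/// to look up, as reported above: `device:{<dev_eui>}:ds`.
/// Hypothetical helper, not a ChirpStack function.
fn v4_device_session_key(prefix: &str, dev_eui: &str) -> String {
    // `{{` and `}}` emit literal braces around the DevEUI.
    format!("{prefix}device:{{{dev_eui}}}:ds")
}

/// Key shape reported above for the v3 network-server:
/// `<prefix>lora:ns:device:<dev_eui>`.
/// Hypothetical helper, not a ChirpStack function.
fn v3_device_session_key(prefix: &str, dev_eui: &str) -> String {
    format!("{prefix}lora:ns:device:{dev_eui}")
}

fn main() {
    // Values copied from the log output in this thread.
    let prefix = "lorawan-prod01:";
    let dev_eui = "c45757ffff3e7acc";

    println!("v4: {}", v4_device_session_key(prefix, dev_eui));
    println!("v3: {}", v3_device_session_key(prefix, dev_eui));
}
```

Under these assumptions the v3 key does not end in `ds`, so a `keys *ds` search would not match v3-style keys even if they were present.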

happidad · October 16, 2024, 12:37am

Ok, looking at the query that the command ‘migrate-device-sessions-to-postgres’ runs, the devices are ones that have never connected or had sessions over a month old, so the device session has expired. I don’t think there should be a device session to migrate in that case.

If that is the case, then I’m back to where I started. I’m not sure why my migration is not working and could use some help. Thanks!

Liam_Philipp · October 16, 2024, 1:04pm

Here is the code for the migration:

use anyhow::Result;
use diesel::prelude::*;
use diesel_async::RunQueryDsl;
use tracing::{debug, info};

use crate::storage::{self, device_session, error::Error, get_async_db_conn, schema::device};
use lrwn::{DevAddr, EUI64};

pub async fn run() -> Result<()> {
    storage::setup().await?;

    info!("Migrating device-sessions from Redis to PostgreSQL");
    info!("Getting DevEUIs from PostgreSQL without device-session");

    let dev_euis: Vec<EUI64> = device::dsl::device
        .select(device::dsl::dev_eui)
        .filter(device::dsl::device_session.is_null())
        .load(&mut get_async_db_conn().await?)
        .await?;

    info!(
        "There are {} devices in PostgreSQL without device-session set",
        dev_euis.len()
    );

    for dev_eui in &dev_euis {
        debug!(dev_eui = %dev_eui, "Migrating device-session");

        let ds = match device_session::get(dev_eui).await {
            Ok(v) => v,
            Err(e) => match e {
                Error::NotFound(_) => {
                    debug!(dev_eui = %dev_eui, "Device does not have a device-session");
                    continue;
                }
                _ => {
                    return Err(anyhow::Error::new(e));
                }
            },
        };

        storage::device::partial_update(
            *dev_eui,
            &storage::device::DeviceChangeset {
                dev_addr: Some(Some(DevAddr::from_slice(&ds.dev_addr)?)),
                device_session: Some(Some(ds)),
                ..Default::default()
            },
        )
        .await?;

        debug!(dev_eui = %dev_eui, "Device-session migrated");
    }

    Ok(())
}

Even without reading Rust closely, it is pretty clear there’s a section that skips devices without device-sessions. So perhaps that is not your issue? A couple of shots in the dark:

Did all of your devices migrate over okay? I.e., do you have the same number of devices on V3 as on V4? I think a missing or deleted device could cause the code to fail.
Was your V3 on the latest V3 release before you did the V3 to V4 migration? If not, your redis keys might not be what it expects.

happidad · November 10, 2024, 6:38pm

After a lot of roundabout investigation, the root issue ended up being a simple misconfiguration. My networking setup did not allow connections to the redis port. After I allowed them, the migration worked as expected.

For more details on my issues and where I went wrong, continue reading.

I initially assumed my chirpstack configuration was fully operational by looking at the logs at startup. Since I didn’t see any redis errors, I thought the redis configuration was fine. I hadn’t looked closely at the chirpstack logs while I was debugging this issue, which was my first mistake. Apparently, no errors get logged until the server actually tries to use redis, which is why the migration was failing at the point it failed.

I don’t know if anyone else has this issue, but for me, a lot of my AWS issues seem to boil down to security group configuration issues and forgetting to open up ports. I have hit this issue multiple times, but I never seem to learn. One easy way to verify connectivity via a particular port from instance to instance is to use telnet from the client to server, specifying the port. If it connects, then the security group is configured correctly.
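The same connectivity check can be done without telnet. Below is a minimal sketch in Rust using only the standard library. The helper name `check_port` and the default host and port are assumptions for illustration; pass your own redis host and port as arguments. It only confirms that a TCP connection can be opened within a timeout, not that redis itself is healthy.

```rust
use std::env;
use std::io;
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Returns Ok(()) if a TCP connection to `host:port` can be opened within
/// `timeout` (which must be non-zero). Hypothetical helper for illustration.
fn check_port(host: &str, port: u16, timeout: Duration) -> io::Result<()> {
    // A host name may resolve to several addresses; try each in turn.
    let mut last_err = io::Error::new(
        io::ErrorKind::NotFound,
        "host resolved to no addresses",
    );
    for addr in (host, port).to_socket_addrs()? {
        match TcpStream::connect_timeout(&addr, timeout) {
            Ok(_) => return Ok(()), // Connected; the stream closes on drop.
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}

fn main() {
    // Usage: <program> [host] [port]
    // The defaults are placeholders, not values from the thread.
    let args: Vec<String> = env::args().collect();
    let host = args.get(1).map(String::as_str).unwrap_or("127.0.0.1");
    let port: u16 = args
        .get(2)
        .and_then(|p| p.parse().ok())
        .unwrap_or(6379);

    match check_port(host, port, Duration::from_secs(3)) {
        Ok(()) => println!("{host}:{port} is reachable"),
        Err(e) => println!("{host}:{port} is NOT reachable: {e}"),
    }
}
```

A timeout here (rather than an immediate "connection refused") is the typical symptom when a security group silently drops the traffic, which matches the `Operation timed out (os error 110)` seen earlier in the thread.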

The “4.7 redis → postgres device session key migration” was not actually an issue for me. The v3 to v4 migration script handled everything for me and this step was not required.

One other difficulty I ran into, which might save someone else some pain, was where to run the migration script. My chirpstack server and redis server are located in AWS, and network communication to my redis server is only allowed within my AWS subnet. That means I cannot connect directly to my redis server from outside of my AWS ec2 instances. I initially tried to run the migration script from my local development machine. To get that to work, I had to use an ec2 node to build an ssh tunnel from my machine to the redis server via the ec2 intermediary. This took a bit of effort to get working. And since I was transitioning from a redis server (used by my v3 chirpstack) to a valkey server (used by my v4 chirpstack), I needed two separate ssh tunnels, one to each of the redis and valkey servers. A lot of data was travelling down to my local network and back into the AWS network. The much easier solution is to allocate another ec2 instance in the right subnet, then build and run the migration script there. That ec2 host can connect directly to both the redis and valkey servers.

This is a long post and hopefully someone finds this information useful.
