Sudden validation errors

Hi all, my ChirpStack installation suddenly stopped working and I am trying to debug the issue. I have some thoughts, but I wanted to pick your brains.

  1. I deployed ChirpStack using the chirpstack/chirpstack:4 image
  2. I configured a bunch of gateways, device profiles and devices
  3. Fast forward a month, and I started to see errors in the UI, more specifically errors like column device_profile.is_relay does not exist and Validator function error error=column device_profile.class_b_timeout does not exist, which was very concerning (a quick way to double-check this is sketched right after this list).
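
For anyone hitting similar errors, a quick generic way to double-check which columns the table actually has (plain Postgres, nothing ChirpStack-specific):

    -- List the columns currently present on public.device_profile,
    -- to confirm that is_relay / class_b_timeout are really gone.
    SELECT column_name, data_type
    FROM information_schema.columns
    WHERE table_schema = 'public'
      AND table_name = 'device_profile'
    ORDER BY ordinal_position;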

Trying to dig further, I realised a few things:

  1. Checking the __diesel_schema_migrations table, I noticed that 2 migrations were executed a few weeks ago (see the query sketched right after this list).
  2. Comparing my back-ups between the pre-migration and post-migration states, I noticed that public.device_profile had indeed been severely crippled.
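
For reference, spotting the recently applied migrations is just a matter of sorting diesel's bookkeeping table by run_on (the full dump is further down):

    -- Show the most recently applied migrations first.
    SELECT version, run_on
    FROM public.__diesel_schema_migrations
    ORDER BY run_on DESC
    LIMIT 5;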

For reference, here is the post-migration schema; my currently deployed ChirpStack version is v4.10.1 (as shown in the ChirpStack UI):

CREATE TABLE public.device_profile (
    id uuid NOT NULL,
    tenant_id uuid NOT NULL,
    created_at timestamp with time zone NOT NULL,
    updated_at timestamp with time zone NOT NULL,
    name character varying(100) NOT NULL,
    region character varying(10) NOT NULL,
    mac_version character varying(10) NOT NULL,
    reg_params_revision character varying(20) NOT NULL,
    adr_algorithm_id character varying(100) NOT NULL,
    payload_codec_runtime character varying(20) NOT NULL,
    uplink_interval integer NOT NULL,
    device_status_req_interval integer NOT NULL,
    supports_otaa boolean NOT NULL,
    supports_class_b boolean NOT NULL,
    supports_class_c boolean NOT NULL,
    tags jsonb NOT NULL,
    payload_codec_script text NOT NULL,
    flush_queue_on_activate boolean NOT NULL,
    description text NOT NULL,
    measurements jsonb NOT NULL,
    auto_detect_measurements boolean NOT NULL,
    region_config_id character varying(100),
    allow_roaming boolean NOT NULL,
    rx1_delay smallint NOT NULL,
    abp_params jsonb,
    class_b_params jsonb,
    class_c_params jsonb,
    relay_params jsonb,
    app_layer_params jsonb NOT NULL
);

I understand it was a bad choice to use the 4 tag (instead of pinning an explicit version), which may have caused a newer version of ChirpStack to be deployed when the infrastructure got restarted at some point.

But I would never have expected this much damage, even if the deployment picked up another version from the v4 branch.

Does anyone have any insights into why this may have happened?

Could be the 4.7 changes, which made some database changes, although I didn't think those affected the device_profile data.

Could also be that one of your database versions is no longer supported; I've seen a lot of talk about old Redis versions failing recently.

Could be the 4.7 changes, which made some database changes, although I didn't think those affected the device_profile data.

That was also my first hunch, but I noticed that even the v4.7 tag has the is_relay field in its migrations.

Could also be that one of your database versions is no longer supported; I've seen a lot of talk about old Redis versions failing recently.

Not sure where Redis is involved in the Postgres schema :thinking:, but I also don’t have full context. My Redis version is 7.4.1 (and Postgres is 17.2.0).


Some other thoughts:

  • Is ChirpStack the only component that runs DB migrations? Could it be that one of the other components (chirpstack-rest-api or chirpstack-gateway-bridge) applied that migration?

  • Also, in case it helps, here is the data from the __diesel_schema_migrations table:

    COPY public.__diesel_schema_migrations (version, run_on) FROM stdin;
    00000000000000	2024-12-07 21:55:24.355487
    20220426153628	2024-12-07 21:55:24.383123
    20220428071028	2024-12-07 21:55:24.38533
    20220511084032	2024-12-07 21:55:24.386856
    20220614130020	2024-12-07 21:55:24.392495
    20221102090533	2024-12-07 21:55:24.394456
    20230103201442	2024-12-07 21:55:24.39641
    20230112130153	2024-12-07 21:55:24.397875
    20230206135050	2024-12-07 21:55:24.399435
    20230213103316	2024-12-07 21:55:24.402537
    20230216091535	2024-12-07 21:55:24.404127
    20230925105457	2024-12-07 21:55:24.410898
    20231019142614	2024-12-07 21:55:24.412543
    20231122120700	2024-12-07 21:55:24.414788
    20240207083424	2024-12-07 21:55:24.416508
    20240326134652	2024-12-07 21:55:24.418439
    20240430103242	2024-12-07 21:55:24.421689
    20240613122655	2024-12-07 21:55:24.423634
    20240916123034	2024-12-07 21:55:24.425978
    20241112135745	2024-12-17 09:05:29.777724
    20250113152218	2025-05-06 10:13:27.227628
    20250121093745	2025-05-06 10:13:27.283628
    \.
    

Right, I thought version was a random primary key seeded by the current date, but I am realizing now that it is the date/timestamp part of the migration file name :thinking:

And indeed, I see that these fields were dropped in 2025-01-13-152218_refactor_device_profile_fields (aka version 20250113152218) :see_no_evil:
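
For my own understanding, the pattern behind such a refactor is roughly the one below. This is only a sketch of the idea, not the actual 2025-01-13-152218 migration, and the jsonb key name is made up:

    -- Illustrative sketch only, NOT the real ChirpStack migration:
    -- fold an old scalar column into a jsonb *_params column, then drop it.
    ALTER TABLE device_profile ADD COLUMN class_b_params jsonb;

    UPDATE device_profile
    SET class_b_params = jsonb_build_object(
        'timeout', class_b_timeout  -- hypothetical key name
    )
    WHERE supports_class_b;

    ALTER TABLE device_profile DROP COLUMN class_b_timeout;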

OK, so I think I answered the issue myself… ChirpStack was probably bumped to v4.12.1 (possibly when it got deployed on another worker node, because that node pulled the latest v4 tag) and then bumped back to my initial v4.10.1 version (when it went back to the original worker node, because that was the last cached v4 tag).

I think the red herring here was the fact that many device_profile fields were refactored in the v4.12.1 release, and I thought the DB was "damaged" :see_no_evil:
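
Assuming the refactor really did move the old per-column settings into the new *_params jsonb columns (that is my reading of it, I haven't diffed the migration itself), a quick look at a few profiles should show the data sitting there rather than being lost:

    -- Spot-check that the settings now live inside the jsonb columns.
    SELECT name, class_b_params, class_c_params, relay_params, abp_params
    FROM public.device_profile
    LIMIT 5;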

(And I guess when the older v4.10.1 version was re-deployed, the "down" migration was not executed, because I assume that has to be run explicitly?)