Loraserver is receiving wrong/inconsistent data from some devices

Hello!

We deployed ChirpStack (v4) on AWS some time ago, and until now it has been working fine. However, a few days ago we started receiving wrong data from some of our devices.

As can be seen in the picture above, the received data jumps from one range to another. This is wrong, because the value we should be receiving ramps up slowly.

The failing devices do not appear to follow a particular pattern, except that they are installed in places where they are frequently powered off (maintenance, power cuts, etc.), so they re-join the LoRa Server every time this happens. (All of them are OTAA devices.)

We are still trying to figure out what and where the problem is, but nothing seems wrong at first glance. We have already checked the logs and found nothing that indicates a problem: no warnings or error messages.

The first time this happened, we solved it by restarting the affected device so that it re-joined (via OTAA), but we cannot do this every time it happens because some of the devices are very far away.
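A plausible reason why a re-join clears the symptom, assuming the devices use LoRaWAN 1.0.x OTAA: every join derives new session keys on both sides, as NwkSKey = AES-128(AppKey, 0x01 | AppNonce | NetID | DevNonce | pad16) and AppSKey = AES-128(AppKey, 0x02 | AppNonce | NetID | DevNonce | pad16). Because DevNonce and AppNonce change on every join, the device and the server only hold the same AppSKey if both derived it from the same Join-Request/Join-Accept exchange. The sketch below (function name hypothetical) only builds the two 16-byte input blocks, to show which values the keys depend on; the AES step itself is left out.

```python
def join_key_blocks(app_nonce: bytes, net_id: bytes, dev_nonce: bytes) -> tuple[bytes, bytes]:
    """Build the LoRaWAN 1.0.x OTAA key-derivation input blocks.

    Hypothetical helper. Each session key is AES-128 of its block under the
    AppKey. Fields are raw bytes in the order they appear over the air.
    Returns (nwk_s_key_block, app_s_key_block).
    """
    if (len(app_nonce), len(net_id), len(dev_nonce)) != (3, 3, 2):
        raise ValueError("expected 3-byte AppNonce, 3-byte NetID, 2-byte DevNonce")
    body = app_nonce + net_id + dev_nonce
    pad = bytes(16 - 1 - len(body))  # pad16: zero bytes up to a 16-byte block
    return bytes([0x01]) + body + pad, bytes([0x02]) + body + pad
```

Two joins that differ only in DevNonce already produce different input blocks, and therefore different session keys.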

This gave us a clue about what may be happening: maybe it is an OTAA-related issue, but what exactly? We think it could be an AppSKey problem. The whole message is readable in ChirpStack, as seen in the first pictures, and everything is correct except the payload, which is decrypted using the AppSKey. So if the AppSKey the device used to encrypt the data is not the same as the one stored in the LoRa Server, the payload would be decrypted incorrectly, which is exactly what we see (correct me if I am wrong). If this is the problem, how can we resolve it? Is it a code issue or a human error on our part? (@brocaar)
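To make the AppSKey hypothesis concrete: in LoRaWAN 1.0.x, FRMPayload (FPort > 0) is encrypted with AES-128 under the AppSKey in a counter-like mode, where each 16-byte keystream block is S_i = AES(AppSKey, A_i) and A_i contains the direction, the DevAddr and the 32-bit frame counter. The frame header and FPort are sent in clear, and the MIC is computed with the network session key over the already encrypted frame, so a wrong AppSKey produces no error at all: the MIC validates, the metadata looks right, and only the payload is scrambled. The sketch below is illustrative only (a pure-Python AES-128, slow and not constant-time; the function names are our own, not a ChirpStack API). It also shows that decrypting with the right key but a different 32-bit FCnt scrambles the payload in the same way.

```python
# Illustrative sketch only: pure-Python AES-128 (FIPS-197), slow and not
# constant-time. Use a vetted crypto library for anything real. All function
# names here are our own, not a ChirpStack API.


def _xtime(a: int) -> int:
    # Multiply by 2 in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1.
    return ((a << 1) ^ 0x1B) & 0xFF if a & 0x80 else a << 1


def _gmul(a: int, b: int) -> int:
    # General multiplication in GF(2^8) by shift-and-add.
    p = 0
    while b:
        if b & 1:
            p ^= a
        a = _xtime(a)
        b >>= 1
    return p


def _build_sbox() -> list[int]:
    sbox = []
    for x in range(256):
        # Multiplicative inverse (0 maps to 0), found by brute force.
        inv = 0 if x == 0 else next(y for y in range(1, 256) if _gmul(x, y) == 1)
        # Affine transform: inv ^ rotl1 ^ rotl2 ^ rotl3 ^ rotl4 ^ 0x63.
        s = res = inv
        for _ in range(4):
            s = ((s << 1) | (s >> 7)) & 0xFF
            res ^= s
        sbox.append(res ^ 0x63)
    return sbox


SBOX = _build_sbox()


def _round_keys(key: bytes) -> list[list[int]]:
    # AES-128 key expansion: 44 four-byte words grouped into 11 round keys.
    w = [list(key[i:i + 4]) for i in range(0, 16, 4)]
    rcon = 1
    for i in range(4, 44):
        t = list(w[i - 1])
        if i % 4 == 0:
            t = [SBOX[b] for b in t[1:] + t[:1]]  # RotWord, then SubWord
            t[0] ^= rcon
            rcon = _xtime(rcon)
        w.append([a ^ b for a, b in zip(w[i - 4], t)])
    return [sum(w[4 * r:4 * r + 4], []) for r in range(11)]


def aes128_encrypt_block(key: bytes, block: bytes) -> bytes:
    rk = _round_keys(key)
    # State kept column-major as in FIPS-197: s[row + 4 * col].
    s = [b ^ k for b, k in zip(block, rk[0])]
    for rnd in range(1, 11):
        s = [SBOX[b] for b in s]  # SubBytes
        s = [s[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]  # ShiftRows
        if rnd != 10:  # MixColumns is skipped in the final round
            m = []
            for c in range(4):
                a0, a1, a2, a3 = s[4 * c:4 * c + 4]
                m += [_gmul(a0, 2) ^ _gmul(a1, 3) ^ a2 ^ a3,
                      a0 ^ _gmul(a1, 2) ^ _gmul(a2, 3) ^ a3,
                      a0 ^ a1 ^ _gmul(a2, 2) ^ _gmul(a3, 3),
                      _gmul(a0, 3) ^ a1 ^ a2 ^ _gmul(a3, 2)]
            s = m
        s = [b ^ k for b, k in zip(s, rk[rnd])]  # AddRoundKey
    return bytes(s)


def lorawan_frm_crypt(key: bytes, payload: bytes, dev_addr: int,
                      fcnt: int, uplink: bool = True) -> bytes:
    """LoRaWAN 1.0.x FRMPayload encryption; decryption is the same operation."""
    out = bytearray()
    for i in range((len(payload) + 15) // 16):
        # A_i = 0x01 | 4 x 0x00 | Dir | DevAddr (LE) | 32-bit FCnt (LE) | 0x00 | i
        a_i = (bytes([0x01, 0, 0, 0, 0, 0 if uplink else 1])
               + dev_addr.to_bytes(4, "little")
               + fcnt.to_bytes(4, "little")
               + bytes([0x00, i + 1]))
        s_i = aes128_encrypt_block(key, a_i)  # keystream block S_i
        chunk = payload[16 * i:16 * i + 16]
        out += bytes(p ^ k for p, k in zip(chunk, s_i))
    return bytes(out)
```

Because encryption is a plain XOR with a keystream, there is no "decryption failed" condition to log: a mismatched key or counter simply yields different bytes of the same length.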

That said, we wonder whether anyone has had the same problem and how they solved it, or whether anyone has a clue about how to solve it.

This is an urgent problem that we need to fix, so we would really appreciate any suggestions on how to resolve it.

Thanks in advance,
Best regards.

P.S.

Additionally, here is a brief summary of our high-availability ChirpStack architecture on AWS, along with our ChirpStack configuration files (with sensitive information such as endpoints, users and passwords removed), since they may be relevant to this issue.

As mentioned, our ChirpStack architecture is deployed on AWS and is highly available. We use Elastic Beanstalk to deploy ChirpStack on multiple EC2 instances. For Postgres we use an RDS cluster, and for Redis an ElastiCache cluster. The RDS cluster has two endpoints and we use the primary one; the same goes for ElastiCache. AWS handles all the clustering under the hood and exposes those endpoints.
The configuration files:

```toml
[logging]
  level="info"
[postgresql]
  dsn="postgres://[postgres-user]:[postgres-password]@[postgres-endpoint]:[postgres-port]/chirpstack?sslmode=disable"
  max_open_connections=10
  min_idle_connections=0
  automigrate=false
[redis]
  servers=[
    "redis://[redis-endpoint]:[redis-port]",
  ]
  tls_enabled=false
  cluster=false
[network]
  net_id="000000"
  enabled_regions=[
    "au915_0"
  ]
[api]
  bind="0.0.0.0:8080"
  secret="[api-secret]"
[integration]
  enabled=["mqtt"]
  [integration.mqtt]
    event_topic="application/{{application_id}}/device/{{dev_eui}}/event/{{event}}"
    command_topic="application/{{application_id}}/device/{{dev_eui}}/command/{{command}}"
    server="ssl://[iot-core-endpoint]:[iot-core-port]/"
    json=true
    username=""
    password=""
    qos=0
    clean_session=false
    client_id=""
    ca_cert="[ca-cert-path]"
    tls_cert="[tls-cert-path]"
    tls_key="[tls-key-path]"
```

```toml
[[regions]]
name="au915_0"
common_name="AU915"

[regions.gateway]
force_gws_private=false
[regions.gateway.backend]
enabled="mqtt"
[regions.gateway.backend.mqtt]
event_topic="gateway/+/event/+"
command_topic="gateway/{{ gateway_id }}/command/{{ command }}"
server="ssl://[iot-core-endpoint]:[iot-core-port]/"
username=""
password=""
qos=0
clean_session=false
client_id=""
ca_cert="[ca-cert-path]"
tls_cert="[tls-cert-path]"
tls_key="[tls-key-path]"

[[regions.gateway.channels]]
frequency=915200000
bandwidth=125000
modulation="LORA"
spreading_factors=[7, 8, 9, 10, 11, 12]

[[regions.gateway.channels]]
frequency=915400000
bandwidth=125000
modulation="LORA"
spreading_factors=[7, 8, 9, 10, 11, 12]

[[regions.gateway.channels]]
frequency=915600000
bandwidth=125000
modulation="LORA"
spreading_factors=[7, 8, 9, 10, 11, 12]

[[regions.gateway.channels]]
frequency=915800000
bandwidth=125000
modulation="LORA"
spreading_factors=[7, 8, 9, 10, 11, 12]

[[regions.gateway.channels]]
frequency=916000000
bandwidth=125000
modulation="LORA"
spreading_factors=[7, 8, 9, 10, 11, 12]

[[regions.gateway.channels]]
frequency=916200000
bandwidth=125000
modulation="LORA"
spreading_factors=[7, 8, 9, 10, 11, 12]

[[regions.gateway.channels]]
frequency=916400000
bandwidth=125000
modulation="LORA"
spreading_factors=[7, 8, 9, 10, 11, 12]

[[regions.gateway.channels]]
frequency=916600000
bandwidth=125000
modulation="LORA"
spreading_factors=[7, 8, 9, 10, 11, 12]

[[regions.gateway.channels]]
frequency=915900000
bandwidth=500000
modulation="LORA"
spreading_factors=[8]

[regions.network]
installation_margin=10
rx_window=0
rx1_delay=1
rx1_dr_offset=0
rx2_dr=8
rx2_frequency=923300000
rx2_prefer_on_rx1_dr_lt=0
rx2_prefer_on_link_budget=false
downlink_tx_power=-1
adr_disabled=false
min_dr=0
max_dr=5
enabled_uplink_channels=[0, 1, 2, 3, 4, 5, 6, 7, 64]

[regions.network.rejoin_request]
enabled=false
max_count_n=0
max_time_n=0

  [regions.network.class_b]
    ping_slot_dr=8
    ping_slot_frequency=0
```

Not an answer, but the same thing happened to the team I work with. (I don't normally deal with ChirpStack, so I don't know a whole lot about it; still trying to get there.) In my case, about a quarter of all devices are sending scrambled payload data, while everything else in the JSON looks good. You can also send downlinks to those devices, but I believe they are received scrambled as well, because the modem does not react to them. Somehow half of the session must have been corrupted.

The only tip I can give you is to let the modems monitor the connection and restart themselves. In my case, if the modem doesn't receive time syncs for a few days, it reboots.
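The self-restart tip can be sketched as a small watchdog. This is hypothetical logic, written in Python for readability (real modem firmware would implement it in its own environment), and it assumes the device periodically receives downlinks such as time-sync answers. Since scrambled downlinks are ignored by the modem, the timer should be reset only by downlinks that decrypt and parse correctly, not by every received frame.

```python
import time
from typing import Callable


class LinkWatchdog:
    """Hypothetical device-side link watchdog (logic only, not real firmware).

    Call on_valid_downlink() only when a downlink was received AND its payload
    decrypted/parsed correctly (e.g. a time-sync answer). If that has not
    happened for max_silence_s seconds, should_rejoin() returns True and the
    firmware would drop its session and perform a fresh OTAA join.
    """

    def __init__(self, max_silence_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_silence_s = max_silence_s
        self._clock = clock  # injectable clock, e.g. for testing
        self._last_valid_rx = clock()

    def on_valid_downlink(self) -> None:
        # A correctly decrypted downlink proves the session keys still match.
        self._last_valid_rx = self._clock()

    def should_rejoin(self) -> bool:
        return self._clock() - self._last_valid_rx >= self.max_silence_s
```

For "a few days", something like `LinkWatchdog(max_silence_s=3 * 24 * 3600)` would fit; the threshold is an arbitrary example value.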

Which version are you using? Are you on the latest ChirpStack version?


I asked the guys and they said it is about 4 months old. I don't think it's anything for you to worry about; it is highly likely that human error was involved.

Hello! We are using version 4.0.3.

Hi @brocaar! How are you? We still have this issue. Is there any additional information that could help figure out what's happening? We would appreciate your help. Thanks in advance!

You may be running into the f_cnt issue that was fixed in version 4.1.1:

chirpstack/chirpstack@e3fae62: "Make get device-session for phypayload functions update f_cnt."
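For context on the f_cnt suggestion: the FCnt field in the frame header carries only the 16 least-significant bits of the frame counter, while with 32-bit counters the network server tracks the full value and must reconstruct it for every uplink. That full value enters both the MIC calculation and the A_i blocks used to decrypt FRMPayload. A generic sketch of the reconstruction follows; it is not ChirpStack's code and is not meant to describe what commit e3fae62 changed (the function name is hypothetical).

```python
def reconstruct_fcnt(last_fcnt: int, fcnt16: int) -> int:
    """Expand the 16-bit over-the-air FCnt to a full 32-bit frame counter.

    Generic sketch (name hypothetical), assuming the counter only moves
    forward and fewer than 2**16 frames were missed since last_fcnt.
    """
    if not 0 <= fcnt16 <= 0xFFFF:
        raise ValueError("fcnt16 must fit in 16 bits")
    # Keep the known upper 16 bits and splice in the received lower 16 bits.
    candidate = (last_fcnt & 0xFFFF0000) | fcnt16
    # A smaller result means the lower 16 bits rolled over: bump the upper half.
    if candidate < last_fcnt:
        candidate += 0x10000
    return candidate & 0xFFFFFFFF
```

For example, `reconstruct_fcnt(0xFFFF, 0x0000)` returns `0x10000` (65536): the over-the-air value wrapped to zero, but the full counter keeps increasing.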


How can I find out which version it is? Do you have to go to the API to see the version? Mine says "[ base url: , api version: 1.0.0 ]".
Does that mean it's a really old version? The guys running this thing told me it was only a few months old when I asked them.
