Downlinks failing

dewetdt · January 10, 2024, 10:50am

Hi,
I have been running my own Chirpstack v4 LNS (Docker). See image below.

I am running some SenseCAP Helium gateways.
Chirpstack is receiving all the uplinks. However, any downlinks that I queue are not reaching the device.
The downlinks are logged in the

chirpstack/chirpstack:4
chirpstack/chirpstack-gateway-bridge:4
services.

All my devices are CLASS A. Any idea as to what may be going wrong?

sp193 · January 10, 2024, 11:22am

I see two gateway bridges. Which protocol does your gateway use?

If you use GWMP (the UDP Packet Forwarder), you need to ensure that UDP hole-punching can work. Do check your gateway’s logs for whether it is successfully receiving PULL_ACK (heartbeat responded to) from the LNS.
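For reference, the keepalive that makes hole-punching work can be sketched in a few lines. This is a minimal illustration based on the Semtech UDP protocol (version 2), where PULL_DATA carries identifier 0x02 and PULL_ACK carries identifier 0x04. The helper names, the placeholder gateway EUI and the default port 1700 are assumptions for illustration, not something taken from your setup.

```python
import os
import socket
from typing import Optional

PROTOCOL_VERSION = 0x02  # Semtech UDP protocol version 2
PULL_DATA = 0x02         # gateway -> server keepalive
PULL_ACK = 0x04          # server -> gateway acknowledgement


def build_pull_data(gateway_eui: bytes, token: Optional[bytes] = None) -> bytes:
    """Build a PULL_DATA frame: version, 2-byte token, identifier, 8-byte EUI."""
    if len(gateway_eui) != 8:
        raise ValueError("gateway EUI must be 8 bytes")
    if token is None:
        token = os.urandom(2)
    if len(token) != 2:
        raise ValueError("token must be 2 bytes")
    return bytes([PROTOCOL_VERSION]) + token + bytes([PULL_DATA]) + gateway_eui


def is_pull_ack(packet: bytes, token: bytes) -> bool:
    """True if the packet is a PULL_ACK echoing the token we sent."""
    return (
        len(packet) >= 4
        and packet[0] == PROTOCOL_VERSION
        and packet[1:3] == token
        and packet[3] == PULL_ACK
    )


def probe(host: str, port: int = 1700, timeout_s: float = 2.0) -> bool:
    """Send one PULL_DATA and report whether a matching PULL_ACK came back."""
    token = os.urandom(2)
    eui = bytes.fromhex("0102030405060708")  # hypothetical placeholder EUI
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        sock.sendto(build_pull_data(eui, token), (host, port))
        try:
            reply, _ = sock.recvfrom(64)
        except OSError:  # covers timeouts and ICMP "port unreachable"
            return False
    return is_pull_ack(reply, token)
```

Calling `probe("lns.example.com")` (a hypothetical host) from the gateway's network would show whether the reply path back through NAT is open. The server only learns where to send downlinks from the source address of PULL_DATA, which is why this exchange matters for downlinks specifically.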

If the gateway has been going in and out of service in Chirpstack, please check that it is sending the statistics message often enough to meet the threshold set in Chirpstack (under the gateway details).

dewetdt · January 10, 2024, 11:43am

Hi @sp193

The gateways we are using are Helium devices, which run the Semtech UDP packet forwarder.

The gateways are using the Helium router. There are no gateways in my Chirpstack.

On the gateway:
Jan 10 13:09:45 proton lora_pkt_fwd[1222]: INFO: [down] PULL_ACK received in 0 ms

Just some extra info:

Devices: Self developed, and 3rd party
Hotspot: SenseCAP (Helium) hotspot (3rd party)
Packet Router: Semtech UDP packet router
LNS: Chirpstack V4 (docker) running on EC2
Console: Chirpstack Console running on the same EC2

From what I can see, the LNS successfully queues a downlink to the hotspot, but for some reason the data never reaches the end devices. It did work, but stopped working at the end of 2023.

dewetdt · January 11, 2024, 7:04am

@sp193
Could it be that the round trip to the device is taking too long? I have the LNS set up with the default 1 s RxDelay.

sp193 · January 11, 2024, 7:29am

You need to study the gateway’s logs to know. If the RTT is too high, then the downlink will fail to reach the gateway in time and you will get a rejected transmission with the TOO_LATE error. I suppose that such an event would also be echoed into the Events tab of Chirpstack, for the device.
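If you capture the UDP traffic between the packet forwarder and its server, the rejection reason can be read straight from the TX_ACK frame. The sketch below assumes the Semtech UDP protocol (version 2) layout: a 12-byte header with identifier 0x05 at offset 3, followed by an optional JSON object of the form {"txpk_ack": {"error": "..."}}. The function name is hypothetical.

```python
import json
from typing import Optional

TX_ACK = 0x05  # gateway -> server: result of a downlink (PULL_RESP) request


def tx_ack_error(packet: bytes) -> Optional[str]:
    """Return the TX_ACK error string, "NONE" if accepted, None if not a TX_ACK."""
    # Header: version (1), token (2), identifier (1), gateway EUI (8).
    if len(packet) < 12 or packet[3] != TX_ACK:
        return None
    body = packet[12:]
    if not body:
        # An empty payload means the gateway accepted the request.
        return "NONE"
    ack = json.loads(body.decode("utf-8")).get("txpk_ack", {})
    return ack.get("error", "NONE")


# Example frame as a gateway might send it after a late downlink request.
header = b"\x02\xab\xcd\x05" + bytes(8)
late = header + json.dumps({"txpk_ack": {"error": "TOO_LATE"}}).encode()
print(tx_ack_error(late))
```

A result of "TOO_LATE" would point at latency, while "NONE" means the gateway accepted the request, so the problem lies further down the chain.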

I would check the gateway’s logs anyway. If the downlink is received, there is probably nothing wrong with the backend, unless there seems to be something wrong with its characteristics (like wrong channel, transmitted at the wrong timestamp, wrong datarate). Even if the request is received, you should confirm that the gateway indeed transmitted it.

dewetdt · January 11, 2024, 8:12am

I should probably have mentioned that when I load the device onto Helium Console and queue a Downlink from there, then everything works fine.

So I am pretty sure that the issue is with my docker implementation of the Chirpstack LNS

dewetdt · January 12, 2024, 8:14am

@sp193 I can confirm that the downlink does reach the gateway. Is it possible that the LNS server is under too much strain, resulting in packets reaching the gateway too “late” ?

dewetdt · January 12, 2024, 9:55am

@brocaar any ideas as to what the error may be? The downlinks for the join request successfully reach the end devices, after which no downlinks get there.

The gateway receives the downlink (from what I can see in the logs), but I can see nothing in the device logs about traffic reaching the device.

dewetdt · January 15, 2024, 12:06pm

Hi @TRN
I managed to get a downlink through by setting the RX window to RX2 only. It seems that the latency from our servers in Cape Town is too high to make the round trip to the Helium router and back. Previously our setting was RX1/RX2. This is still a worry, as the LNS should use RX2 if RX1 does not work; however, it looks like this behaviour is not working as expected.
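To make the latency argument concrete, here is a rough timing-budget sketch. It relies on the LoRaWAN rule that RX2 opens one second after RX1. The processing time and the gateway scheduling margin are assumed values for illustration, not measurements from my setup, and the function name is made up.

```python
from typing import List


def reachable_windows(
    uplink_to_lns_s: float,
    lns_to_gateway_s: float,
    rx1_delay_s: float = 1.0,
    processing_s: float = 0.05,      # assumed LNS processing time
    gateway_margin_s: float = 0.1,   # assumed lead time the gateway needs
) -> List[str]:
    """List the Class-A windows a downlink can still reach.

    Times are measured from the end of the uplink transmission.
    """
    rx2_delay_s = rx1_delay_s + 1.0  # RX2 opens 1 s after RX1
    arrival_s = uplink_to_lns_s + processing_s + lns_to_gateway_s
    windows = []
    if arrival_s + gateway_margin_s <= rx1_delay_s:
        windows.append("RX1")
    if arrival_s + gateway_margin_s <= rx2_delay_s:
        windows.append("RX2")
    return windows


print(reachable_windows(0.1, 0.1))  # short path: both windows reachable
print(reachable_windows(0.5, 0.5))  # long path: only RX2 reachable
```

With 0.5 s in each direction the request arrives after RX1 has already opened but well before RX2, which matches what I see with rx_window=2.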

Another option that can work is to set the RX window to RX1/RX2 and to set rx2_prefer_on_link_budget to true. This may, however, not work if the link budget for RX1 is better.

See the config of my chirpstack that is working (using RX2 window)

    # RX window (Class-A).
    #
    # Set this to:
    # 0: RX1 / RX2
    # 1: RX1 only
    # 2: RX2 only
    rx_window=2

    # RX1 delay (1 - 15 seconds).
    rx1_delay=1

    # RX1 data-rate offset
    rx1_dr_offset=0

    # RX2 data-rate
    rx2_dr=0

    # RX2 frequency (Hz)
    rx2_frequency=869525000

    # Prefer RX2 on RX1 data-rate less than.
    #
    # Prefer RX2 over RX1 based on the RX1 data-rate. When the RX1 data-rate
    # is smaller than the configured value, then the Network Server will
    # first try to schedule the downlink for RX2, failing that (e.g. the gateway
    # has already a payload scheduled at the RX2 timing) it will try RX1.
    rx2_prefer_on_rx1_dr_lt=0

    # Prefer RX2 on link budget.
    #
    # When the link-budget is better for RX2 than for RX1, the Network Server will first
    # try to schedule the downlink in RX2, failing that it will try RX1.
    rx2_prefer_on_link_budget=false

dewetdt · January 15, 2024, 12:07pm

@brocaar do you have any idea as to why the fallback from RX1 to RX2 is not happening?

sp193 · January 16, 2024, 1:28am

Did you check the gateway’s logs? This LNS is written in Rust, which gets compiled into machine code. It is very unlikely for it to be overloaded unless you have perhaps tens of thousands of devices sending at once or have a very low-end server.

The fallback to RX2 only happens if the gateway uses GWMP (UDP Packet Forwarder) and rejected a downlink on RX1.
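As a rough sketch of that behaviour (this is an illustration of the idea, not Chirpstack's actual code, and all names are hypothetical): the next window is only tried when the gateway answers the previous request with an error.

```python
from typing import Callable, Optional, Sequence


def schedule_downlink(
    send_to_gateway: Callable[[str], str],
    windows: Sequence[str] = ("RX1", "RX2"),
) -> Optional[str]:
    """Try each window in order; return the first one the gateway accepts.

    send_to_gateway(window) returns the gateway's TX_ACK error string,
    where "NONE" means the request was accepted.
    """
    for window in windows:
        if send_to_gateway(window) == "NONE":
            return window
    return None


def rejects_rx1(window: str) -> str:
    """Fake gateway that reports RX1 requests as arriving too late."""
    return "TOO_LATE" if window == "RX1" else "NONE"


def accepts_everything(window: str) -> str:
    """Fake gateway that accepts any request, even if the device misses it."""
    return "NONE"


print(schedule_downlink(rejects_rx1))         # falls back to RX2
print(schedule_downlink(accepts_everything))  # stays on RX1, no fallback
```

The second case is the important one: if the gateway accepts the RX1 request and the device still does not hear it, nothing tells the LNS to try RX2.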

For these reasons, you need to look at your gateway’s logs to know whether it is really transmitting your downlinks, and why or why not.
Normally, any errors should be echoed in the Events tab for the device. But in case they are not, you need to study the logs.

For example, if there is a network-related issue that prevents the gateway from getting the downlink request, downlinks cannot be issued and there will be no error.

Or, perhaps your device is somehow unable to receive downlinks on RX1 (e.g. due to having pre-configured settings that are incompatible with what the LNS uses, like RX1DROffset). I did not ask in any earlier post, but did you test your devices before installation?
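To illustrate the RX1DROffset point: assuming the EU868 band (which the rx2_frequency of 869525000 Hz in the posted config suggests) and uplink data rates DR0 to DR5, the RX1 data rate is the uplink data rate minus the offset, floored at DR0. The function name below is made up for this sketch.

```python
def eu868_rx1_data_rate(uplink_dr: int, rx1_dr_offset: int) -> int:
    """RX1 downlink data rate for EU868, uplink DR0..DR5, offset 0..5."""
    if not 0 <= uplink_dr <= 5:
        raise ValueError("this sketch only covers uplink DR0..DR5")
    if not 0 <= rx1_dr_offset <= 5:
        raise ValueError("RX1DROffset must be 0..5")
    return max(0, uplink_dr - rx1_dr_offset)


# The LNS is configured with rx1_dr_offset=0, so for a DR5 uplink it
# transmits the RX1 downlink at DR5.
lns_dr = eu868_rx1_data_rate(5, 0)

# A device that hypothetically assumes an offset of 2 would listen at DR3
# and never demodulate that downlink.
device_dr = eu868_rx1_data_rate(5, 2)

print(lns_dr, device_dr, lns_dr == device_dr)
```

A mismatch like this would break RX1 while leaving RX2 untouched, since RX2 uses its own fixed data rate and frequency.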

dewetdt · January 16, 2024, 5:55am

Hi @sp193

Thanks for the clarification.
We did test everything before installation, and it was working for quite a while before we started seeing this behaviour.

I am starting to suspect that the issue might be latency related. Forcing the downlink on RX2 seems to work. At least now all the devices in the field are working. I do not think the issue is GW related. If I move the devices to the Helium LNS, then the RX1 window is successfully utilised.

system · April 15, 2024, 5:55am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.