Chirpstack v4 ERROR DOWNLINK_PAYLOAD_SIZE exceeds

Jerome73 · January 29, 2023, 7:41am

Hi guys,
For downlinks, based on the code, the max_payload_size is calculated from rx2 dr:

        // get remaining payload size
        let max_pl_size = self.region_conf.get_max_payload_size(
            self.device_session.mac_version().from_proto(),
            self.device_profile.reg_params_revision,
            self.device_session.rx2_dr as u8,
        )?;

In: chirpstack/data.rs at e78dac316acd3c2c33bfc4f0c48167d1c7458540 · chirpstack/chirpstack · GitHub

So this is different from the dr indicated in the last uplink.

Hope this helps,

dreese · January 29, 2023, 6:04pm

Thanks, I finally figured that out a couple of days ago as well. The RX2 window is pegged to DR8 in US915, which has a max payload size of 53 bytes, and to DR0 in EU868, which has a max payload size of 51 bytes (from the Regional Parameters spec v1.1rA).
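
To make the RX2-based limit concrete, here is a minimal sketch of the lookup. It is not ChirpStack's implementation: the function names are hypothetical, and the table only contains the two values quoted in this thread (US915 DR8 and EU868 DR0). Any other combination returns None rather than a guessed value.

```rust
/// Illustrative lookup of the max. downlink payload size (bytes) for a
/// region and RX2 data rate. Only the two values quoted in this thread are
/// filled in; this is NOT the full Regional Parameters table.
fn max_payload_size(region: &str, rx2_dr: u8) -> Option<usize> {
    match (region, rx2_dr) {
        ("us915", 8) => Some(53),
        ("eu868", 0) => Some(51),
        _ => None, // unknown here: consult the Regional Parameters spec
    }
}

/// Some(true) if an item of `item_size` bytes fits the RX2 limit,
/// Some(false) if it would be discarded, None if the limit is unknown.
fn fits_rx2(region: &str, rx2_dr: u8, item_size: usize) -> Option<bool> {
    max_payload_size(region, rx2_dr).map(|max| item_size <= max)
}
```

For example, fits_rx2("us915", 8, 58) evaluates to Some(false), which corresponds to a 58-byte item being rejected against a 53-byte limit.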

What has been confusing me is there is effectively no RX1 window for class C devices. The code snippet posted by @Jerome73 is from set_tx_info_for_rx2 line 1235.

There is also a set_tx_info_for_rx1 which calculates max size as a function of the uplink DR and the region config rx_1_dr_offset value. However, set_tx_info_for_rx1 is not called for Class C devices. From _handle_schedule_next_queue_item line 194:

        if ctx._is_class_c() {
            ctx.get_class_c_device_lock().await?;
            ctx.set_immediately()?;
            ctx.set_tx_info_for_rx2()?;
        }
        if ctx._is_class_b() {
            ctx.set_tx_info_for_class_b_and_lock_device().await?;
        }
        if ctx._is_class_a() {
            return Err(anyhow!("Invalid device-class"));
        }

Note also the “Invalid device-class” error for Class A devices. This indicates that a downlink message for a Class A device must already be queued before the uplink message arrives. The uplink handler calls the downlink handle_response, which ultimately results in setting both the RX1 and RX2 tx info before sending the response message. However, there is never anything to send for Class C devices, since any downlink message for a Class C device is always sent immediately as an RX2 window message. As such, Class C devices can never receive a downlink message in the RX1 window.
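
The dispatch described above can be modelled roughly as follows. This is a simplified sketch, not the ChirpStack code: the enum and function names are hypothetical, and only the "Invalid device-class" error text is taken from the snippet.

```rust
/// Device classes as used in this discussion.
enum DeviceClass {
    A,
    B,
    C,
}

/// How the scheduler (not the uplink handler) would send a queued item.
#[derive(Debug, PartialEq)]
enum Schedule {
    /// Class C: sent immediately, using the RX2 parameters.
    ImmediateRx2,
    /// Class B: sent in a Class B slot.
    ClassBSlot,
}

/// Mirrors the three branches of the scheduler snippet: Class C and
/// Class B are handled, Class A is rejected because Class A downlinks
/// are only sent in response to an uplink.
fn schedule_next_queue_item(class: DeviceClass) -> Result<Schedule, String> {
    match class {
        DeviceClass::C => Ok(Schedule::ImmediateRx2),
        DeviceClass::B => Ok(Schedule::ClassBSlot),
        DeviceClass::A => Err("Invalid device-class".to_string()),
    }
}
```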

I have traced through the v3 Go code, and the logic appears the same for downlink messages, although I could swear I was able to send downlink messages larger than 53 bytes with v3. There is a difference in how the device lock is set in the uplink handling, though. Perhaps this is the difference. The v4 uplink handler does not seem to be taking the same RX1 window configuration rx1_delay into consideration. I might very well be missing something and/or not fully understanding the logic flow yet.

All that said, the current implementation appears to work contrary to my understanding of Class C downlink messages. Granted, my understanding could very well be incorrect. While the Regional Parameters doc does not specifically mention device class in the receive window sections, this article in the Semtech technical documentation, An In-depth Look at LoRaWAN® Class C Devices, specifically states:

Class C end devices implement the same two receive windows as Class A devices, but they do not close the RX2 window until they send the next transmission back to the server. Therefore, they can receive a downlink in the RX2 window at almost any time. A short window at the RX2 frequency and data rate is also opened between the end of the transmission and the beginning of the RX1 receive window, as illustrated in Figure 1.

While technically Chirpstack does implement the RX1 receive window for Class C devices, it seems impossible to insert a downlink message, as a response to an uplink message, into the system in such a way that it will be treated as being within the RX1 receive delay time period. To do so would require the uplink handler to wait until the configured rx1_delay time period has expired before initiating the RX2 downlink process, which seems non-trivial.

If my understanding of the Class C downlink handling is correct and this is a legitimate issue with Chirpstack, I am happy to write this up as an issue in GitHub. If my understanding is incomplete, I welcome more clarity.

For now, I will adjust my application to better handle downlink messages that are larger than the RX2 limit (53 bytes in US915, 51 bytes in EU868) by breaking them into multiple downlink messages. This seems prudent even if the downlink RX1 window should be used, at least as a fallback. But then I’m not sure if the application layer can know which DR is going to be used for the downlink message. That is a discussion for another day.
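
A minimal sketch of that splitting step, assuming the application simply cuts the payload into consecutive chunks no larger than the RX2 limit. The function name is hypothetical, and any framing the device needs to reassemble the pieces (sequence numbers, a "last chunk" flag) is left out and would reduce the usable bytes per chunk.

```rust
/// Split `payload` into consecutive chunks of at most `max_size` bytes.
/// The last chunk may be shorter; an empty payload yields no chunks.
fn split_payload(payload: &[u8], max_size: usize) -> Vec<Vec<u8>> {
    // `chunks` panics on a size of 0, so fail with a clearer message.
    assert!(max_size > 0, "max_size must be non-zero");
    payload.chunks(max_size).map(|chunk| chunk.to_vec()).collect()
}
```

With the 53-byte US915 limit, a 58-byte response becomes two queue items of 53 and 5 bytes.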

brocaar · February 15, 2023, 2:48pm

I noticed this was missing in the v4 implementation (will be included in the next release):

Maybe your issue is related to this?

dreese · February 15, 2023, 5:06pm

Yes, I think this is the issue. Without a slight delay, there is nothing to send when the uplink handler builds the downlink response. My integration responds quickly, but not as quickly as the downlink handle_response process completes.

In examining the network config defaults, it looks like the default delay is 100ms. If that needs to be increased, we simply add to the network config:

[network]
...
get_downlink_data_delay=<x>
...

I did not have this set in the v3 config, which has the same 100ms default, so it seems like this v4 change is exactly what is needed.

Thanks so much for looking into this. I hope my analysis helped. I know it helped me to better understand how the downlink and receive windows work.

Side question (maybe this should be a new topic, but it is directly related to this scenario): How would an integration know if it’s able to use the rx1 window (send a larger response) or has to plan for the rx2 window (possibly send multiple smaller responses)?

It seems to me the only option is to track the downlink queue item ID and wait for the “ack” message containing that ID. If the “ack” never comes, try a smaller message.
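
As a rough illustration of that idea, the sketch below tracks enqueued items by queue item ID and reports the ones for which no “ack” arrived within a timeout, so the application can retry with smaller messages. All type and method names are hypothetical, and the timeout value is an application choice; nothing here is a ChirpStack API.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Tracks downlinks that were enqueued but not yet acknowledged.
struct PendingDownlinks {
    timeout: Duration,
    /// queue item ID -> (time of enqueue, payload size in bytes)
    items: HashMap<String, (Instant, usize)>,
}

impl PendingDownlinks {
    fn new(timeout: Duration) -> Self {
        Self { timeout, items: HashMap::new() }
    }

    /// Record an item right after it was enqueued.
    fn enqueued(&mut self, id: &str, size: usize, now: Instant) {
        self.items.insert(id.to_string(), (now, size));
    }

    /// Call when the "ack" for `id` arrives; returns false for unknown IDs.
    fn acked(&mut self, id: &str) -> bool {
        self.items.remove(id).is_some()
    }

    /// Remove and return all items whose ack did not arrive in time.
    fn expired(&mut self, now: Instant) -> Vec<(String, usize)> {
        let timeout = self.timeout;
        let ids: Vec<String> = self
            .items
            .iter()
            .filter(|(_, (at, _))| now.duration_since(*at) >= timeout)
            .map(|(id, _)| id.clone())
            .collect();
        ids.into_iter()
            .filter_map(|id| self.items.remove(&id).map(|(_, size)| (id, size)))
            .collect()
    }
}
```

The items returned by expired() are the candidates for a retry with a smaller payload.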

Thanks again.

J_Paul · August 24, 2023, 1:25pm

Is there, or was there, something new on the subject? Today I ran into the “ERROR DOWNLINK_PAYLOAD_SIZE exceeds” again. The problem only occurs with “Class C”, if I switch to “Class A” it works. (region_config_id:“eu868”, dr 5, rssi:-56) Since relays are also switched on the node, I would not like to permanently switch to “Class A”. Is there another solution than setting the device to “Class A” via downlink, sending the configuration via downlink and then setting it back to “Class C” via downlink?

dreese · August 25, 2023, 1:29pm

I have not updated the Chirpstack server in a while. I’ll do that and verify whether the previous fix from February is still working as expected.

brocaar · August 26, 2023, 4:46pm

Would it be an option to increase the RX2 data-rate such that your downlink can be sent using both Class-A and Class-C? There is never a guarantee that a downlink will be sent using Class-A or Class-C. E.g. if your device sends an uplink and your application sends a response, then ChirpStack will try to send the next item in the queue in the Class-A receive-window. However, if this is not possible (e.g. collision with another item in the gateway queue), then the item stays in the queue and will be sent as a Class-C downlink (in case Class-C is enabled).

In that case, the two possibilities I see are smaller payloads or higher data-rates, such that downlinks can be sent using Class-A and Class-C.

J_Paul · August 27, 2023, 9:01am

Thanks for taking the time to take a look at this. Certainly there are technical limitations. In my estimation, however, this is a limitation within Chirpstack. Regardless of the data rate, an “error” is generated beyond a certain number of characters, as was the case with “Class A” a few months ago. I’ll investigate this further to be absolutely sure.

dreese · October 31, 2023, 10:15pm

It has taken me a while to get back to checking on this issue. I am still seeing downlink items being discarded due to size issues. I am running Chirpstack 4.5.1.

CONTAINER ID   IMAGE                         COMMAND                  CREATED          STATUS          PORTS                                   NAMES
ef4e8fff07d2   chirpstack/chirpstack:4.5.1   "/usr/bin/chirpstack…"   11 minutes ago   Up 11 minutes   0.0.0.0:80->8080/tcp, :::80->8080/tcp   chirpstack-docker_chirpstack_1

Chirpstack log:

2023-10-31T22:02:39.048732Z TRACE chirpstack::downlink::scheduler: Starting class_b_c_scheduler_loop run
2023-10-31T22:02:39.048777Z TRACE chirpstack::downlink::scheduler: Getting devices that have schedulable queue-items
2023-10-31T22:02:39.053448Z TRACE chirpstack::downlink::scheduler: Got this number of devices with schedulable queue-items device_count=1
2023-10-31T22:02:39.053541Z TRACE schedule{dev_eui=0004a30b0026a2db}: chirpstack::downlink::data: Handle schedule next-queue item flow
2023-10-31T22:02:39.057715Z TRACE schedule{dev_eui=0004a30b0026a2db}: chirpstack::downlink::data: Selecting downlink gateway
2023-10-31T22:02:39.057750Z TRACE schedule{dev_eui=0004a30b0026a2db}: chirpstack::downlink::data: Checking if device has sent its first uplink already
2023-10-31T22:02:39.057757Z TRACE schedule{dev_eui=0004a30b0026a2db}: chirpstack::downlink::data: Getting Class-C device lock
2023-10-31T22:02:39.057821Z  INFO chirpstack::storage::device: Aquiring device lock dev_eui=0004a30b0026a2db
2023-10-31T22:02:39.058101Z TRACE schedule{dev_eui=0004a30b0026a2db}: chirpstack::downlink::data: Setting immediately flag
2023-10-31T22:02:39.058118Z TRACE schedule{dev_eui=0004a30b0026a2db}: chirpstack::downlink::data: Setting tx-info for RX2
2023-10-31T22:02:39.058134Z TRACE schedule{dev_eui=0004a30b0026a2db}: chirpstack::downlink::data: Getting next device queue-item
2023-10-31T22:02:39.061705Z  INFO schedule{dev_eui=0004a30b0026a2db}: chirpstack::storage::device_queue: Device queue-item deleted id=4e0fd991-a41a-4152-8dd0-abd0c88127a0
2023-10-31T22:02:39.061763Z  WARN schedule{dev_eui=0004a30b0026a2db}: chirpstack::downlink::data: Device queue-item discarded because of max. payload size dev_eui=0004a30b0026a2db device_queue_item_id=4e0fd991-a41a-4152-8dd0-abd0c88127a0
2023-10-31T22:02:39.062678Z TRACE chirpstack::downlink::scheduler: class_b_c_scheduler_loop completed successfully

This looks like the downlink message is still being scheduled in the RX2 window.

I am capturing log messages in my application, which reports the enqueue operation:

2023/10/31 22:02:33 stdout: 10:02PM DBG Response enqueued dev_eui=0004a30b0026a2db device_name=xxxxxx queue_item_id=4e0fd991-a41a-4152-8dd0-abd0c88127a0 size=58

partial log message:

dev_eui:"0004a30b0026a2db"  10:2 Level:ERROR Code:DOWNLINK_PAYLOAD_SIZE Description:Device queue-item discarded because it exceeds the max. payload size Context:map[item_size:58 max_payload_size:53 queue_item_id:4e0fd991-a41a-4152-8dd0-abd0c88127a0]} description="Device queue-item discarded because it exceeds the max. payload size" dev_eui=0004a30b0026a2db device_name=xxxxxx item_size=58 log_code=1 log_level=2 max_payload_size=53 queue_item_id=4e0fd991-a41a-4152-8dd0-abd0c88127a0
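
When collecting these events in an application, the two sizes can be pulled out of the key=value part of such a line. The sketch below is a hypothetical helper, not part of any ChirpStack client library, and assumes the whitespace-separated key=value layout shown in the log line above.

```rust
/// Find `key=<number>` among the whitespace-separated tokens of a log line.
/// Tokens without '=' (such as the `Context:map[...]` part) are skipped.
fn field_value(line: &str, key: &str) -> Option<usize> {
    line.split_whitespace()
        .filter_map(|token| token.split_once('='))
        .find(|(k, _)| *k == key)
        .and_then(|(_, v)| v.parse().ok())
}

/// Number of bytes by which the item exceeded the limit. Returns None if
/// a field is missing or the item did not exceed the limit.
fn overshoot(line: &str) -> Option<usize> {
    let item_size = field_value(line, "item_size")?;
    let max_payload_size = field_value(line, "max_payload_size")?;
    item_size.checked_sub(max_payload_size).filter(|n| *n > 0)
}
```

For the line above (item_size=58, max_payload_size=53) this gives an overshoot of 5 bytes.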

I tried setting the get_downlink_data_delay config to 1 second, which I know is long but this was just a test.

[network]

  # Network identifier (NetID, 3 bytes) encoded as HEX (e.g. 010203).
  net_id="000000"

  # Enabled regions.
  #
  # Multiple regions can be enabled simultaneously. Each region must match
  # the 'name' parameter of the region configuration in '[[regions]]'.
  enabled_regions=[
    "us915_1",
  ]

  get_downlink_data_delay="1000ms"

Am I missing something? Is there any more info I can provide?

Doug

brocaar · November 3, 2023, 3:18pm

The downlink is being scheduled as Class-C downlink.

In the upcoming ChirpStack version, there will be some changes to the scheduler. Each device row in the table has a scheduler next run after timestamp. A Class-A uplink will set this next run after into the future, making sure that there will be some “space” between the Class-A uplink and the next Class-C downlink for that device.
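
The idea can be sketched as below. This is a simplified model under assumptions: the struct, the method names and the `gap` parameter are hypothetical, and the real scheduler works on a database column rather than an in-memory field.

```rust
use std::time::{Duration, Instant};

/// Simplified stand-in for a device row with a "scheduler run after" time.
struct Device {
    scheduler_run_after: Option<Instant>,
}

impl Device {
    /// A Class-A uplink pushes the next Class-C scheduling into the future,
    /// leaving room for the Class-A response to be sent first.
    fn on_class_a_uplink(&mut self, now: Instant, gap: Duration) {
        self.scheduler_run_after = Some(now + gap);
    }

    /// The Class-C scheduler only picks up the device once the
    /// run-after time has passed (or if none was ever set).
    fn class_c_schedulable(&self, now: Instant) -> bool {
        match self.scheduler_run_after {
            Some(run_after) => now >= run_after,
            None => true,
        }
    }
}
```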

github.com: chirpstack/chirpstack/blob/master/chirpstack/src/uplink/data.rs#L556

          async fn filter_rx_info_by_tenant(&mut self) -> Result<()> {
              trace!("Filtering rx_info by tenant_id");
          
              match filter_rx_info_by_tenant_id(
                  self.application.as_ref().unwrap().tenant_id,
                  &mut self.uplink_frame_set,
              ) {
                  Ok(_) => Ok(()),
                  Err(v) => {
                      // Restore the device-session in case of an error (no gateways available).
                      // This is because during the fcnt validation, we immediately store the
                      // device-session with incremented fcnt to avoid race conditions.
                      let d = self.device.as_ref().unwrap();
                      device::partial_update(
                          d.dev_eui,
                          &device::DeviceChangeset {
                              device_session: Some(d.device_session.clone()),
                              ..Default::default()
                          },

LeoA · February 1, 2024, 8:39am

Hello!

As I understand it, the “Device queue-item discarded because it exceeds the max. payload size” problem is still not solved, even in ChirpStack Server v4.6.

The “Class C” end device is connected to the ChirpStack Server and can send up to 222 bytes, but the downlink is limited by the RX2 data-rate parameter. The only solution is to manually set rx2_dr=3 or higher.

J_Paul · February 10, 2024, 6:32pm

@LeoA
I set rx2_dr=3 in /etc/chirpstack/region_eu868.toml and rebooted, but this did not solve the problem with level:“ERROR” code:“DOWNLINK_PAYLOAD_SIZE”. As already reported, the connection quality is very good and the problem only occurs with downlinks with “Class C”. Could you, or someone who has found a solution for this, report back? I’ve been looking for a solution for a long time.

brocaar · February 26, 2024, 11:37am

What did you reboot? After changing a config setting, you should:

1. Restart ChirpStack.
2. Wait until ChirpStack sends a mac-command to your end-device to make it aware of the change.
3. Wait for the end-device to confirm this change.

As an alternative, you can also reset the device to make sure it rejoins the network.

Only then will the changed config be used.
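
This ties back to the first code snippet in this thread, where the max. payload size is derived from the rx2_dr stored in the device session, not from the region config directly. A minimal model of that behaviour, with hypothetical type, field and method names (only rx2_dr comes from the thread):

```rust
/// The value configured in the region file (e.g. rx2_dr=3).
struct RegionConfig {
    rx2_dr: u8,
}

/// What the network server believes the device is currently using.
struct DeviceSession {
    rx2_dr: u8,
    /// Value requested via mac-command but not yet confirmed by the device.
    pending_rx2_dr: Option<u8>,
}

impl DeviceSession {
    /// After a restart with a changed config, the new value is only requested.
    fn request_update(&mut self, conf: &RegionConfig) {
        if self.rx2_dr != conf.rx2_dr {
            self.pending_rx2_dr = Some(conf.rx2_dr);
        }
    }

    /// Once the device confirms, the session (and so the limit) changes.
    fn on_device_ack(&mut self) {
        if let Some(dr) = self.pending_rx2_dr.take() {
            self.rx2_dr = dr;
        }
    }

    /// A rejoin starts a fresh session with the configured value.
    fn on_rejoin(&mut self, conf: &RegionConfig) {
        self.rx2_dr = conf.rx2_dr;
        self.pending_rx2_dr = None;
    }
}
```

Until on_device_ack or on_rejoin has happened, the session still carries the old rx2_dr, so the old payload limit still applies.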

J_Paul · February 26, 2024, 2:22pm

I was probably too impatient during my tests. After editing region_eu868.toml, restarting CS, restarting the device, and a little patience, the downlink works as expected.
@brocaar
Thank you very much for your help.