LMIC Packets not received by the packet forwarder

Hi,

I have set up a RAK Pilot Pro with loraserver (after first setting it up with TTN). I switched to loraserver because I want to build a small (~10 devices, single GW) private network for a proof of concept.

The setup was easy and my end device is able to join. I can see the activation from the loraserver UI. All the keys are matching and even got 1 uplink with sensor data.

However, after restarting the end device and a second successful join, loraserver no longer sees the device, although the end device and the packet forwarder appear to be sending the sensor data. The data doesn’t seem to reach the app server.

I am completely new to troubleshooting loraserver. Any ideas what could be wrong, or where/how I should start investigating?

I am using loraserver 3.0.1 with pretty much all default configuration. The RAKwireless people have done a good job of integrating loraserver, it seems.

EDIT: I went in the APP CONFIGURATION and updated it without changing anything and now the device is seen, but still no Device Data is showing. The page just continuously spins.

Thanks in advance for helping.
Amir.

After some troubleshooting using the guide here https://www.loraserver.io/guides/troubleshooting/gateway/

I am finding out that the gateway isn’t receiving the LoRa packets from the end device consistently. The end device is sending its sensor data every 3 minutes (a payload of 8 bytes). When I use tcpdump, I can see that the sensor data is received only sporadically. Pings and gateway stats are received systematically, and whenever packets are received they are always forwarded successfully.

Now the very odd thing is that if I switch both the end device and the gateway to use TTN, I don’t see this issue at all. So I am kind of at a loss as to what to do to troubleshoot further. Note also that my end device is around 10 meters away from the gateway.

The end device is one of these ESP32 + RFM95 from Heltec Automation. The sketch the Arduino is running is pretty much the example provided by the LMIC Arduino library.

As a side question, I’d like to know what development boards people are using typically for end nodes.

Thanks,
Amir.

Try modifying your Arduino sketch to print out the settings used each time it transmits, particularly frequency, spreading factor, and mode (LoRa or possibly FSK). It might be that your LoRaServer setup is suggesting or allowing the node to switch to a mode unsupported by your gateway configuration, especially in a nearby case where the signal would be very strong.

Are you seeing multiple joins from the node, or just one followed by a long period of uplinks, some of which get through and some of which do not?

The join isn’t so much a problem, although I am seeing that it is trying multiple frequencies. Eventually the join succeeds after some time. Sometimes even immediately.

I bumped the debug level in the LMIC library and here is what I see when the device sends the sensor data. That specific send didn’t make it to the gateway.

What I don’t understand is why this isn’t a problem at all when using TTN. That is when only the packet forwarder is running?

10:16:10.852 -> 10423710: engineUpdate, opmode=0x900
10:19:10.702 -> ============================
10:19:10.739 -> T = 82 *F
10:19:10.739 -> H = 45 %
10:19:10.773 -> P = 1014 hPa
10:19:10.773 -> M = 85 %
10:19:10.806 -> ============================
10:19:10.806 -> 21673938: engineUpdate, opmode=0x908
10:19:10.881 -> EV_TXSTART
10:19:10.881 -> 21675714: TXMODE, freq=902500000, len=17, SF=7, BW=125, CR=4/5, IH=0
10:19:10.951 -> Will send right away
10:19:11.804 -> 21742609: setupRx1 txrxFlags 0x20 --> 01
10:19:11.842 -> start single rx: now-rxtime: 4
10:19:11.876 -> 21742740: RXMODE_SINGLE, freq=923900000, SF=7, BW=500, CR=4/5, IH=0
10:19:11.946 -> rxtimeout: entry: 21743558 rxtime: 21742734 entry-rxtime: 824 now-entry: 4 rxtime-txend: 62524
10:19:12.833 -> 21805853: setupRx2 txrxFlags 0x1 --> 02
10:19:12.867 -> start single rx: now-rxtime: 4
10:19:12.902 -> 21805985: RXMODE_SINGLE, freq=923300000, SF=12, BW=500, CR=4/5, IH=0
10:19:12.969 -> rxtimeout: entry: 21808549 rxtime: 21805978 entry-rxtime: 2571 now-entry: 4 rxtime-txend: 125768
10:19:13.076 -> 21813190: processRx2DnData txrxFlags 0x2 --> 00
10:19:13.110 -> 21816373: processDnData txrxFlags 00 --> 20
10:19:13.181 -> EV_TXCOMPLETE (includes waiting for RX windows)
10:19:13.218 -> 21822493: engineUpdate, opmode=0x900

Assuming this is the US915 regional parameters, your node transmitted on uplink channel 1 (i.e. the 2nd channel, numbering from 0) and looked for an RX1 reply a second later on downlink channel 1. That would be correct if and only if your network is actually operating on the first sub-band. And since the receive window opens only a second later, we know this is ordinary traffic, not a join request, which uses a longer interval between uplink and downlink.
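The channel arithmetic behind that reading can be sketched in a few lines. This is only an illustration, not LMIC code: the function names are made up for the example, the frequencies follow the US915 plan (125 kHz uplinks from 902.3 MHz in 200 kHz steps, 500 kHz downlinks from 923.3 MHz in 600 kHz steps), and the 16 µs tick length is an assumption about the default arduino-lmic configuration.

```cpp
#include <cassert>
#include <cstdint>

// US915 125 kHz uplink channels 0-63: 902.3 MHz + 200 kHz per channel.
constexpr uint32_t kUplinkBaseHz = 902300000;
constexpr uint32_t kUplinkStepHz = 200000;
// US915 500 kHz downlink channels 0-7: 923.3 MHz + 600 kHz per channel.
constexpr uint32_t kDownlinkBaseHz = 923300000;
constexpr uint32_t kDownlinkStepHz = 600000;

// Map a 125 kHz uplink frequency to its channel index (0-63).
int uplinkChannel(uint32_t freqHz) {
    assert(freqHz >= kUplinkBaseHz);
    assert((freqHz - kUplinkBaseHz) % kUplinkStepHz == 0);
    const int ch = static_cast<int>((freqHz - kUplinkBaseHz) / kUplinkStepHz);
    assert(ch < 64);
    return ch;
}

// RX1 replies "fold" onto downlink channel (uplink channel mod 8).
uint32_t rx1FrequencyHz(int uplinkCh) {
    return kDownlinkBaseHz + static_cast<uint32_t>(uplinkCh % 8) * kDownlinkStepHz;
}

// Assumption: one LMIC tick is 16 microseconds (62500 ticks per second).
constexpr uint32_t kMicrosPerTick = 16;
uint32_t ticksToMicros(uint32_t ticks) { return ticks * kMicrosPerTick; }
```

With the values from the trace, 902500000 Hz maps to uplink channel 1 and an RX1 frequency of 923900000 Hz, and the logged "rxtime-txend: 62524" works out to roughly one second under the 16 µs assumption.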

In actuality TTN in the US operates on the 2nd sub-band, using channels 8-15 for uplink and the same downlink channels 0-7 as replies must fold into the same range of downlink channels.
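To make the sub-band numbering concrete, here is a small sketch. The helper name is hypothetical, and the sub-band index is zero-based, so the "2nd sub-band" above is index 1. Each sub-band also carries one 500 kHz uplink channel (64 plus the sub-band index), which is why channel 65 goes with channels 8-15.

```cpp
#include <cassert>
#include <vector>

// Return the uplink channel indices of one US915 sub-band.
// subBand is zero-based: 0 is the "1st" sub-band, 1 is the "2nd".
std::vector<int> subBandChannels(int subBand) {
    assert(subBand >= 0 && subBand < 8);
    std::vector<int> channels;
    for (int i = 0; i < 8; ++i) {
        channels.push_back(subBand * 8 + i);  // eight 125 kHz channels
    }
    channels.push_back(64 + subBand);         // the matching 500 kHz channel
    return channels;
}
```

Calling subBandChannels(1) gives channels 8 through 15 plus channel 65.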

Given TTN worked for you, it’s quite probable that your gateway is actually configured for the 2nd subband, and your node has static configuration which makes it start out there, but you have LoRaServer misconfigured to think it is on the first sub-band. When you do a join or get an ADR exchange, you are sending down a channel map that moves the node from the actually supported 2nd subband to the unsupported first subband, at which point you cease getting any messages from it.

Either configure the correct channel list in LoRaServer or don’t configure one at all. The channel list is not involved in figuring out how to respond to nodes, those rules are set by region. Instead it is configuration information to optionally be sent down to the nodes, and if it is going to be sent down, it needs to be a configuration matching what the gateway can actually hear.

Another thing you should check is that you have the public vs. private LoRaWan prefix configured the same way in both your node and your gateway. TTN obviously uses the public setting, while you have a choice for your network. This mechanism is not particularly strong - packets with the “wrong” preamble can be rejected, but they can also fairly regularly “leak” through and be correctly received and decoded anyway, so a mismatch could be a cause of a network that works only unreliably. (Such leakage isn’t really a problem, as packets not matching a known device are discarded by a server; at most they distract one of the gateway chip’s 8 receive engines from paying attention to desired traffic and consume a little backhaul bandwidth)


Hi Chris,

Thanks a lot for the explanation. It does make sense that there is some mismatch between the LoraServer settings and how TTN operates, given that I am attempting to configure LoraServer myself and there is a little bit of a learning curve :slight_smile:

I basically set up the RAK gateway based on the rPi3 image RAKwireless is providing, which certainly was initially set up to work with TTN out of the box. They do offer an option to switch to LoraServer, and that is the option I used for my initial setup.

Assuming this is the US915 regional parameters

Yes. Both the end node and the gateway are configured for the US.

In actuality TTN in the US operates on the 2nd sub-band, using channels 8-15 for uplink and the same downlink channels 0-7 as replies must fold into the same range of downlink channels.

I did specify channels 0…7 in my Gateway Profile settings and I am wondering whether that, indeed, is causing an issue. I read somewhere else on the forum that Gateway profiles weren’t strictly needed. But I got into some errors when I didn’t specify one.
So I will try what you suggested and not specify the channel list.

Another thing you should check is that you have the public vs. private LoRaWan prefix configured the same way in both your node and your gateway.

I did actually attempt that (at some point later last night), but again ran into some issues after restarting the loraServer.

There is probably another option, since I really don’t care much about TTN at the moment, is to use the LoraServer OS image installation. I am not sure how well it is supported for the Pilot Pro RAK7243 (which is basically an upgrade of their RAK831).

Anyway, thanks again for the insights. I’ll try the suggestions and report back :wink:

Amir.

If you have not changed the gateway’s own configuration since TTN, this would indeed be incorrect: you would have to list channels 8-15 or no channels. Also beware that LMIC may lock up if it finds no 500 kHz channel configured, unless you modify the code to avoid trying to use one.

There is probably another option, since I really don’t care much about TTN at the moment, is to use the LoraServer OS image installation. I am not sure how well it is supported for the Pilot Pro RAK7243 (which is basically an upgrade of their RAK831).

One of RAK’s newer concentrators has a design flaw in the SPI implementation that requires that you drastically lower the SPI clock speed compared to the several MHz that was workable with a cleanly wired RAK831 or RAK833 or most other brands. I’m not sure if that would apply to the model you mention or not.

Actually a couple more things I can try to further troubleshoot:

  1. Run the whole thing back on TTN and see what the end node tx/rx frequencies look like compared to using Loraserver
  2. I’ll try to capture the logs for when the ADR exchange actually does go through.

There is really not much in terms of static settings on the node LMIC side. I am pretty much using the example here https://github.com/matthijskooijman/arduino-lmic/blob/master/examples/ttn-otaa/ttn-otaa.ino
with the keys updated and a sensor payload instead of the “hello world” string.
The sketch is running with
#define CFG_sx1276_radio 1
#define CFG_us915 1
I’ll take a closer look at the LMIC config to see if there is anything that would cause it to work only with TTN.
The sketch is called ttn-otaa.ino after all.

As a side question. I am curious to know what other people are using as far as end node development boards.

You are right, that could be an issue as well. I don’t think the gateway has any 500 kHz channels configured, and I haven’t specified any. I think only 125 and 250 are defined.
So if LMIC is trying 500 kHz, that certainly would be a problem. There is probably an API to force LMIC to use certain channel configurations. I’d have to look.

I didn’t see this before, but looks like RAK actually provides Lora-Server-OS images:
https://downloads.rakwireless.com/en/LoRa/LoRa-Server-OS/

Changing the Gateway Profile to add channels 0…15 doesn’t fix the issue.
Also that channels field is required as it turns out.

I also tried removing the gw profile completely and not specifying one in the gateway settings. That didn’t solve the issue either. I double checked the channel “map” in the config and all looks fine to me.

Here is the output when I restarted the packet-forwarder

Aug 02 14:08:17 rak-gateway ttn-gateway[4655]: INFO: radio 0 enabled (type SX1257), center frequency 904300000, RSSI offset -166.000000, tx enabled 1, tx_notch_freq 0
Aug 02 14:08:17 rak-gateway ttn-gateway[4655]: INFO: radio 1 enabled (type SX1257), center frequency 905000000, RSSI offset -166.000000, tx enabled 0, tx_notch_freq 0
Aug 02 14:08:17 rak-gateway ttn-gateway[4655]: INFO: Lora multi-SF channel 0> radio 0, IF -400000 Hz, 125 kHz bw, SF 7 to 12
Aug 02 14:08:17 rak-gateway ttn-gateway[4655]: INFO: Lora multi-SF channel 1> radio 0, IF -200000 Hz, 125 kHz bw, SF 7 to 12
Aug 02 14:08:17 rak-gateway ttn-gateway[4655]: INFO: Lora multi-SF channel 2> radio 0, IF 0 Hz, 125 kHz bw, SF 7 to 12
Aug 02 14:08:17 rak-gateway ttn-gateway[4655]: INFO: Lora multi-SF channel 3> radio 0, IF 200000 Hz, 125 kHz bw, SF 7 to 12
Aug 02 14:08:17 rak-gateway ttn-gateway[4655]: INFO: Lora multi-SF channel 4> radio 1, IF -300000 Hz, 125 kHz bw, SF 7 to 12
Aug 02 14:08:17 rak-gateway ttn-gateway[4655]: INFO: Lora multi-SF channel 5> radio 1, IF -100000 Hz, 125 kHz bw, SF 7 to 12
Aug 02 14:08:17 rak-gateway ttn-gateway[4655]: INFO: Lora multi-SF channel 6> radio 1, IF 100000 Hz, 125 kHz bw, SF 7 to 12
Aug 02 14:08:17 rak-gateway ttn-gateway[4655]: INFO: Lora multi-SF channel 7> radio 1, IF 300000 Hz, 125 kHz bw, SF 7 to 12
Aug 02 14:08:17 rak-gateway ttn-gateway[4655]: INFO: Lora std channel> radio 0, IF 300000 Hz, 500000 Hz bw, SF 8
Aug 02 14:08:17 rak-gateway ttn-gateway[4655]: INFO: FSK channel 8 disabled
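As a cross-check on the log above, each concentrator channel sits at its radio's center frequency plus the IF offset, and the result can be mapped back to a US915 uplink channel index. The sketch below is illustrative only: the function names are invented for the example, and the 500 kHz uplink plan (903.0 MHz in 1.6 MHz steps, channels 64-71) is taken from the US915 regional parameters rather than from this thread.

```cpp
#include <cassert>
#include <cstdint>

// Absolute frequency of a concentrator channel: radio center + IF offset.
constexpr int64_t channelFreqHz(int64_t radioCenterHz, int64_t ifOffsetHz) {
    return radioCenterHz + ifOffsetHz;
}

// US915 125 kHz uplink channel index (0-63), or -1 if the frequency is off-plan.
int us915UplinkIndex(int64_t freqHz) {
    const int64_t base = 902300000, step = 200000;
    if (freqHz < base || (freqHz - base) % step != 0) return -1;
    const int64_t idx = (freqHz - base) / step;
    return idx < 64 ? static_cast<int>(idx) : -1;
}

// US915 500 kHz uplink channel index (64-71), or -1 if the frequency is off-plan.
int us915WideUplinkIndex(int64_t freqHz) {
    const int64_t base = 903000000, step = 1600000;
    if (freqHz < base || (freqHz - base) % step != 0) return -1;
    const int64_t idx = (freqHz - base) / step;
    return idx < 8 ? static_cast<int>(64 + idx) : -1;
}

// Values copied from the packet forwarder log above.
void checkGatewayLog() {
    const int64_t radio0 = 904300000, radio1 = 905000000;
    assert(us915UplinkIndex(channelFreqHz(radio0, -400000)) == 8);   // multi-SF channel 0
    assert(us915UplinkIndex(channelFreqHz(radio0, 200000)) == 11);   // multi-SF channel 3
    assert(us915UplinkIndex(channelFreqHz(radio1, -300000)) == 12);  // multi-SF channel 4
    assert(us915UplinkIndex(channelFreqHz(radio1, 300000)) == 15);   // multi-SF channel 7
    assert(us915WideUplinkIndex(channelFreqHz(radio0, 300000)) == 65);  // std channel
}
```

By this arithmetic the gateway is listening on uplink channels 8-15 plus the 500 kHz channel 65, i.e. the second sub-band.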

Make sure you have configured the enabled_uplink_channels in your loraserver.toml configuration file correctly. Changing the Gateway Profile does not change the channels that LoRa Server communicates to your device to use.

See also: https://forum.loraserver.io/t/who-is-using-the-gateway-profile-how-are-you-using-it/5091. The Gateway Profile will be deprecated as it doesn’t do what people think it does.

I suggest you ignore the Gateway Profile completely and focus on the enabled_uplink_channels setting :slight_smile:

I think I read that post maybe 5 times :wink:

When I set up the gateway, I used the RAKwireless-provided console GUI (i.e. sudo gateway-config). It allows me to switch between TTN and Loraserver, and it also updates the loraserver.toml based on the frequency I choose from the menu.

So currently it is set to 
  # LoRaWAN regional band configuration.
  #
  # Note that you might want to consult the LoRaWAN Regional Parameters
  # specification for valid values that apply to your region.
  # See: https://www.lora-alliance.org/lorawan-for-developers
  [network_server.band]
  name="US_902_928"
  # LoRaWAN network related settings.
  [network_server.network_settings]
  enabled_uplink_channels=[8, 9, 10, 11, 12, 13, 14, 15]

After updating loraserver.toml according to https://www.loraserver.io/loraserver/install/config/ I am able to see some uplinks going through, usually after joining. But only 1 or 2 and then nothing.

[{"tmst":611286028,"time":"2019-08-03T00:00:26.150757Z","tmms":1248825645151,"chan":4,"rfch":1,"freq":904.700000,"stat":1,"modu":"LORA","datr":"SF10BW125","codr":"4/5","lsnr":10.0,"rssi":-59,"size":18,"data":"QNn/oQGAEQABOTPii/H9dhtG"}]}
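The "data" field in a forwarded packet like the one above is the base64-encoded LoRaWAN frame, and its header is not encrypted, so it can be inspected directly while troubleshooting. The following is a minimal sketch assuming a LoRaWAN 1.0.x data frame; the struct and function names are hypothetical, and it reads the header fields only (no MIC check or payload decryption).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decode standard base64; stops at '=' padding.
std::vector<uint8_t> base64Decode(const std::string& in) {
    static const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::vector<uint8_t> out;
    uint32_t buffer = 0;
    int bits = 0;
    for (char c : in) {
        if (c == '=') break;
        const std::size_t pos = alphabet.find(c);
        assert(pos != std::string::npos);  // reject characters outside the alphabet
        buffer = (buffer << 6) | static_cast<uint32_t>(pos);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<uint8_t>((buffer >> bits) & 0xFF));
        }
    }
    return out;
}

struct UplinkHeader {
    uint8_t mtype;     // MHDR bits 7..5 (2 = unconfirmed data up)
    uint32_t devAddr;  // sent little-endian on the wire
    bool adr;          // FCtrl bit 7
    uint8_t fOptsLen;  // FCtrl bits 3..0
    uint16_t fCnt;     // sent little-endian on the wire
};

// Parse MHDR and FHDR of a data frame.
UplinkHeader parseUplinkHeader(const std::vector<uint8_t>& phy) {
    assert(phy.size() >= 12);  // MHDR + 7-byte FHDR + 4-byte MIC at minimum
    UplinkHeader h;
    h.mtype = static_cast<uint8_t>(phy[0] >> 5);
    h.devAddr = static_cast<uint32_t>(phy[1]) | (static_cast<uint32_t>(phy[2]) << 8) |
                (static_cast<uint32_t>(phy[3]) << 16) | (static_cast<uint32_t>(phy[4]) << 24);
    h.adr = (phy[5] & 0x80) != 0;
    h.fOptsLen = static_cast<uint8_t>(phy[5] & 0x0F);
    h.fCnt = static_cast<uint16_t>(phy[6] | (phy[7] << 8));
    return h;
}
```

Run against the "data" string above, this yields an 18-byte unconfirmed data uplink from DevAddr 01a1ffd9 with frame counter 17 and the ADR bit set. Comparing the DevAddr and frame counter with what the server shows for the device is one way to tell whether a captured frame belongs to the current session.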

And it doesn’t seem to matter whether I set the enabled_uplink_channels to all channels, the first 8, or the last 8 of the US915 frequency plan. The RAK Pilot Pro has only 8 channels, and when it is set up to work with TTN, it uses the upper channels (8…15).

In my Gateway Profile I just set the list to 0…15.

Here is another interesting pattern I am seeing.

When I restart the loraserver and then reset the end device to perform a join, the first uplink is received (it happens on the next scheduled TX, 60 seconds after the join). After that there is complete silence.

20:12:52.409 -> 10018187: engineUpdate, opmode=0x908
20:12:52.479 -> EV_TXSTART
20:12:52.479 -> 10019965: TXMODE, freq=905300000, len=18, SF=10, BW=125, CR=4/5, IH=0
20:12:52.547 -> Packet queued
20:12:53.677 -> 10103149: setupRx1 txrxFlags 0x20 --> 01
20:12:53.712 -> start single rx: now-rxtime: 3
20:12:53.750 -> 10103280: RXMODE_SINGLE, freq=927500000, SF=10, BW=500, CR=4/5, IH=0
20:12:53.819 -> 10108374: process options (olen=0x5)
20:12:53.852 -> 10108382: decodeFrame txrxFlags 0x1 --> 21
20:12:53.920 -> 10109502: Received downlink, window=RX1, port=-1, ack=0, txrxFlags=0x21
20:12:53.993 -> EV_TXCOMPLETE (includes waiting for RX windows)
20:12:54.029 -> 10117445: engineUpdate, opmode=0x800

The trace from the end device above shows it actually got a response in the RX1 window.

All subsequent uplinks after that are failing. The interesting part is when the device is using the TTN gateway, it never got a response back. Yet the uplinks were all successful.

I think I solved this issue (partly thanks to a better understanding of the needed configuration).
On the gateway side, I had to set the uplink channels to 8…15:

  [network_server.band]
  name="US_902_928"
  # LoRaWAN network related settings.
  [network_server.network_settings]
  enabled_uplink_channels=[8,9,10,11,12,13,14,15]

But that’s not sufficient. On the end device, which uses the LMIC Arduino library, I also had to tell it which sub-band to use, based on the region frequency. Like below:

    // LMIC init.
    os_init();
    // Reset the MAC state. Session and pending data transfers will be discarded.
    LMIC_reset();
    // Disable link-check mode. Note that this call does not disable ADR;
    // ADR is controlled separately via LMIC_setAdrMode().
    LMIC_setLinkCheckMode(0);
    // Set the data rate to Spreading Factor 7. This is the fastest supported rate for 125 kHz channels, and it
    // minimizes air time and battery power. Set the transmission power to 14 dBm (25 mW).
    LMIC_setDrTxpow(DR_SF7,14);

    // For US915, select the second sub-band (zero-based index 1: channels 8-15).
    LMIC_selectSubBand(1);

With that, not only was the join time blazing fast, but I now have uplink sensor data being successfully transmitted every 60 seconds (for testing purposes). The actual interval will be more like 30-60 minutes.

The one thing I am still not completely sure about is enabled_uplink_channels. I have set it to match the sub-band on the node, because I assume the channels are going to be sent down to the node after it joins, overwriting the sub-band selected before the join.
I don’t know if that is the case or not, or whether setting enabled_uplink_channels to an empty slice would actually still work.

But I think having both ends match exactly will yield better performance overall.
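A quick way to sanity-check that match is to list any server-enabled channels that fall outside the sub-band the gateway listens on. This is just a sketch with a made-up helper name, and it assumes the gateway covers exactly one zero-based sub-band (its eight 125 kHz channels plus the matching 500 kHz channel):

```cpp
#include <cassert>
#include <vector>

// Return the enabled channels that fall outside the given zero-based sub-band.
// A non-empty result means the server could steer the node onto channels the
// gateway (assumed to listen on exactly that sub-band) cannot hear.
std::vector<int> channelsOutsideSubBand(const std::vector<int>& enabled, int subBand) {
    assert(subBand >= 0 && subBand < 8);
    std::vector<int> outside;
    for (int ch : enabled) {
        const bool in125 = ch >= subBand * 8 && ch < subBand * 8 + 8;
        const bool in500 = ch == 64 + subBand;
        if (!in125 && !in500) outside.push_back(ch);
    }
    return outside;
}
```

For sub-band index 1, a list of channels 8 through 15 returns an empty result, while a list that includes channels 0 through 7 returns those channels as unreachable.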

Also, thanks to everyone for providing some very valuable clues.