Join-Request with no Join-Accept

Created an account to say you saved my life with this tip.
I was struggling with old hardware (a Multitech gateway with a 1.0 LoRa card) and was sure the problem was with downlinks, as that specific LoRa card is really old. It turns out I had set the gateway to private network (it seemed logical, as I’m the only one who’ll be using it), and reverting this setting fixed the issue. The device is now reporting, thanks to you!

I experienced the same issue, where some devices would send a join request but no join accept was sent, though I had a different cause. The odd thing in my case was that I had seen other devices join successfully, but once one device started having join issues, a bunch of devices were having join issues.

After a lot of head scratching and poring over logs, I believe I discovered the core issue. I was running a v4 chirpstack server and gateway bridge in docker containers on a single ec2 instance in AWS. I had accidentally under-provisioned the ec2 node (it had less memory than the ec2 instance that was running my v3 application server, v3 network server and v3 gateway bridge in docker containers). I believe the gateway bridge docker process was getting starved and stopped processing the messages in a timely manner.

The first clue came when I was viewing the logs: I saw a successful join request/join accept interaction, but the join accept processing on the gateway bridge was delayed by 4 seconds. In the nominal case, the gateway bridge was processing the join accept within 0.0005 seconds, so this was dramatically different. I surmised the gateway bridge was falling farther and farther behind, to the point where it was not processing the join accept in time to send a downlink in response to the uplink. I was seeing a visible slowdown on the chirpstack webpage, so I knew the chirpstack server was working hard. Also, the logs were almost entirely chirpstack entries, with almost no gateway bridge entries.

I’m going to start another thread to discuss gateway bridge requirements and recommendations.

Hopefully this information is helpful to someone else.

Ok, I spoke too soon. I’m still experiencing the join-request with no join-accept issue, even with a much larger ec2 instance. The gateway-bridge is NOT getting starved because the system utilization is very low.

I see the chirpstack sending the mqtt join_accept message to the mqtt broker, but I never see the gateway-bridge receiving that particular downlink_id.
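For anyone chasing the same symptom, here is a rough sketch of the check I was doing by eye: match downlink_ids between the server side and the gateway-bridge side, then flag the ones that never arrived or arrived late. It assumes you have already pulled the IDs and ISO timestamps out of your own logs. The function name, the dict layout and the 1 second threshold are all made up for illustration; none of this is part of chirpstack.

```python
from datetime import datetime


def correlate_downlinks(published, handled, max_delay_s=1.0):
    """Compare downlinks published by the server with those seen by the bridge.

    published / handled: dict of downlink_id -> ISO-8601 timestamp string
    (hypothetical layout; extract these from your own logs).
    Returns (missing_ids, slow) where slow maps downlink_id -> delay in seconds.
    """
    # IDs the server published that the bridge never logged
    missing = sorted(d for d in published if d not in handled)

    slow = {}
    for downlink_id, sent_at in published.items():
        if downlink_id not in handled:
            continue
        start = datetime.fromisoformat(sent_at)
        end = datetime.fromisoformat(handled[downlink_id])
        delay = (end - start).total_seconds()
        if delay > max_delay_s:  # threshold is an assumption, tune as needed
            slow[downlink_id] = delay
    return missing, slow


# Made-up sample data: one fast downlink, one delayed by 4 s, one never received
published = {
    "1001": "2023-05-01T12:00:00.000000",
    "1002": "2023-05-01T12:00:10.000000",
    "1003": "2023-05-01T12:00:20.000000",
}
handled = {
    "1001": "2023-05-01T12:00:00.000500",
    "1002": "2023-05-01T12:00:14.000000",
}
missing, slow = correlate_downlinks(published, handled)
```

With the sample data, "1003" shows up as missing (the case I am stuck on now) and "1002" as slow (the 4 second delay from my earlier post).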

I see other downlink messages being sent to the gateways in question and they are processing them correctly. It isn’t that all downlinks are failing to get through, only the join-accepts. Any clues? I’m totally at a loss and my full system is no longer working correctly.

In my region.toml config file, I tried increasing the mqtt qos to 1 so that the downlink commands are retained until delivered. With AWS, I can view the retained messages, but I don’t seem to have any downlink messages that are retained.

Note the logging entry does not specify if it was published with qos=1, so I’m not positive this was done.

I went ahead and changed my gateway-bridge mqtt config to add qos=1, which would cover the other direction. I see qos=1 for the event topics, and since I don’t see them as retained on my broker, I assume they have been received correctly.

The core issue was a restriction in the AWS IoT Core MQTT broker: it limits the number of subscriptions per connection to 50. The gateway bridge creates one subscription per gateway to receive the downlinks from the chirpstack server. Since I have over 100 gateways served by my system, this limit was interfering with downlink messages, depending on which gateway was receiving the downlink. Unfortunately, the way it is implemented, the mqtt client reports no errors, so nothing showed up in the logs and it was impossible to see that this limit was being hit.
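To illustrate why only some gateways were affected, here is a toy model. It assumes the broker quietly keeps the first N subscriptions on a connection and ignores the rest; that matches what I observed but is not a description of how AWS implements it. The gateway IDs and the count of 120 are invented.

```python
def unreachable_gateways(gateway_ids, max_subs_per_connection):
    """Return gateways whose downlink subscription would be silently dropped.

    Toy model (assumption): one connection, one subscription per gateway,
    and the broker keeps only the first max_subs_per_connection of them.
    """
    accepted = set(gateway_ids[:max_subs_per_connection])
    return [g for g in gateway_ids if g not in accepted]


# 120 made-up gateway IDs in the order they registered with the bridge
gateways = [f"gw{n:03d}" for n in range(120)]
dropped = unreachable_gateways(gateways, 50)
```

In this model the first 50 gateways to register keep working and the other 70 never see a downlink, so whether a join succeeds depends entirely on which gateway is picked for the join accept.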

I actually found the issue by switching to the emqx mqtt broker, which gives much better visibility into the subscriptions. But the problem with the emqx mqtt broker was that its limit was even stricter, only 10 subscriptions per connection. I could easily see via the emqx website that there were only 10 active subscriptions. It was then that I dug into the documented limits.

I switched to hivemq mqtt broker, which allows unlimited subscriptions per connection.

I’m wondering if it would be possible to add an optional configuration parameter that specifies the limit of subscriptions per connection, and then have the gateway bridge create N connections to spread the subscriptions across them.
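A minimal sketch of the idea, just the planning part: split the gateways into chunks no larger than the limit, one chunk per mqtt connection. This is not how the gateway bridge works today, and the function and parameter names are hypothetical.

```python
def plan_connections(gateway_ids, max_subs_per_connection):
    """Split gateways into groups, one group per MQTT connection.

    Each group holds at most max_subs_per_connection gateways, so no
    single connection exceeds the broker's subscription limit.
    """
    if max_subs_per_connection < 1:
        raise ValueError("max_subs_per_connection must be at least 1")
    return [
        gateway_ids[i:i + max_subs_per_connection]
        for i in range(0, len(gateway_ids), max_subs_per_connection)
    ]


# 120 made-up gateways against a limit of 50 subscriptions per connection
plan = plan_connections([f"gw{n:03d}" for n in range(120)], 50)
sizes = [len(group) for group in plan]
```

For 120 gateways and a limit of 50 this gives three connections holding 50, 50 and 20 subscriptions. A real implementation would also have to handle gateways coming and going, which this sketch ignores.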


Is your gateway bridge at the server?
Then why do you have more than 10 subscriptions per client?

Your 100 gateways share the same gateway bridge, right?

Each gateway has its own subscription for downlink messages. The subscription is in the form of: “gateway/{{ .GatewayID }}/command/#” where the GatewayID is the specific gateway_id and not a wildcard.

I had initially thought it would use a wildcard, so then the number of subscriptions would not be very high. But you can segment your gateway bridges to service a subset of gateways (like per region), so you wouldn’t want the mqtt traffic to hit the gateway bridges that didn’t need it. When a gateway first registers with the gateway bridge, it generates the subscription to the mqtt broker.
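To make the difference concrete, here is a small sketch that builds the per-gateway filter from the template above and checks it against topics using standard MQTT wildcard rules ("+" for one level, "#" for everything below). The gateway IDs are made up, the matcher is my own simplified version (it ignores the special handling of topics starting with "$"), and it is not code from the gateway bridge.

```python
def command_filter(gateway_id):
    """Per-gateway subscription filter, following the template quoted above."""
    return f"gateway/{gateway_id}/command/#"


def topic_matches(topic_filter, topic):
    """Simplified MQTT topic filter matching ('+' and '#' wildcards only)."""
    f_parts = topic_filter.split("/")
    t_parts = topic.split("/")
    for i, part in enumerate(f_parts):
        if part == "#":
            return True  # multi-level wildcard matches the remaining levels
        if i >= len(t_parts):
            return False  # filter is longer than the topic
        if part != "+" and part != t_parts[i]:
            return False  # literal level does not match
    return len(f_parts) == len(t_parts)


# Made-up gateway IDs
own = command_filter("0102030405060708")
wildcard = "gateway/+/command/#"
other_topic = "gateway/aabbccddeeff0011/command/down"

own_sees_other = topic_matches(own, other_topic)
wildcard_sees_other = topic_matches(wildcard, other_topic)
```

The per-gateway filter does not match another gateway's command topic, while a single wildcard filter would match every gateway's commands. That is the trade-off: one subscription instead of one per gateway, at the cost of every bridge receiving traffic for gateways it does not serve.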


I see.
You are using their Serverless (free plan) in their cloud platform, right?
If yes, I would say it is a limitation of the Serverless plan, not a limitation of EMQX itself.

Thanks a lot for your detailed explanation.
Limitation for serverless plan.

This is Serverless plan.

No limit for EMQX