High load on gateway-bridge

sheenhx · August 17, 2020, 3:03pm

Hey Orne, we are using the latest gw-bridge in our blockchain LPWAN server. We now have about 6000 gateways online, and the bridge goes down quite often.

We suspect there might be some sort of race condition or deadlock in the code that is easily exposed under high load.

Any hint?

bconway · August 17, 2020, 8:27pm

Interesting issue. How many instances/containers are you running?

sheenhx · August 17, 2020, 8:41pm

Does it have something to do with the number of instances? It runs as NS + AS + gw-bridge + blockchain service containers on Kubernetes.

You can try our app, MXC datadash, on the Apple App Store or Google Play. We have 6k gateways online, each with two SX1302 concentrators (essentially two packet forwarders per gateway, so 12k connections).

sheenhx · August 18, 2020, 11:33am

We aren’t able to reproduce the gateway-bridge problem on our test server: running lorhammer with 30000 gateways, plus two real gateways that are regularly updated, the gateway-bridge is fine.

But on our blockchain supernode, 6000 gateways crashed it, in the same environment.

brocaar · August 19, 2020, 1:59pm

Could you define:

the bridge is down quite often
in our blockchain supernode 6000 gateway crashed it

Does "down" mean it becomes less responsive, or does it actually crash? And if so, what is the error?

sheenhx · August 19, 2020, 2:32pm

Hi, it means that the SX1302 packet forwarder reports that no PULL_ACK is received, and none of the LoRa packets can be delivered to the server.

By "crashed it" I mean that this only happens on this supernode: our other supernodes, which have around 100 gateways, are totally fine.

Weirdly, there are no errors or anything else in our logs, which is why we suspect a race condition or deadlock.
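Since the only symptom is a missing PULL_ACK, one way to check from outside whether the bridge is still answering is to send it a PULL_DATA keepalive yourself, the way the packet forwarder does. The sketch below follows the Semtech UDP packet forwarder protocol (PULL_DATA = version byte, 2-byte random token, identifier 0x02, 8-byte gateway EUI; PULL_ACK = version byte, the same token, identifier 0x04). The helper names, the default port 1700 and the placeholder gateway EUI are assumptions for illustration only; use a test EUI rather than one of your real gateways.

```python
import os
import socket
import struct
from typing import Optional

PROTOCOL_VERSION = 2  # Semtech UDP protocol version 2
PULL_DATA = 0x02      # gateway -> server keepalive
PULL_ACK = 0x04       # server -> gateway acknowledgement


def build_pull_data(gateway_eui: bytes, token: Optional[int] = None) -> bytes:
    """Build a 12-byte PULL_DATA packet: version, token, identifier, gateway EUI."""
    if len(gateway_eui) != 8:
        raise ValueError("gateway EUI must be exactly 8 bytes")
    if token is None:
        # The token is random; the server must echo it back in the PULL_ACK.
        token = int.from_bytes(os.urandom(2), "big")
    return struct.pack(">BHB", PROTOCOL_VERSION, token, PULL_DATA) + gateway_eui


def is_matching_pull_ack(pull_data: bytes, response: bytes) -> bool:
    """True if response is a 4-byte PULL_ACK echoing the PULL_DATA's version and token."""
    return (
        len(response) == 4
        and response[0] == pull_data[0]
        and response[1:3] == pull_data[1:3]
        and response[3] == PULL_ACK
    )


def probe_bridge(host: str, port: int = 1700,
                 gateway_eui: bytes = bytes(8),  # placeholder test EUI (assumption)
                 timeout: float = 2.0) -> bool:
    """Send one PULL_DATA and report whether a matching PULL_ACK arrives in time."""
    packet = build_pull_data(gateway_eui)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (host, port))
        try:
            data, _ = sock.recvfrom(1024)
        except socket.timeout:
            return False  # no answer: the bridge is not acknowledging keepalives
        return is_matching_pull_ack(packet, data)
```

Running `probe_bridge("<bridge-host>")` periodically while the problem occurs would show whether the bridge process stops answering all UDP traffic or only traffic from certain gateways.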

sheenhx · August 27, 2020, 12:57pm

Not a single restart so far since we switched to UTC and deployed the new version of gw-bridge. The reason is still unknown, though.