Redis: connection pool timeout (code: 2)

Hi friends,
I have launched multiple network servers and multiple application servers on a single machine. After some weeks, I see the error below when I try to view gateway data:

I am having the same problem. Any luck figuring out a solution?

A timeout occurs when the client cannot get a connection to the (Redis) server within a reasonable amount of time. Either Redis is unable to respond to the request, or there is a network issue causing the connection to fail.
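For what it’s worth, that exact message (“redis: connection pool timeout”) is what the go-redis client returns when every connection in its client-side pool is already checked out and none becomes free within the pool timeout, so it does not necessarily indicate a network problem; as far as I can tell the ChirpStack servers use go-redis under the hood. Here is a minimal sketch that reproduces the message; the pool size, key name and timeouts are made up for illustration:

package main

import (
    "context"
    "fmt"
    "time"

    "github.com/go-redis/redis/v8"
)

func main() {
    ctx := context.Background()

    // Deliberately tiny pool so exhaustion is easy to trigger.
    rdb := redis.NewClient(&redis.Options{
        Addr:        "localhost:6379",
        PoolSize:    2,               // comparable to pool_size in the ChirpStack config
        PoolTimeout: 2 * time.Second, // how long a caller waits for a free connection
    })

    // Tie up both pooled connections with blocking reads that never return.
    for i := 0; i < 2; i++ {
        go func() {
            rdb.BLPop(ctx, 0, "demo:empty:list") // timeout 0 = block forever
        }()
    }
    time.Sleep(500 * time.Millisecond)

    // With every connection busy, the next command waits PoolTimeout and then fails.
    if err := rdb.Ping(ctx).Err(); err != nil {
        fmt.Println(err) // prints: redis: connection pool timeout
    }
}

In other words, long-lived blocking commands can exhaust the pool on the client side while Redis itself is perfectly healthy.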

Have you ever found a solution?

Hello. I see the same problem in the Application server as well as in the Network server, and they seem to be influencing each other:
redis-cli info | grep -i "client" gives me

# Clients
connected_clients:5
maxclients:10000
client_recent_max_input_buffer:20480
client_recent_max_output_buffer:0
blocked_clients:1
tracking_clients:0
clients_in_timeout_table:0
mem_clients_slaves:0
mem_clients_normal:68764
evicted_clients:0

when everything is working fine and idle. But when I open many Application Server tabs, e.g. /frames or /data, the numbers of connected_clients and blocked_clients rise (by pretty much one per tab) until they hit a threshold at

connected_clients:40
blocked_clients:38

At exactly the same time, the Application server starts showing “redis: connection pool timeout (code: 2)” as a black notification in the lower-left corner of the browser window, and the Network server starts logging a lot of

Okt 21 08:55:59 lora chirpstack-network-server[29219]: time="2022-10-21T08:55:59.585312729+02:00" level=error msg="gateway/mqtt: acquire lock error" error="acquire lock error: redis: connection pool timeout" key="lora:ns:stats:lock:b827ebfffedbad45:23427c74-8aa9-4047-b5bb-8bbba07ba595" stats_id=23427c74-8aa9-4047-b5bb-8bbba07ba595

in sudo journalctl -f -x -u chirpstack-network-server.service | grep -i "error".
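For reference: blocked_clients in INFO counts clients that are sitting in a blocking call (BLPOP/BRPOP/XREAD and friends), and redis-cli client list shows per connection which command each client is currently parked in (cmd=) and for how long (age=/idle=). Below is a small sketch, again assuming go-redis, that groups that output by command to see what the ~40 connections are actually doing; it is purely illustrative, the address and parsing are nothing ChirpStack-specific:

package main

import (
    "context"
    "fmt"
    "strings"

    "github.com/go-redis/redis/v8"
)

func main() {
    ctx := context.Background()
    rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

    // CLIENT LIST returns one line per connection with fields such as
    // addr=, name=, age=, idle= and cmd= (the command the client is in).
    list, err := rdb.ClientList(ctx).Result()
    if err != nil {
        panic(err)
    }

    // Count connections per command.
    counts := map[string]int{}
    for _, line := range strings.Split(strings.TrimSpace(list), "\n") {
        for _, field := range strings.Fields(line) {
            if strings.HasPrefix(field, "cmd=") {
                counts[strings.TrimPrefix(field, "cmd=")]++
            }
        }
    }
    for cmd, n := range counts {
        fmt.Printf("%-20s %d\n", cmd, n)
    }
}

If most of the blocked connections turn out to be stuck in a blocking read opened per browser tab, that would match the roughly one-client-per-tab pattern described above.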

When I open even more browser tabs, the errors become more frequent and Redis seems to hit a ceiling at

connected_clients:41
blocked_clients:40

At the same time, the “Last seen” on /#/organizations/x/gateways becomes “a minute ago” instead of “a few seconds ago”.

When I close the browser window containing all those ChirpStack tabs, the client numbers slowly decrease a bit but stay high (e.g. connected_clients:37, blocked_clients:33 after ~30 minutes), and the errors in the network server logs stop. If I open just a few more new tabs, the errors start again as soon as blocked_clients reaches 40.
I have had situations in the past where the errors started at night, when a lot of devices wake up and transmit, and did not stop until I issued a sudo systemctl restart for redis, chirpstack-application-server or chirpstack-network-server (can’t be sure which of those did the trick).

I did some more experiments including service restarts:

  • restart chirpstack-application-server: connected_clients:4, blocked_clients:0 (with the browser window with all the tabs still open, the numbers rise again step by step)
  • restart chirpstack-network-server: connected_clients:20, blocked_clients:16 (with the browser window still open, then later rising back to connected_clients:39, blocked_clients:36)
  • restart redis: connected_clients:3, blocked_clients:0

The VM on which both the application server and the network server are running looks as follows:

lscpu | grep "CPU(s)"
CPU(s):              2
  • chirpstack-application-server 3.17.8
  • chirpstack-network-server 3.16.5
  • Redis 6:7.0.5-1rl1~bionic1
  • Ubuntu 18.04.6 LTS

I see the following in the documentation of both the Network server and the Application server:

# Connection pool size.
#
# Default (when set to 0) is 10 connections per every CPU.
pool_size=0

and tried to play with it a bit, but neither small values like 2 (which made the problem much worse) nor high values like 100 for pool_size remedied the problem.
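A hedged bit of arithmetic on that default: with pool_size = 0 meaning 10 connections per CPU, this 2-CPU VM would get 2 × 10 = 20 connections per service, and with the Application server and the Network server each holding their own pool that makes 2 × 20 = 40, which would at least be consistent with the ceiling of ~40/41 clients above and with the drop by roughly 20 when one of the two services is restarted. This is only my own reading of the documented default, not something verified in the code.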

@brocaar I see two issues here:

  1. In my opinion, a connection pool timeout in the application server shouldn’t influence the network server. Essentially this would mean that any user opening “too many” browser tabs could block the whole LoRaWAN network. Also, it’s unclear to me why restarting chirpstack-network-server halves (40->20) the client connections. I would have thought the Network server only keeps 1-2 connections open while the Application server hogs the rest. Are they maybe somehow “sharing” a pool, e.g. by using the same ID or so?
  2. It looks like the servers, or at least the Application server, are not reusing client connections much (I think I read the term “client connection leak” somewhere, much like “memory leak”). Maybe some tweaking of the timeout settings can be done (see the sketch below)? I googled a bit and found some resources:
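To make issue 2 a bit more concrete, here is a minimal sketch of the pool options the go-redis v8 client exposes for connection reuse and idle-connection cleanup. This only illustrates the client library, not how ChirpStack actually wires it up, and the address and values are made up:

package main

import (
    "context"
    "fmt"
    "time"

    "github.com/go-redis/redis/v8"
)

func main() {
    rdb := redis.NewClient(&redis.Options{
        Addr:         "localhost:6379",
        PoolSize:     20,              // hard cap on connections per process (cf. pool_size)
        MinIdleConns: 2,               // keep a few warm connections ready for reuse
        PoolTimeout:  5 * time.Second, // how long a caller waits for a free connection
        IdleTimeout:  5 * time.Minute, // close connections that sit idle longer than this
        MaxConnAge:   0,               // 0 = connections are never recycled by age
    })

    if err := rdb.Ping(context.Background()).Err(); err != nil {
        fmt.Println("ping failed:", err)
    }

    // PoolStats shows how well connections are being reused:
    // Hits = reused from the pool, Misses = newly dialed, Timeouts = pool timeouts.
    stats := rdb.PoolStats()
    fmt.Printf("hits=%d misses=%d timeouts=%d total=%d idle=%d\n",
        stats.Hits, stats.Misses, stats.Timeouts, stats.TotalConns, stats.IdleConns)
}

As far as I understand, a connection that is parked in a blocking command is checked out of the pool and does not count as idle, so IdleTimeout alone would not reclaim the blocked ones; PoolStats().Timeouts is a quick way to see how often callers ran into the pool timeout.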

Please tell me where I might be misled, what I have done wrong, or how I can provide more information or otherwise help in debugging and fixing this.

Cheers
