NS Database connection times out randomly

UnderTheFigTree · August 23, 2021, 1:46pm

Hi.

I am running the ChirpStack AS and NS in Docker containers on an Azure VM, and both connect to a hosted PostgreSQL database, also in Azure.

Everything runs perfectly until, once every 6 or 7 days on average, the NS seems to lose its connection to the database. The AS continues to function perfectly, but I need to restart the NS container to re-establish the connection.

It seems that a single write to the database triggers the error (I presume which one is random; the latest is below), and after that every database operation from the network server fails.

time="2021-08-23T03:36:41.017769356Z" level=error msg="class-b / class-c scheduler error" ctx_id=a038718a-ba99-4d4a-adb0-fb15dabcc255 error="storage: begin transaction error: write tcp 172.29.0.3:48288->XXX.XXX.XXX.XXX:5432: write: connection reset by peer"

Subsequent error example:

time="2021-08-23T03:36:44.253322762Z" level=error msg="uplink: processing uplink frame error" ctx_id=7fca74c0-aec3-46b4-a42e-e7816b0a046e error="run uplink response flow error: get next device-queue item error: select error: write tcp 172.29.0.3:48288->XXX.XX.XXX.XXX:5432: write: connection reset by peer"

I have checked the logs of the PostgreSQL server. There is no error at the time the NS reports its first error, but after that timestamp the database logs no further connections from the NS at all.

How can I troubleshoot this further?

Thank you

AndyESYS · August 30, 2021, 8:46am

ChirpStack has to be updated to use lib/pq v1.9 or higher. Can you file a bug against the ChirpStack AS and NS?

github.com/lib/pq

At time DB reconnect fail with read tcp [::1]:49528->[::1]:5432: read: connection reset by peer

opened 11:10AM - 11 Mar 19 UTC

meetme2meat

Well, I have a sample script which I have been using to understand whether or how lib/pq handles reconnects.

```
package main

import (
	"database/sql"
	"fmt"
	"log"
	"time"

	"github.com/lib/pq"
)

func main() {
	url := "postgres://werain:@localhost/postgres?sslmode=disable&application_name=reconnect"
	connStr, err := pq.ParseURL(url)
	if err != nil {
		log.Fatal(err)
	}
	conn, err := sql.Open("postgres", connStr)
	if err != nil {
		log.Fatal(err)
	}
	for {
		result, err := conn.Exec("select 1;")
		if err != nil {
			fmt.Println(err)
			time.Sleep(10 * time.Second)
			continue // result is nil on error, so skip RowsAffected
		}
		fmt.Println(result.RowsAffected())
		time.Sleep(5 * time.Second)
	}
}
```

I see the reconnect working sometimes, but there are times when I see the following error:

```
read tcp [::1]:49528->[::1]:5432: read: connection reset by peer
```

Now I'm not sure why it works sometimes and not at other times. Is this something that needs to be handled by the individual client (i.e. reconnect)?

NOTE: I'm not entirely sure whether this is a `database/sql` or a `lib/pq` issue. Please help guide me through this.

brocaar · September 6, 2021, 3:24pm

I have updated lib/pq and will issue a bugfix release soon.

UnderTheFigTree · September 13, 2021, 7:46am

Thank you for the update. Fixing this issue will help us greatly.