App Server Devices Not Visible Periodically

Hey,

We’ve been having an issue for a while, with both v2 and v3, where the app server periodically stops displaying the list of devices. The rest of the functionality appears to work correctly, and restarting the app server fixes it.

I’ve tried turning on debug logging to identify where the issue occurs, but I’m just getting a lot of SQL queries printed to the log, which look identical before and after the restart; the query results aren’t printed.

Does anyone have any suggestions as to how I might narrow down the cause of this? We’re running the server, gateway bridge, app server, mosquitto, postgres, and redis on the same Ubuntu server at the moment.

As the web interface uses the REST API, the best approach would be to see whether you can reproduce this by interacting directly with the REST API.
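A minimal Python sketch of such a direct check, using only the standard library. It assumes you already have a JWT, that the app server's REST API listens on http://localhost:8080 (adjust to your setup), and that the token is passed in a `Grpc-Metadata-Authorization: Bearer <token>` header; treat that header name and the `applicationID`/`limit` query parameters as assumptions to check against your version's API console. The important part is the client-side timeout, which turns a request that never returns into an explicit `timeout` result instead of a hung terminal.

```python
import socket
import urllib.error
import urllib.parse
import urllib.request


def build_request(base_url, path, token, params=None):
    """Build a GET request for the app server's REST API.

    Assumption: the JWT is sent in a Grpc-Metadata-Authorization header.
    """
    url = base_url.rstrip("/") + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/json",
            "Grpc-Metadata-Authorization": "Bearer " + token,
        },
    )


def probe(base_url, path, token, params=None, timeout=10.0):
    """Send one request and classify the outcome.

    Returns ("ok", status), ("http_error", status), ("timeout", message)
    or ("error", reason).
    """
    req = build_request(base_url, path, token, params)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            resp.read()  # wait for the whole body, not just the headers
            return ("ok", resp.status)
    except urllib.error.HTTPError as exc:  # e.g. a 504 returned by nginx
        return ("http_error", exc.code)
    except (socket.timeout, TimeoutError):  # no status line or body in time
        return ("timeout", f"no response within {timeout}s")
    except urllib.error.URLError as exc:  # connect failed or timed out
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return ("timeout", f"no response within {timeout}s")
        return ("error", str(exc.reason))
```

For example, `probe("http://localhost:8080", "/api/devices", token, {"applicationID": 1, "limit": 10})` should return `("ok", 200)` while things are healthy; comparing it with the same call against `/api/applications` shows whether only the devices endpoint is affected.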

We have seen this too. I reported it back in June, but because it is intermittent, it is difficult to diagnose. The best strategy would be to run repeated queries against the server until a failure occurs. At one point I had a hypothesis that it was an expired-token issue.
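A minimal sketch of that strategy in Python: call a probe repeatedly until it reports anything other than success, and record when that happened (in UTC) so the failure can be lined up with the app server log. The probe is deliberately left abstract: any zero-argument function returning an `(outcome, detail)` pair, with `"ok"` meaning success, for instance a wrapper around a timed-out request to /api/devices. `watch`, `interval` and `max_checks` are illustrative names, not anything from the app server itself.

```python
import time
from datetime import datetime, timezone


def watch(probe, interval=60.0, max_checks=None, sleep=time.sleep):
    """Poll probe() until it fails.

    probe: zero-argument callable returning (outcome, detail); "ok" = success.
    Returns (check_number, utc_timestamp, outcome, detail) for the first
    failure, or None if max_checks checks pass without one.
    """
    check = 0
    while max_checks is None or check < max_checks:
        check += 1
        outcome, detail = probe()
        if outcome != "ok":
            stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
            return (check, stamp, outcome, detail)
        sleep(interval)
    return None
```

The `sleep` parameter only exists so the loop can be exercised without actually waiting; in normal use leave it at `time.sleep`. If the expired-token hypothesis is in play, logging in again inside the probe on each check would help rule it in or out.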

Hey,

It’s taken a while for the issue to recur, but it has, and I can’t find anything relevant in the logs.

I logged in via the API and was able to get a JWT and successfully query applications, gateways, organisations, users, etc., but any query to the /api/devices endpoint is met with an NGINX 504 Gateway Time-out.

Querying the app server directly, bypassing nginx (i.e. on port 8080), yields no response at all; the request still hadn’t timed out after several minutes.
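The comparison described here (other endpoints fine, /api/devices failing both through nginx and directly on port 8080) can be made systematic with a small table. A sketch in Python, assuming a `probe(base_url, path)` callable that returns `(outcome, detail)` with `"ok"` on success; `compare_routes` and the example base URLs are hypothetical. A path that fails on every route points at the app server itself rather than the proxy.

```python
def compare_routes(probe, base_urls, paths):
    """Probe every path through every base URL (e.g. via nginx and direct).

    Returns (table, failing_everywhere), where table[path][base_url] is the
    outcome string and failing_everywhere lists the paths that failed on
    every route, i.e. failures that cannot be blamed on a single proxy.
    """
    table = {
        path: {base: probe(base, path)[0] for base in base_urls}
        for path in paths
    }
    failing_everywhere = [
        path
        for path, row in table.items()
        if all(outcome != "ok" for outcome in row.values())
    ]
    return table, failing_everywhere
```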

In the app server log, whenever I query /api/applications I see several SQL queries logged with my username as the argument (args="[username]"). This does not happen when I query /api/devices. I can’t see any log entries that obviously relate to my request, although the log is quite spammy with debug enabled.

So it looks as though something is failing before the first log entry for the request is written.
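Because the debug log is so noisy, it can help to pull out only the entries whose SQL arguments include your username, with their timestamps, and compare a working request against a hanging one. A sketch in Python, assuming the app server writes logrus-style `key=value` text lines; the `args="[username]"` form is what shows up in this log, while the `time` and `msg` field names are assumptions to check against your own output. `parse_fields` and `entries_for_user` are hypothetical helpers.

```python
import re

# Assumed logrus text format: key=value pairs, values optionally
# double-quoted (with backslash-escaped quotes inside).
FIELD_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_fields(line):
    """Split one log line into a dict of its key=value fields."""
    fields = {}
    for key, value in FIELD_RE.findall(line):
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]  # drop the surrounding quotes
        fields[key] = value
    return fields


def entries_for_user(lines, username):
    """Yield (time, msg) for lines whose args list contains username."""
    for line in lines:
        fields = parse_fields(line)
        args = fields.get("args", "").strip("[]").split()
        if username in args:
            yield fields.get("time", ""), fields.get("msg", "")
```

Run it over the log for a window around a failing request and a window around a working one (e.g. `entries_for_user(open("app-server.log"), "myuser")`, file name hypothetical); an empty result for the failing window matches the observation that the request never reaches the user lookup.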

After restarting the app server, the /api/devices endpoint works immediately, and a log entry with my username is produced when I request it.

Any thoughts on how to extract further debug information?

Hey,

This appears to be occurring a couple of times per day at the moment; does anyone have any suggestions for narrowing down the cause on our installation?

Hey @brocaar,

So we’ve found that the gRPC API continues to function correctly when the JSON (REST) endpoint stops returning for /devices/ queries. We still haven’t encountered issues with any other endpoints, but the devices one is failing quite regularly at the moment.

We’re assuming something in the JSON wrapper is being affected, but can’t quite figure out what.

Do you have any suggestions for sections of the code we should be looking at?

Thanks for your time,

B

What sort of response (headers, body, etc) do you get back from the REST API when this happens? Does it produce an error?

The REST API simply doesn’t return for the devices endpoint: through NGINX the request ends in a 504 timeout, and when queried directly it never returns. All the other endpoints I tested (e.g. applications, organisations) continue to work as normal and return more or less immediately.