Application Server on Kubernetes - no settings and organization

Hi there,

I am trying to set up an application server as well as a network server in our Kubernetes cluster. The pods are running, and I have now exposed the web UI through an ingress so the application server can be reached from outside.

Unfortunately the server seems to have some misconfiguration, as two requests fail:
GET /api/internal/settings HTTP/2.0" 501
GET /api/organizations?limit=1 HTTP/2.0" 501

The web interface remains empty and I cannot configure anything.

It seems like settings and organizations are missing. Setting the log level to DEBUG did not help - at least I could not find the reason.

Any idea how to proceed?

Here are some further details:

  • chirpstack helm chart from https://undefinedhashtag.github.io/charts/

    • slightly modified (e.g. added an ingress definition for the AS, sketched below this list, and changed MQTT access to use TLS, etc.)
    • uses docker image chirpstack/chirpstack-application-server version 3.11.1
  • mosquitto helm chart from eclipse-mosquitto-mqtt-broker-helm-chart for the MQTT instance, with TLS authentication.

  • Postgres helm chart from bitnami

    • added the databases chirpas and chirpns manually
    • added two extensions to chirpas: CREATE EXTENSION pg_trgm; CREATE EXTENSION hstore;
  • Redis helm chart from https://charts.bitnami.com/bitnami
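
The ingress definition I added for the AS looks roughly like this (hostname, TLS secret name and the Ingress API version are placeholders for my environment; the backend is the chart's chirpstack-as-external service on port 8080):

  # Sketch of the ingress in front of the AS web UI; host and secret are placeholders.
  apiVersion: networking.k8s.io/v1
  kind: Ingress
  metadata:
    name: chirpstack-as
  spec:
    tls:
      - hosts:
          - chirpstack.example.com      # placeholder hostname
        secretName: chirpstack-as-tls   # placeholder TLS secret
    rules:
      - host: chirpstack.example.com    # placeholder hostname
        http:
          paths:
            - path: /
              pathType: Prefix
              backend:
                service:
                  name: chirpstack-as-external
                  port:
                    number: 8080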

It could be a database issue, check it again.

Well - yes. It is probably the database. But what do you mean by “check it again”? The database is created and has some content:
postgres=# \c chirpas
You are now connected to database "chirpas" as user "postgres".
chirpas=# \dt
List of relations
Schema | Name | Type | Owner
--------+----------------------------------+-------+----------
public | api_key | table | postgres
public | application | table | postgres
public | code_migration | table | postgres
public | device | table | postgres
public | device_keys | table | postgres
public | device_multicast_group | table | postgres
public | device_profile | table | postgres
public | fuota_deployment | table | postgres
public | fuota_deployment_device | table | postgres
public | gateway | table | postgres
public | gateway_ping | table | postgres
public | gateway_ping_rx | table | postgres
public | gateway_profile | table | postgres
public | gorp_migrations | table | postgres
public | integration | table | postgres
public | multicast_group | table | postgres
public | network_server | table | postgres
public | organization | table | postgres
public | organization_user | table | postgres
public | remote_fragmentation_session | table | postgres
public | remote_multicast_class_c_session | table | postgres
public | remote_multicast_setup | table | postgres
public | service_profile | table | postgres
public | user | table | postgres
(24 rows)

chirpas=# select * from organization;
id | created_at | updated_at | name | display_name | can_have_gateways | max_device_count | max_gateway_count
----+-------------------------------+-------------------------------+------------+--------------+-------------------+------------------+-------------------
1 | 2020-10-05 12:19:31.387449+00 | 2020-10-05 12:19:31.387449+00 | chirpstack | ChirpStack | t | 0 | 0

OK, I did some more troubleshooting and found something interesting. I ran tcpdump on the network traffic to see what happens.

I could figure out that after the /api/organizations?limit=1 request there is some communication to a service on port 8080, which resolves to my load balancer (some DNS mismatch, I guess).

Maybe it has nothing to do with the database itself.

Does the process connect to its own API via localhost:8080?

Hey, nice to see you using our helm chart.
We moved it to https://gitlab.com/wobcom/chirpstack-helm
You can download the chart directly from https://harbor.service.wobcom.de/chartrepo/public

We are using Flux to deploy everything. That said, these are some of the charts we are using:
postgres-operator @ https://undefinedhashtag.github.io/charts/
redis-operator @ https://github.com/amaizfinance/redis-operator
vernemq helm @ https://github.com/vernemq/docker-vernemq/tree/master/helm/vernemq
prometheus-operator @ https://github.com/prometheus-operator/prometheus-operator

The latest version supports multiple replicas and Prometheus monitoring. The older version had a bug where, if you set the client_id in the secrets, you would get into a reconnect loop to MQTT. The latest version drops support for client_id in the secrets; we use the pod's name instead of a random id.
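
Roughly, the pod name comes in via the Downward API and the chart then uses it as the MQTT client id (the env var name below is just illustrative):

  # Sketch: expose the pod's own name as an env var via the Downward API,
  # so each replica gets a stable, unique MQTT client id instead of a random one.
  env:
    - name: POD_NAME              # illustrative name
      valueFrom:
        fieldRef:
          fieldPath: metadata.name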

App-Server Version: 3.12.2

I'm working right now on adding Kafka to the mix. You are more than welcome to make a pull request at the GitLab repository, and you have a choice between nginx and istio.


The external service is the one responsible for the UI and the JSON API:
https://www.chirpstack.io/application-server/api/
That should be the one being exposed. The “API service” is the gRPC endpoint, and I think nginx-ingress (community) sometimes has issues with HTTP/2.

Yes, that's what I thought. Actually the service on 8080 - which is the chirpstack-as-external service - is kept as it is defined in the chart. I only added a second service with port 80 and a short name pointing to the same pod port.
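
For reference, the extra service is just something like this (the selector labels are an assumption; they have to match whatever the chart puts on the AS pods):

  # Sketch of the additional short-named Service pointing at the same pod port.
  apiVersion: v1
  kind: Service
  metadata:
    name: chirpstack                           # the short name
  spec:
    selector:
      app.kubernetes.io/name: chirpstack       # assumption: must match the chart's pod labels
    ports:
      - port: 80
        targetPort: 8080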

But how can I find out whether the ChirpStack AS connects to itself on port 8080?

Some time ago I figured out that my /etc/resolv.conf has a nasty setting which creates wrong FQDN queries: options ndots:5

That's why I guess it has something to do with a DNS lookup resolving to our load balancer (located outside of the cluster).

And I forgot to mention: we don't use nginx as ingress but traefik.

You could try forwarding the port:
kubectl -n chirpstack port-forward service/chirpstack-as-external 8080:8080
and then open http://localhost:8080 in a browser, just to rule out traefik. If you see the dashboard and can log in, then the problem is the LB.

I already tried an SSH tunnel to forward port 8080 to my local machine. Same behavior.

Just to clarify: the access works pretty well, but when it comes to those two requests, I see some traffic originating at the pod with my load balancer as the destination.

This led me to the assumption that there might be some external access from within the AS server during the initial load. Does the server fetch some data from an external source?

I get the following content:

/api/internal/settings:
{"error":"Not Found: HTTP status code 404; transport: received the unexpected content-type \"text/plain; charset=utf-8\"","code":12,"message":"Not Found: HTTP status code 404; transport: received the unexpected content-type \"text/plain; charset=utf-8\"","details":[]}

/api/organizations?limit=1:
{"error":"Not Found: HTTP status code 404; transport: received the unexpected content-type \"text/plain; charset=utf-8\"","code":12,"message":"Not Found: HTTP status code 404; transport: received the unexpected content-type \"text/plain; charset=utf-8\"","details":[]}

@oll1d Could you post your ingress definition? I could merge it and try to reproduce the issue.

OK, after one week of vacation I am back, and I found some interesting startup behavior.

First of all, options ndots:5 in /etc/resolv.conf is really bad. Here is why:

When the ChirpStack container starts up, I can see a DNS lookup for localhost. But localhost is not a fully qualified name, so the search domains get appended:

localhost.iot.svc.cluster.local, localhost.svc.cluster.local, … and so on. These names are not resolved locally, so they are forwarded to nodelocaldns (which is a CoreDNS instance running on each node).

The resolver even tries the external domain for resolving this name, which leads to localhost.my.domain.de.

At this point a mechanism of our domain resolver comes into play which responds with the IP address of our load balancer, as this is our main entry point for everything.

From here this IP is used to connect to port 8080 - which ends up at the dashboard of our load balancer instead of 127.0.0.1:8080, which should have been used.

If the resolver did not have this ndots:5 option, localhost would remain localhost and would be resolved via /etc/hosts.
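
To illustrate, the pod's /etc/resolv.conf looks roughly like this (the nameserver IP and the external domain are just examples from my setup), and with ndots:5 a lookup for localhost walks through every search entry before being tried as-is:

  # /etc/resolv.conf inside the pod (roughly)
  nameserver 169.254.25.10                    # nodelocaldns (example IP)
  search iot.svc.cluster.local svc.cluster.local cluster.local my.domain.de
  options ndots:5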

Ahh, I see. I could make it configurable with https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-dns-config
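
Something like this in the deployment's pod spec, driven by an optional values key (the key name is just a suggestion, not something the chart has today):

  # Sketch: pass an optional dnsConfig straight through from values.yaml.
        {{- with .Values.dnsConfig }}
        dnsConfig:
          {{- toYaml . | nindent 8 }}
        {{- end }}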

Setting ndots to 0 does the job. It would be helpful to add this to your chart. Now it works as expected.

  dnsConfig:
    options:
      - name: ndots
        value: "0"

did the trick.

I wonder why others don't run into this problem.

As I cannot find any other way to send you a message, I will use this thread: I found another configuration problem when using ChirpStack with TLS certificates.
In your helper you define the following:
{{- define "chirpstack.appserver.public.host" -}}
{{ template "chirpstack.appserver.fullname" . }}-api:8001
{{- end -}}
But since I am using certificates containing an FQDN like chirpstack-as-api.iot.svc.cluster.local, there is a communication error when the network server posts data to the application server, complaining that the CN does not match.

I modified the code to:
{{ template "chirpstack.appserver.fullname" . }}-api.iot.svc.cluster.local:8001

This does the trick for my setup (which is not the final solution, just enough for the PoC).
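
A more general form of that helper could take the namespace from the release instead of hardcoding it (just a sketch; the cluster domain would also need to come from values to be fully generic):

{{- define "chirpstack.appserver.public.host" -}}
{{ template "chirpstack.appserver.fullname" . }}-api.{{ .Release.Namespace }}.svc.cluster.local:8001
{{- end -}}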

Now this leads to some discussion about certificates:

  • shall we use certificates only with short names (hostnames)?
  • shall we use certs with an FQDN?

In general / more of a discussion about ChirpStack itself:
Which service is used in which direction? As we configure certs for both directions, it seems like the AS sends its certs to the NS, which in turn uses these certs for sending data back to the AS.
Maybe it is written somewhere, but I have not found a clear architectural description yet.

FQDN, if you ask me.
Maybe a better approach would be to integrate with cert-manager and let it create the necessary certificates. I actually wanted to use Istio mTLS for that, but I can't enable sidecar injection for the namespace just yet.
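
With cert-manager that would be roughly one Certificate per component, with the in-cluster FQDN as a SAN (issuer, namespace and names below are placeholders):

  # Sketch: a cert-manager Certificate for the AS API with short name and FQDN as SANs.
  apiVersion: cert-manager.io/v1
  kind: Certificate
  metadata:
    name: chirpstack-as-api
    namespace: iot                       # placeholder namespace
  spec:
    secretName: chirpstack-as-api-tls
    issuerRef:
      name: internal-ca                  # placeholder issuer
      kind: ClusterIssuer
    dnsNames:
      - chirpstack-as-api
      - chirpstack-as-api.iot.svc.cluster.local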

This might answer your question:
https://www.chirpstack.io/application-server/use/network-servers/

Could you please open issues at https://gitlab.com/wobcom/chirpstack-helm? That way I don't lose track of the issues you are finding. That'd be awesome :slight_smile:

Thanks for your answer.

I tried to understand the content of the link you provided concerning the network server. The architectural concept looks a little confusing to me - maybe it's because the application server incorporates the network server, and it is not the other way around, i.e. a network server getting credentials to connect to an application server.

I also posted an issue on GitLab so you can track this. It is just a copy of the text above.

It's a case of “I got a key for the network server; hey network server, here is a key so you can talk back to me”.
In that way, one network server can connect to multiple app servers, and an app server can connect to multiple network servers. But maybe @brocaar can expand on that.