Hitchhiking on OpenShift's Observability using Custom Grafana Dashboards
OpenShift provides some observability features out of the box, but which features are available depends on what your cluster admin has set up and what you have access to. On the NERC, we are able to see container metrics in the built-in dashboard of OpenShift's Developer view. To enable custom visualizations, metrics analysis, and alerting, we need to connect this data to an instance of Grafana which we control.
Finding the Prometheus Endpoint
Red Hat has a couple of articles (1, 2) about how to connect OpenShift to Grafana to build custom dashboards; however, their approach is to connect to an internal "thanos-querier" service using a service account with a cluster role. Regular users cannot create cluster roles, so these solutions don't work for us. Nevertheless, it should be possible to query OpenShift's "built-in" Prometheus as regular users, since we can do it from the OpenShift web console's developer view.
By pressing F12 to open Firefox's devtools, we can see that the page is making requests to the URL https://console.apps.shift.nerc.mghpcc.org/api/prometheus-tenancy/api/v1/ and that its API is the same as Prometheus' API. This looks like our way into OpenShift's internal instance of Prometheus.
Authentication with prometheus-tenancy
The tricky part is authentication. We need to create a token so that Grafana can make requests to the Prometheus API. To obtain a token, we need to create a ServiceAccount, then give the ServiceAccount the relevant permissions via a Role and RoleBinding so that it can access the Prometheus API.
oc create sa external-grafana
token="$(oc create token --duration=99999h external-grafana)"
I got an example request by copying a query from devtools with the option "Copy as cURL". This gives me a command which I can use to test whether my token can authenticate with Prometheus.
The curl command I copied is pasted below, with unnecessary -H options deleted. I also added -i, which tells curl to print the response headers.
curl -i 'https://console.apps.shift.nerc.mghpcc.org/api/prometheus-tenancy/api/v1/query?query=sum%28container_memory_working_set_bytes%7Bjob%3D%22kubelet%22%2C+metrics_path%3D%22%2Fmetrics%2Fcadvisor%22%2C+cluster%3D%22%22%2C+namespace%3D%22hosting-of-medical-image-analysis-platform-a88466%22%2Ccontainer%21%3D%22%22%2C+image%21%3D%22%22%7D%29+%2F+sum%28kube_pod_container_resource_limits%7Bjob%3D%22kube-state-metrics%22%2C+cluster%3D%22%22%2C+namespace%3D%22hosting-of-medical-image-analysis-platform-a88466%22%2C+resource%3D%22memory%22%7D%29&namespace=hosting-of-medical-image-analysis-platform-a88466' \
-H 'Accept: application/json' \
-H 'Pragma: no-cache' -H 'Cache-Control: no-cache' \
-H 'Cookie: openshift-session-token=sha256~XXXXXXXX; csrf-token=YYYYYYYY; ZZZZ=AAAA'
It returns something like:
{"status":"success","data":{"resultType":"vector","result":[{"metric":{},"value":[1698013210.062,"0.07837875246062992"]}]}}
So it works! It also works if I delete the csrf-token cookie and everything which follows; these values are not being checked.
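The percent-encoded query string above is hard to read and harder to edit. As a minimal sketch, curl can do the encoding itself when given -G together with --data-urlencode, so the PromQL can stay in readable form. The names PROM_BASE, NAMESPACE, SESSION_TOKEN, MEMORY_RATIO_QUERY, and prom_query are hypothetical helpers introduced here for illustration; the query is the one from the copied command, decoded by hand.

```shell
# Hypothetical helper names; adjust the values to your own project.
PROM_BASE='https://console.apps.shift.nerc.mghpcc.org/api/prometheus-tenancy'
NAMESPACE='hosting-of-medical-image-analysis-platform-a88466'

# The query from the copied curl command, decoded into readable PromQL.
MEMORY_RATIO_QUERY='sum(container_memory_working_set_bytes{job="kubelet", metrics_path="/metrics/cadvisor", cluster="", namespace="hosting-of-medical-image-analysis-platform-a88466", container!="", image!=""}) / sum(kube_pod_container_resource_limits{job="kube-state-metrics", cluster="", namespace="hosting-of-medical-image-analysis-platform-a88466", resource="memory"})'

# Run an instant query. $1 is a PromQL expression in readable form.
# -G turns the --data-urlencode values into a URL query string,
# and curl percent-encodes them for us.
prom_query() {
  curl -sS -G "$PROM_BASE/api/v1/query" \
    -H 'Accept: application/json' \
    -H "Cookie: openshift-session-token=$SESSION_TOKEN" \
    --data-urlencode "query=$1" \
    --data-urlencode "namespace=$NAMESPACE"
}

# Example (needs network access and a valid token in SESSION_TOKEN):
#   SESSION_TOKEN='sha256~XXXXXXXX'
#   prom_query "$MEMORY_RATIO_QUERY"
```

Keeping the query in a single-quoted shell string means the double quotes inside the PromQL need no escaping.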
Our next question is: how do we use the service account token instead of my browser session cookie (which contains my user account OAuth API token)? My instinct was to try -H "Authorization: Bearer $token", but it did not work. Passing my OAuth API token in the Authorization header did not work either.
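When comparing authentication attempts like these, it can help to look only at the HTTP status code rather than the full response. The following is a rough sketch, not the method used above: probe_header is a hypothetical helper, and the query up is just a placeholder that I assume the endpoint will accept.

```shell
# Hypothetical helper names; adjust the values to your own project.
PROM_BASE='https://console.apps.shift.nerc.mghpcc.org/api/prometheus-tenancy'
NAMESPACE='hosting-of-medical-image-analysis-platform-a88466'

# Print only the HTTP status code returned when one request header is sent.
# $1 is a complete header line, e.g. "Authorization: Bearer <token>".
probe_header() {
  curl -sS -o /dev/null -w '%{http_code}\n' -G "$PROM_BASE/api/v1/query" \
    -H "$1" \
    --data-urlencode 'query=up' \
    --data-urlencode "namespace=$NAMESPACE"
}

# Example (needs network access and a token in $token):
#   probe_header "Authorization: Bearer $token"
#   probe_header "Cookie: openshift-session-token=$token"
```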
Through trial and error, I found out that the /api/prometheus-tenancy endpoint authenticates using the Cookie header only.
curl -i 'https://console.apps.shift.nerc.mghpcc.org/api/prometheus-tenancy/api/v1/query?query=sum%28container_memory_working_set_bytes%7Bjob%3D%22kubelet%22%2C+metrics_path%3D%22%2Fmetrics%2Fcadvisor%22%2C+cluster%3D%22%22%2C+namespace%3D%22hosting-of-medical-image-analysis-platform-a88466%22%2Ccontainer%21%3D%22%22%2C+image%21%3D%22%22%7D%29+%2F+sum%28kube_pod_container_resource_limits%7Bjob%3D%22kube-state-metrics%22%2C+cluster%3D%22%22%2C+namespace%3D%22hosting-of-medical-image-analysis-platform-a88466%22%2C+resource%3D%22memory%22%7D%29&namespace=hosting-of-medical-image-analysis-platform-a88466' \
-H 'Accept: application/json' \
-H 'Pragma: no-cache' -H 'Cache-Control: no-cache' \
-H "Cookie: openshift-session-token=$token"
We get a different error message:
Forbidden (user=system:serviceaccount:hosting-of-medical-image-analysis-platform-a88466:external-grafana, verb=get, resource=pods, subresource=)
This is progress, and fortunately the response tells us exactly what we need. The external-grafana service account needs permission to the pods resource using the get verb. Let's fix that by creating a Role and RoleBinding. Just for good measure, we can give the Role all read-only verbs: get, list, and watch.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-tenancy-reader
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: grafana-external-may-read-prometheus-tenancy
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus-tenancy-reader
subjects:
  - kind: ServiceAccount
    name: external-grafana
Once I oc apply-ed the Role and RoleBinding, the curl using the service account's $token worked!
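The permissions can also be checked from the command line without going through Prometheus. This sketch uses oc auth can-i with impersonation of the service account; it assumes your user is allowed to impersonate service accounts in the project, which may not be the case on every cluster. sa_can_read_pods and NAMESPACE are hypothetical names.

```shell
# Hypothetical helper names; adjust the value to your own project.
NAMESPACE='hosting-of-medical-image-analysis-platform-a88466'

# Ask the API server whether the external-grafana service account may
# get, list, and watch pods in the project. Each line of output ends
# with the answer from "oc auth can-i" (yes or no).
sa_can_read_pods() {
  for verb in get list watch; do
    printf '%s pods: ' "$verb"
    oc auth can-i "$verb" pods \
      --as="system:serviceaccount:$NAMESPACE:external-grafana" \
      --namespace="$NAMESPACE"
  done
}

# Example (needs an active oc login):
#   sa_can_read_pods
```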
Configuring Grafana
Finally, on to configuring Grafana:
- The URL to use is https://console.apps.shift.nerc.mghpcc.org/api/prometheus-tenancy/?namespace=hosting-of-medical-image-analysis-platform-a88466 (it is important to include the query string ?namespace=...).
- Authentication using the cookie is achieved by setting a "Custom HTTP Header" named Cookie with the value openshift-session-token=...
- "HTTP Method" must be set to "GET".
- "Custom query parameters" should be set to namespace=...
- Metrics lookup does not work (see below), so in the "Misc" section I toggled the switch for "Disable metrics lookup".
And... It works!
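For anyone who prefers not to click through the UI, the same settings can be written down as a data source definition. The sketch below only prints a JSON document. The field names (httpMethod, customQueryParameters, disableMetricsLookup, httpHeaderName1, httpHeaderValue1) reflect my understanding of Grafana's Prometheus data source model and should be checked against your Grafana version. The data source name and the helper grafana_datasource_json are hypothetical.

```shell
# Hypothetical helper names; adjust the values to your own project.
PROM_BASE='https://console.apps.shift.nerc.mghpcc.org/api/prometheus-tenancy'
NAMESPACE='hosting-of-medical-image-analysis-platform-a88466'

# Print a Grafana data source definition mirroring the settings listed above.
# Expects the service account token in $token. The token is inserted as-is,
# so this assumes it contains no characters that need JSON escaping.
grafana_datasource_json() {
  cat <<EOF
{
  "name": "openshift-prometheus-tenancy",
  "type": "prometheus",
  "access": "proxy",
  "url": "$PROM_BASE/?namespace=$NAMESPACE",
  "jsonData": {
    "httpMethod": "GET",
    "customQueryParameters": "namespace=$NAMESPACE",
    "disableMetricsLookup": true,
    "httpHeaderName1": "Cookie"
  },
  "secureJsonData": {
    "httpHeaderValue1": "openshift-session-token=$token"
  }
}
EOF
}

# Example:
#   token="$(oc create token --duration=99999h external-grafana)"
#   grafana_datasource_json
```

The printed document could then be sent to Grafana's data source HTTP API (POST /api/datasources) with credentials for your own Grafana instance, or translated into a provisioning file.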