In this article I provide a working setup for log handling with Grafana Loki and the Kubernetes logging operator. You need a Kubernetes cluster with the kube-prometheus-stack Helm chart and the logging-operator already installed. To keep things simple, there is no access control and we use filesystem storage for persistence.
Motivation
When setting up a Rancher-managed RKE2 cluster, one often uses the Helm charts provided by Rancher to install frameworks for monitoring and logging. The rancher-monitoring chart is basically the kube-prometheus-stack Helm chart. Although it is currently a few versions behind its upstream, you get all components for a ready-to-use metrics observability setup (Prometheus, Alertmanager and Grafana). With the rancher-logging chart things look different: what you get there is an (unconfigured) logging operator framework which allows you to collect logs and forward them to a target, but the target itself is missing. As we already have Grafana in place, using Loki for log handling seems obvious. However, I did not find any working example setup, so here is one that should do the job.
The configurations in this article work out‑of‑the‑box, except for one dependency: In our setup Loki is installed in the namespace platform. If you use another namespace, adjust the Loki URL accordingly.
Configuring the Logging Operator
The logging operator uses Flows to collect the logs and Outputs to forward them to a target. We want to collect all logs of the cluster and send them to one single output, so we need just one ClusterFlow and one ClusterOutput.
The ClusterFlow is quite simple:
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterFlow
metadata:
  name: all-logs-to-loki
  namespace: cattle-logging-system
spec:
  globalOutputRefs:
    - loki
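If you later want to drop logs from particularly noisy namespaces before they even reach Loki, the operator's match rules can be added to this resource. This is only a sketch, assuming your logging operator version supports the match syntax from the upstream docs; noisy-namespace is just a placeholder:
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterFlow
metadata:
  name: all-logs-to-loki
  namespace: cattle-logging-system
spec:
  match:
    # rules are evaluated in order: first drop the noisy namespace ...
    - exclude:
        namespaces:
          - noisy-namespace
    # ... then select everything else
    - select: {}
  globalOutputRefs:
    - loki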
The ClusterOutput needs a few more details. This is mainly because the key to a usable Loki setup is choosing a good set of labels, and good mainly means limiting the combined label cardinality. Loki groups all logs into buckets, and for every existing label combination a separate bucket is created. So if your setup consists of four labels with 20 possible values each, you already end up with 20*20*20*20 = 160,000 potential buckets, which is far more than Loki can handle efficiently. There is no exact limit, but from my experience you should try to keep the real cardinality below 1,000. Hence, I strongly recommend starting with a very limited set of labels and only extending it if it really turns out to be too small for your needs.
The ClusterOutput in my setup is:
# See https://kube-logging.dev/docs/configuration/plugins/outputs/loki/
# See https://kube-logging.dev/docs/configuration/plugins/outputs/buffer/
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterOutput
metadata:
  # this is the name referenced by the ClusterFlow
  name: "loki"
  namespace: "cattle-logging-system"
spec:
  loki:
    # adapt the namespace in the URL if necessary
    url: http://loki-gateway.platform.svc.cluster.local
    buffer:
      compress: gzip
      flush_mode: interval
      flush_thread_count: 10
      # compromise between fast visibility of logs in Grafana and number of pushes
      flush_interval: 5s
    # nobody needs the thread label in the output
    include_thread_label: false
    labels:
      namespace: $.kubernetes.namespace_name
      # We may even think of removing this one:
      node: $.kubernetes.host
      service_name: $.kubernetes.container_name
      stream: $.stream
    remove_keys:
      # Those keys are written to every single log line, so we try to save some space by removing what we don't need.
      - $.logtag
      - $.kubernetes.labels
      - $.kubernetes_namespace
      - $.kubernetes.docker_id
      - $.kubernetes.pod_id
      - $.kubernetes.container_hash
      - $.kubernetes.container_image
      - $.kubernetes.annotations
As you can see, this setup defines only four labels, all of them with very limited cardinality. For example, pod_name is not among them, as pods are created and destroyed quite often and their cardinality would therefore grow quickly. If you need to search for a specific pod, use a line filter instead, e.g. {namespace="my-namespace"} |= "my-pod-name" (the pod name is still part of the log record, it is just not a label).
Side note: As an alternative to labels and the content of log lines themselves, Loki also offers structured metadata as a way to store attributes related to log output. Recent versions of fluent bit support them, so feel free to give it a try.
Now that we are ready to send logs to Loki, let's set up the target.
Loki
For the Loki installation we use the Loki Helm chart provided by Grafana. The installation via Helm is already documented in the chart's documentation, so we concentrate on the values you need to configure.
For clusters with a limited log volume, filesystem storage is totally sufficient. Of course, if you have object storage available, feel free to use it; for larger log volumes it will be necessary anyway.
Filesystem storage requires Loki to run in SingleBinary mode, which does not support scaling. The drawback is obvious: Loki cannot scale beyond the capacity of the node it runs on. But it keeps the setup simple. In our cluster we use Longhorn as storage provider, so the volume Loki stores its data on is highly available and not bound to a certain node. Without Longhorn or a similar storage provider the setup will still work, but then Loki is pinned to the node where it first came up.
The complete values.yaml of our setup looks like this:
deploymentMode: "SingleBinary"
# If you do not run RKE2 or use a different DNS setup, you need to adapt or remove this definition
global:
  dnsService: rke2-coredns-rke2-coredns
loki:
  commonConfig:
    replication_factor: 1
  auth_enabled: false
  monitoring:
    selfMonitoring:
      enabled: false
      grafanaAgent:
        installOperator: false
  storage:
    type: 'filesystem'
  schemaConfig:
    # As of now, this is the latest available storage schema.
    configs:
      - from: "2024-05-01"
        object_store: filesystem
        store: tsdb
        schema: v13
        index:
          prefix: index_
          period: 24h
  limits_config:
    max_line_size: 0
    ingestion_rate_mb: 500
    # In case you want to use them :-)
    allow_structured_metadata: true
    volume_enabled: true
    # Watch the usage of the volume and adjust, if needed.
    retention_period: 180d
  # Cool recent feature of Loki, tries to find patterns in your logs and groups them accordingly.
  # Needs a recent version of Grafana
  pattern_ingester:
    enabled: true
  ingester:
    # Depending on your log volume this may need tuning
    # wait up to 2 hours for chunks to be filled
    chunk_idle_period: 120m
    # uncompressed maximum block size per chunk
    chunk_block_size: 4194304
lokiCanary:
  enabled: false
test:
  enabled: false
write:
  replicas: 0
read:
  replicas: 0
backend:
  replicas: 0
singleBinary:
  replicas: 1
  service:
    annotations:
      prometheus.io/port: "3100"
      prometheus.io/scrape: "true"
# Deploys a set of dashboards and alert rules useful to check and tune the setup
monitoring:
  dashboards:
    enabled: true
  rules:
    enabled: true
  serviceMonitor:
    enabled: true
A few settings may need attention:
- The values don’t include a namespace configuration. We do GitOps using Rancher Fleet, and the namespace platform is set there in fleet.yaml (see the sketch below).
- The volume size is not set in this config, so the default of 10Gi applies. Depending on your needs, set a different volume size (also sketched below) or adjust the retention period.
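For reference, a minimal fleet.yaml for such a setup could look like the following sketch; the repo URL is the official Grafana Helm repository, while the release name and values file path are assumptions you may need to adapt to your repository layout:
defaultNamespace: platform
helm:
  repo: https://grafana.github.io/helm-charts
  chart: loki
  releaseName: loki
  valuesFiles:
    - values.yaml
If the default 10Gi volume is too small for your retention period, the chart's singleBinary persistence settings should let you request a bigger volume or a specific storage class such as Longhorn. The exact keys may differ between chart versions, so verify them against your chart before using this sketch:
singleBinary:
  persistence:
    # request a larger data volume (default is 10Gi)
    size: 50Gi
    # optionally pin the PVC to a specific storage class
    storageClass: longhorn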
You now need to tell Grafana about Loki. This is done using a datasource definition. Grafana can be configured to load datasources from config maps. The definition in our case is:
apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-datasource
  namespace: cattle-monitoring-system
  labels:
    grafana_datasource: "1"
data:
  loki-datasource.yaml: |-
    apiVersion: 1
    datasources:
      - name: Loki
        type: loki
        access: proxy
        # Adjust if using a different namespace
        url: http://loki-gateway.platform.svc.cluster.local
        jsonData:
          timeout: 60
For Grafana to pick up new datasources, alerts and dashboards from config maps, the following (strongly recommended) configuration has to be part of the values of the rancher-monitoring / kube-prometheus-stack chart, so add it if it is not there yet.
grafana:
  sidecar:
    dashboards:
      enabled: true
      searchNamespace: ALL
    datasources:
      enabled: true
      searchNamespace: ALL
    alerts:
      enabled: true
      searchNamespace: ALL
You should now see Loki as a data source in Grafana and be able to search logs or dig through them with the recently added Explore / Drilldown features.
Caveats
- The setup does not contain any access control. As Loki is only reachable from within the Kubernetes cluster, this should be ok in many cases. But ensure that authentication is active for your Grafana UI.
- There is no filtering of your logs, so they may contain sensitive data. If this is a problem, either the logging of your workloads or the ClusterFlow may need tuning (a filter sketch follows this list).
- The default Loki dashboards don’t work well with the SingleBinary setup, so many visualizations show no data. Use the metrics explorer instead.
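If sensitive data in the logs is a concern, the logging operator's filters are one way to address it on the collection side. Below is only a sketch of a grep exclude filter added to the ClusterFlow spec from above, assuming the plugin syntax of your operator version matches the upstream docs; the key and pattern are placeholders, so check which field actually holds the log message in your records:
spec:
  filters:
    - grep:
        exclude:
          # drop every line whose 'log' field matches the pattern
          - key: log
            pattern: /password/
  globalOutputRefs:
    - loki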
Happy searching!



