docs: Document monitoring stack (Prometheus + Loki + Grafana)

- Installed kube-prometheus-stack + loki-stack on rnk-wrk01
- Grafana Ingress with nip.io and int.elbpro.de
- Storage: 3x Longhorn PVCs (5Gi Grafana, 20Gi Prometheus, 20Gi Loki)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
docs/12-monitoring.md
# 12 — Monitoring Stack (Prometheus + Loki + Grafana)

**Date:** 2026-03-20
**Namespace:** monitoring
**Node:** rnk-wrk01 (core components pinned via nodeSelector)

---

## Overview

| Component | Role |
|---|---|
| **Prometheus** | Scrapes and stores metrics, evaluates queries and rules |
| **Grafana** | Visualizes metrics and logs |
| **Alertmanager** | Receives alerts from Prometheus, groups them and routes notifications |
| **Node Exporter** | System metrics from all nodes (CPU, RAM, disk) |
| **kube-state-metrics** | Exposes the state of Kubernetes objects as metrics |
| **Loki** | Log aggregation (like Prometheus, but for logs) |
| **Promtail** | Log collector; runs as a DaemonSet on every node |

---

## Installation

### Step 1: Add the Helm repositories

```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
```

### Step 2: Create the namespace

```bash
kubectl create namespace monitoring
```

### Step 3: Install kube-prometheus-stack

```bash
# The accessModes[0] key is quoted so that shells such as zsh do not treat [0] as a glob pattern.
helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set grafana.adminPassword=bmw520AUDI \
  --set prometheus.prometheusSpec.nodeSelector."kubernetes\.io/hostname"=rnk-wrk01 \
  --set grafana.nodeSelector."kubernetes\.io/hostname"=rnk-wrk01 \
  --set alertmanager.alertmanagerSpec.nodeSelector."kubernetes\.io/hostname"=rnk-wrk01 \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.storageClassName=longhorn \
  --set 'prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.accessModes[0]=ReadWriteOnce' \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=20Gi \
  --set grafana.persistence.enabled=true \
  --set grafana.persistence.storageClassName=longhorn \
  --set grafana.persistence.size=5Gi \
  --wait --timeout=300s
```
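
The same settings can also live in a Helm values file, which avoids the long `--set` chain and is easier to version (for example for the planned GitOps setup). This is a sketch, assuming the keys map one-to-one onto the `--set` flags above; the file name `values-kube-prometheus-stack.yaml` is a hypothetical choice, and the Grafana admin password is deliberately left out so the file can be committed.

```bash
#!/usr/bin/env bash
# Write the --set flags of the kube-prometheus-stack install as a Helm values file.
# File name is hypothetical; the Grafana admin password is intentionally omitted.
set -euo pipefail

cat > values-kube-prometheus-stack.yaml <<'EOF'
prometheus:
  prometheusSpec:
    nodeSelector:
      kubernetes.io/hostname: rnk-wrk01
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: longhorn
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 20Gi
alertmanager:
  alertmanagerSpec:
    nodeSelector:
      kubernetes.io/hostname: rnk-wrk01
grafana:
  nodeSelector:
    kubernetes.io/hostname: rnk-wrk01
  persistence:
    enabled: true
    storageClassName: longhorn
    size: 5Gi
EOF

echo "wrote values-kube-prometheus-stack.yaml"
```

The install would then look like `helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack --namespace monitoring -f values-kube-prometheus-stack.yaml --set grafana.adminPassword=... --wait --timeout=300s`, with the password supplied separately.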

### Step 4: Install Loki + Promtail

```bash
helm upgrade --install loki grafana/loki-stack \
  --namespace monitoring \
  --set loki.persistence.enabled=true \
  --set loki.persistence.storageClassName=longhorn \
  --set loki.persistence.size=20Gi \
  --set promtail.enabled=true \
  --wait --timeout=300s
```
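
As with the kube-prometheus-stack release, the Loki flags can be kept in a values file. A sketch, assuming the `loki-stack` keys shown in the `--set` flags; the file name `values-loki-stack.yaml` is hypothetical, and the commented-out `nodeSelector` block is an untested assumption for pinning Loki to rnk-wrk01 instead of letting it land on another node.

```bash
#!/usr/bin/env bash
# Write the --set flags of the loki-stack install as a Helm values file.
# File name is hypothetical; the commented nodeSelector is an optional, untested assumption.
set -euo pipefail

cat > values-loki-stack.yaml <<'EOF'
loki:
  persistence:
    enabled: true
    storageClassName: longhorn
    size: 20Gi
  # Optional (assumption): pin Loki to the same node as the other core components.
  # nodeSelector:
  #   kubernetes.io/hostname: rnk-wrk01
promtail:
  enabled: true
EOF

echo "wrote values-loki-stack.yaml"
```

It would be passed to the install with `-f values-loki-stack.yaml` in place of the four `--set` flags.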

### Step 5: Create the Grafana Ingress

```bash
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-ingress
  namespace: monitoring
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: web
spec:
  ingressClassName: traefik
  rules:
    - host: grafana.192.168.11.180.nip.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: kube-prometheus-stack-grafana
                port:
                  number: 80
    - host: grafana.int.elbpro.de
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: kube-prometheus-stack-grafana
                port:
                  number: 80
EOF
```
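
The two rules differ only in the host name. As a sketch, the manifest can be generated from a host list so that adding another name is a one-line change; the output file `grafana-ingress.yaml` is a hypothetical name, and everything else is copied from the manifest above.

```bash
#!/usr/bin/env bash
# Generate the Grafana Ingress manifest from a list of host names.
# Output file name is hypothetical; apply it with kubectl afterwards.
set -euo pipefail

HOSTS=("grafana.192.168.11.180.nip.io" "grafana.int.elbpro.de")
OUT="grafana-ingress.yaml"

{
  # Static part: metadata, Traefik entrypoint annotation and ingress class.
  cat <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-ingress
  namespace: monitoring
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: web
spec:
  ingressClassName: traefik
  rules:
EOF
  # One rule per host, each routing / to the Grafana Service on port 80.
  for host in "${HOSTS[@]}"; do
    cat <<EOF
    - host: ${host}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: kube-prometheus-stack-grafana
                port:
                  number: 80
EOF
  done
} > "$OUT"

echo "wrote $OUT"
```

The generated file is then applied with `kubectl apply -f grafana-ingress.yaml`.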

---

## Credentials

| Field | Value |
|---|---|
| **URL (nip.io)** | http://grafana.192.168.11.180.nip.io |
| **URL (internal)** | http://grafana.int.elbpro.de |
| **Username** | `admin` |
| **Password** | `bmw520AUDI` |
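
Instead of relying on this table, the admin password that Helm last set can be read back from the Grafana Secret. A sketch, assuming the default names the chart creates for the release `kube-prometheus-stack` (Secret `kube-prometheus-stack-grafana`, key `admin-password`); `grafana_admin_password` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Print the Grafana admin password stored in the chart's Secret.
# Assumes default naming: Secret "kube-prometheus-stack-grafana", key "admin-password".
# grafana_admin_password is a hypothetical helper name.
grafana_admin_password() {
  kubectl get secret kube-prometheus-stack-grafana \
    --namespace monitoring \
    -o jsonpath='{.data.admin-password}' \
  | base64 --decode
}
```

Usage: `grafana_admin_password; echo` (the trailing `echo` only adds a newline after the decoded value).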

---

## Pod placement

```bash
kubectl get pods -n monitoring -o wide
```

| Pod | Node | Note |
|---|---|---|
| alertmanager-* | rnk-wrk01 | nodeSelector set |
| kube-prometheus-stack-grafana-* | rnk-wrk01 | nodeSelector set |
| prometheus-* | rnk-wrk01 | nodeSelector set |
| kube-state-metrics-* | any | no nodeSelector |
| kube-prometheus-stack-operator-* | any | no nodeSelector |
| node-exporter-* | all 3 nodes | DaemonSet |
| loki-0 | any | no nodeSelector |
| loki-promtail-* | all 3 nodes | DaemonSet |
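
To check that the three pinned components actually run on rnk-wrk01, the pod-to-node mapping can be filtered by name prefix. A sketch, assuming the default pod name prefixes from the table above; `check_placement` is a hypothetical helper that returns non-zero if any pinned pod is elsewhere.

```bash
#!/usr/bin/env bash
# Check that the nodeSelector-pinned pods (Alertmanager, Prometheus, Grafana)
# run on the expected node; returns non-zero if any of them is scheduled elsewhere.
# check_placement is a hypothetical helper; name prefixes follow the table above.
check_placement() {
  local node="${1:-rnk-wrk01}"
  kubectl get pods --namespace monitoring --no-headers \
    -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName \
  | awk -v want="$node" '
      # Anchored prefixes, so node-exporter pods
      # (kube-prometheus-stack-prometheus-node-exporter-*) do not match.
      $1 ~ /^(alertmanager-|prometheus-|kube-prometheus-stack-grafana-)/ {
        if ($2 == want) {
          print "ok:    " $1 " on " $2
        } else {
          print "WRONG: " $1 " on " $2
          bad = 1
        }
      }
      END { exit bad + 0 }'
}
```

Usage: `check_placement` (defaults to rnk-wrk01) or `check_placement <node>`. Custom columns are used instead of parsing `-o wide`, because the RESTARTS column can contain spaces (e.g. `2 (3h ago)`) and shift the field positions.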

---

## Storage (Longhorn PVCs)

```bash
kubectl get pvc -n monitoring
```

| PVC | Size | StorageClass |
|---|---|---|
| kube-prometheus-stack-grafana | 5Gi | longhorn |
| prometheus-...-db-prometheus-... | 20Gi | longhorn |
| storage-loki-0 | 20Gi | longhorn |
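
Together the three claims request 5 + 20 + 20 = 45Gi of Longhorn storage, before Longhorn replicas are taken into account. A sketch that sums the requested sizes from the cluster; it assumes sizes are given with a `Gi` or `Mi` suffix and skips other units, and `pvc_total_gi` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sum the storage requested by all PVCs in the monitoring namespace.
# Hypothetical helper; only Gi and Mi suffixes are handled, other units are skipped.
pvc_total_gi() {
  kubectl get pvc --namespace monitoring --no-headers \
    -o custom-columns=NAME:.metadata.name,SIZE:.spec.resources.requests.storage \
  | awk '
      $2 ~ /Gi$/ { sub(/Gi$/, "", $2); total += $2; next }
      $2 ~ /Mi$/ { sub(/Mi$/, "", $2); total += $2 / 1024; next }
      END { printf "%gGi\n", total }'
}
```

With the three PVCs from the table this would print `45Gi`.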

---

## Adding Loki to Grafana

When Loki is installed as a separate Helm release, it is **not** configured as a data source
in Grafana automatically. Add it manually:

1. Open Grafana → **Connections → Data sources → Add data source** (older Grafana versions:
   **Configuration → Data Sources**)
2. Type: **Loki**
3. URL: `http://loki:3100` (the `loki` Service in the `monitoring` namespace)
4. **Save & Test** → green check mark
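
Alternatively, the data source can be provisioned through the kube-prometheus-stack chart, so it is recreated together with Grafana instead of living only in Grafana's database. A sketch, assuming the chart's `grafana.additionalDataSources` value and the same `loki` Service URL as in the manual steps; the file name `values-grafana-loki-datasource.yaml` is hypothetical.

```bash
#!/usr/bin/env bash
# Write a values fragment that provisions Loki as a Grafana data source
# via kube-prometheus-stack (grafana.additionalDataSources).
# File name is hypothetical; the URL matches the manual steps above.
set -euo pipefail

cat > values-grafana-loki-datasource.yaml <<'EOF'
grafana:
  additionalDataSources:
    - name: Loki
      type: loki
      access: proxy
      url: http://loki:3100
EOF

echo "wrote values-grafana-loki-datasource.yaml"
```

It could be applied on top of the existing release with `helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack --namespace monitoring --reuse-values -f values-grafana-loki-datasource.yaml`.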

---

## Prometheus in Grafana (default)

Prometheus is already registered as a data source automatically
(configured by the kube-prometheus-stack chart).

---

## Recommended dashboards to import

In Grafana: **Dashboards → Import → enter the ID**

| Dashboard | ID | Content |
|---|---|---|
| Kubernetes Cluster | `6417` | CPU, RAM, pods, nodes |
| Node Exporter Full | `1860` | System metrics per node |
| Loki Logs | `13639` | Log explorer |
| Kubernetes Pods | `6336` | Pod status and resources |

---

## Helm releases

```bash
helm list -n monitoring
```

Expected output (abbreviated; the real output also shows the UPDATED and APP VERSION columns):

```
NAME                   NAMESPACE   REVISION  STATUS    CHART
kube-prometheus-stack  monitoring  1         deployed  kube-prometheus-stack-x.x.x
loki                   monitoring  1         deployed  loki-stack-x.x.x
```
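
For a scripted check instead of reading the table, the list of deployed releases can be compared against the two expected names. A sketch using `helm list --deployed --short`; `check_releases` is a hypothetical helper name and the release names are taken from the output above.

```bash
#!/usr/bin/env bash
# Check that both monitoring releases exist in the "deployed" state.
# check_releases is a hypothetical helper; release names follow the output above.
check_releases() {
  local deployed rc=0
  deployed="$(helm list --namespace monitoring --deployed --short)"
  for release in kube-prometheus-stack loki; do
    # -x: match the whole line, so "loki" does not match e.g. "loki-canary".
    if grep -qx "$release" <<<"$deployed"; then
      echo "ok:      $release"
    else
      echo "MISSING: $release (not deployed)"
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: `check_releases` after each install or upgrade; a non-zero exit status means at least one release is missing or not in the `deployed` state.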

---

## Known quirks

- **Node Exporter** and **Promtail** run as DaemonSets on all 3 nodes; the nodeSelectors set
  during installation do not apply to them (by design).
- **Loki** may end up on rnk-wrk02 because no nodeSelector is set for it. Acceptable for a
  homelab.
- **Grafana password:** set a new one with `helm upgrade`:

  ```bash
  helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack \
    --namespace monitoring \
    --reuse-values \
    --set grafana.adminPassword=NEW_PASSWORD
  ```

  Note: Grafana applies the configured admin password only when it first initializes its
  database. Because Grafana keeps its database on a persistent volume here, the upgrade most
  likely only updates the Secret; the login password then has to be reset inside the Grafana
  pod (e.g. via `kubectl exec`) with `grafana cli admin reset-admin-password NEW_PASSWORD`.

---

## Next steps

- [ ] Add Loki as a data source in Grafana manually (URL: `http://loki:3100`)
- [ ] Import dashboards (IDs: 6417, 1860, 13639, 6336)
- [ ] Add wildcard DNS `*.int.elbpro.de → 192.168.11.180` in Pi-hole
- [ ] Configure Alertmanager (email / Slack notifications)
- [ ] Bring Grafana under ArgoCD / GitOps
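
For the Pi-hole item, a wildcard record is typically expressed as a dnsmasq `address=/domain/ip` line, which matches the domain and all of its subdomains. A sketch that only writes such a line to a local file; the file name `02-int-elbpro.conf` is hypothetical, and whether Pi-hole reads extra files from `/etc/dnsmasq.d/` depends on its version and configuration (Pi-hole v6 reportedly requires the `misc.etc_dnsmasq_d` setting to be enabled), so check the Pi-hole documentation before copying it there.

```bash
#!/usr/bin/env bash
# Generate a dnsmasq snippet that resolves int.elbpro.de and all its subdomains
# to the ingress IP used above. Written to the current directory; installing it
# on the Pi-hole host is left as a manual step.
set -euo pipefail

DOMAIN="int.elbpro.de"
INGRESS_IP="192.168.11.180"
OUT="02-int-elbpro.conf"   # hypothetical file name

printf 'address=/%s/%s\n' "$DOMAIN" "$INGRESS_IP" > "$OUT"
cat "$OUT"   # prints: address=/int.elbpro.de/192.168.11.180
```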