Prometheus¶
Add Prometheus Helm repository¶
Add Prometheus Helm repo as follows:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
We also need to add the kube-state-metrics Helm repo:
helm repo add kube-state-metrics https://kubernetes.github.io/kube-state-metrics
Update Helm repository list:
helm repo update
Run the following command to see the available charts:
helm search repo prometheus-community
Install Prometheus server (default) with Helm¶
helm install prometheus prometheus-community/prometheus --namespace monitoring --create-namespace
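Before exposing anything, you can optionally check that the release came up (a plain status check, nothing chart-specific):
helm status prometheus --namespace monitoring
kubectl get pods --namespace monitoring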
Expose the services and access their URLs:
Prometheus:
kubectl port-forward service/prometheus-server 8080:80 --namespace monitoring
Then access: http://localhost:8080
Alertmanager:
kubectl port-forward service/prometheus-alertmanager 8081:80 --namespace monitoring
Then access: http://localhost:8081
AlertManager¶
Configure Alert-Manager¶
AlertManager is the tool that sends notifications via mail or API. The Prometheus server holds the alert rules; when a rule triggers an alert, Alertmanager sends the notification.
The first step is to configure the Alertmanager config file (alertmanager.yml).
Here we can add some customization parameters such as:
- Global: where we set the notification sender, by SMTP (SendGrid, Gmail...) or API (Slack, WeChat...), as well as global timeouts.
- Route: the routing logic, i.e. how the notifications are sent and to whom.
- Inhibit Rules: rules that mute a set of alerts when another matching alert is already firing (see the sketch after this list).
- Receivers: the actual destinations. Maybe we need to send notifications to Slack, mail, or both; here we add as many receivers as we want.
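The sample below does not use inhibit rules, but as a rough sketch (the labels in "equal" are just an assumption), muting High alerts while a matching Critical alert is already firing could look like this:
inhibit_rules:
- source_match:
    severity: 'Critical'
  target_match:
    severity: 'High'
  equal: ['alertname', 'instance']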
In this sample I will set up an API receiver (via Slack) and a mail receiver (via SendGrid):
global:
  # Sendgrid SMTP properties.
  smtp_smarthost: 'smtp.sendgrid.net:587'
  smtp_from: 'Alertmanager <alertmanager@cosckoya.io>'
  smtp_auth_username: 'apikey'
  smtp_auth_password: '<API KEY HERE>'
  # Slack API properties
  slack_api_url: 'https://hooks.slack.com/services/<API TOKEN HERE>'
receivers:
  - name: mail
    email_configs:
    - to: "admins@mail.me"
      headers:
        Subject: "Alert ({{ .Status }}): {{ .CommonLabels.severity }} {{ .CommonAnnotations.message }} ({{ .CommonLabels.alertname }})"
      html: |
        Greetings,
        <p>
        You have the following firing alerts:
        <ul>
        {{ range .Alerts }}
        <li>{{.Labels.alertname}} on {{.Labels.instance}}</li>
        <li>Labels:</li>
        <li>{{ range .Labels.SortedPairs }} - {{ .Name }} = {{ .Value }}</li>
        <li>{{ end }}Annotations:</li>
        <li>{{ range .Annotations.SortedPairs }} - {{ .Name }} = {{ .Value }}</li>
        <li>{{ end }}---</li>
        {{ end }}
        </ul>
        </p>
  - name: slack
    slack_configs:
    - channel: '#alerting'
      send_resolved: true
      icon_url: https://avatars3.githubusercontent.com/u/3380462
      title: |-
       [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .CommonLabels.alertname }} for {{ .CommonLabels.job }}
       {{- if gt (len .CommonLabels) (len .GroupLabels) -}}
         {{" "}}(
         {{- with .CommonLabels.Remove .GroupLabels.Names }}
           {{- range $index, $label := .SortedPairs -}}
             {{ if $index }}, {{ end }}
             {{- $label.Name }}="{{ $label.Value -}}"
           {{- end }}
         {{- end -}}
         )
       {{- end }}
      text: >-
       {{ range .Alerts -}}
       *Alert:* {{ .Annotations.title }}{{ if .Labels.severity }} - `{{ .Labels.severity }}`{{ end }}
       *Description:* {{ .Annotations.description }}
       *Details:*
         {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
         {{ end }}
       {{ end }}
route:
    group_wait: 10s
    group_interval: 5m
    receiver: mail
    repeat_interval: 10s
    routes:
    - match:
        #alertname: DeadMansSwitch
        severity: High
      repeat_interval: 1m
      receiver: slack
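Before loading this file, it can be validated with amtool, which ships with Alertmanager (the filename is just an example):
amtool check-config alertmanager.yml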
Now we have to set up some alert rules. These are just for demonstration, to verify that alert triggering works.
  groups:
  - name: deadman.rules
    rules:
    - alert: deadman
      expr: vector(1)
      for: 10s
      labels:
        severity: Critical
      annotations:
        message: Dummy Check
        summary: This is a dummy check meant to ensure that the entire alerting pipeline is functional.
  - name: pod.rules
    rules:
    - alert: PodLimit
      expr: kubelet_running_pods > 0
      for: 15m
      labels:
        severity: High
      annotations:
        message: 'Kubelet {{$labels.instance}} is running {{$value}} pods.'
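As with the Alertmanager config, the rule file can be checked with promtool before applying it (again, the filename is only an example):
promtool check rules alerting-rules.yml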
The last step is to set up the "alerting" section in the Prometheus server config file (prometheus.yml):
[...]
  alerting:
    alertmanagers:
      - static_configs:
        - targets:
          - prometheus-alertmanager:80
[...]
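If Prometheus was installed with the Helm chart as above, one way to ship the Alertmanager config and the alert rules is a values override. The file name below is hypothetical and the keys match older prometheus-community/prometheus chart versions that still bundle Alertmanager (newer versions split Alertmanager into its own chart), so check your chart's values first; depending on the chart version, the prometheus.yml alerting target may already be wired for you.
# custom-values.yaml (hypothetical name)
alertmanagerFiles:
  alertmanager.yml:
    # paste the global / receivers / route blocks from above here
serverFiles:
  alerting_rules.yml:
    # paste the rule groups from above here
helm upgrade prometheus prometheus-community/prometheus --namespace monitoring -f custom-values.yaml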
REFERENCES¶
- https://awesome-prometheus-alerts.grep.to/rules.html
Install InfluxDBv2 in Kubernetes¶
kubectl create namespace monitoring
Deploy Service Account¶
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cthulhu
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cthulhu-clusterrole
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: cthulhu
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cthulhu-role
  namespace: monitoring
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: cthulhu
  namespace: monitoring
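Save these manifests to a file (the name is arbitrary, e.g. rbac.yaml) and apply them; the same kubectl apply pattern works for every manifest in the sections below:
kubectl apply -f rbac.yaml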
Deploy our InfluxDB v2 server¶
apiVersion: v1
kind: Service
metadata:
  name: influxdb
  namespace: monitoring
spec:
  type: ClusterIP
  selector:
    app: influxdb
  ports:
  - name: api
    port: 9999
  - name: gui
    port: 8086
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: influxdb
  namespace: monitoring
spec:
  serviceName: "influxdb"
  selector:
    matchLabels:
      app: influxdb
  template:
    metadata:
      labels:
        app: influxdb
    spec:
      serviceAccountName: cthulhu
      containers:
      - name: influxdb
        image: quay.io/influxdb/influxdb:v2.0.3
        resources:
          limits:
            memory: "128Mi"
            cpu: 1
        ports:
        - name: api
          containerPort: 9999
        - name: gui
          containerPort: 8086
        volumeMounts:
        - name: data
          mountPath: /root/.influxdbv2
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
        volumeMode: Filesystem
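Once applied, a quick sanity check that the StatefulSet and its Service are up (names match the manifests above):
kubectl --namespace monitoring rollout status statefulset/influxdb
kubectl --namespace monitoring get svc influxdb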
Run a Job to configure the server¶
apiVersion: batch/v1
kind: Job
metadata:
  name: influxdb-setup
  namespace: monitoring
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: create-credentials
          image: quay.io/influxdb/influxdb:v2.0.3
          command:
            - influx
          args:
            - setup
            - --host
            - http://influxdb.monitoring:8086
            - --bucket
            - kubernetes
            - --org
            - InfluxData
            - --password
            - yogsothoth
            - --username
            - admin
            - --token
            - secret-token
            - --force
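The Job runs the influx setup once; its outcome can be checked with:
kubectl --namespace monitoring get job influxdb-setup
kubectl --namespace monitoring logs job/influxdb-setup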
Deploy Telegraf¶
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: telegraf
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: telegraf
  template:
    metadata:
      labels:
        app: telegraf
    spec:
      serviceAccountName: cthulhu
      volumes:
      - name: config
        configMap:
          name: telegraf
      containers:
      - name: telegraf
        image: telegraf:alpine
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - name: config
          mountPath: /etc/telegraf/telegraf.conf
          subPath: telegraf.conf
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: telegraf
  namespace: monitoring
data:
  telegraf.conf: |
    [global_tags]
      infra = "minikube"
    [agent]
      interval            = "10s"
      round_interval      = true
      metric_batch_size   = 1000
      metric_buffer_limit = 10000
      collection_jitter   = "0s"
      flush_interval      = "10s"
      flush_jitter        = "0s"
      precision           = ""
      debug               = false
      quiet               = false
      logfile             = ""
      hostname            = "telegraf"
      omit_hostname       = false
    [[outputs.influxdb_v2]]
      urls         = ["http://influxdb:8086"]
      organization = "InfluxData"
      bucket       = "kubernetes"
      token        = "secret-token"
    [[inputs.cpu]]
      percpu           = true
      totalcpu         = true
      collect_cpu_time = false
      report_active    = false
    [[inputs.disk]]
       ignore_fs = ["rootfs","tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]
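After applying, confirm the DaemonSet is running and, as a spot check, that Telegraf is not logging write errors:
kubectl --namespace monitoring rollout status daemonset/telegraf
kubectl --namespace monitoring logs daemonset/telegraf --tail=20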
Log in with the "username" & "password" from the setup Job¶
kubectl port-forward service/influxdb 8086:8086
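With the port-forward in place, open http://localhost:8086 and log in with the credentials set in the setup Job (admin / yogsothoth in this sample). If the influx CLI (v2) is installed locally, the API can also be pinged:
influx ping --host http://localhost:8086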