Prometheus, the Monitoring Powerhouse

Stella981

Manually deploy StatefulSet-based Prometheus and Alertmanager clusters in Kubernetes, using a StorageClass to persist their data.

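This article uses a StorageClass to persist data while building a StatefulSet-based Prometheus federation cluster. There are many options for data persistence, such as Thanos, M3DB, InfluxDB and VictoriaMetrics; choose one that fits your needs. Later articles will cover data persistence in detail.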

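To deploy a Prometheus that is reachable from outside the cluster, first create (or pick) the Namespace Prometheus will run in; this article uses the existing kube-system. Then create the RBAC rules Prometheus uses and a ConfigMap that holds its configuration file. Next create a Service for the Pods. The one used here is headless (`clusterIP: None`), so each Pod gets a stable DNS name. After that, create the StatefulSet that runs the stateful Prometheus Pods. Finally, create an Ingress so Prometheus can be reached through an external domain name.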

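If your Kubernetes version is fairly old, consider upgrading it to make testing easier. The sealos tool can deploy a highly available cluster with a single command. Whether to also deploy kuboard is up to you.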

Environment

My local environment was deployed in one step with sealos, mainly to make testing easy.

| OS | Kubernetes | HostName | IP | Service |
|---|---|---|---|---|
| Ubuntu 18.04 | 1.17.7 | sealos-k8s-m1 | 192.168.1.151 | node-exporter, prometheus-federate-0 |
| Ubuntu 18.04 | 1.17.7 | sealos-k8s-m2 | 192.168.1.152 | node-exporter, grafana, alertmanager-0 |
| Ubuntu 18.04 | 1.17.7 | sealos-k8s-m3 | 192.168.1.150 | node-exporter, alertmanager-1 |
| Ubuntu 18.04 | 1.17.7 | sealos-k8s-node1 | 192.168.1.153 | node-exporter, prometheus-0, kube-state-metrics |
| Ubuntu 18.04 | 1.17.7 | sealos-k8s-node2 | 192.168.1.154 | node-exporter, prometheus-1 |
| Ubuntu 18.04 | 1.17.7 | sealos-k8s-node3 | 192.168.1.155 | node-exporter, prometheus-2 |

`# Label the master and worker nodes
# prometheus
kubectl label node sealos-k8s-node1 k8s-app=prometheus
kubectl label node sealos-k8s-node2 k8s-app=prometheus
kubectl label node sealos-k8s-node3 k8s-app=prometheus
# federate
kubectl label node sealos-k8s-m1 k8s-app=prometheus-federate
# alertmanager
kubectl label node sealos-k8s-m2 k8s-app=alertmanager
kubectl label node sealos-k8s-m3 k8s-app=alertmanager

# Create the deployment directories
mkdir /data/manual-deploy/ && cd /data/manual-deploy/
mkdir alertmanager  grafana  ingress-nginx  kube-state-metrics  node-exporter  prometheus

`

Deploy Prometheus

Create the StorageClass manifest for Prometheus.

`cat prometheus-data-storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: prometheus-lpv
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
`

Create the PV manifests for this StorageClass, pinning each PV to a specific node.

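`# On each node that will run a Prometheus Pod, create the data directory and give it to UID/GID 65534 (the runAsUser of the Prometheus container)
mkdir /data/prometheus
chown -R 65534:65534 /data/prometheus

cat prometheus-data-pv.yaml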
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-lpv-0
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: prometheus-lpv
  local:
    path: /data/prometheus
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - sealos-k8s-node1
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-lpv-1
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: prometheus-lpv
  local:
    path: /data/prometheus
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - sealos-k8s-node2
---          
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-lpv-2
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: prometheus-lpv
  local:
    path: /data/prometheus
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - sealos-k8s-node3
`
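With `volumeBindingMode: WaitForFirstConsumer`, a claim is bound only after its Pod has been scheduled. As a result, each local PV goes to whichever replica lands on its node. The following is a deliberately simplified Python model of that matching, not the actual scheduler or PV controller logic. `LocalPV` and `bind_claim` are hypothetical names, and capacities are in GiB.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LocalPV:
    name: str
    storage_class: str
    capacity_gi: int
    node: str            # value of the kubernetes.io/hostname node affinity
    bound: bool = False


def bind_claim(pvs: List[LocalPV], storage_class: str,
               request_gi: int, pod_node: str) -> Optional[str]:
    """Bind a claim once the Pod's node is known (WaitForFirstConsumer)."""
    candidates = [
        pv for pv in pvs
        if not pv.bound
        and pv.storage_class == storage_class
        and pv.node == pod_node           # a local volume is only usable on its own node
        and pv.capacity_gi >= request_gi  # the PV must be at least as large as the request
    ]
    if not candidates:
        return None  # in a real cluster the Pod would stay Pending
    chosen = min(candidates, key=lambda pv: pv.capacity_gi)  # smallest PV that fits
    chosen.bound = True
    return chosen.name


pvs = [
    LocalPV("prometheus-lpv-0", "prometheus-lpv", 10, "sealos-k8s-node1"),
    LocalPV("prometheus-lpv-1", "prometheus-lpv", 10, "sealos-k8s-node2"),
    LocalPV("prometheus-lpv-2", "prometheus-lpv", 10, "sealos-k8s-node3"),
]
# The 5Gi request from volumeClaimTemplates fits a 10Gi PV; a Pod on node3 gets prometheus-lpv-2.
print(bind_claim(pvs, "prometheus-lpv", 5, "sealos-k8s-node3"))  # prometheus-lpv-2
```

This explains the verification output further down, where `prometheus-data-prometheus-1` is bound to `prometheus-lpv-2`. The Pod ordinal and the PV suffix do not have to match.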

Create the RBAC manifest for Prometheus.

`cat prometheus-rbac.yaml
apiVersion: rbac.authorization.k8s.io/v1 # API version
kind: ClusterRole # resource kind
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources: # resources
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups:
  - extensions
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus # any name you like
  namespace: kube-system # namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef: # the role to bind
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus # the read-only ClusterRole defined above; cluster-admin would grant far more than scraping needs
subjects: # who receives the role
- kind: ServiceAccount
  name: prometheus
  namespace: kube-system
`
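RBAC is purely additive. A request is allowed if any rule in a bound role matches its verb and either its API group plus resource or its non-resource URL. Below is a minimal Python sketch of that check against the `prometheus` ClusterRole rules above. `allowed` is a hypothetical helper. Real RBAC evaluation also handles `resourceNames`, `*` wildcards and aggregated roles, which are left out here.

```python
# Rules copied from the prometheus ClusterRole above.
RULES = [
    {"apiGroups": [""],
     "resources": ["nodes", "nodes/proxy", "services", "endpoints", "pods"],
     "verbs": ["get", "list", "watch"]},
    {"apiGroups": ["extensions"], "resources": ["ingresses"],
     "verbs": ["get", "list", "watch"]},
    {"nonResourceURLs": ["/metrics"], "verbs": ["get"]},
]


def allowed(rules, verb, api_group=None, resource=None, url=None):
    """Return True if any rule permits the request (no deny rules exist in RBAC)."""
    for rule in rules:
        if verb not in rule["verbs"]:
            continue
        if url is not None:
            # Non-resource requests such as GET /metrics on the API server
            if url in rule.get("nonResourceURLs", []):
                return True
        elif (api_group in rule.get("apiGroups", [])
              and resource in rule.get("resources", [])):
            return True
    return False


print(allowed(RULES, "get", "", "nodes/proxy"))  # True: used by the node and cAdvisor proxy scrapes
print(allowed(RULES, "delete", "", "pods"))      # False: the role is read-only
print(allowed(RULES, "get", url="/metrics"))     # True: the kubernetes-apiservers job
```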

Create the ConfigMap for Prometheus.

`cat prometheus-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-system
data:
  prometheus.yml: |
    global:
      scrape_interval:     30s
      evaluation_interval: 30s
      external_labels:
        cluster: "01"
    scrape_configs:
    - job_name: 'kubernetes-apiservers'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
    - job_name: 'kubernetes-nodes'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics
    - job_name: 'kubernetes-cadvisor'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
      metric_relabel_configs:
      - action: replace
        source_labels: [id]
        regex: '^/machine\.slice/machine-rkt\\x2d([^\\]+)\\.+/([^/]+)\.service$'
        target_label: rkt_container_name
        replacement: '${2}-${1}'
      - action: replace
        source_labels: [id]
        regex: '^/system\.slice/(.+)\.service$'
        target_label: systemd_service_name
        replacement: '${1}'
    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name
      - source_labels: [__address__]
        action: replace
        target_label: instance
        regex: (.+):(.+)
        replacement: $1
`
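Most of the work in this configuration is done by `relabel_configs`. For `action: replace`, Prometheus first joins the `source_labels` values with `;`. It then matches the fully anchored regex against that string and writes the expanded `replacement` into `target_label`. If the regex does not match, the labels are left alone.

Below is a small Python sketch of that behaviour. It uses the `kubernetes-pods` rule that rewrites `__address__` to the port given in the `prometheus.io/port` annotation. The function names and the sample address are made up for illustration. Python's `re` stands in for Prometheus's RE2, and the two behave the same for these patterns.

```python
import re


def prom_to_python_repl(replacement: str) -> str:
    # Prometheus writes group references as $1 or ${1}; Python's re expects \g<1>.
    return re.sub(r"\$\{?(\d+)\}?", r"\\g<\1>", replacement)


def relabel_replace(labels, source_labels, regex, target_label,
                    replacement="$1", separator=";"):
    """Minimal model of one relabel_configs entry with action: replace."""
    # Missing source labels count as empty strings.
    value = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, value)  # Prometheus anchors the regex at both ends
    if match is None:
        return dict(labels)  # no match: labels stay unchanged
    result = dict(labels)
    result[target_label] = match.expand(prom_to_python_repl(replacement))
    return result


# Hypothetical Pod target discovered on port 8080 and annotated prometheus.io/port: "9102"
target = {
    "__address__": "10.244.1.5:8080",
    "__meta_kubernetes_pod_annotation_prometheus_io_port": "9102",
}
rewritten = relabel_replace(
    target,
    ["__address__", "__meta_kubernetes_pod_annotation_prometheus_io_port"],
    r"([^:]+)(?::\d+)?;(\d+)",
    "__address__",
    "$1:$2",
)
print(rewritten["__address__"])  # 10.244.1.5:9102
```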

Create the StatefulSet manifest for Prometheus.

`cat prometheus-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: kube-system
  labels:
    k8s-app: prometheus
    kubernetes.io/cluster-service: "true"
spec:
  serviceName: "prometheus"
  podManagementPolicy: "Parallel"
  replicas: 3
  selector:
    matchLabels:
      k8s-app: prometheus
  template:
    metadata:
      labels:
        k8s-app: prometheus
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: k8s-app
                operator: In
                values:
                - prometheus
            topologyKey: "kubernetes.io/hostname"
      priorityClassName: system-cluster-critical
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: prometheus-server-configmap-reload
        image: "jimmidyson/configmap-reload:v0.4.0"
        imagePullPolicy: "IfNotPresent"
        args:
          - --volume-dir=/etc/config
          - --webhook-url=http://localhost:9090/-/reload
        volumeMounts:
          - name: config-volume
            mountPath: /etc/config
            readOnly: true
        resources:
          limits:
            cpu: 10m
            memory: 10Mi
          requests:
            cpu: 10m
            memory: 10Mi
      - image: prom/prometheus:v2.20.0
        imagePullPolicy: IfNotPresent
        name: prometheus
        command:
          - "/bin/prometheus"
        args:
          - "--config.file=/etc/prometheus/prometheus.yml"
          - "--storage.tsdb.path=/prometheus"
          - "--storage.tsdb.retention=24h"
          - "--web.console.libraries=/etc/prometheus/console_libraries"
          - "--web.console.templates=/etc/prometheus/consoles"
          - "--web.enable-lifecycle"
        ports:
          - containerPort: 9090
            protocol: TCP
        volumeMounts:
          - mountPath: "/prometheus"
            name: prometheus-data
          - mountPath: "/etc/prometheus"
            name: config-volume
        readinessProbe:
          httpGet:
            path: /-/ready
            port: 9090
          initialDelaySeconds: 30
          timeoutSeconds: 30
        livenessProbe:
          httpGet:
            path: /-/healthy
            port: 9090
          initialDelaySeconds: 30
          timeoutSeconds: 30
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 1000m
            memory: 2500Mi
        securityContext:
            runAsUser: 65534
            privileged: true
      serviceAccountName: prometheus
      volumes:
        - name: config-volume
          configMap:
            name: prometheus-config
  volumeClaimTemplates:
    - metadata:
        name: prometheus-data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "prometheus-lpv"
        resources:
          requests:
            storage: 5Gi
`
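StatefulSet Pods get stable ordinal names. Each `volumeClaimTemplates` entry becomes one PVC per Pod, named `<template>-<statefulset>-<ordinal>`. That is where names such as `prometheus-data-prometheus-0` in the verification output come from. Here is a short Python sketch of the naming scheme; the helper names are hypothetical.

```python
def statefulset_pod_names(statefulset: str, replicas: int):
    """Pods of a StatefulSet are named <statefulset>-<ordinal>, starting at 0."""
    return [f"{statefulset}-{i}" for i in range(replicas)]


def statefulset_pvc_names(claim_template: str, statefulset: str, replicas: int):
    """Each volumeClaimTemplate yields one PVC per Pod: <template>-<statefulset>-<ordinal>."""
    return [f"{claim_template}-{statefulset}-{i}" for i in range(replicas)]


for pod, pvc in zip(statefulset_pod_names("prometheus", 3),
                    statefulset_pvc_names("prometheus-data", "prometheus", 3)):
    print(pod, "->", pvc)
# prometheus-0 -> prometheus-data-prometheus-0
# prometheus-1 -> prometheus-data-prometheus-1
# prometheus-2 -> prometheus-data-prometheus-2
```

Because the PVC name depends only on the template, the StatefulSet name and the ordinal, a recreated `prometheus-0` reattaches to the same claim and keeps its data.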

Create the Service manifest for Prometheus.

`cat prometheus-service-statefulset.yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: kube-system
spec:
  ports:
    - name: prometheus
      port: 9090
      targetPort: 9090
  selector:
    k8s-app: prometheus
  clusterIP: None
`
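`serviceName: "prometheus"` in the StatefulSet points at this headless Service. As a result, each Pod also gets its own DNS record of the form `<pod>.<service>.<namespace>.svc.<cluster-domain>`. Below is a Python sketch that builds those names. The `cluster.local` default for the cluster domain is an assumption; check the kubelet's `clusterDomain` setting in your cluster.

```python
def pod_dns_names(statefulset: str, service: str, namespace: str,
                  replicas: int, cluster_domain: str = "cluster.local"):
    """Per-Pod DNS names that a headless Service provides for a StatefulSet.

    cluster_domain is an assumption: cluster.local is only the common default.
    """
    return [f"{statefulset}-{i}.{service}.{namespace}.svc.{cluster_domain}"
            for i in range(replicas)]


for name in pod_dns_names("prometheus", "prometheus", "kube-system", 3):
    print(name)
# prometheus-0.prometheus.kube-system.svc.cluster.local
# prometheus-1.prometheus.kube-system.svc.cluster.local
# prometheus-2.prometheus.kube-system.svc.cluster.local
```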

Deploy the Prometheus manifests created above.

`cd /data/manual-deploy/prometheus
ls
prometheus-configmap.yaml            # ConfigMap
prometheus-data-pv.yaml              # PV
prometheus-data-storageclass.yaml    # StorageClass
prometheus-rbac.yaml                 # RBAC
prometheus-service-statefulset.yaml  # Service
prometheus-statefulset.yaml          # StatefulSet
# Deploy everything
kubectl apply -f .
`

Verify the PV/PVC bindings and the deployment status of Prometheus.

`kubectl get pv
NAME               CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS     REASON   AGE
prometheus-lpv-0   10Gi       RWO            Retain           Available           prometheus-lpv            6m28s
prometheus-lpv-1   10Gi       RWO            Retain           Available           prometheus-lpv            6m28s
prometheus-lpv-2   10Gi       RWO            Retain           Available           prometheus-lpv            6m28s
kubectl -n kube-system get pvc 
NAME                           STATUS   VOLUME             CAPACITY   ACCESS MODES   STORAGECLASS     AGE
prometheus-data-prometheus-0   Bound    prometheus-lpv-0   10Gi       RWO            prometheus-lpv   2m16s
prometheus-data-prometheus-1   Bound    prometheus-lpv-2   10Gi       RWO            prometheus-lpv   2m16s
prometheus-data-prometheus-2   Bound    prometheus-lpv-1   10Gi       RWO            prometheus-lpv   2m16s

kubectl -n kube-system get pod prometheus-{0..2}
NAME           READY   STATUS    RESTARTS   AGE
prometheus-0   2/2     Running   0          3m16s
prometheus-1   2/2     Running   0          3m16s
prometheus-2   2/2     Running   0          3m16s

`
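To script the PVC check, `kubectl ... -o json` is easier to parse than the table output, because each claim carries `status.phase`. Here is a minimal Python sketch. `unbound_pvcs` is a hypothetical helper that takes the JSON text as input.

```python
import json


def unbound_pvcs(kubectl_json: str):
    """Names of PVCs whose status.phase is not Bound.

    Expects the text printed by: kubectl -n kube-system get pvc -o json
    """
    items = json.loads(kubectl_json).get("items", [])
    return [
        item["metadata"]["name"]
        for item in items
        if item.get("status", {}).get("phase") != "Bound"
    ]
```

For example, pass it `subprocess.run(["kubectl", "-n", "kube-system", "get", "pvc", "-o", "json"], capture_output=True, text=True).stdout`. An empty list means every claim is Bound.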

Deploy Node Exporter

Create the node-exporter DaemonSet manifest.

`cd /data/manual-deploy/node-exporter/
cat node-exporter.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-system
  labels:
    k8s-app: node-exporter
spec:
  selector:
    matchLabels:
        k8s-app: node-exporter
  template:
    metadata:
      labels:
        k8s-app: node-exporter
    spec:
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/master
      containers:
      - image: quay.io/prometheus/node-exporter:v1.0.0
        imagePullPolicy: IfNotPresent
        name: prometheus-node-exporter
        ports:
        - containerPort: 9100
          hostPort: 9100
          protocol: TCP
          name: metrics
        volumeMounts:
        - mountPath: /host/proc
          name: proc
        - mountPath: /host/sys
          name: sys
        - mountPath: /host
          name: rootfs
        args:
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        - --path.rootfs=/host
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys
        - name: rootfs
          hostPath:
            path: /
      hostNetwork: true
      hostPID: true
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: "true"
  labels:
    k8s-app: node-exporter
  name: node-exporter
  namespace: kube-system
spec:
  ports:
  - name: http
    port: 9100
    protocol: TCP
  selector:
    k8s-app: node-exporter`
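node-exporter serves plain-text metrics on port 9100 of every node, because the DaemonSet uses `hostNetwork` and `hostPort: 9100`. For sealos-k8s-m1, for example, that is `http://192.168.1.151:9100/metrics`. Below is a tiny Python parser for that text format, handy for a quick sanity check. It is only a sketch: it ignores timestamps and escaped characters in label values. The sample metrics are illustrative, not output from this cluster.

```python
def parse_metrics(text: str):
    """Very small parser for the Prometheus text exposition format.

    Handles 'name value' and 'name{labels} value' lines and skips # comments;
    timestamps and escaped quotes in label values are not supported.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")  # the value is the last field
        samples[series] = float(value)
    return samples


sample = """# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_filesystem_avail_bytes{device="/dev/sda1",mountpoint="/"} 1.2e+10
"""
m = parse_metrics(sample)
print(m["node_load1"])  # 0.42
```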

Deploy

`cd /data/manual-deploy/node-exporter/
kubectl apply -f node-exporter.yaml
`

Verify the status

`kubectl -n kube-system get pod |grep node-exporter
node-exporter-45s2q                    2/2     Running   0          6h43m
node-exporter-f4rrw                    2/2     Running   0          6h43m
node-exporter-hvtzj                    2/2     Running   0          6h43m
node-exporter-nlvfq                    2/2     Running   0          6h43m
node-exporter-qbd2q                    2/2     Running   0          6h43m
node-exporter-zjrh4                    2/2     Running   0          6h43m

`

Deploy kube-state-metrics

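The kubelet already embeds cAdvisor, which collects system-level metrics such as CPU, memory, network, disk and container usage. It cannot, however, collect metrics about Kubernetes resource objects, such as the number of Pods and their status. kube-state-metrics fills that gap.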

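kube-state-metrics watches the Kubernetes API server through list/watch calls, which is why its ClusterRole below grants the `list` and `watch` verbs. From that data it generates metrics that describe the state of resource objects: CronJob, DaemonSet, Deployment, Job, LimitRange, Node, PersistentVolume, PersistentVolumeClaim, Pod, PodDisruptionBudget, ReplicaSet, ReplicationController, ResourceQuota, Service, StatefulSet, Namespace, HorizontalPodAutoscaler, Endpoints, Secret, ConfigMap, Ingress and CertificateSigningRequest.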
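Here is an example of what "metrics about resource objects" means. kube-state-metrics exposes `kube_pod_status_phase`, with one series per Pod and phase; the value is 1 for the Pod's current phase and 0 for the others. Below is a simplified Python sketch that produces the same shape from hypothetical Pod records. The real exporter builds these series from its watch cache and may add more labels in newer versions.

```python
PHASES = ["Pending", "Running", "Succeeded", "Failed", "Unknown"]


def pod_phase_metrics(pods):
    """Emit kube_pod_status_phase-style lines: 1 for the current phase, 0 otherwise."""
    lines = []
    for pod in pods:
        for phase in PHASES:
            value = 1 if pod["phase"] == phase else 0
            lines.append(
                f'kube_pod_status_phase{{namespace="{pod["namespace"]}",'
                f'pod="{pod["name"]}",phase="{phase}"}} {value}'
            )
    return lines


# Hypothetical input record
for line in pod_phase_metrics([{"namespace": "kube-system",
                                "name": "prometheus-0", "phase": "Running"}]):
    print(line)
```

In PromQL, a query such as `sum(kube_pod_status_phase{phase="Running"})` then counts running Pods.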

`cd /data/manual-deploy/kube-state-metrics/
cat kube-state-metrics-rbac.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: kube-system
  name: kube-state-metrics-resizer
rules:
- apiGroups: [""]
  resources:
  - pods
  verbs: ["get"]
- apiGroups: ["apps"]
  resources:
  - deployments
  resourceNames: ["kube-state-metrics"]
  verbs: ["get", "update"]
- apiGroups: ["extensions"]
  resources:
  - deployments
  resourceNames: ["kube-state-metrics"]
  verbs: ["get", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kube-state-metrics
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kube-state-metrics-resizer
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
- apiGroups: [""]
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs: ["list", "watch"]
- apiGroups: ["extensions"]
  resources:
  - daemonsets
  - deployments
  - replicasets
  - ingresses
  verbs: ["list", "watch"]
- apiGroups: ["apps"]
  resources:
  - daemonsets
  - deployments
  - replicasets
  - statefulsets
  verbs: ["list", "watch"]
- apiGroups: ["batch"]
  resources:
  - cronjobs
  - jobs
  verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
  resources:
  - horizontalpodautoscalers
  verbs: ["list", "watch"]
- apiGroups: ["policy"]
  resources:
  - poddisruptionbudgets
  verbs: ["list", "watch"]
- apiGroups: ["certificates.k8s.io"]
  resources:
  - certificatesigningrequests
  verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: kube-system
`

Create the kube-state-metrics Deployment manifest.

`cat kube-state-metrics-deloyment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: kube-state-metrics
  replicas: 1
  template:
    metadata:
      labels:
        k8s-app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
        image: quay.io/coreos/kube-state-metrics:v1.6.0
        ports:
        - name: http-metrics
          containerPort: 8080
        - name: telemetry
          containerPort: 8081
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
      - name: addon-resizer
        image: k8s.gcr.io/addon-resizer:1.8.4
        resources:
          limits:
            cpu: 150m
            memory: 50Mi
          requests:
            cpu: 150m
            memory: 50Mi
        env:
          - name: MY_POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: MY_POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
        command:
          - /pod_nanny
          - --container=kube-state-metrics
          - --cpu=100m
          - --extra-cpu=1m
          - --memory=100Mi
          - --extra-memory=2Mi
          - --threshold=5
          - --deployment=kube-state-metrics
---
apiVersion: v1
kind: Service
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    k8s-app: kube-state-metrics
  annotations:
    prometheus.io/scrape: 'true'
spec:
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
    protocol: TCP
  - name: telemetry
    port: 8081
    targetPort: telemetry
    protocol: TCP
  selector:
    k8s-app: kube-state-metrics
`
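The `addon-resizer` sidecar (`/pod_nanny`) scales the kube-state-metrics container's resources with cluster size. Roughly, it computes `base + extra * number_of_nodes` from the `--cpu`/`--extra-cpu` and `--memory`/`--extra-memory` flags. It rewrites the Deployment only when the current value falls outside the `--threshold` percentage. The Python sketch below approximates that arithmetic; it is not the tool's actual code.

```python
def nanny_estimate(base: int, extra_per_node: int, nodes: int) -> int:
    """Linear estimate: base + extra_per_node * nodes (same unit as the flags)."""
    return base + extra_per_node * nodes


def needs_resize(current: float, expected: float, threshold_pct: float = 5) -> bool:
    """True when current deviates from expected by more than threshold_pct percent."""
    return abs(current - expected) > expected * threshold_pct / 100


nodes = 6  # the six hosts in the environment table
cpu_m = nanny_estimate(100, 1, nodes)   # --cpu=100m, --extra-cpu=1m (millicores)
mem_mi = nanny_estimate(100, 2, nodes)  # --memory=100Mi, --extra-memory=2Mi (MiB)
print(cpu_m, mem_mi)                    # 106 112
print(needs_resize(100, cpu_m))         # True: 100m is more than 5% below 106m
```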

Deploy

`kubectl apply -f kube-state-metrics-rbac.yaml
kubectl apply -f kube-state-metrics-deloyment.yaml
`

Verify

`kubectl -n kube-system get pod |grep kube-state-metrics
kube-state-metrics-657d8d6669-bqbs8        2/2     Running   0          4h
`

The kube-state-metrics Service carries the annotation `prometheus.io/scrape: "true"`, so the `kubernetes-service-endpoints` job in the Prometheus configuration discovers it automatically. The Service has no `prometheus.io/port` annotation, so both of its ports (8080 and 8081) are discovered as separate targets.
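Kubernetes service discovery exposes each annotation as a meta label, `__meta_kubernetes_<object>_annotation_<name>`. Characters that are invalid in label names, such as `.` and `/`, are replaced by `_`. The `keep` action then drops every target whose value does not fully match the regex. Below is a Python sketch of those two steps; the helper names and the second Service are hypothetical.

```python
import re


def annotation_meta_label(obj: str, annotation: str) -> str:
    """Meta label name Kubernetes SD uses for an annotation on a Service or Pod."""
    return f"__meta_kubernetes_{obj}_annotation_" + re.sub(r"[^a-zA-Z0-9_]", "_", annotation)


def keep(targets, source_label: str, regex: str):
    """action: keep retains only targets whose label value fully matches regex."""
    return [t for t in targets if re.fullmatch(regex, t.get(source_label, ""))]


scrape = annotation_meta_label("service", "prometheus.io/scrape")
targets = [
    {"__meta_kubernetes_service_name": "kube-state-metrics", scrape: "true"},
    {"__meta_kubernetes_service_name": "example-svc"},  # hypothetical, not annotated
]
print(scrape)  # __meta_kubernetes_service_annotation_prometheus_io_scrape
print([t["__meta_kubernetes_service_name"] for t in keep(targets, scrape, "true")])
# ['kube-state-metrics']
```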

Deploy the Alertmanager cluster

Create the data directories and set their ownership.

`# On sealos-k8s-m2
mkdir /data/alertmanager
chown -R 65534:65534 /data/alertmanager
# On sealos-k8s-m3
mkdir /data/alertmanager
chown -R 65534:65534 /data/alertmanager
`

Create the StorageClass manifest for Alertmanager.

`cd /data/manual-deploy/alertmanager/
cat alertmanager-data-storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: alertmanager-lpv
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
`

Create the PV manifest for Alertmanager.

`cat alertmanager-data-pv.yaml 
apiVersion: v1
kind: PersistentVolume
metadata:
  name: alertmanager-pv-0
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: alertmanager-lpv
  local:
    path: /data/alertmanager
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - sealos-k8s-m2
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: alertmanager-pv-1
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: alertmanager-lpv
  local:
    path: /data/alertmanager
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - sealos-k8s-m3
`

Create the ConfigMap for Alertmanager.

`cat alertmanager-configmap.yaml 
apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-config
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: EnsureExists
data:
  alertmanager.yml: |
    global:
      resolve_timeout: 5m
      smtp_smarthost: 'smtp.qq.com:465'
      smtp_from: 'yo@qq.com'
      smtp_auth_username: '3452@qq.com'
      smtp_auth_password: 'bhgb'
      smtp_hello: 'localhost'  # hostname announced in the SMTP HELO/EHLO greeting
      smtp_require_tls: false
    route:
      group_by: ['alertname', 'cluster']
      group_wait: 30s
      group_interval: 30s
      repeat_interval: 12h
      receiver: default

      routes:
      - receiver: email
        group_wait: 10s
        match:
          team: ops
    receivers:
    - name: 'default'
      email_configs:
      - to: '9935226@qq.com'
        send_resolved: true
    - name: 'email'
      email_configs:
      - to: '9935226@qq.com'
        send_resolved: true
`
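With this routing tree, an alert carrying `team: ops` goes to the `email` receiver, and everything else falls through to `default`. Alerts that share the same `alertname` and `cluster` values are batched into one notification. Below is a simplified Python sketch of that decision, supporting only equality matchers, with no `continue` and no nested routes. The alert labels in the example are hypothetical.

```python
# Child routes from alertmanager.yml as (matchers, receiver) pairs.
ROUTES = [
    ({"team": "ops"}, "email"),
]
DEFAULT_RECEIVER = "default"  # route.receiver at the root


def route(alert_labels):
    """Pick a receiver: first matching child route wins, else the root receiver."""
    for matchers, receiver in ROUTES:
        if all(alert_labels.get(k) == v for k, v in matchers.items()):
            return receiver
    return DEFAULT_RECEIVER


def group_key(alert_labels, group_by=("alertname", "cluster")):
    """Alerts with equal values for the group_by labels share one notification group."""
    return tuple(alert_labels.get(name, "") for name in group_by)


print(route({"alertname": "NodeDown", "team": "ops"}))  # email
print(route({"alertname": "NodeDown"}))                 # default
```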

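Create the Alertmanager StatefulSet manifest. I deploy it in cluster mode here. For a single standalone instance, set `replicas` to 1 and remove the `--cluster.peer` flags. Setting `--cluster.listen-address=` to an empty value disables clustering entirely.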

`cat alertmanager-statefulset-cluster.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: alertmanager
  namespace: kube-system
  labels:
    k8s-app: alertmanager
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v0.21.0
spec:
  serviceName: "alertmanager-operated"
  replicas: 2
  selector:
    matchLabels:
      k8s-app: alertmanager
      version: v0.21.0
  template:
    metadata:
      labels:
        k8s-app: alertmanager
        version: v0.21.0
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      tolerations:
        - key: "CriticalAddonsOnly"
          operator: "Exists"
        - effect: NoSchedule
          key: node-role.kubernetes.io/master
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: k8s-app
                operator: In
                values:
                - alertmanager
            topologyKey: "kubernetes.io/hostname"
      containers:
        - name: prometheus-alertmanager
          image: "prom/alertmanager:v0.21.0"
          imagePullPolicy: "IfNotPresent"
          args:
            - "--config.file=/etc/config/alertmanager.yml"
            - "--storage.path=/data"
            # Kubernetes expands $(VAR) in args; shell-style ${VAR} would be passed literally
            - "--cluster.listen-address=$(POD_IP):9094"
            - "--web.listen-address=:9093"
            - "--cluster.peer=alertmanager-0.alertmanager-operated:9094"
            - "--cluster.peer=alertmanager-1.alertmanager-operated:9094"
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          ports:
            - containerPort: 9093
              name: web
              protocol: TCP
            - containerPort: 9094
              name: mesh-tcp
              protocol: TCP
            - containerPort: 9094
              name: mesh-udp
              protocol: UDP
          readinessProbe:
            httpGet:
              path: /#/status
              port: 9093
            initialDelaySeconds: 30
            timeoutSeconds: 60
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
            - name: storage-volume
              mountPath: "/data"
              subPath: ""
          resources:
            limits:
              cpu: 1000m
              memory: 500Mi
            requests:
              cpu: 10m
              memory: 50Mi
        - name: prometheus-alertmanager-configmap-reload
          image: "jimmidyson/configmap-reload:v0.4.0"
          imagePullPolicy: "IfNotPresent"
          args:
            - --volume-dir=/etc/config
            - --webhook-url=http://localhost:9093/-/reload
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
              readOnly: true
          resources:
            limits:
              cpu: 10m
              memory: 10Mi
            requests:
              cpu: 10m
              memory: 10Mi
          securityContext:
              runAsUser: 0
              privileged: true
      volumes:
        - name: config-volume
          configMap:
            name: alertmanager-config
  volumeClaimTemplates:
    - metadata:
        name: storage-volume
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "alertmanager-lpv"
        resources:
          requests:
            storage: 5Gi
`
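Every replica is listed explicitly through `--cluster.peer`, using the per-Pod DNS names that the `alertmanager-operated` headless Service provides. Changing `replicas` therefore also means updating these flags. Below is a small Python sketch that generates the list for a given replica count. The defaults mirror the manifest; the helper itself is hypothetical.

```python
def peer_flags(statefulset: str = "alertmanager",
               service: str = "alertmanager-operated",
               replicas: int = 2, port: int = 9094):
    """One --cluster.peer flag per replica, addressed via the headless Service."""
    return [f"--cluster.peer={statefulset}-{i}.{service}:{port}" for i in range(replicas)]


for flag in peer_flags(replicas=3):
    print(flag)
# --cluster.peer=alertmanager-0.alertmanager-operated:9094
# --cluster.peer=alertmanager-1.alertmanager-operated:9094
# --cluster.peer=alertmanager-2.alertmanager-operated:9094
```

The short names resolve because the Pods run in `kube-system`, so the Pod's DNS search path appends `kube-system.svc.<cluster-domain>` to `alertmanager-0.alertmanager-operated`.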

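Create the `alertmanager-operated` Service manifest. It is a headless Service that gives each Alertmanager replica a stable DNS name for cluster communication.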

`cat alertmanager-operated-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: alertmanager-operated
  namespace: kube-system
  labels:
    app.kubernetes.io/name: alertmanager-operated
    app.kubernetes.io/component: alertmanager
spec:
  type: ClusterIP
  clusterIP: None
  sessionAffinity: None
  selector:
    k8s-app: alertmanager
  ports:
    - name: web
      port: 9093
      protocol: TCP
      targetPort: web
    - name: tcp-mesh
      port: 9094
      protocol: TCP
      targetPort: mesh-tcp  # must match the container port name in the StatefulSet
    - name: udp-mesh
      port: 9094
      protocol: UDP
      targetPort: mesh-udp  # must match the container port name in the StatefulSet
`
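A named `targetPort` only works if the selected Pods have a container port with exactly that name and the same protocol. Otherwise the Service has no endpoint for that port. Below is a Python sketch of that check; the helper name and dict layout are hypothetical, and you feed it the `ports` lists from the Service and the container. For instance, a Service `targetPort: tcp-mesh` would not resolve against a container port named `mesh-tcp`.

```python
def unresolved_target_ports(service_ports, container_ports):
    """Return named targetPorts that no container port (same name and protocol) provides."""
    available = {(p["name"], p.get("protocol", "TCP"))
                 for p in container_ports if "name" in p}
    missing = []
    for sp in service_ports:
        target = sp.get("targetPort", sp["port"])  # targetPort defaults to port
        # Numeric targetPorts are used as-is; only names need to resolve.
        if isinstance(target, str) and (target, sp.get("protocol", "TCP")) not in available:
            missing.append(target)
    return missing


container = [
    {"name": "web", "containerPort": 9093, "protocol": "TCP"},
    {"name": "mesh-tcp", "containerPort": 9094, "protocol": "TCP"},
    {"name": "mesh-udp", "containerPort": 9094, "protocol": "UDP"},
]
print(unresolved_target_ports(
    [{"port": 9094, "protocol": "TCP", "targetPort": "tcp-mesh"}], container))  # ['tcp-mesh']
```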

Deploy

`cd /data/manual-deploy/alertmanager/
ls
alertmanager-configmap.yaml
alertmanager-data-pv.yaml
alertmanager-data-storageclass.yaml
alertmanager-operated-service.yaml
alertmanager-service-statefulset.yaml
alertmanager-statefulset-cluster.yaml
kubectl apply -f .
`

OK, at this point we have manually deployed Prometheus and Alertmanager as StatefulSets in the kube-system namespace of Kubernetes. The next article covers deploying Grafana and ingress-nginx.


This article was shared from the WeChat official account Kubernetes技术栈 (k8stech).