Manually deploy Prometheus and Alertmanager clusters as StatefulSets in Kubernetes, using a StorageClass to persist their data.
This post uses a StorageClass for persistence and builds a StatefulSet-based federated Prometheus cluster. For long-term storage there are many options, such as Thanos, M3DB, InfluxDB, and VictoriaMetrics; choose whatever fits your needs. The specifics of data persistence will be covered in detail later.
To deploy an externally reachable Prometheus: first create the Namespace that Prometheus will run in, then the RBAC rules it uses, then the ConfigMap that holds its configuration file; create the Service that gives it a stable in-cluster address, create the StatefulSet that runs the stateful Prometheus Pods, and finally create an Ingress so Prometheus can be reached through an external domain name.
If your Kubernetes version is fairly old, consider upgrading to make testing easier; the sealos installer can bring up a highly available cluster with a single command. Whether to add kuboard on top depends on your own needs.
Environment
My local environment was deployed with sealos in one step, mainly for ease of testing.
| OS | Kubernetes | HostName | IP | Service |
| --- | --- | --- | --- | --- |
| Ubuntu 18.04 | 1.17.7 | sealos-k8s-m1 | 192.168.1.151 | node-exporter, prometheus-federate-0 |
| Ubuntu 18.04 | 1.17.7 | sealos-k8s-m2 | 192.168.1.152 | node-exporter, grafana, alertmanager-0 |
| Ubuntu 18.04 | 1.17.7 | sealos-k8s-m3 | 192.168.1.150 | node-exporter, alertmanager-1 |
| Ubuntu 18.04 | 1.17.7 | sealos-k8s-node1 | 192.168.1.153 | node-exporter, prometheus-0, kube-state-metrics |
| Ubuntu 18.04 | 1.17.7 | sealos-k8s-node2 | 192.168.1.154 | node-exporter, prometheus-1 |
| Ubuntu 18.04 | 1.17.7 | sealos-k8s-node3 | 192.168.1.155 | node-exporter, prometheus-2 |
```
# Label the masters and nodes
# prometheus
kubectl label node sealos-k8s-node1 k8s-app=prometheus
kubectl label node sealos-k8s-node2 k8s-app=prometheus
kubectl label node sealos-k8s-node3 k8s-app=prometheus
# federate
kubectl label node sealos-k8s-m1 k8s-app=prometheus-federate
# alertmanager
kubectl label node sealos-k8s-m2 k8s-app=alertmanager
kubectl label node sealos-k8s-m3 k8s-app=alertmanager
# Create the deployment directories
mkdir /data/manual-deploy/ && cd /data/manual-deploy/
mkdir alertmanager grafana ingress-nginx kube-state-metrics node-exporter prometheus
```
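To confirm the labels landed before continuing, list the nodes with the label shown as a column:

```
kubectl get nodes -L k8s-app
```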
Deploy Prometheus
Create the StorageClass manifest for Prometheus. With kubernetes.io/no-provisioner and WaitForFirstConsumer, PVC binding is delayed until a Pod using the claim is scheduled, so each claim binds to a local PV on that Pod's node.
```
cat prometheus-data-storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: prometheus-lpv
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
```
Create the PV manifests for the Prometheus StorageClass; each PV's nodeAffinity pins it to a specific node.
```
# On each node where a Prometheus replica will be scheduled, create the data directory and set ownership
mkdir /data/prometheus
chown -R 65534:65534 /data/prometheus

cat prometheus-data-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-lpv-0
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: prometheus-lpv
  local:
    path: /data/prometheus
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - sealos-k8s-node1
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-lpv-1
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: prometheus-lpv
  local:
    path: /data/prometheus
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - sealos-k8s-node2
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-lpv-2
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: prometheus-lpv
  local:
    path: /data/prometheus
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - sealos-k8s-node3
```
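Note that the mkdir/chown above has to happen on all three nodes named in the PVs' nodeAffinity. A small loop like the following saves doing it by hand (assuming passwordless root SSH to the nodes; adjust user and hosts to your environment):

```
for n in sealos-k8s-node1 sealos-k8s-node2 sealos-k8s-node3; do
  ssh root@"$n" 'mkdir -p /data/prometheus && chown -R 65534:65534 /data/prometheus'
done
```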
Create the RBAC manifest for Prometheus.
```
cat prometheus-rbac.yaml
apiVersion: rbac.authorization.k8s.io/v1  # API version
kind: ClusterRole                         # resource type
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:          # resources Prometheus discovers and scrapes
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups:
  - extensions
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus          # custom name
  namespace: kube-system    # namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:    # bind the ClusterRole defined above, rather than the over-privileged cluster-admin
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:   # subjects
- kind: ServiceAccount
  name: prometheus
  namespace: kube-system
```
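After applying the RBAC manifest, you can verify the ServiceAccount really has the access Prometheus needs:

```
# Both should print "yes" given the ClusterRole above
kubectl auth can-i list pods --as=system:serviceaccount:kube-system:prometheus
kubectl auth can-i get nodes/proxy --as=system:serviceaccount:kube-system:prometheus
```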
Create the ConfigMap for Prometheus.
```
cat prometheus-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-system
data:
  prometheus.yml: |
    global:
      scrape_interval: 30s
      evaluation_interval: 30s
      external_labels:
        cluster: "01"
    scrape_configs:
    - job_name: 'kubernetes-apiservers'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
    - job_name: 'kubernetes-nodes'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics
    - job_name: 'kubernetes-cadvisor'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
      metric_relabel_configs:
      - action: replace
        source_labels: [id]
        regex: '^/machine\.slice/machine-rkt\\x2d([^\\]+)\\.+/([^/]+)\.service$'
        target_label: rkt_container_name
        replacement: '${2}-${1}'
      - action: replace
        source_labels: [id]
        regex: '^/system\.slice/(.+)\.service$'
        target_label: systemd_service_name
        replacement: '${1}'
    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name
      - source_labels: [__address__]
        action: replace
        target_label: instance
        regex: (.+):(.+)
        replacement: $1
```
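Before applying, the embedded prometheus.yml can be validated with promtool. One way, assuming Docker is available and the config has been saved as a standalone prometheus.yml in the current directory:

```
# Run promtool from the same image version the StatefulSet uses
docker run --rm -v "$PWD":/cfg --entrypoint /bin/promtool \
  prom/prometheus:v2.20.0 check config /cfg/prometheus.yml
```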
Create the StatefulSet manifest for Prometheus.
```
cat prometheus-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: kube-system
  labels:
    k8s-app: prometheus
    kubernetes.io/cluster-service: "true"
spec:
  serviceName: "prometheus"
  podManagementPolicy: "Parallel"
  replicas: 3
  selector:
    matchLabels:
      k8s-app: prometheus
  template:
    metadata:
      labels:
        k8s-app: prometheus
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: k8s-app
                operator: In
                values:
                - prometheus
            topologyKey: "kubernetes.io/hostname"
      priorityClassName: system-cluster-critical
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: prometheus-server-configmap-reload
        image: "jimmidyson/configmap-reload:v0.4.0"
        imagePullPolicy: "IfNotPresent"
        args:
        - --volume-dir=/etc/config
        - --webhook-url=http://localhost:9090/-/reload
        volumeMounts:
        - name: config-volume
          mountPath: /etc/config
          readOnly: true
        resources:
          limits:
            cpu: 10m
            memory: 10Mi
          requests:
            cpu: 10m
            memory: 10Mi
      - image: prom/prometheus:v2.20.0
        imagePullPolicy: IfNotPresent
        name: prometheus
        command:
        - "/bin/prometheus"
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--storage.tsdb.path=/prometheus"
        - "--storage.tsdb.retention=24h"
        - "--web.console.libraries=/etc/prometheus/console_libraries"
        - "--web.console.templates=/etc/prometheus/consoles"
        - "--web.enable-lifecycle"
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: "/prometheus"
          name: prometheus-data
        - mountPath: "/etc/prometheus"
          name: config-volume
        readinessProbe:
          httpGet:
            path: /-/ready
            port: 9090
          initialDelaySeconds: 30
          timeoutSeconds: 30
        livenessProbe:
          httpGet:
            path: /-/healthy
            port: 9090
          initialDelaySeconds: 30
          timeoutSeconds: 30
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 1000m
            memory: 2500Mi
      securityContext:
        runAsUser: 65534   # matches the chown on the local PV directories; pod-level securityContext has no "privileged" field
      serviceAccountName: prometheus
      volumes:
      - name: config-volume
        configMap:
          name: prometheus-config
  volumeClaimTemplates:
  - metadata:
      name: prometheus-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "prometheus-lpv"
      resources:
        requests:
          storage: 5Gi
```
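Because Prometheus is started with --web.enable-lifecycle, the configmap-reload sidecar can POST to /-/reload whenever the mounted ConfigMap changes; the same endpoint can also be hit by hand after a config edit. A sketch, assuming the image's busybox wget supports --post-data:

```
kubectl -n kube-system exec prometheus-0 -c prometheus -- \
  wget -qO- --post-data='' http://localhost:9090/-/reload
```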
Create the Service manifest for Prometheus.
```
cat prometheus-service-statefulset.yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: kube-system
spec:
  ports:
  - name: prometheus
    port: 9090
    targetPort: 9090
  selector:
    k8s-app: prometheus
  clusterIP: None
```
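clusterIP: None makes this a headless Service; combined with serviceName: "prometheus" in the StatefulSet, every replica gets a stable DNS record of the form prometheus-&lt;ordinal&gt;.prometheus.kube-system.svc, which is handy when the federation instance later needs fixed replica addresses. One way to check, with tutum/dnsutils being just one convenient image choice:

```
kubectl -n kube-system run -it --rm dns-test --restart=Never \
  --image=tutum/dnsutils -- nslookup prometheus-0.prometheus.kube-system.svc.cluster.local
```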
Deploy the Prometheus resource files created above.
```
cd /data/manual-deploy/prometheus
ls
prometheus-configmap.yaml            # ConfigMap
prometheus-data-pv.yaml              # PersistentVolumes
prometheus-data-storageclass.yaml    # StorageClass
prometheus-rbac.yaml                 # RBAC
prometheus-service-statefulset.yaml  # Service
prometheus-statefulset.yaml          # StatefulSet
# Deploy everything
kubectl apply -f .
```
Verify the PV/PVC binding and the deployment status of Prometheus.
```
kubectl get pv
NAME               CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS     REASON   AGE
prometheus-lpv-0   10Gi       RWO            Retain           Available           prometheus-lpv            6m28s
prometheus-lpv-1   10Gi       RWO            Retain           Available           prometheus-lpv            6m28s
prometheus-lpv-2   10Gi       RWO            Retain           Available           prometheus-lpv            6m28s
kubectl -n kube-system get pvc
NAME                           STATUS   VOLUME             CAPACITY   ACCESS MODES   STORAGECLASS     AGE
prometheus-data-prometheus-0   Bound    prometheus-lpv-0   10Gi       RWO            prometheus-lpv   2m16s
prometheus-data-prometheus-1   Bound    prometheus-lpv-2   10Gi       RWO            prometheus-lpv   2m16s
prometheus-data-prometheus-2   Bound    prometheus-lpv-1   10Gi       RWO            prometheus-lpv   2m16s
kubectl -n kube-system get pod prometheus-{0..2}
NAME           READY   STATUS    RESTARTS   AGE
prometheus-0   2/2     Running   0          3m16s
prometheus-1   2/2     Running   0          3m16s
prometheus-2   2/2     Running   0          3m16s
```
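Because the Pods run with hostNetwork: true, each replica also listens on its node's IP. A quick look at the targets API of prometheus-0 (scheduled on sealos-k8s-node1 per the PV affinity; jq is optional) confirms scraping works:

```
curl -s 'http://192.168.1.153:9090/api/v1/targets' \
  | jq '.data.activeTargets[] | {job: .labels.job, health: .health}'
```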
Deploy Node Exporter
Create the DaemonSet manifest for node-exporter.
```
cd /data/manual-deploy/node-exporter/
cat node-exporter.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-system
  labels:
    k8s-app: node-exporter
spec:
  selector:
    matchLabels:
      k8s-app: node-exporter
  template:
    metadata:
      labels:
        k8s-app: node-exporter
    spec:
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
      containers:
      - image: quay.io/prometheus/node-exporter:v1.0.0
        imagePullPolicy: IfNotPresent
        name: prometheus-node-exporter
        ports:
        - containerPort: 9100
          hostPort: 9100
          protocol: TCP
          name: metrics
        volumeMounts:
        - mountPath: /host/proc
          name: proc
        - mountPath: /host/sys
          name: sys
        - mountPath: /host
          name: rootfs
        args:
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        - --path.rootfs=/host
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: sys
        hostPath:
          path: /sys
      - name: rootfs
        hostPath:
          path: /
      hostNetwork: true
      hostPID: true
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: "true"
  labels:
    k8s-app: node-exporter
  name: node-exporter
  namespace: kube-system
spec:
  ports:
  - name: http
    port: 9100
    protocol: TCP
  selector:
    k8s-app: node-exporter
```
Deploy
```
cd /data/manual-deploy/node-exporter/
kubectl apply -f node-exporter.yaml
```
Verify the status
```
kubectl -n kube-system get pod |grep node-exporter
node-exporter-45s2q   1/1   Running   0   6h43m
node-exporter-f4rrw   1/1   Running   0   6h43m
node-exporter-hvtzj   1/1   Running   0   6h43m
node-exporter-nlvfq   1/1   Running   0   6h43m
node-exporter-qbd2q   1/1   Running   0   6h43m
node-exporter-zjrh4   1/1   Running   0   6h43m
```
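Since the DaemonSet uses hostNetwork and hostPort 9100, the exporters can be spot-checked straight from any node IP:

```
# Sample the load-average metrics from one of the masters
curl -s http://192.168.1.151:9100/metrics | grep '^node_load'
```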
Deploy kube-state-metrics
The kubelet already embeds cAdvisor, which collects system-level metrics such as CPU, memory, network, disk, and container usage, but it cannot collect metrics about Kubernetes resource objects, such as the number and state of Pods. That is what kube-state-metrics is for.
kube-state-metrics polls the Kubernetes API and exposes metrics about resource objects: CronJob, DaemonSet, Deployment, Job, LimitRange, Node, PersistentVolume, PersistentVolumeClaim, Pod, PodDisruptionBudget, ReplicaSet, ReplicationController, ResourceQuota, Service, StatefulSet, Namespace, HorizontalPodAutoscaler, Endpoints, Secret, ConfigMap, Ingress, and CertificateSigningRequest.
```
cd /data/manual-deploy/kube-state-metrics/
cat kube-state-metrics-rbac.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: kube-system
  name: kube-state-metrics-resizer
rules:
- apiGroups: [""]
  resources:
  - pods
  verbs: ["get"]
- apiGroups: ["apps"]
  resources:
  - deployments
  resourceNames: ["kube-state-metrics"]
  verbs: ["get", "update"]
- apiGroups: ["extensions"]
  resources:
  - deployments
  resourceNames: ["kube-state-metrics"]
  verbs: ["get", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kube-state-metrics
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kube-state-metrics-resizer
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
- apiGroups: [""]
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs: ["list", "watch"]
- apiGroups: ["extensions"]
  resources:
  - daemonsets
  - deployments
  - replicasets
  - ingresses
  verbs: ["list", "watch"]
- apiGroups: ["apps"]
  resources:
  - daemonsets
  - deployments
  - replicasets
  - statefulsets
  verbs: ["list", "watch"]
- apiGroups: ["batch"]
  resources:
  - cronjobs
  - jobs
  verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
  resources:
  - horizontalpodautoscalers
  verbs: ["list", "watch"]
- apiGroups: ["policy"]
  resources:
  - poddisruptionbudgets
  verbs: ["list", "watch"]
- apiGroups: ["certificates.k8s.io"]
  resources:
  - certificatesigningrequests
  verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: kube-system
```
Create the Deployment manifest for kube-state-metrics.
```
cat kube-state-metrics-deloyment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: kube-state-metrics
  replicas: 1
  template:
    metadata:
      labels:
        k8s-app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
        image: quay.io/coreos/kube-state-metrics:v1.6.0
        ports:
        - name: http-metrics
          containerPort: 8080
        - name: telemetry
          containerPort: 8081
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
      - name: addon-resizer
        image: k8s.gcr.io/addon-resizer:1.8.4
        resources:
          limits:
            cpu: 150m
            memory: 50Mi
          requests:
            cpu: 150m
            memory: 50Mi
        env:
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: MY_POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        command:
        - /pod_nanny
        - --container=kube-state-metrics
        - --cpu=100m
        - --extra-cpu=1m
        - --memory=100Mi
        - --extra-memory=2Mi
        - --threshold=5
        - --deployment=kube-state-metrics
---
apiVersion: v1
kind: Service
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    k8s-app: kube-state-metrics
  annotations:
    prometheus.io/scrape: 'true'
spec:
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
    protocol: TCP
  - name: telemetry
    port: 8081
    targetPort: telemetry
    protocol: TCP
  selector:
    k8s-app: kube-state-metrics
```
Deploy
```
kubectl apply -f kube-state-metrics-rbac.yaml
kubectl apply -f kube-state-metrics-deloyment.yaml
```
Verify
```
kubectl -n kube-system get pod |grep kube-state-metrics
kube-state-metrics-657d8d6669-bqbs8   2/2   Running   0   4h
```
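Once the kubernetes-service-endpoints job defined in the Prometheus ConfigMap is scraping it, object-level metrics become queryable. A quick check against one of the Prometheus replicas (IP from the environment table; kube_pod_status_phase is a kube-state-metrics metric):

```
# Count Pods by phase
curl -s 'http://192.168.1.153:9090/api/v1/query' \
  --data-urlencode 'query=sum(kube_pod_status_phase) by (phase)'
```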
Because the kube-state-metrics Service carries the annotation prometheus.io/scrape: "true", the kubernetes-service-endpoints job discovers and scrapes it automatically; any Service can opt in the same way, as sketched below.
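A minimal sketch of such an opt-in Service (the name, port, and path here are hypothetical; the annotation keys are the ones the relabel rules in the ConfigMap above look for):

```
apiVersion: v1
kind: Service
metadata:
  name: my-app            # hypothetical application Service
  namespace: default
  annotations:
    prometheus.io/scrape: "true"    # matched by the keep rule
    prometheus.io/port: "8080"      # rewrites __address__
    prometheus.io/path: "/metrics"  # rewrites __metrics_path__
spec:
  selector:
    app: my-app
  ports:
  - port: 8080
```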
Deploy the Alertmanager Cluster
Create the data directories and set ownership.
```
# On k8s-m2
mkdir /data/alertmanager
chown -R 65534:65534 /data/alertmanager
# On k8s-m3
mkdir /data/alertmanager
chown -R 65534:65534 /data/alertmanager
```
Create the StorageClass manifest for Alertmanager.

```
cd /data/manual-deploy/alertmanager/
cat alertmanager-data-storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: alertmanager-lpv
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
```
Create the PV manifests for Alertmanager.
```
cat alertmanager-data-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: alertmanager-pv-0
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: alertmanager-lpv
  local:
    path: /data/alertmanager
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - sealos-k8s-m2
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: alertmanager-pv-1
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: alertmanager-lpv
  local:
    path: /data/alertmanager
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - sealos-k8s-m3
```
Create the ConfigMap for Alertmanager.
```
cat alertmanager-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-config
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: EnsureExists
data:
  alertmanager.yml: |
    global:
      resolve_timeout: 5m
      smtp_smarthost: 'smtp.qq.com:465'
      smtp_from: 'yo@qq.com'
      smtp_auth_username: '3452@qq.com'
      smtp_auth_password: 'bhgb'
      smtp_hello: '警报邮件'
      smtp_require_tls: false
    route:
      group_by: ['alertname', 'cluster']
      group_wait: 30s
      group_interval: 30s
      repeat_interval: 12h
      receiver: default
      routes:
      - receiver: email
        group_wait: 10s
        match:
          team: ops
    receivers:
    - name: 'default'
      email_configs:
      - to: '9935226@qq.com'
        send_resolved: true
    - name: 'email'
      email_configs:
      - to: '9935226@qq.com'
        send_resolved: true
```
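The routing tree can be validated with amtool before the ConfigMap is applied. One way, assuming Docker is available and the config has been saved as a standalone alertmanager.yml in the current directory:

```
docker run --rm -v "$PWD":/cfg --entrypoint /bin/amtool \
  prom/alertmanager:v0.21.0 check-config /cfg/alertmanager.yml
```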
Create the StatefulSet manifest for Alertmanager. This deploys cluster mode; for a single instance, set replicas to 1 and remove the --cluster.* flags.
```
cat alertmanager-statefulset-cluster.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: alertmanager
  namespace: kube-system
  labels:
    k8s-app: alertmanager
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v0.21.0
spec:
  serviceName: "alertmanager-operated"
  replicas: 2
  selector:
    matchLabels:
      k8s-app: alertmanager
      version: v0.21.0
  template:
    metadata:
      labels:
        k8s-app: alertmanager
        version: v0.21.0
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      tolerations:
      - key: "CriticalAddonsOnly"
        operator: "Exists"
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: k8s-app
                operator: In
                values:
                - alertmanager
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: prometheus-alertmanager
        image: "prom/alertmanager:v0.21.0"
        imagePullPolicy: "IfNotPresent"
        args:
        - "--config.file=/etc/config/alertmanager.yml"
        - "--storage.path=/data"
        # Kubernetes expands env vars in args as $(VAR), not shell-style ${VAR}
        - "--cluster.listen-address=$(POD_IP):9094"
        - "--web.listen-address=:9093"
        - "--cluster.peer=alertmanager-0.alertmanager-operated:9094"
        - "--cluster.peer=alertmanager-1.alertmanager-operated:9094"
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        ports:
        - containerPort: 9093
          name: web
          protocol: TCP
        - containerPort: 9094
          name: mesh-tcp
          protocol: TCP
        - containerPort: 9094
          name: mesh-udp
          protocol: UDP
        readinessProbe:
          httpGet:
            path: /#/status
            port: 9093
          initialDelaySeconds: 30
          timeoutSeconds: 60
        volumeMounts:
        - name: config-volume
          mountPath: /etc/config
        - name: storage-volume
          mountPath: "/data"
          subPath: ""
        resources:
          limits:
            cpu: 1000m
            memory: 500Mi
          requests:
            cpu: 10m
            memory: 50Mi
      - name: prometheus-alertmanager-configmap-reload
        image: "jimmidyson/configmap-reload:v0.4.0"
        imagePullPolicy: "IfNotPresent"
        args:
        - --volume-dir=/etc/config
        - --webhook-url=http://localhost:9093/-/reload
        volumeMounts:
        - name: config-volume
          mountPath: /etc/config
          readOnly: true
        resources:
          limits:
            cpu: 10m
            memory: 10Mi
          requests:
            cpu: 10m
            memory: 10Mi
      securityContext:
        runAsUser: 0   # pod-level securityContext has no "privileged" field
      volumes:
      - name: config-volume
        configMap:
          name: alertmanager-config
  volumeClaimTemplates:
  - metadata:
      name: storage-volume
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "alertmanager-lpv"
      resources:
        requests:
          storage: 5Gi
```
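Once both replicas are up, the v2 status API shows whether they actually formed a mesh; both peers should appear under cluster.peers:

```
# Query the cluster status from inside alertmanager-0
kubectl -n kube-system exec alertmanager-0 -c prometheus-alertmanager -- \
  wget -qO- http://localhost:9093/api/v2/status
```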
Create the headless operated-service manifest that the Alertmanager peers discover each other through.
```
cat alertmanager-operated-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: alertmanager-operated
  namespace: kube-system
  labels:
    app.kubernetes.io/name: alertmanager-operated
    app.kubernetes.io/component: alertmanager
spec:
  type: ClusterIP
  clusterIP: None
  sessionAffinity: None
  selector:
    k8s-app: alertmanager
  ports:
  - name: web
    port: 9093
    protocol: TCP
    targetPort: web
  - name: tcp-mesh
    port: 9094
    protocol: TCP
    targetPort: mesh-tcp   # must match the containerPort name in the StatefulSet
  - name: udp-mesh
    port: 9094
    protocol: UDP
    targetPort: mesh-udp
```
Deploy
```
cd /data/manual-deploy/alertmanager/
ls
alertmanager-configmap.yaml
alertmanager-data-pv.yaml
alertmanager-data-storageclass.yaml
alertmanager-operated-service.yaml
alertmanager-service-statefulset.yaml
alertmanager-statefulset-cluster.yaml
kubectl apply -f .
```
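Verify the Alertmanager Pods and their volume claims the same way as for Prometheus:

```
kubectl -n kube-system get pod -l k8s-app=alertmanager
kubectl -n kube-system get pvc | grep storage-volume-alertmanager
kubectl get pv | grep alertmanager-pv
```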
OK, at this point we have manually deployed Prometheus and Alertmanager as StatefulSets in the kube-system namespace. In the next post we will cover deploying grafana and ingress-nginx.