1) Preparation before installing the Prometheus Operator
First, you need to install the CustomResourceDefinitions (CRDs) in advance:
REPO URL: https://prometheus-community.github.io/helm-charts
CHART: prometheus-operator-crds:2.0.0
We need to press the Sync button in Argo CD.
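If you prefer to define the app declaratively instead of clicking through the UI, a minimal Argo CD Application for this CRDs chart could look like the sketch below (the application name, destination, and project are assumptions; adjust them to your setup):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: prometheus-operator-crds   # assumed name
  namespace: argocd
spec:
  destination:
    namespace: monitoring          # assumed namespace
    server: https://kubernetes.default.svc
  project: default                 # assumed project
  source:
    repoURL: https://prometheus-community.github.io/helm-charts
    targetRevision: "2.0.0"
    chart: prometheus-operator-crds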
You will most likely also run into a problem with prometheuses.monitoring.coreos.com.
You will see this message:
CustomResourceDefinition.apiextensions.k8s.io "prometheuses.monitoring.coreos.com" is invalid: metadata.annotations: Too long: must have at most 262144 bytes
Don't worry. Just sync again using the Replace option.
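If you don't want to tick Replace manually in the UI every time, Argo CD also lets you set this on the Application itself through syncOptions; a minimal sketch, assuming you manage the app declaratively:

spec:
  syncPolicy:
    syncOptions:
      - Replace=true          # use replace instead of apply for this app's resources
      # - ServerSideApply=true  # alternative sync option that also avoids the annotation size limit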
And now all the CustomResourceDefinitions have been applied and show up green!
2) Install Prometheus Operator
A standalone Prometheus Operator chart is no longer provided, so we have to use the kube-prometheus-stack chart.
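A minimal Argo CD Application for kube-prometheus-stack could look like the sketch below (the name, namespace, project, and chart version are assumptions; pick the version you actually need):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: kube-prometheus-stack     # assumed name
  namespace: argocd
spec:
  destination:
    namespace: monitoring         # assumed namespace
    server: https://kubernetes.default.svc
  project: default                # assumed project
  source:
    repoURL: https://prometheus-community.github.io/helm-charts
    targetRevision: "45.7.1"      # example version only
    chart: kube-prometheus-stack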
If Alertmanager cannot start, it may be because you have not fully cleaned up some Prometheus-related component left over in another namespace.
Error:
ts=2023-03-31T16:29:11.263Z caller=main.go:240 level=info msg="Starting Alertmanager" version="(version=0.25.0, branch=HEAD, revision=258fab7cdd551f2cf251ed0348f0ad7289aee789)"
ts=2023-03-31T16:29:11.263Z caller=main.go:241 level=info build_context="(go=go1.19.4, user=root@abe866dd5717, date=20221222-14:51:36)"
ts=2023-03-31T16:29:11.313Z caller=coordinator.go:113 level=info component=configuration msg="Loading configuration file" file=/etc/alertmanager/config_out/alertmanager.env.yaml
ts=2023-03-31T16:29:11.313Z caller=coordinator.go:118 level=error component=configuration msg="Loading configuration file failed" file=/etc/alertmanager/config_out/alertmanager.env.yaml err="open /etc/alertmanager/config_out/alertmanager.env.yaml: no such file or directory"
MountVolume.SetUp failed for volume "tls-secret" : secret "prometheus-kube-prometheus-admission" not found
https://github.com/prometheus-community/helm-charts/issues/1438
One workaround is to adjust the admission webhook settings in the values:

prometheusOperator:
  enabled: true
  admissionWebhooks:
    enabled: false
    certManager:
      enabled: true
If your Kubernetes cluster mixes Windows and Linux nodes, you can use the following values:
prometheus:
  prometheusSpec:
    nodeSelector:
      kubernetes.io/os: linux
alertmanager:
  alertmanagerSpec:
    nodeSelector:
      kubernetes.io/os: linux
prometheusOperator:
  nodeSelector:
    kubernetes.io/os: linux
  enabled: true
  admissionWebhooks:
    patch:
      nodeSelector:
        kubernetes.io/os: linux
    enabled: false
    certManager:
      enabled: true
thanosRuler:
  thanosRulerSpec:
    nodeSelector:
      kubernetes.io/os: linux
But you will still have to add a nodeSelector yourself wherever the Helm chart does not expose enough configuration:
nodeSelector:
kubernetes.io/os: linux
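For example, the bundled subcharts (Grafana, kube-state-metrics, and so on) typically expose their own top-level nodeSelector in their values; a sketch, assuming the default subchart keys in kube-prometheus-stack:

grafana:
  nodeSelector:
    kubernetes.io/os: linux
kube-state-metrics:
  nodeSelector:
    kubernetes.io/os: linux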
3) Install only Prometheus (standalone) for special purposes
In this part, I only want to install Prometheus by itself, for some special purposes.
Below is the Argo CD Application file.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: prometheus-nimtechnology-staging
  namespace: argocd
spec:
  destination:
    namespace: coralogix
    name: 'arn:aws:eks:us-west-2:XXXXXXXXX:cluster/staging-nimtechnology-engines'
  project: meta-structure
  source:
    repoURL: https://prometheus-community.github.io/helm-charts
    targetRevision: "23.2.0"
    chart: prometheus
    helm:
      values: |
        prometheus-node-exporter:
          enabled: false
        prometheus-pushgateway:
          enabled: false
        server:
          global:
            external_labels:
              cluster_name: staging-nimtechnology-engines
          retention: "1d"
          remoteWrite:
            - url: https://ingress.coralogix.us/prometheus/v1
              name: 'staging-nimtechnology-engines'
              remote_timeout: 120s
              bearer_token: 'cxtp_XXXXXXXXXXXXXXXXXXX'
external_labels is a Prometheus configuration used to provide labels that are unique to the Prometheus instance. These labels are added to every time series collected by this Prometheus instance, as well as to alerts sent to the Alertmanager.
==> The purpose is to be able to tell which Prometheus instance these metrics come from.
remoteWrite is a feature in Prometheus that allows you to send the time series data that Prometheus collects to a remote endpoint. This can be used to integrate Prometheus with other monitoring systems or to send data to a long-term storage solution.
- url: The endpoint to which the data is written.
- name: An optional identifier for the remote write target.
- remote_timeout: The timeout for each write request to the remote endpoint.
- bearer_token: A token used for authentication with the remote endpoint.
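For reference, the server.global and server.remoteWrite values above end up in the rendered prometheus.yml roughly like this (a sketch; the exact output depends on the chart version):

global:
  external_labels:
    cluster_name: staging-nimtechnology-engines
remote_write:
  - url: https://ingress.coralogix.us/prometheus/v1
    name: staging-nimtechnology-engines
    remote_timeout: 120s
    bearer_token: cxtp_XXXXXXXXXXXXXXXXXXX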
Everything is OK.
Secure "bearer_token" in remoteWrite
If you keep these values in a template, you cannot push the token to GitHub.
So instead we will use bearer_token_file.
First, you create a Secret:
apiVersion: v1
kind: Secret
metadata:
  name: prom-secret-files
  namespace: coralogix
data:
  bearer-token-coralogix.txt: Y3h0cF9oVHU1RDBFdXRuXXXXXXXjU3SVBnMnU=
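As a side note, Kubernetes also accepts stringData, so you could write the same Secret without base64-encoding the token yourself (a sketch using the same names):

apiVersion: v1
kind: Secret
metadata:
  name: prom-secret-files
  namespace: coralogix
stringData:
  bearer-token-coralogix.txt: cxtp_XXXXXXXXXXXXXXXXXXX   # plain-text token; Kubernetes encodes it for you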
Then you use extraSecretMounts:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: prometheus-nimtechnology-staging
  namespace: argocd
spec:
  destination:
    namespace: coralogix
    name: 'arn:aws:eks:us-west-2:XXXXXXXXX:cluster/staging-nimtechnology-engines'
  project: meta-structure
  source:
    repoURL: https://prometheus-community.github.io/helm-charts
    targetRevision: "23.2.0"
    chart: prometheus
    helm:
      values: |
        prometheus-node-exporter:
          enabled: false
        prometheus-pushgateway:
          enabled: false
        server:
          global:
            external_labels:
              cluster_name: staging-nimtechnology-engines
          retention: "1d"
          remoteWrite:
            - url: https://ingress.coralogix.us/prometheus/v1
              name: 'staging-nimtechnology-engines'
              remote_timeout: 120s
              bearer_token_file: /etc/secrets/bearer-token-coralogix.txt
          extraSecretMounts:
            - name: bearer-token-coralogix
              mountPath: /etc/secrets
              subPath: ""
              secretName: prom-secret-files
              readOnly: true
Add extra scrape configs
If you want to add more scrape_configs, add them via extraScrapeConfigs in the Helm values:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: prometheus-nimtechnology-staging
  namespace: argocd
spec:
  destination:
    namespace: coralogix
    name: 'arn:aws:eks:us-west-2:XXXXXXXXX:cluster/staging-nimtechnology-engines'
  project: meta-structure
  source:
    repoURL: https://prometheus-community.github.io/helm-charts
    targetRevision: "24.0.0"
    chart: prometheus
    helm:
      values: |
        prometheus-node-exporter:
          enabled: false
        prometheus-pushgateway:
          enabled: false
        server:
          ##... (other server settings as above)
        # adds additional scrape configs to prometheus.yml
        # must be a string so you have to add a | after extraScrapeConfigs:
        extraScrapeConfigs: |
          - job_name: jmx-msk
            scrape_interval: 30s
            static_configs:
              - targets:
                  - b-2.c1.kafka.us-west-2.amazonaws.com:11001