1) Overview of the situation.
The situation is as follows:
You have a log line like this on your system:
time="2023-12-12T17:32:44Z" level=info msg="getRepoObjs stats" application=argocd/longhorn build_options_ms=0 helm_ms=14 plugins_ms=0 repo_ms=13 time_ms=126 unmarshal_ms=97 version_ms=0
The manager says:
Use your DevOps skills to draw me a chart of time_ms for each application; you can get the data from the log above.
After putting all my search skills to work, I found two strong candidates: Logstash and Vector.
Logstash belongs to the ELK family, a very well-known toolset for handling logs.
It is fairly fast and has been around for a long time, so its documentation is thorough.
The one thing I don't like about it is that it eats a lot of RAM.
Moving on to Vector:
A promising candidate, written in Rust.
Maybe it will be light and fast.
OK, the architecture is as follows:
1) Opentelemetry sends the logs to Vector.
To send logs from the otel-collector to Vector, I referred to two documents.
https://www.techetio.com/2023/04/29/sending-opentelemetry-logs-to-vector-using-python/
https://intelops.ai/blog/connecting-the-opentelemetry-collector-to-vecor/
I will export the logs to Vector via otlphttp.
The OTLP/HTTP exporter in the OpenTelemetry Collector is designed to send metrics, traces, and logs through HTTP using the OTLP format. This exporter supports the traces, metrics, and logs pipeline types. To include it in your configuration, you need to specify certain settings:
https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter/otlphttpexporter
    ....
    exporters:
      otlphttp:
        endpoint: "http://vector-headless.default:4318"
    ....
    service:
      pipelines:
        logs:
          exporters:
            - coralogix
            - otlphttp #look at
          processors:
            - k8sattributes
            - attributes/insert
            - filter/regexp_resource
            - batch
          receivers:
            - otlp
            - filelog
2) How to configure Vector
2.1) Install vector on Kubernetes
REPO URL: https://helm.vector.dev
CHART: vector:0.29.0
You can change the version if you want.
2.2) Vector receives and processes logs from the otel-collector
2.2.1) Source:
Here we receive the logs from OpenTelemetry:
https://vector.dev/docs/reference/configuration/sources/opentelemetry/
    customConfig:
      api:
        enabled: true
        address: 127.0.0.1:8686
        playground: true
      sources:
        otel_collector:
          type: opentelemetry
          grpc:
            address: '0.0.0.0:4317'
          http:
            address: '0.0.0.0:4318'
        # Add your log source here if needed
      sinks:
        console_logs:
          type: console
          encoding:
            codec: text
          inputs:
            - "otel_collector.logs"
    service:
      ports:
        - name: otel-collector-http
          port: 4318
          protocol: TCP
        - name: metrics
          port: 9598
          protocol: TCP
Two points to note (Dec 13th, 2023):
The opentelemetry source only supports log events at this time.
Received log events will go to this output stream. Use <component_id>.logs as an input to downstream transforms and sinks.
This means that when you declare the inputs of another component, such as a sink or a transform, you refer to this source as <component_id>.logs.
If the config is OK at this point, the Vector pod logs will show it writing quite a lot of log lines to stdout.
2) Transforms:
2.1) parse_key_value function in Vector
For the transform we use the remap type.
The “remap” transform type in Vector is primarily used for parsing, shaping, and transforming observability data, such as logs and metrics, within your data processing topology. This transform utilizes the Vector Remap Language (VRL), a language designed for the safe and efficient processing of observability data. VRL is expression-oriented and aligns closely with the data models used in Vector.
You will see that the content of the message is in key-value form.
In remap we use the parse_key_value function:
Parses the value in key-value format. Also known as logfmt.
- Keys and values can be wrapped with ". " characters can be escaped using \.
    customConfig:
      api:
        enabled: true
        address: 127.0.0.1:8686
        playground: true
      sources:
        otel_collector:
          type: opentelemetry
          grpc:
            address: '0.0.0.0:4317'
          http:
            address: '0.0.0.0:4318'
        # Add your log source here if needed
      transforms:
        parse_logs:
          type: remap
          inputs:
            - "otel_collector.logs"
          source: |
            . = parse_key_value!(string!(.message))
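To get a feel for what parse_key_value does to the ArgoCD line from the overview, here is a rough Python stand-in. It is only a sketch: the function name mirrors the VRL one for readability, quote handling is delegated to shlex (my assumption, not how Vector implements it), and every value stays a string.

```python
import shlex


def parse_key_value(line: str) -> dict[str, str]:
    """Rough Python stand-in for VRL's parse_key_value (logfmt)."""
    fields = {}
    # shlex.split honours double quotes and backslash escapes,
    # so msg="getRepoObjs stats" stays a single token.
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that carry no "="
            fields[key] = value
    return fields


line = (
    'time="2023-12-12T17:32:44Z" level=info msg="getRepoObjs stats" '
    "application=argocd/longhorn build_options_ms=0 helm_ms=14 "
    "plugins_ms=0 repo_ms=13 time_ms=126 unmarshal_ms=97 version_ms=0"
)
event = parse_key_value(line)
print(event["application"], event["time_ms"])
```

The point to notice is that time_ms comes out as the string "126", which is why a float conversion is needed before it can become a metric.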
2.2) Change key
Let me explain this part a bit more.
Not every key has a friendly format like time-ms or time_ms;
some look like this: grpc.time_ms
If you need to go from grpc.time_ms to grpc_time_ms, read the value with to_float and assign the result to the new key.
to_float: Coerces the value into a float.
    transforms:
      parse_logs:
        type: remap
        inputs:
          - "otel_collector.logs"
        source: |
          . = parse_key_value!(string!(.message))
      extract_metrics:
        type: remap
        inputs:
          - parse_logs
        source: |
          # Note: .grpc.time_ms is a nested path (grpc, then time_ms).
          # If the parsed key is the literal string "grpc.time_ms",
          # the quoted form ."grpc.time_ms" may be needed instead.
          grpc_time_ms, err_convert = to_float(.grpc.time_ms)
          if err_convert != null {
            log("Unable to convert To Float: " + err_convert, level: "error")
          } else {
            .grpc_time_ms = grpc_time_ms
            log(".grpc.time_ms value: " + to_string(grpc_time_ms), level: "info")
          }
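The same "read, convert, store under a new key" step can be sketched in Python. The helper name extract_float and the sample events are made up for illustration; the error branch only prints, in the same spirit as the log(...) call in the VRL program.

```python
def extract_float(event: dict, src_key: str, dst_key: str) -> dict:
    """Copy event[src_key] to event[dst_key] as a float; report failures."""
    out = dict(event)  # leave the caller's event untouched
    try:
        out[dst_key] = float(event[src_key])
    except (KeyError, TypeError, ValueError) as err:
        # Mirrors the error branch of the VRL program: report and move on.
        print(f"Unable to convert to float: {err!r}")
    return out


ok = extract_float({"grpc.time_ms": "126"}, "grpc.time_ms", "grpc_time_ms")
bad = extract_float({"grpc.time_ms": "fast"}, "grpc.time_ms", "grpc_time_ms")
print(ok["grpc_time_ms"])
print("grpc_time_ms" in bad)
```

When the value is not numeric, the new key is simply not added, so a downstream metric step has nothing to pick up for that event.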
2.3) Convert logs to metrics with Vector
This is the part I wanted most:
    transforms:
      parse_logs:
        type: remap
        inputs:
          - "otel_collector.logs"
        source: |
          . = parse_key_value!(string!(.message))
      extract_metrics:
        type: remap
        inputs:
          - parse_logs
        source: |
          grpc_time_ms, err_convert = to_float(.grpc.time_ms)
          if err_convert != null {
            log("Unable to convert To Float: " + err_convert, level: "error")
          } else {
            .grpc_time_ms = grpc_time_ms
            log(".grpc.time_ms value: " + to_string(grpc_time_ms), level: "info")
          }
      convert_to_metrics:
        type: log_to_metric
        inputs:
          - extract_metrics
        metrics:
          - type: gauge
            field: time_ms
            name: response_time_ms
            namespace: argocd
            tags:
              application: '{{ printf "{{ application }}" }}'
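log_to_metric: Converts log events to metric events.
In the examples in the Vector docs you will see tag values written as field templates in double curly braces.
But because this is used inside Helm values, and Helm uses the same braces for its own templates, it has to be written as:
    tags:
      application: '{{ printf "{{ application }}" }}'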
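Conceptually, the gauge definition takes one field as the value and fills the tag template from the same event. A minimal Python sketch of that idea follows; render and log_to_gauge are hypothetical helpers of mine, and the dictionary layout is an illustration rather than Vector's internal metric format.

```python
import re


def render(template: str, event: dict) -> str:
    """Replace each {{ field }} placeholder with the matching event field."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: str(event[m.group(1)]),
        template,
    )


def log_to_gauge(event: dict, field: str, name: str, namespace: str, tags: dict) -> dict:
    """Build a gauge-like record from one parsed log event."""
    return {
        "name": f"{namespace}_{name}",  # namespace is prefixed to the metric name
        "kind": "gauge",
        "value": float(event[field]),
        "tags": {key: render(tpl, event) for key, tpl in tags.items()},
    }


event = {"application": "argocd/longhorn", "time_ms": "126"}
metric = log_to_gauge(
    event,
    field="time_ms",
    name="response_time_ms",
    namespace="argocd",
    tags={"application": "{{ application }}"},
)
print(metric)
```

The template passed here is the plain {{ application }} form, which is what Vector should receive once Helm has processed the escaped printf version.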
Below are some code snippets I collected.
2.4) (Optional) Parse syslog with Vector
The configuration using the remap transform:
- Parse the JSON Log: Use the remap transform to parse the JSON log.
- Extract the time_ms Value: Extract the time_ms value from the nested message field.
- Transform to Metric: Convert the extracted time_ms value into a metric.
Here is the revised YAML configuration:
    customConfig:
      api:
        enabled: true
        address: 127.0.0.1:8686
        playground: true
      sources:
        otel_collector:
          type: opentelemetry
          grpc:
            address: '0.0.0.0:4317'
          http:
            address: '0.0.0.0:4318'
        # Add your log source here if needed
      transforms:
        remap_argocd:
          type: remap
          inputs:
            - "otel_collector.logs"
          source: |
            parsed, err = parse_syslog(.message)
            if err != null {
              log_msg, err = if .message != null && is_string(.message) {
                "Unable to parse SysLog: " + .message
              } else {
                "Unable to parse SysLog: message field is missing or not a string"
              }
              if err != null {
                log("Error constructing log message: " + err, level: "error")
              } else {
                log(log_msg, level: "error")
              }
            } else {
              . = parsed
            }
      sinks:
        console_sink:
          type: console
          encoding:
            codec: text
          inputs:
            - remap_argocd
    service:
      ports:
        - name: otel-collector-http
          port: 4318
          protocol: TCP
The parse_and_extract_time_ms transform in the provided YAML configuration is a Vector transform that uses the remap language. This transform is designed to parse a JSON log message and then extract a specific value (time_ms) from it. Here's a breakdown of each part of the transform:
    # Parsing JSON and extracting time_ms
    parse_and_extract_time_ms:
      type: remap
      inputs: ["my_source_id"]
      source: |
        . = parse_json!(string!(.message))
        .time_ms = to_float!(.log_message.time_ms)
- Type: type: remap specifies that this transform uses Vector's remap language, which is a powerful tool for processing log and metric data.
- Inputs: inputs: ["my_source_id"] specifies the input to this transform, which in this case is my_source_id. This should be the ID of a source or another transform that precedes this one in your Vector configuration.
- Source: the source field contains the actual remap script.
  - . = parse_json!(string!(.message))
    - string!(.message): asserts that the message field of the log is a string. The "!" marks this as an assertion: if it fails, the program stops with an error for that event (whether the event is then dropped or passed on unmodified depends on the transform's drop_on_error setting).
    - parse_json!(...): attempts to parse the message string as JSON. Again, the ! asserts that this must succeed, or an error occurs and processing of that event stops.
    - . = ...: sets the root object (.) in the remap context to the result of the parse_json! function. This means the entire log event is now replaced with the parsed JSON object.
  - .time_ms = to_float!(.log_message.time_ms)
    - This line extracts the time_ms value from the parsed JSON object. It assumes that after parsing the JSON there is a field log_message which contains time_ms.
    - to_float!(...): converts the time_ms value to a floating-point number. The ! asserts that this conversion must succeed.
    - .time_ms = ...: sets a new field time_ms at the root of the event with the converted floating-point number.
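The two lines of that remap script map almost one-to-one onto plain Python, which may help when reading the breakdown above. This is a sketch under assumptions: the input shape (a message string holding JSON with a nested log_message.time_ms) is taken from the explanation, and the sample value is invented.

```python
import json


def parse_and_extract_time_ms(event: dict) -> dict:
    """Python analogue of the two-line remap script."""
    message = event["message"]
    if not isinstance(message, str):
        # Counterpart of string!(...): refuse anything that is not a string.
        raise TypeError("message must be a string")
    parsed = json.loads(message)  # counterpart of parse_json!(...)
    # The parsed object replaces the whole event, then time_ms is
    # lifted to the root as a float.
    parsed["time_ms"] = float(parsed["log_message"]["time_ms"])
    return parsed


raw = {"message": '{"log_message": {"time_ms": "126"}}'}  # hypothetical event
result = parse_and_extract_time_ms(raw)
print(result["time_ms"])
```

Note that the original message key is gone from the result, which is the effect of assigning the parsed object to the root.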
2.5) (Optional) Parse with grok in Vector (same as Logstash)
    customConfig:
      api:
        enabled: true
        address: 127.0.0.1:8686
        playground: true
      sources:
        otel_collector:
          type: opentelemetry
          grpc:
            address: '0.0.0.0:4317'
          http:
            address: '0.0.0.0:4318'
        # Add your log source here if needed
      transforms:
        remap_argocd:
          type: remap
          inputs:
            - "otel_collector.logs"
          source: |
            parsed, err_parsed = parse_grok(
              .message,
              "time=\"%{TIMESTAMP_ISO8601:timestamp}\" level=%{LOGLEVEL:level} msg=\"%{GREEDYDATA:msg}\" application=%{GREEDYDATA:application} build_options_ms=%{NUMBER:build_options_ms:int} helm_ms=%{NUMBER:helm_ms:int} plugins_ms=%{NUMBER:plugins_ms:int} repo_ms=%{NUMBER:repo_ms:int} time_ms=%{NUMBER:time_ms:int} unmarshal_ms=%{NUMBER:unmarshal_ms:int} version_ms=%{NUMBER:version_ms:int}"
            )
            if err_parsed != null {
              log_msg, err_log_msg = "Unable to parse Grok: " + .message
              if err_log_msg != null {
                log(err_log_msg, level: "error")
              }
              log(log_msg, level: "error")
            } else {
              . = parsed
            }
        log2metric_argocd:
          type: log_to_metric
          inputs:
            - remap_argocd
          metrics:
            - type: gauge
              field: time_ms
              name: response_time_ms
              namespace: service
              tags:
                application: '{{ printf "{{ application }}" }}'
      sinks:
        console_sink:
          type: console
          encoding:
            codec: text
          inputs:
            - log2metric_argocd
        prometheus_exporter:
          type: prometheus_exporter
          inputs:
            - log2metric_argocd
          address: "0.0.0.0:1994"
    service:
      ports:
        - name: otel-collector-http
          port: 4318
          protocol: TCP
        - name: metrics
          port: 1994
          protocol: TCP
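If grok syntax is unfamiliar, the same pattern can be read as a regular expression with named groups. The Python sketch below is my own approximation of the grok pattern above, not a translation produced by Vector: the character classes are simplified and the *_ms fields are cast to int to mimic the :int suffix.

```python
import re

# Named groups play the role of the %{PATTERN:name} captures in grok.
ARGOCD_LINE = re.compile(
    r'time="(?P<timestamp>[^"]+)" level=(?P<level>\w+) msg="(?P<msg>[^"]*)" '
    r"application=(?P<application>\S+) "
    r"build_options_ms=(?P<build_options_ms>\d+) helm_ms=(?P<helm_ms>\d+) "
    r"plugins_ms=(?P<plugins_ms>\d+) repo_ms=(?P<repo_ms>\d+) "
    r"time_ms=(?P<time_ms>\d+) unmarshal_ms=(?P<unmarshal_ms>\d+) "
    r"version_ms=(?P<version_ms>\d+)"
)


def parse_argocd_line(line: str):
    """Return the captured fields, or None when the line does not match."""
    match = ARGOCD_LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    for key in fields:
        if key.endswith("_ms"):
            fields[key] = int(fields[key])  # like the :int cast in grok
    return fields


line = (
    'time="2023-12-12T17:32:44Z" level=info msg="getRepoObjs stats" '
    "application=argocd/longhorn build_options_ms=0 helm_ms=14 "
    "plugins_ms=0 repo_ms=13 time_ms=126 unmarshal_ms=97 version_ms=0"
)
parsed = parse_argocd_line(line)
print(parsed["application"], parsed["time_ms"])
```

Compared with parse_key_value, this approach is stricter: a line with the keys in a different order does not match at all, which is the same trade-off the grok version makes.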
3) Sinks
Sinks in Vector are where you publish the results of transforms or sources, for example to Prometheus or to stdout (console).
3.1) Output results to the console
We will often output to the console (stdout) to check the results.
    sinks:
      console_metrics:
        type: console
        encoding:
          codec: text
        inputs:
          - convert_to_metrics
3.2) Publish metrics via prometheus client
At this point you publish the metrics through a metrics page, http://xxx.xxx:metrics, so that the Prometheus server can scrape them.
    convert_to_metrics:
      type: log_to_metric
      inputs:
        - extract_metrics
      metrics:
        - type: gauge
          field: time_ms
          name: response_time_ms
          namespace: argocd
          tags:
            application: '{{ printf "{{ application }}" }}'
    sinks:
      prometheus_exporter:
        type: prometheus_exporter
        inputs:
          - convert_to_metrics
        address: "0.0.0.0:9598"
Now curl localhost:9598/metrics (here through the in-cluster service name):
    openvscode-server@openvscode-server-65d78d546b-n2tv2:~$ curl http://vector-headless.default:9598/metrics
    # HELP argocd_response_time_ms response_time_ms
    # TYPE argocd_response_time_ms gauge
    argocd_response_time_ms{application="argocd/kafka-strimzi"} 1367 1702489126145
    argocd_response_time_ms{application="argocd/opentelemetry-collector"} 81 1702489125144
    argocd_response_time_ms{application="argocd/argocd-image-updater"} 271 1702489125343
    argocd_response_time_ms{application="argocd/ingress-nginx"} 277 1702489125343
    argocd_response_time_ms{application="argocd/cilium"} 473 1702489125546
    argocd_response_time_ms{application="argocd/harbor"} 463 1702489125546
    argocd_response_time_ms{application="argocd/vector"} 188 1702489125144
    argocd_response_time_ms{application="argocd/backstage"} 226 1702489125145
    argocd_response_time_ms{application="argocd/rancher"} 190 1702489125144
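Before wiring this into a dashboard, a quick sanity check of the scraped text can be done with a few lines of Python. This is a sketch that only understands the exact shape shown above (one application label, a value, an optional timestamp); the sample text is a three-line excerpt of the curl output, and parse_metrics is a helper name of my own.

```python
import re

# One sample line: name{application="..."} value [timestamp]
SAMPLE = re.compile(
    r'^(?P<name>\w+)\{application="(?P<app>[^"]+)"\} '
    r"(?P<value>\S+)(?: (?P<ts>\d+))?$"
)


def parse_metrics(text: str) -> dict[str, float]:
    """Map each application label to its gauge value."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comments
            continue
        match = SAMPLE.match(line)
        if match:
            values[match.group("app")] = float(match.group("value"))
    return values


text = """\
# HELP argocd_response_time_ms response_time_ms
# TYPE argocd_response_time_ms gauge
argocd_response_time_ms{application="argocd/kafka-strimzi"} 1367 1702489126145
argocd_response_time_ms{application="argocd/opentelemetry-collector"} 81 1702489125144
argocd_response_time_ms{application="argocd/harbor"} 463 1702489125546
"""
values = parse_metrics(text)
slowest = max(values, key=values.get)
print(slowest, values[slowest])
```

For real use, a Prometheus server scraping this endpoint and a per-application graph of argocd_response_time_ms is the actual goal; the script is just a way to confirm the exporter emits what you expect.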