[Vector by DataDog] Use Vector to parse and convert logs to anything.

1) Overview of the situation.

The situation is as follows:
You have a log line on your system like this:

time="2023-12-12T17:32:44Z" level=info msg="getRepoObjs stats" application=argocd/longhorn build_options_ms=0 helm_ms=14 plugins_ms=0 repo_ms=13 time_ms=126 unmarshal_ms=97 version_ms=0

Your manager says:

Use your DevOps skills to draw me a chart of time_ms for each application. You can get the data from the log above.

After using every search skill I have, I found two strong candidates: Logstash and Vector.

Logstash belongs to the ELK family and is a very well-known tool for handling logs.
It is fairly fast and has been around for a long time, so its documentation is thorough.
The one thing I don't like about it is that it eats a lot of RAM.

Turning to Vector:

A promising candidate, written in Rust.
Maybe it will be light and fast.

OK, here is the architecture:

1) Opentelemetry sends the logs to Vector.

To send logs from the otel-collector to Vector, I referred to two articles.
https://www.techetio.com/2023/04/29/sending-opentelemetry-logs-to-vector-using-python/
https://intelops.ai/blog/connecting-the-opentelemetry-collector-to-vecor/

I will export the logs to Vector through otlphttp.

The OTLP/HTTP exporter in the OpenTelemetry Collector is designed to send metrics, traces, and logs through HTTP using the OTLP format. This exporter supports traces, metrics, and logs pipeline types. To include it in your configuration, you need to specify certain settings:
https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter/otlphttpexporter

....
    exporters:
      otlphttp: 
        endpoint: "http://vector-headless.default:4318"
....
    service:
      pipelines:
        logs:
          exporters:
          - coralogix
          - otlphttp #look at
          processors:
          - k8sattributes
          - attributes/insert
          - filter/regexp_resource
          - batch
          receivers:
          - otlp
          - filelog

2) How to configure Vector

2.1) Install vector on Kubernetes

REPO URL: https://helm.vector.dev
CHART: vector:0.29.0

You can change the version if you want.

2.2) Vector receives and processes logs from the otel-collector

2.1.1) Source:

Here we receive the logs from OpenTelemetry:
https://vector.dev/docs/reference/configuration/sources/opentelemetry/

customConfig:
  api:
    enabled: true
    address: 127.0.0.1:8686
    playground: true
  sources:
    otel_collector:
      type: opentelemetry
      grpc:
        address: '0.0.0.0:4317'
      http:
        address: '0.0.0.0:4318'
    # Add your log source here if needed

  sinks:
    console_logs:
      type: console
      encoding:
        codec: text
      inputs:
        - "otel_collector.logs"

service:
  ports:
    - name: otel-collector-http
      port: 4318
      protocol: TCP
    - name: metrics
      port: 9598
      protocol: TCP

Two points to note (as of Dec 13th, 2023):
The opentelemetry source only supports log events at this time.
Received log events will go to this output stream. Use <component_id>.logs as an input to downstream transforms and sinks.

In other words, when you declare the input of another component, such as a sink or a transform, it has to be <component_id>.logs.

If the config is OK at this point, showing the Vector logs will reveal it writing quite a lot of log events to stdout.

2) Transforms:

2.1) parse_key_value function in Vector

For the transform we will use the remap type.

The “remap” transform type in Vector is primarily used for parsing, shaping, and transforming observability data, such as logs and metrics, within your data processing topology. This transform utilizes the Vector Remap Language (VRL), a language designed for the safe and efficient processing of observability data. VRL is expression-oriented and aligns closely with the data models used in Vector.

You can see that the content of the message is in key/value form.

Inside remap we will use the parse_key_value function.

Parses the value in key-value format. Also known as logfmt.

Keys and values can be wrapped with ".
" characters can be escaped using \.

customConfig:
  api:
    enabled: true
    address: 127.0.0.1:8686
    playground: true
  sources:
    otel_collector:
      type: opentelemetry
      grpc:
        address: '0.0.0.0:4317'
      http:
        address: '0.0.0.0:4318'
    # Add your log source here if needed

  transforms:
    parse_logs:
      type: remap
      inputs:
        - "otel_collector.logs"
      source: |
          . = parse_key_value!(string!(.message))
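
To get a feel for what parse_key_value produces, here is a rough Python analogue, not Vector's implementation. It assumes whitespace-separated key=value pairs with optional double quotes; the helper name parse_logfmt is made up for this sketch. Note that every value comes out as a string, which is why a numeric conversion step is needed later.

```python
import shlex


def parse_logfmt(line: str) -> dict:
    """Split a logfmt-style line into a flat dict; every value stays a string."""
    fields = {}
    # shlex honours double quotes and backslash escapes,
    # so msg="getRepoObjs stats" stays a single token.
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip bare words that carry no '='
            fields[key] = value
    return fields


line = (
    'time="2023-12-12T17:32:44Z" level=info msg="getRepoObjs stats" '
    "application=argocd/longhorn build_options_ms=0 helm_ms=14 plugins_ms=0 "
    "repo_ms=13 time_ms=126 unmarshal_ms=97 version_ms=0"
)
event = parse_logfmt(line)
print(event["application"], event["time_ms"])
```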

2.2) Change key

Let me explain this part a bit more.
Not every key has a friendly format like time-ms or time_ms.
Sometimes it looks like this: grpc.time_ms

If you need to go from grpc.time_ms to grpc_time_ms, read the value, coerce it with to_float, and assign the result to the new key grpc_time_ms.
to_float: Coerces the value into a float.

  transforms:
    parse_logs:
      type: remap
      inputs:
        - "otel_collector.logs"
      source: |
          . = parse_key_value!(string!(.message))
    extract_metrics:
      type: remap
      inputs:
        - parse_logs
      source: |
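          # parse_key_value yields a flat key literally named "grpc.time_ms",
          # so the path segment is quoted (unquoted, it would mean .grpc -> .time_ms)
          grpc_time_ms, err_convert = to_float(."grpc.time_ms")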
          if err_convert != null {
            log("Unable to convert To Float: " + err_convert, level: "error")
          } else {
            .grpc_time_ms = grpc_time_ms
            log(".grpc.time_ms value: " + to_string(grpc_time_ms), level: "info")
          }
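
The same read, coerce, and store-under-a-new-key pattern can be sketched in Python. This is only an illustration of the logic, not of VRL's exact coercion rules; the function name extract_grpc_time is hypothetical, and it assumes the parsed event is a flat dict with a key literally named "grpc.time_ms".

```python
def extract_grpc_time(event: dict) -> dict:
    """Copy event["grpc.time_ms"] to event["grpc_time_ms"] as a float, if possible."""
    raw = event.get("grpc.time_ms")
    try:
        value = float(raw)
    except (TypeError, ValueError) as exc:
        # Mirrors the error branch: report the problem and leave the event untouched.
        print(f"Unable to convert to float: {exc}")
        return event
    # Mirrors the success branch: add the new, friendlier key.
    return {**event, "grpc_time_ms": value}


print(extract_grpc_time({"grpc.time_ms": "12.5"}))
print(extract_grpc_time({"grpc.time_ms": "not-a-number"}))
```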

2.3) Convert log to metrics by Vector

This is the part I wanted most:

  transforms:
    parse_logs:
      type: remap
      inputs:
        - "otel_collector.logs"
      source: |
          . = parse_key_value!(string!(.message))
    extract_metrics:
      type: remap
      inputs:
        - parse_logs
      source: |
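          # parse_key_value yields a flat key literally named "grpc.time_ms",
          # so the path segment is quoted (unquoted, it would mean .grpc -> .time_ms)
          grpc_time_ms, err_convert = to_float(."grpc.time_ms")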
          if err_convert != null {
            log("Unable to convert To Float: " + err_convert, level: "error")
          } else {
            .grpc_time_ms = grpc_time_ms
            log(".grpc.time_ms value: " + to_string(grpc_time_ms), level: "info")
          }

    convert_to_metrics:
      type: log_to_metric
      inputs:
        - extract_metrics
      metrics:
        - type: gauge
          field: time_ms
          name: response_time_ms
          namespace: argocd
          tags:
            application: '{{ printf "{{ application }}" }}'

log_to_metric: Convert log events to metric events

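In the examples in Vector's documentation, a tag value is written as a plain template such as {{ application }}.

But because this config lives in Helm values, where Helm would otherwise try to render the double braces itself, it has to be written as: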

tags:
  application: '{{ printf "{{ application }}" }}'
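
Conceptually, log_to_metric takes one parsed log event and emits one metric. A minimal Python sketch of that idea follows, under the assumption that Vector ends up seeing the tag template as {{ application }} once Helm has rendered the printf. The function log_to_gauge and the dict layout of the metric are invented for illustration and do not mirror Vector's internal types.

```python
import re


def log_to_gauge(event: dict, field: str, name: str, namespace: str, tags: dict):
    """Turn one parsed log event into a gauge-like dict, or None if the field is absent."""
    if field not in event:
        return None

    def render(template: str) -> str:
        # Replace each {{ key }} with the value of that key in the event.
        return re.sub(
            r"\{\{\s*(\w+)\s*\}\}",
            lambda m: str(event.get(m.group(1), "")),
            template,
        )

    return {
        "name": f"{namespace}_{name}",
        "kind": "gauge",
        "value": float(event[field]),  # parsed values arrive as strings
        "tags": {key: render(value) for key, value in tags.items()},
    }


event = {"application": "argocd/longhorn", "time_ms": "126"}
metric = log_to_gauge(
    event,
    field="time_ms",
    name="response_time_ms",
    namespace="argocd",
    tags={"application": "{{ application }}"},
)
print(metric)
```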

Below are some snippets I collected.

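2.4) (Optional) Parse syslog with Vector

The configuration below uses the remap transform with parse_syslog. If the message cannot be parsed as syslog, an error is logged and the event is left as it is; otherwise the event is replaced by the parsed result.

The same approach applies to other formats. For a JSON log, for example, the steps are:

Parse the JSON log: use the remap transform to parse the JSON log.
Extract the time_ms value: extract the time_ms value from the nested message field.
Transform to metric: convert the extracted time_ms value into a metric.

Here is the YAML configuration for the syslog case: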

customConfig:
  api:
    enabled: true
    address: 127.0.0.1:8686
    playground: true
  sources:
    otel_collector:
      type: opentelemetry
      grpc:
        address: '0.0.0.0:4317'
      http:
        address: '0.0.0.0:4318'
    # Add your log source here if needed

  transforms:
    remap_argocd:
      type: remap
      inputs:
        - "otel_collector.logs"
      source: |
        parsed, err = parse_syslog(.message)
        if err != null {
          log_msg, err = if .message != null && is_string(.message) {
            "Unable to parse SysLog: " + .message
          } else {
            "Unable to parse SysLog: message field is missing or not a string"
          }
          if err != null {
            log("Error constructing log message: " + err, level: "error")
          } else {
            log(log_msg, level: "error")
          }
        } else {
          . = parsed
        }

  sinks:
    console_sink:
      type: console
      encoding:
        codec: text
      inputs:
        - remap_argocd

service:
  ports:
    - name: otel-collector-http
      port: 4318
      protocol: TCP

The parse_and_extract_time_ms transform below is the JSON counterpart: a Vector transform that uses the remap language to parse a JSON log message and then extract a specific value (time_ms) from it. Here's a breakdown of each part of the transform:

# Parsing JSON and extracting time_ms
parse_and_extract_time_ms:
  type: remap
  inputs: ["my_source_id"]
  source: |
    . = parse_json!(string!(.message))
    .time_ms = to_float!(.log_message.time_ms)

Type:
- type: remap specifies that this transform uses Vector's remap language, which is a powerful tool for processing log and metric data.
Inputs:
- inputs: ["my_source_id"] specifies the input to this transform, which in this case is my_source_id. This should be the ID of a source or another transform that precedes this one in your Vector configuration.
Source:
- The source field contains the actual remap script:
  - . = parse_json!(string!(.message)):
    - string!(.message): This part asserts that the message field of the log is a string (it does not convert other types). The "!" marks the call as an assertion: if it fails, the program stops with an error for that event, and whether the event is then dropped or passed on unmodified depends on the remap transform's drop_on_error setting.
    - parse_json!(...): This part attempts to parse that string as JSON. Again, the ! asserts that this must succeed; otherwise an error is raised and the event is handled the same way.
    - . = ...: This sets the root object (.) in the remap context to the result of the parse_json! function. This means the entire log event is now replaced with the parsed JSON object.
  - .time_ms = to_float!(.log_message.time_ms):
    - This line extracts the time_ms value from the parsed JSON object. It assumes that after parsing the JSON, there is a field log_message which contains time_ms.
    - to_float!(...): This converts the time_ms value to a floating-point number. The ! asserts that this conversion must succeed.
    - .time_ms = ...: This sets a new field time_ms at the root of the event with the converted floating-point number.
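
The two VRL lines broken down above can be mirrored in a few lines of Python. This is a sketch of the logic only: the nested log_message.time_ms layout is taken from the snippet, while the sample payload is invented for illustration.

```python
import json


def parse_and_extract_time_ms(event: dict) -> dict:
    """Replace the event with its parsed JSON message and lift time_ms to the root."""
    # Like `. = parse_json!(string!(.message))`: raises if message is missing or not JSON.
    parsed = json.loads(event["message"])
    # Like `.time_ms = to_float!(.log_message.time_ms)`: raises if it is not numeric.
    parsed["time_ms"] = float(parsed["log_message"]["time_ms"])
    return parsed


# Hypothetical payload, shaped to match what the snippet expects.
sample = {"message": '{"log_message": {"time_ms": "126", "msg": "getRepoObjs stats"}}'}
result = parse_and_extract_time_ms(sample)
print(result["time_ms"])
```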

2.5) (Optional) Parse grok with Vector (same as Logstash)

customConfig:
  api:
    enabled: true
    address: 127.0.0.1:8686
    playground: true
  sources:
    otel_collector:
      type: opentelemetry
      grpc:
        address: '0.0.0.0:4317'
      http:
        address: '0.0.0.0:4318'
    # Add your log source here if needed

  transforms:
    remap_argocd:
      type: remap
      inputs:
        - "otel_collector.logs"
      source: |
        parsed, err_parsed = parse_grok(
          .message, 
          "time=\"%{TIMESTAMP_ISO8601:timestamp}\" level=%{LOGLEVEL:level} msg=\"%{GREEDYDATA:msg}\" application=%{GREEDYDATA:application} build_options_ms=%{NUMBER:build_options_ms:int} helm_ms=%{NUMBER:helm_ms:int} plugins_ms=%{NUMBER:plugins_ms:int} repo_ms=%{NUMBER:repo_ms:int} time_ms=%{NUMBER:time_ms:int} unmarshal_ms=%{NUMBER:unmarshal_ms:int} version_ms=%{NUMBER:version_ms:int}"
        )
        if err_parsed != null {
          log_msg, err_log_msg = "Unable to parse Grok: " + .message
          if err_log_msg != null {
            log(err_log_msg, level: "error")
          }
          log(log_msg, level: "error")
        } else {
          . = parsed
        }
    log2metric_argocd:
      type: log_to_metric
      inputs:
        - remap_argocd
      metrics:
        - type: gauge
          field: time_ms
          name: response_time_ms
          namespace: service
          tags:
            application: '{{ printf "{{ application }}" }}'

  sinks:
    console_sink:
      type: console
      encoding:
        codec: text
      inputs:
        - log2metric_argocd
    prometheus_exporter:
      type: prometheus_exporter
      inputs:
        - log2metric_argocd
      address: "0.0.0.0:1994"

service:
  ports:
    - name: otel-collector-http
      port: 4318
      protocol: TCP
    - name: metrics
      port: 1994
      protocol: TCP
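
For readers who do not know grok, the pattern above is roughly a regular expression with named captures plus an int conversion on the *_ms fields. A Python sketch of the equivalent follows. The regex is my own approximation of that grok pattern (for instance \S+ instead of GREEDYDATA for application), not something generated by Vector.

```python
import re

INT_FIELDS = [
    "build_options_ms", "helm_ms", "plugins_ms", "repo_ms",
    "time_ms", "unmarshal_ms", "version_ms",
]

# Fixed leading fields, then one "<name>=<integer>" group per *_ms field, in log order.
PATTERN = re.compile(
    r'time="(?P<timestamp>[^"]+)" level=(?P<level>\w+) msg="(?P<msg>[^"]*)" '
    r"application=(?P<application>\S+) "
    + " ".join(rf"{name}=(?P<{name}>-?\d+)" for name in INT_FIELDS)
)


def parse_argocd_line(line: str):
    """Return the captured fields as a dict, or None when the line does not match."""
    match = PATTERN.fullmatch(line)
    if match is None:
        return None
    fields = match.groupdict()
    for name in INT_FIELDS:
        fields[name] = int(fields[name])  # same role as the :int suffix in grok
    return fields


line = (
    'time="2023-12-12T17:32:44Z" level=info msg="getRepoObjs stats" '
    "application=argocd/longhorn build_options_ms=0 helm_ms=14 plugins_ms=0 "
    "repo_ms=13 time_ms=126 unmarshal_ms=97 version_ms=0"
)
parsed = parse_argocd_line(line)
print(parsed["application"], parsed["time_ms"])
```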

3) Sinks

Sinks in Vector are where you publish the results of transforms or sources, for example to Prometheus or to stdout (console).

3.1) Output the result to the console.

We will often output to the console (stdout) to see the results.

  sinks:
    console_metrics:
      type: console
      encoding:
        codec: text
      inputs:
        - convert_to_metrics

3.2) Publish metrics via prometheus client

At this point you publish the metrics on a metrics page (http://xxx.xxx:metrics) so that the Prometheus server can scrape them.

    convert_to_metrics:
      type: log_to_metric
      inputs:
        - extract_metrics
      metrics:
        - type: gauge
          field: time_ms
          name: response_time_ms
          namespace: argocd
          tags:
            application: '{{ printf "{{ application }}" }}'

  sinks:
    prometheus_exporter:
      type: prometheus_exporter
      inputs:
        - convert_to_metrics
      address: "0.0.0.0:9598"

Now curl the /metrics endpoint on port 9598 (localhost:9598/metrics, or the service name from inside the cluster):

openvscode-server@openvscode-server-65d78d546b-n2tv2:~$ curl http://vector-headless.default:9598/metrics
# HELP argocd_response_time_ms response_time_ms
# TYPE argocd_response_time_ms gauge
argocd_response_time_ms{application="argocd/kafka-strimzi"} 1367 1702489126145
argocd_response_time_ms{application="argocd/opentelemetry-collector"} 81 1702489125144
argocd_response_time_ms{application="argocd/argocd-image-updater"} 271 1702489125343
argocd_response_time_ms{application="argocd/ingress-nginx"} 277 1702489125343
argocd_response_time_ms{application="argocd/cilium"} 473 1702489125546
argocd_response_time_ms{application="argocd/harbor"} 463 1702489125546
argocd_response_time_ms{application="argocd/vector"} 188 1702489125144
argocd_response_time_ms{application="argocd/backstage"} 226 1702489125145
argocd_response_time_ms{application="argocd/rancher"} 190 1702489125144
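
To sanity-check output like this before wiring up Prometheus, a small Python sketch can pull the application and value out of each sample line. It assumes exactly the shape shown above (a single application label, an optional trailing timestamp) and is not a general parser for the Prometheus text format; parse_metrics is a hypothetical helper name.

```python
import re

SAMPLE_LINE = re.compile(
    r'^(?P<name>\w+)\{application="(?P<app>[^"]*)"\} '
    r"(?P<value>\S+)(?: (?P<ts>\d+))?$"
)


def parse_metrics(text: str) -> dict:
    """Map application -> gauge value, skipping comments and non-matching lines."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # HELP/TYPE lines and blanks carry no sample
        match = SAMPLE_LINE.match(line)
        if match:
            values[match.group("app")] = float(match.group("value"))
    return values


text = """\
# HELP argocd_response_time_ms response_time_ms
# TYPE argocd_response_time_ms gauge
argocd_response_time_ms{application="argocd/kafka-strimzi"} 1367 1702489126145
argocd_response_time_ms{application="argocd/vector"} 188 1702489125144
"""
print(parse_metrics(text))
```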