NimTechnology

Presenting CLOUD technologies in an easy-to-understand way.

[Kafka-connect] Exploring Kafka Connect Source, with a demo that watches a changing file.

Posted on February 4, 2022 (updated February 27, 2022) by nim

In this post we'll figure out what on earth Kafka Connect actually is.
There are also a few demos to make it easier to understand.

First:
If you want a more thorough, hands-on walkthrough than mine, grab this course:
https://www.udemy.com/course/kafka-connect/

Contents

  • 1) Kafka Connect Source
    • 1.1) STANDALONE MODE
      • 1.1.1) overview
      • 1.1.2) Practice.
        • 1.1.2.1) Run Kafka-connect on docker to learn.
        • 1.1.2.2) Create topic and kafka connect.
    • 1.2) DISTRIBUTED MODE.
      • 1.2.1) overview
      • 1.2.2) Practice
        • 1.2.2.1) create Connectors on UI

1) Kafka Connect Source

Simply put, we take data from some source (a file, a database, …) and write it into a topic on Kafka.

1.1) STANDALONE MODE

1.1.1) overview

Example: FileStreamSourceConnector STANDALONE MODE

  • Goal:
    • Read a file and load the content directly into Kafka
    • Run a connector in standalone mode (useful for development)
  • Learning:
    • Understand how to configure a connector in standalone mode
    • Get a first feel for Kafka Connect Standalone

OK, so in this part we only stand up the Kafka and Kafka Connect components in order to learn their features. How to build a production setup is something I'll cover later, once I've dug into it a good deal more.

1.1.2) Practice.

1.1.2.1) Run Kafka-connect on docker to learn.

version: '2'

services:
  # this is our kafka cluster.
  kafka-cluster:
    network_mode: "host"
    image: landoop/fast-data-dev
    environment:
      ADV_HOST: "192.168.101.36"         # Change to 192.168.99.100 if using Docker Toolbox
      RUNTESTS: 0                 # Disable Running tests so the cluster starts faster
    # ports:
    #   - 2181:2181                 # Zookeeper
    #   - 3030:3030                 # Landoop UI
    #   - 8081-8083:8081-8083       # REST Proxy, Schema Registry, Kafka Connect ports
    #   - 9581-9585:9581-9585       # JMX Ports
    #   - 9092:9092                 # Kafka Broker

  # we will use elasticsearch as one of our sinks.
  # This configuration allows you to start elasticsearch
  elasticsearch:
    image: itzg/elasticsearch:2.4.3
    environment:
      PLUGINS: appbaseio/dejavu
      OPTS: -Dindex.number_of_shards=1 -Dindex.number_of_replicas=0
    ports:
      - "9200:9200"

  # we will use postgres as one of our sinks.
  # This configuration allows you to start postgres
  postgres:
    image: postgres:9.5-alpine
    environment:
      POSTGRES_USER: postgres     # define credentials
      POSTGRES_PASSWORD: postgres # define credentials
      POSTGRES_DB: postgres       # define database
    ports:
      - 5432:5432                 # Postgres port

We'll run kafka-cluster in host network mode.

The container we care about is the Kafka one.
Once it is up, open IP:3030 in a browser.
If the page loads like it did for me, you're good.

1.1.2.2) Create topic and kafka connect.

Exec into the Kafka container.

docker exec -it kafka-connect_kafka-cluster_1 bash

We need to create one folder and three files.

mkdir -p /tutorial/source/demo-1
cd /tutorial/source/demo-1

Create the first file:
vi worker.properties

# for more information, visit: http://docs.confluent.io/3.2.0/connect/userguide.html#common-worker-configs
bootstrap.servers=127.0.0.1:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false
# we always leave the internal key to JsonConverter
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter.schemas.enable=false
# Rest API
rest.port=8086
rest.host.name=127.0.0.1
# this config is only for standalone workers
offset.storage.file.filename=standalone.offsets
offset.flush.interval.ms=10000

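bootstrap.servers: the address of the Kafka cluster the worker connects to
offset.storage.file.filename: the file where the standalone worker stores source offsets, i.e. how far it has already read
offset.flush.interval.ms: how often, in milliseconds, the worker tries to commit those offsets (every 10 seconds here). It is not the interval at which the file is checked for changes.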
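
To make the units concrete, here is a small sketch that loads a few of the worker settings and converts the flush interval to seconds. It only illustrates the file format, not how Kafka Connect itself parses the file: Python's configparser is used as a stand-in, and the `[worker]` section header and the `load_properties` helper are assumptions added for this example.

```python
import configparser

# A subset of the worker.properties shown above.
WORKER_PROPERTIES = """\
bootstrap.servers=127.0.0.1:9092
# this config is only for standalone workers
offset.storage.file.filename=standalone.offsets
offset.flush.interval.ms=10000
"""

def load_properties(text):
    """Parse simple key=value properties text into a dict (hypothetical helper)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keys exactly as written
    # configparser requires a section header, so prepend a dummy one.
    parser.read_string("[worker]\n" + text)
    return dict(parser["worker"])

props = load_properties(WORKER_PROPERTIES)
# The value is in milliseconds; convert to seconds for readability.
flush_seconds = int(props["offset.flush.interval.ms"]) / 1000
print(props["offset.storage.file.filename"], flush_seconds)
```

So with the value 10000, offsets are written to standalone.offsets roughly every 10 seconds.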

Next, create another file:
vi file-stream-demo-standalone.properties

# These are standard kafka connect parameters, needed for ALL connectors
name=file-stream-demo-standalone
connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
tasks.max=1
# Parameters can be found here: https://github.com/apache/kafka/blob/trunk/connect/file/src/main/java/org/apache/kafka/connect/file/FileStreamSourceConnector.java
file=/tutorial/source/demo-1/demo-file.txt
topic=demo-1-standalone

file: which file Kafka Connect should watch, and where it lives
topic: which topic the data is written to

Create an empty file:
touch demo-file.txt

Now use this command to create a topic:

# create the topic we write to with 3 partitions
kafka-topics --create --topic demo-1-standalone --partitions 3 --replication-factor 1 --zookeeper 127.0.0.1:2181

Start Kafka Connect:

# Usage is connect-standalone worker.properties connector1.properties [connector2.properties connector3.properties]
connect-standalone worker.properties file-stream-demo-standalone.properties

After running the command, if you see "INFO Created connector file-stream-demo-standalone (org.apache.kafka.connect.cli.ConnectStandalone:112)", you're all set.

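Now go back to the web page:

At this point the topic's data is null.
I had saved the file without ending the line with a newline, so Kafka Connect did not pick it up!

After ending the line with a newline and saving again, go back to the browser and press F5:

So watching the file and writing its data into the topic works!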
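
The newline behaviour can be pictured with a small simulation. This is only a sketch of the idea (emit newline-terminated lines, leave an unfinished line for later), not the connector's actual code; the function name `read_complete_lines` is made up for this example.

```python
def read_complete_lines(data, position):
    """Return (lines, new_position) for the newline-terminated lines after position.

    A trailing fragment that does not yet end with a newline is left
    unread, so the position does not move past it.
    """
    lines = []
    while True:
        end = data.find("\n", position)
        if end == -1:
            break  # no complete line left, wait for more data
        lines.append(data[position:end].rstrip("\r"))
        position = end + 1
    return lines, position

# First save: text without a trailing newline, so nothing is emitted.
content = "hello"
lines, pos = read_complete_lines(content, 0)
print(lines, pos)

# Second save: the newline is added, so the line is finally emitted.
content += "\n"
lines, pos = read_complete_lines(content, pos)
print(lines, pos)
```

The position returned here plays the same role as the offset that the standalone worker periodically saves to standalone.offsets: it marks how far the file has been read.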

1.2) DISTRIBUTED MODE.

1.2.1) overview

Example: FileStreamSourceConnector DISTRIBUTED MODE

  • Goal:
    • Read a file and load the content directly into Kafka
    • Run in distributed mode on our already set-up Kafka Connect Cluster
  • Learning:
    • Understand how to configure a connector in distributed mode
    • Get a first feel for Kafka Connect Cluster
    • Understand the schema configuration option

1.2.2) Practice

1.2.2.1) create Connectors on UI

Create the topic first:

kafka-topics --create --topic demo-2-distributed --partitions 3 --replication-factor 1 --zookeeper 127.0.0.1:2181

We need to edit the content of the text box in the UI.

Copy the content below and paste it into that box:

# These are standard kafka connect parameters, needed for ALL connectors
name=file-stream-demo-distributed
connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
tasks.max=1
# Parameters can be found here: https://github.com/apache/kafka/blob/trunk/connect/file/src/main/java/org/apache/kafka/connect/file/FileStreamSourceConnector.java
file=/tutorial/source/demo-1/demo-file.txt
topic=demo-2-distributed
# Added configuration for the distributed mode:
key.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=true
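
If you would rather not use the UI, Kafka Connect in distributed mode also accepts connectors through its REST API (POST /connectors), which takes JSON instead of properties. The sketch below only builds that JSON body from the properties above and does not send anything. The helper name `to_rest_payload` is made up for this example, and port 8083 in the comment is an assumption based on the port list in the compose file.

```python
import json

CONNECTOR_PROPERTIES = """\
name=file-stream-demo-distributed
connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
tasks.max=1
file=/tutorial/source/demo-1/demo-file.txt
topic=demo-2-distributed
key.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=true
"""

def to_rest_payload(properties_text):
    """Turn key=value connector properties into a POST /connectors body."""
    config = {}
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    # The REST API expects the connector name at the top level.
    return {"name": config.pop("name"), "config": config}

payload = to_rest_payload(CONNECTOR_PROPERTIES)
# This body could then be POSTed with Content-Type: application/json to
# http://127.0.0.1:8083/connectors (host and port are assumptions).
print(json.dumps(payload, indent=2))
```

Note that every value stays a string in the JSON body, including tasks.max and the schemas.enable flags.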

Now let's go back to the topic in the browser:

This is the data that was already in the file from the previous section.
Now append a new line to the file.
Then check the topic again.

You can see each record is now a JSON object that carries its schema along with the data, thanks to this option:
value.converter.schemas.enable=true
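
To see what the option changes, here is a rough sketch of the two value shapes for one line of the file. It assumes the value is a plain, non-optional string, which is what a line read from a text file amounts to; the function names are made up for this example and this is not the JsonConverter code itself.

```python
import json

def value_with_schema(line):
    """Approximate value when value.converter.schemas.enable=true."""
    return json.dumps({
        "schema": {"type": "string", "optional": False},
        "payload": line,
    })

def value_without_schema(line):
    """Approximate value when value.converter.schemas.enable=false."""
    return json.dumps(line)

print(value_with_schema("hello"))
print(value_without_schema("hello"))
```

A consumer that only wants the original line has to read the "payload" field when schemas are enabled, whereas with schemas disabled the whole value is the line itself as a JSON string.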

You can run the following command to see it more concretely:

kafka-console-consumer --bootstrap-server localhost:9092 --topic demo-2-distributed --from-beginning

The data is all JSON.
Copyright © 2025 NimTechnology.