CDC (change data capture) is a design pattern that captures incremental changes to data in real time. Its use cases include data replication between databases, building search indexes, and data synchronization between microservices.
Kafka Connect is a distributed, scalable, and fault-tolerant framework for integrating Kafka with external systems. Debezium is an open source distributed platform for CDC built on top of Apache Kafka. It monitors changes in a database and emits events to Kafka in real time! It has built-in connectors for databases such as MySQL, MongoDB, PostgreSQL, and others.
I will use a docker compose file to quickly set up all the instances: zookeeper, kafka, a kafka consumer, kafka connect, and mysql.
Our goal is to capture changes from a MySQL database in real time and publish them to a Kafka topic; we will set up a consumer on this topic to observe those events. We will use the Debezium MySQL connector to detect changes in the MySQL database.
Here’s a docker compose file to quickly spin up all the services. It’s a modification of the original docker-compose file from Debezium.
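A minimal sketch of such a compose file, based on the images from the Debezium tutorial (image tags, credentials, and topic names here are illustrative; adjust them to your setup):

```yaml
version: '2'
services:
  zookeeper:
    image: debezium/zookeeper:2.1        # tag is an example; pin your own version
    ports:
      - "2181:2181"
  kafka:
    image: debezium/kafka:2.1
    ports:
      - "9092:9092"
    environment:
      - ZOOKEEPER_CONNECT=zookeeper:2181
  mysql:
    image: debezium/example-mysql:2.1    # ships with the sample "inventory" database
    ports:
      - "3306:3306"
    environment:
      - MYSQL_ROOT_PASSWORD=debezium
      - MYSQL_USER=mysqluser
      - MYSQL_PASSWORD=mysqlpw
  connect:
    image: debezium/connect:2.1
    ports:
      - "8083:8083"                      # Kafka Connect REST API
    environment:
      - BOOTSTRAP_SERVERS=kafka:9092
      - GROUP_ID=1
      - CONFIG_STORAGE_TOPIC=my_connect_configs
      - OFFSET_STORAGE_TOPIC=my_connect_offsets
      - STATUS_STORAGE_TOPIC=my_connect_statuses
  consumer:
    image: debezium/kafka:2.1
    environment:
      - ZOOKEEPER_CONNECT=zookeeper:2181
    # watch-topic is a helper script in the debezium/kafka image;
    # it prints every message on the topic to stdout
    command: watch-topic -a -k dbserver1.inventory.customers
```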
Start the services with `docker compose up`.
The docker compose file includes the following services:
- zookeeper, kafka, and connect set up Kafka and Kafka Connect. Kafka Connect will read changes from the MySQL database and publish them to the Kafka topic `dbserver1.inventory.customers`.
- consumer is a Kafka consumer that subscribes to the topic `dbserver1.inventory.customers`.
- mysql is the Debezium example image for MySQL, which comes with an `inventory` database.
Next, run the following command to check whether Kafka Connect is up. If the request succeeds, the Kafka Connect service is up and running.
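Assuming the Kafka Connect REST API is mapped to port 8083 on the host (as in the compose file), a simple health check looks like:

```shell
# Returns Kafka Connect version info as JSON if the service is up
curl -H "Accept: application/json" localhost:8083/
```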
Initially there will be no connectors. We will register a Debezium MySQL connector for the inventory database, which allows Kafka Connect to read changes from the MySQL db.
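A registration request sketch, using the connector configuration from the Debezium tutorial (hostnames and credentials match the compose file above; the property names shown are for Debezium 2.x — older versions use `database.server.name` and `database.history.kafka.*` instead):

```shell
curl -i -X POST -H "Accept: application/json" -H "Content-Type: application/json" \
  localhost:8083/connectors/ -d '{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "tasks.max": "1",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.server.id": "184054",
    "topic.prefix": "dbserver1",
    "database.include.list": "inventory",
    "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
    "schema.history.internal.kafka.topic": "schemahistory.inventory"
  }
}'
```

The `topic.prefix` of `dbserver1` combined with the database and table names is what produces the topic `dbserver1.inventory.customers`.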
Check all the registered connectors via Kafka Connect’s REST API.
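For example:

```shell
# Lists the names of all registered connectors as a JSON array
curl -s localhost:8083/connectors/
```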
This time, it returns the newly registered connector.
Once the connector is registered, you should see something like this in the connect container logs.
Next, log in to MySQL and make changes. The changes will be detected by the Debezium MySQL connector, and Kafka Connect will publish them to a Kafka topic; the consumer will then read the changes from that topic.
We will update the `customers` table.
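For example, from a MySQL shell inside the container (the credentials and the row with `id` 1004 come from the Debezium example database; substitute your own values as needed):

```sql
-- Open a shell first with:
--   docker compose exec mysql mysql -u mysqluser -p inventory
UPDATE customers SET first_name = 'Anne Marie' WHERE id = 1004;
```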
Now check the Kafka consumer logs. If you look closely, the logs contain information about the field changes.
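For reference, an update event emitted by Debezium carries the row state before and after the change, heavily abbreviated here (field values depend on the row you changed; the sample values are from the example database):

```json
{
  "payload": {
    "before": { "id": 1004, "first_name": "Anne", "last_name": "Kretchmar", "email": "annek@noanswer.org" },
    "after":  { "id": 1004, "first_name": "Anne Marie", "last_name": "Kretchmar", "email": "annek@noanswer.org" },
    "source": { "connector": "mysql", "db": "inventory", "table": "customers" },
    "op": "u"
  }
}
```

The `op` field indicates the operation: `c` for create, `u` for update, `d` for delete.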
We can also delete a record in the `customers` table; we must first satisfy the foreign key constraint by deleting the corresponding record from the `addresses` table.
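A sketch of the two deletes, again assuming the example row with id 1004 and the `customer_id` foreign key column from the Debezium sample schema:

```sql
-- Delete the child rows first to satisfy the foreign key constraint
DELETE FROM addresses WHERE customer_id = 1004;
DELETE FROM customers WHERE id = 1004;
```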
Similarly, we can also detect insert operations on the table.
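For example (assuming `id` is auto-generated, as in the example schema):

```sql
INSERT INTO customers (first_name, last_name, email)
VALUES ('John', 'Doe', 'john.doe@example.com');
```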
In case Kafka goes down for some time, on restart the Debezium MySQL connector will resume reading the binlog from the last committed offset (as long as the binlog hasn’t been purged by the MySQL server). More information can be found in the official documentation of the Debezium MySQL connector here.