
Migrate Kafka to Google Cloud Pub/Sub!

Updated: Aug 18, 2019

1. Introduction

This guide describes a process for migrating a Kafka installation to Google Cloud Pub/Sub without downtime or dropped messages.


This guide assumes that Kafka is currently running on Google Cloud Platform or, at the very least, that a network connection is available to stream messages between the existing Kafka installation and Google Cloud Pub/Sub.



  1. Each message topic needs to be migrated individually, but topics can be migrated in parallel.

  2. Create a new topic in Cloud Pub/Sub to replace the existing Kafka topic.

  3. Create a Pub/Sub publisher identity for the sink connector.

  4. Create a Pub/Sub subscription for each subscriber application.

  5. Configure the subscriber application to receive messages from the new Pub/Sub topic.

  6. Configure the sink connector to receive messages from Kafka and write them to the new Pub/Sub topic.

  7. Configure the publisher application to publish messages directly to Pub/Sub.

  8. Repeat steps 4-5 for each additional subscriber, and step 7 for each additional publisher.

  9. Decommission the sink connector and delete the topic from Kafka.


2. Demonstration

Create a project in the Google Cloud Console. For this demo, the project ID is kafka-migration-demo; project IDs are globally unique, so you will likely have to use a different one.
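
If you prefer the command line, you can also create and select the project from Cloud Shell (the project ID below is the demo's; substitute your own):

$ gcloud projects create kafka-migration-demo
$ gcloud config set project kafka-migration-demo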

Install Kafka. This demo assumes a single-VM Kafka deployment, such as Click to Deploy Kafka from the Google Cloud Marketplace, which creates a VM named kafka-1-vm by default.

Connect to Cloud Shell to run all of the following commands.

First, enable the Google Cloud APIs required for this demo. They would be enabled automatically as part of the following steps, but doing it here speeds up the process later on.

$ gcloud services enable \
iam.googleapis.com \
cloudresourcemanager.googleapis.com \
containerregistry.googleapis.com

Set some environment variables to make the following commands simpler. Remember to replace the values with what makes sense for your environment.

$ PROJECT=kafka-migration-demo # Replace with your project name
$ ZONE=us-central1-a
$ KAFKA=kafka-1-vm # or the IP address/hostname for your kafka cluster broker

The demonstration runs inside a Google Kubernetes Engine cluster, as this makes deploying and updating each stage easy. The same process still applies outside of Kubernetes.

Create a Kubernetes Engine cluster called cluster-1, and fetch its credentials so that kubectl can connect to it.

$ gcloud container clusters create --zone $ZONE cluster-1
$ gcloud container clusters get-credentials --zone $ZONE cluster-1
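
As a quick sanity check (not part of the migration itself), confirm that the cluster nodes are up and kubectl is pointing at the new cluster:

$ kubectl get nodes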

Check out the demonstration code from the PSO repository. This demo repository contains code to demonstrate the process, as well as configuration for the Kafka Connector and for creating deployments in Kubernetes.

$ git clone https://pso.googlesource.com/application-development/kafka-to-pubsub-migration
$ cd kafka-to-pubsub-migration
$ sed -i.bak "s/kafka-to-pubsub-migration/$PROJECT/g" stage-*/*

To avoid having to give extra permissions to pods deployed in Kubernetes, we will create the Kafka and Pub/Sub topics before running the jobs.

# Create the Kafka topics. test_topic carries the demo messages; the pws-*
# topics hold the Kafka Connect worker's internal state (offsets, config, status).
$ wget http://www-eu.apache.org/dist/kafka/1.1.0/kafka_2.11-1.1.0.tgz
$ tar zxf kafka_2.11-1.1.0.tgz
$ sudo apt-get install default-jdk
$ for TOPIC in test_topic pws-offsets pws-config pws-status; do \
kafka_2.11-1.1.0/bin/kafka-topics.sh --zookeeper $KAFKA --create \
--topic $TOPIC --partitions 1 --replication-factor 1
done
# Create the Pub/Sub topic and a subscription for the demo subscriber.
$ gcloud pubsub topics create test_topic
$ gcloud pubsub subscriptions create test_subscriber --topic test_topic
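
Optionally, smoke-test the new topic and subscription before deploying anything. Note that this pull consumes the test message, so do it before the demo subscriber starts:

$ gcloud pubsub topics publish test_topic --message "smoke test"
$ gcloud pubsub subscriptions pull test_subscriber --auto-ack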

Build containers to run the publisher and subscriber. These containers are uploaded to the Google Container Registry and deployed to Kubernetes.

$ gcloud auth configure-docker
$ docker build -f stage-1/Dockerfile -t gcr.io/$PROJECT/publisher publisher && \
docker push gcr.io/$PROJECT/publisher
$ docker build -f stage-1/Dockerfile -t gcr.io/$PROJECT/subscriber subscriber && \
docker push gcr.io/$PROJECT/subscriber

3. Stage 1: pre-migration


Run the test environment on Kubernetes. If your Kafka VM is not called kafka-1-vm then you'll need to edit stage-*/*.yaml and change the hostname there.

$ kubectl apply -f stage-1/environment.yaml
$ sleep 60
$ kubectl get pod
$ kubectl logs -f subscriber-5474ff8d8c-t5pz5 # Replace with your pod name

You should see a sequence of "Consumed message..." messages every second.


4. Stage 2: publish to Kafka, subscribe to Pub/Sub


Build Kafka Connector.

$ docker build -f stage-2/Dockerfile -t gcr.io/$PROJECT/kafka-connector stage-2 && \
docker push gcr.io/$PROJECT/kafka-connector

Create a service account for the Kafka Connector and grant it permission to publish and subscribe to Pub/Sub.

$ gcloud iam service-accounts create kafka-connect
$ gcloud projects add-iam-policy-binding $PROJECT \
--member serviceAccount:kafka-connect@$PROJECT.iam.gserviceaccount.com \
--role roles/pubsub.publisher
$ gcloud projects add-iam-policy-binding $PROJECT \
--member serviceAccount:kafka-connect@$PROJECT.iam.gserviceaccount.com \
--role roles/pubsub.subscriber
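
To verify the bindings, you can filter the project's IAM policy (an optional check using standard gcloud flags, not something from the demo repository):

$ gcloud projects get-iam-policy $PROJECT \
--flatten="bindings[].members" \
--format="table(bindings.role)" \
--filter="bindings.members:kafka-connect@$PROJECT.iam.gserviceaccount.com"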

Generate and download an authentication key for the service account, create a Kubernetes secret from it, and then shred the local copy.

$ gcloud iam service-accounts keys create key.json \
--iam-account=kafka-connect@$PROJECT.iam.gserviceaccount.com
$ kubectl create secret generic connect-key --from-file=key.json
$ shred key.json

Start the Kafka Connector, then register the sink connector with it through the Kafka Connect REST API.

$ kubectl apply -f stage-2/environment.yaml
$ sleep 60
$ IP=`kubectl get service kafka-connector -o jsonpath='{.status.loadBalancer.ingress[0].ip}'`
$ curl -s -H "Content-Type: application/json" -H "Accept: application/json" \
http://$IP:8083/connectors/ -X POST -d @stage-2/sink-connector.json
$ kubectl get pod
$ kubectl logs -f subscriber-5474ff8d8c-t5pz5 # Replace with your pod name

You should see a sequence of "Consumed message..." messages every second.
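
For reference, stage-2/sink-connector.json registers Google's CloudPubSubSinkConnector with the Kafka Connect worker. The exact contents of the file in the repository may differ, but a minimal registration typically looks something like this (the connector name pubsub-sink is illustrative):

{
  "name": "pubsub-sink",
  "config": {
    "connector.class": "com.google.pubsub.kafka.sink.CloudPubSubSinkConnector",
    "tasks.max": "1",
    "topics": "test_topic",
    "cps.project": "kafka-migration-demo",
    "cps.topic": "test_topic"
  }
}

You can also ask the Connect REST API which connectors are registered:

$ curl -s http://$IP:8083/connectors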


5. Stage 3: publish and subscribe through Pub/Sub


Change the publisher to publish directly to Pub/Sub.

$ kubectl apply -f stage-3/environment.yaml
$ sleep 60
$ kubectl get pod
$ kubectl logs -f subscriber-5474ff8d8c-t5pz5 # Replace with your pod name

Once all publishers have been updated to publish directly to Pub/Sub, turn down the Kafka Connector.

$ kubectl delete deploy kafka-connector
$ sleep 60
$ kubectl get pod
$ kubectl logs -f subscriber-5474ff8d8c-t5pz5 # Replace with your pod name

You should still see the "Consumed message..." output continue, now flowing entirely through Pub/Sub.
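
This completes the migration for the topic. To finish step 9 of the overview, delete the now-unused topic from Kafka, and delete the cluster when you are done with the demo (both commands reuse the variables set earlier):

$ kafka_2.11-1.1.0/bin/kafka-topics.sh --zookeeper $KAFKA --delete --topic test_topic
$ gcloud container clusters delete --zone $ZONE cluster-1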
