Monday, 14 December 2015

Apache Kafka – MultiBroker + Partitioning + Replication

The main goal of this post is to demonstrate multi-broker clusters, partitioning and replication in Apache Kafka. The end of the post includes steps to set up multiple brokers along with partitioning and replication.

If you are new to Apache Kafka, you can refer to the posts below to get up to speed quickly.

About Multi Brokers
Running more than one broker in a Kafka cluster is called a multi-broker setup. The diagram below shows the architecture of a multi-broker Kafka cluster. If you are in a hurry to set up multiple brokers, you can scroll down to the steps at the end of this post.



About Partition in Kafka
A topic is one of the main abstractions in Kafka, and each topic is split into partitions, which can be thought of as subsets of the topic. Partitions are managed by Kafka brokers. Producers publish messages to topics and their partitions. Each message in a partition is identified by a unique id called the ‘message offset’. This offset can be understood as an increasing logical timestamp within a partition. Consumers request messages from a given offset onward. For each consumer group, messages are guaranteed to be consumed at least once. The diagram below shows the basic architecture of partitions.
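
To make the idea of offsets concrete, here is a minimal Python sketch that models a partition as an append-only list. It is only an illustration and not Kafka's API: the Partition class and its append and read_from methods are names made up for this example.

```python
# Illustrative model only; this is not Kafka's API.
class Partition:
    """An append-only log; a message's offset is its position in the log."""

    def __init__(self):
        self._log = []

    def append(self, message):
        """Store a message and return the offset assigned to it."""
        self._log.append(message)
        return len(self._log) - 1

    def read_from(self, offset):
        """Return (offset, message) pairs from the given offset onward."""
        return list(enumerate(self._log[offset:], start=offset))


partition = Partition()
offsets = [partition.append(m) for m in ("a", "b", "c")]
print(offsets)                 # [0, 1, 2]
print(partition.read_from(1))  # [(1, 'b'), (2, 'c')]
```

Because offsets only ever grow, a consumer that remembers the last offset it processed can resume from that point after a restart.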


About Replication in Kafka
Suppose you have two brokers (broker-1 and broker-2) and your message is published to broker-1. What if broker-1 fails due to some error? Without replication, the message is unavailable while broker-1 is down and is lost for good if its data cannot be recovered, so it may never be consumed. Replication solves this problem by keeping copies of each message on different brokers. As long as at least one replica survives, a published message is not lost and can still be consumed even if a broker fails because of a program error or a machine error. Replication therefore provides better durability and higher availability. Both producers and consumers are replication-aware in Kafka. The diagram below shows the concept of replication.
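
The broker-1 and broker-2 scenario above can be sketched in a few lines of Python. This is a simplified model, not how Kafka is implemented: real Kafka elects a leader for each partition and the followers copy from it, whereas this sketch writes to every live replica directly. The Broker class and the publish and consume functions are hypothetical names used only for illustration.

```python
# Illustrative model only; real Kafka elects a leader per partition and
# followers fetch from it. Here every live replica is written directly.
class Broker:
    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.alive = True
        self.log = []


def publish(replicas, message):
    """Append the message to every live replica; return how many stored it."""
    live = [b for b in replicas if b.alive]
    for broker in live:
        broker.log.append(message)
    return len(live)


def consume(replicas):
    """Read the log from the first live replica."""
    for broker in replicas:
        if broker.alive:
            return list(broker.log)
    raise RuntimeError("no live replica available")


broker_1, broker_2 = Broker(1), Broker(2)
replicas = [broker_1, broker_2]
publish(replicas, "hello")
broker_1.alive = False          # simulate broker-1 failing
print(consume(replicas))        # ['hello']
```

With a single replica, the same failure would leave consume with nothing to read, which is exactly the problem replication addresses.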

There are many other aspects of replication in Kafka, and this post does not cover every detail. For more information, please follow this link.

How to Setup Multi Broker
Well, that is enough theory. Now it's time for some practical work. Setting up multiple brokers is straightforward; you just need to follow the steps below.

1. First, start the ZooKeeper server. Run the command below from the <kafka_dir>\bin\windows directory, because the config path in it is relative to that directory.
<kafka_dir>\bin\windows\zookeeper-server-start.bat ..\..\config\zookeeper.properties
2. Go to the <kafka_dir>\config\server.properties file and make a copy of it in the same location, named for example ‘first-broker-server.properties’.

3. Change the following properties in first-broker-server.properties to set up the first broker.
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=1

# The port the socket server listens on. It must be unique for each broker on the same machine.
port=9092

# A comma-separated list of directories under which to store log files
log.dirs=<kafka_dir>/kafka-logs/first-broker-server

# ZooKeeper connection string. This is the host and port where your ZooKeeper server is running.
zookeeper.connect=localhost:2181
4. Go to the <kafka_dir>\config\server.properties file again and make another copy of it in the same location, named for example ‘second-broker-server.properties’.

5. Now change the same properties in second-broker-server.properties for the second broker.
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=2

# The port the socket server listens on. It must be unique for each broker on the same machine.
port=9093

# A comma-separated list of directories under which to store log files
log.dirs=<kafka_dir>/kafka-logs/second-broker-server

# ZooKeeper connection string. This is the host and port where your ZooKeeper server is running.
zookeeper.connect=localhost:2181
6. Now start both brokers. Run each command below in its own command prompt, one per broker:

    Start first broker:
<kafka_dir>\bin\windows\kafka-server-start.bat ..\..\config\first-broker-server.properties
    Start second broker:
<kafka_dir>\bin\windows\kafka-server-start.bat ..\..\config\second-broker-server.properties
7. Now create the topic 'multibrokertopic' with 2 partitions and a replication factor of 2.
<kafka_dir>\bin\windows\kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 2 --partitions 2 --topic multibrokertopic
8. This step shows how to produce messages to multiple brokers from the command line. To connect a producer to multiple brokers, pass it the list of brokers as a comma-separated list of <ip>:<port> pairs where the brokers are running.
<kafka_dir>\bin\windows\kafka-console-producer.bat --broker-list localhost:9092,localhost:9093 --topic multibrokertopic
9. Now start the consumer to see the published messages.
<kafka_dir>\bin\windows\kafka-console-consumer.bat --zookeeper localhost:2181 --topic multibrokertopic
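
The per-broker settings from steps 3 and 5 differ only in three values, so they can also be generated and sanity-checked with a small script. The Python sketch below is one possible way to do that and is not part of Kafka; the function names broker_overrides, render and conflicts are made up for this example, and "<kafka_dir>" is left as a placeholder exactly as in the steps above.

```python
def broker_overrides(broker_id, port, name, kafka_dir="<kafka_dir>"):
    """Return the properties that are edited per broker in steps 3 and 5."""
    return {
        "broker.id": str(broker_id),
        "port": str(port),
        "log.dirs": f"{kafka_dir}/kafka-logs/{name}",
        "zookeeper.connect": "localhost:2181",
    }


def render(properties):
    """Format a dict as key=value lines, one property per line."""
    return "\n".join(f"{key}={value}" for key, value in properties.items())


def conflicts(configs):
    """Return the keys that should be unique per broker but are shared."""
    unique_keys = ("broker.id", "port", "log.dirs")
    return [k for k in unique_keys
            if len({c[k] for c in configs}) < len(configs)]


first = broker_overrides(1, 9092, "first-broker-server")
second = broker_overrides(2, 9093, "second-broker-server")
print(render(second))
print(conflicts([first, second]))  # []
```

An empty list from conflicts means the two brokers do not share an id, port or log directory. The check assumes all brokers run on the same machine, as they do in this post; brokers on different machines can reuse the same port.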
I hope you had a great time reading this post. In the next post, I will demonstrate how to implement a producer in Java that sends messages to multiple brokers along with partitioning.