This repository has been archived by the owner on Feb 8, 2024. It is now read-only.

Kafka Server Setup

Justin Woo edited this page Oct 4, 2021 · 66 revisions

Requirement:

Follow the steps below on every node where Kafka is to be installed and configured (single-node or multi-node deployment).

1. Install kafka

The preferred way is to download the Kafka RPM from the location below and install it. The RPM installs the Kafka binaries and creates the required 'kafka' user and 'kafka' group. Note that this user is created without a home directory ('nohome').

wget https://github.com/Seagate/cortx/releases/download/third-party-deps-1.0.0-0/third-party-centos-7.8.2003-1.0.0-0.tar.gz
tar -xvf third-party-centos-7.8.2003-1.0.0-0.tar.gz
cd centos-7.8.2003-2.0.0-*/commons/kafka
yum install kafka-2.13_2.7.0-el7.x86_64.rpm

Verify that the 'kafka' user and group have been created. If they are missing, follow the steps below to create them.
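
One way to run this check is sketched below. The helper names user_exists and group_exists are hypothetical, and the sketch assumes the standard id and getent utilities are available, as they are on CentOS.

```shell
#!/bin/sh
# Hypothetical helpers: succeed if the named account or group exists.
user_exists()  { id -u "$1" >/dev/null 2>&1; }
group_exists() { getent group "$1" >/dev/null 2>&1; }

# Report whether the kafka user and group are present on this node.
if user_exists kafka; then
    echo "user kafka: present"
else
    echo "user kafka: missing"
fi

if group_exists kafka; then
    echo "group kafka: present"
else
    echo "group kafka: missing"
fi
```

If either line reports "missing", continue with step 1.a.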

1.a Create user kafka

The Kafka package from Seagate's repository is configured by default to run as the kafka user in the kafka group, so this user and group must exist.

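sudo su
# Create the kafka user and give it passwordless sudo
adduser kafka
usermod -aG wheel kafka
echo "kafka ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers.d/90-cloud-init-users

# Create the kafka group (no error if it already exists) and add the user to it
groupadd --force kafka
usermod --append --groups kafka kafka
exit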

Installation Mode:

A. Kafka 1 node Setup

Kafka Configuration (server.properties)

The following has to be configured in /opt/kafka/config/server.properties

Set listeners as shown below if a hostname or FQDN (fully qualified domain name) is used in zookeeper.connect.

# The address the socket server listens on. It will get the value returned from
# java.net.InetAddress.getCanonicalHostName() if not configured.
#   FORMAT:
#     listeners = listener_name://host_name:port
#   EXAMPLE:
#     listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://{hostname/FQDN}:9092
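
Instead of editing the file by hand, the setting can be written from a script. The following is a minimal sketch, not part of the Kafka package: set_property is a hypothetical helper that drops any existing uncommented line for a key and appends the new value, and the usage line assumes that hostname -f returns the FQDN you want to advertise.

```shell
#!/bin/sh
# set_property FILE KEY VALUE (hypothetical helper)
# Removes any existing uncommented "KEY=..." line and appends "KEY=VALUE".
# Commented-out lines such as "#listeners=..." are left untouched.
set_property() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1
    # Keep every line that does not start with exactly "KEY=".
    awk -v k="$key" 'index($0, k "=") != 1' "$file" > "$tmp"
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    # Write back through the original file to preserve its owner and mode.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example usage (run as root on the Kafka node):
# set_property /opt/kafka/config/server.properties listeners "PLAINTEXT://$(hostname -f):9092"
```

The same helper can be reused for the other server.properties settings on this page.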

Configure the settings below so that the delete (purge) interface works.

# The interval at which log segments are checked to see if they can be deleted according 
# to the retention policies 
log.retention.check.interval.ms=1 
log.delete.delay.ms=1 
log.flush.offset.checkpoint.interval.ms=1 

Set the log directory for the Kafka broker.

log.dirs=/var/local/data/kafka

In /opt/kafka/config/zookeeper.properties, set the ZooKeeper data and log directories as shown below. Because the kafka user has no home directory, use these directories.

dataLogDir=/var/log/zookeeper
dataDir=/var/lib/zookeeper

Note:

If any of these directories (dataDir, dataLogDir or log.dirs) already exists, clear its contents before starting ZooKeeper and the Kafka broker. If they do not exist, create them. In either case, ensure that they are owned by kafka:kafka, using the commands below.

mkdir -p /var/log/zookeeper
mkdir -p /var/lib/zookeeper
mkdir -p /var/local/data/kafka

# Make sure that kafka:kafka has access to the dataDir and logDir, including the parent directories
chown -R kafka:kafka /var/lib/zookeeper
chown -R kafka:kafka /var/log/zookeeper
chown -R kafka:kafka /var/local/data/kafka
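
The note above asks for existing directories to be cleared, but the commands only create them. A minimal sketch of a combined step follows; reset_dir is a hypothetical helper, and it deletes everything inside the directory it is given, so use it only on the ZooKeeper and Kafka data directories named on this page.

```shell
#!/bin/sh
# reset_dir DIR (hypothetical helper)
# Creates DIR if it is missing, otherwise deletes everything inside it.
reset_dir() {
    dir=$1
    # Guard against an empty argument or the filesystem root.
    [ -n "$dir" ] && [ "$dir" != "/" ] || {
        echo "refusing to reset '$dir'" >&2
        return 1
    }
    mkdir -p "$dir" || return 1
    # -mindepth 1 keeps DIR itself; -delete removes its contents depth-first.
    find "$dir" -mindepth 1 -delete
}

# Example usage (run as root, before starting the services):
# for d in /var/log/zookeeper /var/lib/zookeeper /var/local/data/kafka; do
#     reset_dir "$d" && chown -R kafka:kafka "$d"
# done
```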

Use systemctl to control the Kafka and ZooKeeper services.

Enable services

systemctl enable kafka-zookeeper
systemctl enable kafka

Start services and check the status

systemctl start kafka-zookeeper
sleep 5 # (kafka service needs zookeeper service to be up and running.)
systemctl status kafka-zookeeper
systemctl start kafka 
systemctl status kafka
# Make sure that you see `Active: active (running)` when checking the status of both services.
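
The fixed sleep 5 above is a simple way to give ZooKeeper time to start. As an alternative sketch, the helpers below retry a check until it succeeds or a timeout expires. Both function names are hypothetical, port_open relies on bash's /dev/tcp redirection, and port 2181 and the 30-second timeout are assumptions. An open port only shows that something is listening, not that ZooKeeper is fully ready.

```shell
#!/usr/bin/env bash
# port_open HOST PORT: succeed if a TCP connection can be opened.
# Uses bash's /dev/tcp pseudo-device, so it needs bash rather than plain sh.
port_open() {
    (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# wait_until TIMEOUT_SECONDS CMD [ARGS...]
# Retries CMD about once per second; fails once the timeout is used up.
wait_until() {
    local remaining=$1
    shift
    until "$@"; do
        remaining=$((remaining - 1))
        [ "$remaining" -le 0 ] && return 1
        sleep 1
    done
}

# Example usage (assumes ZooKeeper listens on localhost:2181):
# systemctl start kafka-zookeeper
# wait_until 30 port_open localhost 2181 && systemctl start kafka
```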

Stop services

systemctl stop kafka
systemctl stop kafka-zookeeper

B. Kafka 3 node Cluster Setup

Download the Kafka RPM with the command below and install it on all nodes.

curl "http://cortx-storage.colo.seagate.com/releases/cortx/third-party-deps/centos/centos-7.8.2003-2.0.0-latest/commons/kafka/kafka-2.13_2.7.0-el7.x86_64.rpm" -o kafka.rpm
yum install kafka.rpm

If the above location is not reachable, find the Kafka RPM in this tar archive: https://github.com/Seagate/cortx/releases/download/third-party-deps-1.0.0-0/third-party-centos-7.8.2003-1.0.0-0.tar.gz

Kafka Configuration

Kafka configuration involves setting up server.properties and zookeeper.properties, creating the myid file, and setting the correct ownership on the data directories.

server.properties configuration

The following has to be configured in /opt/kafka/config/server.properties across nodes

Define a unique broker id for each Kafka server, for example 0, 1 and 2 on the three nodes.

broker.id=0 

Define the directory in which the broker stores its log files.

log.dirs=/var/local/data/kafka

To form a 3-node cluster, set zookeeper.connect to a comma-separated list of node addresses and ports, so that if one ZooKeeper instance fails, the broker automatically tries the next available address.

zookeeper.connect=<node 1 address>:2181,<node 2 address>:2181,<node 3 address>:2181
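
When the node list is kept in a script, the connect string can be built rather than typed. This is a small sketch with a hypothetical helper name, zk_connect; the node names in the usage line are placeholders, not real hosts.

```shell
#!/bin/sh
# zk_connect PORT NODE [NODE...] (hypothetical helper)
# Prints "NODE:PORT,NODE:PORT,..." for use as the zookeeper.connect value.
zk_connect() {
    port=$1
    shift
    out=
    for node in "$@"; do
        # Prefix a comma only when something has already been collected.
        out="${out:+$out,}$node:$port"
    done
    printf '%s\n' "$out"
}

# Example usage with placeholder node names:
# echo "zookeeper.connect=$(zk_connect 2181 node1 node2 node3)"
```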

Set listeners as shown below if a hostname or FQDN is used in zookeeper.connect.

# The address the socket server listens on. It will get the value returned from
# java.net.InetAddress.getCanonicalHostName() if not configured.
#   FORMAT:
#     listeners = listener_name://host_name:port
#   EXAMPLE:
#     listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://{hostname/FQDN}:9092

Configure the settings below so that the delete (purge) interface works.

# The interval at which log segments are checked to see if they can be deleted according 
# to the retention policies 
log.retention.check.interval.ms=1 
log.delete.delay.ms=1 
log.flush.offset.checkpoint.interval.ms=1 

Note: It is possible to run multiple Kafka server instances on a single node. In that case, define a separate server.properties file for each instance.

Set the replication factor for metadata and transaction state as shown below. This is required in a multi-node setup.

default.replication.factor=3
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=2
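
As a rough sanity check on these values: a topic cannot be created with a replication factor larger than the number of available brokers, and a min.isr larger than the replication factor can never be satisfied. The sketch below encodes only those two rules; check_replication is a hypothetical helper, not a Kafka tool.

```shell
#!/bin/sh
# check_replication BROKERS REPLICATION_FACTOR MIN_ISR (hypothetical helper)
# Succeeds when MIN_ISR <= REPLICATION_FACTOR <= BROKERS.
check_replication() {
    brokers=$1
    rf=$2
    min_isr=$3
    if [ "$rf" -gt "$brokers" ]; then
        echo "replication factor $rf exceeds broker count $brokers" >&2
        return 1
    fi
    if [ "$min_isr" -gt "$rf" ]; then
        echo "min.isr $min_isr exceeds replication factor $rf" >&2
        return 1
    fi
    return 0
}

# Example usage for the 3-node values shown above:
# check_replication 3 3 2 && echo "settings are consistent"
```

With a replication factor of 3 and a min.isr of 2, the transaction state log stays writable while one of the three brokers is down.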

zookeeper.properties configuration

Configure ZooKeeper in /opt/kafka/config/zookeeper.properties with the following parameters.

tickTime=2000
initLimit=10
syncLimit=5
# The myid file will be created inside this directory
dataDir=/var/lib/zookeeper
# If not defined, dataDir is used
dataLogDir=/var/log/zookeeper
clientPort=2181
server.1=<node 1 address>:2888:3888
server.2=<node 2 address>:2888:3888
server.3=<node 3 address>:2888:3888
autopurge.snapRetainCount=3
autopurge.purgeInterval=24

The details for the configuration parameters can be found at https://zookeeper.apache.org/doc/current/zookeeperStarted.html.
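
Because the server.N entries must be identical on every node, it can help to generate them from one node list. A minimal sketch, assuming the same 2888 and 3888 ports as above; zk_server_lines is a hypothetical helper and the node names are placeholders.

```shell
#!/bin/sh
# zk_server_lines NODE [NODE...] (hypothetical helper)
# Prints one "server.N=NODE:2888:3888" line per node, numbering from 1.
zk_server_lines() {
    n=1
    for node in "$@"; do
        printf 'server.%d=%s:2888:3888\n' "$n" "$node"
        n=$((n + 1))
    done
}

# Example usage with placeholder node names:
# zk_server_lines node1 node2 node3 >> /opt/kafka/config/zookeeper.properties
```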

Repeat the above steps for each node in the cluster.

Create myid file

In the dataDir folder on the first node, create a file named myid containing the node id 1 (the file must hold a single integer value). The id must match the N of that node's server.N entry in zookeeper.properties.
Similarly, on nodes 2 and 3, write their respective ids to the dataDir/myid file.
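
The myid step can be scripted as in the sketch below. write_myid is a hypothetical helper; the 1 to 255 range check follows the range given in the ZooKeeper administrator's guide, and /var/lib/zookeeper is the dataDir configured above.

```shell
#!/bin/sh
# write_myid DATADIR ID (hypothetical helper)
# Writes ID as the only line of DATADIR/myid after validating it.
write_myid() {
    datadir=$1
    myid=$2
    case $myid in
        ''|*[!0-9]*)
            echo "myid must be an integer, got '$myid'" >&2
            return 1
            ;;
    esac
    if [ "$myid" -lt 1 ] || [ "$myid" -gt 255 ]; then
        echo "myid must be between 1 and 255, got '$myid'" >&2
        return 1
    fi
    mkdir -p "$datadir" || return 1
    printf '%s\n' "$myid" > "$datadir/myid"
}

# Example usage on node 1 (run as root, then fix ownership as described below):
# write_myid /var/lib/zookeeper 1
```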

Set correct ownership of datadir and myid file to kafka:kafka

If a directory such as dataDir or dataLogDir already exists, clear its contents before starting ZooKeeper and the Kafka broker (do this before creating the myid file, since clearing dataDir would remove it). Ensure that these directories are owned by kafka:kafka.

chown -R kafka:kafka <path/to/datadir>

Use systemctl to control the Kafka and ZooKeeper services.

Enable services

Enable the services on each node.

systemctl enable kafka-zookeeper
systemctl enable kafka

Start Services

Start the services on each node.

systemctl start kafka-zookeeper
sleep 5 # (kafka service needs zookeeper service to be up and running.)
systemctl start kafka

Stop services

Stop the services on each node.

systemctl stop kafka
systemctl stop kafka-zookeeper