Some notes on setting up and configuring a Zookeeper ensemble. Many guides use some shortcuts to setting up the ensemble such as running multiple Zookeeper instances on a single machine or running them in Docker. Both require some workarounds to make it work. As a better example, this guide runs three Zookeeper instances across three different VMs.
It’s not quite production ready. There are still some caveats for this development grade setup:
- My servers are Google Compute Engine (GCE) instances on the default network with a public IP. In production, consider using a custom network, internal IP only and firewall rules to restrict access.
- My Zookeeper instances are not secured. Zookeeper supports secured encrypted connections for server-server (leader election) and client-server connections. We’ll look at Zookeeper security in a future post.
- Zookeeper data is written to the Operating System disk. In production, consider attaching a fast dedicated disk for this. You’ll need more than the 7GB or so unused space on the OS disk.
- Servers are GKE e2-small instances with 2 GB memory and 2 vCPUs. On production, a minimum of 4 GB available memory is recommended to avoid swapping to disk.
- I run Zookeeper from command line under the current user account. In production, run it as a service using a dedicated service account.
My starting point is three Debian GNU / Linux 10 (buster) instances, one in each of the three Availability Zones of europe-north1 Region. This is a sensible production configuration as it provides resilience against failure of a zone. If you’re running your ensemble on-premises, run each instance in a different data centre (if possible) to provide resilience against loss of a data centre.
Install and configure a single Zookeeper instance
Starting from a fresh Debian GNU / Linux 10 (buster) instance we need to install Java Runtime Environment (JRE). This is easy enough with apt-get. I’ve also installed wget and netcat utilities as we’ll need them later.
sudo apt-get update
sudo apt-get install default-jre wget netcat
Next, download the current Zookeeper binary package (3.6.3 at time of writing), extract it and set the owner to the current user account.
wget https://archive.apache.org/dist/zookeeper/stable/apache-zookeeper-3.6.3-bin.tar.gz
tar -xzf apache-zookeeper-3.6.3-bin.tar.gz
sudo mv apache-zookeeper-3.6.3-bin /usr/local/zookeeper
sudo mkdir /var/lib/zookeeper
sudo chown -R $USER: /var/lib/zookeeper
rm apache-zookeeper-3.6.3-bin.tar.gz
Now we need a basic Zookeeper configuration file. We’ll run this instance stand-alone for now so we can start from the provided example:
sudo cp /usr/local/zookeeper/conf/zoo_sample.cfg /usr/local/zookeeper/conf/zoo.cfg
Edit the zoo.conf and just set the dataDir to the directory we created earlier:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/var/lib/zookeeper
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
That’s all we need for a single instance. Start it up:
/usr/local/zookeeper/bin/zkServer.sh start
and verify it starts by tailing the logs:
tail -n 100 -f /usr/local/zookeeper/logs/zookeeper-$USER-server-zookeeper-1.out
Start 2 additional servers
We need two additional instances for our ensemble. For high availability we’ll run these instances in different GCE zones. We could create two new Debian GNU / Linux 10 (buster) instances and then repeat the steps above. However, it’s simpler just to clone the first instance.
In Google Compute Engine this is easily done by creating a machine image from the running instance and then creating two new instances from the image. Don’t forget to set the machine name and the region / zone of the new instances.
Once this is done, we now have three identical servers running in three zones of the same region. Assuming my GCE projectId is 12345 and I’m using europe-north1 region then the internal network hostnames of my three servers are:
- zookeeper-1.europe-north1-a.c.zookeeper-12345.internal
- zookeeper-2.europe-north1-b.c.zookeeper-12345.internal
- zookeeper-3.europe-north1-c.c.zookeeper-12345.internal
Configure the Zookeeper ensemble
To configure the Zookeeper ensemble, each instance needs a unique ID in a file called myid. On each server, run
cat > /var/lib/zookeeper/myid
and then type in the unique ID number (1 for the first server, 2 for the second, 3 for the third). This creates a file called myid that contains a number and nothing else.
Then add the ensemble configuration to each server’s zoo.cfg. Create one line for each instance in the ensemble in this format:
server.x=fqdn:2888:3888
where:
- x is the instance’s unique ID (1, 2 or 3 for our 3 node ensemble)
- fqdn is the Fully Qualified Domain Name of the server it runs on
- 2888 is the default peer connection port
- 3888 is the default leader election port
So our configuration looks like
server.1=zookeeper-1.europe-north1-a.c.zookeeper-12345.internal:2888:3888
server.2=zookeeper-2.europe-north1-b.c.zookeeper-12345.internal:2888:3888
server.3=zookeeper-3.europe-north1-c.c.zookeeper-12345.internal:2888:3888
Copy this block to each server’s zoo.cfg
Try it!
With the new configuration in place, start the first server with the same command as before:
/usr/local/zookeeper/bin/zkServer.sh start
and again tail the logs:
tail -n 100 -f /usr/local/zookeeper/logs/zookeeper-$USER-server-zookeeper-1.out
You’ll see that the server fails to start:
021-10-13 16:43:09,550 [myid:1] - WARN [QuorumConnectionThread-[myid=1]-3:QuorumCnxManager@400] - Cannot open channel to 2 at election address zookeeper-2.europe-north1-b.c.zookeeper-12345.internal/10.166.0.3:3888
java.net.ConnectException: Connection refused (Connection refused)
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.base/java.net.Socket.connect(Socket.java:609)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:383)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReqThread.run(QuorumCnxManager.java:457)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
This is expected. The Zookeeper ensemble will not start until it reaches quorum (2 out of 3 servers) so this server will wait until one of the other servers starts. When you run the start command on the second, the error clears on the first instance and the ensemble is started.
021-10-13 16:44:54,491 [myid:1] - INFO [LeaderConnector-zookeeper-2.europe-north1-b.c.zookeeper-12345.internal/1
0.166.0.3:2888:Learner$LeaderConnector@370] - Successfully connected to leader, using address: zookeeper-2.europe-north1-b.c.zookeeper-12345.internal/10.166.0.3:2888
2021-10-13 16:44:54,517 [myid:1] - INFO [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):QuorumP
eer@864] - Peer state changed: following - synchronization
Run the start command on the third server to get a fully healthy three node Zookeeper ensemble.
Verify Zookeeper ensemble status
Unfortunately, there’s no single command to list all nodes in a Zookeeper ensemble. The best way is to use one of the Zookeeper four letter words to query the status of each server.
First, we need to whitelist the stat
four letter word. Add this line to the zoo.cfg and restart each Zookeeper instance:
4lw.commands.whitelist=stat
Then send the stat
four letter word to each server in the ensemble. Each server will return its status including mode: follower or leader. We expect one leader and all other servers to be followers.
The commands we’ll run are:
echo stat | nc zookeeper-1.europe-north1-a.c.zookeeper-12345.internal 2181
echo stat | nc zookeeper-2.europe-north1-b.c.zookeeper-12345.internal 2181
echo stat | nc zookeeper-3.europe-north1-c.c.zookeeper-12345.internal 2181
Here’s a healthy Zookeeper ensemble:
stuart_leitch_aka_stevie@zookeeper-2:~$ echo stat | nc zookeeper-1.europe-north1-a.c.zookeeper-12345.internal 2181
Zookeeper version: 3.6.3--6401e4ad2087061bc6b9f80dec2d69f2e3c8660a, built on 04/08/2021 16:35 GMT
Clients:
/10.166.0.3:53434[0](queued=0,recved=1,sent=0)
Latency min/avg/max: 0/1.4286/6
Received: 11
Sent: 10
Connections: 1
Outstanding: 0
Zxid: 0x400000006
Mode: follower
Node count: 5
stuart_leitch_aka_stevie@zookeeper-2:~$ echo stat | nc zookeeper-2.europe-north1-b.c.zookeeper-12345.internal 2181
Zookeeper version: 3.6.3--6401e4ad2087061bc6b9f80dec2d69f2e3c8660a, built on 04/08/2021 16:35 GMT
Clients:
/10.166.0.3:37170[0](queued=0,recved=1,sent=0)
Latency min/avg/max: 1/2.0/4
Received: 10
Sent: 9
Connections: 1
Outstanding: 0
Zxid: 0x400000006
Mode: leader
Node count: 5
Proposal sizes last/min/max: 48/48/48
stuart_leitch_aka_stevie@zookeeper-2:~$ echo stat | nc zookeeper-3.europe-north1-c.c.zookeeper-12345.internal 2181
Zookeeper version: 3.6.3--6401e4ad2087061bc6b9f80dec2d69f2e3c8660a, built on 04/08/2021 16:35 GMT
Clients:
/10.166.0.3:43686[0](queued=0,recved=1,sent=0)
Latency min/avg/max: 0/0.0/0
Received: 1
Sent: 0
Connections: 1
Outstanding: 0
Zxid: 0x400000006
Mode: follower
Node count: 5
Here’s an ensemble where one server is offline. Zookeeper is quorate with two out of three servers so this ensemble is still functioning:
stuart_leitch_aka_stevie@zookeeper-2:~$ echo stat | nc zookeeper-1.europe-north1-a.c.zookeeper-12345.internal 2181
Zookeeper version: 3.6.3--6401e4ad2087061bc6b9f80dec2d69f2e3c8660a, built on 04/08/2021 16:35 GMT
Clients:
/10.166.0.3:53468[0](queued=0,recved=1,sent=0)
Latency min/avg/max: 0/1.4286/6
Received: 12
Sent: 11
Connections: 1
Outstanding: 0
Zxid: 0x400000006
Mode: follower
Node count: 5
stuart_leitch_aka_stevie@zookeeper-2:~$ echo stat | nc zookeeper-2.europe-north1-b.c.zookeeper-12345.internal 2181
Zookeeper version: 3.6.3--6401e4ad2087061bc6b9f80dec2d69f2e3c8660a, built on 04/08/2021 16:35 GMT
Clients:
/10.166.0.3:37204[0](queued=0,recved=1,sent=0)
Latency min/avg/max: 1/2.0/4
Received: 11
Sent: 10
Connections: 1
Outstanding: 0
Zxid: 0x400000006
Mode: leader
Node count: 5
Proposal sizes last/min/max: 48/48/48
stuart_leitch_aka_stevie@zookeeper-2:~$ echo stat | nc zookeeper-3.europe-north1-c.c.zookeeper-12345.internal 2181
zookeeper-3.europe-north1-c.c.zookeeper-12345.internal [10.166.0.4] 2181 (?) : Connection refused
Finally, here is a non-quorate ensemble. Two servers are offline and so the remaining server reports that it cannot serve requests:
stuart_leitch_aka_stevie@zookeeper-2:~$ echo stat | nc zookeeper-1.europe-north1-a.c.zookeeper-12345.internal 2181
zookeeper-1.europe-north1-a.c.zookeeper-12345.internal [10.166.0.2] 2181 (?) : Connection refused
stuart_leitch_aka_stevie@zookeeper-2:~$ echo stat | nc zookeeper-2.europe-north1-b.c.zookeeper-12345.internal 2181
This ZooKeeper instance is not currently serving requests
stuart_leitch_aka_stevie@zookeeper-2:~$ echo stat | nc zookeeper-3.europe-north1-c.c.zookeeper-12345.internal 2181
zookeeper-3.europe-north1-c.c.zookeeper-12345.internal [10.166.0.4] 2181 (?) : Connection refused
[…] « Building a Zookeeper ensemble Testing Spring reactive WebClient » […]