Some notes on setting up and configuring a Zookeeper ensemble. Many guides take shortcuts, such as running multiple Zookeeper instances on a single machine or running them in Docker, and both need workarounds to get going. As a more realistic example, this guide runs three Zookeeper instances across three different VMs.
It’s not quite production ready. There are still some caveats for this development grade setup:
- My servers are Google Compute Engine (GCE) instances on the default network with a public IP. In production, consider using a custom network, internal IP only and firewall rules to restrict access.
- My Zookeeper instances are not secured. Zookeeper supports secured encrypted connections for server-server (leader election) and client-server connections. We’ll look at Zookeeper security in a future post.
- Zookeeper data is written to the Operating System disk. In production, consider attaching a fast dedicated disk for this. You’ll need more than the 7GB or so unused space on the OS disk.
- Servers are GCE e2-small instances with 2 GB memory and 2 vCPUs. In production, a minimum of 4 GB available memory is recommended to avoid swapping to disk.
- I run Zookeeper from the command line under the current user account. In production, run it as a service using a dedicated service account.
My starting point is three Debian GNU/Linux 10 (buster) instances, one in each of the three zones of the europe-north1 region. This is a sensible production configuration as it provides resilience against failure of a zone. If you’re running your ensemble on-premises, run each instance in a different data centre (if possible) to provide resilience against loss of a data centre.
Install and configure a single Zookeeper instance
Starting from a fresh Debian GNU/Linux 10 (buster) instance, we need to install a Java Runtime Environment (JRE). This is easy enough with apt-get. I’ve also installed the wget and netcat utilities as we’ll need them later.
```shell
sudo apt-get update
sudo apt-get install default-jre wget netcat
```
Next, download the current Zookeeper binary package (3.6.3 at time of writing), extract it to /usr/local/zookeeper and create a data directory, /var/lib/zookeeper, owned by the current user account.
```shell
wget https://archive.apache.org/dist/zookeeper/stable/apache-zookeeper-3.6.3-bin.tar.gz
tar -xzf apache-zookeeper-3.6.3-bin.tar.gz
sudo mv apache-zookeeper-3.6.3-bin /usr/local/zookeeper
sudo mkdir /var/lib/zookeeper
sudo chown -R $USER: /var/lib/zookeeper
rm apache-zookeeper-3.6.3-bin.tar.gz
```
Now we need a basic Zookeeper configuration file. We’ll run this instance stand-alone for now so we can start from the provided example:
```shell
sudo cp /usr/local/zookeeper/conf/zoo_sample.cfg /usr/local/zookeeper/conf/zoo.cfg
```
Edit zoo.cfg and set dataDir to the directory we created earlier:
```
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/var/lib/zookeeper
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
```
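If you’d rather script the edit than open an editor, something like the following should do it. This is only a sketch: set_data_dir is a helper name I’ve made up, and it assumes the file already contains a dataDir line (as zoo_sample.cfg does) and that you have write access to the file.

```shell
# Hypothetical helper: rewrite the dataDir line of a Zookeeper config file.
# Assumes the new directory contains no "|" or "&" characters, because it is
# substituted into a sed expression that uses "|" as the delimiter.
set_data_dir() {
  cfg="$1"
  dir="$2"
  tmp="$(mktemp)"
  # Write the edited copy to a temp file, then copy the contents back so the
  # original file keeps its owner and permissions.
  sed "s|^dataDir=.*|dataDir=${dir}|" "$cfg" > "$tmp" && cat "$tmp" > "$cfg"
  rm -f "$tmp"
}

# Usage (not run here):
#   set_data_dir /usr/local/zookeeper/conf/zoo.cfg /var/lib/zookeeper
```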
That’s all we need for a single instance. Start it up:
```shell
/usr/local/zookeeper/bin/zkServer.sh start
```
and verify it starts by tailing the logs:
```shell
tail -n 100 -f /usr/local/zookeeper/logs/zookeeper-$USER-server-zookeeper-1.out
```
Start 2 additional servers
We need two additional instances for our ensemble. For high availability we’ll run these instances in different GCE zones. We could create two new Debian GNU/Linux 10 (buster) instances and then repeat the steps above. However, it’s simpler just to clone the first instance.
In Google Compute Engine this is easily done by creating a machine image from the running instance and then creating two new instances from the image. Don’t forget to set the machine name and the region / zone of the new instances.
Once this is done, we have three identical servers running in three zones of the same region. Assuming my GCE project ID is zookeeper-12345 and I’m using the europe-north1 region, the internal network hostnames of my three servers are:
- zookeeper-1.europe-north1-a.c.zookeeper-12345.internal
- zookeeper-2.europe-north1-b.c.zookeeper-12345.internal
- zookeeper-3.europe-north1-c.c.zookeeper-12345.internal
Configure the Zookeeper ensemble
To configure the Zookeeper ensemble, each instance needs a unique ID in a file called myid. On each server, run
```shell
cat > /var/lib/zookeeper/myid
```
and then type in the unique ID number (1 for the first server, 2 for the second, 3 for the third), press Enter and then Ctrl+D to end the input. This creates a file called myid that contains a number and nothing else.
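If you prefer a non-interactive approach, the ID can be derived from the hostname. This is a rough sketch that assumes your machines are named like mine (zookeeper-1, zookeeper-2, zookeeper-3); myid_from_hostname is a name I’ve invented for the example.

```shell
# Hypothetical helper: derive the Zookeeper ID from a hostname such as
# "zookeeper-2" by taking the digits after the final hyphen.
myid_from_hostname() {
  short_name="${1%%.*}"          # drop any domain suffix
  id="${short_name##*-}"         # keep what follows the last hyphen
  case "$id" in
    ''|*[!0-9]*)
      echo "no numeric suffix in '$1'" >&2
      return 1
      ;;
    *)
      echo "$id"
      ;;
  esac
}

# Usage on each server (not run here):
#   id="$(myid_from_hostname "$(hostname)")" && echo "$id" > /var/lib/zookeeper/myid
```

Capturing the ID in a variable first means the myid file is only written when a number was actually found.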
Then add the ensemble configuration to each server’s zoo.cfg. Create one line for each instance in the ensemble in this format:
```
server.x=fqdn:2888:3888
```
where:
- x is the instance’s unique ID (1, 2 or 3 for our 3 node ensemble)
- fqdn is the Fully Qualified Domain Name of the server it runs on
- 2888 is the default peer connection port
- 3888 is the default leader election port
So our configuration looks like
```
server.1=zookeeper-1.europe-north1-a.c.zookeeper-12345.internal:2888:3888
server.2=zookeeper-2.europe-north1-b.c.zookeeper-12345.internal:2888:3888
server.3=zookeeper-3.europe-north1-c.c.zookeeper-12345.internal:2888:3888
```
Copy this block to each server’s zoo.cfg.
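The three lines follow a regular pattern, so they can also be generated. The sketch below assumes the naming used in this post (hosts called zookeeper-N, one per zone a, b and c of a single region); ensemble_lines is a made-up helper name, and the region and project ID are passed in as arguments.

```shell
# Hypothetical helper: print one server.N line per ensemble member.
# Assumes hosts zookeeper-1..3 live in zones REGION-a, REGION-b and REGION-c
# and use the default peer (2888) and leader election (3888) ports.
ensemble_lines() {
  region="$1"
  project="$2"
  n=1
  for zone_suffix in a b c; do
    echo "server.${n}=zookeeper-${n}.${region}-${zone_suffix}.c.${project}.internal:2888:3888"
    n=$((n + 1))
  done
}

# Usage on each server (not run here):
#   ensemble_lines europe-north1 zookeeper-12345 | sudo tee -a /usr/local/zookeeper/conf/zoo.cfg
```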
Try it!
With the new configuration in place, start the first server with the same command as before:
```shell
/usr/local/zookeeper/bin/zkServer.sh start
```
and again tail the logs:
```shell
tail -n 100 -f /usr/local/zookeeper/logs/zookeeper-$USER-server-zookeeper-1.out
```
You’ll see that the server cannot join an ensemble yet and logs connection warnings like this:
```
2021-10-13 16:43:09,550 [myid:1] - WARN [QuorumConnectionThread-[myid=1]-3:QuorumCnxManager@400] - Cannot open channel to 2 at election address zookeeper-2.europe-north1-b.c.zookeeper-12345.internal/10.166.0.3:3888
java.net.ConnectException: Connection refused (Connection refused)
	at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
	at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
	at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
	at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.base/java.net.Socket.connect(Socket.java:609)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:383)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReqThread.run(QuorumCnxManager.java:457)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
```
This is expected. The Zookeeper ensemble will not start until it reaches quorum (2 out of 3 servers), so this server waits until one of the other servers starts. When you run the start command on the second server, the warnings stop on the first instance and the ensemble starts.
```
2021-10-13 16:44:54,491 [myid:1] - INFO [LeaderConnector-zookeeper-2.europe-north1-b.c.zookeeper-12345.internal/10.166.0.3:2888:Learner$LeaderConnector@370] - Successfully connected to leader, using address: zookeeper-2.europe-north1-b.c.zookeeper-12345.internal/10.166.0.3:2888
2021-10-13 16:44:54,517 [myid:1] - INFO [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):QuorumPeer@864] - Peer state changed: following - synchronization
```
Run the start command on the third server to get a fully healthy three node Zookeeper ensemble.
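The "2 out of 3" quorum rule is just a strict majority of the ensemble. As a quick illustration (the function names are mine, not anything Zookeeper provides), here is the arithmetic for a few ensemble sizes:

```shell
# Majority quorum for an ensemble of n voting servers: floor(n / 2) + 1.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Number of servers that can be offline while a quorum is still available.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 3 4 5; do
  echo "ensemble=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

A fourth server raises the quorum to three without tolerating any more failures than a three node ensemble, which is one reason ensembles usually have an odd number of servers.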
Verify Zookeeper ensemble status
Unfortunately, there’s no single command to list all nodes in a Zookeeper ensemble. The best way is to use one of the Zookeeper four letter words to query the status of each server.
First, we need to whitelist the stat four letter word. Add this line to zoo.cfg and restart each Zookeeper instance:
```
4lw.commands.whitelist=stat
```
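If you’re applying this to several servers, a small guard avoids adding the line twice. This is a sketch only: whitelist_stat is a hypothetical helper, and it assumes you have write access to the config file and that the file ends with a newline.

```shell
# Hypothetical helper: append the whitelist line only if the config file
# does not already contain a 4lw.commands.whitelist entry.
whitelist_stat() {
  cfg="$1"
  if ! grep -q '^4lw\.commands\.whitelist=' "$cfg"; then
    echo '4lw.commands.whitelist=stat' >> "$cfg"
  fi
}

# Usage (not run here):
#   whitelist_stat /usr/local/zookeeper/conf/zoo.cfg
```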
Then send the stat four letter word to each server in the ensemble. Each server returns its status, including its mode: follower or leader. We expect one leader and all other servers to be followers.
The commands we’ll run are:
```shell
echo stat | nc zookeeper-1.europe-north1-a.c.zookeeper-12345.internal 2181
echo stat | nc zookeeper-2.europe-north1-b.c.zookeeper-12345.internal 2181
echo stat | nc zookeeper-3.europe-north1-c.c.zookeeper-12345.internal 2181
```
Here’s a healthy Zookeeper ensemble:
```
stuart_leitch_aka_stevie@zookeeper-2:~$ echo stat | nc zookeeper-1.europe-north1-a.c.zookeeper-12345.internal 2181
Zookeeper version: 3.6.3--6401e4ad2087061bc6b9f80dec2d69f2e3c8660a, built on 04/08/2021 16:35 GMT
Clients:
 /10.166.0.3:53434[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/1.4286/6
Received: 11
Sent: 10
Connections: 1
Outstanding: 0
Zxid: 0x400000006
Mode: follower
Node count: 5
stuart_leitch_aka_stevie@zookeeper-2:~$ echo stat | nc zookeeper-2.europe-north1-b.c.zookeeper-12345.internal 2181
Zookeeper version: 3.6.3--6401e4ad2087061bc6b9f80dec2d69f2e3c8660a, built on 04/08/2021 16:35 GMT
Clients:
 /10.166.0.3:37170[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 1/2.0/4
Received: 10
Sent: 9
Connections: 1
Outstanding: 0
Zxid: 0x400000006
Mode: leader
Node count: 5
Proposal sizes last/min/max: 48/48/48
stuart_leitch_aka_stevie@zookeeper-2:~$ echo stat | nc zookeeper-3.europe-north1-c.c.zookeeper-12345.internal 2181
Zookeeper version: 3.6.3--6401e4ad2087061bc6b9f80dec2d69f2e3c8660a, built on 04/08/2021 16:35 GMT
Clients:
 /10.166.0.3:43686[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0.0/0
Received: 1
Sent: 0
Connections: 1
Outstanding: 0
Zxid: 0x400000006
Mode: follower
Node count: 5
```
Here’s an ensemble where one server is offline. Zookeeper is quorate with two out of three servers so this ensemble is still functioning:
```
stuart_leitch_aka_stevie@zookeeper-2:~$ echo stat | nc zookeeper-1.europe-north1-a.c.zookeeper-12345.internal 2181
Zookeeper version: 3.6.3--6401e4ad2087061bc6b9f80dec2d69f2e3c8660a, built on 04/08/2021 16:35 GMT
Clients:
 /10.166.0.3:53468[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/1.4286/6
Received: 12
Sent: 11
Connections: 1
Outstanding: 0
Zxid: 0x400000006
Mode: follower
Node count: 5
stuart_leitch_aka_stevie@zookeeper-2:~$ echo stat | nc zookeeper-2.europe-north1-b.c.zookeeper-12345.internal 2181
Zookeeper version: 3.6.3--6401e4ad2087061bc6b9f80dec2d69f2e3c8660a, built on 04/08/2021 16:35 GMT
Clients:
 /10.166.0.3:37204[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 1/2.0/4
Received: 11
Sent: 10
Connections: 1
Outstanding: 0
Zxid: 0x400000006
Mode: leader
Node count: 5
Proposal sizes last/min/max: 48/48/48
stuart_leitch_aka_stevie@zookeeper-2:~$ echo stat | nc zookeeper-3.europe-north1-c.c.zookeeper-12345.internal 2181
zookeeper-3.europe-north1-c.c.zookeeper-12345.internal [10.166.0.4] 2181 (?) : Connection refused
```
Finally, here is a non-quorate ensemble. Two servers are offline and so the remaining server reports that it cannot serve requests:
```
stuart_leitch_aka_stevie@zookeeper-2:~$ echo stat | nc zookeeper-1.europe-north1-a.c.zookeeper-12345.internal 2181
zookeeper-1.europe-north1-a.c.zookeeper-12345.internal [10.166.0.2] 2181 (?) : Connection refused
stuart_leitch_aka_stevie@zookeeper-2:~$ echo stat | nc zookeeper-2.europe-north1-b.c.zookeeper-12345.internal 2181
This ZooKeeper instance is not currently serving requests
stuart_leitch_aka_stevie@zookeeper-2:~$ echo stat | nc zookeeper-3.europe-north1-c.c.zookeeper-12345.internal 2181
zookeeper-3.europe-north1-c.c.zookeeper-12345.internal [10.166.0.4] 2181 (?) : Connection refused
```
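To condense the three cases above into one line per server, the Mode line can be pulled out of each response. The sketch below is my own wrapper around the same echo stat | nc command; parse_mode and ensemble_modes are invented names, and it assumes nc is installed and stat is whitelisted on every server.

```shell
# Read stat output on stdin and print the value of the "Mode:" line.
# Prints "unavailable" when there is no Mode line, which covers both a
# refused connection and a server that is not currently serving requests.
parse_mode() {
  mode="$(awk '/^Mode: / { print $2; exit }')"
  echo "${mode:-unavailable}"
}

# Query every host passed as an argument on the default client port (2181).
# -w 2 gives up after two seconds so an unreachable host does not hang the loop.
ensemble_modes() {
  for host in "$@"; do
    echo "$host: $(echo stat | nc -w 2 "$host" 2181 2>/dev/null | parse_mode)"
  done
}

# Usage (not run here):
#   ensemble_modes zookeeper-1.europe-north1-a.c.zookeeper-12345.internal \
#                  zookeeper-2.europe-north1-b.c.zookeeper-12345.internal \
#                  zookeeper-3.europe-north1-c.c.zookeeper-12345.internal
```

A healthy ensemble should show exactly one leader, with the rest reported as follower.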