Deploying Kafka is easy compared to the effort required to deploy a complete Hadoop system. There are also multiple Ansible- and Vagrant-based deployment scripts available for Kafka.

None of these solutions comes with EC2 support, so creating EC2 VMs and then deploying Kafka on them using Ansible requires manual intervention. Alternatively, you can automate this with a separate script that creates the VMs and then generates an Ansible inventory file that you can use with one of those solutions.
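As a rough illustration of the "separate script" approach, the inventory-generation step could look something like the Python sketch below. The render_inventory helper, the group names and the placeholder IP addresses are all hypothetical and are not taken from any of the deployment scripts mentioned above. The VM creation step is left out entirely.

```python
def render_inventory(groups, user="ubuntu"):
    """Render an INI-style Ansible inventory from a {group: [host, ...]} mapping.

    Hypothetical helper for illustration only; the host lists would come
    from whatever script created the VMs.
    """
    lines = []
    for group in sorted(groups):
        lines.append(f"[{group}]")
        for host in groups[group]:
            # ansible_user tells Ansible which remote user to log in as.
            lines.append(f"{host} ansible_user={user}")
        lines.append("")  # blank line between groups
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder addresses from the documentation range, not real VMs.
    hosts = {
        "zookeeper": ["203.0.113.10"],
        "kafka": ["203.0.113.20", "203.0.113.21"],
    }
    print(render_inventory(hosts))
```

The output can be written to a file and passed to ansible-playbook with -i.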

However, Ansible comes with a nice EC2 module that you can use to create EC2 VMs directly within an Ansible playbook and make those VMs available to the rest of the playbook. The Ansible playbook at https://github.com/milinda/KafkaOnEC2 uses the Ansible EC2 module to create VMs and then deploys Zookeeper and Kafka onto them. Most of the time I use EC2 spot instances, so I have written this playbook to use spot instances. You can customise it to use regular EC2 instances by removing the spot_price and spot_wait_timeout settings from the Ansible EC2 task in kafka.yml.

You can use group_vars/all to customise the Kafka cluster size, Zookeeper cluster size, instance types, EC2 region and spot instance pricing limits.

ec2:
  key: 2016july-ec2-keypair
  zookeeper_instance_type: m3.large
  kafka_instance_type: r3.large
  image: ami-9abea4fb
  region: us-west-2
  kafka_security_group: kafka
  zookeeper_security_group: zookeeper
  kafka_instance_count: 1
  zookeeper_instance_count: 1
  zk_spot_price: 0.2
  kafka_spot_price: 0.8

Before using this playbook, make sure of the following:

  • You have created an EC2 key pair (I have used the 2016july-ec2-keypair key pair) in the AWS region you are going to use, and you have access to the private key *.pem file.
  • You have created security groups for the Kafka and Zookeeper deployments. I am using the kafka and zookeeper security groups with all ports open to the public. For production deployments, use a more secure configuration where only a particular set of ports is open to the public or to your network, based on your requirements.
  • You have monitored spot instance pricing in the selected AWS region and decided on suitable values for zk_spot_price and kafka_spot_price.
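For the last point, one possible way to sanity-check the configured limits is to compare them against recent price records. The sketch below is only an illustration: both helper functions are hypothetical, and the price records are made-up numbers that merely mimic the InstanceType and SpotPrice fields returned by boto3's describe_spot_price_history call. The limits are the zk_spot_price and kafka_spot_price values from the configuration above.

```python
def highest_recent_price(history):
    """Return {instance_type: highest spot price seen} from price records."""
    highest = {}
    for record in history:
        itype = record["InstanceType"]
        price = float(record["SpotPrice"])  # prices arrive as strings
        highest[itype] = max(price, highest.get(itype, 0.0))
    return highest


def limits_below_market(limits, history):
    """List instance types whose price limit is below the highest recent price."""
    peak = highest_recent_price(history)
    return sorted(t for t, limit in limits.items() if t in peak and limit < peak[t])


if __name__ == "__main__":
    # Made-up sample records, NOT real AWS prices.
    sample_history = [
        {"InstanceType": "m3.large", "SpotPrice": "0.03"},
        {"InstanceType": "m3.large", "SpotPrice": "0.25"},
        {"InstanceType": "r3.large", "SpotPrice": "0.05"},
    ]
    limits = {"m3.large": 0.2, "r3.large": 0.8}
    print(limits_below_market(limits, sample_history))
```

Any instance type reported here has a limit that recent prices have exceeded, so a spot request at that limit may not always be fulfilled.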

How to use the Kafka playbook

Once you are done with the configuration, you can use the following command to deploy a Kafka cluster on EC2.

$ ansible-playbook --private-key=<AWS_key_file> -u ubuntu kafka.yml

If you get an error saying the boto Python module is missing (I got this error on Mac OS X El Capitan), use the provided `inventory` file with the proper Python interpreter location, as below.

$ ansible-playbook --private-key=<AWS_key_file> -u ubuntu -i inventory kafka.yml
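If you are not sure which Python interpreter has boto installed, a small check like the one below may help. It is only a sketch: has_module is a hypothetical helper name, and I am assuming the interpreter path ends up in the inventory file through Ansible's ansible_python_interpreter variable. Run it with each candidate interpreter and use the path that reports boto as found.

```python
import importlib.util
import sys


def has_module(name):
    """Return True if the named top-level module can be imported by this interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    status = "found" if has_module("boto") else "missing"
    # sys.executable is the full path of the interpreter running this script.
    print(f"{sys.executable}: boto {status}")
```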

Please note that I haven’t done extensive testing, and there may be bugs or unsupported scenarios. So please feel free to fork, modify/fix and send pull requests.