Deploying Kafka is easy compared to the effort required to deploy a complete Hadoop system. There are also multiple Ansible- and Vagrant-based deployment scripts available for Kafka:
- Ansible Playbook - Setup Kafka Cluster
None of the above solutions comes with EC2 support, so creating EC2 VMs and then deploying Kafka on them with Ansible requires manual intervention. Alternatively, you can automate this with a separate script that creates the VMs and then generates an Ansible inventory file for use with one of the above solutions.
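As a rough illustration of the separate-script approach, the shell function below writes an INI-style Ansible inventory from whitespace-separated lists of ZooKeeper and Kafka IP addresses. The function name write_inventory and the group names zookeeper and kafka are hypothetical; the groups an existing playbook expects may be named differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any of the playbooks mentioned here):
# write an INI-style Ansible inventory with [zookeeper] and [kafka] groups.
# Usage: write_inventory <file> "<zookeeper IPs>" "<kafka IPs>"
# Each IP list is whitespace-separated; the group names are assumptions.
write_inventory() {
  local out="$1" zk_ips="$2" kafka_ips="$3" ip
  {
    echo "[zookeeper]"
    for ip in $zk_ips; do   # unquoted on purpose: split the list on whitespace
      echo "$ip"
    done
    echo
    echo "[kafka]"
    for ip in $kafka_ips; do
      echo "$ip"
    done
  } > "$out"
}
```

The IP lists could, for example, come from the AWS CLI, filtering running instances by security group name, and the resulting file would then be passed to ansible-playbook with -i:

    $ aws ec2 describe-instances --region us-west-2 --filters "Name=instance.group-name,Values=kafka" "Name=instance-state-name,Values=running" --query 'Reservations[].Instances[].PublicIpAddress' --output text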
But Ansible comes with a nice EC2 module that can be used to create EC2 VMs directly within an Ansible playbook and make those VMs available to the rest of the playbook. This Ansible playbook from https://github.com/milinda/KafkaOnEC2 uses the Ansible EC2 module to create VMs and then deploy Zookeeper and Kafka onto them. Most of the time I use EC2 spot instances, so I wrote this playbook to use spot instances. But you can customise it to use regular EC2 instances by removing the spot_price and spot_wait_timeout configurations from the Ansible EC2 task in the playbook.
You can use group_vars/all to customise the Kafka cluster size, Zookeeper cluster size, instance types, EC2 region and spot instance pricing limits:
    ec2:
      key: 2016july-ec2-keypair
      zookeeper_instance_type: m3.large
      kafka_instance_type: r3.large
      image: ami-9abea4fb
      region: us-west-2
      kafka_security_group: kafka
      zookeeper_security_group: zookeeper
      kafka_instance_count: 1
      zookeeper_instance_count: 1
      zk_spot_price: 0.2
      kafka_spot_price: 0.8
Before using this playbook, make sure of the following:
- You have created an EC2 key pair (I used the 2016july-ec2-keypair key pair) in the AWS region you are going to use, and you have access to its private key (*.pem) file.
- You have created security groups for the Kafka and Zookeeper deployments. I am using the kafka and zookeeper security groups, with all ports open to the public. For production deployments, you should use a more secure configuration in which only the ports you need are open to the public or to your network, depending on your requirements.
- You have checked spot instance pricing in the selected AWS region and decided on suitable values for zk_spot_price and kafka_spot_price.
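One way to turn that price check into numbers is sketched below: a shell function that reads recent spot prices from standard input and suggests a limit equal to the highest observed price multiplied by a safety margin. The function name suggest_spot_price and the default margin of 1.2 are arbitrary assumptions, not something the playbook requires; a higher limit lowers the chance of losing the instances to a price spike but raises the most you could be charged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: suggest a spot price limit from recent prices.
# Reads prices in any whitespace-separated layout from stdin and prints
# the highest observed price times a margin (default 1.2, an assumption).
# Exits with status 1 if no prices were read.
suggest_spot_price() {
  local margin="${1:-1.2}"
  awk -v margin="$margin" '
    { for (i = 1; i <= NF; i++) { n++; if ($i + 0 > max) max = $i + 0 } }
    END {
      if (n == 0) exit 1
      printf "%.4f\n", max * margin
    }'
}
```

For example, the last week of prices for the Kafka instance type could be piped in from the AWS CLI (date -d is GNU syntax; on Mac OS X, date -u -v-7d +%Y-%m-%dT%H:%M:%S is the equivalent), and the result used as kafka_spot_price:

    $ aws ec2 describe-spot-price-history --region us-west-2 --instance-types r3.large --product-descriptions "Linux/UNIX" --start-time "$(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%S)" --query 'SpotPriceHistory[*].SpotPrice' --output text | suggest_spot_price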
How to use the Kafka playbook
Once you have finished the configuration, you can use the following command to deploy a Kafka cluster on EC2:
$ ansible-playbook --private-key=<AWS_key_file> -u ubuntu kafka.yml
If you get an error saying that the boto Python module is missing (I got this error on Mac OS X El Capitan), use the inventory file provided, with the proper Python interpreter location set, as below:
$ ansible-playbook --private-key=<AWS_key_file> -u ubuntu -i inventory kafka.yml
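The contents of the repository's inventory file are not reproduced here. As a hypothetical sketch of what such a file does, the function below writes a one-line inventory that runs local tasks (such as the EC2 provisioning) with ansible_connection=local and points ansible_python_interpreter at an interpreter of your choice, ideally one that can import boto. The function name write_local_inventory is an assumption; if the provided file contains more than this, edit its interpreter path instead of replacing it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal inventory for the control machine.
# $1: output file
# $2: Python interpreter that can import boto (default: python on PATH)
write_local_inventory() {
  local out="$1"
  local py="${2:-$(command -v python)}"
  if [ -z "$py" ]; then
    echo "no Python interpreter given or found on PATH" >&2
    return 1
  fi
  # Warn, but still write the file, if this interpreter cannot import boto.
  if ! "$py" -c 'import boto' >/dev/null 2>&1; then
    echo "warning: $py cannot import boto" >&2
  fi
  printf 'localhost ansible_connection=local ansible_python_interpreter=%s\n' "$py" > "$out"
}
```

For example, write_local_inventory my-inventory /usr/local/bin/python would point Ansible at a Homebrew-style Python location (an example path), after which the command above can be run with -i my-inventory.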
Please note that I haven’t done extensive testing, so there may be bugs or unsupported scenarios. Please feel free to fork, modify/fix and send pull requests.