Exposing Kafka on two different networks
Jul 22, 2017
- Categories
- Infrastructure
- Tags
- Cyber Security
- VLAN
- Kafka
- Cloudera
- CDH
- Network
A Big Data setup usually requires multiple network interfaces. Let’s see how to set up Kafka on more than one of them. Kafka is an open-source stream processing platform that functions as a distributed publish/subscribe messaging system. It is designed for high throughput, with built-in partitioning, replication, and fault tolerance.
The setup described in this article was implemented using CDH 5.7.1 with Kafka 2.0.1.5, installed using parcels.
One of the clusters we are working on has the following network configuration:
- A “data” network exposing our edge, Kafka and master nodes to the outside world
- An “internal” network dedicated to the cluster for our worker nodes
We use Kafka for data ingestion and to send processed data to another system that exposes UIs for the analysts, so we have:
- A Spark Streaming job consuming Kafka topics from YARN (our “internal” network)
- The other system’s app consuming Kafka topics from the outside (our “data” network)
Thus, Kafka must be available on two different networks. To do so, apply the following configuration on each Kafka broker in the kafka.properties safety valve input. The Kafka nodes must also share the same hostname on both networks:
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://<hostname>:9092
That’s it!
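To see why the shared hostname matters: the broker advertises a single hostname, and each client resolves it to an address on its own network. The Python sketch below only simulates that resolution step. The hostname `kafka1.example.com`, the IP addresses, and the `DNS_VIEWS` table are all hypothetical and stand in for whatever DNS or hosts-file setup each network actually uses.

```python
# Hypothetical name resolution per network: the same broker hostname
# maps to a different IP address depending on where the client sits.
DNS_VIEWS = {
    "internal": {"kafka1.example.com": "10.0.0.11"},
    "data": {"kafka1.example.com": "192.168.1.11"},
}


def broker_address(network, advertised_listener):
    """Return the (ip, port) a client on `network` would connect to."""
    # An advertised listener has the form PROTOCOL://host:port.
    _protocol, rest = advertised_listener.split("://", 1)
    host, port = rest.rsplit(":", 1)
    # Simulated lookup: each network has its own view of the hostname.
    return DNS_VIEWS[network][host], int(port)


advertised = "PLAINTEXT://kafka1.example.com:9092"
for network in ("internal", "data"):
    print(network, broker_address(network, advertised))
```

Because the broker listens on 0.0.0.0, it accepts the connection on whichever interface the resolved address belongs to.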
NB: With this configuration, Kafka listens on every interface instead of just the ones you need. Supposedly, Kafka accepts the following configuration to bind to specific IP addresses:
listeners=PLAINTEXT://<ip1>:9092,PLAINTEXT://<ip2>:9092
advertised.listeners=PLAINTEXT://<hostname>:9092
However, Kafka throws this exception on startup:
java.lang.IllegalArgumentException: requirement failed: Each listener must have a different port
at scala.Predef$.require(Predef.scala:219)
at kafka.server.KafkaConfig.validateUniquePortAndProtocol(KafkaConfig.scala:905)
at kafka.server.KafkaConfig.getListeners(KafkaConfig.scala:913)
at kafka.server.KafkaConfig.<init>(KafkaConfig.scala:866)
at kafka.server.KafkaConfig$.fromProps(KafkaConfig.scala:698)
at kafka.server.KafkaConfig$.fromProps(KafkaConfig.scala:695)
at kafka.server.KafkaServerStartable$.fromProps(KafkaServerStartable.scala:28)
at kafka.Kafka$.main(Kafka.scala:58)
at com.cloudera.kafka.wrap.Kafka$.main(Kafka.scala:76)
at com.cloudera.kafka.wrap.Kafka.main(Kafka.scala)
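Changing the ports only produces a variation of the same error: “Each listener must have a different protocol”. In other words, this Kafka version appears to accept at most one listener per protocol, so PLAINTEXT cannot be bound to two specific addresses.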
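The two error messages suggest that the broker requires both the port and the protocol to be unique across listeners. The Python sketch below approximates that check so a listeners value can be tested before restarting a broker. It is not Kafka's actual code: the function names and the sample IP addresses are hypothetical, and the error strings are simply copied from the messages quoted above.

```python
def parse_listeners(value):
    """Split a listeners string into (protocol, host, port) tuples."""
    listeners = []
    for entry in value.split(","):
        protocol, rest = entry.strip().split("://", 1)
        host, port = rest.rsplit(":", 1)
        listeners.append((protocol, host, int(port)))
    return listeners


def validate_listeners(value):
    """Approximate the uniqueness checks implied by the error messages."""
    listeners = parse_listeners(value)
    ports = [port for _, _, port in listeners]
    protocols = [protocol for protocol, _, _ in listeners]
    # Assumption: ports are checked first, then protocols.
    if len(set(ports)) != len(ports):
        raise ValueError("Each listener must have a different port")
    if len(set(protocols)) != len(protocols):
        raise ValueError("Each listener must have a different protocol")
    return listeners
```

Under these assumptions, `validate_listeners("PLAINTEXT://0.0.0.0:9092")` passes, while two PLAINTEXT entries raise an error whether or not their ports differ.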