In the first part (previous article) we discussed containers and pods. Let us recall the main points. First, we saw how Kubernetes manages a virtual network of devices and routing rules. Then we understood how a pod running on one node communicates with a pod running on another node. This is possible only if the sender knows the receiver's pod network IP address. If you have not gone through the first part, please do so; only then will you understand how pods communicate.
The pod network is quite neat and clean, but on its own it is not enough to build a durable system, because pods are not long-lasting. A pod's IP address can be used as an endpoint, but no one can guarantee that the address is permanent. Pods are re-created for numerous reasons, and when a pod is re-created its address changes. So there is a high chance that an endpoint address becomes stale.
A traditional solution to this problem is a reverse-proxy load balancer: the client connects to the proxy, the proxy keeps a list of servers, and it forwards the client's request to one of the servers on that list. The proxy is therefore the lifeline of this solution. It has to be durable and resistant to failure, it needs an up-to-date list of available servers, and it has to be smart enough to recognize the healthy ones. Now, this is another problem! But the Kubernetes designers solved it with a mechanism that builds on the basic capabilities of the platform and fulfills all of these requirements. The basis of this solution is a resource type known as a Service, and this article is focused on the Service. So let us begin!
It is important to understand services both theoretically and practically. If you remember, in the first article of this Kubernetes networking series we considered a hypothetical cluster with two server pods, and we discussed how those server pods communicate across nodes. Taking it a step further, I want to show with an example how a Kubernetes service load-balances across all the server pods, which enables client pods to operate independently and durably. First of all, I will create the server pods with the help of a deployment.
```yaml
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: service-demo
spec:
  replicas: 2
  selector:
    matchLabels:
      app: service_demo_pod
  template:
    metadata:
      labels:
        app: service_demo_pod
    spec:
      containers:
      - name: simple-http
        image: python:2.7
        imagePullPolicy: IfNotPresent
        command: ["/bin/bash"]
        args: ["-c", "echo \"<p>Hi from $(hostname)</p>\" > index.html; python -m SimpleHTTPServer 8080"]
        ports:
        - name: http
          containerPort: 8080
```
As the manifest shows, the deployment creates two HTTP server pods that respond on port 8080 with the hostname of the pod. After creating the deployment with kubectl apply, you can see the pods running in the cluster and verify their pod network addresses with a query.
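To make the steps concrete, here is a sketch of the commands involved. The filename is an assumption; any file containing the deployment manifest above will do.

```shell
# Create the deployment (filename is hypothetical):
kubectl apply -f service-demo-deployment.yaml

# List the pods with their pod-network addresses (the -o wide output
# includes the pod IP and the node each pod landed on):
kubectl get pods -l app=service_demo_pod -o wide
```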
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: service-demo-client1
spec:
  restartPolicy: Never
  containers:
  - name: demo-client1
    image: alpine
    command: ["/bin/sh"]
    args: ["-c", "echo 'GET / HTTP/1.1\r\n\r\n' | nc 10.0.2.2 8080"]
```
The pod is created, the command runs to completion, and the pod enters the completed state.
Nothing in this manifest specifies which node the client pod will be created on. Interestingly, the client will still be able to reach the server pod and get a response back, regardless of where it runs in the cluster. This is possible because of the pod network.
Now, let us get back to our concern: if the server pod dies, restarts, or is rescheduled to a different node, what happens to its IP? Obviously, it changes. We can avoid this problem by creating a service.
```yaml
kind: Service
apiVersion: v1
metadata:
  name: service-demo
spec:
  selector:
    app: service_demo_pod
  ports:
  - port: 80
    targetPort: http
```
A Service is a type of Kubernetes resource that causes a proxy to be configured to forward requests to a set of pods. The selector determines which pods receive the traffic: it matches the labels that were assigned to the pods at creation time. The moment a service is created, it is assigned an IP address, and it immediately starts accepting requests on port 80.
It is possible to send requests to the service IP directly, but it is better to use a hostname that resolves to that IP address. To our advantage, Kubernetes offers an internal cluster DNS that resolves the service name.
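As a quick sketch of what that resolution looks like, assuming the service lives in the default namespace, the cluster DNS answers for both the short name and its fully-qualified form:

```shell
# Run from inside any pod in the same (default) namespace:
nslookup service-demo

# The fully-qualified name works from any namespace:
nslookup service-demo.default.svc.cluster.local
```

Both lookups should return the service's cluster IP, not a pod IP.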
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: service-demo-client2
spec:
  restartPolicy: Never
  containers:
  - name: demo-client2
    image: alpine
    command: ["/bin/sh"]
    args: ["-c", "echo 'GET / HTTP/1.1\r\n\r\n' | nc service-demo 80"]
```
This pod runs to completion, and its output shows that the service forwarded the request to one of the server pods.
You can run the client pod repeatedly and see output from both server pods; each of them gets roughly 50% of the requests. If you wish to genuinely understand how this actually works, I suggest beginning with the IP address that was assigned to our service.
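Rather than re-creating the client pod by hand, a throwaway pod can issue several requests in a loop; over a few runs you should see both hostnames appear. This one-liner is a sketch; the pod name and request count are arbitrary:

```shell
# Run a disposable alpine pod that hits the service four times and exits:
kubectl run -it --rm test-client --image=alpine --restart=Never -- \
  /bin/sh -c 'for i in 1 2 3 4; do wget -qO- http://service-demo; done'
```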
Next, let us switch over to how the service network works.
In the image provided below, notice the IP address that the test service was assigned. It is an address on a network, but not on the same network the pods are on. Did you notice it?
It is also different from the private network the nodes are on. The pod network address range is not exposed through kubectl, so one needs a provider-specific command to retrieve this cluster property, and the same applies to the service network address range. On Google Container Engine, the following can be done.
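A sketch of that provider-specific query, where the cluster name "test" is a placeholder for your own cluster:

```shell
# On Google Container Engine, the service network range is a cluster
# property exposed by the gcloud CLI:
gcloud container clusters describe test | grep servicesIpv4Cidr
```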
The network specified by this address space is known as the service network. Every service of type ClusterIP will be assigned an IP address on this network. There are other types of services as well, which we will discuss in the third part of this article series. For now, understand that ClusterIP is the default type, and it means the service is allocated an IP address reachable from any pod in the cluster.
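Because ClusterIP is the default, the field is usually omitted; spelling it out in our earlier service-demo manifest would look like this:

```yaml
kind: Service
apiVersion: v1
metadata:
  name: service-demo
spec:
  type: ClusterIP   # the default; written out here only for illustration
  selector:
    app: service_demo_pod
  ports:
  - port: 80
    targetPort: http
```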
If we compare the pod network and the service network, both of them are virtual, but they differ in an important way. Consider the pod network range 10.0.0.0/14. If you go looking on the hosts that make up your cluster's nodes, listing bridges and interfaces, you will see actual devices configured with addresses on this network. What are those devices? They are the virtual ethernet interfaces for each pod, plus the bridges that connect the pods to each other and to the outside world.
Next, let us focus on the service network 10.3.240.0/20. Execute ifconfig, and I guarantee you will not find any devices configured with addresses on this network. Examine the routing rules at the gateway that connects all the nodes; even there, you won't find any routes for this network. The service network doesn't exist, at least not as connected interfaces. And yet, when we issued a request to an IP on this network, the request reached our server pod running on the pod network. How is this possible? Let us trace the packet and solve this mystery.
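A quick sketch of that check on a node (the address range is the example used in this article, so substitute your cluster's own):

```shell
# No interface carries an address on the service network...
ifconfig | grep '10\.3\.24'

# ...and the node's routing table has no entry for it either:
ip route | grep '10.3.240.0'
```

Both greps should come back empty.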
We have two nodes, a gateway connecting them (which also holds the routing rules for the pod network), and three pods: a client pod and a server pod on node 1, and another server pod on node 2. The client makes an HTTP request to the service using the DNS name service-demo. The cluster DNS system resolves that name to the service cluster IP 10.3.241.152, and the client pod ends up creating an HTTP request that results in packets being sent with that IP in the destination field.
One unique feature of IP networking is that when an interface can't deliver a packet to its destination, because no local device with the specified address exists, it forwards the packet to its upstream gateway. In our example, the first interface the packets encounter is the virtual ethernet interface inside the client pod. That interface is on the pod network 10.0.0.0/14 and doesn't know any device with the address 10.3.241.152, so it simply forwards the packet to its gateway, the bridge cbr0. Bridges aren't very intelligent; they just pass traffic along.
Can you find the host/node ethernet interface in the image? Go ahead, find it! It is on the network 10.100.0.0/24, and it doesn't know any devices with the address 10.3.241.152 either, so the packet should go to that interface's gateway, the top-level router shown in the image. Instead, somewhere along the way the packet is redirected to one of the live server pods. The packet is traveling like a wanderer, isn't it?
The whole process seems magical: the client connects to an address with no interface behind it, and the packets pop out at the precise destination in the cluster. But this isn't magic, it is logic, and the logic is provided by a piece of software known as Kube-proxy.
Kube-proxy is the magician software that affects the configuration and behavior of several components in the cluster. The name "Kube-proxy" gives you a basic idea of what it does: it works like a proxy, doing something on behalf of someone else. But there is more to it that makes it quite different from a typical reverse-proxy.
A proxy passes traffic between clients and servers: the client connects inbound to the proxy, and the proxy connects outbound to some server. Such proxies run in user space, which means packets are marshaled into user space and back to kernel space on every trip through the proxy. Kube-proxy started out as such a user-space proxy, but with a twist. A proxy needs an interface, both to listen for client connections and to connect to back-end servers, and the only interfaces available on a node are the host's ethernet interface and the virtual ethernet interfaces of the pod network.
So why not use an address on one of those networks? Actually, it isn't a good idea. Why? Services need a stable, non-conflicting network address space of their own, along with a system of virtual IPs. But there aren't any actual devices on this network! One can use a pretend network in routing rules and so on, but there is no way to listen on a port or open a connection through an interface that doesn't exist.
So how has Kubernetes found a way? Using a kernel feature known as netfilter and a user-space interface to it known as iptables. Netfilter is a rules-based packet processing engine: it runs in kernel space and gets a look at every packet at various points in its life cycle. Packets are checked against the rules, and when a rule matches, an action is taken. One of the possible actions is to redirect the packet to some other destination. So do you get it? Netfilter is a kernel-space proxy!
In user-space mode, Kube-proxy opens a port on the localhost interface to listen for requests to the test service, inserts netfilter rules to re-route packets destined for the service IP to its own port, and then forwards each request to a pod on port 8080. That is how a request to 10.3.241.152:80 magically becomes a request to 10.0.2.2:8080.
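In the spirit of that user-space flow, here is a hypothetical, greatly simplified rule of the kind that sends service traffic to the proxy's local port (the port 10400 is an invented example; real Kube-proxy manages its own chains rather than bare PREROUTING rules):

```shell
# Redirect packets aimed at the service IP to the port Kube-proxy
# opened on localhost (both numbers follow this article's example):
iptables -t nat -A PREROUTING -p tcp -d 10.3.241.152 --dport 80 \
  -j REDIRECT --to-port 10400
```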
There is more to this. User-space proxying is expensive due to marshaling packets, so in Kubernetes 1.2, Kube-proxy gained the ability to run in iptables mode. In this mode, Kube-proxy mostly ceases to be a proxy for inter-cluster connections: the work of detecting and redirecting packets is delegated entirely to netfilter, in kernel space, and Kube-proxy's role is limited to keeping the netfilter rules in sync.
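In iptables mode the rewrite goes straight to a pod endpoint instead of a local proxy port. A greatly simplified, single-backend sketch (real Kube-proxy installs dedicated KUBE-* chains with probability-based jumps across all endpoints, not one bare rule):

```shell
# Conceptually: DNAT service traffic directly to a server pod.
iptables -t nat -A PREROUTING -p tcp -d 10.3.241.152 --dport 80 \
  -j DNAT --to-destination 10.0.2.2:8080

# Kube-proxy comments its rules with the service name, so the real
# generated rules are easy to find on a node:
sudo iptables-save -t nat | grep 'default/service-demo'
```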
This fulfills all of our requirements and sets up a reliable proxy. But let us run through a few checkpoints.
The service proxy system is durable: by default, Kube-proxy runs as a systemd unit and will be restarted if it fails. In user-space mode, Kube-proxy itself is a single point of failure for connections; when running in iptables mode, the system is very durable from the perspective of local pods attempting connections.
The service proxy is aware of healthy server pods that can handle requests: Kube-proxy listens to the master API server for changes in the cluster and uses iptables to keep the netfilter rules in sync. When a new service is created, Kube-proxy gets the notification and creates the necessary rules; when a service is deleted, the rules are removed.
Overall, everything we discussed adds up to a highly-available, cluster-wide, durable facility for proxying requests between pods. But no system is perfect, and this one has a few downsides. One is that it only works as described for requests that originate inside the cluster. Another concerns netfilter: for requests arriving from outside the cluster, the rewriting obscures the origin IP. There isn't a proper solution for this yet, but developers and contributors will come up with something better pretty soon! For now, this is the end of the second part of Kubernetes Networking. Please do share your thoughts regarding this article in the comments. See you soon in the next and final part of the Kubernetes Networking article series.
Here is a link to the Kubernetes Bible; start learning from the basics now:
And here is the link to the Kubernetes Video Tutorial