Connection balancing across NLB using IIS and MaxKeepAliveRequests
September 21st, 2011
We have written a video transcoding application which sits under a RESTful front end provided by IIS. The transcoding application is CPU bound, that is, the CPU is the first place to bottleneck and prevent the computer from doing more work. The heavy CPU is caused by video transcoding. This involves reading a unit of video from a video server, converting it to another format and squirting it out to a client. Transcoding video is a pipeline process which means there are huge performance advantages in processing a series of consecutive video units in a read-ahead fashion.
A normal web server could handle 2 or 3 orders of magnitude more requests than ours. As a result we found that it was more difficult to load balance across an NLB cluster because the number of new incoming connections was relatively small.
The application suite has been designed to be stateless in order to allow it to fit into a cluster architecture. We want to be able to scale outward easily: to support more clients, we just add more boxes.
Our experiments have shown that 1 PC can support about 10 simultaneous clients before the system’s performance degrades to unusable levels. For each new PC we add to the cluster, we can get another 8-10 clients.
We would like to keep each client talking to the same cluster node for a short period so that we can get the benefit of pipe-lining requests, while at the same time we need to make sure that clients can move between cluster nodes in order to keep the load evenly balanced across the cluster.
Under IIS, HTTP KeepAlive allows a client to connect once, then make as many requests down the connection pipe as it likes before the client closes the connection. The server will hang on to each client until they go away. If KeepAlive is switched off, the connection will be closed at the end of each request, which may add significant overhead when dealing with clients that are geographically distant. HTTP KeepAlive works on layer 5 of the OSI model.
NLB has a similar option called Affinity. The Affinity can be either sticky or non-sticky (there are other states but for the purposes of this article they can all be condensed into these two). Stickiness ensures that the same client is always directed to the same cluster node. NLB works on layer 4 of the OSI model.
The simplest solution is to switch NLB Affinity to non-sticky and set HTTP KeepAlive to false. Each incoming request that arrives at the cluster will be directed to whichever node NLB chooses, make its request, get the data and then tear everything down and start again for the next request. With this setup we will not be able to take any advantage of the pipe-lining efficiency that could be had, and as a result the platform will support fewer clients overall.
Each one of these technologies has advantages and disadvantages. The advantage of using stickiness with NLB is that you can ensure that all requests from a client are directed to the same place for the lifetime of the client or of that cluster node. That is good for pipe-lining but bad for load balancing. The advantages and disadvantages for HTTP KeepAlive are similar, except here you are at the mercy of what the client decides to do.
In experiments we have shown that if one of the nodes in the cluster goes down, NLB will notice and rebalance, diverting incoming traffic to the remaining nodes. The HTTP KeepAlive clients will simply reconnect to the next allocated node in the cluster and stay there for the rest of their lives. You might expect that when the downed node comes back up it would rebalance with the rest of the cluster so that the request distribution is corrected, but NLB will not sever existing connections, so all the existing clients will stay where they are. Only new incoming connections will be allocated to the newly returned cluster node. So what we find is that after a cluster node failure the remaining nodes take up the slack and end up working extra hard, but when the failed node re-enters the cluster it sits there doing nothing.
If you were dealing with thousands of small requests it would be a different story; it probably wouldn’t matter so much because new clients are coming and going all the time.
What we need is a combination of KeepAlive and not KeepAlive on a non-sticky platform. Apache has a configuration option called MaxKeepAliveRequests. This option severs the connection to the client after this many requests (the default is 100). With this option we can have 100 consecutive requests over the same connection to enjoy the benefits of pipe-lining the requests and yet we are giving the system/platform a chance to balance itself on a regular basis.
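For reference, the relevant Apache directives look like this (the values shown are Apache's defaults; `KeepAliveTimeout` is included because it interacts with the other two):

```apache
# httpd.conf — persistent-connection settings
KeepAlive On               # allow multiple requests per connection
MaxKeepAliveRequests 100   # sever the connection after this many requests
KeepAliveTimeout 5         # seconds to wait for the next request before closing
```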
IIS has no concept of limiting the number of requests a connection can service, which probably goes some way to explaining why IIS only has 15.73% of the web server market. I posted a question on ServerFault but didn’t get a satisfactory response. The one reply I did get was from someone saying that if my application was truly stateless I should switch off KeepAlive altogether and take the penalty for the re-connection. While the application is stateless, there are advantages to be had from batching requests together. An answer of “it can’t be done” or “it is not supported” is, in my opinion, not an answer. What they actually mean is that it is not supported yet. In I.T. almost everything *is* possible as long as you know what to do.
IIS7 has a new pipeline module architecture that allows you to inject code into the processing of a request at any one of about 12 different stages. The run line passes through each module at each requested stage in order to modify the request’s response.
When the module is loaded in, it reads the MaxKeepAliveRequests number from the web.config. For each request that comes in, the module remembers the remote host, remote port and how many requests have been serviced by that combination. When the request is in its final stage, we check whether the number of serviced requests has reached MaxKeepAliveRequests. If it has, we inject a Connection: close header into the response. This will make its way through IIS, safely closing the connection on its way out.
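A minimal sketch of such a module might look like the following. This is not the original module: the class name, the appSettings key and the counting strategy are assumptions, and a production version would also need to expire stale entries for connections that disappear without hitting the limit.

```csharp
using System;
using System.Collections.Concurrent;
using System.Configuration;
using System.Web;

// Hypothetical IHttpModule that injects "Connection: close" once a
// client connection has serviced MaxKeepAliveRequests requests.
public class MaxKeepAliveRequestsModule : IHttpModule
{
    // Keyed by "remoteHost:remotePort" so we count per TCP connection,
    // not per session (a proxy may interleave many sessions on one pipe).
    private static readonly ConcurrentDictionary<string, int> requestCounts =
        new ConcurrentDictionary<string, int>();

    private static readonly int maxRequests =
        int.Parse(ConfigurationManager.AppSettings["MaxKeepAliveRequests"] ?? "100");

    public void Init(HttpApplication application)
    {
        // EndRequest is the last pipeline stage we can usefully hook.
        application.EndRequest += OnEndRequest;
    }

    private static void OnEndRequest(object sender, EventArgs e)
    {
        HttpContext context = ((HttpApplication)sender).Context;
        string key = context.Request.ServerVariables["REMOTE_HOST"] + ":" +
                     context.Request.ServerVariables["REMOTE_PORT"];

        int count = requestCounts.AddOrUpdate(key, 1, (k, n) => n + 1);
        if (count >= maxRequests)
        {
            // RFC 2616 section 8: the server signals that it will close
            // the connection after this response has been sent.
            // (Writing Response.Headers requires the integrated pipeline.)
            context.Response.Headers["Connection"] = "close";
            int ignored;
            requestCounts.TryRemove(key, out ignored);
        }
    }

    public void Dispose() { }
}
```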
Surprisingly, there was a great deal of confusion in the MSDN documentation, blogs and forums surrounding how to force a close after a request. I found that HttpResponse.Close() can chop the end off the reply, and HttpApplication.CompleteRequest() didn’t work because the request’s run line was already inside the EndRequest section of the pipeline. So I went back to the specification, and RFC2616: Section 8 - Connections talks about injecting Connection: close into the response header so that after the response is sent out the server closes the connection. The closure forces the client to reconnect. I tried this using a telnet client (and not a web browser) and can reveal that it is the server that closes the connection, not the client deciding to.
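Seen from a raw telnet session, the final exchange on a connection looks something like this (the URL, host and headers here are illustrative, and the bodies are abbreviated):

```http
GET /transcode/segment-042 HTTP/1.1
Host: cluster.example.com
Connection: keep-alive

HTTP/1.1 200 OK
Content-Type: video/mp4
Content-Length: 188000
Connection: close

...response body... (the server then closes the TCP connection,
and the client must reconnect — landing on whichever node NLB picks next)
```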
I had thought about using the Session to store the request count but I didn’t think it would help. If a proxy server is talking to your cluster then it may be interleaving requests from several sources with different session identifiers. We are interested in the transport layer, and not the session layer. We must use values from the transport layer to differentiate the clients in order to spread the load.
Simply compile the module up and add it to your IIS integrated process pipeline. You’ll need to add the configuration option to the web.config.
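Assuming the module reads its limit from appSettings and is registered in the integrated pipeline, the web.config additions would look something like this (the key name, module name and type are whatever your implementation uses):

```xml
<configuration>
  <appSettings>
    <!-- sever each connection after this many requests -->
    <add key="MaxKeepAliveRequests" value="100" />
  </appSettings>
  <system.webServer>
    <modules>
      <!-- register the module in the IIS7 integrated pipeline -->
      <add name="MaxKeepAliveRequestsModule" type="MaxKeepAliveRequestsModule" />
    </modules>
  </system.webServer>
</configuration>
```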