Monday, August 25, 2008

Difference between for (<STDIN>) and while(<STDIN>) in Perl

Can you tell whether there is a difference between "for (<STDIN>) { BLOCK }" and "while (<STDIN>) { BLOCK }" in Perl? The two look similar. When I used the first construct and gave it a huge file as input, my Perl program crashed. Watching the top output after starting the program, I saw that its RSS kept growing until the program eventually crashed. Hmm ... when I switched to the second construct, the program not only ran successfully but also used less memory!

The difference is that the for construct evaluates <STDIN> in list context, so it reads the entire file into memory before it enters the loop. The while construct evaluates <STDIN> in scalar context and reads only one line per iteration, keeping just a small I/O buffer in memory. I wrote a program to count the number of lines in an input file in both flavors. For a 79 MB input file, the while loop version used only 1 MB, whereas the for loop version used 149 MB. On top of that, the while loop version took 1.36 seconds on average, and the for loop version took around 2 seconds.
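Here is a minimal sketch of the two line counters, to make the difference concrete. It is not the exact program I measured; the sub names count_lines_for and count_lines_while are made up for illustration, and both take an already opened filehandle.

```perl
use strict;
use warnings;

# for version: <$fh> is evaluated in list context, so every remaining
# line is read into a temporary list before the first iteration runs.
sub count_lines_for {
    my ($fh) = @_;
    my $count = 0;
    for (<$fh>) {
        $count++;
    }
    return $count;
}

# while version: <$fh> is evaluated in scalar context, so only one line
# is read per iteration. Perl treats this loop condition as
# defined($_ = <$fh>), which is why $_ is localized here.
sub count_lines_while {
    my ($fh) = @_;
    my $count = 0;
    local $_;
    while (<$fh>) {
        $count++;
    }
    return $count;
}

# Possible usage on standard input:
#   print count_lines_while(\*STDIN), "\n";
```

Both subs return the same count. Only the peak memory differs, because the for version holds all the lines at once.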

So if you are writing a Perl program that reads files line by line, stick to the while() construct if you want it to survive large input files and run faster.

Virtual IP and timeouts

Let us say you are writing a client application that connects to a server and maintains a persistent connection. Whenever the user does something, e.g. types some input, the client application sends that input to the server and prints back the output produced by the server. (Sounds like telnet?)

One thing your client program doesn't know is that the server it connects to uses a virtual IP! Consider the case where your client program initiates a connection and the connection is in the ESTABLISHED state. Right in the middle of the connection, the server drops the virtual IP address, and for some strange reason it never gets reassigned to another host. In that case, neither the client program nor the server program learns that the IP address has been removed. The server application is still listening on the removed IP address and still holds the ESTABLISHED connections made through it. Likewise, the client still holds an ESTABLISHED connection to the server.

If you look at how Linux handles writes on sockets, when the client program writes to the socket, the bytes are copied into a kernel buffer and the write call returns immediately. The kernel then makes a best-effort attempt to deliver those bytes. Remember that once the virtual IP is removed, the packets will never reach the server. The worst part is that the client doesn't know about this at all.
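A small sketch shows why a successful write tells the client nothing about delivery. As an assumption for illustration, it uses a local AF_UNIX socketpair as a stand-in for the TCP connection, and the sub name write_without_reader is made up. The peer end never reads, yet the write still succeeds because the bytes only need to reach the kernel buffer.

```perl
use strict;
use warnings;
use Socket;

# Writes $payload to one end of a socket pair whose other end never reads.
# Returns the number of bytes the kernel accepted into its buffer.
sub write_without_reader {
    my ($payload) = @_;
    socketpair(my $client, my $server, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
        or die "socketpair: $!";
    my $written = syswrite($client, $payload);
    die "syswrite: $!" unless defined $written;
    return $written;    # nothing was ever read from $server
}
```

For a small payload the call returns the full length right away, which is exactly what the client sees when the virtual IP has silently gone away.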

On the server side, some servers, e.g. Apache, have an inactivity timeout after which they close the connection. So such a server wouldn't keep these open sockets forever.

Hmm ... Unless the client opens another connection to the server, which would time out, there is no way for the client to know that it is unable to reach the server.

So the moral of the story is: if you suspect that the server might be using a virtual IP that could be dropped and never reassigned to another server, build both the client and the server programs with proper timeouts for reads, writes, and inactivity. Yes, an inactivity timeout is very important.
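One way to get a read timeout in a Perl client is to wait on the socket with IO::Select before reading. The helper below is only a sketch under that assumption. The name read_with_timeout and the 4096-byte read size are made up for illustration, and it covers only the read side.

```perl
use strict;
use warnings;
use IO::Select;

# Waits up to $timeout seconds for data on $sock.
# Returns the bytes read, '' if the peer closed the connection,
# or undef if nothing arrived before the timeout.
sub read_with_timeout {
    my ($sock, $timeout) = @_;
    my $sel = IO::Select->new($sock);
    return undef unless $sel->can_read($timeout);
    my $n = sysread($sock, my $buf, 4096);
    die "sysread: $!" unless defined $n;
    return $buf;
}
```

A client could treat an undef result as "the server is unreachable or idle", close the socket, and reconnect, instead of holding a dead ESTABLISHED connection forever.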