## Posts

Showing posts from 2008

### Monitors in Java

Monitors in Java are, in reality, a mutex plus condition variables. Silly note, but I think understanding this is essential.

A theoretical background on monitors can be found on Wikipedia.
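To make the "mutex + condition variables" view concrete, here is a minimal sketch in Python rather than Java (chosen only for illustration): a bounded buffer in which one lock plays the role of the monitor's mutex and `threading.Condition` objects play the role of `wait()`/`notify()`. The class `BoundedBuffer` is a hypothetical example, not a standard API.

```python
import threading
from collections import deque


class BoundedBuffer:
    """A monitor-style bounded buffer: one mutex plus two condition variables.

    Hypothetical example class, roughly what a Java class with synchronized
    put/take methods and wait()/notify() does, written out explicitly.
    """

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        self._lock = threading.Lock()                      # the monitor's mutex
        self._not_full = threading.Condition(self._lock)   # both conditions share it
        self._not_empty = threading.Condition(self._lock)

    def put(self, item):
        with self._lock:                      # like entering a synchronized method
            while len(self._items) >= self._capacity:
                self._not_full.wait()         # releases the mutex while waiting
            self._items.append(item)
            self._not_empty.notify()          # wake one waiting consumer

    def take(self):
        with self._lock:
            while not self._items:
                self._not_empty.wait()
            item = self._items.popleft()
            self._not_full.notify()           # wake one waiting producer
            return item
```

Java's built-in monitors have a single wait set per object, so this two-condition version is closer to `java.util.concurrent.locks.ReentrantLock` with two `Condition` objects; with `synchronized` methods you would use the one wait set and `notifyAll()`.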

### Strace and defunct processes

We have a web application running in Tomcat. Today I ran into an issue where the web application started throwing errors during startup. I wanted to figure out what the application reads during startup, so I started the whole Tomcat under strace and began analyzing the strace output. Once I figured out what was happening, I tried to stop strace with Ctrl-C, but it wouldn't stop. So I killed it with "kill -9". strace died, but the Java process that was being traced went into a defunct state. This is what I saw when I did a "ps aux":
9929 1932 0.0 0.0 0 0 pts/0 Z 16:53 0:03 [java <defunct>]

Hmm ... Not good! I tried to kill this process with "kill -9", but it wouldn't die. When I tried to start another instance of Tomcat, it complained that port 8989, where it was supposed to listen, was already in use. But when I did a "netstat -pan", I couldn't find any process listening on that port eithe…
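A process in state Z has already exited, so `kill -9` has nothing left to kill; its entry stays in the process table until its parent reaps it with `wait()`, or until the parent itself exits and init adopts and reaps it. The sketch below, in Python and assuming the Linux `/proc/<pid>/stat` format, lists zombie processes together with their parent PIDs, which tells you which process needs attention. The function name `find_zombies` is hypothetical.

```python
import os


def find_zombies(proc_root="/proc"):
    """Return (pid, ppid, command) tuples for processes in state 'Z'.

    Assumes the Linux /proc/<pid>/stat format: "pid (comm) state ppid ...".
    """
    zombies = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                data = f.read()
        except OSError:
            continue  # the process went away while we were scanning
        # comm may contain spaces or parentheses, so split on the last ')'.
        head, _, rest = data.rpartition(")")
        comm = head.partition("(")[2]
        fields = rest.split()
        state, ppid = fields[0], int(fields[1])
        if state == "Z":
            zombies.append((int(entry), ppid, comm))
    return zombies


if __name__ == "__main__":
    for pid, ppid, comm in find_zombies():
        print(f"{pid}: [{comm}] is a zombie, parent {ppid}")
```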

### You cannot do SMP in VMWare Player

VMWare Player doesn't support SMP, even though the guest may report a processor that is capable of SMP. Here is a quote from the VMware Player manual:
Virtual SMP
VMware Player does not support Virtual SMP™. You cannot use VMware
virtual machine that has more than one virtual processor assigned.
For example, when I look at /proc/cpuinfo, I see only one processor, though my VMWare Player emulates an "Intel(R) Core(TM)2 Duo CPU T7300 @ 2.00GHz". When I inspected the boot messages, I realized that my Linux switched to the uniprocessor configuration while booting:
[ 7.135461] SMP alternatives: switching to UP code
[ 7.135523] Freeing SMP alternatives: 11k freed

So if you are trying to test a multithreaded application, beware that the results you get in your VMWare Player's Linux and on a real Linux machine will not be the same. The VMWare Player version is likely to give you worse results, as it's running in uniprocessor mode.
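A quick way to check how many processors the guest kernel actually uses is to count the `processor` entries in /proc/cpuinfo, as done by hand above. A small Python sketch, assuming the Linux /proc/cpuinfo format (the function name `count_processors` is hypothetical); the standard library's `os.cpu_count()` gives a comparable number.

```python
import os


def count_processors(cpuinfo_text):
    """Count 'processor : N' entries in the text of /proc/cpuinfo (Linux)."""
    count = 0
    for line in cpuinfo_text.splitlines():
        key, sep, _ = line.partition(":")
        if sep and key.strip() == "processor":
            count += 1
    return count


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        print("processors in /proc/cpuinfo:", count_processors(f.read()))
    print("os.cpu_count():", os.cpu_count())
```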

### Global Interpreter Lock in Python

Recently we had an interesting discussion in our team about threading and performance in Python. One of the key concerns was the Global Interpreter Lock (GIL) in Python. A thread must hold the GIL before it can access any Python object. So if you are writing a C extension and would like to manipulate a Python object, you must first acquire the lock, and only then can you read/write/modify the object. Because of this, a considerable part of your code will always run sequentially.
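One way to observe the GIL's effect from pure Python is to time a CPU-bound function run sequentially and then in several threads. On a standard CPython build, the threaded run is typically no faster than the sequential one, because only the thread holding the GIL executes bytecode. A rough sketch; the workload `burn()` and the iteration counts are arbitrary choices, and timings vary by machine and Python version.

```python
import threading
import time


def burn(n):
    """A CPU-bound pure-Python loop; the running thread holds the GIL throughout."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_sequential(n, jobs):
    start = time.perf_counter()
    results = [burn(n) for _ in range(jobs)]
    return time.perf_counter() - start, results


def run_threaded(n, jobs):
    results = [None] * jobs

    def worker(idx):
        results[idx] = burn(n)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(jobs)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start, results


if __name__ == "__main__":
    n, jobs = 1_000_000, 2
    seq, _ = run_sequential(n, jobs)
    thr, _ = run_threaded(n, jobs)
    print(f"sequential: {seq:.2f}s  threaded: {thr:.2f}s")
```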

There is a lot of discussion about the GIL on the web. You will find this page particularly useful.

Someone suggested that Lua, by design, doesn't have any such constraint and performs very well when it comes to multithreading. As I don't have much experience programming in Lua, I don't know whether that claim is true.

I am also not sure whether Perl has a similar global lock issue. I tried to Google it, but didn't find anything useful.

### A useful tool to print the signals status

I have written a useful tool that helps analyze the signal disposition of a given process. It takes a list of PIDs as input and prints the signal disposition of each process in a readable form. For example:
5877: SigQ: 0/4096
5877: SigIgn: SIGHUP, SIGUSR2, SIGWINCH, SIGIO (mask 0000000030001002)

The first column is the PID, the second column is the disposition, and the third column is the list of signals in a readable form.

By default, the script doesn't print the real-time signals. To force it, specify the -r option on the command line. Sometimes there are signals for which no name is available (for example, Signal 0). In such cases, it simply says 'Signal X'. There is no option to print the signal status of all the threads under the given process ID…
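The script itself isn't shown here. As a rough sketch of the decoding step (in Python, not the original tool), note that in /proc/PID/status the least significant bit of a mask such as SigIgn stands for signal 1, so signal n is bit n-1. Under that convention the mask 0000000030001002 decodes to signals 2, 13, 29 and 30 (SIGINT, SIGPIPE, SIGIO and SIGPWR on x86 Linux), so it is worth checking any decoder's output, including the sample above, against the raw bits. The function names, and treating signals above 31 as real-time (Linux numbers the standard signals 1 to 31), are assumptions.

```python
import signal

STATUS_MASK_FIELDS = ("SigPnd", "ShdPnd", "SigBlk", "SigIgn", "SigCgt")
LAST_STANDARD_SIGNAL = 31  # assumption: Linux numbering, real-time signals above 31


def signals_in_mask(mask_hex, include_rt=False):
    """Return signal numbers set in a /proc/<pid>/status mask (bit n-1 = signal n)."""
    mask = int(mask_hex, 16)
    signums = [bit + 1 for bit in range(mask.bit_length()) if mask & (1 << bit)]
    if not include_rt:
        signums = [n for n in signums if n <= LAST_STANDARD_SIGNAL]
    return signums


def signal_name(signum):
    """Map a signal number to its name, falling back to 'Signal N'."""
    try:
        return signal.Signals(signum).name
    except ValueError:
        return f"Signal {signum}"


def describe_pid(pid, include_rt=False):
    """Yield one readable line per signal mask in /proc/<pid>/status (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            field, _, value = line.partition(":")
            value = value.strip()
            if field in STATUS_MASK_FIELDS:
                names = ", ".join(
                    signal_name(n) for n in signals_in_mask(value, include_rt))
                yield f"{pid}: {field}: {names} (mask {value})"
```

For example, `for line in describe_pid(5877): print(line)` prints lines in the same shape as the sample above; passing `include_rt=True` mirrors the tool's -r option.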

### Identifying threads under a given process

This is a quick note. The method to identify all the threads under a given process ID differs significantly between Linux kernel versions 2.4.x and 2.6.x.

In the case of a 2.4.x kernel, look for process IDs that are prefixed with a dot (".") under the /proc folder. All these PIDs correspond to threads. If you open the /proc/.X/status file, the number appearing against "Tgid:" is the PID of the process that this thread belongs to. So there is no direct way to list all the threads under a given PID. The only way to get that list is to iterate through all the directories that begin with a dot under /proc and check whether "Tgid:" matches the PID in question.

In the case of a 2.6.x kernel, all you need to do is get the list of all the directories under /proc/PID/task. Each of these IDs corresponds to one thread of the process. Also, you will know exactly how many threads there are by checking the "Threads:" value in …
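On a 2.6 kernel the steps above reduce to listing /proc/PID/task and, as a cross-check, reading the "Threads:" field from /proc/PID/status. A minimal Python sketch assuming the 2.6-style layout (it does not handle the 2.4 dot-directory scheme); the function names are hypothetical.

```python
import os


def thread_ids(pid):
    """Return the thread IDs of a process, read from /proc/<pid>/task (Linux 2.6+)."""
    return sorted(int(tid) for tid in os.listdir(f"/proc/{pid}/task"))


def thread_count(pid):
    """Return the 'Threads:' value from /proc/<pid>/status."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("Threads:"):
                return int(line.split()[1])
    raise ValueError(f"no Threads: field for pid {pid}")
```

The main thread's ID equals the PID itself, so the PID always appears in the list returned by `thread_ids(pid)`.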

### What is 3G anyway?

I must admit that I didn't know the difference between 2G and 3G wireless technologies until recently. Whenever a discussion about 3G came up among our friends, everyone seemed to offer some opinion but me. So I ventured into understanding what 3G is. I didn't need to go deep enough to debate 3G, just enough to understand it from a layman's perspective. One page I found helpful was RadioShack's comparison page, which tabulates the differences between the generations. That was a good starting point, from where I went to Wikipedia to learn more about each generation. I found the following pages to be helpful:

### Difference between for (<STDIN>) and while(<STDIN>) in Perl

Can you tell whether there is a difference between "for (<STDIN>) {BLOCK}" and "while (<STDIN>) {BLOCK}" in Perl? They might look similar. When I used the first construct and gave it a huge file as input, my Perl program crashed. When I checked top after starting the program, I saw that its RSS kept growing until it eventually crashed. Hmm ... when I switched to the second construct, not only did the program run successfully, it also used less memory!

The difference is that the for construct evaluates <STDIN> in list context, so it reads the entire file into memory before it enters the loop. The while construct evaluates <STDIN> in scalar context and reads one line at a time from the buffered input. I wrote a program to count the number of lines in an input file in both flavors. For a 79 MB input file, the while loop version took only 1 MB of memory, whereas the for loop version took 149 MB. On top of that, the while loop version took 1.36 seconds on average, and the for loop version took around 2 seconds.
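The same trade-off shows up in other languages. As an analogy in Python (not Perl): `readlines()` builds a list of every line before the loop starts, much like `for (<STDIN>)`, while iterating over the file object reads one buffered line at a time, like `while (<STDIN>)`. A minimal line-counting sketch; the function names are hypothetical.

```python
def count_lines_eager(stream):
    """Reads every line into a list first, so memory grows with the input size."""
    count = 0
    for _ in stream.readlines():
        count += 1
    return count


def count_lines_lazy(stream):
    """Reads one buffered line per iteration, so memory stays roughly constant."""
    count = 0
    for _ in stream:
        count += 1
    return count
```

For example, `count_lines_lazy(sys.stdin)` counts the lines of a huge piped file without holding it in memory.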

So if you are writing a…

### Virtual IP and timeouts

Let us say you are writing a client application that connects to a server and maintains a persistent connection. Whenever there is some user interaction, for example typed input, the client application sends the user input to the server and prints back the output produced by the server. (Sounds like telnet?)

One thing your client program doesn't know is that the server it connects to uses a virtual IP! Consider the case where your client program initiates a connection and the connection is in the ESTABLISHED state. Right in the middle of the connection, the server drops the virtual IP address, and for some strange reason it never gets reassigned to another host. In that case, neither the client program nor the server program will notice that the IP address has been removed. The server application would still be listening on the removed IP address and would still be holding the ESTABLISHED connections through it. Likewise, the client would still be …
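The rest of the post is cut off here. One common way to keep a client from waiting forever on a connection whose peer address has vanished is to combine TCP keepalive with an application-level read timeout, so the kernel eventually probes the idle connection and reports an error. This is a general technique, not necessarily what the original post recommends. A minimal Python sketch, assuming Linux (`TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` are Linux-specific options); the helper names and timing values are arbitrary examples.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, probes=5):
    """Turn on TCP keepalive so a dead peer is detected after roughly
    idle + interval * probes seconds of silence (tuning options are Linux-specific)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if all(hasattr(socket, n) for n in ("TCP_KEEPIDLE", "TCP_KEEPINTVL", "TCP_KEEPCNT")):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


def connect_with_timeouts(host, port, connect_timeout=10.0, read_timeout=300.0):
    """Open a client connection that neither hangs on connect nor blocks forever on recv."""
    sock = socket.create_connection((host, port), timeout=connect_timeout)
    enable_keepalive(sock)
    # recv() now raises socket.timeout (TimeoutError in Python 3.10+) after read_timeout.
    sock.settimeout(read_timeout)
    return sock
```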

### Gmail over HTTPS

I had always wondered what the problem is with email providers that don't let me view my email over HTTPS. One of the main reasons I could think of was the increased load on the web servers where the HTTP connection terminates: they have to spend extra CPU cycles encrypting and decrypting the incoming and outgoing data. A similar load is incurred on the user's machine as well.

But Google provides mail over HTTPS, and that is seriously good news. To access your email over HTTPS, instead of typing www.gmail.com, type "https://mail.google.com".

Oh BTW, only the traffic between Google's mail server and your desktop is encrypted. The traffic between mail servers still goes unencrypted. For example, if you send an email from your Google mail account to an account at Yahoo!, the mail will be sent unencrypted from the Google server to the Yahoo! mail server.

Added on 07/29/2008: A friend of mine working in Google told me even…

### Setting the PuTTY window title from command line

I sometimes want to set the title of my PuTTY windows, like "Editor", "Compiler", etc., to tell the windows apart. I found the following script very useful. You can add it to your ~/.bash_profile. Once you log in, you can set the title to whatever you want:

```bash
function wtitle {
    if [ "$TERM" == "xterm" ] ; then
        # Remove the old title string from PS1, if one is already set.
        PS1=$(printf '%s' "$PS1" | sed -r 's/^\\\[\\033]0;.*\\007\\]//')
        # Prepend an escape sequence that sets the window title to
        # "<argument> - user@host:workingdirectory".
        export PS1="\[\033]0;$1 - \u@\h:\w\007\]$PS1"
    else
        echo "You are not working in xterm. I cannot set the title."
    fi
}
```

The above function sets the window title to whatever argument you give, followed by the usual user@host:workingdirectory. I think this should work with any xterm client, not just PuTTY. (I haven't tested it with any other xterm client.) For example, to set the window title to Editor, you would give th…

### Changing color schemes in vim

You can change the color scheme of vim with the following command:

:color {name}

The list of available color schemes can be found under /usr/share/vim/vim63/colors/ (if you are using a different version of vim, vim63 might be different for you). If you see a file called blue.vim under this directory, you can use that color scheme by giving:

:color blue

If you would like to make the change permanent, add this line to your ~/.vimrc file.

### Configuring the core file name pattern in Linux

First of all, make sure that you have set a proper ulimit in your shell. You can check this with the "ulimit -a" command. If the core file size is set to 0, you can make it unlimited with "ulimit -c unlimited". Refer to your shell's man page to learn how to set this. A child process always inherits the ulimit from its parent process.

You can configure your system so that when an application dumps core, the core file gets a meaningful name instead of just the bare word core. There are two files under /proc/sys/kernel that you should modify to configure this.

/proc/sys/kernel/core_pattern - This contains the pattern of the core file name.
The following patterns are allowed:

- %% output one '%'
- %p pid
- %u uid
- %g gid
- %s signal number
- %t UNIX time of dump
- %h hostname
- %e executable filename

/proc/sys/kernel/core_uses_pid - If this file contains a non-zero val…

### Finding native-endian in Java

I was curious whether it is possible to write a Java program that finds out if the underlying native platform is little- or big-endian. I guess it's not possible to write such a program without having part of the code in C/C++ and using JNI. If anyone reading this blog feels otherwise, please let me know.

There is an API available in NIO, java.nio.ByteOrder.nativeOrder(), to find out the native byte order. I think this API must use some native code underneath.

### Serious security issue with IRCTC website

The login form of the IRCTC web site is submitted over HTTP in plain text. This is a very serious issue, since both your user ID and password could be sniffed by someone. One thing that I observed was that they have an HTTPS server running, and this server is capable of receiving login requests. I think it is a bug in the code: the developer gave the URL as HTTP instead of HTTPS.

How to overcome the issue? This is not a clean approach, but it works. You can copy and paste the URL "https://www.irctc.co.in/cgi-bin/bv60.dll/irctc/services/login.do?userName=XXX&password=YYY" into your browser. Replace XXX with your user ID and YYY with your password.

I tried to send feedback about this to the site admin or someone in charge. Pathetic ... I could not find any link or email address on that website to do this. Hopefully someone from IRCTC will read this blog and fix the issue.

### WTH is wrong with Tata Indicom

Tata Indicom has an amazing (?!) web site to manage all your accounts with them online. I don't know WTH is wrong with them: none of the login pages are submitted over HTTPS. Yes, any n00b running a sniffer can sniff out your password and any other sensitive information you give, with a little effort.
On top of this, the page was submitted to an IP address instead of a host name, which was beyond my wildest imagination. I had to run a whois query on the APNIC server just to confirm that I was talking to one of their servers. I wonder how on earth Tata Indicom claims to be the number one (or one of the top) telecom service providers in India if they don't even take the security of their users' identities seriously.

### Good tutorials on Emacs Lisp

There are two pages that I would highly recommend to get started with Emacs Lisp:

- Xah Lee's tutorial on Emacs Lisp.
- Steve Yegge's Emergency Elisp.

Of course, you should always have the Emacs Lisp reference manual handy.

### Customizing colors in comint package in Emacs

The default foreground color of the prompt (gdb, shell, etc.) in the comint package in Emacs is blue. If you are working in PuTTY, the default background is black. Imagine blue on a black background. Looks nasty! You can customize the color of the prompt through the comint-highlight-prompt face. For example, I am using a yellow foreground on my system. Here is how I customized it, by adding these lines to my ~/.emacs:

```elisp
(copy-face 'default 'comint-highlight-prompt)
(set-face-foreground 'comint-highlight-prompt "yellow")
```

In short, the first line makes comint-highlight-prompt a copy of the default face, and the second line changes the foreground of that face to "yellow". Scott A. Kuhl's .emacs file helped me a lot in understanding how to customize this. Thanks to him.

### Time saving tip to connect PuTTY in one click

You can create a shortcut in your Quick Launch folder that looks like:

\path\to\putty.exe -load "session name"

The session name can be obtained from your PuTTY dialog box when you launch it normally. This saves me a lot of time, since most of the time I connect to the same host. There are other interesting questions answered in the PuTTY FAQ.
### Customizing the colors in ls output

This posting is specific to customizing the dircolors in the bash shell on Linux. It may or may not be applicable to other shells and other platforms.

When the bash shell is started, it executes all the shell scripts matching /etc/profile.d/*.sh. You might find other shell scripts in this directory (for example, colorls.csh), but those are not executed by the bash shell. These files are sourced within the current shell's environment (as with the . command), not executed in a separate shell. One of these files is colorls.sh. A set of rules in colorls.sh searches for a user-specific dircolors file. This file can be any one of ~/.dircolors, ~/.dircolors.$TERM, ~/.dir_colors or ~/.dir_colors.$TERM. If the user has more than one of these files, whichever comes last is used.

Let us stick to the convention of using ~/.dircolors file for our customization. It is very easy to create this file by giving the following command:
dircolors --sh > ~/.dircolors
Then y…

### If you have time to waste: C++ or Java which one is faster

Okay. This one is just as useless an argument as anyone could imagine: C++ or Java - which one is faster? The long discussion can be read here. The original poster of the thread (Razii) came up with the idea of reading the whole Bible verse by verse, sorting it and writing it back to disk using programs written in both Java and C++, and comparing the time taken by the two programs. The (seemingly wrong) conclusion he came to was that Java is faster than C++.

Here is my take on it. I copied and pasted the same pieces of code (fixing the missing include of the iterator header and the division by 1000 to get milliseconds), and ran the same test on my Linux box. Well, the C++ program was almost always considerably faster than the Java program. Many times my C++ program took only 50% of the time taken by the Java program. My Java version is "Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_14-b03)" and my g++ version …

### FTP connections over NATed VMWare virtual machine

I was trying to install FreeBSD 7.0 in my VMWare virtual machine (VM). After downloading the installer disk's ISO image, I attached that ISO file as the CD-ROM drive of my VM and booted it. I should mention that my VM is in NAT mode, as I had decided to install FreeBSD over the network.

When the installer tried to download the distribution files from the FTP servers, it kept saying it could not connect to the FTP server. I wasted an hour or so without thinking about the basics (Damn!): I am behind a NAT and I was trying to download files using active FTP mode. Then I tried connecting in passive mode, and everything worked fine.

I don't want to repeat what many people have already clearly explained about passive mode of FTP. Here is an excellent tutorial that talks about active and passive modes of FTP.
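The root cause is visible in the control-channel commands themselves (RFC 959 encoding). In active mode the client sends `PORT h1,h2,h3,h4,p1,p2`, advertising its own address and a port (p1 * 256 + p2) for the server to connect back to; behind a NAT that address is a private one the server cannot reach. In passive mode the server replies `227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)` and the client opens the outbound data connection, which NAT handles fine. A small Python sketch of both encodings; the helper names are hypothetical. Python's own `ftplib` uses passive mode by default (`FTP.set_pasv(False)` switches to active mode).

```python
import re


def port_command(ip, port):
    """Build the active-mode PORT command a client would send (RFC 959 encoding)."""
    h = ip.split(".")
    return "PORT {},{},{},{},{},{}".format(*h, port // 256, port % 256)


def parse_pasv_reply(reply):
    """Extract (ip, port) from a '227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)' reply."""
    m = re.search(r"(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)", reply)
    if not m:
        raise ValueError(f"unexpected PASV reply: {reply!r}")
    nums = [int(x) for x in m.groups()]
    return ".".join(map(str, nums[:4])), nums[4] * 256 + nums[5]
```

For instance, a client at 192.168.1.5 listening on port 50000 would send `PORT 192,168,1,5,195,80`, and a server on the Internet has no route back to 192.168.1.5.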

Moral of the story is, don't forget that active FTP doesn't work (generally :-) if you are behind a NAT.

I would like to add only one thing: I think i…

### Event Completion Framework in Solaris 10

Similar to the kqueue framework in FreeBSD, Solaris 10 introduced a framework called the Event Completion Framework (ECF). ECF is a very powerful concept that can be used when an application wants to wait for asynchronous events, like read/write events on sockets. Traditionally one would use poll()/select() for these events. There is already plenty of discussion on how primitive these mechanisms are and how they don't scale well to large numbers of file descriptors. Another advantage of kqueue and ECF is that you can wait on different kinds of activities, not just activities on file descriptors, for example when a process calls fork or exit.

There are a few resources that are very useful in understanding these frameworks:

- Robert Benson's article on ECF
- Sample program given in Bart Smaalders' blog
- Jonathan Lemon's paper on kqueue

Oh btw, I think Linux is yet to have a mechanism as powerful as these.
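Python does not expose ECF's port_* calls, but its standard `selectors` module shows the same register-once, wait-for-events style: `DefaultSelector` picks the most efficient mechanism available on the platform (kqueue on FreeBSD, epoll on Linux, /dev/poll on Solaris, falling back to poll/select). A minimal sketch using a socketpair in place of real network connections; note that it is a readiness-notification loop, not a completion framework like ECF, and `echo_once` is a hypothetical helper.

```python
import selectors
import socket


def echo_once(sel, timeout=1.0):
    """Wait once for readable sockets and echo their data back upper-cased."""
    handled = 0
    for key, _events in sel.select(timeout=timeout):
        conn = key.fileobj
        data = conn.recv(1024)
        if data:
            conn.sendall(data.upper())
            handled += 1
    return handled


if __name__ == "__main__":
    sel = selectors.DefaultSelector()
    print("mechanism:", type(sel).__name__)
    server_side, client_side = socket.socketpair()
    sel.register(server_side, selectors.EVENT_READ)   # register interest once...
    client_side.sendall(b"hello")
    echo_once(sel)                                     # ...then wait for events
    print(client_side.recv(1024))
    sel.close()
    server_side.close()
    client_side.close()
```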

### Random notes on proc & process status

The proc file system gives a view of the processes, including the kernel, running on a Linux system. Writing into some of the files under /proc changes the state of the kernel or processes in memory. For example, you can change the maximum size of a shared memory segment that the kernel allows by writing to /proc/sys/kernel/shmmax. This takes effect without rebooting the system, since it directly modifies the value of that particular kernel variable. But most of these changes are transient, so you will lose them when you reboot your system.

You can examine the state of a process using the proc file system. Each process running in the system has an entry under /proc/pid, where pid is the process ID. There are good tutorials that explain each entry under this directory; you can google for them. But I could not find an explanation for some entries, particularly the status file. That's the objective of this blog post.
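As a starting point for exploring the status file, here is a minimal Python sketch that reads /proc/PID/status into a dictionary of field name to raw value (Linux only; the function name `read_status` is hypothetical).

```python
import os


def read_status(pid="self"):
    """Parse /proc/<pid>/status into {field: value}, with values left as strings."""
    status = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, sep, value = line.partition(":")
            if sep:
                status[key] = value.strip()
    return status


if __name__ == "__main__":
    st = read_status(os.getpid())
    for field in ("Name", "State", "Tgid", "Pid", "PPid", "Threads"):
        print(f"{field}: {st.get(field)}")
```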

For a single-threaded process, you will find all the information about the state of the …

### Monitoring TCP connection open/close in Ethereal

I found the following filter expression to be useful when I need to monitor only the TCP connection open/close.
(tcp.flags.syn == 1 || tcp.flags.fin == 1 || tcp.flags.reset == 1)

For example, when you wish to monitor all the SSL connections opened and closed, the following filter is good:
(tcp.port == 443) && (tcp.flags.syn == 1 || tcp.flags.fin == 1 || tcp.flags.reset == 1)

You can also combine it with the IP address of the server you are interested in, like:
(tcp.port == 443) && (tcp.flags.syn == 1 || tcp.flags.fin == 1 || tcp.flags.reset == 1) && (ip.addr == X.X.X.X)
Monitoring connection open/close activity helps in understanding client behavior. For example, Firefox and IE both generally open multiple HTTPS connections (I have seen three or four at most) and reuse the same SSL session ID. These connections are open concurrently, and most of them don't get closed from the browser side until the web server closes them.

### SSL session reuse - how to find if supported?

Before I delve deeper, it's a good idea to be clear about SSL session reuse. Every time a client (browser, curl, etc.) connects to a server over SSL, the server creates a session for that connection. The session ID is sent as part of the Server Hello message. This makes things efficient in case the client plans to close the current connection and reconnect in the near future. Most servers have a timeout for these sessions (RFC 2246 suggests an upper limit of 24 hours for session ID lifetimes; servers may expire them sooner, especially when pressed for space).

When the client connects to the same server again, it can send the same session ID as part of the Client Hello. The server first checks whether it can find a session with that ID. If found, the same session is reused, saving the time spent verifying certificates and negotiating keys. If the server cannot find a matching session, it responds with a new session ID in the Server Hello message, followed by its certificate. The client knows that it has to verify the…

### Learning Perl - ebooks & resources

I ventured into brushing up my Perl knowledge, as it had been a really long time since I read a book on Perl. I found the following resources useful:

- Beginning Perl by Simon Cozens. I think this is one of the best books for beginning Perl programming.
- The perldoc website. This is not really a book or a cogent sequence of tutorials, but an excellent reference once you are familiar with the basics.
- A collection of resources to learn Perl from the www.freeprogrammingresources.com website.

And there is always perldoc, to refer to topics/functions/modules offline.

### Weird error while setting up SFTP

Recently I had to set up SFTP access on my RedHat Linux box. After uncommenting the "Subsystem sftp ..." line in the /etc/ssh/sshd_config file, I restarted sshd. When I tried to log in, I was prompted for the password, but immediately afterwards I got this error message:
Received message too long 1214606444

Hmm ... I couldn't make out what the reason could be. Then, when I Googled, I came across this page. I had the same issue: my .bashrc file echoed a welcome message :-(. I confirmed this by converting the number above to hex (0x48656C6C), which gives the ASCII values of 'H', 'e', 'l' and 'l' respectively. The actual message was "Hello, have a great session!".
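The number comes from how SFTP frames its packets: each packet starts with a 4-byte big-endian length field. When the shell prints a greeting before the SFTP server starts, the client reads the first four bytes of that text as the packet length. A quick Python check of the arithmetic above:

```python
import struct

greeting = b"Hello, have a great session!"
# SFTP packets start with a uint32 length in network (big-endian) byte order,
# so the client interprets the first four bytes of the echoed text as a length.
(bogus_length,) = struct.unpack(">I", greeting[:4])
print(bogus_length, hex(bogus_length))  # 1214606444 0x48656c6c
```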

Moral of the story: Don't have any unnecessary echo messages in your .bash_profile or .bashrc files.