Tuesday, January 30, 2007

Deterministic destruction of Objects in Java - An idea

I love programming in C++, despite the hue and cry against it as a "most programmer unfriendly" language. I have my own reasons to like C++. The top two reasons are:
  • Bjarne was born in 1950, the year my father was born.
  • The initial C++ (C With Classes) was released in 1979, the year I was born. :-)
Nevertheless, there is one aspect of C++ that I find very useful and that definitely helps prevent resource leaks: the destructor of an object. In simple terms, whenever we exit a lexical block (a block enclosed within "{" and "}"), all the objects constructed on the stack within that block are guaranteed to be destroyed. The guarantee holds even when the exit happens because of a thrown exception.

This guarantee is a very strong weapon for any programmer. It makes the destruction of an object a deterministic phenomenon. Objects created on the heap using the new keyword require the user to destroy them explicitly, and that's not what I am discussing here.

One good programming practice is to acquire and release any resource within the same lexical scope, as much as possible. (I hear you yelling "It's not always possible, fella" and I agree with you!)

If we look at the object life cycle models provided by C++ and Java, C++ implicitly supports the above-mentioned practice. But in Java, every user of the object must remember to invoke the destructor's equivalent (freeResources, resetAll, etc.). The finalize method in Java serves a different purpose altogether, and the language designers themselves strongly discourage using and relying upon it.
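To make the burden on the caller concrete, here is a minimal sketch of what "remembering to invoke the destructor's equivalent" looks like in Java. The ScopedResource class, its use() method, and the log list are hypothetical names made up for illustration. freeResources() is borrowed from the paragraph above. The try/finally block is the part each caller has to write by hand to get the release that C++ performs automatically at scope exit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical resource; freeResources() plays the role of a destructor.
class ScopedResource {
    private final String name;
    private final List<String> log;

    ScopedResource(String name, List<String> log) {
        this.name = name;
        this.log = log;
        log.add("acquire " + name);
    }

    // Simulates work that may fail with an exception.
    void use(boolean fail) {
        if (fail) {
            throw new IllegalStateException("failure while using " + name);
        }
        log.add("use " + name);
    }

    void freeResources() {
        log.add("release " + name);
    }
}

class ScopedReleaseSketch {
    // Acquires and releases the resource in the same lexical scope and
    // returns the sequence of events that took place.
    static List<String> run(boolean fail) {
        List<String> log = new ArrayList<String>();
        try {
            ScopedResource res = new ScopedResource("connection", log);
            try {
                res.use(fail);
            } finally {
                // Runs on normal exit and when use() throws.
                res.freeResources();
            }
        } catch (IllegalStateException e) {
            log.add("caught " + e.getMessage());
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(false));
        System.out.println(run(true));
    }
}
```

In both runs the release is recorded before control leaves the inner block, but only because the caller wrote the finally clause. Nothing in the language forces the next caller to do the same.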

I have been chewing on this idea for some time, and this is what I think.
Garbage collection and object destruction are two different things. Whenever a lexical scope is exited, the JVM knows which objects were created within that scope. The JVM can decide whether an object still has references to it or whether it is safe to destroy. An object destroyed this way can wait in the heap until GC occurs.
My thought is not deep enough to be incorporated into a production release! But it is certainly worth considering.

Is there a rationale behind not supporting deterministic object destruction in Java?

Monday, January 29, 2007

Notes on ObjectOutputStream.writeObject()

If you write the same object twice into an ObjectOutputStream using the writeObject() method, you would expect the size of the stream to grow by roughly the size of the object (and, recursively, all the fields within it). But that is not what happens.

It is critical to understand how the writeObject() method works. It writes an object only once into a stream. When the same object is written again, it only records a reference to the copy that is already in the stream.

Let us take an example. We want to write 1000 student records into an ObjectOutputStream. We create only one record object and plan to reuse it within a loop so that we save time on object creation. We update the fields of the same object with the next student's details. If we use writeObject() to carry out this task, the changes made for all but the first student's record will be lost. (Go ahead and try the program given below.)

To achieve the objective stated above, you can use the writeUnshared() method instead. (Change the writeObject() calls to writeUnshared() and convince yourself.)

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.IOException;
import java.io.Serializable;

class StudentRecord implements Serializable {
    public String name;
    public String major;
}

public class ObjectStreamTest {
    public static void main(String[] argv) throws IOException, java.lang.ClassNotFoundException {
        // Open the Object stream.
        ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream("objectfile.bin"));

        // Create the record that will be reused.
        StudentRecord rec = new StudentRecord();

        // Write the records.
        rec.name = "John"; rec.major = "Maths";
        oos.writeObject(rec);
        rec.name = "Ben"; rec.major = "Arts";
        oos.writeObject(rec);
        oos.close();

        // Read the objects back to reconstruct them.
        ObjectInputStream ois = new ObjectInputStream(new FileInputStream("objectfile.bin"));
        rec = (StudentRecord)ois.readObject();
        System.out.println("name: " + rec.name + ", major: " + rec.major);
        rec = (StudentRecord)ois.readObject();
        System.out.println("name: " + rec.name + ", major: " + rec.major);
        ois.close();
    }
}
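For a side-by-side comparison that does not touch the file system, the following sketch runs the same experiment against an in-memory byte array. The class name SharedVsUnsharedSketch and the helper secondName() are made up for illustration, and StudentRecord is redefined here only to keep the block self-contained. Treat it as a rough demonstration rather than a recommended pattern.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class StudentRecord implements Serializable {
    private static final long serialVersionUID = 1L;
    public String name;
    public String major;
}

class SharedVsUnsharedSketch {
    // Writes one reused record twice (as John, then as Ben) and returns
    // the name found in the second object read back from the stream.
    static String secondName(boolean unshared) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(bytes);

            StudentRecord rec = new StudentRecord();
            rec.name = "John"; rec.major = "Maths";
            write(oos, rec, unshared);
            rec.name = "Ben"; rec.major = "Arts";
            write(oos, rec, unshared);
            oos.close();

            ObjectInputStream ois =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            ois.readObject();                               // first record
            StudentRecord second = (StudentRecord) ois.readObject();
            ois.close();
            return second.name;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    private static void write(ObjectOutputStream oos, Object obj, boolean unshared)
            throws IOException {
        if (unshared) {
            oos.writeUnshared(obj);   // always writes a fresh copy of obj
        } else {
            oos.writeObject(obj);     // second write is only a back-reference
        }
    }

    public static void main(String[] args) {
        System.out.println("writeObject:   " + secondName(false)); // John
        System.out.println("writeUnshared: " + secondName(true));  // Ben
    }
}
```

Calling reset() on the ObjectOutputStream between writes is another way to make the stream forget the objects it has already written.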

Thursday, January 25, 2007

Software Development Best Practices Conference 2007

Last Friday I attended the Software Development Best Practices Conference 2007. It was an eventful day. Two presentations made me feel that I got much more in return than what I paid for: "Better Software - No matter what" by Dr. Scott Meyers and "Securing Software Design and Architecture: Uncut and Uncensored" by Dr. Herbert Thompson. In the photo, I am seen with Dr. Scott Meyers. (Thanks to Abhishek Pandey from Intuit for the photo.)

You can see Dr. Scott Meyers's presentation slides on the SD Expo web site.

Other sponsored speakers talked mostly about their companies and the products they were advertising, which is quite understandable.

Dr. Thompson's speech was lively and full of information. He shared three incidents from the past that drove him to believe that "bugs are everywhere" and that security is the most critical aspect of any product. Of the three incidents, I loved the Bahamian Adventure of Soda Machines! A couple of his best books can be viewed here.

Bottom line: I am deeply convinced that one can break any software.

Monday, January 15, 2007

Bjarne's Interview in MIT Technology Review

As refreshing and thought-provoking as ever. I would strongly urge you to read Bjarne's interview (Part 1) in full. Of all the answers, I like the following one in particular.
Expressing dislike of something you don't know is usually known as prejudice. Also, complainers are always louder and more certain than proponents--reasonable people acknowledge flaws.

I believe this answer is just as true for our personal lives as it is in the context in which it is presented here. The second part of the interview is available here.

Friday, January 12, 2007

What is robots.txt?

Before I explain the why and what of the robots.txt file, let me describe an incident that caught us off guard some time back. (For security reasons, I have excluded the names.)

We used to serve real-time/delayed quotes to our customers. One of the customers wanted to provide search functionality on their web site, so they bought an indexing/searching application that had a crawler at its core. The customer had a list of the Top 10 Active stocks and their respective delayed quotes on their landing page. Some stocks would have rapid fluctuations in their prices. The crawler (which at the time we called a "stupid crawler," not knowing about the robots.txt file) started indexing the landing page as rapidly as the values changed. For each request, the customer's application server sent ten quote requests to us. Luckily for us, our infrastructure was robust enough that our server didn't go down. But at the end of two days we had thousands and thousands of quote requests, which pushed our service graph to an unprecedented level!

Well, that was the story. We figured out the issue by looking at the logs and informed the customer about it. Last I heard, they had disabled the search facility.

That's where robots.txt comes into the picture to save us.
robots.txt is a rules file that every well-behaved crawler reads before it crawls a particular site.
It lists the paths that a crawler must not fetch, such as directories whose contents change dynamically and hence should not be indexed. It also supports crawler-specific rules. For example, if www.mywebsite.com/robots.txt reads as follows, then Google's crawler would not crawl any page of www.mywebsite.com.
User-agent: googlebot
Disallow: /
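To show how a crawler might apply such rules, here is a deliberately simplified sketch. The class RobotsTxtSketch and its isAllowed() method are hypothetical names. It handles only User-agent and Disallow lines with plain prefix matching, and it treats every matching group (including "*") as applicable. Real crawlers also handle Allow lines, wildcards, and precedence between groups, so take this as an illustration of the idea rather than a faithful parser.

```java
import java.util.Locale;

class RobotsTxtSketch {
    // Returns true if the crawler named userAgent may fetch path
    // according to the given robots.txt content (simplified rules).
    static boolean isAllowed(String robotsTxt, String userAgent, String path) {
        String agent = userAgent.toLowerCase(Locale.ROOT);
        boolean groupApplies = false;     // does the current group target us?
        boolean lastWasUserAgent = false; // consecutive User-agent lines share a group

        for (String rawLine : robotsTxt.split("\\r?\\n")) {
            String line = rawLine;
            int hash = line.indexOf('#');          // strip comments
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            line = line.trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;                          // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();

            if (field.equals("user-agent")) {
                boolean matches = value.equals("*")
                        || agent.contains(value.toLowerCase(Locale.ROOT));
                groupApplies = lastWasUserAgent ? (groupApplies || matches) : matches;
                lastWasUserAgent = true;
            } else {
                lastWasUserAgent = false;
                if (field.equals("disallow") && groupApplies
                        && !value.isEmpty() && path.startsWith(value)) {
                    return false;                  // path falls under a disallowed prefix
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String rules = "User-agent: googlebot\nDisallow: /\n";
        System.out.println(isAllowed(rules, "Googlebot", "/index.html")); // false
        System.out.println(isAllowed(rules, "OtherBot", "/index.html"));  // true
    }
}
```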
You can read much more about robots.txt here.

Monday, January 08, 2007

Learning a system and the use of profiler

Here is a question: When you are given a huge system with source code and asked to learn the system, where will you start? Think for a moment and answer.

My answer goes like this:
  • Run the system under a debugger, which gives you a fair idea about the system (where it starts, which functions are called, etc.)
  • Run the system under truss or strace (or whichever tool is applicable to your platform), which gives you a very good idea of the resources the system uses (INI files, resource files, etc.)
  • Observe which functions are called while different functionalities are accessed in the system (what happens in my server after I click the "Submit" button, what the function invocation sequence is when I log in, etc.)
If you venture into studying the system by brute force, going through the source code at random points, you might waste time in unnecessary places. The activities mentioned above should at least help you identify which piece of source code to look at. Needless to say, the items above are just the beginning, and you should eventually go through the source code to understand the full system.

Let us take the case of a Java system. How would you find out the sequence of function calls when some functionality is invoked? Stepping through your program under a debugger is a lot more painful, so let that not be your first resort, if not your last. What we really need is a digest of the function calls, rather than stepping through the execution ourselves.

I came across this excellent tutorial by Andrew Wilcox on building our own call profiler. I believe that this is an excellent starting point. You can find more about how to write your own profiler here. Remember that you will have to slightly modify the source code present in the tutorial to make it log all the function calls, rather than just the function being profiled. That tutorial lists a decent set of references too. You might find most of them very useful, particularly the one on the JNI (Java Native Interface).
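If you only need a quick digest of the calls that cross one interface, a much lighter (and far less complete) alternative is a dynamic proxy from java.lang.reflect. This is not the approach taken in the tutorial. It is a swapped-in technique, and it sees only calls made through the proxied interface. LoginService, SimpleLoginService, and CallLogSketch are hypothetical names invented for this sketch.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical service used only to have something to trace.
interface LoginService {
    boolean login(String user);
    void logout(String user);
}

class SimpleLoginService implements LoginService {
    public boolean login(String user) { return !user.isEmpty(); }
    public void logout(String user) { /* nothing to do in this sketch */ }
}

class CallLogSketch {
    // Wraps target in a proxy that appends "Interface.method" to log
    // before forwarding each call to the real object.
    static <T> T traced(T target, Class<T> iface, List<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            log.add(iface.getSimpleName() + "." + method.getName());
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause();   // rethrow what the real method threw
            }
        };
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] { iface }, handler);
        return iface.cast(proxy);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<String>();
        LoginService service = traced(new SimpleLoginService(), LoginService.class, log);
        service.login("alice");
        service.logout("alice");
        System.out.println(log); // [LoginService.login, LoginService.logout]
    }
}
```

Calls that the real object makes internally never pass through the proxy, which is why a profiler of the kind the tutorial describes is still the better tool for learning a whole system.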

You are most welcome to share your thoughts on the ideas presented. I would be more than glad to hear them.

Friday, January 05, 2007

System.identityHashCode() - What is it?

Today I learnt about a function called System.identityHashCode(). To understand where it is used, let us consider the following program.
//
// What will be the output of toString() if we override hashCode() function?
//
public class HashCodeTest {

    public int hashCode() { return 0xDEADBEEF; }

    public static void main(String[] argv) {
        HashCodeTest o1 = new HashCodeTest();
        HashCodeTest o2 = new HashCodeTest();

        System.out.println("Using default toString():");
        System.out.println("First: " + o1);
        System.out.println("Second: " + o2);

        System.out.println("Using System.identityHashCode():");
        System.out.println("First: " + System.identityHashCode(o1));
        System.out.println("Second: " + System.identityHashCode(o2));
    }
}
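This program overrides the function hashCode(), which is perfectly legal. The default toString() prints the class name followed by "@" and the hexadecimal value of hashCode(), so once hashCode() is overridden, the default toString() no longer tells the two objects apart. The output turns out to be something like this (the identity hash values vary from run to run):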
Using default toString():
First: HashCodeTest@deadbeef
Second: HashCodeTest@deadbeef
Using System.identityHashCode():
First: 27553328
Second: 4072869

Sometimes you might wish to print the identity along with your own message when you override the toString() method. In such instances the identityHashCode() function comes in handy. If you look at the second part of the program output, the identity hash codes of the two objects are different.
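As a small illustration of that use, here is a sketch of a class that overrides hashCode() and equals() for its own purposes and still prints an identity-based tag in toString(). The class LabeledAccount and its owner field are made up for this example.

```java
class LabeledAccount {
    private final String owner;

    LabeledAccount(String owner) {
        this.owner = owner;
    }

    // Value-based hash: two accounts with the same owner hash alike.
    @Override
    public int hashCode() {
        return owner.hashCode();
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof LabeledAccount
                && ((LabeledAccount) other).owner.equals(owner);
    }

    // Custom message plus the identity hash, in the same hex style
    // that the default toString() uses.
    @Override
    public String toString() {
        return "LabeledAccount[" + owner + "]@"
                + Integer.toHexString(System.identityHashCode(this));
    }

    public static void main(String[] args) {
        LabeledAccount a = new LabeledAccount("alice");
        LabeledAccount b = new LabeledAccount("alice");
        System.out.println(a.equals(b)); // true: same owner
        System.out.println(a);           // identity suffix varies from run to run
        System.out.println(b);
    }
}
```

The two objects compare equal and share a hashCode(), yet their printed forms will normally carry different identity suffixes, which helps when reading logs.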