Thursday, April 30, 2009

A good reference on processor cache

I found this good reference on processor cache while I was looking for some material to refresh my cache knowledge. Hope this is useful for you too.

Friday, April 17, 2009

How to measure time to execute a query when using DBCP+JdbcTemplate

Keep the following things in your mind if you are planning to measure the time taken to execute a query when you make use of Apache DBCP + Spring JdbcTemplate.

You will never be able to find out the time taken to execute a query in the database by measuring the the time it takes to complete any variant of JdbcTemplate.query...() or JdbcTemplate.execute...() . Making use of any of these functions involve the following steps:
  1. Time taken to borrow a connection from the pool
  2. Time taken to create a statement from the connection and bind the parameters if needed
  3. Time taken to send the query to the database
  4. Time taken to execute the query in the database (this is the time you are specifically interested in)
  5. Time taken to receive the result from the database (depending on the fetch size you have set, this will vary). For UPDATEs, only the update count is sent back from the database.
  6. Time taken in the row mapping (if you are making use of RowMapper) or time taken to consume all the rows from ResultSet (if you are making use of ResultSetExtractor)
  7. Time taken to return the connection to the pool
If you are sending the request over a long latency line, the time you are measuring is bound to be off by a few milliseconds due to steps 3 and 5. For e.g. to reach from west coast to east coast in US, it might takes 50 ms or more.

If you are making use of a connection pool that has all the connections borrowed all the time, then the time taken in steps 1 and 7 are significant. For e.g. during my experiments I have seen some threads waiting as long as 100 ms when there are 10 connections and 23 threads using the connection pool.

So the most reliable way of measuring the time spent in the database will involve the following things:
  1. Make sure you understand the average round-trip time between the database and your application. You will have to use this one for your analysis. You can use ping to measure this. If ICMP is not allowed in your network, make use of netcat.
  2. Make use of StatementCallback since it will help you in measuring time taken from steps 2 to 6 above. In other words the measurement will not be skewed by the DBCP borrow and return.
  3. Make use of PreparedStatementCreator variants of query() so that you will know the time when the statement has been created.
  4. If you are making use of RowMapper, the approximate time the query completed is when you start processing the first row. If you are making use of ResultSetExtractor, the time when extractData is called is the approximate query completion time. If you are making use of RowCallbackHandler, make use of ResultSet.isFirst() to decide the approximate query completion time.
  5. There is no way of excluding the time taken in DBCP when making use of batchUpdate().
  6. There is no way of excluding the time taken in returning a connection back to the pool when making use of update() functions.
Hope these guidelines will be useful to you. I followed them when I had to measure the time difference between the database and the host where my application was running. It is very tricky in the presence of the DBCP since borrowing and returning always skews the numbers.

Wednesday, April 01, 2009

Is my DBCP configuration bad or my database slow?

Let us say you are making use of DBCP in your application. You are running a load test on your application and you find that the response time is terrible. You suspect that it is the database that is messing up all the performance. Well, as an application developer that would be my first response! But how do I prove that it is the database that is causing the issues, and not my application. The first thing you should be checking is the number of load testing client threads versus the number of connections in your DBCP. If you have more number of client threads than the number of connections in your DBCP, there is a likelihood of your application being the root cause for the bad response.

The easiest way to confirm this one is by using the maxWait parameter of DBCP. You just set the maxWait to a value that you expect the average waiting time to be. For e.g. in my case I never expect the wait time to be more than 30 milliseconds. Once you set this watch for errors in your application logs. You will find that a lot of the following exceptions:
org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object

This simple yet powerful experiment will help you identify where the long running transactions are spending most of the time, waiting in the queue for getting a connection or executing in the query in the database.

On a side note, I would like to point out that the queuing policy of DBCP is unfair. Meaning, there is no guarantee that the first request will get a connetion from the pool first. This may be because of the fact that the GenericObjectPool might rely upon the Object.notifyAll() to notify the waiting threads.

SublimeText 3/Anaconda error

When I installed Anaconda manually by downloading and untarring the file (as given in the manual installation instructions here ), I got th...