Saturday, June 27, 2009

Adding META tag to your blogger page layout

When I wanted to submit my blog for Google AdSense, I had to first confirm the ownership to my blog. It can be proved in two ways: either by adding a META tag in the HEAD section or by adding a HTML file in the site. Since I have my blog in the, the easiest way for me to confirm my ownership is to add a META tag in my blog layout.

It is very easy to add this tag. Follow these steps.
  1. Login to your blogger account
  2. Click on Layout and then Edit HTML
  3. Search for the following phrase: all-head-content. This is the line that tells the blogger engine as to what needs to be included in the HEAD tag.
  4. Just before this line, not just before all-head-content, but before the line, include your meta tag. Most likely you will be including a META tag that looks like:
    <meta name='verify-v1' content="Some base64 encoded string"/>
  5. Save your changes and view your blog and confirm that your new META tag appears on the header.
Remember that though you specify name and content, when when blogger serves the page it will appear as content and name. Don't get confused. Another common issue is make sure this META tag appears as the first tag in the HEAD section as the Google doesn't recognize if it appears later in the HEAD section.

Tuesday, June 16, 2009

The mysterious ORA-03111 error

Recently one of the applications that I developed started throwing exceptions, that had the following message:
SQL state [72000]; error code [1013]; ORA-03111: break received on communication channel
When I googled around, I couldn't come across anything useful. Sadly enough most of the sites just showed the documentation for that error, without any explanation from anyone experiencing that issues. So here you go, with the best possible explanation that I could come up with.

My application sets two things on the connection that is throwing this exception:
  • It sets the fetchSize to be 2500 rows
  • It sets the query timeout to be 10 seconds
The database server and the application are separated over a long latency network (actually there is a NetEm box that emulates the long latency between these two boxes) which has a latency characteristic of 50+/-5 milliseconds. This is the whole setup.

It is important to understand how the timeout is handled by the Oracle client (in my case JDBC client). Once the query is successfully submitted, the client starts a clock for the timeout. Once the timeout is reached, the client sends an URG message to the Oracle server. The moment Oracle server receives this message, it knows that the client wants to cancel the operation that it was carrying on, no matter what stage the operation is in.

So take a couple of cases. Assume the operation is a SELECT query that will result in 10000 rows. If the Oracle server hasn't even started fetching the results, most likely the client's request would be responded immediately with an error code ORA-01013, which has a description like:
SQL state [72000]; error code [1013]; ORA-01013: user requested cancel of current operation
But imagine the server has fetched the rows and in the process of pumping the resultset back to the client. If the client requests the Oracle server to cancel the operation while still there is pending data in the socket to be delivered, it just adds the ORA-03111 packet at the end of the pending packets and lets the client knows that the operation has been cancelled while there is pending data to be delivered.

Look at the tcpdump output below:
23:13:08.613007 IP jdbc_client.48681 > orcle_server.1521: P 2543:3174(631) ack 2342 win 11908
23:13:18.635068 IP jdbc_client.48681 > orcle_server.1521: P 3174:3175(1) ack 265693 win 65535 urg 1
23:13:20.472561 IP orcle_server.1521 > jdbc_client.48681: P 398520:398615(95) ack 3186 win 65535
0x0000: 0015 c5ec 12a8 0021 1c1d c0c3 0800 4500 .......!......E.
0x0010: 0087 5023 0000 3406 bad6 c0a8 fd9a c0a8 ..P#..4.........
0x0020: fc8b 05f1 be29 a792 459f a091 06cc 5018 .....)..E.....P.
0x0030: ffff 67c2 0000 005f 0000 0600 0000 0000 ..g...._........
0x0040: 0402 04e3 0203 f500 0001 0300 0300 0000 ................
0x0050: 0000 0000 0000 0000 0000 0001 0100 0000 ................
0x0060: 0033 4f52 412d 3033 3131 313a 2062 7265 .3ORA-03111:.bre
0x0070: 616b 2072 6563 6569 7665 6420 6f6e 2063 ak.received.on.c
0x0080: 6f6d 6d75 6e69 6361 7469 6f6e 2063 6861 ommunication.cha
0x0090: 6e6e 656c 0a nnel.

Pay special attention to the times when the SELECT query was sent (21:13:08) and when the cancel request as an URG packet was sent (21:13:18), and when the Oracle sends the last TNS packet that has the error code ORA-03111 (21:13:20).

The cancel request as an URG packet was sent after 10 seconds because as I mentioned earlier my query timeout is 10 seconds.

So now the million dollar question: What should I do if I am facing this issue in my application?

Follow these simple steps:
  • First make sure that your query can be completed within the timeout that you have specified. If you consistently face this exception, try increasing your timeout.
  • That might help to get rid of the exception, but not the root cause. The root cause usually is a database that is not optimized for the query that you are executing or a bad network.
  • To find out if its the database that is the issue, try executing the same query in a host closer to the network. Or try executing the same query hitting the database from a different network. If you are convinced the database is the issue, try to tune it.
  • To find if it is the network that is having the issue, try to do a tcpdump and analyze if there are any out of order deliver of packets. Or dropped packets. If yes, then try to fix the network.
In my case, it turned out to be the bad configuration in the NetEm that was causing too many packets to be delivered out of order and too many duplicated packets. Remember I was introducing a variance of 10 ms (i.e. my packets could be delayed anywhere from 45 ms to 55 ms, as per my configuration). In real cases, at least in a well maintained production network, the variance will not be more than 1 ms.

Since I am not an expert in Oracle, I would be happy if anyone reading this blog entry has something to add on top of what I have told here. And I sincerely believe that this posting would help whoever is facing this issue.

Saturday, June 13, 2009

Availability calculation

I wanted to understand how the availability is calculated for systems. I came across an excellent article and found it very useful. You can read it from here.

Tuesday, June 09, 2009

Yet another top 10 list for Firefox

But this one is useful. The title of the blog entry is "Top 10 Firefox plugins for web developers & designers".

Effective use of Timeouts

It is very important to pay special attention to the timeouts and set them with proper values for any blocking operation. Any good library API must provide a way to set the timeout for any blocking operation.

Recently we identified one of the issues in production network because one of the programs that we deployed in production had set right timeout value. I just thought I must reiterate the fact that its critical to set the timeout values to optimum level so that we can fail fast and catch issues.