Monday, September 08, 2014

Is accepting Optional as argument a code smell?

I was reviewing a piece of code and came across a function that took an Optional as an argument. In the same class, another function took a couple of Optionals as arguments.

When I thought about it, I felt that taking an Optional should be avoided. You can return an Optional from a function, but avoid taking Optional arguments: before calling a function, you should check for the presence or absence of the values you are passing. Hence a function taking one or more Optionals is a code smell.

One argument that could possibly be presented in favor of taking Optional arguments is that it spares every caller from having to check for the presence of the arguments. Consider the code example below:
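A minimal Java sketch of the pattern (the product name and the Integer arguments are just illustrative):

import java.util.Optional;

public class OptionalArguments {

    // Code smell: accepting Optional arguments forces this function
    // to also return an Optional (or throw an exception).
    static Optional<Integer> product(Optional<Integer> first, Optional<Integer> second) {
        if (first.isPresent() && second.isPresent()) {
            return Optional.of(first.get() * second.get());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Optional<Integer> first = Optional.of(6);
        Optional<Integer> second = Optional.empty();

        Optional<Integer> result = product(first, second);
        // The caller finds out about the problem only after the call,
        // and cannot tell which argument was actually absent.
        if (!result.isPresent()) {
            System.err.println("Could not compute product: a value was missing.");
        }
    }
}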



If you pay attention, the error handling from the caller's point of view is ugly if the caller wants to report the correct error. Much worse, the error check is done only after the values have been used.

A side effect of product() taking Optionals is that it must either return an Optional or throw an exception. Both can be avoided if it doesn't take Optional arguments.

A simplified version would be:
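Here the function insists on real values, and the caller checks presence up front, so it can report exactly which value is missing (again a sketch with illustrative names):

import java.util.Optional;

public class PlainArguments {

    // No Optional in, no Optional out.
    static int product(int first, int second) {
        return first * second;
    }

    public static void main(String[] args) {
        Optional<Integer> first = Optional.of(6);
        Optional<Integer> second = Optional.empty();

        // Presence is checked before any computation happens, and the
        // caller can report precisely which value is absent.
        if (!first.isPresent()) {
            System.err.println("First value is missing.");
        } else if (!second.isPresent()) {
            System.err.println("Second value is missing.");
        } else {
            System.out.println(product(first.get(), second.get()));
        }
    }
}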



Tuesday, July 29, 2014

A minor annoyance with two argument logging in SLF4J and Scala

I am using Scala for one of my recent projects. For logging, I am using SLF4J with Logback. There is one minor annoyance when you try to log two-argument messages like:
logger.info("Some log with arg1 [{}] and arg2 [{}].", arg1, arg2)
When you compile with sbt, you will get the following error:
[error] both method info in trait Logger of type (x$1: String, x$2: <repeated...>[Object])Unit
[error] and  method info in trait Logger of type (x$1: String, x$2: Any, x$3: Any)Unit
[error] match argument types (String,String,String)
 
If you are getting this error, a small trick I used to avoid casting to AnyRef or Object was to simply add a null argument at the end, which forces the Scala compiler to pick the vararg version. Logback just ignores extraneous arguments. Like this:
logger.info("Some log with arg1 [{}] and arg2 [{}].", arg1, arg2, null)

Disclaimer: I am a Scala rookie, hence take my advice with a pinch of salt!
 

Monday, July 21, 2014

The most important keyboard shortcut you should know in IntelliJ IDEA

TL;DR: If you are new to IntelliJ (like me), use "Cmd+Shift+A" to find your way around.

Recently I started developing an application in Scala. I am an avid Eclipse user, but unfortunately I feel that its Scala support is a bit clunky as of today. It was driving me nuts to see some Java classes highlighted in red as unknown classes, even though restarting Eclipse or refreshing the project would make those errors go away.

Based on my research, I felt that IntelliJ IDEA has better support for Scala, so I wanted to give it a spin. So far it feels good enough that I have decided to stick with it.

I prefer to get things done using keyboard shortcuts, touching the mouse only when absolutely essential. The most important (and in my opinion the most awesome) keyboard shortcut you should know in IntelliJ is "Cmd+Shift+A". It brings up an input box where you can type a partial string, and you are presented with a list of actions and options relevant to the search term. For example, you can type "override" and be presented with actions related to overriding.

For Emacs users, this is very similar to the "apropos" command. For Eclipse users, it is similar to "Cmd+Shift+L", only better.

Saturday, May 24, 2014

An absolutely essential thing you should know about the Hibernate cache

You might have used Hibernate for your ORM needs. The most important thing you should know about the Hibernate cache is that it is implemented using a Map, and a reference to every entity you have read from the database is kept in that Map. The key of the Map is the entity ID (which could be just a Long, or some form of composite primary key you have defined, wrapped in an EntityKey). The value stored is the entity itself (possibly proxied).

If you read 1000 entities (or rows) from the database, a reference to each one of them will be kept in the Map. If those entities have relationships (one-to-many or many-to-many) with other entities, and those related entities are loaded eagerly, you are asking Hibernate to load a lot more than 1000 entities.

If you venture into processing a large set of rows (say 100,000), a reference to every one of those rows is going to be kept in the Map. Sometimes the amount of memory needed can be so large that your process runs out of memory (as mine did).

So what is the life cycle of this Map? The Map is part of the "persistence context" referenced by the Hibernate session. When a session is created, the Map contains no elements. As you keep reading or inserting entities, it accumulates references to them. When you close the session, the Map is cleared.
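A small illustration of how this session-scoped Map behaves (a sketch in Java; the Order entity and its Long id are hypothetical):

Session session = sessionFactory.openSession();
Order a = (Order) session.get(Order.class, 1L);
Order b = (Order) session.get(Order.class, 1L);
// The second get() is served straight from the persistence context
// Map, not from the database, so both variables point to the same
// instance.
assert a == b;
session.close(); // the Map is cleared and the references are released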

If you would like to deal with a large volume of entities, given below is a strategy that you can follow:
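  • Read and process the rows in fixed-size batches instead of holding everything at once.
  • After each batch, clear the persistence context (or close the session and open a fresh one) so the accumulated references can be garbage collected.

A minimal sketch of this approach, assuming a hypothetical Order entity and an application-specific process() method:

import java.util.List;

import org.hibernate.Session;
import org.hibernate.SessionFactory;

public class BatchProcessor {

    private static final int BATCH_SIZE = 1000;

    @SuppressWarnings("unchecked")
    public void processAll(SessionFactory sessionFactory) {
        int offset = 0;
        while (true) {
            // A fresh session per batch means a fresh, empty Map.
            Session session = sessionFactory.openSession();
            try {
                List<Order> batch = session
                        .createQuery("from Order o order by o.id")
                        .setFirstResult(offset)
                        .setMaxResults(BATCH_SIZE)
                        .list();
                if (batch.isEmpty()) {
                    break;
                }
                for (Order order : batch) {
                    process(order);
                }
                offset += batch.size();
            } finally {
                // Closing the session clears the persistence context,
                // releasing the references accumulated for this batch.
                session.close();
            }
        }
    }

    private void process(Order order) {
        // application-specific work
    }
}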

The disadvantage of this method is that if one or more entities are referenced across batches, they will be read again once per batch. But the main advantage is that you keep your memory utilization low.

Tuesday, April 08, 2014

Heartbleed bug in OpenSSL

There was a serious vulnerability reported in the OpenSSL library that lets an attacker dump memory contents from the server. The attacker can then perform offline analysis of those contents and identify sensitive information such as the server's private key, key material for SSL sessions, decrypted data still in memory, and so on. This affects any server that uses OpenSSL to implement HTTPS.

I thought I would share, in one place, some material that should help people understand the problem better.

  • Description of the problem can be found here.
  • A simple Python script to test your servers can be found here or you can use this site.
  • NVD entry for this issue can be found here.
  • How some of the companies are responding: Heroku, AWS, Lastpass.
Hope that helps.

Saturday, April 05, 2014

Two important traits of building reliable distributed systems

Designing a distributed system is hard enough; designing one that is reliable is even harder. There are many best practices you can follow to build a reliable distributed system. Based on an issue that I recently troubleshot, there are a couple of them that I think are critical:

  • Enabling TCP keep-alive between the processes if you are using TCP
  • Performing all IO operations with a timeout
My advice is based on my experience with Linux, but I think it should apply to other operating systems as well. When a host goes down with a kernel panic, none of its established connections are closed by sending a FIN or RST packet. This causes trouble: the peer process doesn't know that the other end of the connection is gone. When you enable TCP keep-alive, the kernel periodically sends a zero-length probe packet as per the configuration. If the peer has died due to a kernel panic, the probe will not be ACKed, and the peer's death is detected. If the peer has died and rebooted, it responds to the probe with an RST packet. Thus the best way to detect peer death is to enable keep-alive.

For Linux, to enable keep-alive, please refer to this page. The most important thing is that after you open the socket, you should use setsockopt() to set the SO_KEEPALIVE option.
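If you happen to be on the JVM rather than in C, the same option is exposed directly on the socket. A minimal sketch (the host and port are just placeholders):

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class KeepAliveClient {

    public static void main(String[] args) throws IOException {
        Socket socket = new Socket();
        // The Java equivalent of setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, ...).
        // The kernel will periodically probe an idle connection and report
        // an error on the socket if the peer is gone.
        socket.setKeepAlive(true);
        // Connect with a timeout so the attempt itself cannot block forever.
        socket.connect(new InetSocketAddress("example.com", 443), 5000);
        socket.close();
    }
}

Note that the probe interval and count are still taken from the kernel configuration (tcp_keepalive_time and friends on Linux).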

Using timeouts is one way to detect that service quality is degrading. For example, when you try to fetch data from an external system, that system might be choking under too many requests and may not be as responsive as it should be. But if you build retry mechanisms on top of timeouts, you have to be extra careful: sending more requests to an already struggling system can make the situation worse.
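As an illustration, here is a bounded read on a plain socket in Java (a sketch; the retry policy is deliberately left as a comment):

import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class BoundedRead {

    // Reads at most buffer.length bytes, waiting no longer than 10 seconds.
    static int readWithTimeout(Socket socket, byte[] buffer) throws IOException {
        socket.setSoTimeout(10000); // any blocking read now fails after 10s
        InputStream in = socket.getInputStream();
        try {
            return in.read(buffer);
        } catch (SocketTimeoutException e) {
            // The peer is slow or gone. Decide here whether to retry,
            // back off, or fail fast; blind retries can make an
            // overloaded peer worse.
            return -1;
        }
    }
}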