Showing posts from April, 2014

Heartbleed bug in OpenSSL

A serious vulnerability was reported in the OpenSSL library that lets an attacker dump memory contents from the server. The attacker can then analyze the dumped memory offline and identify sensitive information such as the server's private key, key material for SSL sessions, and decrypted data held in memory. This affects any server that uses a vulnerable OpenSSL release (1.0.1 through 1.0.1f) to implement HTTPS.
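As a rough first check, the version string of an OpenSSL build can be compared against the releases known to be affected (1.0.1 through 1.0.1f, plus 1.0.2-beta and 1.0.2-beta1). The sketch below is only a hint, not a test of a server: the helper name `looks_affected` is made up for illustration, `ssl.OPENSSL_VERSION` reports the library linked into the local Python interpreter, and distributions often backport the fix without changing the version string.

```python
import re
import ssl

# Releases affected by Heartbleed (CVE-2014-0160): 1.0.1 through 1.0.1f,
# and the 1.0.2-beta / 1.0.2-beta1 pre-releases. 1.0.1g contains the fix.
_AFFECTED = re.compile(r"^OpenSSL (?:1\.0\.1[a-f]?|1\.0\.2-beta1?)(?![0-9a-z])")


def looks_affected(version_string):
    """Return True if the version string names an affected upstream release.

    Hypothetical helper: a match is only a hint, because a patched
    distribution build may still report an old version string.
    """
    return _AFFECTED.match(version_string) is not None


# Check the OpenSSL build that this Python interpreter is linked against.
print(ssl.OPENSSL_VERSION, "->", looks_affected(ssl.OPENSSL_VERSION))
```

A match here means the build deserves a closer look, for example with an actual probe against the running server.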

I thought I would share some material in one place to help people understand the problem better.

- A description of the problem can be found here.
- A simple Python script to test your servers can be found here, or you can use this site.
- The NVD entry for this issue can be found here.
- How some of the companies are responding: Heroku, AWS, LastPass.

Hope that helps.

Two important traits of building reliable distributed systems

Designing a distributed system is hard enough; designing one that is reliable is even harder. There are many best practices you can follow to make a distributed system reliable. Based on an issue that I recently troubleshot, there are two that I think are critical:

- Enabling TCP keep-alive between the processes, if you are using TCP
- Performing all I/O operations with a timeout

My advice is based on my experience with Linux, but I think it should apply to other operating systems as well. When a host goes down with a kernel panic, none of its established connections are closed with a FIN or RST packet. As a result, the peer process does not know that the other end of the connection is gone. When you enable TCP keep-alive, the kernel sends zero-length probe packets according to the configuration. If the peer has died due to a kernel panic, these probes will not be ACKed, and so the peer's death is detected. If the peer h…
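As a rough illustration of the two practices listed above, here is a minimal Python sketch that enables keep-alive on a TCP socket and puts a timeout on its I/O calls. The function name `configure_socket` and all the numeric values are assumptions chosen for the example, not recommendations. The `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` options are the Linux names, so the code only sets them where the platform provides them.

```python
import socket


def configure_socket(sock, idle_s=60, interval_s=10, probes=5, io_timeout_s=30.0):
    """Enable TCP keep-alive and an I/O timeout on a TCP socket.

    Hypothetical helper; the default values are illustrative only.
    """
    # Ask the kernel to probe the peer when the connection is idle.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Per-socket tuning, using the Linux option names. Guarded with getattr
    # because not every platform exposes all three.
    tuning = (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between unanswered probes
        ("TCP_KEEPCNT", probes),        # unanswered probes before giving up
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)

    # Blocking calls such as connect(), recv() and send() now raise
    # socket.timeout instead of waiting forever.
    sock.settimeout(io_timeout_s)
    return sock


sock = configure_socket(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
print(sock.gettimeout())
sock.close()
```

With these example values, a silent peer would be declared dead after roughly 60 + 5 × 10 seconds of idleness, while the timeout bounds how long any single blocking call can wait.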