Thursday, March 22, 2007

Understanding filter, map and reduce

Most of the computation problems that I have faced in the past could easily be solved using a mix and match of filter, map and reduce operations. These operations could be performed on any set of objects that can be iterated.

Filtering operation is one of the most fundamental operations that we perform more frequently than we think. In simple terms, we define a predicate and check if this predicate is true for each object in the set, iterating over them one by one. Whenever the predicate is true, we append the object to the output, otherwise we ignore. Consider grep tool as an example. The lines in the file(s) are iterable. If the current line is L, the the predicate is the question: "does L contain the pattern XYZ?". The output set has utmost as many elements as input set has.

Mapping is the operation of producing an output for each element in the input, by performing a function on that input. Unlike filtering, which used a predicate to check, the map uses a function. The map operation produces exactly the same number of elements as its input set. An example is printing the square of the all the numbers from 1 to N.

Reduction is an iterative operation performed on each element using that element and the partial result obtained in the previous step. So reduction operation takes two inputs: a function and an initial value to perform the computation with the first element. Reduce operation has exactly only one output.

Python provides language level support for these three operations. In C++ too you could perform these operations using a combination of remove_copy_if (for filtering), transform (for mapping) and partial_sum (for reduction).

Here comes the most interesting part of all. I claimed that most of the computation problems could be solved easily by using a combination of these three operations. Let us take an example and explore.

Example of billing a subscriber for his phone calls
Let us assume that we are receiving a huge volume of call records. We would like to calculate the monthly bill amount for user A. How would we do that? In three steps:
  1. Extract all the records for user A. (This is filtering with predicate: "is this call record for user A?")
  2. Find the day and night time tariff rates as per the time of the call. (This is mapping. With a function that returns the call rate depending on the time of the call.) And multiply the rate with the actual duration. (Again a mapping operation.)
  3. Sum up all the charges. (Reduction with initial value of zero or may be his previous month balance)
An important this to understand is we could also perform interleaved mapping and reducing in steps (2) and (3) above. For e.g. in the above case, finding the rate, calculating the call charge and adding that to the partial charges so far could be done in the same iteration. Yet, it remains that they are map and reduce operations. Such implementation tweaks could always be done to boost the performance.

Saturday, March 17, 2007

dhclient - Obtaining IP address dynamically

Whenever you want to obtain a dynamic IP address for your Linux/Unix machine from a DHCP server, you should use dhclient utility.

Both DHCP request and responses are UDP requests. If you use a sniffer to identify the pattern of the request responses, the following is what you might see. The following was taken from dhclient running in FreeBSD 6.1.

Len SrcIP SrcMACAddr DestIP DestMACAddr Protocol
342 0.0.0.0 00:0c:29:c1:13:81 255.255.255.255 ff:ff:ff:ff:ff:ff UDP

62 192.168.49.254 00:50:56:ef:75:a8 192.168.49.128 ff:ff:ff:ff:ff:ff ICMP ping request
342 192.168.49.254 00:50:56:ef:75:a8 192.168.49.128 00:0c:29:c1:13:81 UDP
60 192.168.49.128 00:0c:29:c1:13:81 ff:ff:ff:ff:ff:ff ARP request


If you observe the source and destination (IP, MACAddr) patterns, it is easy to appreciate what happens. Here is what happens:

  1. A DHCP request is sent on the network. Both destination MACAddr and IPAddr are broadcast addresses.
  2. The DHCP server chooses and IP address to offer and confirms if the IP address it is not used anywhere else in the network.
  3. DHCP server offers the IP address to the requester.
  4. The requester sends an ARP request to register its MAC address and IP address in interested hosts in the network.
DHCP is an application layer protocol. Thats an important thing to keep in mind.