Monday, December 2, 2013

Choice of Hash functions for integer k-tuple in Java

In theory classes, we learn about Universal Hashing, Perfect Hashing, Cuckoo Hashing, Tabulation Hashing, so on and so forth. Most of those techniques enjoy great theoretical value, yet face difficulties in practice. For example, Universal Hashing asks as to encode some randomness into the hashcode() method, which is hard to achieve and we would rather just hard code some coefficient a and b. On the other hand, we rarely in practice need to consider adversarial guarantees. For another example, we simply don't want to afford the creation time and large constant factor of a Perfect Hashing.

We often find ourself just want a hashcode() function for a class whose object is uniquely identified by a k-tuple of integers (a1, a2, ..., ak). Since the HashMap in Java already has a CRC style Hash function to map integer to bins, the hashcode() function only need to convert the k-tuple into a single integer. The following one is what Java uses to hash a String, and is what I suggest to use for hashing k-tuple of integers:

h((a1, a2, ..., ak))
= a1 + a2 * p + a3 * p^2 + ... + ak * p^(k-1)

There is an implicit module operation due to potential integer overflow. Also, it seems people often choose a Mersenne prime for p, such as 31.

No comments:

Post a Comment