Tuesday, December 3, 2013

LeetCode OJ: Max Points on a Line

This LeetCode problem has the second lowest AC rate. It deserves some attention as it asks for the use of hash table and choice of hash function that requires some thought.

Given n points on a 2D plane, find the maximum number of points that lie on the same straight line.
note: the points have integer coordinates.

A naive O(n^3) algorithm will consider all pairs of points (p_i, p_j), then for all k != i or j, check whether p_k is on the line defined by p_i and p_j. Identical points needs to be taken care of. An implementation of O(n^3) algorithm is as follows:

A better O(n^2) solution is possible by using hash tables. Notice that this problem is known to be 3SUM-hard so a non-parallel, non-randomized algorithm is unlikely to be better than O(n^2).

To implement a O(n^2) algorithm, one need to create a hash table whose key is encoding of line. One way to encode a line in 2D plan is using slope-intercept form: y = mx + b. However, a and b could be float numbers even if x and y are integers and it is well-known that equality test for float number is error-prone. Instead, I choose to use the standard form: ax + by + c = 0. If we have two distinct points (x1, y1) and (x2, y2), then we can set a = y2 - y1, b = x1 - x2, c = y2 * x1 - y1 * x2, and all of a, b and c will be integers. However, this (a, b, c) is not unique as any integral multiple of it represents the same line. To guarantee uniqueness, we require b to be nonnegative. If b is zero, i.e. the line is vertical, then we require a to be positive. Notice that a and b can not both be zero.

We need to find a good hash function in order to use this (a, b, c) representation as key of hash table. The hash function for hashing integer vectors mentioned in my last post is a good candidate. In particular for the problem at hand, I used:

h(a, b, c) = a + b * 31 + c * 31 ^ 2.

The algorithm itself is straightforward. For each point p_i, we consider all points after it {p_i+1, ..., p_n}. We create a hash table whose key is (a, b, c) tuple and whose value is the number of points in {p_i+1, ..., p_n} that connects to p_i by the line represented by the corresponding key. The the largest value in the constructed hash table is the maximum number of points that lie on the same straight line if the smallest index of those points is i. An implementation is as follows:

Interestingly, the running time of O(n^2) and O(n^3) solutions are not very different on LeetCode OJ, which is probably due to large overhead and small testing cases.

No comments:

Post a Comment