Saturday, December 7, 2013

LeetCode OJ: Wildcard Matching

Implement wildcard pattern matching with support for '?' and '*'.
'?' Matches any single character.
'*' Matches any sequence of characters (including the empty sequence).

The matching should cover the entire input string (not partial).

The function prototype should be:
bool isMatch(const char *s, const char *p)

Some examples:
isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false
isMatch("aa", "*") → true
isMatch("aa", "a*") → true
isMatch("ab", "?*") → true
isMatch("aab", "c*a*b") → false

A typical DP. However, a naive O(N^3) time, O(N^2) space solution can not path large test cases. So 3 optimizations are used.

(1) Suppose s and p are of length n and m respectively, then we want to compute a m by n boolean matrix F, where F(i, j) is true if the first i letters of the pattern matches the first j letters of the string. However, we do not need to store then entire matrix, as we can compute the matrix row by row, where the computation of any row only relies on the previous row. Only 2 rows are needed at any time and the space complexity is reduced to O(2*N)

(2) It's easy to see that the '*' is the cause of the O(N^3) complexity, so a simple optimization is to collapse all adjacent '*' into one, which will not change the language represented by the pattern.

(3) When N is very large, like tens of thousand, then even O(N^2) is too slow. We say a non '*' character in the pattern as a hard character, as it must match at least one character from the input string. If we have k hard characters among first i characters of the pattern, then we know F(i, j) must be false if j < k. In this way, we can find a lower and upper bound of j between which F(i, j) can be true for any fixed i. This optimization is especially useful when the pattern is very long, but most of characters in the pattern are hard.

An solution using the above three optimizations is as follows.

No comments:

Post a Comment