These last two weeks I have been solving some programming contest problems using Haskell, the link to the contest is here. The problems are not particularly difficult but are still quite interesting nevertheless. In this blog post, I would like to discuss the fourth problem which is the one that give me the most trouble. (https://www.hackerrank.com/contests/lambda-calculi-jun14/challenges/boleyn-salary)
To summarize the problem, you are given a tree where nodes are labeled from 1 to n and each node has an integer value between 1 and 1 billion. You are then asked to answer queries of the form (v,k) in which you have to find the descendent with the kth smallest integer value of node v. The queries are present in a way such that the next query depend on the result of the previous one, so it's not possible to do offline processing.
My solution to the problem involves finding a sequence of the tree nodes such that any node and all its descendents always form a continuous sequence. This way we can turn a query on node v's descendents into a query on an interval on the sequence. For example, if we have a tree like below
1 ---->2----->3
| |----->4
|------>7
We can transform it into the sequence A = [1,7,2,3,4] and so a query on all of 2's descendents would be a query on subsequence [3,4] of the A. Similarly a query on all of 1's descendents would be a query on subsequence [7,2,3,4] of A.
After this step, we can build a segment tree on the obtained sequence so that a query on an interval can be evaluated by combining at most log n precomputed values (where n is the number of nodes in the tree). For more details on segment tree, please see this link. At this point, however, it's tricky to determine what kind of values we should precompute for each segment in the tree. Since for each interval, we need to find the kth smallest element of all values in it, it's not as straight forward as just computing the sum/minimum/maximum for each segment in the segment tree. In the end, I decided to go with computing a sorted range of all the elements in each segment. This makes the space complexity of the data structure becomes O (nlog^2(n)) instead of O(nlogn) like in traditional segment tree used for range minimum query. In addition, I can't find a way to determine the kth smallest element among log(n) sorted ranges in better than O(log^3(n)) worst case complexity. So, in the end my algorithm takes O(nlog^2(n)) space complexity and O(q*log^3(n)) time complexity where q is the number of queries. The code can be seen here.
Finally, I'd like to comment about some of my beginner experience in using Haskell. It's pretty fun using different constructs/mental models to solve familiar tasks in imperative programming languages and one of the nicest thing is that you feel much more confident about the correctness of your code. I'm not entirely sure why this is the case but I suspect it's a combination of Haskell being less verbose than what I'm used to (Java/C++) which makes the code looks very similar to your own thinking/reasoning and of Haskell being pure which reduces the amount of stuff you need to keep in check. However, I encounter some difficulties in debugging and performance tuning. For debugging, I don't know any easy way to print results of intermediate computations since printing require you to plug in an IO monad in the data flow somewhere and as for performance, differences between strictness and laziness and their implications can be quite confusing at times. Of course, these are my problems rather than the language's problems and I'm looking forward to learn more about these to be able to use Haskell in more serious endeavour in the future.
To summarize the problem, you are given a tree where nodes are labeled from 1 to n and each node has an integer value between 1 and 1 billion. You are then asked to answer queries of the form (v,k) in which you have to find the descendent with the kth smallest integer value of node v. The queries are present in a way such that the next query depend on the result of the previous one, so it's not possible to do offline processing.
My solution to the problem involves finding a sequence of the tree nodes such that any node and all its descendents always form a continuous sequence. This way we can turn a query on node v's descendents into a query on an interval on the sequence. For example, if we have a tree like below
1 ---->2----->3
| |----->4
|------>7
We can transform it into the sequence A = [1,7,2,3,4] and so a query on all of 2's descendents would be a query on subsequence [3,4] of the A. Similarly a query on all of 1's descendents would be a query on subsequence [7,2,3,4] of A.
After this step, we can build a segment tree on the obtained sequence so that a query on an interval can be evaluated by combining at most log n precomputed values (where n is the number of nodes in the tree). For more details on segment tree, please see this link. At this point, however, it's tricky to determine what kind of values we should precompute for each segment in the tree. Since for each interval, we need to find the kth smallest element of all values in it, it's not as straight forward as just computing the sum/minimum/maximum for each segment in the segment tree. In the end, I decided to go with computing a sorted range of all the elements in each segment. This makes the space complexity of the data structure becomes O (nlog^2(n)) instead of O(nlogn) like in traditional segment tree used for range minimum query. In addition, I can't find a way to determine the kth smallest element among log(n) sorted ranges in better than O(log^3(n)) worst case complexity. So, in the end my algorithm takes O(nlog^2(n)) space complexity and O(q*log^3(n)) time complexity where q is the number of queries. The code can be seen here.
Finally, I'd like to comment about some of my beginner experience in using Haskell. It's pretty fun using different constructs/mental models to solve familiar tasks in imperative programming languages and one of the nicest thing is that you feel much more confident about the correctness of your code. I'm not entirely sure why this is the case but I suspect it's a combination of Haskell being less verbose than what I'm used to (Java/C++) which makes the code looks very similar to your own thinking/reasoning and of Haskell being pure which reduces the amount of stuff you need to keep in check. However, I encounter some difficulties in debugging and performance tuning. For debugging, I don't know any easy way to print results of intermediate computations since printing require you to plug in an IO monad in the data flow somewhere and as for performance, differences between strictness and laziness and their implications can be quite confusing at times. Of course, these are my problems rather than the language's problems and I'm looking forward to learn more about these to be able to use Haskell in more serious endeavour in the future.