Sunday, February 8, 2015

Counting Inversions

Several weeks ago now, I implemented an algorithm in Python to count the number of inversions in an array of arbitrarily ordered numbers. My implementation was based on the pseudo-code presented by Tim Roughgarden of Stanford University.

The main function divides the array into two halfs. It then calls itself on the first half, and on the second, in each case getting back a sorted array and the number of inversions found. Then, it merges the two halves back together, getting any further inversions.

In the example pseudo-code, the merge step used two indices to move through the two halves being merged, incrementing them as their numbers were placed in the merged array.  I elected to instead simply .pop(0) the number off the front of the halves, and .append() it to the result. Then I didn't need to have a loop counter, but could instead simply test that the two halves were not empty.

This worked well enough, and I believe the code was fairly clear. It counted the inversions in a list of 100,000 numbers in 1.74 seconds.

Although that was good enough to get the results, it isn’t the best use of a Python list. Since lists are, under the hood, arrays, popping from the front results in the rest of the list having to be shifted. And there were a lot of shifts going on. To relieve some of that shifting, I altered the merge to first .reverse() the two halves, and then pop off then tail instead of the head.  This reduced the runtime on the same list to .52 seconds: an almost 70% improvement in speed.

I suspected we could do better still with a dequeue - since they’re optimized for popping and appending, and then I wouldn't need the confusion of the reverses.  I substituted with dequeues throughout, and went back to left pops.  My run-time was then .5 seconds. Not much savings! That surprised me.

No comments:

Post a Comment