A Paper a Week-ish #9: Two Simplified Algorithms for Maintaining Order in a List
Looking around for papers to cover in this series, I remembered a talk about folding algorithms that Erik Demaine gave at my former university as the annual Vienna Gödel Lecture, where once a year a major contributor to the field of computer science (previous guest lecturers include Don Knuth and Peter Norvig) is invited to give a talk about their work. I skimmed through his impressively long list of papers and this one caught my eye. If you have a bit of time on your hands and are interested in algorithms (he seems to specialise a lot in folding and puzzles), I recommend you do the same.
Anyway, I picked this paper more or less at random from the seemingly endless list, and, as the title says, it proposes two algorithms for maintaining order in a list. As usual, the first part of the paper is dedicated to a general introduction to the problem, its history and the solutions that have previously been proposed. Then they present the first of their two algorithms, which, even though it offers no theoretical improvement (i.e. lower complexity bounds – impossible anyway, considering that the best known solution has \( O(1) \) worst-case time on all operations), has two characteristics that make it useful: its analysis is easier to follow, and their practical experiments suggest that its real-world performance is superior. In fact, the theoretical performance is arguably worse: the proposed data structure has \( O(1) \) amortised time complexity, while the best known one has \( O(1) \) worst-case. Still, as is sometimes the case, the theoretically worse data structure can have better real-world performance; since computers always work with finite data, asymptotic performance is sometimes an oversimplification.
The first algorithm is a solution to the Order-Maintenance Problem, which has “the objective … to maintain a total order subject to insertions, deletions and precedence queries”. It does that by assigning tags to elements such that the order of the tags corresponds to the order of the elements. The analysis then proceeds on the virtual trie obtained from the binary expansion of the tags. Roughly, the algorithm makes sure that the density (that is, the number of “occupied” tags per available tag) does not exceed a threshold at any node of the implicit trie. The analysis is a bit hand-wavy at times, for instance it states that “Thus, we pick a u [comment: maximum tag] that works for any value from n/2 to 2n. If at any point we have too few or too many list elements, we rebuild the data structure for the new value of n. This rebuilding introduces only a constant amortized overhead.” I can see this being true in analogy to the amortized analysis of dynamic arrays (always doubling the size of the array when it is full results in amortized \( O(1) \) per insertion), but it is not immediately obvious, because there could be subtle differences.
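To make the tag idea concrete, here is a minimal sketch of a tag-based ordered list supporting insertions and precedence queries. This is my own simplification, not the paper's algorithm: the class name, the tag-universe size and especially the whole-list relabelling on overflow are made up for illustration.

```python
# Minimal sketch of tag-based order maintenance (my own simplification, not
# the paper's relabelling scheme): every element gets an integer tag, a
# precedence query is just a tag comparison, and when no free tag exists
# between two neighbours we relabel the whole list evenly.

class OrderedList:
    def __init__(self, universe_bits=32):
        self.u = 1 << universe_bits   # size of the tag universe
        self.items = []               # elements in list order
        self.tag = {}                 # element -> integer tag

    def insert_after(self, e, x):
        """Insert x immediately after e (or at the front if e is None)."""
        i = 0 if e is None else self.items.index(e) + 1
        self.items.insert(i, x)
        lo = 0 if i == 0 else self.tag[self.items[i - 1]]
        hi = self.u if i == len(self.items) - 1 else self.tag[self.items[i + 1]]
        if hi - lo > 1:
            self.tag[x] = (lo + hi) // 2   # a free tag exists: take the midpoint
        else:
            self._relabel()                # no free tag between the neighbours

    def _relabel(self):
        # Spread all tags evenly over the universe. The paper instead relabels
        # only a sub-range of the implicit trie whose density is low enough.
        gap = self.u // (len(self.items) + 1)
        for k, y in enumerate(self.items, start=1):
            self.tag[y] = k * gap

    def order(self, x, y):
        """Precedence query: does x come before y?"""
        return self.tag[x] < self.tag[y]
```

Deletions would simply drop the element and its tag; the interesting part, and what the paper's analysis is actually about, is replacing the whole-list relabel above with one confined to a sub-range of the implicit trie whose density is below the threshold.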
An important detail of the complexity analysis is that it does not make any assumptions about which tag is chosen for a newly inserted element. If there are multiple free tags between the elements \( e \) and \( f \) adjacent to the newly inserted one, one would assume that it is best to choose the new tag as far from both as possible. That this is not reflected in the theoretical analysis could be a sign that the bound is simply not tight enough (i.e. that one could achieve a better bound by taking it into account), but their experimental evaluation suggests otherwise. It seems to make no real difference where new elements are inserted; in their own words, “Consecutive inserts trigger relabels after every insert, but the number of elements relabeled is small. … average inserts have fewer triggers, and the number of elements relabeled is larger.”
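In terms of the sketch above, that means the midpoint choice could be swapped for any other free tag without affecting the analysis. The two strategies below illustrate the extremes (the function names are mine):

```python
# Two possible tag-choice strategies for the sketch above (function names are
# my own); per the paper's analysis, the bound holds regardless of the choice.

def midpoint_tag(lo: int, hi: int) -> int:
    """Pick the tag furthest from both neighbouring tags."""
    return (lo + hi) // 2

def leftmost_tag(lo: int, hi: int) -> int:
    """Pick the smallest free tag, i.e. hug the left neighbour."""
    return lo + 1
```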
The second algorithm is a solution to the File-Maintenance Problem, which is similar to the Order-Maintenance Problem but additionally requires that the elements can be traversed in order. The solution keeps the elements in order in an array. Again, the algorithm works on an implicit tree, where \( a \log n \) cells of the array are bundled into one leaf node, for some constant \( a \). The algorithm then makes sure that two invariants are preserved, so that there is always space to insert a new item. Firstly, adjacent intervals (on every level of the implicit tree) may differ only by a certain amount in how many elements they contain. Secondly, every interval (on every level of the implicit tree) must have sufficient space to accommodate new items. The algorithms that maintain these invariants are too sophisticated to be covered in this summary, so consult the full paper for those details.
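To give a rough feel for what the invariants look like, here is a toy check over such an array. This is entirely my own illustration, not the paper's algorithm: the thresholds, the constant \( a \) and the way levels are formed are made up and much cruder than what the paper does.

```python
import math

# Toy invariant check for a file-maintenance-style array (my own illustration,
# with made-up thresholds): elements sit in an array with empty cells (None),
# leaves bundle roughly a * log2(n) cells, and each higher level of the
# implicit tree merges adjacent intervals pairwise.

def check_invariants(array, a=2, max_density=0.9, max_diff=0.25):
    """array holds elements, with None marking empty cells."""
    n = sum(x is not None for x in array)
    if n == 0:
        return True
    leaf = max(1, int(a * math.log2(max(n, 2))))   # cells per leaf interval
    intervals = [array[i:i + leaf] for i in range(0, len(array), leaf)]
    while intervals:
        densities = [sum(x is not None for x in iv) / len(iv) for iv in intervals]
        # invariant 2: every interval keeps some free cells to absorb inserts
        if any(d > max_density for d in densities):
            return False
        # invariant 1: adjacent intervals on this level are similarly full
        if any(abs(d1 - d2) > max_diff
               for d1, d2 in zip(densities, densities[1:])):
            return False
        if len(intervals) == 1:
            break
        # move one level up the implicit tree by merging adjacent intervals
        intervals = [sum(intervals[i:i + 2], [])
                     for i in range(0, len(intervals), 2)]
    return True
```

The actual algorithm does not merely check these invariants, of course; it redistributes elements whenever one is about to be violated, and that redistribution is the part the paper spends its machinery on.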
Find the paper here.