By Warren D. Smith, warren.wds@gmail.com. First draft Feb. 2014. Second draft March 2014.
Abstract. Basically, we redo the work of Brams, Kilgour, Klamler [Notices Amer. Math'l. Soc. 61,2 (2014) 130-141] on "envy-free allocations" of 2N items among two people "Alice" and "Bob" – obtaining better results, clearer proofs, and/or clearer and faster algorithms. We re-prove BKK's fundamental existence theorem using a characterization as a "totally unimodular linear program." We show the number of such allocations is lower bounded by 0 and upper bounded by the Nth Catalan number (2N)!/[N!(N+1)!]. We find a formula for the "failure" probability no envy-free allocation exists (if Bob's preference ordering is uniform random) which for 2N large is asymptotic to 4/(4N+1), which tends to 0. We find an O(N)-time O(1)-space algorithm for deciding whether such an allocation exists; and if one does, we show how to produce an example in O(NlogN) expected time and O(N) words of memory via a randomized algorithm. All these results are best possible except that perhaps some algorithm faster than NlogN might exist and/or perhaps it could be derandomized. But we prove a lower bound of order NlogN for any algorithm, including randomized ones, of the "algebraic decision tree" class (and ours was in this class). BKK's probability model that Bob has uniform random preference order was unrealistic. We discuss more-realistic probability models, and it appears that in them the failure-probability does not tend to 0; indeed in one of our models it goes to 1!
This work was stimulated by my reading, in the Feb. 2014 AMS Notices, a paper by Brams, Kilgour, and Klamler about "Envy Free" allocations. Suppose there are N indivisible items we wish to divide among two people, "Alice" and "Bob." (Warning: some sections of this paper shall use N, others 2N, to describe the number of items. The present section uses N.) Alice has a preference order among these items. Bob has another. An "allocation"of the N items is an N-bit binary string, whose Kth bit is 1 if item K is awarded to Bob, and 0 if it is awarded to Alice. The allocation is "Envy-Free" if there exist two matchings, each pairing Alice's items with Bob's items (each matching contains N/2 disjoint pairs), with the property that for each pair (a,b) in matching #1, Alice prefers her item a over Bob's paired item b; while for each pair (B,A) in matching #2, Bob prefers his item B over Alice's paired item A.
I think the greatest accomplishment of the Brams-Kilgour-Klamler paper (henceforth "BKK") was to prove
THEOREM (existence): Let there be N≥0 items, and (without loss of generality) let Alice's preference order be 1,2,3,4,...,N, i.e. she prefers lower-numbered over greater-numbered items. An envy-free allocation exists if, and only if, N is even and Bob's preference permutation does not fix any odd prefix. That is: if and only if N is even and no odd K exists with 1≤K<N such that Bob's top K most-prefered items, are the same (in the sense of set-equality) as Alice's top K most-prefered items.
However, I was worried about BKK's proof. It is easy to prove necessity: exactly one among {Alice,Bob} will be allocated the majority among their common top K items, if K is odd, which forces the other to be envious re at least one of those items. But to prove sufficiency, BKK relied on an algorithm they invented, called "AL," which allegedly finds an allocation whenever one exists. I did not understand their algorithm. The problem was it was written in "English," rather than a formal "pseudocode," (a policy I strongly discourage for algorithms of their considerable level of complexity) and it was complicated and ambiguously parseable. Specifically, I did not understand their steps 3 and 4, nor the meaning of their word "etc."
Upon enquiring about this by email to B,K, and K, they did not answer those questions, and it seemed to me they themselves did not understand AL very well. (Albeit, Brams said he did not understand why I did not understand...) For example, their paper claimed AL is a "polynomial time" algorithm. Great – but what is the runtime-bounding polynomial? Their paper did not say, nor were they willing/able to tell me in email. It is quite remarkable for a paper to be written about an allegedly practical and polynomial-time algorithm, where the authors do not determine the degree of the polynomial. Indeed, I am not sure if that has ever occurred before.
Also, Brams told me that "a student" was currently working on programming AL, indicating their algorithm had, at the time of publication, not yet been programmed and tested. (Mind you, I am hoping the situation will be clarified if and when that student succeeds. Or fails.)
So that all led in my mind to the following natural questions:
I was able to answer all of these questions, except perhaps the last. The answers:
Alice: 1 2 3 4 5 6 7 8 ... N-1 N Bob: 2 1 4 3 6 5 8 7 ... N N-1
with N even,
then exactly one envy-free allocation exists, namely Alice gets the odd-numbered items.
The Catalan numbers are easily computed, starting with C(0)=1,
via the recurrence
m ← 0; for(k=1,2,3,...,N){ if(Bob[k]>m){ m ← Bob[k]; } //m is running maximum if(k mod 2=1){ if(m≤k){ report("No envy-free allocation exists"); quit; } } } report("At least one exists");This time consumption obviously is best possible for any correct algorithm since it is easily seen that essentially the entire Bob[] array must be inspected to assure correctness. But also note that if we input the Bob[] array entries "on the fly" without storing the whole array – which this algorithm permits – then the memory consumption drops to O(1) words which then also obviously is best possible.
A "linear program" (LP) is a set of inequalities
where A is a matrix with known integer entries, b is a vector with known integer entries, and x is a vector of unknowns; plus a directive to find, among all vectors satisfying the inequalities, one maximizing some linear function c·x (where c is some known real vector). The matrix A (and the linear program) is "totally unimodular" if every square submatrix of A – got by taking an arbitrary subset of its rows, and same-cardinality subset of its columns – has determinant lying in the set {-1, 0, +1}.
LEMMA: If any real solution-vector exists, then a solution x of a totally unimodular linear program always exists with all its entries integer.
(Proof hint: consider "Cramer's rule." Full discussion: Schrivjer 1998.)
For the envy-free allocation problem, in which Alice's preference ordering is 1 2 3 ... N, while Bob's is described by the array Bob[1...N], consider the following set of inequalities.
(Incidentally, the second set of inequalities would be replaced by ∑_{0≤i≤k} x_{Alice[i]} ≤ ⌊ k/2 ⌋ more generally, but our numbering convention that Alice's order be 1 2 3... N, i.e. that Alice[j]=j, incurs no loss of generality and makes life easier.) I claim that this linear program is totally unimodular. It immediately follows that if this program has any solution vector x, then (for any c) it always has an all-integer solution. It then follows from the first set of inequalities that every entry of that solution vector x lies in the two-element set {0, 1}, hence is a binary allocation. It is trivial to see that any such solution vector, regarded as N bits, describes an envy-free allocation (Alice gets the items with 0-bits, Bob the items with 1-bits). It also is trivial to see that no envy-free allocation can exist if no solution of the linear program exists. Hence
THEOREM (totally unimodular LP formulation): An envy-free allocation exists if, and only if, this LP has a solution.
PROOF: The only nontrivial thing we still must prove, is that the LP is indeed totally unimodular. Negating any row or column of A leaves A's total unimodularity unaffected. Permuting the rows (or columns) of A leaves its total unimodularity unaffected. Adjoining an identity matrix – or more generally, any permutation matrix with any subset of its entries negated – above (or to the left of) A leaves A's total unimodularity unaffected. Our LP's first set of inequalities precisely arises from identity matrix (or negated identity matrix) adjoinings and hence can be ignored from now on. Its second set of inequalities is described by the lower-triangular matrix with all 1 entries in the lower triangle (including diagonal) and 0's elsewhere. Any submatrix of that kind of matrix is reducible via determinant-preserving row-operations (i.e. subtracting rows) to one with only 0 and 1 entries and all the 1-sets in the rows are disjoint sets. Its third set of inequalities is equivalent to the second set – after columns are permuted into Bob-order – except with all entries negated. The second and third sets of inequalities thus, after restriction to a submatrix followed by row reduction, yield a submatrix with only 0 and ±1 entries, and the (+1)-sets in rows are disjoint, and the (-1)-sets in rows are disjoint, and each row is entirely {0, +1} or entirely {0, -1}. Each column of this reduced submatrix thus contains at most a single +1 and at most a single -1, with all other entries 0. But any such matrix automatically itself is totally-unimodular by a simple induction argument. Consider a square submatrix. If it has a column with at most one nonzero entry, then "expand that column by minors" to reduce to a matrix with one fewer column (induction step). Otherwise, each column contains exactly one +1 and one -1 (rest 0) in which case the determinant equals 0 because every column-sum is 0, forcing a linear dependency among the columns, i.e. the matrix must have less than full rank. (This induction argument was attributed to Veblen & Franklin 1921 on page 274 of Schrivjer, who further claims this result was already known to H.Poincare in 1900.) Q.E.D.
In view of either the "ellipsoid method" or "Karmarkar's algorithm" for linear programming (see, e.g, Schrivjer 1998) – also Tardos 1986's methods for "combinatorial" linear programs apply in our case – we immediately find
COROLLARY: Polynomial-time algorithms exist to determine whether an envy-free allocation exists, and if so to find one (indeed to find one maximizing any desired linear function c·x).
For example, one might want to choose c[j]=Alice[j]-Bob[j] which would in some Borda-like sense find the "optimal" envy-free allocation. Or we could take c[j]=B[j]-A[j] where A[j] is Alice's real-valued "utility" for item j, and B[j] is Bob's (if these utility values were somehow known) to get the envy-free allocation maximizing "total utility." Or just c[j]=B[j] to get the envy-free allocation optimal in Bob's view alone.
We also have
COROLLARY: Brams, Kilgour, and Klamler's existence theorem, is correct.
PROOF: Obviously, if Bob and Alice had the same first-k preference sets for some odd k, 1≤k<N, then the corresponding inequalities in the LP's second and third inequality-sets would contradict – it is false that ⌈k/2⌉≤⌊k/2⌋ for any odd integer k – preventing any solution from existing. If there is no such odd k, then there is no obvious contradiction; and indeed a real solution vector then does exist. To construct it, proceed as follows. Alice and Bob walk through their orderings both in the most-to-least preferred direction. If at any point, their current favorites a and b disagree, then Alice gets hers (x_{a}=0) and Bob's his (x_{b}=1). But if they agree, then that item is placed in the "contested pile." Either way, they move on to the next (as-yet untaken) item in their lists until both lists are exhausted. Finally, set x_{i}=1/2 for every contested item i. The resulting vector x is then readily seen to be a solution-vector of the LP (although not necessarily an all-integer one) because every odd prefix contains at least one more wholy-owned item than wholy-not-owned one (for Bob's preference ordering; also true for Alice's), plus all other items are half-owned. Then our previous Lemma shows an all-integer and therefore all-01 solution automatically exists, and hence an envy-free allocation. Q.E.D.
Historical aside: It turns out that what we in the West call the "Catalan numbers" C_{n}=∏_{2≤k≤n}(n+k)/k=(2n)!/[n!(n+1)!] were actually discovered by the Mongolian astronomer Minggatu (1692?-1763?), who found the infinite series generating function
One nice special case arising from x=π/2 is ∑_{j≥0} C_{j} 4^{-j} = 2. Minggatu discovered many infinite series identities, all without calculus. Another identity by Bill Gosper is
Minggatu's and Gosper's results arise from the following two generating functions:
and the latter is also the reason for the "amazing" fact that
Here are some more generating functions (all of these are confirmable by direct computation of the Maclaurin series of the right hand side):
THEOREM (Catalan count): If Alice and Bob's preference orders are exact reversals of each other, and there are 2N items, then there are exactly (2N)!/[N!(N+1)!] envy-free allocations. But if Alice and Bob's preference orders are not exact reversals, then the number of envy-free allocations is less. (And if the number of items is odd, then no envy-free allocations exist.)
PROOF: In the exact-reversal case, envy-free allocations correspond precisely to Dyck words – binary 2N-bit strings each of whose prefixes contains at least as many 0s as 1s. If we replace "0"→"(" and "1"→")" then Dyck words become precisely "expressions containing N pairs of correctly-balanced parentheses." These in turn are equivalent to "rooted ordered trees with N+1 nodes." All these are well known to be counted using Catalan numbers. (There is actually an entire book about Catalan numbers: Koshy 2008. Dyck words are discussed in its chapter 6.)
In any not-reversal case, we also must demand of our binary strings that the prefixes when the bits are considered in Bob's order must each contain at least as many 1s as 0s. With the exact reversal Bob-order this condition would be superfluous since Bob's prefixes correspond to Alice's suffixes so Alice's prefix demands are equivalent to Bob's. But without exact reversal, at least one of Bob's conditions constitutes a genuine additional demand. If even a single Dyck word fails to satisfy even a single one of these Bob-prefix demands, that completes our proof by showing the Catalan number is a strict overestimate of the envy-free allocation count. But it is trivial to confirm that at least one Dyck word exists with any two specified bits of it demanded equal to 0 and 1 respectively (except, of course, the first bit cannot be demanded to be 1 and the last cannot be demanded to be 0) by using our preceding existence theorems for envy-free allocations (with a suitably constructed fake "Bob"). Q.E.D.
Let F(N) be the number of permutations of {1,2,3, ..., N} which which fix at least one odd-prefix. Start with F(0)=F(1)=0.
THEOREM (recurrence formula for F):
PROOF: Enumerate all the F(N) permutations of {1,2,3,...,N} which fix an odd prefix. They are:
perms of form their count 1... (N-1)! (123)... [3!-2](N-3)! where we only count the ones not of preceding form; this is [3!-F(3)](N-3)! (12345)... [5!-F(5)](N-5)! where again count only those not of preceding two forms and so on.
Q.E.D. (And the resulting counts agree with a brute force enumeration by computer for 0≤N≤14.)
THEOREM:
For large even N, asymptotically
PROOF SKETCH: Just the first and last couple of summands in the recurrence suffice to confirm this; the rest are relatively negligible. (Both our even and odd expressions are accurate to both leading and next-to-leading order as N→∞.) Q.E.D.
Unfortunately, it is unrealistic in most situations to regard Bob's preference permutation as being uniform-random (conditioned on Alice's). More realistic is that the items have monetary "prices" which are approximately agreed by Bob and Alice, and further that the distribution of those prices is approximately power law.
E.g. see Xavier Gabaix: Power Laws in Economics and Finance, Annual Rev. Economics 1 (2009) 255-293.
J. Doyne Farmer & John Geanakoplos: Power laws in economics and elsewhere, 2008 manuscript (never published?)
J.D. Farmer & Fabrizio Lillo: On the Origin of Power Law Tails in Price Fluctuations, Quantitative Finance 4,1 (2004) 7-11.
Niall MacKay: London house prices are power-law distributed, http://arxiv.org/abs/1012.3039
A model of that is as follows. Fix some real constant E. For item k where k=1,2,3,...,n, Alice regards its price as r_{01}k^{E} while Bob regards its price as r_{01}k^{E} where r_{01} is a random real number uniform in the interval [0,1], assumed to change every time it is used so that Alice and Bob's item-valuations generically differ.
In this power-law model, what is the probability that an envy-free allocation exists? When E=0 our probability model is the same as Brams-Kilgour-Klamler (i.e. uniform). But when E>0, or when E<0, our model is more pessimistic and in either case I would expect the envy-free-failure probability to be an increasing function of |E|.
Here is what a computer simulation (each datum based on 4000000 Monte-Carlos) finds as the failure-probabilities:
NEGATIVE E: -E=0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0 n= 2: 0.500 0.508 0.530 0.558 0.591 0.625 0.660 0.693 0.725 0.754 0.781 0.806 0.828 0.849 0.867 0.883 n= 4: 0.417 0.442 0.495 0.555 0.615 0.671 0.721 0.764 0.802 0.835 0.862 0.885 0.905 0.921 0.935 0.946 n= 6: 0.322 0.368 0.448 0.530 0.604 0.671 0.729 0.778 0.819 0.854 0.882 0.906 0.925 0.940 0.952 0.962 n= 8: 0.250 0.315 0.413 0.507 0.589 0.662 0.725 0.778 0.822 0.859 0.889 0.913 0.932 0.947 0.959 0.968 n=10: 0.201 0.281 0.390 0.489 0.576 0.653 0.719 0.775 0.821 0.859 0.890 0.915 0.934 0.949 0.962 0.971 n=12: 0.166 0.259 0.374 0.476 0.566 0.645 0.713 0.770 0.818 0.857 0.890 0.915 0.935 0.951 0.963 0.972 n=14: 0.142 0.244 0.364 0.467 0.559 0.638 0.707 0.766 0.815 0.855 0.888 0.914 0.935 0.951 0.963 0.972 n=16: 0.123 0.235 0.356 0.461 0.552 0.633 0.703 0.762 0.813 0.853 0.887 0.913 0.934 0.951 0.963 0.972 n=18: 0.109 0.228 0.350 0.456 0.548 0.629 0.700 0.760 0.810 0.852 0.885 0.912 0.933 0.950 0.963 0.972 n=20: 0.099 0.223 0.346 0.452 0.545 0.626 0.697 0.757 0.808 0.850 0.884 0.911 0.933 0.949 0.962 0.972 n=22: 0.090 0.218 0.343 0.449 0.542 0.624 0.694 0.756 0.806 0.849 0.883 0.910 0.932 0.949 0.962 0.972 n=24: 0.082 0.215 0.339 0.446 0.540 0.622 0.693 0.754 0.805 0.847 0.882 0.910 0.931 0.949 0.962 0.972 n=26: 0.076 0.212 0.337 0.444 0.538 0.619 0.691 0.753 0.804 0.847 0.881 0.909 0.931 0.948 0.961 0.971 n=28: 0.071 0.209 0.336 0.442 0.536 0.618 0.690 0.752 0.803 0.846 0.881 0.908 0.931 0.948 0.961 0.971 n=30: 0.066 0.208 0.334 0.441 0.534 0.617 0.689 0.751 0.802 0.845 0.880 0.908 0.930 0.948 0.961 0.971
POSITIVE E: E=0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0 n= 2: 0.500 0.508 0.529 0.558 0.590 0.625 0.659 0.693 0.724 0.754 0.781 0.806 0.828 0.849 0.867 0.883 n= 4: 0.417 0.440 0.483 0.534 0.587 0.640 0.689 0.735 0.776 0.813 0.845 0.871 0.893 0.913 0.928 0.941 n= 6: 0.322 0.356 0.410 0.469 0.531 0.595 0.657 0.715 0.766 0.812 0.849 0.880 0.905 0.925 0.941 0.954 n= 8: 0.251 0.292 0.345 0.404 0.470 0.541 0.612 0.680 0.741 0.794 0.838 0.875 0.904 0.926 0.944 0.957 n=10: 0.201 0.245 0.295 0.351 0.416 0.488 0.565 0.641 0.711 0.772 0.824 0.865 0.898 0.923 0.942 0.957 n=12: 0.166 0.211 0.257 0.309 0.370 0.443 0.523 0.605 0.682 0.750 0.806 0.853 0.889 0.918 0.939 0.955 n=14: 0.142 0.186 0.230 0.276 0.334 0.405 0.487 0.572 0.655 0.728 0.790 0.841 0.881 0.912 0.935 0.952 n=16: 0.124 0.167 0.207 0.251 0.306 0.374 0.456 0.544 0.631 0.708 0.775 0.829 0.872 0.905 0.931 0.949 n=18: 0.109 0.153 0.191 0.231 0.282 0.349 0.430 0.521 0.610 0.691 0.761 0.818 0.864 0.899 0.926 0.946 n=20: 0.099 0.141 0.177 0.215 0.263 0.328 0.409 0.501 0.592 0.676 0.749 0.809 0.857 0.894 0.923 0.943 n=22: 0.090 0.132 0.166 0.201 0.248 0.311 0.392 0.483 0.577 0.664 0.739 0.801 0.851 0.890 0.919 0.941 n=24: 0.082 0.123 0.156 0.190 0.234 0.296 0.376 0.469 0.563 0.653 0.730 0.794 0.845 0.885 0.916 0.939 n=26: 0.076 0.117 0.148 0.181 0.223 0.284 0.363 0.456 0.553 0.643 0.722 0.788 0.840 0.881 0.913 0.937 n=28: 0.071 0.110 0.140 0.172 0.214 0.272 0.352 0.445 0.542 0.635 0.715 0.782 0.836 0.878 0.910 0.935 n=30: 0.066 0.105 0.134 0.165 0.205 0.263 0.341 0.435 0.534 0.628 0.709 0.778 0.832 0.875 0.908 0.933
It does not look like the failure probability tends to 0 for large N in all these models. Some of them seem to remain pretty large. Indeed, I can prove that failure-probability→1 in a different model!
CLAIM (Almost-sure failure): In an exponential model where item j is assessed at value r_{01}C^{-j} by Bob (where r_{01} is a uniform random deviate in the interval [0,1]) and as r_{01}C^{-j} by Alice (with all r_{01}'s independently random, here C>1 is a sufficiently large constant) the probability than an envy-free allocation exists, tends to 0 as N→∞.
THEOREM: There exist algorithms which, given Alice's and Bob's preference orderings for the N items as input, will determine whether an envy-free allocation exists in O(N) time and O(1) space; and if it does exist, will produce one in O(NlogN) time and O(N) space.
Here "space" counts words of memory; each memory cell is assumed to hold an integer (or "pointer") whose absolute value is bounded by O(N); and we assume a "pointer machine" in which additive and comparison operations on such integers can be performed in 1 step, and pointed-to memory cells can be accessed in 1 step.
PROOF SKETCH:
1. Label the items 1,2,...,N in order of Alice's preference. Then the only input is Bob's preference order. [This relabeling or renumbering can be done in O(N) time and space by methods the reader should know, but if not see e.g. Nijenhuis & Wilf 1978.] We'll assume this numbering convention from now on.
2. An envy-free permutation exists if, and only if, Bob's permutation does not fix any odd prefix. That is, Bob's first K most-prefered items, as a set, always differ from the set {1,2,3,...,K} whenever K is odd and 0<K≤N. This is the Existence Theorem of Brams, Kilgour, Klamler and immediately yields our linear-time existence-testing algorithm.
3. We shall use link/cut tree data structure, or more precisely we'll only need the considerably simpler "link/cut paths" special case of it, described in §5.2 and 5.3 of Tarjan 1983. [Link/cut trees also are described in the papers by Sleator & Tarjan.]
We assume from here on that an envy-free assignment is known from part a to exist. All we need to do is find one. The idea is as follows. We create a data structure storing Bob's preference order (the array Bob[1...N]). Our data structure will fit in O(N) words of memory, and it will allow at any time
Thus the entire data structure could be created from nothing by performing O(N) "adjoin(X)" steps in O(NlogN) total time. This data structure is readily provided by anybody who understands the "link/cut paths" data structure (it is a trivial modification of it). But for the benefit of novice readers we'll explain a way to build this data structure in §6.
4. Taking it as given that such a data structure is available for use, we work as follows. We argue that if two items a,b (with a<b) are chosen at random from the as-yet-unassigned ones, and assigned to Alice and Bob respectively, then that will, with chance at least 1/2, not ruin existence of an envy-free assignment.
This is since if there are 2M unallocated items, we know exactly M must belong to Alice and M to Bob in the unknown envy-free allocation; hence there are M^{2} ways to pick an (a,b) pair with a belonging to Alice and b to Bob, as versus (2M-1)M ways to pick an unconstrained (a,b) pair. The former is at least half of the latter.
So we do this
random choice, Delete the two items from Bob preference order, and
use Query to test whether
5. The expected number of data-structure operations that process invokes is 2N or less. Each one takes O(logN) time. The central limit theorem may be used to see that the chance that ≥2N+Z√N tests are needed falls exponentially with Z, so we get an almost sure O(NlogN) time bound too. Q.E.D.
The items in Bob's preference order will be stored in-order, as the nodes of a rooted "binary tree" which satisfies a "balance" condition. A suitable rooted-balanced-tree class are "red-black trees" defined by the criteria that each node has a color (red or black) and
To locate an item X in such a tree, walk down it starting from the root, at each node going to its left (right) child if X is less than (greater than) that node's item. This kind of "binary search" will find the tree node containing X (or prove it is not present, in which case it will find the appropriate place to surgically insert a new node now containing X, if then desired) in O(logN) steps.
It is possible to design the data structure we shall want, using red-black trees. (Red-black trees are discussed in chapter 13 of the textbook by Cormen, Leiserson, Rivest. A gnu free software implementation as a "literate program" called "libavl" by Ben Pfaff is available; another public source library has been posted by Julienne S. Walker.) But it is simpler to use "splay trees" (Sleator & Tarjan 1983) which actually do not satisfy any explicit balance condition at all (and do not have red/black node-color information) but nevertheless are guaranteed to enjoy O(logN)-time accessing on average over any sequence of accesses, new-node insertions, and node-deletions. Whenever a node is accessed in a splay tree, a rearrangement, called "splaying," is performed which causes X to become the new tree root.
If new nodes X are inserted into a red-black tree (or splay tree), or are removed from one, then rearrangements and/or node recolorings may need to be done. Suitable algorithms for doing that were invented by the original designers of red-black trees (and splay trees) which perform only "local" operations affecting only the nodes along the root-X path and perhaps their immediate neighbors, and hence which take O(logN) steps. These rearrangements and recolorings preserve/enforce the ordering and coloring properties of the tree, so that the new tree still supports O(logN) time accesses.
It is also possible to make such trees support O(logN)-time "rank queries" by also storing in each tree node, the number of its descendants. (Incidentally, if you are going to store that information in each tree node, that permits using other kinds of "balance conditions" besides "red black.") That way, you can determine in O(logN) steps (by inspecting the nodes along a root-X path), the number of tree nodes less than, and the number greater than, X, in tree-order. Also you can search for the Jth item in the tree, finding it after a walk taking O(logN) steps. The nodes' rank information needs to be updated as the tree is rearranged, or whenever we insert or delete a tree node, but again the updating rules are purely local so that the O(logN) time bounds remain valid. (This also permits random uniform sampling from the tree nodes in any specified range, if we can generate random uniform integers J in that range.)
We shall also store in the Kth node, the min (over all its descendants, and itself) of X_{j}-j, where j is the node number (i.e. rank) and X_{j} is its contents. This permits fast Query(L,U) operations by accessing L and U in the tree and reconstituting the minvalue from the min-values of the subtrees of nodes in the root-L and root-U paths. For splay trees this is simpler since we can just splay L to the root and the U to the right-child of the root, whereupon only the min-field rising from the left-child of U needs to be examined.
If this is done, additional update rules need to be devised. If a new tree-node is inserted (or deleted), then all items (there might be as many as N) "greater" than it in tree-order need to have 1 subtracted from (or added to) their min-fields. This is accomplished quickly by storing with each tree parent→child edge in the tree, a delta-value; tree nodes do not really store their min(X_{j}-j) values but instead additively-offset versions of these values. The correct additive offset is the sum of the delta values of the tree-edges along the root-X path. Then only O(logN) delta-values need to be updated (by adding ±1) when tree nodes are inserted or removed, which is much faster than updating N values naively would have been. If any local "rotation" rearrangement is applied to a (parent,child) node pair in our tree, then the delta values (and rank information fields, and min information fields) can be adjusted by suitable local operations.
This completes our sketch of the data structure. Data structure afficionados will note it combines features of both a "heap" and a "balanced tree" into one object. The reader should now be able to write computer code implementing it and should understand why all operations are supported by this structure in O(logN) time, at most, each. [Or in the case of the simpler splay-tree versions, in O(logN) "amortized time" per operation.]
This section shall consider a slightly different version of our Problem. We now suppose Bob, instead of stating a preference order, states a vector of N real-number scores, the Kth score pertaining to item K. (Bob prefers items with greater scores.)
Of course, if Alice and Bob provide scores rather than preference orders, it is trivial to convert the scores to preference orderings in O(NlogN) steps [using O(N) memory cells, each capable of containing a score; each step is an item-comparison or item-move] by using any O(NlogN)-time sorting algorithm, e.g. "merge sort." [Or, if randomized algorithms are permitted, one could use "quick sort," which runs in O(NlogN) expected time.] Therefore, our previous O(NlogN)-time algorithm remains O(NlogN)-time even with score-based input.
The task is to decide whether an envy-free allocation exists (this is clearly no harder than producing an allocation if "yes"). There actually are three possibilities about Bob's and Alice's envy-statuses:
We shall only allow algorithms which are "algebraic decision trees." That is: the algorithm is fed as input Bob's N scores and Alice's N scores. It then computes algebraic functions of those inputs and test whether the function is greater than, less than, or equal to, zero. Each test, and arithmetic operation (permitted ops: algebraic functions of a bounded number of inputs having bounded degree) takes one "timestep." Based on the results of preceding tests the algorithm decides what algebraic functions to compute, or outputs to emit, next. Such an algorithm can be viewed as a "decision tree" in the sense that each test "branches" to three possible "child nodes" of the tree. Finally, when the algorithm is done testing (i.e. reaches a tree "leaf") it outputs its final yes/no answer.
For those who want to permit randomized algorithms which also depend on (an arbitrary finite amount of) randomness, it also is possible to consider "randomized algebraic decision trees" (RADT). An RADT is a finite collection of plain ADTs, each with some probability-weight. The algorithm chooses which tree to employ with the aid of random numbers, and then uses it. We shall also allow replacing the word "finite" by "countably infinite." Further, we shall even allow uncountably infinite randomness provided it arises by giving the algorithm access to a source of random numbers uniform on the real interval [0,1], and then allowing it to make decisions based on algebraic functions of these as well as of the 4N genuine inputs – with at most O(N) random numbers allowed.
THEOREM (computational complexity lower bound): Any deterministic algebraic decision tree fed as input Bob's N scores and Alice's N scores, and which always outputs a correct answer of types 1 or 2 (for the deterministic case it is permissible to regard type 3 as merged with type 2) necessarily will have depth≥kNlogN for some positive constant k and an infinite set of N>0 for some suitable choice of 2N-component inputs. Further, any randomized algebraic decision tree which returns an answer of types 1, 2, or 3 with answer-correctness probability≥51% will have expected depth≥kNlogN again for some positive constant k and an infinite set of N>0 for some suitable choice of 2N-component inputs.
PROOF:
We follow the ideas and techniques in Ben-Or 1983 and Grigoriev et al 1996/7,
with a few minor changes.
Ben-Or's idea (e.g. see his "theorem 3")
was that if the number of connected components of the
input of some problem in N-dimensional space, which must yield output=1
(as opposed to the input-subset in N-space which yields output=0)
is large, then any algebraic decision tree describing those subsets
must have large depth. He gives specific
formulas which quantify this. The same argument will show that if the 1-set
is polyhedral – i.e. is a boolean combination of halfspaces, e.g.
H_{1} AND (H_{2} OR H_{3}) AND NOT H_{4} –
and if the boundary of that set consists of a large number of faces,
then the decision tree must have large depth. This boundary-based argument is the one
we shall use. Specifically, for our problem it is easy to see that the set
of possible Bob-score vectors for which an envy-free allocation exists, is
(a) polyhedral and (b) has a boundary containing at least
N!·N^{-O(1)} faces.
From this it follows that any algebraic decision tree must depth of order NlogN.
If we instead employ Grigoriev et al's "main theorem" we obtain the same result
now even for "randomized algebraic decision trees."
Q.E.D.
DISCUSSION: This "proof of a lower bound" may be evadable by resorting to algorithms which are not algebraic decision trees, e.g. compute non-algebraic functions, or make k-way choices for k large, rather than merely 2-way or 3-way choices. As a relevant example, consider the
"IS IT A PERMUTATION" PROBLEM: Given as input N real numbers. Decide whether those numbers happen to be a permutation of the N-element integer set {1,2,3,...,N}.
This problem requires at least order NlogN steps to decide via any algebraic decision tree algorithm (and that is achievable by sorting the inputs then checking X_{j}=j for all j=1,...,N). The proof is essentially the same as Ben-Or's Theorem 1 that the "element distinctness" problem requires NlogN steps; for us the permutation-condition is that max_{j}X_{j}=N and min_{j}X_{j}=1 (both checkable in linear time) and that (X_{j}-X_{k})^{2}≥1 if 1≤j<k≤N, which is an algebraic set containing exactly N! connected components (each a single point) in N dimensional space.
However, if we are allowed to employ non-algebraic functions such as sin(πx), which is zero if and only if x is an integer, or the floor function; and if we are allowed to make N-way choices such as effectively happen using indirect array accesses within an N-element read-and-writable array, then it is trivial to solve the "is it a permutation" problem in O(N) steps:
if(max_{j}X[j]≠N){ report("No."); quit; } if(min_{j}X[j]≠1){ report("No."); quit; } for(j=1 to N){ Y[j] ← 0; } for(j=1 to N){ if( sin(πX[j])≠0 ){ report("No."; quit; } if( Y[X[j]]≠0 ){ report("No."); quit; } Y[X[j]] ← 1; } report("Yes, X[1..N] was a permutation.");
In view of this example, there might be some reasonable way to evade our lower bound theorem/proof. And indeed, we earlier gave an O(N)-step non-algebraic algorithm which does solve the envy-free-allocation non-constructive existence yes/no problem! More precisely, the reason that particular algorithm can "accomplish the impossible" is it assumes pre-sorted input (the task of sorting it is NlogN-hard in ADT models).
However, our lower bound cannot be evaded by algorithms on a pointer machine which only make binary "if" decisions based only on algebraic functions of the input. The construction algorithm we gave in §5 (based on link/cut paths data structure) was in that class and hence within that class is an "optimal" algorithm.
One might have thought, initially, that envy-free allocations were going to be an excellent way to, e.g, divide up possessions in divorces. But the fact (§4) that real husbands and wives tend to have similar (not independent random) preference orders makes it much less likely that any envy-free allocation exists. Therefore, I now suspect that the whole envy-free allocation concept is mainly of mathematical importance (as testified by all the beautifully rich structure revealed throughout this paper), not practical importance.
If we had a goodly source of real-world preference-order-pair data for large N, then it would be easy to find out how often envy-free allocations exist, thereby confirming or denying this.
But I want to point out that envy-freeness can be much easier to attain, if, in addition to the indivisible items, there exists one continuum-divisible item called "money." Suppose Alice and Bob provide money-prices for each of the items, not just a preference ordering. I.e. A[j] and B[j] and Alice's and Bob's prices for item j. Ignoring the money for the moment, run the O(N)-step algorithm. Let us suppose that at some point it announces failure, i.e. discovers some odd K such that both Bob and Alice have the same top-K item sets. In that case, if A[K]≠B[K] then we can overcome this failure by creating a new fake "item," namely a pot of money with value ε+min(B[K],A[K]), and continuing on. (Here ε>0 is arbitrarily small.) In this way we either eventually overcome all failures, or run out of money. If the money supply is unbounded (e.g. if going into debt is permitted) then failure can be entirely eliminated. The precise minimal threshold amount of money needed to prevent failure, can be output by the resulting modified O(N)-step algorithm.
In particular, one piece of evidence showing the Power of Money is that our almost-sure failure theorem is falsified (the failure probability shrinks strictly below 1) if the available amount of money is any positive real.
Other open problems now include:
Our data structure is probably very useful re both these open problems.
(See also the sub-bibliography about power laws in economics.)
Michael Ben-Or: Lower bounds for algebraic computation trees, ACM Sympos. Theory of Computing (STOC) 15 (1983) 80-86.
Steven J. Brams, D. Marc Kilgour, Christian Klamler: Two-Person Fair Division of Indivisible Items: An Efficient, Envy-Free Algorithm, Notices Amer. Math'l. Soc. 61,2 (Feb. 2014) 130-141.
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein: Introduction to Algorithms, MIT press 2009.
Dima Grigoriev, Marek Karpinski, Friedhelm Meyer auf der Heide, Roman Smolensky: A lower bound for randomized algebraic decision trees, Computational Complexity 6,4 (1996/1997) 357-375. Also an earlier version was in ACM STOC 28 (1996) 612-619.
Thomas Koshy: Catalan Numbers with Applications, Oxford University Press 2008.
Friedhelm Meyer auf der Heide: Simulating probabilistic by deterministic algebraic computation trees, Theoretical Computer Science 41 (1985) 325-330.
Albert Nijenhuis & H.S.Wilf: Combinatorial Algorithms, Academic Press 1978.
Alexander Schrijver: Theory of Linear and Integer Programming, John Wiley & Sons 1998. (Chapters 19-21 are about totally unimodular matrices.)
D.D.Sleator & R.E.Tarjan: Self adjusting binary search trees, Journal of the ACM 32,3 (1985) 652-686.
D.D.Sleator & R.E. Tarjan: A Data Structure for Dynamic Trees, J. Computer & System Sciences 26,3 (1983) 362-391.
Eva Tardos: A strongly polynomial algorithm to solve combinatorial linear programs, Operations Research 34,2 (1986) 250-256.
Robert E. Tarjan: Data structures and network algorithms, SIAM (CBMS-NSF Regional Conference Series in Applied Mathematics #44) 1983.
Oswald Veblen & Philip Franklin: On matrices whose elements are integers, Annals of Maths. (Ser. 2) 23,1 (Sept. 1921) 1-15. [Reprinted with minor changes as appendix II, which is pages 170-189, in O.Veblen: Analysis situs, Amer. Math'l Soc. 2nd ed. 1931 (#30), 194 pages. But our and Schrivjer's rewrites are far superior to this original.]
Andrew Chi-Chih Yao: Decision Tree Complexity and Betti Numbers, J. Computer & System Sciences 55,1 (1997) 36-43.