Given N≥2 point-sites in a Euclidean space, suppose we wish to find the point X (which is not necessarily a site) whose sum of distances to the N sites [call it F(X)], is minimal.
a. The "distance to a site" function is continuous and concave-∪, and since any sum of concave-∪ functions is concave-∪, we conclude F(X) is. Obviously, F(X)→∞ when |X|→∞. Therefore, F(X) has a global minimum.
b. If there are more sites on one side of X, moving X in that direction will decrease F(X).
c. The only lines on which the "distance to a site" function is not strictly concave-∪ is along lines through the site. So if the sites do not all lie on a line then F(X) will be strictly concave-∪ on every line.
d. Using the preceding puzzle, we see that the new X (call it Q) uniquely minimizes the weighted sum of squared distances to the sites. Let sk denote site k and let wk denote its weight (which is proportional to its reciprocated distance to X). Hence
with equality if and only if Q=X. Hence
Now note the identity
Applying this identity and simplifying the result a bit, we find
Because any weighted sum of squares (if the weights are positive) is positive (unless it is zero, which occurs only if each and every square is 0) we conclude
This "descent" result was first published by Endre Weiszfeld (whose name later changed to Andrew Vazsonyi) in 1937.
e. If X lies at a site, the weight of that site would be infinite (division by zero) hence our X-updating formula would be inapplicable. This actually is a symptom of a difficulty with this iteration – it is possible for it to "get stuck" very near a site for an arbitrarily large number of iterations time before it finally departs.
One way to avoid that difficulty is to start the Weiszfeld iteration at someplace better than the best site. The iteration will never be able to land on a site from then onward, due to the descent property. The "best site" means B=sk such that the kth row of the distance matrix has minimum sum. Applying a Weiszfeld iteration starting from B but only considering the N-1 other sites, yields a point W. It is easy to see that the direction from B to W is the steepest descent direction for F(X) at X=B, unless B is already optimum. Hence somewhere on the line segment BW, is a better point than B (if any exists). Indeed a point on this line segment lying arbitrarily close to B will do, unless B already is optimum. Hence a way to find a better-than-B point is to start at W, and repeatedly halve your distance to B (staying on the line segment BW) until the point is found.
The optimum point X is known as the Fermat-Steiner-Torricelli-Weber point (or some subset of these names).