Some Elementary Aspects of Means

We raise several elementary questions pertaining to various aspects of means. These questions refer to both known and newly introduced families of means, and include questions about characterizations of certain families, relations among certain families, comparability among the members of certain families, and concordance of certain sequences of means. They also include questions about internality tests for certain mean-looking functions and about certain triangle centers viewed as means of the vertices. The questions are accessible to people with no background in means, and such people are also expected to be able to seriously investigate, and contribute to the solutions of, these problems. The solutions are expected to require no more than simple tools from analysis, algebra, functional equations, and geometry.


Definitions and Terminology
In all that follows, R denotes the set of real numbers and J denotes an interval in R.
By a data set (or a list) in a set S, we mean a finite subset of S in which repetition is allowed. Although the order in which the elements of a data set are written is not significant, we sometimes find it convenient to represent a data set in S of size n by a point in S^n, the cartesian product of n copies of S.
We will call a data set A = (a_1, ..., a_n) in R ordered if a_1 ≤ ⋯ ≤ a_n. Clearly, every data set in R may be assumed ordered.
A mean of n variables (or an n-dimensional mean) on J is defined to be any function M : J^n → J that has the internality property
min{x_1, ..., x_n} ≤ M(x_1, ..., x_n) ≤ max{x_1, ..., x_n} (1)
for all x_j in J. It follows that a mean M must have the property M(x, ..., x) = x for all x in J.
If M and N are two n-dimensional means on J, then we say that M ≤ N if M(x_1, ..., x_n) ≤ N(x_1, ..., x_n) for all x_j ∈ J. We say that M < N if M(x_1, ..., x_n) < N(x_1, ..., x_n) for all x_j ∈ J for which x_1, ..., x_n are not all equal. This exception is natural since M(x, ..., x) and N(x, ..., x) must be equal, with each being equal to x. We say that M and N are comparable if M ≤ N or N ≤ M.

Examples of Means
The arithmetic, geometric, and harmonic means of two positive numbers were known to the ancient Greeks; see [1, pp. 84-90]. They are usually denoted by A, G, and H, respectively, and are defined, for a, b > 0, by
A(a, b) = (a + b)/2, G(a, b) = √(ab), H(a, b) = 2ab/(a + b).
The celebrated inequalities
H(a, b) < G(a, b) < A(a, b) for all a, b > 0 with a ≠ b (7)
were also known to the Greeks and can be depicted in the well-known figure that is usually attributed to Pappus and that appears in [2, p. 364]. Several other less well known means were also known to the ancient Greeks; see [1, pp. 84-90].
The three means above, and their natural extensions to any number n of variables, are members of a large two-parameter family of means, known now as the Gini means and defined by
G_{r,s}(x_1, ..., x_n) = (N_r / N_s)^{1/(r−s)}, r ≠ s, (8)
where the N_j are the Newton polynomials defined by
N_j(x_1, ..., x_n) = x_1^j + ⋯ + x_n^j.
Means of the type G_{r,r−1} are known as Lehmer means, and those of the type G_{r,0} are known as Hölder or power means. Other means that have been studied extensively are the elementary symmetric polynomial and elementary symmetric polynomial ratio means defined by
(σ_k / C(n, k))^{1/k} and (σ_k / C(n, k)) / (σ_{k−1} / C(n, k−1)), (10)
where σ_k is the kth elementary symmetric polynomial in n variables and C(n, k) is the binomial coefficient. These are discussed in full detail in the encyclopedic work [3, Chapters III and V].
It is obvious that the power means P_t defined by
P_t(x_1, ..., x_n) = G_{t,0}(x_1, ..., x_n) = ((x_1^t + ⋯ + x_n^t)/n)^{1/t}
that correspond to the values t = −1 and t = 1 are nothing but the harmonic and arithmetic means H and A, respectively. It is also natural to set
P_0(x_1, ..., x_n) = G(x_1, ..., x_n) = (x_1 ⋯ x_n)^{1/n},
since lim_{t→0} P_t(x_1, ..., x_n) = (x_1 ⋯ x_n)^{1/n} for all x_1, ..., x_n > 0.
The inequalities (7) can be written as P_{−1} < P_0 < P_1. These inequalities hold for any number of variables, and they follow from the more general fact that P_t(x_1, ..., x_n), for fixed x_1, ..., x_n > 0 that are not all equal, is strictly increasing with t. Power means are studied thoroughly in [3, Chapter III].
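These numerical facts are easy to check directly. The following sketch (an illustration added here, not part of the original discussion) computes the power mean P_t, with the conventional value P_0 taken to be the geometric mean:

```python
import math

def power_mean(xs, t):
    """Power (Holder) mean P_t of positive numbers; P_0 is the geometric mean."""
    if t == 0:
        return math.exp(sum(math.log(x) for x in xs) / len(xs))
    return (sum(x ** t for x in xs) / len(xs)) ** (1 / t)

data = [1.0, 2.0, 6.0]
# P_-1 < P_0 < P_1, i.e. H < G < A:
assert power_mean(data, -1) < power_mean(data, 0) < power_mean(data, 1)
# P_t is strictly increasing in t when the x_i are not all equal:
ts = [-2, -1, 0, 1, 2, 3]
vals = [power_mean(data, t) for t in ts]
assert all(a < b for a, b in zip(vals, vals[1:]))
```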

Mean-Producing Distances and Distance Means
It is natural to think of the mean of any list of points in any set as the point that is closest to that list. It is also natural to think of a point as closest to a list of points if the sum of its distances from these points is minimal. This mode of thinking associates means to distances.
If d is a distance on S, and if A = (a_1, ..., a_n) is a data set in S, then a d-mean of A is defined to be any element of S at which the function
f(x) = d(a_1, x) + ⋯ + d(a_n, x) (15)
attains its minimum. It is conceivable that (15) attains its minimum at many points, or nowhere at all. However, we shall be mainly interested in distances d on J for which (15) attains its minimum at a unique point x_A that, furthermore, has the property
min{a : a ∈ A} ≤ x_A ≤ max{a : a ∈ A} (16)
for every data set A. Such a distance is called a mean-producing or a mean-defining distance, and the point x_A is called the d-mean of A or the mean of A arising from the distance d and will be denoted by μ_d(A). A mean M is called a distance mean if it is of the form μ_d for some distance d.

Problem Set 1.
(1-a) Characterize those distances on J that are mean-producing.
(1-b) Characterize those pairs of mean-producing distances on J that produce the same mean.

Examples of Mean-Producing Distances
If d_0 is the discrete metric defined on R by
d_0(x, y) = 0 if x = y, and 1 if x ≠ y,
then the function f(x) in (15) is nothing but the number of elements in the given data set A that are different from x, and therefore every element having maximum frequency in A minimizes (15) and is hence a d_0-mean of A. Thus the discrete metric gives rise to what is referred to in statistics as "the" mode of A. Due to the nonuniqueness of the mode, the discrete metric is not a mean-producing distance.
Similarly, the usual metric d = d_1 defined on R by
d_1(x, y) = |x − y|
is not a mean-producing distance. In fact, it is not very difficult to see that if A = (x_1, ..., x_n) is an ordered data set of even size n = 2k, then every point of the interval [x_k, x_{k+1}] is a d_1-mean of A. Similarly, one can show that if A is of an odd size n = 2k − 1, then x_k is the unique d_1-mean of A. Thus the usual metric on R gives rise to what is referred to in statistics as "the" median of A. On the other hand, the distance d_2 defined on R by
d_2(x, y) = (x − y)^2
is a mean-producing distance, although it is not a metric. In fact, it follows from simple derivative considerations that the function
f(x) = (x_1 − x)^2 + ⋯ + (x_n − x)^2
attains its minimum at the unique point
x = (x_1 + ⋯ + x_n)/n.
Thus d_2 is a mean-producing distance, and the corresponding mean is nothing but the arithmetic mean. It is noteworthy that the three distances that come to mind most naturally give rise to the three most commonly used "means" in statistics. In this respect, it is also worth mentioning that a fourth mean of statistics, the so-called midrange, will be encountered below as a very natural limiting distance mean.
The distances d_1 and d_2 (and in a sense, d_0 also) are members of the family d_p of distances defined by
d_p(x, y) = |x − y|^p. (23)
It is not difficult to see that if p > 1, then d_p is a mean-producing distance. In fact, if A = (x_1, ..., x_n) is a given ordered data set, and if
f(x) = |x_1 − x|^p + ⋯ + |x_n − x|^p,
then f''(x) ≥ 0, with equality possible only when x_1 = ⋯ = x_n = x. Thus f is convex and cannot attain its minimum at more than one point. That it attains its minimum follows from the continuity of f(x), the compactness of [x_1, x_n], and the obvious fact that f(x) is increasing on [x_n, ∞) and decreasing on (−∞, x_1]. If we denote the mean that d_p defines by μ_p, then μ_p(A) is the unique zero of
sign(x_1 − x)|x_1 − x|^{p−1} + ⋯ + sign(x_n − x)|x_n − x|^{p−1},
where sign(t) is defined to be 1 if t is nonnegative and −1 otherwise.
Note that no matter what p > 1 is, the two-dimensional mean μ_p arising from d_p is the arithmetic mean. Thus when studying μ_p, we confine our attention to the case when the number n of variables is greater than two. For such n, it is impossible in general to compute μ_p(A) in closed form.
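Although closed forms are unavailable for n > 2, μ_p(A) is easy to approximate numerically: since f is strictly convex for p > 1, one can bisect on the sign of f′. A minimal sketch (an added illustration, not from the original text):

```python
def d_p_mean(xs, p, tol=1e-10):
    """Approximate the d_p-mean, i.e. the minimizer of f(x) = sum |x_j - x|^p."""
    def fprime(x):  # derivative of f; negative to the left of the minimizer
        return sum(-p * abs(xj - x) ** (p - 1) * (1 if xj >= x else -1)
                   for xj in xs)
    lo, hi = min(xs), max(xs)   # internality: the mean lies in [min, max]
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fprime(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# For p = 2 this recovers the arithmetic mean:
assert abs(d_p_mean([1.0, 2.0, 6.0], 2) - 3.0) < 1e-6
# For n = 2 the d_p-mean is the arithmetic mean for every p > 1:
assert abs(d_p_mean([0.0, 10.0], 7) - 5.0) < 1e-6
```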
It is highly likely that no two distinct means μ_p are comparable.

Deviation and Sparseness
If d is a mean-producing distance on S, and if μ_d is the associated mean, then it is natural to define the d-deviation D_d(A) of a data set A = (x_1, ..., x_n) by an expression like
D_d(A) = (d(μ_d(A), x_1) + ⋯ + d(μ_d(A), x_n))/n. (27)
Thus if d is defined by
d(x, y) = (x − y)^2, (28)
then μ_d is nothing but the arithmetic mean or ordinary average x̄ defined by
x̄ = (x_1 + ⋯ + x_n)/n, (29)
and D_d is the (squared) standard deviation σ^(2) given by
σ^(2) = (|x_1 − x̄|^2 + ⋯ + |x_n − x̄|^2)/n. (30)
In a sense, this provides an answer to those who are puzzled and mystified by the choice of the exponent 2 (and not any other exponent) in the standard definition of the standard deviation given in the right-hand side of (30). In fact, distance means were devised by the author in an attempt to remove that mystery. Somehow, we are saying that the ordinary average x̄ and the standard deviation σ^(2) must be taken or discarded together, being both associated with the same distance d given in (28). Since few people question the sensibility of the definition of x̄ given in (29), accepting the standard definition of the standard deviation given in (30) as is becomes a must.
It is worth mentioning that choosing an exponent other than 2 in (30) would result in an essentially different notion of deviation. More precisely, if one defines σ^(t) by
σ^(t) = (|x_1 − x̄|^t + ⋯ + |x_n − x̄|^t)/n, (31)
then σ^(t) and σ^(2) would of course be unequal, but more importantly, they would not be monotone with respect to each other, in the sense that there would exist data sets A and B with σ^(2)(A) > σ^(2)(B) and σ^(t)(A) < σ^(t)(B). Thus the choice of the exponent t in defining deviations is not as arbitrary as some may feel. On the other hand, it is (27), and not (31), that is the natural generalization of (30). This raises an expectedly hard problem (Problem 3, stated at the end of this article).
We end this section by introducing the notion of sparseness and by observing its relation to deviation. If d is a mean-producing distance on J, and if μ_d is the associated mean, then the d-sparseness S_d(A) of a data set A = (x_1, ..., x_n) in J can be defined by
S_d(A) = Σ_{1≤i<j≤n} d(x_i, x_j).
It is interesting that when d is defined by (28), the standard deviation coincides, up to a constant multiple, with the sparseness. One wonders whether this pleasant property characterizes this distance d.
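For the distance in (28), the coincidence can be checked numerically; in this added illustration, the sparseness is taken to be the sum of the pairwise distances d(x_i, x_j), which is an assumption about the intended definition:

```python
data = [1.0, 2.0, 6.0, 7.0]
n = len(data)
xbar = sum(data) / n

# sum of pairwise squared differences (assumed form of the sparseness):
pairwise = sum((data[i] - data[j]) ** 2
               for i in range(n) for j in range(i + 1, n))
# squared standard deviation:
variance = sum((x - xbar) ** 2 for x in data) / n

# the identity sum_{i<j} (x_i - x_j)^2 = n^2 * variance:
assert abs(pairwise - n * n * variance) < 1e-9
```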
Problem Set 4.
(4-a) Characterize those mean-producing distances whose associated mean is the arithmetic mean.
(4-b) If d is as defined in (28), and if d' is another mean-producing distance whose associated mean is the arithmetic mean, does it follow that D_{d'} and D_d are monotone with respect to each other?
(4-c) Characterize those mean-producing distances d for which the deviation D_d(A) is determined by the sparseness S_d(A) for every data set A, and vice versa.

Best Approximation Means
It is quite transparent that the discussion in the previous section regarding the distance mean μ_p, p > 1, can be written in terms of best approximation in ℓ_p^n, the vector space R^n endowed with the p-norm ‖·‖_p defined by
‖(x_1, ..., x_n)‖_p = (|x_1|^p + ⋯ + |x_n|^p)^{1/p}. (34)
To say that x = μ_p(x_1, ..., x_n) is just another way of saying that the point (x, ..., x) is a best approximant of the point (x_1, ..., x_n) in the diagonal of R^n with respect to the p-norm given in (34). Here, a point s_0 in a subset S of a metric (or distance) space (T, d) is said to be a best approximant in S of t ∈ T if d(t, s_0) = min{d(t, s) : s ∈ S}. Also, a subset S of (T, d) is said to be Chebyshev if every t in T has exactly one best approximant in S; see [4, p. 21]. The discussion above motivates the following definition.
Definition 1. Let J be an interval in R and let d be a distance on J^n. If the diagonal Δ(J^n) of J^n, defined by
Δ(J^n) = {(x, ..., x) : x ∈ J},
is Chebyshev (with respect to d), then the n-dimensional mean M_d on J defined by letting M_d(x_1, ..., x_n) be the point x for which (x, ..., x) is the best approximant of (x_1, ..., x_n) in Δ(J^n) is called the best approximation mean arising from d.
In particular, if one denotes by B_p the best approximation n-dimensional mean on R arising from (the distance on R^n induced by) the norm ‖·‖_p, then the discussion above says that B_p exists for all p > 1 and that it is equal to μ_p defined in Section 4.
In view of this, one may also define B_∞ to be the best approximation mean arising from the ∞-norm
‖(x_1, ..., x_n)‖_∞ = max{|x_j| : 1 ≤ j ≤ n}.
It is not very difficult to see that B_∞(A) is nothing but what is referred to in statistics as the midrange of A. Thus if A = (x_1, ..., x_n) is an ordered data set, then
B_∞(A) = (x_1 + x_n)/2.
In view of the fact that μ_∞ cannot be defined by anything like (23), d_∞ being meaningless, a natural question arises as to whether
B_∞(A) = lim_{p→∞} B_p(A) (38)
for every A. An affirmative answer is established in [5, Theorem 1]. In that theorem, it is also established that lim_{q→p} B_q(A) = B_p(A) for all p and all A. All of this can be expressed by saying that B_p is continuous in p for p ∈ (1, ∞] for all A. We remark that there is no obvious reason why (38) should immediately follow from the well-known fact that lim_{p→∞} ‖x‖_p = ‖x‖_∞ for all points x in R^n.
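Why the sup-norm produces the midrange is easy to see numerically: the midpoint of the range minimizes the largest distance to the data. A small added illustration:

```python
def midrange(xs):
    return (min(xs) + max(xs)) / 2

def sup_dev(xs, x):
    """The sup-norm objective: the largest distance from x to the data."""
    return max(abs(xj - x) for xj in xs)

data = [1.0, 2.0, 7.0]
m = midrange(data)  # 4.0; the largest distance to the data is 3.0
for cand in (m - 2.0, m - 0.1, m + 0.1, m + 2.0):
    assert sup_dev(data, m) < sup_dev(data, cand)
```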
Problem Set 5. Suppose that d_p is a sequence of distances on a set S that converges to a distance d_∞ (in the sense that lim_{p→∞} d_p(a, b) = d_∞(a, b) for all a, b in S). Let T ⊆ S.
(5-a) If T is Chebyshev with respect to each d_p, is it necessarily true that T is Chebyshev with respect to d_∞?
(5-b) If T is Chebyshev with respect to each d_p and with respect to d_∞, and if s_p is the best approximant in T of s with respect to d_p and s_∞ is the best approximant in T of s with respect to d_∞, does it follow that s_p converges to s_∞?
We end this section by remarking that if M = M_d is the n-dimensional best approximation mean arising from a distance d on J^n, then d is significant only up to its values of the type d(u, v), where u ∈ Δ(J^n) and v ∉ Δ(J^n). Other values of d are not significant. This, together with the fact that
every mean is a best approximation mean arising from a metric, (41)
makes the study of best approximation means less interesting. Fact (41) was proved in an unduly complicated manner in [6], and in a trivial way based on a few-line set-theoretic argument in [7].
Problem 6. Given a mean M on J, a metric D on J is constructed in [6] so that M is the best approximation mean arising from D. Since the construction is extremely complicated in comparison with the construction in [7], it is desirable to examine the construction of D in [6] and see what other nice properties (such as continuity with respect to the usual metric) D has. This would restore merit to the construction in [6] and to the proofs therein and provide a raison d'être for the so-called generalized means introduced there.

Towards a Unique Median
As mentioned earlier, the distance d_1 on R defined by (23) does not give rise to a (distance) mean. Equivalently, the 1-norm ‖·‖_1 on R^n defined by (34) does not give rise to a (best approximation) mean. These give rise, instead, to the many-valued function known as the median. Thus, following the statistician's mode of thinking, one may set
μ_1(A) = the median interval of A, namely [x_k, x_{k+1}] if A = (x_1, ..., x_n) is an ordered data set of even size n = 2k, and the single point x_k if n = 2k − 1. (42)
From a mathematician's point of view, however, this leaves a lot to be desired, to say the least. The feasibility and naturality of defining μ_∞ as the limit of μ_p as p approaches ∞ give us a clue as to how the median μ_1 may be defined. It is a pleasant fact, proved in [5, Theorem 4], that the limit of μ_p(A) (equivalently of B_p(A)) as p decreases to 1 exists for every A ∈ R^n and equals one of the medians described in (42). This limit can certainly be used as the definition of the median.
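The selection effect can be watched numerically: for an even-size data set, every point of [x_k, x_{k+1}] minimizes the d_1-objective, yet the d_p-means single out a definite point as p decreases to 1. A sketch (added illustration; the bisection assumes p > 1):

```python
def d_p_mean(xs, p, tol=1e-10):
    """Minimizer of sum |x_j - x|^p, found by bisecting on the derivative."""
    def fprime(x):
        return sum(-p * abs(xj - x) ** (p - 1) * (1 if xj >= x else -1)
                   for xj in xs)
    lo, hi = min(xs), max(xs)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fprime(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

data = [0.0, 1.0, 10.0, 12.0]      # the median interval is [1, 10]
for p in (1.5, 1.1, 1.01):
    assert 1.0 <= d_p_mean(data, p) <= 10.0
```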
Problem Set 7. Let μ_p be as defined in Section 4, and let μ* be the limit of μ_p as p decreases to 1.
(7-a) Explore how the value of μ*(A) compares with the common practice of taking the median of A to be the midpoint of the median interval (defined in (42)) for various values of n.
(7-b) Is μ* continuous on R^n? If not, what are its points of discontinuity?
The convergence of μ_p(A) (as p decreases to 1) to μ*(A) is described in [5, Theorem 4], where it is proved that the convergence is ultimately monotone. It is also proved in [5, Theorem 5] that when n = 3, the convergence is monotone.
It is of course legitimate to question the usefulness of defining the median to be μ*, but that can be left to statisticians and workers in relevant disciplines to decide. It is also legitimate to question the path along which we have taken the limit. In other words, it is conceivable that there exists, in addition to d_p, a sequence d'_p of distances on R that converges to d_1 such that the limit μ**, as p decreases to 1, of the associated distance means μ'_p is not the same as the limit μ* of μ_p. In this case, μ** would have as valid a claim as μ* to being the median. However, the naturality of d_p may help in accepting μ* as a most legitimate median.
Problem Set 8. Suppose that d_p and d'_p, p ∈ N, are sequences of distances on a set S that converge to the distances d_∞ and d'_∞, respectively (in the sense that lim_{p→∞} d_p(a, b) = d_∞(a, b) for all a, b in S, etc.).

Examples of Distance Means
It is clear that the arithmetic mean is the distance mean arising from the distance d_2 given by d_2(x, y) = (x − y)^2.
Similarly, the geometric mean on the set of positive numbers is the distance mean arising from the distance d_G given by
d_G(x, y) = (ln x − ln y)^2.
In fact, this should not be surprising, since the arithmetic mean A on R and the geometric mean G on (0, ∞) are equivalent in the sense that there is a bijection φ : (0, ∞) → R, namely φ(x) = ln x, for which G(a, b) = φ^{−1}(A(φ(a), φ(b))) for all a, b. Similarly, the harmonic and arithmetic means on (0, ∞) are equivalent via the bijection h(x) = 1/x, and therefore the harmonic mean is the distance mean arising from the distance d_H given by
d_H(x, y) = (1/x − 1/y)^2.
The analogous question pertaining to the logarithmic mean L defined by
L(x, y) = (x − y)/(ln x − ln y), x ≠ y, (45)
remains open.
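The equivalences can be verified directly; a small added illustration:

```python
import math

def A(x, y): return (x + y) / 2
def G(x, y): return math.sqrt(x * y)
def H(x, y): return 2 * x * y / (x + y)

a, b = 3.0, 12.0
# G is A conjugated by phi(x) = ln x:
assert abs(G(a, b) - math.exp(A(math.log(a), math.log(b)))) < 1e-12
# H is A conjugated by h(x) = 1/x:
assert abs(H(a, b) - 1 / A(1 / a, 1 / b)) < 1e-12
```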
Problem 9. Decide whether the mean L (defined in (45)) is a distance mean.

Quasi-Arithmetic Means
A k-dimensional mean M on J is called a quasi-arithmetic mean if there is a continuous strictly monotone function φ from J to an interval I in R such that
M(x_1, ..., x_k) = φ^{−1}((φ(x_1) + ⋯ + φ(x_k))/k)
for all x_j in J. We have seen that the geometric and harmonic means are quasi-arithmetic and concluded that they are distance means. To see that L is not quasi-arithmetic, we observe that the (two-dimensional) arithmetic mean, and hence any quasi-arithmetic mean M, satisfies the elegant functional equation
M(M(M(a, b), b), M(M(a, b), a)) = M(a, b) (47)
for all a, b > 0. However, a quick experimentation with a random pair (a, b) shows that (47) is not satisfied by L. This shows that L is not quasi-arithmetic, but it does not tell us whether L is a distance mean, and hence does not answer Problem 9.
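The "quick experimentation" is easy to reproduce. The following added sketch checks that the arithmetic mean satisfies (47) while the logarithmic mean L visibly fails it:

```python
import math

def A(x, y):
    return (x + y) / 2

def L(x, y):
    """Logarithmic mean, x != y."""
    return (x - y) / (math.log(x) - math.log(y))

def eq47_sides(M, a, b):
    """Both sides of M(M(M(a,b),b), M(M(a,b),a)) = M(a,b)."""
    m = M(a, b)
    return M(M(m, b), M(m, a)), m

lhs, rhs = eq47_sides(A, 1.0, 10.0)
assert abs(lhs - rhs) < 1e-12      # quasi-arithmetic: (47) holds
lhs, rhs = eq47_sides(L, 1.0, 10.0)
assert abs(lhs - rhs) > 1e-3       # L violates (47), so L is not quasi-arithmetic
```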
The functional equation (47) is a weaker form of the functional equation
M(M(a, b), M(c, d)) = M(M(a, c), M(b, d))
for all a, b, c, d > 0. This condition, together with the assumption that M is strictly increasing in each variable, characterizes two-dimensional quasi-arithmetic means; see [8, Theorem 1]. A thorough discussion of quasi-arithmetic means can be found in [3, 8].
Problem 10. Decide whether a mean M that satisfies the functional equation (47) (together with any necessary smoothness conditions) is necessarily a quasi-arithmetic mean.

Deviation Means
Deviation means were introduced in [9] and were further investigated in [10]. They are defined as follows.
A real-valued function E = E(x, t) on R^2 is called a deviation if E(x, x) = 0 for all x and if E(x, t) is a strictly decreasing continuous function of t for every x. If E is a deviation, and if x_1, ..., x_n are given, then the E-deviation mean of x_1, ..., x_n is defined to be the unique zero of
E(x_1, t) + ⋯ + E(x_n, t). (49)
It is direct to see that (49) has a unique zero and that this zero does indeed define a mean.
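Since the sum in (49) is strictly decreasing in t and changes sign on [min x_i, max x_i], the E-deviation mean can be located by bisection. A small added sketch:

```python
import math

def deviation_mean(xs, E, tol=1e-12):
    """Unique zero of t -> sum_i E(x_i, t), bisected over [min(xs), max(xs)]."""
    lo, hi = min(xs), max(xs)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum(E(x, mid) for x in xs) > 0:
            lo = mid    # sum still positive: the zero lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

# E(x,t) = x - t gives the arithmetic mean:
assert abs(deviation_mean([1.0, 2.0, 6.0], lambda x, t: x - t) - 3.0) < 1e-9
# E(x,t) = ln x - ln t gives the geometric mean:
g = deviation_mean([2.0, 8.0], lambda x, t: math.log(x) - math.log(t))
assert abs(g - 4.0) < 1e-9
```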
Problem 11. Characterize deviation means and explore their exact relationship with distance means.
If E is a deviation, then (following [11]) one may define d_E by
d_E(x, t) = ∫_t^x E(x, s) ds.
Then d_E(x, t) ≥ 0 and d_E(x, t) is a strictly convex function in t for every x. The E-deviation mean of x_1, ..., x_n is nothing but the unique value of t at which d_E(x_1, t) + ⋯ + d_E(x_n, t) attains its minimum. Thus if d_E happens to be symmetric, then d_E would be a distance, and the E-deviation mean would be the distance mean arising from the distance d_E.

Other Ways of Generating New Means
If f and g are differentiable on an open interval J, and if a < b are points in J such that g(a) ≠ g(b), then there exists, by Cauchy's mean value theorem, a point c in (a, b) such that
(f(a) − f(b))/(g(a) − g(b)) = f'(c)/g'(c).
If f and g are such that c is unique for every a and b, then we call c the Cauchy mean of a and b corresponding to the functions f and g, and we denote it by C_{f,g}(a, b).
Another natural way of defining means is to take a continuous function f that is strictly monotone on J, and to define the mean of a, b ∈ J, a ≠ b, to be the unique point c in (a, b) such that
f(c) = (1/(b − a)) ∫_a^b f(t) dt. (53)
We call c the mean value (mean) of a and b corresponding to f, and we denote it by V_f(a, b).
Clearly, if F is an antiderivative of f, then (53) can be written as
F'(c) = (F(b) − F(a))/(b − a).
Thus V_f(a, b) = C_{F,ι}(a, b), where ι is the identity function.
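For a concrete instance (an added illustration), take f = exp, for which (53) asks for the point c with e^c equal to the average value of e^t over [a, b]; bisection works because f is increasing:

```python
import math

def V_exp(a, b, tol=1e-12):
    """Mean value mean V_f(a,b) for f = exp: solve exp(c) = avg of exp on [a,b]."""
    avg = (math.exp(b) - math.exp(a)) / (b - a)
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if math.exp(mid) < avg:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

c = V_exp(0.0, 1.0)
assert abs(math.exp(c) - (math.e - 1)) < 1e-9   # f(c) equals the average value
assert 0.0 < c < 1.0                            # internality
```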
For more on these two families of means, the reader is referred to [12] and [13], and to the references therein.
In contrast to the attitude of thinking of the mean as the number that minimizes a certain function, there is what one may call the Chisini attitude, which we now describe. A function f on J^n may be called a Chisini function if and only if the equation
f(x, ..., x) = f(x_1, ..., x_n)
has a unique solution x = x_0 ∈ [x_1, x_n] for every ordered data set (x_1, ..., x_n) in J. This unique solution is called the Chisini mean associated to f. In Chisini's own words, x is said to be the mean of n numbers x_1, ..., x_n with respect to a problem in which a function f(x_1, ..., x_n) of them is of interest, if the function assumes the same value when all the x_h are replaced by the mean value x: f(x_1, ..., x_n) = f(x, ..., x); see [14, page 256] and [1]. Examples of such Chisini means that arise in geometric configurations can be found in [15].
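For example, taking f(x_1, ..., x_n) = x_1^2 + ⋯ + x_n^2 in Chisini's prescription forces n·x^2 = Σ x_i^2, so the associated Chisini mean is the quadratic (root-mean-square) mean. A small added sketch that solves f(x, ..., x) = f(x_1, ..., x_n) by bisection (valid here because f(x, ..., x) is increasing in x for nonnegative data):

```python
import math

def chisini_mean(xs, f, tol=1e-12):
    """Solve f(x,...,x) = f(x_1,...,x_n) for x in [min(xs), max(xs)]."""
    target = f(xs)
    n = len(xs)
    lo, hi = min(xs), max(xs)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f([mid] * n) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

f = lambda v: sum(x * x for x in v)
m = chisini_mean([1.0, 2.0, 6.0], f)
assert abs(m - math.sqrt((1 + 4 + 36) / 3)) < 1e-9   # the quadratic mean
```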
Problem 12. Investigate how the families of distance, deviation, Cauchy, mean value, and Chisini means are related.

Internality Tests
According to the definition of a mean, all that is required of a function M : J^n → J to be a mean is to satisfy the internality property
min{x_1, ..., x_n} ≤ M(x_1, ..., x_n) ≤ max{x_1, ..., x_n} (55)
for all x_j ∈ J. However, one may ask whether it is sufficient, for certain types of functions M, to verify (55) for a finite, preferably small, number of well-chosen n-tuples. This question is inspired by certain elegant theorems in the theory of copositive forms, which we summarize below.
12.1. Copositivity Tests for Quadratic and Cubic Forms. By a (real) form in n variables, we shall always mean a homogeneous polynomial F = F(x_1, ..., x_n) in the indeterminates x_1, ..., x_n having coefficients in R. When the degree d of a form F is to be emphasized, we call F a d-form. Forms of degrees 1, 2, 3, 4, and 5 are referred to as linear, quadratic, cubic, quartic, and quintic forms, respectively.
The set of all d-forms in n variables is a vector space (over R) that we shall denote by F_n^(d). It may turn out to be an interesting exercise to prove that the set
{N_1^{a_1} N_2^{a_2} ⋯ N_d^{a_d} : a_j ≥ 0, a_1 + 2a_2 + ⋯ + d a_d = d}
is a basis of the subspace of symmetric d-forms, where N_j is the Newton polynomial defined by
N_j = x_1^j + ⋯ + x_n^j. (57)
The statement above is quite easy to prove in the special case d ≤ 3, and this is the case we are interested in in this paper. We also discard the trivial case n = 1 and assume always that n ≥ 2.
Symmetric linear forms can be written as aN_1, and they are not worth much investigation. Symmetric quadratic forms can be written as
aN_1^2 + bN_2.
Symmetric cubic and quartic forms can be written, respectively, as
aN_1^3 + bN_1N_2 + cN_3,
aN_1^4 + bN_1^2N_2 + cN_1N_3 + dN_2^2 + eN_4.
A form F = F(x_1, ..., x_n) is said to be copositive if F(x_1, ..., x_n) ≥ 0 for all x_j ≥ 0. Copositive forms arise in the theory of inequalities and are studied in [14] (and in references therein). One of the interesting questions that one may ask about forms pertains to algorithms for deciding whether a given form is copositive. This problem, in full generality, is still open. However, for symmetric quadratic and cubic forms, we have the following satisfactory answers.
Theorem 2. Let F = F(x_1, ..., x_n) be a real symmetric form in any number n ≥ 2 of variables. Let v_n^(k), 1 ≤ k ≤ n, be the n-tuple whose first k coordinates are 1's and whose remaining coordinates are 0's:
v_n^(k) = (1, ..., 1, 0, ..., 0). (61)
(i) If F is quadratic, then F is copositive if and only if F ≥ 0 at the two test n-tuples v_n^(1) and v_n^(n).
(ii) If F is cubic, then F is copositive if and only if F ≥ 0 at the n test n-tuples v_n^(k), 1 ≤ k ≤ n. (62)
Part (ii) was proved in [17] for n ≤ 3 and in [18] for all n. Two very short and elementary inductive proofs are given in [19].
It is worth mentioning that the n test n-tuples in (61) do not suffice for establishing the copositivity of a quartic form even when n = 3. An example illustrating this that uses methods from [20] can be found in [19]. However, an algorithm for deciding whether a symmetric quartic form F in n variables is copositive, consisting in testing F at n-tuples of the type
(t, ..., t, 1, ..., 1), t ≥ 0,
is established in [21]. It is also proved there that if n = 3, then the same algorithm works for quintics but does not work for forms of higher degrees.
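For symmetric cubics, the finite test is trivial to implement. The following added sketch assumes the statement of Theorem 2(ii), namely that copositivity of a symmetric cubic is equivalent to nonnegativity at the n-tuples with k ones and n − k zeros:

```python
def cubic_copositivity_test(f, n):
    """Evaluate a symmetric cubic at (1,...,1,0,...,0) with k ones, k = 1..n."""
    return all(f([1.0] * k + [0.0] * (n - k)) >= 0 for k in range(1, n + 1))

# Schur's cubic, known to be copositive:
def schur(v):
    x, y, z = v
    return x*(x - y)*(x - z) + y*(y - x)*(y - z) + z*(z - x)*(z - y)

assert cubic_copositivity_test(schur, 3)

# A symmetric cubic that is negative at (1,1,1) fails the test:
def bad(v):
    x, y, z = v
    return x**3 + y**3 + z**3 - 4*x*y*z

assert not cubic_copositivity_test(bad, 3)
```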

12.2. Internality Tests for Means Arising from Symmetric Forms. Let F_n^(d) be the vector space of all real d-forms in n variables, and let N_j, 1 ≤ j ≤ d, be the Newton polynomials defined in (57). Means of the type
M = f^{1/d}, (64)
where f is a symmetric form of degree d (or a quotient of symmetric forms whose degrees differ by d), are clearly symmetric and 1-homogeneous, and they abound in the literature. These include the family of Gini means G_{r,s} defined in (8) (and hence the Lehmer and Hölder means). They also include the elementary symmetric polynomial and elementary symmetric polynomial ratio means defined earlier in (10).
In view of Theorem 2 of the previous section, it is tempting to ask whether the internality of a function M of the type described in (64) can be established by testing it at a finite set of test n-tuples. Positive answers for some special cases of (64), and for other related types, are given in the following theorem.
Parts (i) and (ii) are restatements of Theorems 3 and 5 in [16]. Part (iii) is proved in [22] in a manner that leaves a lot to be desired. Besides being rather clumsy, the proof works for n ≤ 4 only. The problem for n ≥ 5, together with other open problems, is listed in the next problem set.
Problem Set 13. Let L, Q, and C be real symmetric forms of degrees 1, 2, and 3, respectively, in n nonnegative variables.
(13-a) Prove or disprove that ∛C is internal if and only if it is internal at the n test n-tuples v_n^(k), 1 ≤ k ≤ n.
(13-b) Find, or prove the nonexistence of, a finite set T of test n-tuples such that the internality of Q/L at the tuples in T guarantees its internality at all nonnegative n-tuples.
(13-c) Find, or prove the nonexistence of, a finite set T of test n-tuples such that the internality of L ± √Q at the n-tuples in T guarantees its internality at all nonnegative n-tuples.
Problem (13-b) is open even for n = 2. In Section 6 of [15], it is shown that the two pairs (1, 0) and (1, 1) do not suffice as test pairs.
As for Problem (13-c), we refer the reader to [23], where means of the type L ± √Q were considered. It is proved in Theorem 2 there that when Q has the special form ∏_{1≤i<j≤n}(x_i − x_j)^2, then L ± √Q is internal if and only if it is internal at the two test n-tuples v_n^(n) = (1, 1, ..., 1) and v_n^(n−1) = (1, 1, ..., 1, 0). In the general case, sufficient and necessary conditions for the internality of L ± √Q, in terms of the coefficients of L and Q, are found in [23, Theorem 3]. However, it is not obvious whether these conditions can be rewritten in terms of test n-tuples in the manner done in Theorem 3.

Extension of Means, Concordance of Means
The two-dimensional arithmetic mean A^(2) defined by
A^(2)(x_1, x_2) = (x_1 + x_2)/2
can be extended to any dimension k by setting
A^(k)(x_1, ..., x_k) = (x_1 + ⋯ + x_k)/k. (68)
Although very few people would disagree on this, nobody can possibly give a mathematically sound justification of the feeling that the definition in (68) is the only (or even the best) definition that makes the sequence A^(k) of means harmonious or concordant. This does not seem to be an acceptable definition of the notion of concordance. In a private communication several years ago, Professor Zsolt Páles told me that Kolmogorov suggested calling a sequence M^(k) of means on J, where M^(k) is k-dimensional, concordant if for every m and k and every x_1, ..., x_m, y_1, ..., y_k in J, we have
M^(m+k)(x_1, ..., x_m, y_1, ..., y_k) = M^(2)(M^(m)(x_1, ..., x_m), M^(k)(y_1, ..., y_k)).
(69) He also told me that such a definition is too restrictive and seems to confirm concordance in the case of the quasi-arithmetic means only.
Problem 14. Suggest a definition of concordance, and test it on sequences of means that you feel are concordant. In particular, test it on the existing generalizations, to higher dimensions, of the logarithmic mean L defined in (45).

Distance Functions in Topology
Distance functions, which are not necessarily metrics, appeared early in the literature on topology. Given a distance function d on any set X, one may define the open ball B(p, ε) in the usual manner, and then one may declare a subset U ⊆ X open if it contains, for every p ∈ U, an open ball B(p, ε) with ε > 0. If d satisfies the triangle inequality, then one can proceed in the usual manner to create a topology. However, for a general distance d, this need not be the case, and distances that do give rise to a coherent topology in the usual manner are called semimetrics; they are investigated and characterized in [24-29]. Clearly, these are the distances d for which the family {B(p, ε) : ε > 0} of open balls centered at p forms a local base at p for every p in X.

Problem 3.
Let d_t be the distance defined by d_t(x, y) = |x − y|^t, t > 1, and let the associated deviation defined in (27) be denoted by D_t. Is D_t monotone with respect to D_2 for any t ≠ 2, in the sense that D_t(A) ≤ D_t(B) if and only if D_2(A) ≤ D_2(B) for all data sets A and B?