Concavity of the squared sum of square roots

The p-norm of a vector x\in \mathbb{R}^n is defined to be:

\displaystyle{\Vert x \Vert_p = \left(\sum_{i=1}^n |x_i|^p\right)^{1/p}}.

If p \ge 1, then the p-norm is convex. When 0<p<1, this function is not convex and actually concave when all the entries of x are non-negative. On a recent exam for the course Convex Optimization, we were asked to prove this when p = 1/2. In this special case, the mathematics simplifies nicely.

When p = 1/2, the p-norm can be described as the squared sum of square roots. Specifically,

\displaystyle{\Vert x \Vert_{1/2} = \left(\sum_{i=1}^n \sqrt{|x_i|} \right)^2}.

Note that we can expand the square and rewrite the function as follows

\displaystyle{\Vert x \Vert_{1/2} = \sum_{i=1}^n\sum_{k=1}^n\sqrt{\left|x_i\right|} \sqrt{|x_k|} =\sum_{i=1}^n\sum_{k=1}^n \sqrt{|x_ix_k|}}.

If we restrict to x \in \mathbb{R}^n with x_i \ge 0 for all i, then this function simplifies to

\displaystyle{\Vert x \Vert_{1/2} =\sum_{i=1}^n\sum_{k=1}^n \sqrt{x_ix_k}},

which is a sum of geometric means. The geometric mean function (u,v) \mapsto \sqrt{uv} is concave when u,v \ge 0. This can be proved by calculating the Hessian of this function and verifying that it is negative semi-definite.

From this, we can conclude that each function x \mapsto \sqrt{x_ix_k} is also concave. This is because x \mapsto \sqrt{x_ix_k} is a linear function followed by a concave function. Finally, any sum of concave functions is also concave and thus \Vert x \Vert_{1/2} is concave.

Hellinger distance

A similar argument can be used to show that Hellinger distance is a convex function. Hellinger distance, d_H(\cdot,\cdot) is defined on pairs of probability distributions p and q on a common set \{1,\ldots,n\}. For such a pair,

\displaystyle{d_H(p,q) = \sum_{i=1}^n \left(\sqrt{p_i}-\sqrt{q_i}\right)^2},

which certainly doesn’t look convex. However, we can expand the square and use the fact that p and q are probability distributions. This shows us that Helligener distance can be written as

\displaystyle{d_H(p,q) = \sum_{i=1}^n p_i - 2\sum_{i=1}^n \sqrt{p_iq_i}+\sum_{i=1}^n q_i = 2 - 2\sum_{i=1}^n\sqrt{p_iq_i}}.

Again, each function (p,q) \mapsto \sqrt{p_iq_i} is concave and so the negative sum of such functions is convex. Thus, d_H(p,q) is convex.

The course

As a final comment, I’d just like to say how much I am enjoying the class. Prof. Stephen Boyd is a great lecturer and we’ve seen a wide variety of applications in the class. I’ve recently been reading a bit of John D Cook’s blog and agree with all he says about the course here.