Analysis Theorems

2 Inequalities

  • Theorem 2.1 (Cauchy-Schwarz Inequality). In an inner product space, \(|(x,y)| \leq \Vert x \Vert \Vert y \Vert \) with equality holding iff \(x\) and \(y\) are linearly dependent.

  • Proof. If \(x = 0\) the statement is trivial, so assume \(x \neq 0\). After multiplying \(x\) by an element of \(S^1\), we may assume \((x,y)\) is real. Consider \(z = y - \frac {(x,y)}{\Vert x\Vert ^2}x\), the projection of \(y\) onto the orthogonal complement of \(x\). Indeed \(x,z\) are orthogonal, as \((z,x) = (y,x) - \frac {(x,y)}{\Vert x\Vert ^2}(x,x) = 0\). Then we have:

    \[ 0 \leq (z,z) = (z,y) - \frac {(x,y)}{\Vert x\Vert ^2}(z,x) = (z,y) = (y,y) - \frac {(x,y)}{\Vert x \Vert ^2}(x,y) \]

    which after rearrangement is what we want. Note that equality above holds iff \(z = 0\), which happens iff \(x\) and \(y\) are linearly dependent.
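The inequality is easy to sanity-check numerically. Here is a minimal Python sketch (assuming NumPy is available, and not part of the proof) using the standard inner product on complex vectors; it also checks that equality is attained for linearly dependent vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

def inner(x, y):
    # standard inner product on complex vectors
    # (np.vdot conjugates its first argument)
    return np.vdot(x, y)

for _ in range(1000):
    x = rng.normal(size=5) + 1j * rng.normal(size=5)
    y = rng.normal(size=5) + 1j * rng.normal(size=5)
    assert abs(inner(x, y)) <= np.linalg.norm(x) * np.linalg.norm(y) + 1e-12

# equality for linearly dependent vectors: y = c*x
x = rng.normal(size=5) + 1j * rng.normal(size=5)
y = (2 - 3j) * x
assert np.isclose(abs(inner(x, y)), np.linalg.norm(x) * np.linalg.norm(y))
```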

  • Theorem 2.2 (AM-GM Inequality). Let \(x_1, \dots ,x_n\) be non-negative reals. Then \(\prod _1^nx_i\leq \big (\frac {\sum _1^nx_i}{n}\big )^n\) with equality holding iff \(x_1 = \dots = x_n\).

  • Proof. We suppose some of the \(x_i\) are not equal and show the strict inequality by induction on \(n\). If \(\mu \) denotes \(\frac {\sum _1^nx_i}{n}\), then WLOG we may assume \(x_n>\mu >x_{n-1}\). Define \(y = x_n+x_{n-1}-\mu \) and note \(y\) is positive. Then by induction applied to \(y,x_1,\dots ,x_{n-2}\), whose mean is \(\mu \) since \(y+\sum _1^{n-2}x_i = (n-1)\mu \), we have

    \[ y\prod _1^{n-2}x_i\leq \big (\frac {y+\sum _1^{n-2}x_i}{n-1}\big )^{n-1} = \mu ^{n-1} \]

    Multiplying by \(\mu \), we get

    \[ \mu y\prod _1^{n-2}x_i\leq \mu ^{n} \]

    so it suffices to show \(\mu y>x_nx_{n-1}\) (when some \(x_i = 0\) for \(i \leq n-2\), the claim is immediate since \(\mu > 0\)), but this is true as

    \[ y\mu -x_nx_{n-1} = (x_n+x_{n-1}-\mu )\mu -x_nx_{n-1} = (x_n-\mu )(\mu -x_{n-1})>0 \]
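Both the inequality and the smoothing step in the proof can be observed numerically; a minimal Python sketch (random data, not a proof):

```python
import random

def am(xs): return sum(xs) / len(xs)

def gm(xs):
    p = 1.0
    for x in xs:
        p *= x
    return p ** (1 / len(xs))

random.seed(0)
for _ in range(1000):
    xs = [random.uniform(0, 10) for _ in range(6)]
    assert gm(xs) <= am(xs) + 1e-9

# the smoothing step: replace a pair straddling mu by (mu, y);
# the mean is unchanged while the product strictly increases
xs = sorted(random.uniform(0, 10) for _ in range(6))
mu = am(xs)
y = xs[-1] + xs[0] - mu          # here xs[0] < mu < xs[-1]
smoothed = [mu, y] + xs[1:-1]
assert abs(am(smoothed) - mu) < 1e-9
assert mu * y > xs[-1] * xs[0]
```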

Jensen’s inequality is a powerful tool. To prove it in its measure-theoretic form, we need the notions of a convex function and a subderivative.

  • Definition 2.3. Let \(A \subset V\) be a convex subset of a real vector space. A function \(f: A \to \RR \) is convex if \(f(tx+(1-t)y) \leq tf(x)+(1-t)f(y)\) for all \(x,y \in A\) and \(t \in [0,1]\). It is strictly convex if the inequality is strict whenever \(x\neq y\) and \(t\neq 0,1\).

The following lemma is obvious.

  • Lemma 2.4. \(f\) is convex iff the union of its graph with the set of points lying above it (its epigraph) is convex.

  • Definition 2.5. Let \(f:\RR \to \RR \) be convex. A subderivative of \(f\) at \(p\) is a \(c \in \RR \) such that \(f(x)-f(p) \geq c(x-p)\) for all \(x\). The set of subderivatives of \(f\) at \(p\) is called the subdifferential.

  • Lemma 2.6. If \(f\) is convex, the subdifferential at \(p\) is \([a,b]\), where \(a = \lim _{x \to p^-}\frac {f(x)-f(p)}{x-p}\) and \(b = \lim _{x \to p^+}\frac {f(x)-f(p)}{x-p}\); moreover, these limits exist and \(a \leq b\). If \(f\) is strictly convex, then \(f(x)-f(y)> c(x-y)\) for \(x \neq y\) whenever \(c\) is a subderivative at \(y\).

  • Proof. WLOG, \(f(p)=p=0\). Setting \(y = 0\) in the definition of convexity, we get \(f(tx) \leq tf(x)\), so \(\frac {f(x)}{x}\) is increasing on \(x>0\) and on \(x<0\). Now by convexity again, \(2f(0) \leq f(\ee )+f(-\ee )\), so \(\frac {f(-\ee )}{-\ee }\leq \frac {f(\ee )}{\ee }\). Thus the limits \(a,b\) exist and are finite, and it is clear that \([a,b]\) is the subdifferential. Running through the proof for strictly convex \(f\) shows \(\frac {f(x)}{x}\) is strictly increasing, so that the strict inequality holds.
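The interval \([a,b]\) can be seen concretely with one-sided difference quotients. A small Python sketch (finite differences, so the limits are only approximated):

```python
def one_sided_slopes(f, p, h=1e-8):
    # approximate a = lim_{x -> p-} (f(x)-f(p))/(x-p) and
    #             b = lim_{x -> p+} (f(x)-f(p))/(x-p)
    a = (f(p - h) - f(p)) / (-h)
    b = (f(p + h) - f(p)) / h
    return a, b

# f(x) = |x| at p = 0: the subdifferential is all of [-1, 1]
print(one_sided_slopes(abs, 0.0))               # ~(-1.0, 1.0)

# f(x) = x^2 at p = 1: f is differentiable, so [a, b] collapses to {2}
print(one_sided_slopes(lambda x: x * x, 1.0))   # ~(2.0, 2.0)
```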

  • Lemma 2.7. If \(f\) is \(\cC ^2\), with \(f''\geq 0\), it is convex. If \(f''>0\), it is strictly convex.

  • Proof. By Taylor’s Theorem, one computes

    \[tf(x)+(1-t)f(y)-f(tx+(1-t)y) = t(f(x)-f(tx+(1-t)y))+(1-t)(f(y)-f(tx+(1-t)y))\]

    \[=tf''(c)(1-t)^2(y-x)^2/2+(1-t)f''(c')t^2(y-x)^2/2\]

    where \(c\) lies between \(x\) and \(tx+(1-t)y\), and \(c'\) between \(y\) and \(tx+(1-t)y\); this satisfies the correct inequalities by assumption.
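For instance, \(f = \exp \) has \(f'' = \exp > 0\), so the lemma predicts strict convexity; a quick numerical check of the gap in Definition 2.3:

```python
import math
import random

random.seed(1)
f = math.exp    # f'' = exp > 0, so Lemma 2.7 predicts strict convexity

for _ in range(1000):
    x, y = random.uniform(-3, 3), random.uniform(-3, 3)
    t = random.random()
    gap = t * f(x) + (1 - t) * f(y) - f(t * x + (1 - t) * y)
    assert gap >= -1e-12                        # convexity
    if abs(x - y) > 0.1 and 0.1 < t < 0.9:
        assert gap > 0                          # strict convexity, interior t
```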

  • Theorem 2.8 (Jensen’s Inequality). Let \((\Omega ,\mu )\) be a probability space, \(f: \RR \to \RR \) a convex function, and \(g\) a \(\mu \)-integrable function. Then \(f(\int _\Omega g d\mu ) \leq \int _\Omega f \circ g d\mu \). If \(f\) is strictly convex, equality holds iff \(g\) is constant on a set of measure \(1\).

  • Proof. Define \(x_0 = \int _\Omega gd\mu \). By Lemma 2.6, choosing a subderivative \(c\) of \(f\) at \(x_0\), there are \(a,b\) so that \(ax+b\leq f(x)\) for all \(x\) and \(ax_0+b = f(x_0)\). Then \(f(\int _\Omega gd\mu ) = f(x_0) = ax_0+b = \int _\Omega (ag+b)d\mu \leq \int _\Omega f\circ g d\mu \). Equality then holds iff \(ag+b = f\circ g\) outside a set of measure 0. If \(f\) is strictly convex, by Lemma 2.6 this holds iff \(g\) takes the constant value \(x_0\) on a set of measure \(1\).
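On a finite probability space the theorem is a statement about weighted sums, which makes it easy to test; a minimal sketch (with \(f = \exp \), which is strictly convex):

```python
import numpy as np

rng = np.random.default_rng(2)

# a finite probability space: 10 points with weights mu_i summing to 1
mu = rng.random(10)
mu /= mu.sum()
g = rng.normal(size=10)

f = np.exp                          # convex (in fact strictly convex)
lhs = f((mu * g).sum())             # f( integral of g )
rhs = (mu * f(g)).sum()             # integral of f o g
assert lhs <= rhs + 1e-12

# equality when g is constant almost everywhere
g_const = np.full(10, 1.7)
assert np.isclose(f((mu * g_const).sum()), (mu * f(g_const)).sum())
```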

Now that we have this inequality, we can prove many inequalities more quickly, especially exploiting the convexity/concavity of functions like \(\log \).

  • Definition 2.9. The weighted power mean with exponent \(p\) is the function \(M_p(x_1,\dots ,x_n) = (\sum w_ix_i^p)^{1/p}\), where the \(x_i\) are positive reals and the \(w_i\) are non-negative weights summing to 1.

In particular, \(M_\infty \) is the maximum, \(M_2\) the square mean, \(M_1\) the arithmetic mean, \(M_0\) the geometric mean, \(M_{-1}\) the harmonic mean, and \(M_{-\infty }\) the minimum, where \(M_0\) and \(M_{\pm \infty }\) are defined as limits. The only identification which isn’t obvious is \(M_0\); to see it, by L’Hôpital’s Rule and continuity of the exponential function,

\[\lim _{p \to 0}\big (\sum w_ix_i^p\big )^{\frac {1}{p}}=\lim _{p \to 0}e^{\frac {\log \big (\sum w_ix_i^p\big )}{p}} =e^{\lim _{p \to 0}\frac {\log \big (\sum w_ix_i^p\big )}{p}} = e^{\lim _{p \to 0}\frac {\sum w_ix_i^p \log (x_i)}{\sum w_ix_i^p}}\]

\[ = e^{\sum w_i\log (x_i)} = \prod x_i^{w_i} \]
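This limit is also visible numerically. A small sketch (the power_mean helper below is ours, not a standard library function):

```python
import numpy as np

def power_mean(x, w, p):
    x, w = np.asarray(x, float), np.asarray(w, float)
    if p == 0:                                   # geometric mean (the limit)
        return np.exp((w * np.log(x)).sum())
    return ((w * x ** p).sum()) ** (1 / p)

x = [1.0, 2.0, 8.0]
w = [0.2, 0.3, 0.5]
for p in [0.1, 0.01, 0.001, 0.0001]:
    print(p, power_mean(x, w, p))                # approaches the p = 0 value
print(0, power_mean(x, w, 0))                    # the geometric mean
```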

The following generalizes Theorem 2.2.

  • Theorem 2.10 (Power Mean Inequality). Let \(x_i\) be positive reals, \(w_i\) weights. Then \(p < q \implies M_p \leq M_q\), with equality holding iff the \(x_i\) with positive weights are equal.

  • Proof. First we prove the inequality in the cases \(p=0\) or \(q=0\). By Jensen’s inequality, using concavity of \(\log \), \(\log \prod x_i^{w_i} = \sum \frac {w_i}{q}\log x_i^q \leq \frac {\log (\sum w_ix_i^q)}{q}\) for \(q>0\), which is the \(p=0\) case; the \(q=0\) case is similar. Since the case \(p<0<q\) then follows by comparing both sides with \(M_0\), it suffices to prove the inequality when \(pq>0\), and the \(p>0\) and \(p<0\) cases are equivalent since \(\big (\sum w_ix_i^{-p}\big )^{\frac {1}{-p}} = \frac {1}{\big (\sum w_i(\frac {1}{x_i})^{p}\big )^\frac {1}{p}}\). So assume \(0<p<q\), and note \(x^{\frac {p}{q}}\) is concave, so \((\sum w_ix_i^p)^\frac {1}{p} = (\sum w_i(x_i^q)^{\frac {p}{q}})^\frac {1}{p} \leq (\sum w_ix_i^q)^{\frac {p}{q}\frac {1}{p}} =(\sum w_ix_i^q)^{\frac {1}{q}}\).
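Monotonicity in \(p\), together with the equality case, can be checked numerically; a self-contained sketch (repeating the power_mean helper from the previous sketch):

```python
import numpy as np

def power_mean(x, w, p):
    if p == 0:                                   # geometric mean (the limit)
        return np.exp((w * np.log(x)).sum())
    return ((w * x ** p).sum()) ** (1 / p)

rng = np.random.default_rng(3)
ps = [-4.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 4.0]
for _ in range(200):
    x = rng.uniform(0.1, 10.0, size=5)
    w = rng.random(5)
    w /= w.sum()
    means = [power_mean(x, w, p) for p in ps]
    assert all(a <= b + 1e-9 for a, b in zip(means, means[1:]))

# equality: when all x_i are equal, every power mean coincides
x = np.full(5, 3.7)
w = np.full(5, 0.2)
assert np.allclose([power_mean(x, w, p) for p in ps], 3.7)
```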

We can also quickly get the Hölder inequality.

  • Lemma 2.11 (Young’s inequality). If \(x,y>0\) and \(p,q>1\) with \(\frac {1}{p}+\frac {1}{q}=1\), then \(xy \leq \frac {x^p}{p}+\frac {y^q}{q}\).

  • Proof. We give two proofs. Jensen’s inequality gives \(\log (xy) = \frac {\log (x^p)} p + \frac {\log (y^q)} q \leq \log (\frac {x^p}{p}+\frac {y^q}{q})\). Alternatively, it is equivalent to prove \(x^\frac {1}{p}y^\frac {1}{q}\leq \frac {x}{p} + \frac {y}{q}\), which is homogeneous, so we may assume \(y = 1\); in this case, optimizing over \(x\) (the maximum of \(x^{\frac {1}{p}}-\frac {x}{p}\) is \(\frac {1}{q}\), attained at \(x=1\)) gives the inequality.
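A quick numerical check of the lemma, including the equality case \(x^p = y^q\) (a standard fact, easy to verify from the optimization above):

```python
import random

random.seed(4)
p = 3.0
q = p / (p - 1)                    # conjugate exponent: 1/p + 1/q = 1

for _ in range(1000):
    x, y = random.uniform(0.01, 10), random.uniform(0.01, 10)
    assert x * y <= x ** p / p + y ** q / q + 1e-12

# equality exactly when x**p == y**q
x = 1.7
y = x ** (p / q)                   # then y**q = x**p
assert abs(x * y - (x ** p / p + y ** q / q)) < 1e-9
```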

  • Theorem 2.12 (Hölder Inequality). Let \((\Omega ,\Sigma ,\mu )\) be a measure space, \(\frac {1}{p}+\frac {1}{q}=1\) and \(f,g\) measurable functions. Then \(\Vert fg\Vert _1 \leq \Vert f\Vert _p\Vert g\Vert _q\).

  • Proof. By Lemma 2.11,

    \[\Vert fg\Vert _1 = \int _\Omega |fg|d\mu =\int _\Omega t|f|t^{-1}|g|d\mu \leq \int _\Omega t^p\frac {|f|^p}{p}d\mu + \int _\Omega t^{-q}\frac {|g|^q}{q}d\mu = \frac {t^p}{p}\Vert f\Vert _p^p + \frac {t^{-q}}{q}\Vert g\Vert _q^q\]

    Optimizing over \(t\) (we may assume \(0 < \Vert f\Vert _p, \Vert g\Vert _q < \infty \), the remaining cases being trivial) gives \(t = \frac {\Vert g \Vert _q^{\frac {q}{p+q}}}{\Vert f\Vert _p^{\frac {p}{p+q}}}\), and plugging this in and simplifying (using \(p+q = pq\)) yields the inequality.
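On a finite measure space (counting measure on \(n\) points) the inequality reads as a statement about sums; a minimal numerical check:

```python
import numpy as np

rng = np.random.default_rng(5)
p = 3.0
q = p / (p - 1)                    # conjugate exponent

# counting measure on 100 points: integrals are plain sums
f = rng.normal(size=100)
g = rng.normal(size=100)

lhs = np.abs(f * g).sum()                                   # ||fg||_1
rhs = ((np.abs(f) ** p).sum() ** (1 / p)
       * (np.abs(g) ** q).sum() ** (1 / q))                 # ||f||_p ||g||_q
assert lhs <= rhs + 1e-9
```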

The Hölder inequality can be used to establish that \(L^p\) spaces are normed vector spaces; the triangle inequality is the hard part of the proof.

  • Lemma 2.13. \(|x+y|^p \leq 2^{p-1}(|x|^p+|y|^p)\) for \(p>1\).

  • Proof. \(f(t)=t^p\) is convex on \([0,\infty )\), so \(|\frac {x+y}{2}|^p\leq \big (\frac {|x|+|y|}{2}\big )^p\leq \frac 1 2 (|x|^p+|y|^p)\); multiplying by \(2^p\) gives the claim.

  • Theorem 2.14 (Minkowski’s Inequality). For \(p \geq 1\), \(\Vert f+g\Vert _p \leq \Vert f \Vert _p+\Vert g \Vert _p\).

  • Proof. We give two proofs, both of which use the Hölder inequality. By Lemma 2.13, \(\Vert f+g\Vert _p\) is finite if \(\Vert f\Vert _p\) and \(\Vert g\Vert _p\) are. Then by the Hölder inequality,

    \[\Vert f+g \Vert _p^p = \int |f+g|^p \leq \int |f||f+g|^{p-1}+|g||f+g|^{p-1} \]

    \[ \leq (\Vert f\Vert _p +\Vert g\Vert _p)\Vert (f+g)^{p-1}\Vert _{\frac {p}{p-1}} = (\Vert f\Vert _p +\Vert g\Vert _p)\Vert f+g\Vert _p^{p-1}\]

    and dividing by \(\Vert f+g\Vert _p^{p-1}\) (the case \(\Vert f+g\Vert _p = 0\) being trivial) gives the result.

    For the second proof, we claim \(\Vert f\Vert _p = \sup _{\Vert h\Vert _q=1}\Vert fh\Vert _1\), from which the theorem follows from the triangle inequality for \(L^1\). The \(\geq \) follows from the Hölder inequality, and the \(\leq \) follows from setting \(h\) to be \(|f|^{p-1}\) divided by its \(L^q\)-norm. Note that this proof also shows the duality between \(L^p\) and \(L^q\) (indeed they are duals as Banach spaces).
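Both the triangle inequality and the dual characterization can be checked numerically with counting measure; a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(6)
p = 3.0
q = p / (p - 1)

def norm(f, r):
    return (np.abs(f) ** r).sum() ** (1 / r)    # L^r norm, counting measure

f = rng.normal(size=100)
g = rng.normal(size=100)
assert norm(f + g, p) <= norm(f, p) + norm(g, p) + 1e-9

# the dual characterization: ||f||_p = sup_{||h||_q = 1} ||fh||_1,
# with the sup attained at h proportional to |f|^(p-1)
h = np.abs(f) ** (p - 1)
h /= norm(h, q)
assert np.isclose(np.abs(f * h).sum(), norm(f, p))
```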