Analysis Theorems

2 Inequalities

  • Theorem 2.1 (Cauchy-Schwarz Inequality). In an inner product space, \(|(x,y)| \leq \Vert x \Vert \Vert y \Vert \) with equality holding iff \(x\) and \(y\) are linearly dependent.

  • Proof. If \(x = 0\) the statement is trivial, so assume \(x \neq 0\). After multiplying \(x\) by an element of \(S^1\), we may assume \((x,y)\) is real. Consider \(z = y - \frac {(x,y)}{\Vert x\Vert ^2}x\), the projection of \(y\) onto the orthogonal complement of \(x\). Indeed \(x,z\) are orthogonal, as \((z,x) = (y,x) - \frac {(x,y)}{\Vert x\Vert ^2}(x,x) = 0\). Then we have:

    \[ 0 \leq (z,z) = (z,y) - \frac {(x,y)}{\Vert x\Vert ^2}(z,x) = (z,y) = (y,y) - \frac {(x,y)}{\Vert x \Vert ^2}(x,y) \]

    which after rearrangement is what we want. Note that equality above holds iff \(z = 0\), which happens iff \(x\) and \(y\) are linearly dependent.
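The inequality is easy to sanity-check numerically. Here is a minimal Python sketch (assuming NumPy is available, and not part of the proof) using the standard inner product on complex vectors; it also checks that equality is attained for linearly dependent vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

def inner(x, y):
    # standard inner product on complex vectors
    # (np.vdot conjugates its first argument)
    return np.vdot(x, y)

for _ in range(1000):
    x = rng.normal(size=5) + 1j * rng.normal(size=5)
    y = rng.normal(size=5) + 1j * rng.normal(size=5)
    assert abs(inner(x, y)) <= np.linalg.norm(x) * np.linalg.norm(y) + 1e-12

# equality for linearly dependent vectors: y = c*x
x = rng.normal(size=5) + 1j * rng.normal(size=5)
y = (2 - 3j) * x
assert np.isclose(abs(inner(x, y)), np.linalg.norm(x) * np.linalg.norm(y))
```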

  • Theorem 2.2 (AM-GM Inequality). Let \(x_1, \dots ,x_n\) be non-negative reals. Then \(\prod _1^nx_i\leq \big (\frac {\sum _1^nx_i}{n}\big )^n\) with equality holding iff \(x_1 = \dots = x_n\).

  • Proof. We suppose some of the \(x_i\) are not equal and show the strict inequality by induction on \(n\). If \(\mu \) denotes \(\frac {\sum _1^nx_i}{n}\), then WLOG we may assume \(x_n>\mu >x_{n-1}\). Define \(y = x_n+x_{n-1}-\mu \) and note \(y\) is positive. Then by induction applied to \(y,x_1,\dots ,x_{n-2}\), whose mean is \(\mu \) since \(y+\sum _1^{n-2}x_i = (n-1)\mu \), we have

    \[ y\prod _1^{n-2}x_i\leq \big (\frac {y+\sum _1^{n-2}x_i}{n-1}\big )^{n-1} = \mu ^{n-1} \]

    Multiplying by \(\mu \), we get

    \[ \mu y\prod _1^{n-2}x_i\leq \mu ^{n} \]

    so it suffices to show \(\mu y>x_nx_{n-1}\) (when some \(x_i = 0\) for \(i \leq n-2\), the claim is immediate since \(\mu > 0\)), but this is true as

    \[ y\mu -x_nx_{n-1} = (x_n+x_{n-1}-\mu )\mu -x_nx_{n-1} = (x_n-\mu )(\mu -x_{n-1})>0 \]
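Both the inequality and the smoothing step in the proof can be observed numerically; a minimal Python sketch (random data, not a proof):

```python
import random

def am(xs): return sum(xs) / len(xs)

def gm(xs):
    p = 1.0
    for x in xs:
        p *= x
    return p ** (1 / len(xs))

random.seed(0)
for _ in range(1000):
    xs = [random.uniform(0, 10) for _ in range(6)]
    assert gm(xs) <= am(xs) + 1e-9

# the smoothing step: replace a pair straddling mu by (mu, y);
# the mean is unchanged while the product strictly increases
xs = sorted(random.uniform(0, 10) for _ in range(6))
mu = am(xs)
y = xs[-1] + xs[0] - mu          # here xs[0] < mu < xs[-1]
smoothed = [mu, y] + xs[1:-1]
assert abs(am(smoothed) - mu) < 1e-9
assert mu * y > xs[-1] * xs[0]
```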

Jensen’s inequality is a powerful tool. To prove it in its measure-theoretic form, we need the notions of a convex function and a subderivative.

  • Definition 2.3. Let \(A \subset V\) be a convex subset of a real vector space. A function \(f: A \to \RR \) is convex if \(f(tx+(1-t)y) \leq tf(x)+(1-t)f(y)\) for all \(x,y \in A\) and \(t \in [0,1]\). It is strictly convex if the inequality is strict whenever \(x\neq y\) and \(t\neq 0,1\).

The following lemma is obvious.

  • Lemma 2.4. \(f\) is convex iff the union of its graph with the set of points lying above it (its epigraph) is convex.

  • Definition 2.5. Let \(f:\RR \to \RR \) be convex. A subderivative of \(f\) at \(p\) is a \(c \in \RR \) such that \(f(x)-f(p) \geq c(x-p)\) for all \(x\). The set of subderivatives of \(f\) at \(p\) is called the subdifferential.

  • Lemma 2.6. If \(f\) is convex, the subdifferential at \(p\) is \([a,b]\), where \(a = \lim _{x \to p^-}\frac {f(x)-f(p)}{x-p}\) and \(b = \lim _{x \to p^+}\frac {f(x)-f(p)}{x-p}\); moreover, these limits exist and \(a \leq b\). If \(f\) is strictly convex, then \(f(x)-f(y)> c(x-y)\) for \(x \neq y\) whenever \(c\) is a subderivative at \(y\).

  • Proof. WLOG, \(f(p)=p=0\). Setting \(y = 0\) in the definition of convexity, we get \(f(tx) \leq tf(x)\), so \(\frac {f(x)}{x}\) is increasing on \(x>0\) and on \(x<0\). Now by convexity again, \(2f(0) \leq f(\ee )+f(-\ee )\), so \(\frac {f(-\ee )}{-\ee }\leq \frac {f(\ee )}{\ee }\). Thus the limits \(a,b\) exist and are finite, and it is clear that \([a,b]\) is the subdifferential. Running through the proof for strictly convex \(f\) shows \(\frac {f(x)}{x}\) is strictly increasing, so that the strict inequality holds.
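The interval \([a,b]\) can be seen concretely with one-sided difference quotients. A small Python sketch (finite differences, so the limits are only approximated):

```python
def one_sided_slopes(f, p, h=1e-8):
    # approximate a = lim_{x -> p-} (f(x)-f(p))/(x-p) and
    #             b = lim_{x -> p+} (f(x)-f(p))/(x-p)
    a = (f(p - h) - f(p)) / (-h)
    b = (f(p + h) - f(p)) / h
    return a, b

# f(x) = |x| at p = 0: the subdifferential is all of [-1, 1]
print(one_sided_slopes(abs, 0.0))               # ~(-1.0, 1.0)

# f(x) = x^2 at p = 1: f is differentiable, so [a, b] collapses to {2}
print(one_sided_slopes(lambda x: x * x, 1.0))   # ~(2.0, 2.0)
```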

  • Lemma 2.7. If \(f\) is \(\cC ^2\), with \(f''\geq 0\), it is convex. If \(f''>0\), it is strictly convex.

  • Proof. By Taylor’s Theorem, one computes

    \[tf(x)+(1-t)f(y)-f(tx+(1-t)y) = t(f(x)-f(tx+(1-t)y))+(1-t)(f(y)-f(tx+(1-t)y))\]

    \[=tf''(c)(1-t)^2(y-x)^2/2+(1-t)f''(c')t^2(y-x)^2/2\]

    where \(c\) lies between \(x\) and \(tx+(1-t)y\), and \(c'\) between \(y\) and \(tx+(1-t)y\); this satisfies the correct inequalities by assumption.
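For instance, \(f = \exp \) has \(f'' = \exp > 0\), so the lemma predicts strict convexity; a quick numerical check of the gap in Definition 2.3:

```python
import math
import random

random.seed(1)
f = math.exp    # f'' = exp > 0, so Lemma 2.7 predicts strict convexity

for _ in range(1000):
    x, y = random.uniform(-3, 3), random.uniform(-3, 3)
    t = random.random()
    gap = t * f(x) + (1 - t) * f(y) - f(t * x + (1 - t) * y)
    assert gap >= -1e-12                        # convexity
    if abs(x - y) > 0.1 and 0.1 < t < 0.9:
        assert gap > 0                          # strict convexity, interior t
```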

  • Theorem 2.8 (Jensen’s Inequality). Let \((\Omega ,\mu )\) be a probability space, \(f: \RR \to \RR \) a convex function, and \(g\) a \(\mu \)-integrable function. Then \(f(\int _\Omega g d\mu ) \leq \int _\Omega f \circ g d\mu \). If \(f\) is strictly convex, equality holds iff \(g\) is constant on a set of measure \(1\).

  • Proof. Define \(x_0 = \int _\Omega gd\mu \). By Lemma 2.6, choosing a subderivative \(c\) of \(f\) at \(x_0\), there are \(a,b\) so that \(ax+b\leq f(x)\) for all \(x\) and \(ax_0+b = f(x_0)\). Then \(f(\int _\Omega gd\mu ) = f(x_0) = ax_0+b = \int _\Omega (ag+b)d\mu \leq \int _\Omega f\circ g d\mu \). Equality then holds iff \(ag+b = f\circ g\) outside a set of measure 0. If \(f\) is strictly convex, by Lemma 2.6 this holds iff \(g\) takes the constant value \(x_0\) on a set of measure \(1\).
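On a finite probability space the theorem is a statement about weighted sums, which makes it easy to test; a minimal sketch (with \(f = \exp \), which is strictly convex):

```python
import numpy as np

rng = np.random.default_rng(2)

# a finite probability space: 10 points with weights mu_i summing to 1
mu = rng.random(10)
mu /= mu.sum()
g = rng.normal(size=10)

f = np.exp                          # convex (in fact strictly convex)
lhs = f((mu * g).sum())             # f( integral of g )
rhs = (mu * f(g)).sum()             # integral of f o g
assert lhs <= rhs + 1e-12

# equality when g is constant almost everywhere
g_const = np.full(10, 1.7)
assert np.isclose(f((mu * g_const).sum()), (mu * f(g_const)).sum())
```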

Now that we have this inequality, we can prove many inequalities more quickly, especially exploiting the convexity/concavity of functions like \(\log \).

  • Definition 2.9. The weighted power mean with exponent \(p\) is the function \(M_p(x_1,\dots ,x_n) = (\sum w_ix_i^p)^{1/p}\), where the \(x_i\) are positive reals and the \(w_i\) are non-negative weights summing to 1.

In particular, \(M_\infty \) is the maximum, \(M_2\) the square mean, \(M_1\) the arithmetic mean, \(M_0\) the geometric mean, \(M_{-1}\) the harmonic mean, and \(M_{-\infty }\) the minimum, where \(M_0\) and \(M_{\pm \infty }\) are defined as limits. The only identification which isn’t obvious is \(M_0\); to see it, by L’Hôpital’s Rule and continuity of the exponential function,

\[\lim _{p \to 0}\big (\sum w_ix_i^p\big )^{\frac {1}{p}}=\lim _{p \to 0}e^{\frac {\log \big (\sum w_ix_i^p\big )}{p}} =e^{\lim _{p \to 0}\frac {\log \big (\sum w_ix_i^p\big )}{p}} = e^{\lim _{p \to 0}\frac {\sum w_ix_i^p \log (x_i)}{\sum w_ix_i^p}}\]

\[ = e^{\sum w_i\log (x_i)} = \prod x_i^{w_i} \]
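This limit is also visible numerically. A small sketch (the power_mean helper below is ours, not a standard library function):

```python
import numpy as np

def power_mean(x, w, p):
    x, w = np.asarray(x, float), np.asarray(w, float)
    if p == 0:                                   # geometric mean (the limit)
        return np.exp((w * np.log(x)).sum())
    return ((w * x ** p).sum()) ** (1 / p)

x = [1.0, 2.0, 8.0]
w = [0.2, 0.3, 0.5]
for p in [0.1, 0.01, 0.001, 0.0001]:
    print(p, power_mean(x, w, p))                # approaches the p = 0 value
print(0, power_mean(x, w, 0))                    # the geometric mean
```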

The following generalizes Theorem 2.2.

  • Theorem 2.10 (Power Mean Inequality). Let \(x_i\) be positive reals, \(w_i\) weights. Then \(p < q \implies M_p \leq M_q\), with equality holding iff the \(x_i\) with positive weights are equal.

  • Proof. First we prove the inequality in the cases \(p=0\) or \(q=0\). By Jensen’s inequality, using concavity of \(\log \), \(\log \prod x_i^{w_i} = \sum \frac {w_i}{q}\log x_i^q \leq \frac {\log (\sum w_ix_i^q)}{q}\) for \(q>0\), which is the \(p=0\) case; the \(q=0\) case is similar. Since the case \(p<0<q\) then follows by comparing both sides with \(M_0\), it suffices to prove the inequality when \(pq>0\), and the \(p>0\) and \(p<0\) cases are equivalent since \(\big (\sum w_ix_i^{-p}\big )^{\frac {1}{-p}} = \frac {1}{\big (\sum w_i(\frac {1}{x_i})^{p}\big )^\frac {1}{p}}\). So assume \(0<p<q\), and note \(x^{\frac {p}{q}}\) is concave, so \((\sum w_ix_i^p)^\frac {1}{p} = (\sum w_i(x_i^q)^{\frac {p}{q}})^\frac {1}{p} \leq (\sum w_ix_i^q)^{\frac {p}{q}\frac {1}{p}} =(\sum w_ix_i^q)^{\frac {1}{q}}\).
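Monotonicity in \(p\), together with the equality case, can be checked numerically; a self-contained sketch (repeating the power_mean helper from the previous sketch):

```python
import numpy as np

def power_mean(x, w, p):
    if p == 0:                                   # geometric mean (the limit)
        return np.exp((w * np.log(x)).sum())
    return ((w * x ** p).sum()) ** (1 / p)

rng = np.random.default_rng(3)
ps = [-4.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 4.0]
for _ in range(200):
    x = rng.uniform(0.1, 10.0, size=5)
    w = rng.random(5)
    w /= w.sum()
    means = [power_mean(x, w, p) for p in ps]
    assert all(a <= b + 1e-9 for a, b in zip(means, means[1:]))

# equality: when all x_i are equal, every power mean coincides
x = np.full(5, 3.7)
w = np.full(5, 0.2)
assert np.allclose([power_mean(x, w, p) for p in ps], 3.7)
```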

We can also quickly get the Hölder inequality.

  • Lemma 2.11 (Young’s inequality). If \(x,y>0\) and \(p,q>1\) with \(\frac {1}{p}+\frac {1}{q}=1\), then \(xy \leq \frac {x^p}{p}+\frac {y^q}{q}\).

  • Proof. We give two proofs. Jensen’s inequality gives \(\log (xy) = \frac {\log (x^p)} p + \frac {\log (y^q)} q \leq \log (\frac {x^p}{p}+\frac {y^q}{q})\). Alternatively, it is equivalent to prove \(x^\frac {1}{p}y^\frac {1}{q}\leq \frac {x}{p} + \frac {y}{q}\), which is homogeneous, so we may assume \(y = 1\); in this case, optimizing over \(x\) (the maximum of \(x^{\frac {1}{p}}-\frac {x}{p}\) is \(\frac {1}{q}\), attained at \(x=1\)) gives the inequality.
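A quick numerical check of the lemma, including the equality case \(x^p = y^q\) (a standard fact, easy to verify from the optimization above):

```python
import random

random.seed(4)
p = 3.0
q = p / (p - 1)                    # conjugate exponent: 1/p + 1/q = 1

for _ in range(1000):
    x, y = random.uniform(0.01, 10), random.uniform(0.01, 10)
    assert x * y <= x ** p / p + y ** q / q + 1e-12

# equality exactly when x**p == y**q
x = 1.7
y = x ** (p / q)                   # then y**q = x**p
assert abs(x * y - (x ** p / p + y ** q / q)) < 1e-9
```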

  • Theorem 2.12 (Hölder Inequality). Let \((\Omega ,\Sigma ,\mu )\) be a measure space, \(\frac {1}{p}+\frac {1}{q}=1\) and \(f,g\) measurable functions. Then \(\Vert fg\Vert _1 \leq \Vert f\Vert _p\Vert g\Vert _q\).

  • Proof. By Lemma 2.11,

    \[\Vert fg\Vert _1 = \int _\Omega |fg|d\mu =\int _\Omega t|f|t^{-1}|g|d\mu \leq \int _\Omega t^p\frac {|f|^p}{p}d\mu + \int _\Omega t^{-q}\frac {|g|^q}{q}d\mu = \frac {t^p}{p}\Vert f\Vert _p^p + \frac {t^{-q}}{q}\Vert g\Vert _q^q\]

    Optimizing over \(t\) (we may assume \(0 < \Vert f\Vert _p, \Vert g\Vert _q < \infty \), the remaining cases being trivial) gives \(t = \frac {\Vert g \Vert _q^{\frac {q}{p+q}}}{\Vert f\Vert _p^{\frac {p}{p+q}}}\), and plugging this in and simplifying (using \(p+q = pq\)) yields the inequality.
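On a finite measure space (counting measure on \(n\) points) the inequality reads as a statement about sums; a minimal numerical check:

```python
import numpy as np

rng = np.random.default_rng(5)
p = 3.0
q = p / (p - 1)                    # conjugate exponent

# counting measure on 100 points: integrals are plain sums
f = rng.normal(size=100)
g = rng.normal(size=100)

lhs = np.abs(f * g).sum()                                   # ||fg||_1
rhs = ((np.abs(f) ** p).sum() ** (1 / p)
       * (np.abs(g) ** q).sum() ** (1 / q))                 # ||f||_p ||g||_q
assert lhs <= rhs + 1e-9
```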

The Hölder inequality can be used to establish that \(L^p\) spaces are normed vector spaces; the triangle inequality is the hard part of the proof.

  • Lemma 2.13. \(|x+y|^p \leq 2^{p-1}(|x|^p+|y|^p)\) for \(p>1\).

  • Proof. \(f(t)=t^p\) is convex on \([0,\infty )\), so \(|\frac {x+y}{2}|^p\leq \big (\frac {|x|+|y|}{2}\big )^p\leq \frac 1 2 (|x|^p+|y|^p)\); multiplying by \(2^p\) gives the claim.

  • Theorem 2.14 (Minkowski’s Inequality). For \(p \geq 1\), \(\Vert f+g\Vert _p \leq \Vert f \Vert _p+\Vert g \Vert _p\).

  • Proof. We give two proofs, both of which use the Hölder inequality. By Lemma 2.13, \(\Vert f+g\Vert _p\) is finite if \(\Vert f\Vert _p\) and \(\Vert g\Vert _p\) are. Then by the Hölder inequality,

    \[\Vert f+g \Vert _p^p = \int |f+g|^p \leq \int |f||f+g|^{p-1}+|g||f+g|^{p-1} \]

    \[ \leq (\Vert f\Vert _p +\Vert g\Vert _p)\Vert (f+g)^{p-1}\Vert _{\frac {p}{p-1}} = (\Vert f\Vert _p +\Vert g\Vert _p)\Vert f+g\Vert _p^{p-1}\]

    and dividing by \(\Vert f+g\Vert _p^{p-1}\) (the case \(\Vert f+g\Vert _p = 0\) being trivial) gives the result.

    For the second proof, we claim \(\Vert f\Vert _p = \sup _{\Vert h\Vert _q=1}\Vert fh\Vert _1\), from which the theorem follows from the triangle inequality for \(L^1\). The \(\geq \) follows from the Hölder inequality, and the \(\leq \) follows from setting \(h\) to be \(|f|^{p-1}\) divided by its \(L^q\)-norm. Note that this proof also shows the duality between \(L^p\) and \(L^q\) (indeed they are duals as Banach spaces).
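Both the triangle inequality and the dual characterization can be checked numerically with counting measure; a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(6)
p = 3.0
q = p / (p - 1)

def norm(f, r):
    return (np.abs(f) ** r).sum() ** (1 / r)    # L^r norm, counting measure

f = rng.normal(size=100)
g = rng.normal(size=100)
assert norm(f + g, p) <= norm(f, p) + norm(g, p) + 1e-9

# the dual characterization: ||f||_p = sup_{||h||_q = 1} ||fh||_1,
# with the sup attained at h proportional to |f|^(p-1)
h = np.abs(f) ** (p - 1)
h /= norm(h, q)
assert np.isclose(np.abs(f * h).sum(), norm(f, p))
```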