Notes on Chapter 7 (Differentiation) of Walter Rudin's Real and Complex Analysis. I tried to actually section this out properly this time! Long chapter!
Preliminaries
Symmetric Derivative
Fix a dimension \(k\). Let \(\mu\) be a complex Borel measure on \(\mathbb{R}^k\). Define the quotients $$(Q_r\mu)(x) = \frac{\mu(B(x,r))}{m(B(x,r))}$$ and define the symmetric derivative of \(\mu\) at \(x\) by $$(D\mu)(x)=\lim_{r\rightarrow 0}\ (Q_r\mu)(x)$$ wherever this limit exists. For \(\mu\geq 0\), define the maximal function \(M\mu\) by $$(M\mu)(x) = \sup_{0<r<\infty}\ (Q_r\mu)(x),$$ and for an arbitrary complex Borel measure define the maximal function to be that of its total variation \(|\mu|\). The maximal function is lower semicontinuous, and it is the main tool used to study the symmetric derivative.
We can show that the maximal function of a measure cannot be large on a large set:
"Tail bound" for maximal functions of measures: If \(\mu\) is a complex Borel measure on \(\mathbb{R}^k\) and \(\lambda\) is a positive number, then $$m\{M\mu > \lambda\}\leq 3^k\lambda^{-1}\|\mu\|.$$ □
Here \(\|\mu\|=|\mu|(\mathbb{R}^k)\) and the LHS abbreviates \(m(\{x\in \mathbb{R}^k:(M\mu)(x)>\lambda\})\).
This is shown by constructing a suitable covering of any compact subset of the open set \(\{M\mu>\lambda\}\).
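In a bit more detail: for each \(x\) in a compact \(K\subset\{M\mu>\lambda\}\) (an open set, by lower semicontinuity) there is a ball \(B(x, r_x)\) with \(|\mu|(B(x,r_x)) > \lambda\, m(B(x,r_x))\). Finitely many of these balls cover \(K\), and a covering lemma extracts a disjoint subcollection \(B_1,\dots,B_n\) whose threefold dilations still cover the union, so $$m(K)\leq 3^k\sum_{i=1}^n m(B_i) \leq 3^k\lambda^{-1}\sum_{i=1}^n |\mu|(B_i)\leq 3^k\lambda^{-1}\|\mu\|.$$ Taking the supremum over all such \(K\) gives the bound.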
Weak \(L^1\)
If \(f\in L^1(\mathbb{R}^k)\) and \(\lambda > 0\), then $$m\{|f|>\lambda\} \leq \lambda^{-1}\|f\|_1$$ because putting \(E=\{|f|>\lambda\}\) we have $$\lambda m(E) \leq \int_E|f|\ dm \leq \int_{\mathbb{R}^k} |f|\ dm = \|f\|_1.$$
Accordingly, a measurable function \(f\) for which \(\lambda\, m\{|f|>\lambda\}\) is a bounded function of \(\lambda\) on \((0, \infty)\) is said to belong to weak \(L^1\). (For example, \(1/x\) on \((0, 1)\) is in weak \(L^1\) but not in \(L^1\); see below.)
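To see the parenthetical example: for \(f(x)=1/x\) on \((0,1)\), $$m\{|f|>\lambda\}=m\left(\left(0,\min(1,\tfrac{1}{\lambda})\right)\right)=\min\left(1,\tfrac{1}{\lambda}\right),$$ so \(\lambda\,m\{|f|>\lambda\}\leq 1\) for all \(\lambda>0\), while \(\int_0^1 x^{-1}\,dx=\infty\); thus \(f\) is in weak \(L^1\) but not in \(L^1\).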
Analogously to measures, for each \(f\in L^1(\mathbb{R}^k)\) we define its maximal function \(Mf: \mathbb{R}^k\rightarrow [0,\infty]\) by setting $$(Mf)(x)=\sup_{0<r<\infty}\frac{1}{m(B_r)}\int_{B(x, r)} |f|\ dm,$$ where \(m(B_r)\) denotes the common measure of all balls of radius \(r\), so that \(m(B_r)=m(B(x,r))\) by translation invariance.
In particular, setting \(d\mu = f\ dm\) we see that this definition agrees with the previous on measures. Hence the tail bound on maximal functions of measures says that the "maximal operator" \(M\) sends \(L^1\) to weak \(L^1\), with the following tail bound:
"Tail bound" for maximal functions: For every \(f\in L^1(\mathbb{R}^k)\) and every \(\lambda > 0\), $$m\{Mf>\lambda\}\leq 3^k\lambda^{-1}\|f\|_1.$$
Lebesgue points
If \(f\in L^1(\mathbb{R}^k)\), any \(x\in \mathbb{R}^k\) for which it is true that $$\lim_{r\rightarrow 0} \frac{1}{m(B_r)} \int_{B(x, r)} |f(y) - f(x)|\ dm(y) = 0$$ is called a Lebesgue point of \(f\).
Lebesgue points can be interpreted as the points near which \(f\) does not, on average, oscillate too much around the value \(f(x)\). Surprisingly, we have:
Lebesgue points are a.e.: If \(f\in L^1(\mathbb{R}^k)\), then almost every \(x\in \mathbb{R}^k\) is a Lebesgue point of \(f\). □
The crucial step is to approximate \(f\in L^1(\mathbb{R}^k)\) by \(g\in C(\mathbb{R}^k)\) with \(\|f - g\|_1\) sufficiently small. Every point of continuity is a Lebesgue point, so the rest of the proof applies the tail bound discussed earlier to \(f-g\) to control the averaged oscillation of \(f\) and show that it vanishes a.e.
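A concrete non-example: take \(k=1\) and \(f=\chi_{(0,1)}\). At \(x=0\), $$\frac{1}{m(B_r)}\int_{B(0,r)}|f(y)-f(0)|\ dm(y)=\frac{1}{2r}\int_0^r 1\ dy=\frac{1}{2}$$ for all \(0<r<1\), so \(0\) is not a Lebesgue point of \(f\); consistent with the theorem, the exceptional points here form a set of measure zero (just \(\{0,1\}\)).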
This theorem gives us several interesting consequences, which we will quickly discuss in the following subsections (mostly without proof).
Nicely shrinking sets
A sequence \(\{E_i\}\) of Borel sets in \(\mathbb{R}^k\) is said to shrink to \(x\in \mathbb{R}^k\) nicely if there is a number \(\alpha>0\) with the following property: There is a sequence of balls \(B(x, r_i)\) with \(\lim r_i = 0\) such that \(E_i\subset B(x, r_i)\) and $$m(E_i)\geq \alpha m(B(x, r_i))$$ for all \(i\).
Note that it is not required that \(x\in E_i\), or even that \(x\) lie in the closure of \(E_i\). The condition says that each \(E_i\) must occupy a 'substantial' fraction (at least an \(\alpha\)-fraction) of the volume of some small ball centered at \(x\) that contains it.
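For example, the balls \(E_i = B(x, r_i)\) themselves (with \(r_i\rightarrow 0\)) shrink to \(x\) nicely with \(\alpha=1\), and in \(\mathbb{R}^1\) the one-sided intervals \(E_i=[x, x+r_i]\) shrink nicely with \(\alpha=1/2\). A non-example in \(\mathbb{R}^2\): the thin rectangles \(E_i=[x_1,x_1+r_i]\times[x_2,x_2+r_i^2]\) have \(m(E_i)=r_i^3\), while any ball centered at \(x=(x_1,x_2)\) containing \(E_i\) has radius at least \(r_i\), hence measure at least \(\pi r_i^2\); the ratio is at most \(r_i/\pi\rightarrow 0\), so no single \(\alpha>0\) works.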
Differentiation using nicely shrinking sets: Associate to each \(x\in \mathbb{R}^k\) a sequence \(\{E_i(x)\}\) that shrinks to \(x\) nicely, and let \(f\in L^1(\mathbb{R}^k)\). Then $$f(x)=\lim_{i\rightarrow\infty}\frac{1}{m(E_i(x))}\int_{E_i(x)}f\ dm$$ at every Lebesgue point of \(f\), hence a.e. \([m]\). □
Fundamental Theorem of Calculus (easy part): If \(f\in L^1(\mathbb{R}^1)\) and $$F(x)=\int_{-\infty}^x f\ dm\qquad (-\infty < x < \infty),$$ then \(F'(x) = f(x)\) at every Lebesgue point of \(f\), hence a.e. \([m]\).
The latter is a corollary of the former theorem.
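Indeed, taking \(E_i(x)=[x, x+r_i]\) (which shrink to \(x\) nicely, as noted above), $$\frac{1}{m(E_i(x))}\int_{E_i(x)}f\ dm=\frac{F(x+r_i)-F(x)}{r_i},$$ so the previous theorem says that the right-hand difference quotients of \(F\) converge to \(f(x)\) at every Lebesgue point; the intervals \([x-r_i, x]\) handle the left-hand quotients in the same way.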
Metric Density
Let \(E\) be a Lebesgue measurable subset of \(\mathbb{R}^k\). The metric density of \(E\) at a point \(x\in \mathbb{R}^k\) is defined to be $$\lim_{r\rightarrow 0}\frac{m(E\cap B(x,r))}{m(B(x,r))}$$ provided that the limit exists.
Letting \(f\) be the characteristic function of \(E\) and applying the nicely shrinking sets theorem, we see that the metric density of \(E\) is 1 a.e. on \(E\) and 0 a.e. outside it. In particular, if \(\varepsilon > 0\), there is no set \(E\subset \mathbb{R}\) satisfying $$\varepsilon < \frac{m(E\cap I)}{m(I)} < 1-\varepsilon$$ for every segment \(I\).
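For example, with \(E=[0,\infty)\subset\mathbb{R}\), the density is 1 at every \(x>0\), 0 at every \(x<0\), and at \(x=0\) it equals $$\lim_{r\rightarrow 0}\frac{m([0,\infty)\cap(-r,r))}{m((-r,r))}=\frac{r}{2r}=\frac{1}{2};$$ the set where the density is neither 0 nor 1 is just \(\{0\}\), a set of measure zero.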
Differentiation of measures
Using the Lebesgue points theorem, we can show the following quickly:
Differentiation of absolutely continuous complex Borel measures: Suppose \(\mu\) is a complex Borel measure on \(\mathbb{R}^k\) and \(\mu\ll m\), with Radon-Nikodym derivative \(f = d\mu/dm\). Then \(D\mu = f\) a.e. \([m]\), and $$\mu(E) = \int_E (D\mu)\ dm$$ for all Borel sets \(E\subset \mathbb{R}^k\). □
This means the Radon-Nikodym derivative can also be obtained as a limit of the quotients in this case.
In the singular case, we have the following:
Differentiation of singular complex Borel measures: Associate to each \(x\in \mathbb{R}^k\) a sequence \(\{E_i(x)\}\) that shrinks to \(x\) nicely. If \(\mu\) is a complex Borel measure and \(\mu\ \bot\ m\), then $$\lim_{i\rightarrow\infty}\frac{\mu(E_i(x))}{m(E_i(x))} = 0$$ a.e. \([m]\). □
The Jordan decomposition shows that it suffices to prove this for positive \(\mu\). Using the nicely shrinking property, we can show that this is a consequence of the special case \((D\mu)(x) = 0\) a.e. \([m]\). This is in turn proved by considering instead the upper derivative \((\bar{D}\mu)(x)\) defined by $$(\bar{D}\mu)(x)=\lim_{n\rightarrow\infty}\left[\sup_{0<r<1/n}(Q_r\mu)(x)\right].$$
Choose \(\lambda, \varepsilon > 0\). By singularity, \(\mu\) is concentrated on a set of Lebesgue measure 0, and as \(\mu\) is regular (see the second corollary to Regularity of measure on \(\sigma\)-compact spaces), we can pick a compact \(K\) with \(m(K)=0\) and \(\mu(K) > \|\mu\|-\varepsilon\). Let \(\mu_1\) be the restriction of \(\mu\) to \(K\), i.e. \(\mu_1(E)=\mu(K\cap E)\), so that \(\|\mu-\mu_1\|<\varepsilon\). Outside of \(K\), every sufficiently small ball misses \(K\), so \((\bar{D}\mu)(x) \leq (M(\mu-\mu_1))(x)\) (\(M\) being the maximal operator), and the tail bound for maximal functions gives $$m\{\bar{D}\mu > \lambda\} < 3^k\lambda^{-1}\varepsilon.$$ Since \(\varepsilon\) and \(\lambda\) were arbitrary, \(\bar{D}\mu = 0\) a.e. \([m]\), and the theorem follows.
Combining, we get the following:
Differentiation of complex Borel measures: Associate to each \(x\in \mathbb{R}^k\) a sequence \(\{E_i(x)\}\) that shrinks to \(x\) nicely, and let \(\mu\) be a complex Borel measure on \(\mathbb{R}^k\). Let \(d\mu = f\ dm + d\mu_s\) be the Lebesgue decomposition of \(\mu\) w.r.t. \(m\). Then $$\lim_{i\rightarrow\infty}\frac{\mu(E_i(x))}{m(E_i(x))} = f(x)$$ a.e. \([m]\). In particular, \(\mu\ \bot\ m\) iff \((D\mu)(x)=0\) a.e. \([m]\). □
In contrast, we remark that if we consider positive Borel measures, we get something quite different:
Differentiation of positive Borel measures: If \(\mu\) is a positive Borel measure on \(\mathbb{R}^k\) and \(\mu\ \bot\ m\), then $$(D\mu)(x)=\infty$$ a.e. \([\mu]\). □
Note the "a.e." is taken relative to \(\mu\) here, not \(m\). In particular, this makes sense for the zero measure because then any measurable set is also \(\mu\)-almost all of \(\mathbb{R}^k\).
The Fundamental Theorem of Calculus
We have proven the easy part of the FTC above. The other (harder) part of the FTC states that $$f(x) - f(a) = \int_a^x f'(t)\ dt \qquad (a\leq x \leq b)$$ when \(f\) is differentiable on \([a,b]\) and \(f'\) is continuous on \([a,b]\).
When extending the FTC to the Lebesgue setting, the question arises of whether the requirements of continuity and differentiability can be relaxed or adjusted. We cover two interesting ways in which the formula can fail:
- Set \(f(x)=x^2\sin(x^{-2})\) if \(x\neq 0\), and \(f(0)=0\). Then \(f\) is differentiable at every point but $$\int_0^1 |f'(t)|\ dt = \infty,$$ so \(f'\notin L^1\). However if we interpret the FTC integral (with \([0,1]\) in place of \([a,b]\)) as the limit of integrals over \([\varepsilon, 1]\), then the FTC still holds for this \(f\).
- Q: Suppose \(f\) is continuous on \([a,b]\), \(f\) is differentiable at almost every point of \([a,b]\) and \(f'\in L^1\) on \([a,b]\). Does this imply FTC?
A: No. This is demonstrated by the Cantor function: a continuous nondecreasing function that increases from 0 to 1 on \([0,1]\) yet has derivative 0 almost everywhere, so the integral of \(f'\) is 0 while \(f(1)-f(0)=1\) (see the sketch below).
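Here is a small computational sketch (not from Rudin) of the Cantor function via ternary digits; it only illustrates the flat steps and the endpoint values, and the digit depth and floating-point handling are incidental choices.

```python
# A sketch of the Cantor function c: continuous, nondecreasing, c(0)=0, c(1)=1,
# constant on each removed middle-third interval, hence c' = 0 a.e., yet
# c(1) - c(0) = 1 != 0 = integral of c'.
def cantor(x, depth=50):
    """Evaluate the Cantor function at x in [0, 1] via ternary digits (floating-point sketch)."""
    if x >= 1.0:
        return 1.0
    y, scale = 0.0, 0.5
    for _ in range(depth):
        x *= 3
        d = int(x)                 # next ternary digit: 0, 1, or 2
        x -= d
        if d == 1:                 # x lies in a removed middle third; c is constant there
            return y + scale
        y += scale * (d // 2)      # ternary digit 2 becomes binary digit 1
        scale *= 0.5
    return y

print(cantor(0.34), cantor(0.5), cantor(0.65))   # constant (= 0.5) on (1/3, 2/3)
print(cantor(0.12), cantor(0.2))                 # constant (= 0.25) on (1/9, 2/9)
print(cantor(0.0), cantor(1.0))                  # endpoints: 0.0 and 1.0
```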
As the statements are what is mostly of interest, we state two generalizations of the FTC and only broadly sketch the proof strategies. But first, a definition:
A complex function \(f\) on an interval \(I=[a,b]\) is absolutely continuous on \(I\) (\(f\) is AC on \(I\)) if for each \(\varepsilon>0\) there is a \(\delta>0\) s.t. $$\sum_{i=1}^n |f(\beta_i)-f(\alpha_i)| < \varepsilon$$ for any \(n\) and any disjoint collection of segments \((\alpha_1, \beta_1),\cdots,(\alpha_n, \beta_n)\) in \(I\) whose lengths satisfy $$\sum_{i=1}^n (\beta_i-\alpha_i) < \delta.$$
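For instance, any Lipschitz function \(f\) on \(I\) (say \(|f(s)-f(t)|\leq L|s-t|\)) is AC: given \(\varepsilon>0\), take \(\delta=\varepsilon/L\), since then $$\sum_{i=1}^n |f(\beta_i)-f(\alpha_i)|\leq L\sum_{i=1}^n(\beta_i-\alpha_i)<L\delta=\varepsilon.$$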
We can now state the first generalization, which allows for some a.e. differentiable functions to be integrated.
FTC for AC functions: If \(f\) is a complex function that is AC on \(I=[a,b]\), then \(f\) is differentiable a.e. on \(I\), \(f'\in L^1(m)\), and $$f(x)-f(a) = \int_a^x f'(t)\ dt\qquad (a\leq x\leq b).$$ □
This is proven by first showing it for nondecreasing AC functions, then writing a real AC function \(f\) as half the difference of the two nondecreasing AC functions \(F + f\) and \(F - f\), i.e. \(f = \tfrac{1}{2}(F+f) - \tfrac{1}{2}(F-f)\), where \(F\) is the total variation function of \(f\) (also AC, and defined in a way similar to the total variation of a measure); the complex case follows by treating real and imaginary parts separately. This gives the statement.
In the process, note that we also show that \(f\) maps sets of measure 0 to sets of measure 0.
The second generalization requires differentiability everywhere but not continuity for \(f'\):
FTC for \(f'\in L^1\): If \(f:[a,b]\rightarrow \mathbb{R}\) is differentiable at every point of \([a,b]\) and \(f'\in L^1\) on \([a,b]\), then $$f(x)-f(a)=\int_a^x f'(t)\ dt\qquad (a\leq x\leq b).$$ □
This uses the Vitali-Carathéodory Theorem to approximate \(f'\) from above by a lower semicontinuous function \(g\) whose integral is close to that of \(f'\); a comparison argument involving \(g\) then yields the formula.
Differentiable Transformations
In the following, \(V\) is an open set in \(\mathbb{R}^k\), \(T\) maps \(V\) into \(\mathbb{R}^k\), and \(A: \mathbb{R}^k\rightarrow\mathbb{R}^k\) is a linear operator.
If there exists a linear operator (matrix) \(A\) on \(\mathbb{R}^k\) such that $$\lim_{h\rightarrow 0}\frac{|T(x+h)-T(x)-Ah|}{|h|} = 0$$ then we say that \(T\) is differentiable at \(x\), write \(T'(x) = A\) for the derivative, and call its determinant \(J_T(x)=\det T'(x)\) the Jacobian of \(T\) at \(x\).
In an earlier chapter (in a result omitted from these notes), Rudin showed that for every linear \(A\) there is a number \(\Delta(A) = |\det A|\) s.t. \(m(A(E)) = \Delta(A)m(E)\) for every measurable \(E\). Hence, in the general case, we would like to say that, for small sets \(E\) near \(x\), $$\frac{m(T(E))}{m(E)} \sim \Delta(T'(x)) = |J_T(x)|.$$
This is the content of the following:
Jacobian scaling factor: If \(T\) is continuous and differentiable at some point \(x\in V\), then $$\lim_{r\rightarrow 0} \frac{m(T(B(x,r)))}{m(B(x,r))}=\Delta(T'(x)).$$ □
As in the classical setting, this shows that the Jacobian measures how the volume of a tiny set changes under the transformation. In Rudin this is proven by splitting into cases. If the derivative \(A=T'(x)\) is one-to-one, define \(F = A^{-1}\circ T\). It suffices to show that $$\lim_{r\rightarrow 0} \frac{m(F(B(x,r)))}{m(B(x,r))}=1,$$ since \(m(T(B))=m(A(F(B)))=\Delta(A)m(F(B))\).
Then (assuming, after a translation, that \(x=0\) and \(F(0)=0\)) the statement follows in this case from showing that for any \(\varepsilon > 0\) there is a \(\delta > 0\) such that the sandwich inclusions $$B(0,(1-\varepsilon)r) \subset F(B(0,r)) \subset B(0,(1+\varepsilon)r)$$ hold for all \(0<r<\delta\). The first inclusion uses the Brouwer fixed point theorem.
In the other case \(A\) is not one-to-one, so \(A\) maps \(\mathbb{R}^k\) into a proper subspace, a set of measure 0 (and \(\Delta(A)=0\)). Fix \(\varepsilon > 0\). We then use the fact that \(Ah\) approximates \(T(h)\) near 0 to contain \(T(B(0, r))\), for all small \(r\), in a 'thickening' of \(A(B(0, r))\) of measure less than \(\varepsilon r^k\). Hence the desired limit is 0.
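A one-dimensional sanity check: for \(T(x)=x^2\) at \(x=1\), \(T(B(1,r))=((1-r)^2,(1+r)^2)\) for \(0<r<1\), so $$\frac{m(T(B(1,r)))}{m(B(1,r))}=\frac{(1+r)^2-(1-r)^2}{2r}=2=|T'(1)|=\Delta(T'(1)),$$ in agreement with the theorem.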
A short lemma is stated:
Lemma (7.25): Suppose \(E\subset \mathbb{R}^k\), \(m(E)=0\), \(T\) maps \(E\) into \(\mathbb{R}^k\), and $$\limsup\frac{|T(y)-T(x)|}{|y-x|}<\infty$$ for every \(x\in E\) as \(y\rightarrow x\) in \(E\). Then \(m(T(E))=0\). □
Corollary: If \(T\) is differentiable at every point of a set \(E\subset V\) with \(m(E)=0\), then \(m(T(E))=0\). □
Finally, we arrive at the change of variables theorem:
Change of variables for integration: Suppose that
- \(X\subset V\subset \mathbb{R}^k\), \(V\) is open, \(T:V\rightarrow\mathbb{R}^k\) is continuous;
- \(X\) is Lebesgue measurable, \(T\) is one-to-one on \(X\), \(T\) is differentiable at every point of \(X\);
- \(m(T(V-X)) = 0\).
Then we have $$\int_{T(X)}f\ dm = \int_X (f\circ T)|J_T|\ dm$$ for every measurable \(f: \mathbb{R}^k\rightarrow [0,\infty]\). □
The proof proceeds in three steps (denoting the collection of Lebesgue measurable subsets of \(\mathbb{R}^k\) by \(\mathfrak{M}\)):
- If \(E\in\mathfrak{M}\) and \(E\subset V\), then \(T(E)\in\mathfrak{M}\).
Every Lebesgue measurable set is the union of an \(F_\sigma\) and a set of Lebesgue measure zero, so the claim is proven for these two kinds of sets individually and then combined.
- For every \(E\in\mathfrak{M}\), $$m(T(E\cap X)) = \int_X\chi_E |J_T|\ dm.$$
Let \(n\) be a positive integer and put $$V_n = \{x\in V: |T(x)|<n\},\qquad X_n=X\cap V_n.$$
The first step lets us define \(\mu(E) = m(T(E\cap X_n))\) and show it is a measure on \(\mathfrak{M}\). We show that the conclusion holds on each of these parts \(E\cap X_n\) and apply monotone convergence as \(X_n\rightarrow X\).
- For every \(A\in\mathfrak{M}\), $$\int_{T(X)}\chi_A\ dm = \int_X (\chi_A\circ T)|J_T|\ dm.$$
This is proven separately for Borel sets and for sets of Lebesgue measure 0 and then combined, as every Lebesgue measurable set is a disjoint union of a Borel set and a set of measure 0. A subtlety here is that we cannot directly apply the previous step with \(E=T^{-1}(A')\) for an arbitrary Lebesgue measurable \(A'\), and must instead argue for Borel sets first: the preimage of a Lebesgue measurable set under a continuous map need not be Lebesgue measurable, whereas the preimage of a Borel set is Borel.
Finally, it is now clear that the theorem holds for every nonnegative simple function \(f\), so the monotone convergence theorem gives the result for all measurable \(f:\mathbb{R}^k\rightarrow[0,\infty]\). And of course, the usual single-variable change of variables follows from this.
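As a closing illustration (not from Rudin), here is a rough numerical check of the formula for polar coordinates \(T(r,\theta)=(r\cos\theta,\ r\sin\theta)\) with \(V=X=(0,1)\times(0,2\pi)\) and \(J_T=r\), so that \(T(X)\) is the open unit disk up to a set of measure zero. The test function and grid sizes below are arbitrary choices, and the two brute-force Riemann sums are only expected to agree approximately.

```python
# Rough numerical check of the change-of-variables formula for polar coordinates
# T(r, theta) = (r cos theta, r sin theta), with X = (0,1) x (0, 2*pi) and J_T = r.
import numpy as np

f = lambda x, y: np.exp(-x**2 - y**2)

# Left-hand side: integrate f over T(X), i.e. the unit disk (up to a set of measure 0),
# by a brute-force Riemann sum on a Cartesian grid.
xs = np.linspace(-1, 1, 1201)
dx = xs[1] - xs[0]
X, Y = np.meshgrid(xs, xs)
lhs = np.sum(f(X, Y) * (X**2 + Y**2 < 1)) * dx * dx

# Right-hand side: integrate (f o T) * |J_T| = f(r cos t, r sin t) * r over X.
rs = np.linspace(0, 1, 1201)
ts = np.linspace(0, 2 * np.pi, 1201)
dr, dt = rs[1] - rs[0], ts[1] - ts[0]
R, TH = np.meshgrid(rs, ts)
rhs = np.sum(f(R * np.cos(TH), R * np.sin(TH)) * R) * dr * dt

print(lhs, rhs, np.pi * (1 - np.exp(-1)))   # the three numbers should roughly agree
```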