October 2011

I read abstracts quite a lot and I also have to write them occasionally. Nowadays I usually write abstracts quite quickly and don’t think too much about whether the abstract is going to be good. This semester I am organizing a seminar for the graduate students and postdocs in our department (we simply call this seminar “research seminar”) and, in particular, I have to collect the abstracts to make the announcements. Since all mathematical institutes here are involved, the topics vary a lot: abstract algebra, numerical linear algebra, mathematical physics or optimization. Hence, it may happen that I get abstracts which I can hardly access, and I wondered whether it is always possible to write an abstract that someone like me can understand. At this point I should probably explain what I mean by “understand”: I think that an abstract should allow me to locate the field in which the talk will be situated. Moreover, I should understand what the objects of interest will be and what the role of these objects is in the field. Finally, it would be good if I were able to estimate how the topic of the talk is related to other fields of mathematics (especially to fields I am interested in).

Since the people who give talks in our research seminar are mostly quite early in their career, I made the following experiment: When I received a new abstract, I read it thoroughly and tried to understand it in the above sense. Usually there was something which I did not get at all, so I replied to the speaker and asked the very basic questions which I had asked myself while reading the abstract; questions like “What are the objects you talk about?” or “What can one do with these objects or results?”. The speakers then responded with a new version of the abstract, and from my point of view this process always produced a much better abstract.

Hence, I started thinking about rules and tips to write a good abstract. From what I’ve written above, I tried to deduce some rules which I collect here:

  • Avoid jargon. Although this sounds obvious, most abstracts contain jargon in one way or another. Of course one cannot avoid the use of specific terminology and technical terms, but even then there is an easy check whether a technical term is appropriate: Try to find a definition on the internet – if you do not succeed within a few minutes, you should find a different word. I’ll try to illustrate this with an example (which I chose at random from arxiv.org on a topic which is quite far away from mine):

    On a convex body in a Euclidean space, we introduce a new variational formulation for its Funk metric, a Finsler metric compatible with the tautological Finsler structure of the convex body.

    I know what a convex body in Euclidean space is and I know what could be meant by a variational formulation; however, I had no idea what a Funk metric is – but it was not hard to find out that a Finsler structure is something like a “metric varying continuously from point to point”. Well, it’s still unclear to me what the “tautological” Finsler structure of the convex body is supposed to be, but this is something that I hope a talk or paper could explain. This example is somewhat borderline, since the additional explanation still leads to terms which are not all defined on Wikipedia or Wolfram MathWorld. But still, this sentence gives me something: The author studies the geometry of convex bodies and will come up with a variational formulation of a special metric.

  • Use buzzwords. This may seem to contradict the previous point, and in part it does. But note that you can use a buzzword together with its explanation. Again, the example from the previous point works: “Funk metric” may be a buzzword and the explanation using the name “Finsler” is supposed to ring a bell (as I learned, it is related to Hilbert’s 23rd problem). This helps the readers to find related work and to remember which field you were working in.
  • General to specific. In general it is good advice to work from general to specific. Start with a sentence which points in the direction of the field you are working in. Then your potential audience will know from the beginning in which field your work is situated.
  • Answer questions. If you think that your work answers questions, why not pose the questions in the abstract? This may motivate the readers to think by themselves and draw their interest to the topic.
  • Don’t be afraid of layman’s terms. Although layman’s terms usually do not give an exact description and are sometimes even ridiculously oversimplified, they still help to form a mental picture.

Finally I’d like to repeat a piece of advice which you can find in almost every other collection of tips on writing (e.g. here): Write, read, rewrite and reread your abstract. Repeat this procedure.

Concerning the fact that weak and strong convergence coincide on {\ell^1} (also known as Schur’s Theorem), I asked about an elementary proof in my last post. And in fact Markus Grasmair sent me one which I’d like to present here. It is indeed elementary as it does not use any deep mathematics – however, it is a bit tricky.

Theorem 1 (Schur’s Theorem) A sequence {(x^k)} in {\ell^1} converges weakly iff it converges strongly.

Proof: As always, strong convergence implies weak convergence. To prove the converse we assume (after subtracting the weak limit and passing to a subsequence if necessary) that {x^k\rightharpoonup 0} but {\|x^k\|_1\geq\epsilon} for some {\epsilon>0} and all {k}. From this assumption we are going to derive a contradiction by constructing a vector {f\in\ell^\infty} with unit norm and a subsequence {x^{k_l}} such that {\langle f,x^{k_l}\rangle} does not converge to zero. We initialize {j_0=0}, set {k_1=1}, choose {j_1\in{\mathbb N}} such that {\sum_{j>j_1}|x^1_j|\leq \epsilon/6} and define the first {j_1} entries of {f} as

\displaystyle  f_j = \text{sign}(x^1_j)\ \text{ for }\ 1\leq j\leq j_1.

Now we proceed inductively and assume that for some {l\geq 1} the numbers {j_1,\dots,j_l}, the subsequence {x^{k_1},\dots,x^{k_l}} and the entries {f_1,\dots,f_{j_l}} have already been constructed and fulfill for all {m\leq l}

\displaystyle  \Big|\sum_{j\leq j_{m-1}} f_j x^{k_m}_j\Big| \leq \frac{\epsilon}{6} \ \ \ \ \ (1)

\displaystyle  \sum_{j=j_{m-1}+1}^{j_m} f_j x^{k_m}_j\geq \frac{2\epsilon}{3} \ \ \ \ \ (2)

\displaystyle  \sum_{j>j_m}|x^{k_m}_j|\leq \frac{\epsilon}{6}. \ \ \ \ \ (3)

(Note that for {l=1} these conditions are fulfilled: (1) is fulfilled since the sum is empty, (2) is fulfilled since {\sum_{j=1}^{j_1}f_j x^1_j = \|x^1\|_1 - \sum_{j>j_1}|x^1_j|\geq 5\epsilon/6} and (3) is fulfilled by definition.) To go from a given {l} to the next one, we first observe that {x^k\rightharpoonup 0} implies that for all {j} it holds that {x^k_j\rightarrow 0}. Hence, we may take {k_{l+1}} such that {\sum_{j\leq j_l} |x^{k_{l+1}}_j|\leq \epsilon/6} and of course we can take {k_{l+1}>k_l}. Since {x^{k_{l+1}}} is a summable sequence, we find {j_{l+1}} such that {\sum_{j>j_{l+1}}|x^{k_{l+1}}_j|<\epsilon/6} and again we may take {j_{l+1}>j_l}. We set

\displaystyle  f_j = \text{sign}(x^{k_{l+1}}_j)\ \text{ for }\ j_l< j\leq j_{l+1}

and observe

\displaystyle  \sum_{j = j_l+1}^{j_{l+1}} f_j x^{k_{l+1}}_j = \sum_{j = j_l+1}^{j_{l+1}} |x^{k_{l+1}}_j| > \|x^{k_{l+1}}\|_1 - \frac{\epsilon}{3} \geq \frac{2\epsilon}{3}.

By construction, the properties (1), (2) and (3) are fulfilled for {l+1}, and we see that we can continue our procedure ad infinitum. For the resulting {f\in\ell^\infty} (indeed {\|f\|_\infty=1}) and the subsequence {(x^{k_l})} we obtain (using the properties (1), (2) and (3)) that

\displaystyle  \begin{array}{rcl}  \langle f,x^{k_l}\rangle &=& \sum_j f_j x^{k_l}_j\\ & = & \sum_{j\leq j_{l-1}} f_j x^{k_l}_j + \sum_{j= j_{l-1}+1}^{j_l} f_j x^{k_l}_j + \sum_{j> j_l} f_j x^{k_l}_j\\ & \geq & -\Big|\sum_{j\leq j_{l-1}} f_j x^{k_l}_j \Big| + \sum_{j= j_{l-1}+1}^{j_l} f_j x^{k_l}_j - \sum_{j> j_l} |x^{k_l}_j|\\ & \geq& - \frac{\epsilon}{6} + \frac{2\epsilon}{3} - \frac{\epsilon}{6}\geq \frac{\epsilon}{3}. \end{array}

Hence {\langle f,x^{k_l}\rangle} does not converge to zero, which contradicts {x^k\rightharpoonup 0}. \Box


Although this proof is really elementary (you only need to know what convergence means and have a proper {\epsilon}-handling) I found it hard to digest. I think that this is basically not avoidable – Schur’s Theorem seems to be one of these facts which are easy to remember, hard to believe and hard to prove.
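To make the construction a bit more tangible, here is a minimal Python sketch of the inductive selection of {k_l}, {j_l} and {f} on a finite table of numbers. The function names, the 0-based indexing, the truncation to finitely many entries and the test sequence are assumptions made for illustration only; they are not part of Markus' proof.

```python
def sign(v):
    """Sign of a real number, with sign(0) = 0."""
    return (v > 0) - (v < 0)


def gliding_hump(x, eps):
    """Mimic the inductive construction on a finite table x[k][j].

    Returns the chosen row indices ks, the block ends js and the vector f
    (0-based; js[l] is the number of entries of f fixed after step l).
    """
    n = len(x[0])
    f = [0] * n
    ks, js = [], []
    j_prev, k = 0, 0
    while k < len(x) and j_prev < n:
        # Next row: its mass on the already fixed entries must be <= eps/6.
        if sum(abs(v) for v in x[k][:j_prev]) > eps / 6:
            k += 1
            continue
        # Block end: the tail of this row beyond the block must be <= eps/6.
        j = j_prev + 1
        while sum(abs(v) for v in x[k][j:]) > eps / 6:
            j += 1
        # Fix f on the new block by the signs of the chosen row.
        for i in range(j_prev, j):
            f[i] = sign(x[k][i])
        ks.append(k)
        js.append(j)
        j_prev, k = j, k + 1
    return ks, js, f


# Illustrative sequence (an assumption): x^k_j = (-1)^k 2^{-|j-k|}, truncated.
# It converges to zero coordinatewise, but every row has 1-norm at least 1.
K, N, eps = 40, 40, 1.0
x = [[(-1) ** k * 0.5 ** abs(j - k) for j in range(N)] for k in range(K)]
ks, js, f = gliding_hump(x, eps)
pairings = [sum(fj * xj for fj, xj in zip(f, x[k])) for k in ks]
```

Note that the selection only uses coordinatewise convergence and the lower bound on the norms. For such a sequence every entry of `pairings` stays above {\epsilon/3}, which is exactly why it cannot be a weak null sequence.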

Let’s try a direct proof that {x^k\rightharpoonup 0} implies {x^k\rightarrow 0} in {\ell^1}.

Proof: We know that a weakly convergent sequence in {\ell^1} is pointwise convergent (by testing with the canonical basis vectors), hence for every {j} it holds that {x^k_j\rightarrow 0} for {k\rightarrow\infty}. For some {J\in{\mathbb N}} we write

\displaystyle  \|x^k\|_1 = \sum_{j< J} |x^k_j| + \sum_{j\geq J} |x^k_j|.

By pointwise convergence of {x^k_j} we know that the first sum converges to zero for every {J}. Hence, we can proceed as follows: For a given {\epsilon>0} first choose {J} so large that {\sum_{j\geq J} |x^k_j|\leq \epsilon/2} for all {k} and then choose {k} so large that { \sum_{j< J} |x^k_j|\leq \epsilon/2}. This sounds nice but wait! How about the choice of {J}? The tails of the series have to be bounded by {\epsilon/2} uniformly in {k}. That sounds possible but not obvious, and indeed this is the place where the hard work starts! One can prove by contradiction that this choice of {J} is possible: Assume that there exists an {\epsilon>0} such that for all {J} there exists a {k} such that {\sum_{j\geq J}|x^k_j|>\epsilon}. Alas, this is the same situation which we handled in the proof of Markus… \Box
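That coordinatewise convergence alone does not give uniformly small tails can be checked on the unit vectors {e_k}. The following small Python sketch (the helper name `worst_tail` and the truncation to finitely many entries are assumptions for illustration) computes the largest tail over all members of the sequence.

```python
def worst_tail(x, J):
    """Largest tail sum_{j >= J} |x[k][j]| over all rows k (0-based j)."""
    return max(sum(abs(v) for v in row[J:]) for row in x)


# Truncated unit vectors e_k: they converge to zero coordinatewise.
N = 50
unit_vectors = [[1.0 if j == k else 0.0 for j in range(N)] for k in range(N)]
tails = [worst_tail(unit_vectors, J) for J in range(N)]
```

Here the worst tail never drops below one, no matter how large {J} is. This is consistent with Schur’s Theorem: the unit vectors are not a weak null sequence in {\ell^1}, as one sees by testing with {f=(1,1,\dots)\in\ell^\infty}.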

Indeed, the construction Markus made in his proof can be generalized to obtain the Nikodym convergence theorem:

Theorem 2 Let {(\sigma_n)} be a sequence of signed finite measures for which it holds that {\sigma_n(E)\rightarrow 0} for every measurable set {E} (which is equivalent to the weak convergence of {\sigma_n} to zero). Then the sequence {(\sigma_n)} is uniformly countably additive, i.e. for any sequence {(E_m)} of disjoint measurable sets the series {\sum_{m=1}^\infty |\sigma_n(E_m)|} converges uniformly in {n}.

One may compare this alternative proof with Exercise 11 (right after Theorem 5 [Nikodym convergence theorem]) in this blog post by Terry where one shall take a similar path to prove Schur’s Theorem.

This morning I was looking for the book “Vector measures” by Diestel and Uhl in our local math library. Although it was not there, I found the book “Sequences and Series in Banach Spaces” by Joseph Diestel. This (very nicely written) book on somewhat advanced theory of Banach spaces contains a number of facts of the type “yeah, that is true, although not easy to see; however, I do not know any reference for…”.

1. The dual of {\ell^\infty}

The question of what the dual space of {\ell^\infty} looks like seems to be of interest to many people (questions on it are frequently asked on the web, e.g. at math.stackexchange). Diestel’s book has the chapter “The Classical Banach Spaces” and in the section “The Classical Nonreflexive Sequence Spaces” he describes helpful properties of the spaces {c_0}, {\ell^1} and {\ell^\infty}.

The spaces {c_0} and {\ell^\infty} have the property that maps into them can be extended to larger domains:

Theorem 1 Let {Y} be a linear subspace of a Banach space {X} and let {T:Y\rightarrow\ell^\infty} be bounded and linear. Then there is an extension {S:X\rightarrow\ell^\infty} of {T} having the same norm as {T}.

Proof: Since {T} is bounded, it holds for all {n\in{\mathbb N}} that {|(Ty)_n|\leq \|Ty\|_\infty\leq \|T\|\,\|y\|_Y} and hence the mappings {y\mapsto (Ty)_n} are all elements of the space {Y^*}; we denote them by {y^*_n}. This gives the representation

\displaystyle  Ty = (y^*_n(y))_n.

By Hahn-Banach there exist extensions {x^*_n} of {y^*_n} to {X} with {\|x^*_n\|=\|y^*_n\|}, and hence an extension of {T} is given by

\displaystyle  Sx = (x^*_n(x))_n.

Since {\|S\| = \sup_n\|x^*_n\| = \sup_n\|y^*_n\| = \|T\|}, the extension has the same norm as {T}. \Box

For {c_0} something similar holds but only for separable spaces {X} (with a more involved proof):

Theorem 2 Let {Y} be a linear subspace of a separable Banach space {X} and let {T:Y\rightarrow c_0} be bounded and linear. Then there is an extension {S:X\rightarrow c_0} of {T}.

Then Diestel moves on to the dual space of {\ell^\infty}: {ba(2^{\mathbb N})} (which stands for the bounded and (finitely) additive measures). Although this topic is also treated in other classical books (such as “Linear Operators” by Dunford and Schwartz), the exposition in Diestel’s book was the most lively I came across.

Start with an {x^*\in (\ell^\infty)^*}. For a subset {\Delta} of the natural numbers the characteristic function {\chi_\Delta} belongs to {\ell^\infty} and hence, we can evaluate {x^*(\chi_\Delta)}. Of course, {\Delta\mapsto x^*(\chi_\Delta)} is additive on disjoint sets, and for disjoint {\Delta_1,\dots,\Delta_n} we have

\displaystyle  \begin{array}{rcl}  \sum_{i=1}^n |x^*(\chi_{\Delta_i})| &= &\sum_{i=1}^n x^*(\chi_{\Delta_i})\, \text{sgn}(x^*(\chi_{\Delta_i}))\\ & = &x^*(\sum_{i=1}^n \text{sgn}(x^*(\chi_{\Delta_i}))\,\chi_{\Delta_i})\\ & \leq &\|x^*\|\,\|\sum_{i=1}^n \text{sgn}(x^*(\chi_{\Delta_i}))\,\chi_{\Delta_i}\|_\infty\leq\|x^*\|. \end{array}

This shows that {(\ell^\infty)^*} indeed contains finitely additive measures. I’d like to quote Diestel directly:

A scalar-valued measure being bounded and additive is very like a countably additive measure and is not […] at all pathological.

Now we denote by {ba(2^{\mathbb N})} the Banach space of bounded additive scalar-valued measures on {{\mathbb N}} endowed with the variational norm {\|\cdot\|_{ba}} (which can be defined through the Hahn-Jordan decomposition and is, in a nutshell, the measure of the positive set plus the measure of the negative set).

For {\mu\in ba(2^{\mathbb N})} and disjoint finite subsets {\Delta_i} of the natural numbers it holds for every {n} that

\displaystyle  \sum_{i=1}^n |\mu(\Delta_i)|\leq \|\mu\|_{ba}

and hence

\displaystyle  \sum_{i=1}^\infty |\mu(\Delta_i)|\leq \|\mu\|_{ba}.

We see that {\mu} adds up the measures of countably many disjoint sets; in particular, {\sum_{i=1}^\infty \mu(\Delta_i)} is an absolutely convergent series. However, the sum does not have to be the right one: {\sum_{i=1}^\infty \mu(\Delta_i)} may fail to equal {\mu(\bigcup_{i=1}^\infty \Delta_i)} (for nonnegative {\mu} it may be strictly smaller). Diestel says that it is not fair to blame the measures {\mu} for this probable defect but it is “a failure on the part of the underlying field of sets”. He goes on to make this precise by the use of the Stone representation theorem, which assigns to any Boolean algebra (in our case the algebra {\Sigma=2^{\mathbb N}} of subsets of {{\mathbb N}}) the appropriately topologized set of ultrafilters, and in this set he considers the Boolean algebra {\mathcal{S}(\Sigma)} of the sets which are “clopen” (which is isomorphic to the original algebra {\Sigma}). He then considers {\mu} not acting on {\Sigma} but its identical twin {\hat\mu} on {\mathcal{S}(\Sigma)} and shows that there one needs to work with the closure of the union of sets and this makes the identical twin of {\mu} even countably additive. In this sense, the lack of countable additivity is not a failure of {\mu} but of {\Sigma}, so to say.

2. Weak convergence in {\ell^1} is strong

As a unique feature of {\ell^1} I’d like to quote Diestel again:

Face it: the norm of a vector {\sum_n t_n e_n} in {\ell^1} is as big as it can be ({\|\sum_n t_n e_n\|_1 = \sum_n |t_n|}) if respect for the triangle inequality and the “unit” vector is to be preserved.

The following theorems hold:

Theorem 3 For every separable Banach space {X} there exists a bounded and linear operator {T:X\rightarrow\ell^1} which is onto.

Theorem 4 Let {X} be a Banach space and {T:X\rightarrow\ell^1} bounded, linear and onto. Then {X} contains a subspace that is isomorphic to {\ell^1} and complemented.

Probably the most stunning fact about {\ell^1} is that weak and strong convergence coincide. To show this Diestel used heavy machinery, namely Phillips’ Lemma.

Lemma 5 (Phillips’ Lemma) Let {\mu_n} be in {ba(2^{\mathbb N})} and satisfy {\lim_{n\rightarrow\infty}\mu_n(\Delta)=0} for every {\Delta\subset{\mathbb N}}. Then

\displaystyle  \lim_n \sum_k |\mu_n(\{k\})|=0.

Theorem 6 (Schur’s Theorem) In {\ell^1} it holds that {x_n\rightharpoonup x} iff {x_n\rightarrow x}.

Proof: As always, strong convergence implies weak convergence and we only have to show the converse. Of course we consider {\ell^1=\ell^1({\mathbb N})} and hence, by canonical embedding into the double dual, there is for every {x\in\ell^1} a {\mu_x\in ba(2^{\mathbb N})} and for every {\Delta\subset {\mathbb N}} it holds that

\displaystyle  \mu_x(\Delta) = \sum_{k\in\Delta} x(k).

Let {(x_n)} be a weak null sequence in {\ell^1}. Then it holds that

\displaystyle  \begin{array}{rcl}  \lim_{n\rightarrow\infty} \mu_{x_n}(\Delta) &=&\lim_{n\rightarrow\infty}\sum_{k\in\Delta}x_n(k)\\ & = & \lim_{n\rightarrow\infty}\langle \chi_\Delta,x_n\rangle_{\ell^\infty\times\ell^1} = 0 \end{array}

Now, Phillips’ Lemma gives

\displaystyle  0 = \lim_n \sum_k |\mu_{x_n}(\{k\})|= \lim_n \sum_k |x_n(k)|=\lim_n \|x_n\|_1.
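
As a finite sanity check of the identification {x\mapsto\mu_x} used in this proof, here is a small Python sketch. The function names `mu` and `l1_norm`, the finitely supported vector and the chosen disjoint sets are assumptions for illustration only.

```python
def mu(x, delta):
    """mu_x(Delta): sum of x(k) over k in Delta, for a finitely supported x."""
    return sum(x[k] for k in delta)


def l1_norm(x):
    """The 1-norm of a finitely supported sequence."""
    return sum(abs(v) for v in x)


x = [0.5, -0.25, 0.125, -1.0, 0.0, 2.0]

# Summing |mu_x| over all singletons reproduces the 1-norm of x.
singleton_sum = sum(abs(mu(x, {k})) for k in range(len(x)))

# For coarser disjoint sets cancellation may occur, so the sum can only shrink.
blocks = [{0, 1}, {2, 3}, {5}]
block_sum = sum(abs(mu(x, d)) for d in blocks)
```

The singleton sum is the quantity that Phillips’ Lemma controls, which is why the lemma yields norm convergence directly.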


By the way: Does anyone have a reference for a more direct/elementary proof of Schur’s Theorem?