I blogged about the Douglas-Rachford method before and in this post I’d like to dig a bit into the history of the method.

As the name suggests, the method has its roots in a paper by Douglas and Rachford and the paper is

Douglas, Jim, Jr., and Henry H. Rachford Jr., “On the numerical solution of heat conduction problems in two and three space variables.” Transactions of the American mathematical Society 82.2 (1956): 421-439.

At first glance, the title does not suggest that the paper may be related to monotone inclusions and if you read the paper you’ll not find any monotone operator mentioned. So let’s start and look at Douglas and Rachford’s paper.

1. Solving the heat equation numerically

So let us see, what they were after and how this is related to what is known as Douglas-Rachford splitting method today.

Indeed, Douglas and Rachford wanted to solve the instationary heat equation

\displaystyle \begin{array}{rcl} \partial_{t}u &=& \partial_{xx}u + \partial_{yy}u \\ u(x,y,0) &=& f(x,y) \end{array}

with Dirichlet boundary conditions (they also considered three dimensions, but let us skip that here). They considered a rectangular grid and a very simple finite difference approximation of the second derivatives, i.e.

\displaystyle \begin{array}{rcl} \partial_{xx}u(x,y,t)&\approx& (u^{n}_{i+1,j}-2u^{n}_{i,j}+u^{n}_{i-1,j})/h^{2}\\ \partial_{yy}u(x,y,t)&\approx& (u^{n}_{i,j+1}-2u^{n}_{i,j}+u^{n}_{i,j-1})/h^{2} \end{array}

(with modifications at the boundary to accomodate the boundary conditions). To ease notation, we abbreviate the difference quotients as operators (actually, also matrices) that act for a fixed time step

\displaystyle \begin{array}{rcl} (Au^{n})_{i,j} &=& (u^{n}_{i+1,j}-2u^{n}_{i,j}+u^{n}_{i-1,j})/h^{2}\\ (Bu^{n})_{i,j} &=& (u^{n}_{i,j+1}-2u^{n}_{i,j}+u^{n}_{i,j+1})/h^{2}. \end{array}

With this notation, our problem is to solve

\displaystyle \begin{array}{rcl} \partial_{t}u &=& (A+B)u \end{array}

in time.

Then they give the following iteration:

\displaystyle Av^{n+1}+Bw^{n} = \frac{v^{n+1}-w^{n}}{\tau} \ \ \ \ \ (1)


\displaystyle Bw^{n+1} = Bw^{n} + \frac{w^{n+1}-v^{n+1}}{\tau} \ \ \ \ \ (2)


(plus boundary conditions which I’d like to swipe under the rug here). If we eliminate {v^{n+1}} from the first equation using the second we get

\displaystyle (A+B)w^{n+1} = \frac{w^{n+1}-w^{n}}{\tau} + \tau AB(w^{n+1}-w^{n}). \ \ \ \ \ (3)


This is a kind of implicit Euler method with an additional small term {\tau AB(w^{n+1}-w^{n})}. From a numerical point of it has one advantage over the implicit Euler method: As equations (1) and (2) show, one does not need to invert {I-\tau(A+B)} in every iteration, but only {I-\tau A} and {I-\tau B}. Remember, this was in 1950s, and solving large linear equations was a much bigger problem than it is today. In this specific case of the heat equation, the operators {A} and {B} are in fact tridiagonal, and hence, solving with {I-\tau A} and {I-\tau B} can be done by Gaussian elimination without any fill-in in linear time (read Thomas algorithm). This is a huge time saver when compared to solving with {I-\tau(A+B)} which has a fairly large bandwidth (no matter how you reorder).

How do they prove convergence of the method? They don’t since they wanted to solve a parabolic PDE. They were after stability of the scheme, and this can be done by analyzing the eigenvalues of the iteration. Since the matrices {A} and {B} are well understood, they were able to write down the eigenfunctions of the operator associated to iteration (3) explicitly and since the finite difference approximation is well understood, they were able to prove approximation properties. Note that the method can also be seen, as a means to calculate the steady state of the heat equation.

We reformulate the iteration (3) further to see how {w^{n+1}} is actually derived from {w^{n}}: We obtain

\displaystyle (-I + \tau(A+B) - \tau^{2}AB)w^{n+1} = (-I-\tau^{2}AB)w^{n} \ \ \ \ \ (4)


2. What about monotone inclusions?

What has the previous section to do with solving monotone inclusions? A monotone inclusion is

\displaystyle \begin{array}{rcl} 0\in Tx \end{array}

with a monotone operator, that is, a multivalued mapping {T} from a Hilbert space {X} to (subsets of) itself such that for all {x,y\in X} and {u\in Tx} and {v\in Ty} it holds that

\displaystyle \begin{array}{rcl} \langle u-v,x-y\rangle\geq 0. \end{array}

We are going to restrict ourselves to real Hilbert spaces here. Note that linear operators are monotone if they are positive semi-definite and further note that monotone linear operators need not to be symmetric. A general approach to the solution of monotone inclusions are so-called splitting methods. There one splits {T} additively {T=A+B} as a sum of two other monotone operators. Then one tries to use the so-called resolvents of {A} and {B}, namely

\displaystyle \begin{array}{rcl} R_{A} = (I+A)^{-1},\qquad R_{B} = (I+B)^{-1} \end{array}

to obtain a numerical method. By the way, the resolvent of a monotone operator always exists and is single valued (to be honest, one needs a regularity assumption here, namely one need maximal monotone operators, but we will not deal with this issue here).

The two operators {A = \partial_{xx}} and {B = \partial_{yy}} from the previous section are not monotone, but {-A} and {-B} are, so the equation {-Au - Bu = 0} is a special case of a montone inclusion. To work with monotone operators we rename

\displaystyle \begin{array}{rcl} A \leftarrow -A,\qquad B\leftarrow -B \end{array}

and write the iteration~(4) in terms of monotone operators as

\displaystyle \begin{array}{rcl} (I + \tau(A+B) + \tau^{2}AB)w^{n+1} = (I+\tau^{2}AB)w^{n}, \end{array}


\displaystyle \begin{array}{rcl} w^{n+1} = (I+\tau A+\tau B+\tau^{2}AB)^{-1}(I+\tau AB)w^{n}. \end{array}

Using {I+\tau A+\tau B + \tau^{2}A = (I+\tau A)(I+\tau B)} and {(I+\tau^{2}AB) = (I-\tau B) + (I + \tau A)\tau B} we rewrite this in terms of resolvents as

\displaystyle \begin{array}{rcl} w^{n+1} & = &(I+\tau B)^{-1}[(I+\tau A)^{-1}(I-\tau B) + \tau B]w^{n}\\ & =& R_{\tau B}(R_{\tau A}(w^{n}-\tau Bw^{n}) + \tau Bw^{n}). \end{array}

This is not really applicable to a general monotone inclusion since there {A} and {B} may be multi-valued, i.e. the term {Bw^{n}} is not well defined (the iteration may be used as is for splittings where {B} is monotone and single valued, though).

But what to do, when both and {A} and {B} are multivaled? The trick is, to introduce a new variable {u^{n} = R_{\tau B}(w^{n})}. Plugging this in throughout leads to

\displaystyle \begin{array}{rcl} R_{\tau B} u^{n+1} & = & R_{\tau B}(R_{\tau A}(R_{\tau B}u^{n}-\tau B R_{\tau B}u^{n}) + \tau B R_{\tau B}u^{n}). \end{array}

We cancel the outer {R_{\tau B}} and use {\tau B R_{\tau B}u^{n} = u^{n} - R_{\tau B}u^{n}} to get

\displaystyle \begin{array}{rcl} u^{n+1} & = & R_{\tau A}(2R_{\tau B}u^{n} - u^{n}) + u^{n} - R_{\tau B}u^{n} \end{array}

and here we go: This is exactly what is known as Douglas-Rachford method (see the last version of the iteration in my previous post). Note that it is not {u^{n}} that converges to a solution, but {w^{n} = R_{\tau B}u^{n}}, so it is convenient to write the iteration in the two variables

\displaystyle \begin{array}{rcl} w^{n} & = & R_{\tau B}u^{n}\\ u^{n+1} & = & R_{\tau A}(2w^{n} - u^{n}) + u^{n} - w^{n}. \end{array}

The observation, that these splitting method that Douglas and Rachford devised for linear problems has a kind of much wider applicability is due to Lions and Mercier and the paper is

Lions, Pierre-Louis, and Bertrand Mercier. “Splitting algorithms for the sum of two nonlinear operators.” SIAM Journal on Numerical Analysis 16.6 (1979): 964-979.

Other, much older, splitting methods for linear systems, such as the Jacobi method, the Gauss-Seidel method used different properties of the matrices such as the diagonal of the matrix or the upper and lower triangluar parts and as such, do not generalize easily to the case of operators on a Hilbert space.

Lately quite a discussion about journals, publishing and ranking has started, also in the mathematical community. As an entry to the discussion on the role of journals you may consider the post “How might we get to a new model of mathematical publishing?” by Timothy Gowers (and the follow-up “A more modest approach”) and also some entries on nuit blanche or the post “The problem with journals”. Recently, I found that the International Mathematical Union (IMU) has started a blog on related issues: The Blog On Mathematical Journals. In the first entry “BLOG on Mathematical Journals” Ingrid Daubechies and Barbara Lee Keyfitz describe the aim of this blog: “We invite the mathematical community to provide their views on the journal rating issue, and on whether IMU and ICIAM should formulate their own rating. Views on how to establish and update this rating would also be welcome.” Although the first entry only introduced the rating issue, two further posts discuss other topics related to publishing: What might be done about high prices of journals? and Beyond Journals.

In total, the current system of publication is criticized to several extends in the discussion and I’d like to add some remarks on the pricing and ranking issue in this post.

1. Pricing

I know that some journals have prices which seem high and I have heard a lot of people complaining about high journal prices (some of them just repeat what they have heard, but other are in charge of library budget). I myself can not judge if prices are too high or appropriate. However, Where I am it seems like the library is not too rich, but basically I have no problems to obtain the papers I’d like to read: Either they are available in published digital form or printed (by campus or national license), they are available at the authors websites (in their published form or in a preprint form), they are available in preprint form on some preprint server or, as a last resort, I ask the authors directly (and rarely I ask a colleague at a richer university to download the paper and send it…). So, access to results is not a problem for me and I think for nobody at a university, at least in Germany.

One complain is that the countries pay the private publishers, through the university budgets, for work which is done by their employees (the scientists). Well, but this seems ok, since the scientists are the “only” customers, and the customers pay for what a company delivers to them. This is comparable to other situations in which countries pay private companies. However, here (as in other cases) a non-profit company, run by the country, could do the same job. A specific issue is that the journals are totally international (e.g. the editorial boards are mostly formed by reputation in the respective field and not by nationality). Hence, a centralized system should be independent of the countries which makes a transition from a decentralized system based on companies difficult.

Moreover, publishing is not totally for free and it just not true that publishers do nothing for their money. Well, we TeX our papers ourselves, however, a good publisher has to do (and does) quite some editing both technically and linguistically, to get a polished and publishable paper. Also, do not forget that the publishers do a good job in maintaining databases, permanent and stable links and cross-referencing their content (do not point to free tools like Google Scholar here, they heavily depend on the good databases by the publishers). For me it seems difficult to judge if journals are overpriced. I just do not have enough information: How many staff is needed to maintain the publication of a journal? How is the access to articles in general (several journals allow to publish a pdf on your own website)? What are examples of bad practice and best practice (high price with no service, high price and great service, low price and good service…)?

2. Ranking and rating

The primary task of the IMU blog is the discussion of journal ratings. I am not sure about the differences between “rating” and “ranking” but as far as I understand, a ranking of things should at least answer the question “Which of these two items is better?” for all two items, while a rating should answer the question “How good is this item?” for every item. Then, a ranking of journals is, in my opinion, truly not desirable and not achievable. Well, I am sorry for this old analogy, but this is just comparing apples and oranges. Which one is better: FC Barcelona or the LA Lakers? The Simpsons or How I met Your Mother? An iPad or a Nintendo 3DS? A Hummer H2 or a Smart? Lufthansa or the London Underground? Jacket or trousers? Annals of Mathematics or Annals of Statistics? Annales Scientifiques de l’École Normale Supérieure or Mathematics of Operations Research? SIAM Journal on Numerical Analysis or Crelle’s Journal? I hope you’ve got my point…

But how about a rating? Isn’t it possible to say that a journal is top, average or below? Actually there is journal rating by the Australian Mathematical Society here (although they call it ranking…). There they rate the journals into categories A*, A, B and C (from good to bad). I checked the rating of some journals I know and published in and I have to confess that I mainly agree with the ratings of there journals (e.g. Inverse Problems is excellent, Current Development in Theory and Applications of Wavelets is mediocre).

There are a lot of comments to the post BLOG on Mathematical Journals most of them saying that it is inappropriate to reduce the “quality” of a journal to a single number or rate and this number will not be any better than Impact Factor rubbish (see e.g. the comment by Jean-Paul Allouche). Arieh Iserles made the point that ratings are already used and will be used in the future and a self-made rating will be better than the others. John Ball argues why arguing against all rating is the better alternative (and gives two examples in which organizations abandoned from using ratings or bibliometry).

Basically, I do not have to add anything new to the comments given in the post. My opinion is that rating is possible but both useless and dangerous. It is possible, since you can ask experienced mathematicians and you will get a reliable answer. It is useless, since you either know which journals are good or you can ask a colleague (which is basically the same reason as the previous one). It is dangerous, since it offers the possibility to form decisions on tenure or grants on these numbers and shifts the focus from what you publish to where you publish (see also the comment by Ivar Ekeland to this post).