Today there are several things I could blog on. The first is the planary by Rich Baraniuk on Compressed Sensing. However, I don’t think that I could reflect the content in a way which would be helpful for a potential reader. Just for the record: If you have the chance to visit one of Rich’s talk: Do it!

The second thing is the talk by Bernd Hofmann on source conditions, smoothness and variational inequalities and their use in regularization of inverse problems. However, this would be too technical for now and I just did not take enough notes to write a meaningful post.

As a third thing I have the talk by Christian Clason on inverse problems with uniformly distributed noise. He argued that for uniform noise it is much better to use an ${L^\infty}$ discrepancy term instead of the usual ${L^2}$-one. He presented a path-following semismooth Newton method to solve the problem

$\displaystyle \min_x \frac{1}{p}\|Kx-y^\delta\|_\infty^p + \frac{\alpha}{2}\|x\|_2^2$

and showed examples with different kinds of noise. Indeed the examples showed that ${L^\infty}$ works much better than ${L^2}$ here. But in fact it works even better, if the noise is not uniformly distributed but “impulsive” i.e. it attains bounds ${\pm\delta}$ almost everywhere. It seems to me that uniform noise would need a slightly different penalty but I don’t know which one – probably you do? Moreover, Christian presented the balancing principle to choose the regularization parameter (without knowledge about the noise level) and this was the first time I really got what it’s about. What one does here is, to choose ${\alpha}$ such that (for some ${\sigma>0}$ which only depends on ${K}$, but not on the noise)

$\displaystyle \sigma\|Kx_\alpha^\delta-y^\delta\|_\infty = \frac{\alpha}{2}\|x_\alpha^\delta\|_2^2.$

The rational behind this is, that the left hand side is monotonically non-decreasing in ${\alpha}$, while the right hand side is monotonically non-increasing. Hence, there should be some ${\alpha}$ “in the middle” which make both somewhat equally large. Of course, we do neither want to “over-regularize” (which would usually “smooth too much”) nor to “under-regularize” (which would not eliminate noise). Hence, balancing seems to be a valid choice. From a practical point of view the balancing is also nice because one can use the fixed-point iteration

$\displaystyle \alpha^{n+1} = 2\sigma\frac{\|Kx_{\alpha^n}^\delta - y^\delta\|_\infty}{\|x_{\alpha_n}^\delta\|_2^2}$

which converges in a few number of iterations.

Then there was the talk by Esther Klann, but unfortunately, I was late so only heard the last half…

Last but not least we have the talk by Christiane Pöschl. If you are interested in Total-Variation-Denoising (TV denoising), then you probably have heard many times that “TV denoising preserves edges” (have a look at the Wikipedia page – it claims this twice). What Christiane showed (in a work with Vicent Caselles and M. Novaga) that this claim is not true in general but only for very special cases. In case of characteristic functions, the only functions for which the TV minimizer has sharp edges are these so-called calibrated sets, introduced by Caselles et el. Building on earlier works by Caselles and co-workers she calculated exact minimizers for TV denoising in the case that the image consists of characteristic functions of two convex sets or of a single star shaped domain, that is, for a given set $B$ she calculated the solution of

$\displaystyle \min_u\int (u - \chi_B)^2dx + \lambda \int|Du|.$

This is not is as easy as it may sound. Even for the minimizer for a single convex set one has to make some effort. She presented a nice connection of the shape of the obtained level-sets with the morphological operators of closing and opening. With the help of this link she derived a methodology to obtain the exact TV denoising minimizer for all parameters. I do not have the images right now but be assured that most of the time, the minimizers do not have sharp edges all over the place. Even for simple geometries (like two rectangles touching in a corner) strange things happen and only very few sharp edges appear. I’ll keep you posted in case the paper comes out (or appears as a preprint).

Christiane has some nice images which make this much more clear:

For two circles edges are preserved if they are far enough away from each other. If they are close, the area “in between” them is filled and, moreover, obey this fuzzy boundary. I remember myself seeing effects like this in the output of TV-solvers and thinking “well, it seems that the algorithm is either not good or not converged yet – TV should output sharp edges!”.

For a star-shaped shape (well, actually a star) the output looks like this. The corners are not only rounded but also blurred and this is true both for the “outer” corners and the “inner” corners.

So, if you have any TV-minimizing code, go ahead and check if your code actually does the right things on images like this!
Moreover, I would love to see similar results for more complicated extensions of TV like Total Generalized Variation, I treated here.

The second day of ISMP started (for me) with the session I organized and chaired.

The first talk was by Michael Goldman on Continuous Primal-Dual Methods in Image Processing. He considered the continuous Arrow-Hurwitz method for saddle point problems

$\displaystyle \min_{u}\max_{\xi} K(u,\xi)$

with ${K}$ convex in the first and concave in the second variable. The continuous Arrow-Hurwitz method consists of solving

$\displaystyle \begin{array}{rcl} \partial_t u(t) &=& -\nabla_u K(u(t),\xi(t))\\ \partial_t \xi(t) &=& \nabla_\xi K(u(t),\xi(t)). \end{array}$

His talk evolved around the problem if ${K}$ comes from a functional which contains the total variation, namely he considered

$\displaystyle K(u,\xi) = -\int_\Omega u\text{div}(\xi) + G(u)$

with the additional constraints ${\xi\in C^1_C(\Omega,{\mathbb R}^2)}$ and ${|\xi|\leq 1}$. For the case of ${G(u) = \lambda\|u-f\|^2/2}$ he presented a nice analysis of the problem including convergence of the method to a solution of the primal problem and some a-posteriori estimates. This reminded me of Showalters method for the regularization of ill-posed problems. The Arrow-Hurwitz method looks like a regularized version of Showalters method and hence, early stopping does not seem to be necessary for regularization. The related paper is Continuous Primal-Dual Methods for Image Processing.

The second talk was given by Elias Helou and was on Incremental Algorithms for Convex Non-Smooth Optimization with Applications in Image Reconstructions. He presented his work on a very general framework for problems of the class

$\displaystyle \min_{x\in X} f(x)$

with a convex function ${f}$ and a convex set ${X}$. Basically, he abstracted the properties of the projected subgradient method. This consists of taking subgradient descent steps for ${f}$ followed by projection onto ${X}$ iteratively: With a subgradient ${g^n\in\partial f(x^n)}$ this reads as

$\displaystyle x^{n+1} = P_X(x^n -\alpha_n g^n)$

he extracted the conditions one needs from the subgradient descent step and from the projection step and formulated an algorithm which consists of successive application of an “optimality operator” ${\mathcal{O}_f}$ (replacing the subgradient step) and a feasibility operator ${\mathcal{F}_X}$ (replacing the projection step). The algorithm then reads as

$\displaystyle \begin{array}{rcl} x^{n+1/2} &=& \mathcal{O}_f(x^n,\alpha_n)\\ x^{n+1} &=& \mathcal{F}_x(x^{n+1/2} \end{array}$

and he showed convergence under the extracted conditions. The related paper is , Incremental Subgradients for Constrained Convex Optimization: a Unified Framework and New Methods.

The third talk was by Jerome Fehrenbach on Stripes removel in images, apllications in microscopy. He considered the problem of very specific noise which is appear in the form of stripes (and appears, for example, “single plane illumination microscopy”). In fact he considered a little more general case and the model he proposed was as follows: The observed image is

$\displaystyle u_{\text{OBS}} = u + n,$

i.e. the usual sum of the true image ${u}$ and noise ${n}$. However, for the noise he assumed that it is given by

$\displaystyle n = \sum_{j=1}^m \psi_j*\lambda_j,$

i.e. it is a sum of different convolutions. The ${\psi_j}$ are kind of shape-functions which describe the “pattern of the noise” and the ${\lambda_j}$ are samples of noise processes, following specific distributions (could be white noise realizations, impulsive noise or something else)-. He then formulated a variational method to identify the variables ${\lambda_j}$ which reads as

$\displaystyle \min \|\nabla(u_{\text{OBS}} - \sum_{j=1}^m \psi_j*\lambda_j)\|_1 + \sum_j \phi_j(\lambda_j).$

Basically, this is the usual variational approach to image denoising, but nor the optimization variable is the noise rather than the image. This is due to the fact that the noise has a specific complicated structure and the usual formulation with ${u = u_{\text{OBS}} +n}$ is not feasible. He used the primal-dual algorithm by Chambolle and Pock for this problem and showed that the method works well on real world problems.

Another theme which caught my attention here is “optimization with variational inequalities as constraints”. At first glance that sounds pretty awkward. Variational inequalities can be quite complicated things and why on earth would somebody considers these things as side conditions in optimization problems? In fact there are good reasons to do so. One reason is, if you have to deal with bi-level optimization problems. Consider an optimization problem

$\displaystyle \min_{x\in C} F(x,p) \ \ \ \ \ (1)$

with convex ${C}$ and ${F(\cdot,p)}$ (omitting regularity conditions which could be necessary to impose) depending on a parameter ${p}$. Now consider the case that you want to choose the parameter ${p}$ in an optimal way, i.e. it solves another optimization problem. This could look like

$\displaystyle \min_p G(x)\quad\text{s.t.}\ x\ \text{solves (1)}. \ \ \ \ \ (2)$

Now you have an optimization problem as a constraint. Now we use the optimality condition for the problem~(1): For differentiable ${F}$, ${x^*}$ solves~(1) if and only if

$\displaystyle \forall y\in C:\ \nabla_x F(x^*(p),p)(y-x^*(p))\geq 0.$

In other words: We con reformulate (2) as

$\displaystyle \min_p G(x)\quad\text{s.t.}\ \forall y\in C:\ \nabla_x F(x^*(p),p)(y-x^*(p))\geq 0. \ \ \ \ \ (3)$

And there it is, our optimization problem with a variational inequality as constraint. Here at ISMP there are entire sessions devoted to this, see here and here.

In this post I would like to comment on two papers I “stumbled upon”, one in regularization theory and one in image processing.

The first one is A regularization parameter for nonsmooth Tikhonov regularization by Kazufumi Ito, Bangti Jin and Tomoya Takeuchi. As the title announces, the paper addresses the problem of determining suitable regularization parameter for some kind of Tikhonov regularization. In particular, the authors propose a new heuristic method, i.e. method which does not use any estimate of the noise level in the data. This is an interesting and important topic for several reasons:

1. Practically, estimates on the noise level are rarely available and if they are, they are not too reliable.
2. Strictly speaking, these kind of rules are “bad” since there is the “Bakushinksii Veto”: Such rules only provide regularizations (e.g. in the sense of Engl, Hanke, Neubauer for problems which are well-posed (as a great service, the authors state and prove this statement as Theore 3.2).
3. Despite this veto, several heuristic rules produce excellent results in practice.

Note that the last second points are not in contradiction. They merely say that the notion of “regularization” may be too strict. Usually, it uses a worst case estimate which may practically never observed.

The paper contributes a new rule and state that it is applicable to a broad range of problems. They use very general Tikhonov functional:

$\displaystyle \phi(x,y^\delta) + \eta\psi(x)$

and do not assume that ${\phi}$ or ${\psi}$ are smooth. They use the value function

$\displaystyle F(\eta) = \min_x \phi(x,y^\delta) + \eta\psi(x)$

and propose the following rule for ${\eta}$: For some ${\gamma}$ choose ${\eta}$ such that

$\displaystyle \Phi_\gamma(\eta) = \frac{F(\eta)^{1+\gamma}}{\eta}$

is minimal. I do not have any intuition for this rule (however, from they proofs you see that they work, at least for “partially smooth cases”, see below). Lacking a name for this rule, one may use the term “weighted value function rule”.

They prove several nice properties of the value function (continuity, monotonicity and concavity) with loose assumptions on ${\phi}$ and ${\psi}$ (especially they do not even need existence of minimizers for ${\phi(x,y^\delta) + \eta\psi(x)}$, only that the minimum exists). However, when it comes to error estimates, they only obtain results for a specific discrepancy measure, namely a squares Hilbert space norm:

$\displaystyle \phi(x,y^\delta) = \tfrac12\|Kx-y^\delta\|^2.$

It seems that, for general convex and lower-semicontinuous penalties ${\psi}$ they build upon results from my paper with Bangti Jin on the Hanke-Raus rule and the quasi-optimality principle.

Another contribution of the paper is that it gives an algorithm that realizes the weighted value function rule (a thing which I omitted in my paper with Bangti). Their numerical experiments demonstrate that the weighted value function rule and the proposed algorithm works well for academic examples.

The next paper I want to discuss is the preprint Properties of ${L^1-\text{TGV}^2}$: The one-dimensional case by Kristian Bredies, Karl Kunisch and Tuomo Valkonen. There the authors analyze the somehow recent generalization “total generalized variation” ${\text{TGV}}$ of the omnipresent total variation. The TGV has been proposed by Bredies, Kunisch and Pock in this paper recently and Kristian and me also briefly described it in our book on mathematical image processing. Loosely speaking, the TGV shall be a generalization of the usual total variation which does not lead to “staircasing”. While one may observe from the construction of the TGV functional, that staircasing is not to be expected, the authors in this paper give precise statements. By restricting to the one dimensional case they prove several interesting properties of the TGV functional, most notably that it leads to an equivalent norm of the space ${BV}$.

Maybe I should state the definitions here: The total variation of a function ${u\in L^1(\Omega)}$ is

$\displaystyle \text{TV}(u) = \sup\{\int_\Omega u v'\ |\ v\in C^1_c(\Omega),\ \|v\|_\infty\leq 1\}$

leading the the ${BV}$-norm

$\displaystyle \|u\|_{BV} = \|u\|_{L^1} + \text{TV}(u).$

The ${\text{TGV}^2}$ seminorm for a parameter tuple ${(\alpha,\beta)}$ is

$\displaystyle \text{TGV}^2_{(\alpha,\beta)}(u) = \sup\{\int_\Omega u v''\ |\ C^2_c(\Omega), \|v\|_\infty\leq\beta,\ \|v'\|_\infty\leq\alpha\}$

and the associated norm is

$\displaystyle \|u\|_{BGV^2} = \|u\|_{L^1} + \text{TGV}^2(u).$

In Lemma 3.3 they prove that ${\|\cdot\|_{BV}}$ and ${\|\cdot\|_{BGV^2}}$ are equivalent norms on ${\text{BV}}$. In Section 4 they show that minimizers of

$\displaystyle \|u-f\|_{L^1} + \alpha\text{TV}(u)$

obey staircasing of degree 0, i.e. the solution ${u}$ is piecewise constant when it is not equal to ${f}$. For the minimizers of

$\displaystyle \|u-f\|_{L^1} + \text{TGV}^2_{(\alpha,\beta)}(u)$

one has staircasing of degree 1: ${u}$ is affine linear where it is not equal to ${f}$.

These two facts combined (norm equivalence of ${\text{BV}}$ and ${\text{BGV}^2}$ and the staircasing of degree 1) seem quite remarkable to me. They somehow show that staircasing is not related to the space ${\text{BV}}$ of functions of bounded variation but only to the specific ${\text{TV}}$ semi-norm. This is somehow satisfying since I still remember the thorough motivation of L. Rudin in his 1987 thesis for the usage of the space ${\text{BV}}$ in image processing: If there where images which are not in ${\text{BV}}$ we could not observe them. (He even draws an analogy to the question: How many angles can dance on the point of a needle?) Moreover, he further argues that ${\text{BV}}$ is not too large in the sense that its elements are still accessible to analysis (e.g. in defining a weak notion of curvature although they may be discontinuous). The ${\text{BGV}^2}$-model shows that it is possible to overcome the undesired effect of staircasing while staying in the well founded and mathematically sound and appealing framework of ${\text{BV}}$.

The paper contains several more interesting results (e.g. on preservation of continuity and “affinity” and on convergence of with respect to ${(\alpha,\beta)}$ which I do not collect here.