In this post I would like to comment on two papers I “stumbled upon”, one in regularization theory and one in image processing.

The first one is A regularization parameter for nonsmooth Tikhonov regularization by Kazufumi Ito, Bangti Jin and Tomoya Takeuchi. As the title announces, the paper addresses the problem of determining a suitable regularization parameter for a fairly general class of Tikhonov functionals. In particular, the authors propose a new heuristic rule, i.e. a rule which does not use any estimate of the noise level in the data. This is an interesting and important topic for several reasons:

1. In practice, estimates of the noise level are rarely available, and when they are, they are often not very reliable.
2. Strictly speaking, rules of this kind are “bad” because of the “Bakushinskii veto”: such rules only provide regularizations (e.g. in the sense of Engl, Hanke and Neubauer) for problems which are well-posed (as a great service, the authors state and prove this statement as Theorem 3.2).
3. Despite this veto, several heuristic rules produce excellent results in practice.

Note that the second and third points do not contradict each other. They merely indicate that the notion of “regularization” may be too strict: it is based on a worst-case estimate, and the worst case may practically never be observed.

The paper contributes a new rule and states that it is applicable to a broad range of problems. The authors use a very general Tikhonov functional

$\displaystyle \phi(x,y^\delta) + \eta\psi(x)$

and do not assume that ${\phi}$ or ${\psi}$ are smooth. They use the value function

$\displaystyle F(\eta) = \min_x \phi(x,y^\delta) + \eta\psi(x)$

and propose the following rule for ${\eta}$: For some ${\gamma}$ choose ${\eta}$ such that

$\displaystyle \Phi_\gamma(\eta) = \frac{F(\eta)^{1+\gamma}}{\eta}$

is minimal. I do not have any intuition for this rule (however, the proofs show that it works, at least for “partially smooth cases”, see below). Lacking a name for this rule, one may use the term “weighted value function rule”.

They prove several nice properties of the value function (continuity, monotonicity and concavity) under mild assumptions on ${\phi}$ and ${\psi}$ (in particular, they do not even need the existence of minimizers of ${\phi(x,y^\delta) + \eta\psi(x)}$, only that the infimum is finite). However, when it comes to error estimates, they only obtain results for a specific discrepancy measure, namely a squared Hilbert space norm:

$\displaystyle \phi(x,y^\delta) = \tfrac12\|Kx-y^\delta\|^2.$
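Concavity and monotonicity are in fact elementary, and the same computation shows what a minimizer of ${\Phi_\gamma}$ has to satisfy. A short derivation, assuming (my assumptions, for illustration) ${\psi\geq 0}$ and, in the last steps, that a minimizer ${x_\eta}$ exists and ${F}$ is differentiable at ${\eta}$: for each fixed ${x}$ the map ${\eta\mapsto\phi(x,y^\delta)+\eta\psi(x)}$ is affine and nondecreasing, so ${F}$, as the pointwise infimum of such maps, is concave and nondecreasing. Moreover, for every ${\eta'}$,

$\displaystyle F(\eta') \leq \phi(x_\eta,y^\delta) + \eta'\psi(x_\eta) = F(\eta) + (\eta'-\eta)\psi(x_\eta),$

so ${\psi(x_\eta)}$ is a supergradient of ${F}$ at ${\eta}$ and ${F'(\eta) = \psi(x_\eta)}$. Differentiating ${\Phi_\gamma}$ gives

$\displaystyle \Phi_\gamma'(\eta) = \frac{F(\eta)^\gamma}{\eta^2}\Big((1+\gamma)\,\eta F'(\eta) - F(\eta)\Big),$

and inserting ${F(\eta) = \phi(x_\eta,y^\delta) + \eta\psi(x_\eta)}$, a stationary point of ${\Phi_\gamma}$ satisfies

$\displaystyle \gamma\,\eta\,\psi(x_\eta) = \phi(x_\eta,y^\delta),$

which reads like a balancing principle between the data term and the weighted penalty.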

It seems that, for general convex and lower semicontinuous penalties ${\psi}$, they build upon results from my paper with Bangti Jin on the Hanke-Raus rule and the quasi-optimality principle.

Another contribution of the paper is an algorithm that realizes the weighted value function rule (something I omitted in my paper with Bangti). Their numerical experiments demonstrate that the weighted value function rule and the proposed algorithm work well for academic examples.
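I do not reproduce the paper's algorithm here; the following is only a hypothetical sketch of how one might compute the rule in the smooth case ${\phi(x,y^\delta)=\tfrac12\|Kx-y^\delta\|^2}$, ${\psi(x)=\tfrac12\|x\|^2}$ (the quadratic ${\psi}$ is my simplification; the paper covers nonsmooth ${\psi}$). Differentiating ${\Phi_\gamma}$ and using ${F'(\eta)=\psi(x_\eta)}$, a stationary point satisfies ${\gamma\eta\psi(x_\eta)=\phi(x_\eta,y^\delta)}$, which suggests the fixed-point iteration ${\eta\leftarrow\phi(x_\eta,y^\delta)/(\gamma\psi(x_\eta))}$. I assume ${K}$ is diagonal with entries ${s_i}$ (think of ${K}$ in its singular basis), so ${x_\eta}$ has the closed form ${s_iy_i/(s_i^2+\eta)}$. The toy data, the starting value and all function names are mine; the data contain one component with ${s_i=0}$, i.e. noise outside the range of ${K}$.

```python
def tikhonov_terms(eta, s, y):
    """Return (phi, psi) at the minimizer x_eta of 0.5*|Kx-y|^2 + eta*0.5*|x|^2.

    Assumption: K is diagonal with entries s (e.g. K written in its singular basis).
    """
    phi = psi = 0.0
    for si, yi in zip(s, y):
        xi = si * yi / (si ** 2 + eta)   # closed-form minimizer, componentwise
        phi += 0.5 * (si * xi - yi) ** 2  # data term phi(x_eta, y)
        psi += 0.5 * xi ** 2              # penalty psi(x_eta)
    return phi, psi


def weighted_value_function(eta, s, y, gamma):
    """Phi_gamma(eta) = F(eta)^(1+gamma) / eta, with F the value function."""
    phi, psi = tikhonov_terms(eta, s, y)
    return (phi + eta * psi) ** (1 + gamma) / eta


def choose_eta(s, y, gamma, eta0=1e-2, tol=1e-10, max_iter=200):
    """Hypothetical fixed-point iteration eta <- phi(x_eta) / (gamma * psi(x_eta))."""
    eta = eta0
    for _ in range(max_iter):
        phi, psi = tikhonov_terms(eta, s, y)
        eta_new = phi / (gamma * psi)
        if abs(eta_new - eta) <= tol * eta:  # relative stopping criterion
            return eta_new
        eta = eta_new
    raise RuntimeError("fixed-point iteration did not converge")


# Toy data: the last component lies outside the range of K (s = 0) and is pure noise.
s = [1.0, 0.5, 0.25, 0.0]
y = [1.0, 0.52, 0.24, 0.05]
eta = choose_eta(s, y, gamma=1.0)
print(eta, weighted_value_function(eta, s, y, 1.0))
```

The starting value matters: here ${\psi(0)=0}$, so ${F}$ is bounded by ${\phi(0,y^\delta)}$ and ${\Phi_\gamma(\eta)\to 0}$ as ${\eta\to\infty}$; hence ${\Phi_\gamma}$ also has a stationary point that is a local maximum, and the iteration only targets a local minimizer.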

The next paper I want to discuss is the preprint Properties of ${L^1-\text{TGV}^2}$: The one-dimensional case by Kristian Bredies, Karl Kunisch and Tuomo Valkonen. There the authors analyze the “total generalized variation” ${\text{TGV}}$, a fairly recent generalization of the omnipresent total variation. The TGV was introduced by Bredies, Kunisch and Pock, and Kristian and I also briefly described it in our book on mathematical image processing. Loosely speaking, the TGV is meant to be a generalization of the usual total variation which does not lead to “staircasing”. While one may already expect from the construction of the TGV functional that staircasing does not occur, the authors give precise statements in this paper. Restricting to the one-dimensional case, they prove several interesting properties of the TGV functional, most notably that it leads to an equivalent norm on the space ${BV}$.

Maybe I should state the definitions here: The total variation of a function ${u\in L^1(\Omega)}$ is

$\displaystyle \text{TV}(u) = \sup\{\int_\Omega u v'\ |\ v\in C^1_c(\Omega),\ \|v\|_\infty\leq 1\}$

leading to the ${BV}$-norm

$\displaystyle \|u\|_{BV} = \|u\|_{L^1} + \text{TV}(u).$
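To make the definition concrete, here is a small discretization (my own illustration; the uniform grid with spacing ${h}$, the forward differences and the function names are assumptions). Summation by parts turns the discrete pairing ${\sum_i u_i(v_i-v_{i-1})}$, with ${v}$ extended by zero, into ${\sum_i v_i(u_i-u_{i+1})}$, so the supremum over ${|v_i|\leq 1}$ equals the sum of absolute jumps and is attained at ${v_i=\text{sign}(u_i-u_{i+1})}$.

```python
def sign(t):
    """Sign of t as -1, 0 or 1."""
    return (t > 0) - (t < 0)


def tv_primal(u):
    """Discrete TV: sum of absolute forward differences (no grid-spacing factor)."""
    return sum(abs(u[i + 1] - u[i]) for i in range(len(u) - 1))


def tv_dual_pairing(u, v):
    """Discrete analogue of the integral of u * v', where v lives on the n-1 cells
    between grid points and is extended by zero (a discrete 'compact support')."""
    vext = [0.0] + list(v) + [0.0]
    return sum(u[i] * (vext[i + 1] - vext[i]) for i in range(len(u)))


def optimal_test_function(u):
    """A maximizer of the pairing over all v with |v_i| <= 1."""
    return [sign(u[i] - u[i + 1]) for i in range(len(u) - 1)]


def bv_norm(u, h):
    """Discrete BV norm on a grid with spacing h: h * sum|u_i| + TV(u)."""
    return h * sum(abs(ui) for ui in u) + tv_primal(u)


u = [0, 0, 1, 3, 2]
print(tv_primal(u), tv_dual_pairing(u, optimal_test_function(u)))
print(bv_norm([0] * 5 + [1] * 5, h=0.1))
```

The two values in the first line agree, and for the unit step on ${[0,1]}$ the ${L^1}$ part scales with ${h}$ while the TV part is just the jump height, as in the continuous setting.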

The ${\text{TGV}^2}$ seminorm for a parameter tuple ${(\alpha,\beta)}$ is

$\displaystyle \text{TGV}^2_{(\alpha,\beta)}(u) = \sup\{\int_\Omega u v''\ |\ v\in C^2_c(\Omega),\ \|v\|_\infty\leq\beta,\ \|v'\|_\infty\leq\alpha\}$

and the associated norm is

$\displaystyle \|u\|_{BGV^2} = \|u\|_{L^1} + \text{TGV}^2_{(\alpha,\beta)}(u).$
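One direction of the norm equivalence between ${\|\cdot\|_{BV}}$ and ${\|\cdot\|_{BGV^2}}$ follows directly from these definitions (the converse estimate is the nontrivial part and is not reproduced here). Assume ${\alpha>0}$ and let ${v}$ be admissible in the supremum defining ${\text{TGV}^2_{(\alpha,\beta)}}$. Then ${w=v'/\alpha}$ lies in ${C^1_c(\Omega)}$ with ${\|w\|_\infty\leq 1}$, hence

$\displaystyle \int_\Omega u v'' = \alpha\int_\Omega u w' \leq \alpha\,\text{TV}(u).$

Taking the supremum over ${v}$ yields ${\text{TGV}^2_{(\alpha,\beta)}(u)\leq\alpha\,\text{TV}(u)}$ and therefore

$\displaystyle \|u\|_{BGV^2} \leq \max\{1,\alpha\}\,\|u\|_{BV}.$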

In Lemma 3.3 they prove that ${\|\cdot\|_{BV}}$ and ${\|\cdot\|_{BGV^2}}$ are equivalent norms on ${\text{BV}}$. In Section 4 they show that minimizers of

$\displaystyle \|u-f\|_{L^1} + \alpha\text{TV}(u)$

exhibit staircasing of degree 0, i.e. the solution ${u}$ is piecewise constant where it is not equal to ${f}$. For the minimizers of

$\displaystyle \|u-f\|_{L^1} + \text{TGV}^2_{(\alpha,\beta)}(u)$

one has staircasing of degree 1: ${u}$ is affine linear where it is not equal to ${f}$.
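Both effects can be observed on tiny discrete examples. The following Python sketch is my own illustration, not taken from the paper: it assumes a uniform grid with unit spacing and forward differences, and it uses the primal representation ${\text{TGV}^2_{(\alpha,\beta)}(u)=\min_w \alpha\|u'-w\|_{\mathcal M}+\beta\|w'\|_{\mathcal M}}$ of the definition above. Both the discrete ${L^1}$-TV problem and this inner minimization over ${w}$ have the form “${\ell^1}$ fidelity plus weighted TV”, which in 1D can be solved exactly by dynamic programming, because for such ${\ell^1}$ problems some minimizer takes only values occurring in the data. All function names are hypothetical.

```python
def l1_tv_denoise(f, alpha):
    """Exact minimizer of sum_i |u_i - f_i| + alpha * sum_i |u_{i+1} - u_i|.

    For this l1 problem some minimizer takes only values occurring in f,
    so dynamic programming over those candidate values is exact.
    """
    cands = sorted(set(f))
    m = len(cands)
    # cost[j]: best energy of u_0..u_i among choices ending with u_i = cands[j]
    cost = [abs(f[0] - c) for c in cands]
    back = []  # back[i-1][j]: best index for u_{i-1} given u_i = cands[j]
    for i in range(1, len(f)):
        new_cost, arg = [], []
        for c in cands:
            k = min(range(m), key=lambda q: cost[q] + alpha * abs(c - cands[q]))
            new_cost.append(abs(f[i] - c) + cost[k] + alpha * abs(c - cands[k]))
            arg.append(k)
        cost = new_cost
        back.append(arg)
    # Backtrack from the best final value.
    j = min(range(m), key=lambda q: cost[q])
    path = [j]
    for arg in reversed(back):
        j = arg[j]
        path.append(j)
    return [cands[j] for j in reversed(path)]


def l1_tv_energy(u, f, alpha):
    fidelity = sum(abs(a - b) for a, b in zip(u, f))
    tv = sum(abs(u[i + 1] - u[i]) for i in range(len(u) - 1))
    return fidelity + alpha * tv


def discrete_tv(u):
    return sum(abs(u[i + 1] - u[i]) for i in range(len(u) - 1))


def discrete_tgv2(u, alpha, beta):
    """min_w alpha*||Du - w||_1 + beta*||Dw||_1 for a 1D signal u (len >= 2)."""
    g = [u[i + 1] - u[i] for i in range(len(u) - 1)]  # forward differences Du
    # Dividing by alpha turns the inner problem into l1-TV with weight beta/alpha.
    w = l1_tv_denoise(g, beta / alpha)
    return alpha * l1_tv_energy(w, g, beta / alpha)


ramp, step = [0, 1, 2, 3, 4], [0, 0, 0, 4, 4]
print(discrete_tv(ramp), discrete_tv(step))
print(discrete_tgv2(ramp, 1.0, 1.0), discrete_tgv2(step, 1.0, 1.0))
print(l1_tv_denoise(list(range(8)), 2.5))
```

For the ramp and the step above the discrete TV is 4 in both cases, so TV cannot prefer the ramp, whereas the discrete ${\text{TGV}^2}$ vanishes on the ramp (it is affine) and is positive on the step. Denoising the ramp ${f=(0,1,\dots,7)}$ with ${\alpha=2.5}$ gives ${u=(2,2,2,3,4,5,5,5)}$: the solution follows ${f}$ in the middle and is flat exactly where it differs from ${f}$, i.e. staircasing of degree 0.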

These two facts combined (the norm equivalence of ${\text{BV}}$ and ${\text{BGV}^2}$ and staircasing of degree 1) seem quite remarkable to me. Together they show that staircasing is not related to the space ${\text{BV}}$ of functions of bounded variation but only to the specific ${\text{TV}}$ seminorm. This is satisfying since I still remember the thorough motivation L. Rudin gave in his 1987 thesis for using the space ${\text{BV}}$ in image processing: if there were images which are not in ${\text{BV}}$, we could not observe them. (He even draws an analogy to the question: How many angels can dance on the point of a needle?) Moreover, he argues that ${\text{BV}}$ is not too large in the sense that its elements are still accessible to analysis (e.g. one can define a weak notion of curvature although they may be discontinuous). The ${\text{BGV}^2}$ model shows that it is possible to overcome the undesired staircasing effect while staying within the well-founded, mathematically sound and appealing framework of ${\text{BV}}$.

The paper contains several more interesting results (e.g. on the preservation of continuity and “affinity”, and on convergence with respect to ${(\alpha,\beta)}$), which I do not collect here.