Today I’d like to blog about two papers which appeared on the arxiv.

1. Regularization with the Augmented Lagrangian Method – Convergence Rates from Variational Inequalities

Well, the title basically describes the content quite accurate. However, recall that the Augmented Lagrangian Method (ALM) is a method to calculate solutions to certain convex optimization problems. For a convex, proper and lower-semicontinuous function ${J}$ on a Banach space ${X}$, a linear and bounded operator ${K:X\rightarrow H}$ from ${X}$ into a Hilbert space ${H}$ and an element ${g\in H}$ consider the problem

$\displaystyle \inf_{u} J(u)\quad\text{s.t.}\quad Ku=g. \ \ \ \ \ (1)$

The ALM goes as follows: Start with an initial dual variable ${p_0}$, choose step-sizes ${\tau_k>0}$ and iterate

$\displaystyle u_k \in \text{argmin}\Big(\frac{\tau_k}{2}\|Ku-g\|^2 + J(u) + \langle p_{k-1},Ku-g\rangle\Big)$

$\displaystyle p_k = p_{k-1}+\tau_k(g-Ku_k).$

(These days one should note that this iteration is also known under the name Bregman iteration…). Indeed, it is known that the ALM converges to a solution of (1) if there exists one. Klaus and Markus consider the ill-posed case, i.e. the range of ${K}$ is not closed and ${g}$ is replaced by some ${g^\delta}$ which fulfills ${\|g-g^\delta\|\leq\delta}$ (and hence, ${g^\delta}$ is generally not in the range of ${K}$). Then, the ALM does not converge but diverges. However, one observes “semi-convergence” in practice, i.e. the iterates approach an approximate “solution to ${Ku=g^\delta}$” (or even a true solution to ${Ku=g}$) first but then start to diverge from some point on. Then it is natural to ask, if the ALM with ${g}$ replaced by ${g^\delta}$ can be used for regularization, i.e. can one choose a stopping index ${k^*}$ (depending on ${\delta}$ and ${g^\delta}$) such that the iterates ${u_{k^*}^\delta}$ approach the solution of (1) if ${\delta}$ vanishes? The question has been answered in the affirmative in previous work by Klaus (here and here) and also estimates on the error and convergence rates have been derived under an additional assumption on the solution of (1). This assumption used to be what is called “source condition” and says that there should exist some ${p^\dag\in H}$ such that for a solution ${u^\dagger}$ of (1) it holds that

$\displaystyle K^* p^\dagger \in\partial J(u^\dagger).$

Under this assumption it has been shown that the Bregman distance ${D(u_{k^*}^\delta,u^\dag)}$ goes to zero linearly in ${\delta}$ under appropriate stopping rules. What Klaus and Markus investigate in this paper are different conditions which ensure slower convergence rates than linear. These conditions come in the form of “variational inequalities” which gained some popularity lately. As usual, these variational inequalities look some kind of messy at first sight. Klaus and Markus use

$\displaystyle D(u,u^\dag)\leq J(u) - J(u^\dag) + \Phi(\|Ku-g\|^2)$

for some positive functional ${D}$ with ${D(u,u)=0}$ and some non-negative, strictly increasing and concave function ${\Phi}$. Under this assumption (and special ${D}$) they derive convergence rates which again look quite complicated but can be reduced to simpler and more transparent cases which resemble the situation one knows for other regularization methods (like ordinary Tikhonov regularization).

In the last section Klaus and Markus also treat sparse regularization (i.e. with ${J(u) = \|u\|_1}$) and derive that a weak condition (like ${(K^*K)^\nu p^\dag\in\partial J(u^\dag)}$ for some ${0<\nu<1/2}$ already imply the stronger one (1) (with a different ${p^\dag}$). Hence, interestingly, it seems that for sparse regularization one either gets a linear rate or nothing (in this framework).

2. On necessary conditions for variational regularization

The second paper is “Necessary conditions for variational regularization schemes” by Nadja Worliczek and myself. I have discussed some parts of this paper alread on this blog here and here. In this paper we tried to formalize the notion of “a variational method” for regularization with the goal to obtain necessary conditions for a variational scheme to be regularizing. As expected, this goal is quite ambitions and we can not claim that we came up with ultimate necessary condition which describe what kind of variational methods are not possible. However, we could first relate the three kinds of variational methods (which I called Tikhonov, Morozov and Ivanov regularization here) and moreover investigated the conditions on the data space a little closer. In recent years it turned out that one should not always use a term like ${\|Ku-g^\delta\|^2}$ to measure the noise or to penalize the deviation from ${Ku}$ to ${g^\delta}$. For several noise models (like Poisson noise or multiplicative noise) other functionals are better suited. However, these functionals raise several issues: They are often not defined on a linear space but on a convex set, sometimes with the nasty property that their interior is empty. They often do not have convenient algebraic properties (e.g. scaling invariance, triangle inequalities or the like). Finally they are not necessarily (lower semi-)continuous with respect to the usual topologies. Hence, we approached the data space from quite abstract way: The data space ${(Y,\tau_Y)}$ is topological space which comes with an additional sequential convergence structure ${\mathcal{S}}$ (see e.g. here) and on (a subset of) which there is a discrepancy functional ${\rho:Y\times Y\rightarrow [0,\infty]}$. Then we analyzed the interplay of these three things ${\tau_Y}$, ${\mathcal{S}}$ and ${\rho}$. If you wonder why we use the additional sequential convergence structure, remember that in the (by now classical) setting for Tikhonov regularization in Banach spaces with a functional like

$\displaystyle \|Ku-g^\delta\|_Y^q + \alpha\|u\|_X^p$

with some Banach space norms ${\|\cdot\|_Y}$ and ${\|\cdot\|_X}$ there are also two kinds of convergence on ${Y}$: The weak convergence (which is replaced by ${\tau_Y}$ in our setting) which is, e.g., used to describe convenient (lower semi-)continuity properties of ${K}$ and the norm ${\|\cdot\|_Y}$ and the norm convergence which is used to describe that ${g^\delta\rightarrow g^\dag}$ for ${\delta\rightarrow 0}$. And since we do not have a normed space ${Y}$ in our setting and one does not use any topological properties of the norm convergence in all the proofs of regularizing properties, Nadja suggested to use a sequential convergence structure instead.