A remark before you read this post: It is on a very specialized topic and only presents a theoretical insight which seems to be of no practical value whatsoever. Continue at your own risk of wasting your time.

Morozov’s discrepancy principle is a means to choose the regularization parameter when regularizing ill-posed inverse problems. To fix the setting, we consider two Hilbert spaces {X} and {Y} and a bounded linear operator {A:X\rightarrow Y}. We assume that the range of {A} is a non-closed subset of {Y}. As a consequence of the Bounded Inverse Theorem, the pseudo-inverse {A^\dag} (defined on {{\mathrm rg} A \oplus ({\mathrm rg} A)^\bot}) is unbounded.

This is the classical setting for linear inverse problems: We have a process, modeled by {A}, such that the quantity {x^\dag\in X} we are interested in gives rise to an output {y = Ax^\dag}. We are able to measure {y} but, as is always the case, some noise is introduced in the measurement process and hence we only have access to a measured quantity {y^\delta\in Y} which is a perturbation of {y}. Our goal is to approximate {x^\dag} from the knowledge of {y^\delta}. Note that simply taking {A^\dag y^\delta} is not an option since, firstly, {y^\delta} is not guaranteed to be in the domain of the pseudo-inverse and, secondly (which is even more severe in practice), the unboundedness of {A^\dag} will lead to a severe (in theory infinitely large) amplification of the noise, rendering the reconstruction useless.

The key idea is to approximate the pseudo-inverse {A^\dag} by a family of bounded operators {R_\alpha:Y\rightarrow X} with the hope that one may have

\displaystyle  R_\alpha y^\delta \rightarrow x^\dag\ \text{when}\ y^\delta\rightarrow y\in{\mathrm rg} A\ \text{and}\ \alpha\rightarrow 0 \ \text{appropriately.} \ \ \ \ \ (1)

(Note that the assumption {\alpha\rightarrow 0} is just a convention. It says that we assume that {R_\alpha} is a closer approximation to {A^\dag} the closer {\alpha} is to zero.) Now, we have two tasks:

  1. Construct a good family of regularizing operators {R_\alpha} and
  2. devise a parameter choice, i.e. a way to choose {\alpha}.

The famous Bakushinskii veto says that there is no parameter choice that uses only the given data {y^\delta} and guarantees convergence in (1) in the worst case. The situation changes if one introduces knowledge about the noise level {\delta = \|y-y^\delta\|}. (There is an ongoing debate about whether it is reasonable to assume that the noise level is known – my experience when working with engineers is that they are usually very good at quantifying the noise present in their systems and hence, in my view, the assumption that the noise level is known is OK.)

One popular way to choose the regularization parameter in dependence on {y^\delta} and {\delta} is Morozov’s discrepancy principle:

Definition 1 Morozov’s discrepancy principle states that one shall choose the regularization parameter {\alpha(\delta,y^\delta)} such that

\displaystyle  \|AR_{\alpha(\delta,y^\delta)}y^\delta - y^\delta\| = c\delta

for some fixed {c>1}.

In other words: You shall choose {\alpha} such that the reconstruction {R_\alpha y^\delta} produces a discrepancy {\|AR_{\alpha}y^\delta - y^\delta\|} which is of the order of, and slightly larger than, the noise level.
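To see the principle in action, here is a minimal numerical sketch in Python (purely illustrative and not taken from any paper mentioned here). It uses Tikhonov regularization (which also appears further down in this post) as the family {R_\alpha} and determines {\alpha} by bisection on a logarithmic scale, using that the discrepancy {\|AR_{\alpha}y^\delta - y^\delta\|} is monotonically increasing in {\alpha} for Tikhonov regularization. The operator, the noise level and all function names are made up for the example.

    import numpy as np

    def tikhonov(A, y, alpha):
        # Tikhonov reconstruction R_alpha y = (A^T A + alpha I)^{-1} A^T y
        n = A.shape[1]
        return np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ y)

    def alpha_by_discrepancy(A, y_delta, delta, c=1.5, alpha_lo=1e-14, alpha_hi=1e4):
        # Bisection on a log scale for alpha with ||A R_alpha y_delta - y_delta|| = c*delta;
        # relies on the discrepancy being increasing in alpha (true for Tikhonov).
        discrepancy = lambda a: np.linalg.norm(A @ tikhonov(A, y_delta, a) - y_delta)
        for _ in range(100):
            alpha = np.sqrt(alpha_lo * alpha_hi)
            if discrepancy(alpha) < c * delta:
                alpha_lo = alpha  # discrepancy too small: regularize more
            else:
                alpha_hi = alpha  # discrepancy too large: regularize less
        return alpha

    # small synthetic ill-conditioned example
    rng = np.random.default_rng(0)
    n = 50
    U, _ = np.linalg.qr(rng.standard_normal((n, n)))
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))
    A = U @ np.diag(0.8 ** np.arange(n)) @ V.T  # rapidly decaying singular values
    x_true = rng.standard_normal(n)
    delta = 1e-3
    noise = rng.standard_normal(n)
    y_delta = A @ x_true + delta * noise / np.linalg.norm(noise)

    alpha = alpha_by_discrepancy(A, y_delta, delta, c=1.5)
    x_rec = tikhonov(A, y_delta, alpha)
    print(alpha, np.linalg.norm(A @ x_rec - y_delta) / delta)  # second value should be about c

(The log-scale bisection is just a simple way to bracket {\alpha} over many orders of magnitude; any monotone root finder would do.)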

Some years ago I wrote a paper about the use of Morozov’s discrepancy principle when using the augmented Lagrangian method (aka Bregman iteration) as an iterative regularization method (where one can view the inverse of the iteration counter as a regularization parameter). The paper is Morozov’s principle for the augmented Lagrangian method applied to linear inverse problems (together with , the arXiv link is here). In that paper we derived an estimate for the (squared) error between {R_\alpha y^\delta} and {x^\dag} that behaves like

\displaystyle  C \frac{c(\sqrt{c}+1)}{\sqrt{c-1}}\delta

for some {C>0} and the {c>1} from Morozov’s discrepancy principle. The somewhat complicated dependence on {c} was a bit puzzling to me. One can optimize over {c>1} to make the error estimate as small as possible. It turns out that {c\mapsto \frac{c(\sqrt{c}+1)}{\sqrt{c-1}}} attains its minimal value of about {4.68} at about {c=1.64}. I blamed the already quite complicated analysis of the augmented Lagrangian method for these obviously much too complicated values (and in practice, using {c} much closer to {1} usually leads to much better results).
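Out of curiosity, here is a quick numerical sanity check of this minimization (again just an illustration; it only re-evaluates the constant given above with scipy):

    import numpy as np
    from scipy.optimize import minimize_scalar

    # constant from the error estimate for the augmented Lagrangian method
    f = lambda c: c * (np.sqrt(c) + 1) / np.sqrt(c - 1)

    res = minimize_scalar(f, bounds=(1 + 1e-9, 10), method="bounded")
    print(res.x, res.fun)  # roughly c = 1.64 with minimal value 4.68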

This term I am teaching a course on inverse problems and also covered Morozov’s discrepancy principle, but this time for much simpler regularization methods, namely linear methods such as, for example, Tikhonov regularization, i.e.

\displaystyle  R_\alpha y^\delta = (A^* A + \alpha I)^{-1} A^* y^\delta

(but other linear methods exist). There I arrived at an estimate for the (squared) error between {R_\alpha y^\delta} and {x^\dag} of the form

\displaystyle  C(\sqrt{c} + \tfrac1{\sqrt{c-1}})^2\delta.

Surprisingly, the dependence on {c} for this basic regularization method is also not very simple. Optimizing the estimate over {c>1} leads to an optimal value of about {c=2.32} (with a minimal value of the respective constant of about {5.73}). Again, using {c} closer to {1} usually leads to much better results.
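And the same kind of quick numerical check for this constant (same caveats as above):

    import numpy as np
    from scipy.optimize import minimize_scalar

    # constant from the error estimate for linear methods such as Tikhonov regularization
    g = lambda c: (np.sqrt(c) + 1 / np.sqrt(c - 1)) ** 2

    res = minimize_scalar(g, bounds=(1 + 1e-9, 10), method="bounded")
    print(res.x, res.fun)  # roughly c = 2.32 with minimal value 5.73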

Well, these observations are not of great importance… I just found it curious to observe that the analysis of Morozov’s discrepancy principle seems to be inherently a bit more complicated than I thought…