A remark before you read this post: it is on a very specialized topic and only presents a theoretical insight which seems to be of no practical value whatsoever. Continue at your own risk of wasting your time.

Morozov’s discrepancy principle is a means to choose the regularization parameter when regularizing ill-posed inverse problems. To fix the setting, we consider two Hilbert spaces $X$ and $Y$ and a bounded linear operator $A\colon X\to Y$. We assume that the range of $A$ is a non-closed subset of $Y$. As a consequence of the Bounded Inverse Theorem, the pseudo-inverse $A^\dagger$ (defined on $\operatorname{rg}(A) \oplus \operatorname{rg}(A)^\perp$) is unbounded.

This is the classical setting for linear inverse problems: We have a process, modeled by $A$, such that the quantity $x^\dagger$ we are interested in gives rise to an output $y = Ax^\dagger$. We are able to measure $y$ but, as is always the case, there is some noise introduced in the measurement process and hence, we have access to a measured quantity $y^\delta$ which is a perturbation of $y$. Our goal is to approximate $x^\dagger$ from the knowledge of $y^\delta$. Note that simply taking $A^\dagger y^\delta$ is not an option, since firstly $y^\delta$ is not guaranteed to be in the domain of the pseudo-inverse and, somehow even more severely and also more practically, the unboundedness of $A^\dagger$ will lead to a severe (in theory infinitely large) amplification of the noise, rendering the reconstruction useless.
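To see this noise amplification concretely, here is a small numerical sketch (the smoothing operator, the grid, and the noise level are made up for illustration): a discretized Gaussian blur has rapidly decaying singular values, and applying the pseudo-inverse to even slightly noisy data destroys the reconstruction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy discretization of a smoothing (hence ill-conditioned) operator:
# a Gaussian blur matrix, whose singular values decay very quickly.
n = 100
t = np.linspace(0, 1, n)
A = np.exp(-((t[:, None] - t[None, :]) ** 2) / (2 * 0.03 ** 2))
A /= A.sum(axis=1, keepdims=True)   # normalize rows

x_true = np.sin(2 * np.pi * t)       # quantity of interest
y = A @ x_true                        # exact data
y_delta = y + 1e-3 * rng.standard_normal(n)  # noisy measurement

# Naive "inversion" via the pseudo-inverse: the tiny singular values
# of A amplify the noise by their reciprocals.
x_naive = np.linalg.pinv(A) @ y_delta
print(np.linalg.norm(x_naive - x_true))   # error far larger than ||x_true||
```

Even with a noise level of only $10^{-3}$, the naive reconstruction is orders of magnitude further from $x^\dagger$ than the data is from $y$.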

The key idea is to approximate the pseudo-inverse $A^\dagger$ by a family of bounded operators $(R_\alpha)_{\alpha>0}$ with the hope that one may have

$R_\alpha y^\delta \to A^\dagger y \quad \text{for } \delta\to 0,\ \alpha\to 0. \qquad (1)$

(Note that the assumption $\alpha\to 0$ is just a convention. It says that we assume that $R_\alpha$ is a closer approximation to $A^\dagger$ the closer $\alpha$ is to zero.) Now, we have two tasks:

- Construct a good family of *regularizing operators* $(R_\alpha)$, and
- devise a *parameter choice*, i.e. a way to choose $\alpha$.

The famous Bakushinskii veto says that there is no parameter choice that can guarantee convergence in (1) in the worst case and only uses the given data $y^\delta$. The situation changes if one introduces knowledge about the *noise level* $\delta$, i.e. a bound $\|y^\delta - y\|\leq\delta$. (There is an ongoing debate if it is reasonable to assume that the noise level is known – my experience when working with engineers is that they are usually very good in quantifying the noise present in their system and hence, in my view the assumption that noise level is known is ok.)

One popular way to choose the regularization parameter $\alpha$ in dependence on $\delta$ and $y^\delta$ is *Morozov’s discrepancy principle*:

Definition 1. Morozov’s discrepancy principle states that one shall choose the regularization parameter $\alpha$ such that

$\|A R_\alpha y^\delta - y^\delta\| = \tau\delta$

for some fixed $\tau > 1$.

In other words: you shall choose $\alpha$ such that the reconstruction $R_\alpha y^\delta$ produces a discrepancy which is in the order of, and slightly larger than, the noise level $\delta$.
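As an illustration (not from the post), here is a sketch of the discrepancy principle in action on a made-up toy problem, using Tikhonov regularization as the method. For Tikhonov regularization the discrepancy is monotonically increasing in $\alpha$, so a bisection on a logarithmic scale finds the $\alpha$ with discrepancy $\tau\delta$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Small ill-conditioned toy problem: a Hilbert matrix.
n = 12
A = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
x_true = np.ones(n)
y = A @ x_true

delta = 1e-3                          # known noise level
noise = rng.standard_normal(n)
y_delta = y + delta * noise / np.linalg.norm(noise)  # ||y_delta - y|| = delta

def tikhonov(alpha):
    """Tikhonov solution x_alpha = (A^T A + alpha I)^{-1} A^T y_delta."""
    return np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ y_delta)

def discrepancy(alpha):
    return np.linalg.norm(A @ tikhonov(alpha) - y_delta)

tau = 1.1   # safety factor from Morozov's principle, tau > 1

# The discrepancy is increasing in alpha for Tikhonov regularization,
# so bisection (on a log scale) finds alpha with discrepancy = tau*delta.
lo, hi = 1e-16, 1.0
for _ in range(100):
    mid = np.sqrt(lo * hi)            # geometric midpoint
    if discrepancy(mid) < tau * delta:
        lo = mid
    else:
        hi = mid
alpha_star = np.sqrt(lo * hi)
x_rec = tikhonov(alpha_star)
print(alpha_star, np.linalg.norm(x_rec - x_true))
```

The bisection relies on the monotonicity of the discrepancy in $\alpha$, which holds for Tikhonov regularization but not for every regularization method.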

Some years ago I wrote a paper about the use of Morozov’s discrepancy principle when using the augmented Lagrangian method (aka Bregman iteration) as an iterative regularization method (where one can view the inverse of the iteration counter as a regularization parameter). The paper is Morozov’s principle for the augmented Lagrangian method applied to linear inverse problems (written together with coauthors; the arxiv link is here). In that paper we derived an estimate for the (squared) error between $x_\alpha^\delta$ and $x^\dagger$ whose constant depends on the $\tau$ from Morozov’s discrepancy principle in a somehow complicated way, which was a bit puzzling to me. One can optimize $\tau$ such that the error estimate is optimal, and it turns out that the constant attains its minimal value for a $\tau$ considerably larger than $1$. I blamed the already quite complicated analysis of the augmented Lagrangian method for these obviously much too complicated values (and in practice, using $\tau$ much closer to $1$ usually leads to much better results).

This term I teach a course on inverse problems and also covered Morozov’s discrepancy principle, but this time for much simpler regularization methods, namely for *linear methods* such as, for example, Tikhonov regularization, i.e.

$x_\alpha^\delta = (A^*A + \alpha I)^{-1}A^* y^\delta$

(but other linear methods exist). There I arrived at an estimate for the (squared) error between $x_\alpha^\delta$ and $x^\dagger$ whose constant again depends on $\tau$. Surprisingly, the dependence on $\tau$ for this basic regularization method is also not very simple. Optimizing the estimate over $\tau$ again leads to an optimal $\tau$ noticeably larger than $1$. Again, using $\tau$ closer to $1$ usually leads to much better results.

Well, these observations are not of great importance… I just found it curious to observe that the analysis of Morozov’s discrepancy principle seems to be inherently a bit more complicated than I thought…

July 24, 2015 at 12:38 pm

In Def. 1, how about determining the constant c first (e.g., by taking advantage of the convexity of the whole cost functional) and then determining the regularization parameter according to that c? I think this is more challenging than finding some $\alpha$ and then checking whether Morozov is satisfied.

July 24, 2015 at 1:06 pm

It is always important to check what shall depend on what. I am not sure how you could first determine c while leaving $\alpha$ open. Also, the functional for the Morozov principle is indeed not convex in $\alpha$ (and I guess it need not be monotone in $\alpha$ either).

June 18, 2016 at 9:53 am

Hi, you pointed out that,

“There is an ongoing debate if it is reasonable to assume that the noise level is known – my experience when working with engineers is that they are usually very good in quantifying the noise present in their system and hence, in my view the assumption that noise level is known is ok.”

What if we really do not know the noise level? Is there a way to still use Morozov’s principle to choose the regularization parameter? Can you point out some papers or articles in this direction?

June 20, 2016 at 8:50 am

By definition, Morozov’s principle does not make sense if no noise level is known. However, there are approaches to choose the regularization parameter without any knowledge of the noise level and these methods go under the names of “noise-free rules” or “heuristic rules”. Some names are “generalized cross validation”, “L-curve method”, “principle of quasi-optimality”, “Hanke-Raus rule” or “weighted residual rule”.
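For what it’s worth, the quasi-optimality principle is easy to sketch: on a geometric grid of regularization parameters one picks the one where consecutive reconstructions change the least, and no noise level is needed. A toy illustration with Tikhonov regularization (the problem and the grid are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy ill-conditioned problem (Hilbert matrix); the noise level is used
# only to generate the data, NOT by the parameter choice rule below.
n = 12
A = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
x_true = np.ones(n)
y_delta = A @ x_true + 1e-4 * rng.standard_normal(n)

def tikhonov(alpha):
    # Tikhonov-regularized solution for the data y_delta
    return np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ y_delta)

# Quasi-optimality: on the geometric grid alpha_k = q**k, choose the k
# minimizing the change ||x_{alpha_{k+1}} - x_{alpha_k}||.
q = 0.5
alphas = [q ** k for k in range(50)]
xs = [tikhonov(a) for a in alphas]
diffs = [np.linalg.norm(xs[k + 1] - xs[k]) for k in range(len(xs) - 1)]
k_star = int(np.argmin(diffs))
alpha_qo, x_qo = alphas[k_star], xs[k_star]
print(alpha_qo, np.linalg.norm(x_qo - x_true))
```

Heuristic rules like this can fail on unfavorable problems (as the Bakushinskii veto predicts in the worst case), but they are often surprisingly effective in practice.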

June 24, 2016 at 2:01 am

Thank you so much for the pointers. I have heard of the GCV and L-curve methods. However, the principle of quasi-optimality and the Hanke-Raus rule are relatively new to me. Thanks again for pointing those out.