Coming back to regularization, especially Ivanov regularization. Recall that I used the term Ivanov regularization for the minimization problem
I again stumbled upon some reference: It seems that in the case that the constraint defines a compact set, this method is usually referred to as “method of quasi solutions”. More precisely, I found this in “Elements of the theory of inverse problems” by A.M. Denisov, Chapter 6. There he uses metric spaces and proves the following:
Theorem 1 Let
be metric spaces with metrics
,
, respectively and
continuous. Furthermore let
be compact,
be in the range of
and assume that
is the unique solution of
which lies in
. Finally for a
with
define
and
. Then it holds for
that
Remark 1 Before we prove this theorem, we relate is to what I called Ivanov regularization above: The set
is encoded in~(1) as
and the “discrepancy measure”
is simply the metric
. Hence, let
denote a solution of
Because
is feasible for this problem it follows from
that
. Hence,
. In other words: Ivanov regularization produces one element in the set
. Now, the theorem says that every element in
is a good approximation for
(at least asymptotically).
Proof: We take a sequence and assume to the contrary that there exist
such that for every
there exists
such that it holds that
. Since all
lie in
which is compact, there is a convergent subsequence
with limit
. We obtain
. However, this contradicts the assumption: d_Y(A\bar x,Ax^\dagger) & = &d_Y(A\bar x,y^\dagger) = \lim_{n\rightarrow \infty} d_Y(Ax_{\delta_n},y^\dagger) \nonumber
& \leq &\lim_{n\rightarrow \infty} d_Y(Ax_{\delta_n},y^{\delta_n}) + d_Y(y^{\delta_n},y^\dagger) \leq \lim_{n\rightarrow\infty}2\delta_n =0.
Coming back to the interpretation of the Theorem~1 and Ivanov regularization: Instead of Ivanov regularization, one could also use the following feasibility problem: Find an such that both
and
. For the case of vector spaces
and
and a convex set
, this would be a convex feasibility problem which one may attack by available methods.
A further important remark is that we did not assume any linearity on (of course: we did not even assume a linear structure on
or
). Hence, the theorem seem very powerful: There is no regularization parameter involved and one still gets convergence to the true solution! However, one of the assumptions in the theorem is somehow strong: The uniqueness of
. To illustrate this we consider a special case:
Example 1 Let
and
be real (or complex) vector spaces and
be linear with non-trivial null space. Furthermore, assume that
is convex and compact and consider scaled versions
for
. Then the set of solutions of
is an affine space in
and there are three cases for the intersection of this set and
:
- The intersection is empty.
- The intersection is a convex set and contains infinitely many elements.
- The intersection contains exactly one element.
The last case occurs precisely, when the affine space of solution is tangential to
. Loosely speaking, one may say that this case only occurs, if the set
is scaled precisely to the right size such that is only touches the affine space of solutions.
Another strong assumption in Theorem~1 is that the set is compact. First there is a way to somehow relax this condition. Basically, we only need compactness to obtain the converging subsequence. Hence, one could try to work with a weaker topology on
(which would result in a weaker notion of compactness) and then obtain a limit of a subsequence which converges in the weaker sense only. Then one would need some tool to deduce that the weak limit is indeed a solution. This strategy work, for example in Banach spaces:
Example 2 Let
and
be reflexive Banach spaces and
be linear, bounded and one-to-one. We use the set
as prior knowledge on the solution of
. Moreover, we use the metrics induced the norms of
and
, respectively:
and
.
Obviously,
is not compact (if
is of infinite dimension) but it is weakly compact (and by the Eberlein-Smulian theorem also weakly-sequentially compact). In the situation of the proof of Theorem~1 we only get a weakly converging subsequence
. However, a linear operator
is also weak-to-weak linear, and hence
. While we only have a weakly converging sequence, we still can obtain the contradiction in~(0) since the norm is weakly lower semicontinuous.
Another way to justify the assumption that the solution is in a known compact set, is that in practice we always use a representation of the solution which only use a finite number of degrees of freedom (think of a Galerkin ansatz for example). However, this interpretation somehow neglects that we are interested in finding the true solution to the true infinite dimensional problem and that the discretization of the problem should be treated as a different issue. Just building on the regularizing effect of discretization will almost surely result in a method which stability properties depend on the resolution of the discretization.
Finally: Another good reference of these somehow ancient results in regularization theory is one of the first books on this topics: “Solutions of ill-posed problems” by Tikhonov and Arsenin (1977). While it took me some time to get used the type of presentation, I have to admit that it is really worth to read this book (and other translation of Russian mathematical literature).