Coming back to regularization, especially Ivanov regularization: recall that I used the term Ivanov regularization for the minimization problem

$\displaystyle \min S(Ax,y^\delta)\ \text{ s.t. }\ R(x)\leq \tau. \ \ \ \ \ (1)$

I stumbled upon another reference: it seems that in the case where the constraint ${R(x)\leq \tau}$ defines a compact set, this method is usually referred to as the “method of quasi solutions”. More precisely, I found this in “Elements of the theory of inverse problems” by A.M. Denisov, Chapter 6. There he works in metric spaces and proves the following:

Theorem 1 Let ${X,Y}$ be metric spaces with metrics ${d_X}$ and ${d_Y}$, respectively, and let ${A:X\rightarrow Y}$ be continuous. Furthermore, let ${M\subset X}$ be compact, let ${y^\dagger}$ be in the range of ${A}$, and assume that ${x^\dagger}$ is the unique solution of ${Ax=y^\dagger}$ which lies in ${M}$. Finally, for a ${y^\delta}$ with ${d_Y(y^\delta,y^\dagger)\leq\delta}$, define ${X_\delta = \{x\ :\ d_Y(Ax,y^\delta)\leq\delta\}}$ and ${X_\delta^M = X_\delta\cap M}$. Then it holds for ${\delta\rightarrow 0}$ that

$\displaystyle \sup_{x\in X_\delta^M}d_X(x,x^\dagger) \rightarrow 0.$

Remark 1 Before we prove this theorem, we relate it to what I called Ivanov regularization above: The set ${M}$ is encoded in (1) as ${M = \{x\ :\ R(x)\leq\tau\}}$ and the “discrepancy measure” ${S}$ is simply the metric ${d_Y}$. Hence, let ${x_M^\delta}$ denote a solution of

$\displaystyle \min\ d_Y(Ax,y^\delta)\ \text{ s.t. } x\in M.$

Because ${x^\dagger}$ is feasible for this problem and ${d_Y(Ax^\dagger,y^\delta) = d_Y(y^\dagger,y^\delta)\leq\delta}$, the minimal value is at most ${\delta}$, i.e. ${d_Y(Ax_M^\delta,y^\delta)\leq\delta}$. Consequently, ${x_M^\delta\in X_\delta^M}$. In other words: Ivanov regularization produces one element of the set ${X_\delta^M}$. Now, the theorem says that every element of ${X_\delta^M}$ is a good approximation of ${x^\dagger}$ (at least asymptotically).

Proof: We take a sequence ${\delta_n\rightarrow 0}$ and assume to the contrary that there exist ${\epsilon>0}$ such that for every ${n}$ there exists ${x_{\delta_n}\in X_M^{\delta_n}}$ such that it holds that ${d_X(x_{\delta_n},x^\dagger)\geq \epsilon}$. Since all ${x_{\delta_n}}$ lie in ${M}$ which is compact, there is a convergent subsequence ${(x_k)}$ with limit ${\bar x}$. We obtain ${d_X(\bar x,x^\dagger)\geq \epsilon}$. However, this contradicts the assumption: d_Y(A\bar x,Ax^\dagger) & = &d_Y(A\bar x,y^\dagger) = \lim_{n\rightarrow \infty} d_Y(Ax_{\delta_n},y^\dagger) \nonumber
& \leq &\lim_{n\rightarrow \infty} d_Y(Ax_{\delta_n},y^{\delta_n}) + d_Y(y^{\delta_n},y^\dagger) \leq \lim_{n\rightarrow\infty}2\delta_n =0. $\Box$

Coming back to the interpretation of the Theorem~1 and Ivanov regularization: Instead of Ivanov regularization, one could also use the following feasibility problem: Find an ${x}$ such that both ${d_Y(Ax,y^\delta)\leq\delta}$ and ${x\in M}$. For the case of vector spaces ${X}$ and ${Y}$ and a convex set ${M}$, this would be a convex feasibility problem which one may attack by available methods.

A further important remark is that we did not assume any linearity on ${A}$ (of course: we did not even assume a linear structure on ${X}$ or ${Y}$). Hence, the theorem seem very powerful: There is no regularization parameter involved and one still gets convergence to the true solution! However, one of the assumptions in the theorem is somehow strong: The uniqueness of ${x^\dagger}$. To illustrate this we consider a special case:

Example 1 Let ${X}$ and ${Y}$ be real (or complex) vector spaces and ${A}$ be linear with non-trivial null space. Furthermore, assume that ${M\subset X}$ is convex and compact and consider scaled versions ${\tau M}$ for ${\tau>0}$. Then the set of solutions of ${Ax=y^\dagger}$ is an affine space in ${X}$ and there are three cases for the intersection of this set and ${\tau M}$:

1. The intersection is empty.
2. The intersection is a convex set and contains infinitely many elements.
3. The intersection contains exactly one element.

The last case occurs precisely, when the affine space of solution is tangential to ${\tau M}$. Loosely speaking, one may say that this case only occurs, if the set ${M}$ is scaled precisely to the right size such that is only touches the affine space of solutions.

Another strong assumption in Theorem~1 is that the set ${M}$ is compact. First there is a way to somehow relax this condition. Basically, we only need compactness to obtain the converging subsequence. Hence, one could try to work with a weaker topology on ${Y}$ (which would result in a weaker notion of compactness) and then obtain a limit of a subsequence which converges in the weaker sense only. Then one would need some tool to deduce that the weak limit is indeed a solution. This strategy work, for example in Banach spaces:

Example 2 Let ${X}$ and ${Y}$ be reflexive Banach spaces and ${A:X\rightarrow Y}$ be linear, bounded and one-to-one. We use the set ${M = \{x\ :\ \|x\|_X\leq R\}}$ as prior knowledge on the solution of ${Ax=y^\dagger}$. Moreover, we use the metrics induced the norms of ${X}$ and ${Y}$, respectively: ${d_X(x,x') = \|x-x'\|_X}$ and ${d_Y(y,y') = \|y-y'\|_Y}$.

Obviously, ${M}$ is not compact (if ${X}$ is of infinite dimension) but it is weakly compact (and by the Eberlein-Smulian theorem also weakly-sequentially compact). In the situation of the proof of Theorem~1 we only get a weakly converging subsequence ${x_{\delta_n}\rightharpoonup \bar x}$. However, a linear operator ${A}$ is also weak-to-weak linear, and hence ${Ax_{\delta_n}\rightharpoonup A\bar x}$. While we only have a weakly converging sequence, we still can obtain the contradiction in~(0) since the norm is weakly lower semicontinuous.

Another way to justify the assumption that the solution is in a known compact set, is that in practice we always use a representation of the solution which only use a finite number of degrees of freedom (think of a Galerkin ansatz for example). However, this interpretation somehow neglects that we are interested in finding the true solution to the true infinite dimensional problem and that the discretization of the problem should be treated as a different issue. Just building on the regularizing effect of discretization will almost surely result in a method which stability properties depend on the resolution of the discretization.

Finally: Another good reference of these somehow ancient results in regularization theory is one of the first books on this topics: “Solutions of ill-posed problems” by Tikhonov and Arsenin (1977). While it took me some time to get used the type of presentation, I have to admit that it is really worth to read this book (and other translation of Russian mathematical literature).