Today I gave a talk at STOR colloquium at the University of North Carolina (UNC). I spoke about the Master’s thesis of Lars Mescheder in which he developed probabilistic models for image denoising. Among other things, he proposed a Gaussian scale mixture model as an image prior and developed several methods to infer information from the respective posterior. One curios result was that the Perona-Malik diffusivity (and variants thereof) popped up in basically every algorithm he derived. It’s not only that the general idea to have an edge dependent diffusion coefficient in a PDE or, formally equivalent, a concave penalty for the magnitude of the gradient in a variational formulation, but it also turned out that the very diffusion coefficient ${\frac1{1+|\nabla u|^{2}}}$ appeared exactly in this form.

Besides the talk I got the chance to chat with many interesting people. I’d like to mention a chat about a background story on Perona-Malik which I find quite interesting:

Steven Pizer, Kenan Professor for computer science at UNC and active in a lot of fields for several decades, and in particular a driving for in the early days of digital image analysis, told that the very idea of the non-linear, gradient-magnitude dependent diffusion coefficient actually dates back to works of neuroscientists Stephen Grossberg and that he moderated a discussion between Stephan Grossberg, Pietro Perona and Jitendra Malik in which the agreed that the method should be attributed as “Grossberg-Perona-Malik” method. I could not find any more hints for this story anywhere else, but given the mathematical, psychological and biological background of Grossberg and a look at Grossbergs list of publications in the 1980’s and earlier make this story appear more than plausible.

It may also well be, that this story is just another instance of the effect, that good ideas usually pop up at more than one place…

Consider the simple linear transport equation $\displaystyle \partial_t f + v\partial_x f = 0,\quad f(x,0) = \phi(x)$

with a velocity ${v}$. Of course the solution is $\displaystyle f(x,t) = \phi(x-tv),$

i.e. the initial datum is just transported in direction of ${v}$, as the name of the equation suggests. We may also view the solution ${f}$ as not only depending on space ${x}$ and time ${t}$ but also dependent on the velocity ${v}$, i.e. we write ${f(x,t,v) =\phi(x-tv)}$.

Now consider that the velocity is not really known but somehow uncertain (while the initial datum ${\phi}$ is still known exactly). Hence, it does not make too much sense to look at the exact solution ${f}$, because the effect of a wrong velocity will get linearly amplified in time. It seems more sensible to assume a distribution ${\rho}$ of velocities and look at the averaged solutions that correspond to the different velocities ${v}$. Hence, the quantity to look at would be $\displaystyle g(x,t) = \int_{-\infty}^\infty f(x,t,v)\rho(v) dv.$

Let’s have a closer look at the averaged solution ${g}$. We write out ${f}$, perform a change of variables and end up with $\displaystyle \begin{array}{rcl} g(x,t) & = &\int_{-\infty}^\infty f(x,t,v)\rho(v)dv\\ & =& \int_{-\infty}^\infty \phi(x-tv)\rho(v)dv\\ & =& \int_{-\infty}^\infty \phi(x-w)\tfrac1t\rho(w/t)dw. \end{array}$

In the case of a Gaussian distribution ${\rho}$, i.e. $\displaystyle \rho(v) = \frac{1}{\sqrt{4\pi}}\exp\Big(-\frac{v^2}{4}\Big)$

we get $\displaystyle g(x,t) =\int_{-\infty}^\infty \phi(x-w)G(t,w)dw$

with $\displaystyle G(t,w) = \frac{1}{\sqrt{4\pi}\,t}\exp\Big(-\frac{w^2}{4t^2}\Big).$

Now we make a time rescaling ${\tau = t^2}$, denote ${h(x,\tau) = h(x,t^2) = g(x,t)}$ and see that $\displaystyle h(x,\tau) = \int_{-\infty}^\infty \phi(x-w)\frac{1}{\sqrt{4\pi \tau}}\exp\Big(-\frac{w^2}{4\tau}\Big)dw.$

So what’s the point of all this? It turns out that the averaged and time rescaled solution ${h}$ of the transport equation indeed solves the heat equation $\displaystyle \partial_t h - \partial_{xx} h = 0,\quad h(x,0) = \phi(x).$

In other words, velocity averaging and time rescaling turn a transport equation (a hyperbolic PDE) into a diffusion equation (a parabolic PDE).

I’ve seen this derivation in a talk by Enrique Zuazua in his talk at SciCADE 2015.

To end this blog post, consider the slight generalization of the transport equation $\displaystyle \partial_t f + \psi(x,v)\partial_x f = 0$

where the velocity depends on ${x}$ and ${v}$. According to Enrique Zuazua it’s open what happens here when you average over velocities…

If you are working on optimization with partial differential equations as constraints, you may be interested in the website

“OPTPDE – A Collection of Problems in PDE-Constrained Optimization”, http://www.optpde.net.

If you have developed an algorithm which can handle a certain class of optimization problems you need to do evaluations and tests on how well the method performs. To do so, you need well constructed test problems. This could be either problems where the optimal solution is known analytically our problems where the solution is known with a rigorous error bound obtained with a bullet-proof solver. Both things are not always easy to obtain and OPTPDE shall serve as a resource for such problems. It has been designed by Roland Herzog, Arnd Rösch, Stefan Ulbrich and Winnifried Wollner.

The generation of test instance for optimization problems seems quite important to me and indeed, several things can go wrong if this is not done right. Frequently, one sees tests for optimization routines on problems where the optimal solution is not known. Since there are usually different ways to express optimality conditions it is not always clear how to check for optimality; even more so, if you only check for “approximate optimality”, e.g. up to machine precision. A frequently observed effect is a kind of “trusted method bias”. By this I mean that an optimal solution is calculated by some trusted method and comparing the outcome of the tested routine with this solution. However, the trusted method uses some stopping criterion usually based on some specific set of formulations of optimality conditions and these can be different from what the new method has been tuned to. And most often, the stopping criteria do not give a rigorous error bound for the solution or the optimal objective value.

For sparse reconstruction problems, I dealt with this issue in “Constructing test instance for Basis Pursuit Denoising” (preprint available here) but I think this methodology could be used for other settings as well.

Recently Arnd Rösch and I organized the minisymposium “Parameter identification and nonlinear optimization” at the SIAM Conference on Optimization. One of the aims of this symposium was, to initiate more connections between the communities in optimal control of PDEs on the one hand and regularization of ill-posed problems on the other hand. To give a little bit of background, let me somehow formulate the “mother problems” in both fields:

Example 1 (Mother problem in optimal control of PDEs) We consider a bounded Lipschitz domain ${\Omega}$ in ${{\mathbb R}^2}$ (or ${{\mathbb R}^3}$). Assume that we are given a target (or desired state) ${y_d}$ which is a real valued function of ${\Omega}$. Our aim is to find a function (or control) ${u}$ (also defined on ${\Omega}$) such that the solution of the equation $\displaystyle \begin{array}{rcl} \Delta y & = u,\ \text{ on }\ \Omega\\ y & = 0,\ \text{ on }\ \partial\Omega. \end{array}$

Moreover, our solution (or control) shall obey some pointwise bounds $\displaystyle a\leq u \leq b.$

This motivates the following constrained optimization problem $\displaystyle \begin{array}{rcl} \min_{y,u} \|y - y_d\|_{L^2}^2\quad \text{s.t.} & \Delta y = u,\ \text{ on }\ \Omega\\ & y = 0,\ \text{ on }\ \partial\Omega.\\ & a\leq u\leq b. \end{array}$

Often, also the regularized problem is considered: For a small ${\alpha>0}$ solve: $\displaystyle \begin{array}{rcl} \min_{y,u} \|y - y_d\|_{L^2}^2 + \alpha\|u\|_{L^2}^2\quad \text{s.t.} & \Delta y = u,\ \text{ on }\ \Omega\\ & y = 0,\ \text{ on }\ \partial\Omega.\\ & a\leq u\leq b. \end{array}$

(This problem is also extensively treated section 1.2.1 in the excellent book “Optimal Control of Partial Differential Equations” by Fredi Tröltzsch.)

For inverse problems we may formulate:

Example 2 (Mother problem in inverse problems) Consider a bounded and linear operator ${A:X\rightarrow Y}$ between two Hilbert spaces and assume that ${A}$ has non-closed range. In this case, the pseudo-inverse ${A^\dagger}$ is not a bounded operator. Consider now, that we have measured data ${y^\delta\in Y}$ that is basically a noisy version of “true data” ${y\in\text{range}A}$. Our aim is, to approximate a solution of ${Ax=y}$ by the knowledge of ${y^\delta}$. Since ${A}$ does no have a closed range, it is usually the case that ${y^\delta}$ is not in the domain of the pseudo inverse and ${A^\dagger y^\delta}$ simply does not make sense. A widely used approach, also treated in my previous post is Tikhonov regularization, that is solving for a small regularization parameter ${\alpha>0}$ $\displaystyle \min_x\, \|Ax-y^\delta\|_Y^2 + \alpha\|x\|_X^2$

Clearly both mother problems have a very similar mathematical structure: We may use the solution operator of the PDE, denote it by ${A}$, and restate the mother problem of optimal control of PDEs in a form similar to the mother problem of inverse problems. However, there are some important conceptual differences:

Desired state vs. data: In Example 1 ${y_d}$ is a desired state which, however, may not be reachable. In Example 2 ${y^\delta}$ is noisy data and hence, shall not reached as good as possible.

Control vs. solution: In Example 1 the result ${u}$ is an optimal control. It’s form is not of prime importance, as long as it fulfills the given bounds and allows for a good approximation of ${y_d}$. In Example 2 the result ${x}$ is the approximate solution itself (which, of course shall somehow explain the measured data ${y^\delta}$). It’s properties are itself important.

Regularization: In Example 1 the regularization is mainly for numerical reasons. The problem itself also has a solution for ${\alpha=0}$. This is due to the fact that the set of admissible ${u}$ for a weakly compact set. However, in Example 2 one may not choose ${\alpha=0}$: First because the functional will not have a minimizer anymore and secondly one really does not want ${\|Ax-y^\delta\|}$ as small as possible since ${y^\delta}$ is corrupted by noise. Especially, the people from inverse problems are interested in the case in which both ${y^\delta\rightarrow y\in\text{range}A}$ and ${\alpha\rightarrow 0}$. However, in optimal control of PDEs, ${\alpha}$ is often seen as a model parameter which ensures that the control has somehow a small energy.

These conceptual difference sometimes complicate the dialog between the fields. One often runs into discussion dead ends like “Why should we care about decaying ${\alpha}$—it’s given?” or “Why do you need these bounds on ${u}$? This makes your problem worse and you may not reach to state as good as possible\dots”. It often takes some time until the involved people realize that they really pursue different goals, that the quantities which even have similar names are something different and that the minimization problems can be solved with the same techniques.

In our minisymposium we had the following talks:

• “Identification of an Unknown Parameter in the Main Part of an Elliptic PDE”, Arnd Rösch
• “Adaptive Discretization Strategies for Parameter Identification Problems in PDEs in the Context of Tikhonov Type and Newton Type Regularization”, Barbara Kaltenbacher
• “Optimal Control of PDEs with Directional Sparsity”, Gerd Wachsmuth
• “Nonsmooth Regularization and Sparsity Optimization” Kazufumi Ito
• ${L^1}$ Fitting for Nonlinear Parameter Identification Problems for PDEs”, Christian Clason
• “Simultaneous Identification of Multiple Coefficients in an Elliptic PDE”, Bastian von Harrach

Finally, there was my own talk “Error Estimates for joint Tikhonov and Lavrentiev Regularization of Parameter Identification Probelms” which is based on a paper with the similar name which is at http://arxiv.org/abs/0909.4648 and published in Applicable Analysis. The slides of the presentation are here (beware, there may be some wrong exponents in the pdf…).

In a nutshell, the message of the talk is: Bound both on the control/solution and the state/data may be added also to a Tikhonov-regularized inverse problem. If the operator has convenient mapping properties then the bounds will eventually be inactive if the true solution has the same property. Hence, the known estimates for usual inverse problems are asymptotically recovered.