If you are working on optimization with partial differential equations as constraints, you may be interested in the website

“OPTPDE – A Collection of Problems in PDE-Constrained Optimization”, http://www.optpde.net.

If you have developed an algorithm which can handle a certain class of optimization problems you need to do evaluations and tests on how well the method performs. To do so, you need well constructed test problems. This could be either problems where the optimal solution is known analytically our problems where the solution is known with a rigorous error bound obtained with a bullet-proof solver. Both things are not always easy to obtain and OPTPDE shall serve as a resource for such problems. It has been designed by Roland Herzog, Arnd Rösch, Stefan Ulbrich and Winnifried Wollner.

The generation of test instance for optimization problems seems quite important to me and indeed, several things can go wrong if this is not done right. Frequently, one sees tests for optimization routines on problems where the optimal solution is not known. Since there are usually different ways to express optimality conditions it is not always clear how to check for optimality; even more so, if you only check for “approximate optimality”, e.g. up to machine precision. A frequently observed effect is a kind of “trusted method bias”. By this I mean that an optimal solution is calculated by some trusted method and comparing the outcome of the tested routine with this solution. However, the trusted method uses some stopping criterion usually based on some specific set of formulations of optimality conditions and these can be different from what the new method has been tuned to. And most often, the stopping criteria do not give a rigorous error bound for the solution or the optimal objective value.

For sparse reconstruction problems, I dealt with this issue in “Constructing test instance for Basis Pursuit Denoising” (preprint available here) but I think this methodology could be used for other settings as well.

Advertisements

The second day of ISMP started (for me) with the session I organized and chaired.

The first talk was by Michael Goldman on Continuous Primal-Dual Methods in Image Processing. He considered the continuous Arrow-Hurwitz method for saddle point problems

\displaystyle \min_{u}\max_{\xi} K(u,\xi)

with {K} convex in the first and concave in the second variable. The continuous Arrow-Hurwitz method consists of solving

\displaystyle \begin{array}{rcl} \partial_t u(t) &=& -\nabla_u K(u(t),\xi(t))\\ \partial_t \xi(t) &=& \nabla_\xi K(u(t),\xi(t)). \end{array}

His talk evolved around the problem if {K} comes from a functional which contains the total variation, namely he considered

\displaystyle K(u,\xi) = -\int_\Omega u\text{div}(\xi) + G(u)

with the additional constraints {\xi\in C^1_C(\Omega,{\mathbb R}^2)} and {|\xi|\leq 1}. For the case of {G(u) = \lambda\|u-f\|^2/2} he presented a nice analysis of the problem including convergence of the method to a solution of the primal problem and some a-posteriori estimates. This reminded me of Showalters method for the regularization of ill-posed problems. The Arrow-Hurwitz method looks like a regularized version of Showalters method and hence, early stopping does not seem to be necessary for regularization. The related paper is Continuous Primal-Dual Methods for Image Processing.

The second talk was given by Elias Helou and was on Incremental Algorithms for Convex Non-Smooth Optimization with Applications in Image Reconstructions. He presented his work on a very general framework for problems of the class

\displaystyle \min_{x\in X} f(x)

with a convex function {f} and a convex set {X}. Basically, he abstracted the properties of the projected subgradient method. This consists of taking subgradient descent steps for {f} followed by projection onto {X} iteratively: With a subgradient {g^n\in\partial f(x^n)} this reads as

\displaystyle x^{n+1} = P_X(x^n -\alpha_n g^n)

he extracted the conditions one needs from the subgradient descent step and from the projection step and formulated an algorithm which consists of successive application of an “optimality operator” {\mathcal{O}_f} (replacing the subgradient step) and a feasibility operator {\mathcal{F}_X} (replacing the projection step). The algorithm then reads as

\displaystyle \begin{array}{rcl} x^{n+1/2} &=& \mathcal{O}_f(x^n,\alpha_n)\\ x^{n+1} &=& \mathcal{F}_x(x^{n+1/2} \end{array}

and he showed convergence under the extracted conditions. The related paper is , Incremental Subgradients for Constrained Convex Optimization: a Unified Framework and New Methods.

The third talk was by Jerome Fehrenbach on Stripes removel in images, apllications in microscopy. He considered the problem of very specific noise which is appear in the form of stripes (and appears, for example, “single plane illumination microscopy”). In fact he considered a little more general case and the model he proposed was as follows: The observed image is

\displaystyle u_{\text{OBS}} = u + n,

i.e. the usual sum of the true image {u} and noise {n}. However, for the noise he assumed that it is given by

\displaystyle n = \sum_{j=1}^m \psi_j*\lambda_j,

i.e. it is a sum of different convolutions. The {\psi_j} are kind of shape-functions which describe the “pattern of the noise” and the {\lambda_j} are samples of noise processes, following specific distributions (could be white noise realizations, impulsive noise or something else)-. He then formulated a variational method to identify the variables {\lambda_j} which reads as

\displaystyle \min \|\nabla(u_{\text{OBS}} - \sum_{j=1}^m \psi_j*\lambda_j)\|_1 + \sum_j \phi_j(\lambda_j).

Basically, this is the usual variational approach to image denoising, but nor the optimization variable is the noise rather than the image. This is due to the fact that the noise has a specific complicated structure and the usual formulation with {u = u_{\text{OBS}} +n} is not feasible. He used the primal-dual algorithm by Chambolle and Pock for this problem and showed that the method works well on real world problems.

Another theme which caught my attention here is “optimization with variational inequalities as constraints”. At first glance that sounds pretty awkward. Variational inequalities can be quite complicated things and why on earth would somebody considers these things as side conditions in optimization problems? In fact there are good reasons to do so. One reason is, if you have to deal with bi-level optimization problems. Consider an optimization problem

\displaystyle \min_{x\in C} F(x,p) \ \ \ \ \ (1)

 

with convex {C} and {F(\cdot,p)} (omitting regularity conditions which could be necessary to impose) depending on a parameter {p}. Now consider the case that you want to choose the parameter {p} in an optimal way, i.e. it solves another optimization problem. This could look like

\displaystyle \min_p G(x)\quad\text{s.t.}\ x\ \text{solves (1)}. \ \ \ \ \ (2)

 

Now you have an optimization problem as a constraint. Now we use the optimality condition for the problem~(1): For differentiable {F}, {x^*} solves~(1) if and only if

\displaystyle \forall y\in C:\ \nabla_x F(x^*(p),p)(y-x^*(p))\geq 0.

In other words: We con reformulate (2) as

\displaystyle \min_p G(x)\quad\text{s.t.}\ \forall y\in C:\ \nabla_x F(x^*(p),p)(y-x^*(p))\geq 0. \ \ \ \ \ (3)

 

And there it is, our optimization problem with a variational inequality as constraint. Here at ISMP there are entire sessions devoted to this, see here and here.

Recently Arnd Rösch and I organized the minisymposium “Parameter identification and nonlinear optimization” at the SIAM Conference on Optimization. One of the aims of this symposium was, to initiate more connections between the communities in optimal control of PDEs on the one hand and regularization of ill-posed problems on the other hand. To give a little bit of background, let me somehow formulate the “mother problems” in both fields:

Example 1 (Mother problem in optimal control of PDEs) We consider a bounded Lipschitz domain {\Omega} in {{\mathbb R}^2} (or {{\mathbb R}^3}). Assume that we are given a target (or desired state) {y_d} which is a real valued function of {\Omega}. Our aim is to find a function (or control) {u} (also defined on {\Omega}) such that the solution of the equation

\displaystyle  \begin{array}{rcl}  \Delta y & = u,\ \text{ on }\ \Omega\\ y & = 0,\ \text{ on }\ \partial\Omega. \end{array}

Moreover, our solution (or control) shall obey some pointwise bounds

\displaystyle  a\leq u \leq b.

This motivates the following constrained optimization problem

\displaystyle  \begin{array}{rcl}  \min_{y,u} \|y - y_d\|_{L^2}^2\quad \text{s.t.} & \Delta y = u,\ \text{ on }\ \Omega\\ & y = 0,\ \text{ on }\ \partial\Omega.\\ & a\leq u\leq b. \end{array}

Often, also the regularized problem is considered: For a small {\alpha>0} solve:

\displaystyle  \begin{array}{rcl}  \min_{y,u} \|y - y_d\|_{L^2}^2 + \alpha\|u\|_{L^2}^2\quad \text{s.t.} & \Delta y = u,\ \text{ on }\ \Omega\\ & y = 0,\ \text{ on }\ \partial\Omega.\\ & a\leq u\leq b. \end{array}

(This problem is also extensively treated section 1.2.1 in the excellent book “Optimal Control of Partial Differential Equations” by Fredi Tröltzsch.)

For inverse problems we may formulate:

Example 2 (Mother problem in inverse problems) Consider a bounded and linear operator {A:X\rightarrow Y} between two Hilbert spaces and assume that {A} has non-closed range. In this case, the pseudo-inverse {A^\dagger} is not a bounded operator. Consider now, that we have measured data {y^\delta\in Y} that is basically a noisy version of “true data” {y\in\text{range}A}. Our aim is, to approximate a solution of {Ax=y} by the knowledge of {y^\delta}. Since {A} does no have a closed range, it is usually the case that {y^\delta} is not in the domain of the pseudo inverse and {A^\dagger y^\delta} simply does not make sense. A widely used approach, also treated in my previous post is Tikhonov regularization, that is solving for a small regularization parameter {\alpha>0}

\displaystyle  \min_x\, \|Ax-y^\delta\|_Y^2 + \alpha\|x\|_X^2

Clearly both mother problems have a very similar mathematical structure: We may use the solution operator of the PDE, denote it by {A}, and restate the mother problem of optimal control of PDEs in a form similar to the mother problem of inverse problems. However, there are some important conceptual differences:

Desired state vs. data: In Example 1 {y_d} is a desired state which, however, may not be reachable. In Example 2 {y^\delta} is noisy data and hence, shall not reached as good as possible.

Control vs. solution: In Example 1 the result {u} is an optimal control. It’s form is not of prime importance, as long as it fulfills the given bounds and allows for a good approximation of {y_d}. In Example 2 the result {x} is the approximate solution itself (which, of course shall somehow explain the measured data {y^\delta}). It’s properties are itself important.

Regularization: In Example 1 the regularization is mainly for numerical reasons. The problem itself also has a solution for {\alpha=0}. This is due to the fact that the set of admissible {u} for a weakly compact set. However, in Example 2 one may not choose {\alpha=0}: First because the functional will not have a minimizer anymore and secondly one really does not want {\|Ax-y^\delta\|} as small as possible since {y^\delta} is corrupted by noise. Especially, the people from inverse problems are interested in the case in which both {y^\delta\rightarrow y\in\text{range}A} and {\alpha\rightarrow 0}. However, in optimal control of PDEs, {\alpha} is often seen as a model parameter which ensures that the control has somehow a small energy.

These conceptual difference sometimes complicate the dialog between the fields. One often runs into discussion dead ends like “Why should we care about decaying {\alpha}—it’s given?” or “Why do you need these bounds on {u}? This makes your problem worse and you may not reach to state as good as possible\dots”. It often takes some time until the involved people realize that they really pursue different goals, that the quantities which even have similar names are something different and that the minimization problems can be solved with the same techniques.

In our minisymposium we had the following talks:

  • “Identification of an Unknown Parameter in the Main Part of an Elliptic PDE”, Arnd Rösch
  • “Adaptive Discretization Strategies for Parameter Identification Problems in PDEs in the Context of Tikhonov Type and Newton Type Regularization”, Barbara Kaltenbacher
  • “Optimal Control of PDEs with Directional Sparsity”, Gerd Wachsmuth
  • “Nonsmooth Regularization and Sparsity Optimization” Kazufumi Ito
  • {L^1} Fitting for Nonlinear Parameter Identification Problems for PDEs”, Christian Clason
  • “Simultaneous Identification of Multiple Coefficients in an Elliptic PDE”, Bastian von Harrach

Finally, there was my own talk “Error Estimates for joint Tikhonov and Lavrentiev Regularization of Parameter Identification Probelms” which is based on a paper with the similar name which is at http://arxiv.org/abs/0909.4648 and published in Applicable Analysis. The slides of the presentation are here (beware, there may be some wrong exponents in the pdf…).

In a nutshell, the message of the talk is: Bound both on the control/solution and the state/data may be added also to a Tikhonov-regularized inverse problem. If the operator has convenient mapping properties then the bounds will eventually be inactive if the true solution has the same property. Hence, the known estimates for usual inverse problems are asymptotically recovered.