It’s out again! Our department has a vacant position for optimization to fill! This time we are seeking a professor (W2) working in discrete optimization.

The official job advertisement has been sent to various newletters and digests and you can find it for example here or here. In addition to that information let me give some more information about the math department here. Basically, I copied the following information from this previous advertisement:

The math department here is a medium sized department. It covers quite broad range of mathematics:

• Numerical Linear Algebra (Fassbender, Bollhöfer)
• PDEs (Sonar, Hempel)
• Modelling (Langemann)
• Stochastics (Kreiss, Lindner, Leucht)
• Applied Analysis/Mathematical Physics (Bach, myself)
• Algebra and Discrete Mathematics (Eick, Löwen, Opolka)

and, of course, Optimization (Zimmermann, tba). In most cases I find some expert around for all my questions that are a bit outside my field. All groups are active and working together smoothly. The department is located in the Carl-Friedrich Gauss Faculty which is also the home of the departments for Computer Science, Business Administration and Social Sciences. At the least in Computer Science and Business Administration there are some mathematically oriented groups, e.g.

and there are several groups with some mathematical background and interesting fields of applications (computer graphics, robotics,…). Moreover, the TU has a lot of engineering institutes with strong background in mathematics and cool applications.
In addition to a lively and interesting research environment, the university treats its staff well (as far as I can see) and administrative burden or failures are not harming too much (in fact less then at other places, I’ve heard)!

Full disclosure: I am the head of the hiring committee this time. All questions you may have about the position can be sent to me.

The deadline for application is 30.04.2014. The deadline is sharp and only electronic applications (addressed to fk1@tu-bs.de) will be considered. Please send a single pdf-file and make sure that all text in the document is searchable.

I recently updated my working hardware and now use a tablet pc for work (namely a Nexus 10). In consequence, I also updated the software I used to have things more synchronized across devices. For my RSS feeds I now use feedly and the gReader app. However, I was not that happy with the method to store and mark paper I found but found the sharing interfaces between the apps pretty handy. I adopted the workflow that when I see a paper that I want to remember I sent them to my Evernote account where I tag them. Then, from time to time I go over the papers I marked and have a more detailed look. If I think, they deserve to be kept for future reference, they get a small entry here. Here’s the first take with just two papers from the last weeks (there are more in my backlog…):

On the convergence rate improvement of a primal-dual splitting algorithm for solving monotone inclusion problems by Radu Ioan Boţ, Ernö Robert Csetnek, André Heinrich, Christopher Hendrich (Math Prog): As first sight, I found this work pretty inaccessible but the title sounded interesting. I was a bit scared by the formula for the kind of problems they investigated: Solve the following inclusion for ${x}$

$\displaystyle 0 \in z + Ax + \sum_{i=1}^m L_i^*((B_i\square D_i)(L_ix -r_i)) + Cx$

where ${A}$, ${B_i}$ and ${D_i}$ are maximally monotone, ${D_i}$ also ${\nu_i}$ strongly monotone, ${C}$ is ${\eta}$-coercive, ${L_i}$ are linear and bounded and ${\square}$ denotes the parallel sum, i.e. ${A\square B = (A^{-1}+B^{-1})^{-1}}$. Also the proposed algorithm looked a bit like a monster. Then, on later pager, things became a bit more familiar. As an application, they considered the optimization problem

$\displaystyle \min_x f(x) + \sum_{i=1}^m (g_i\square l_i)(L_ix - r_i) + h(x) - \langle x,z\rangle$

with convex ${f}$, ${g_i}$, ${l_i}$ (${l_i}$ ${\nu_i^{-1}}$ strongly convex), ${h}$ convex with ${\eta}$-Lipschitz gradient and ${L_i}$ as above. By noting that the parallel sum is related to the infimal convolution of convex functions, things became clearer. Also, the algorithm looks more familiar now (Algorithm 18 in the paper – I’m too lazy to write it down here). They have an analysis of the algorithms that allow to deduce convergence rates for the iterates (usually ${\mathcal{O}(1/n)}$) but I haven’t checked the details yet.

Sparse Regularization: Convergence Of Iterative Jumping Thresholding Algorithm by Jinshan Zeng, Shaobo Lin, Zongben Xu: At first I was excited but then I realized that they simple tackled

$\displaystyle \min F + \lambda \Phi$

with smooth ${F}$ and non-smooth, non-convex ${\Phi}$ by “iterative thresholding”, i.e.

$\displaystyle x^{n+1} = \mathrm{prox}_{\mu\lambda\Phi}(x^n - \mu \nabla F(x^n)).$

The paper really much resembles what Kristian and I did in the paper Minimization of non-smooth, non-convex functionals by iterative thresholding (at least I couldn’t figure out the improvements…).

Some remark before you read this post: It is on a very specialized topic and only presents a theoretical insight which seems to be of no practical value whatsoever. Continue at your own risk of wasting your time.

Morozov’s discrepancy principle is a means to choose the regularization parameter when regularizing inverse ill-posed problems. To fix the setting, we consider two Hilbert spaces ${X}$ and ${Y}$ and a bounded linear operator ${A:X\rightarrow Y}$. We assume that the range of ${A}$ is a non-closed subset of ${Y}$. As a consequence of the Bounded Inverse Theorem the pseudo-inverse ${A^\dag}$ (defined on ${{\mathrm rg} A \oplus ({\mathrm rg} A)^\bot}$) is unbounded.

This is the classical setting for linear inverse problems: We have a process, modeled by ${A}$, such that the quantity ${x^\dag\in X}$ we are interested in gives rise to on output ${y^\dag = Ax^\dag}$. We are able to measure ${y}$ but, as it is always the case, the is some noise introduced in the measurement process and hence, we have access to a measured quantity ${y^\delta\in Y}$ which is a perturbation of ${y}$. Our goal is, to approximate ${x^\dag}$ from the knowledge of ${y^\delta}$. Note that simply taking ${A^\dag y^\delta}$ is not an option, since firstly ${y^\delta}$ is not guaranteed to be in the domain of the pseudo-inverse and, somehow even more severely and also more practical, the unboundedness of ${A^\dag}$ will lead to a severe (in theory infinitely large) amplification of the noise, rendering the reconstruction useless.

The key idea is to approximate the pseudo-inverse ${A^\dag}$ by a family of bounded operators ${R_\alpha:Y\rightarrow X}$ with the hope that one may have

$\displaystyle R_\alpha y^\delta \rightarrow x^\dag\ \text{when}\ y^\delta\rightarrow y\in{\mathrm rg} A\ \text{and}\ \alpha\rightarrow 0 \ \text{appropriately.} \ \ \ \ \ (1)$

(Note that the assumption ${\alpha\rightarrow 0}$ is just a convention. It says that we assume that ${R_\alpha}$ is a closer approximation to ${A^\dag}$ the closer ${\alpha}$ is to zero.) Now, we have two tasks:

1. Construct a good family of regularizing operators ${R_\alpha}$ and
2. devise a parameter choice, i.e. a way to choose ${\alpha}$.

The famous Bakushinskii veto says that there is no parameter choice that can guarantee convergence in~(1) in the worst case and only uses the given data ${y^\delta}$. The situation changes if one introduces knowledge about the noise level ${\delta = \|y-y^\delta\|}$. (There is an ongoing debate if it is reasonable to assume that the noise level is known – my experience when working with engineers is that they are usually very good in quantifying the noise present in their system and hence, in my view the assumption that noise level is known is ok.)

One popular way to choose the regularization parameter in dependence on ${y^\delta}$ and ${\delta}$ is Morozov’s discrepancy principle:

Definition 1 Morozov’s discrepancy principle states that one shall choose the regularization parameter ${\alpha(\delta,y^\delta)}$ such that

$\displaystyle \|AR_{\alpha(\delta,y^\delta)}y^\delta - y^\delta\| = c\delta$

for some fixed ${c>1}$.

In other words: You shall choose ${\alpha}$ such that the reconstruction ${R_\alpha y^\delta}$ produces a discrepancy ${\|AR_{\alpha}y^\delta - y^\delta\|}$ which is in the order of and slightly larger than the noise level.

Some years ago I wrote a paper about the use of Morozov’s discrepancy principle when using the augmented Lagragian method (aka Bregman iteration) as an iterative regularization method (where one can view the inverse of the iteration counter as a regularization parameter). The paper is Morozov’s principle for the augmented Lagrangian method applied to linear inverse problems (together with , the arxiv link is here). In that paper we derived an estimate for the (squared) error of ${R_\alpha y^\delta}$ and ${x^\dag}$ that behaves like

$\displaystyle C \frac{c(\sqrt{c}+1)}{\sqrt{c-1}}\delta$

for some ${C>0}$ and the ${c>1}$ from Morozov’s discrepancy principle. The somehow complicated dependence on ${c}$ was a bit puzzling to me. One can optimize ${c>1}$ such that the error estimate is optimal. It turns out that ${c\mapsto \frac{c(\sqrt{c}+1)}{\sqrt{c-1}}}$ attains the minimal value of about ${4.68}$ for about ${c=1.64}$. I blamed the already quite complicated analysis of the augmented Lagragian method for this obviously much too complicated values (and in practice, using ${c}$ much closer to ${1}$ usually lead to much better results).

This term I teach a course on inverse problems and also covered Morozov’s discrepancy principle but this time for much simpler regularization methods, namely for linear methods such as, for example, Tikhonov regularization, i.e.

$\displaystyle R_\alpha y^\delta = (A^* A + \alpha I)^{-1} A^* y^\delta$

(but other linear methods exist). There I arrived at an estimate for the (squared) error of ${R_\alpha y^\delta}$ any ${x^\dag}$ of the form

$\displaystyle C(\sqrt{c} + \tfrac1{\sqrt{c-1}})^2\delta.$

Surprisingly, the dependence on ${c}$ for this basic regularization method is also not very simple. Optimizing the estimate over ${c>1}$ leads to an optimal value of about ${c= 2.32}$ (with a minimal value of the respective constant of about ${5.73}$). Again, using ${c}$ closer to ${1}$ usually lead to much better results.

Well, these observations are not of great importance… I just found it curious to observe that the analysis of Morozov’s discrepancy principle seems to be inherently a bit more complicated than I thought…

Today I’d like to collect some comments one a few papers I stumbled upon recently on the arXiv.

1. TGV minimizers in 1D

First, about a month ago two very similar paper appeared in the same week:

Both papers treat the recently proposed “total generalized variation” model (which is a somehow-but-not-really-higher-order generalization of total variation). The total variation of a function ${u\in L^1(\Omega)}$ (${\Omega\subset{\mathbb R}^d}$) is defined by duality

$\displaystyle TV(u) = \sup\Big\{\int_\Omega \mathrm{div} \phi\, u\,dx\ :\ \phi\in C^\infty_c(\Omega,{\mathbb R}^d), |\phi|\leq 1\Big\}.$

(Note that the demanded high regularity of the test functions ${\phi}$ is not essential here, as we take a supremum over all these functions under the only, but important, requirement that the functions are bounded. Test functions from ${C^1_c(\Omega,{\mathbb R}^d)}$ would also do.)

Several possibilities for extensions and generalization of the total variation exist by somehow including higher order derivatives. The “total generalized variation” is a particular successful approach which reads as (now using two non-negative parameter ${\alpha,\beta}$ which do a weighting):

$\displaystyle TGV_{\beta,\alpha}^2(u) = \sup\Big\{\int_\Omega \mathrm{div}^2 \phi\, u\,dx\ :\ \phi\in C^\infty_c(\Omega,S^{d\times d}),\ |\phi|\leq \beta,\ |\mathrm{div}\phi|\leq \alpha\Big\}.$

To clarify some notation: ${S^{d\times d}}$ are the symmetric ${d\times d}$ matrices, ${\mathrm{div}^n}$ is the negative adjoint of ${\nabla^n}$ which is the differential operator that collects all partial derivatives up to the ${n}$-th order in a ${d\times\cdots\times d}$-tensor. Moreover ${|\phi|}$ is some matrix norm (e.g. the Frobenius norm) and ${|\mathrm{div}\phi|}$ is some vector norm (e.g. the 2-norm).

Both papers investigate so called denoising problems with TGV penalty and ${L^2}$ discrepancy, i.e. minimization problems

$\displaystyle \min_u \frac12\int_\Omega(u-u^0)^2\, dx + TGV_{\alpha,\beta}^2(u)$

for a given ${u^0}$. Moreover, both papers treat the one dimensional case and investigate very special cases in which they calculate minimizers analytically. In one dimension the definition of ${TGV^2}$ becomes a little more familiar:

$\displaystyle TGV_{\beta,\alpha}^2(u) = \sup\Big\{\int_\Omega \phi''\, u\,dx\ :\ \phi\in C^\infty_c(\Omega,{\mathbb R}),\ |\phi|\leq \beta,\ |\phi'|\leq \alpha\Big\}.$

Some images of both papar are really similar: This one from Papafitsoros and Bredies

and this one from Pöschl and Scherzer

Although both paper have a very similar scopes it is worth to read both. The calculations are tedious but both paper try to make them accessible and try hard (and did a good job) to provide helpful illustrations. Curiously, the earlier paper cites the later one but not conversely…

Another paper I found very interesting was

This paper shows a nice duality which I haven’t been aware of, namely the one between the subgradient descent methods and conditional gradient methods. In fact the conditional gradient method which is treated is a generalization of the conditional gradient method which Kristian and I also proposed a while ago in the context of ${\ell^1}$-minimization in the paper Iterated hard shrinkage for minimization problems with sparsity constraints: To minimize the sum

$\displaystyle F(u) + \Phi(u)$

with a differentiable ${F}$ and a convex ${\Phi}$ for which the subgradient of ${\Phi}$ is easily invertible (or, put differently, for which you can minimize ${\langle u,a\rangle + \Phi(u)}$ easily), perform the following iteration:

1. At iterate ${u^n}$ linearize ${F}$ but not ${\Phi}$ and calculate a new point ${v^n}$ by

$\displaystyle v^n = \mathrm{argmin}_v \langle F'(u^n),v\rangle + \Phi(v)$

2. Choose a stepsize ${s^n\in [0,1]}$ and set the next iterate as a convex combination of ${u^n}$ and ${v^n}$

$\displaystyle u^{n+1} = u^n + s_n(v^n - u^n).$

Note that for and indicator function

$\displaystyle \Phi(u) = \begin{cases} 0 & u\in C\\ \infty & \text{else} \end{cases}$

you obtain the conditional gradient method (also known as Frank-Wolfe method). While Kristian and I derived convergence with an asymptotic rate for the case of ${F(u) = \tfrac12\|Ku-f\|^2}$ and ${\Phi}$ strongly coercive, Francis uses the formulation ${F(u) = f(Au)}$ the assumption that the dual ${f^*}$ of ${f}$ has a bounded effective domain (which say that ${f}$ has linear growth in all directions). With this assumption he obtains explicit constants and rates also for the primal-dual gap. It was great to see that eventually somebody really took the idea from the paper Iterated hard shrinkage for minimization problems with sparsity constraints (and does not think that we do heuristics for ${\ell^0}$ minimization…).

The mother example of optimization is to solve problems

$\displaystyle \min_{x\in C} f(x)$

for functions ${f:{\mathbb R}^n\rightarrow{\mathbb R}}$ and sets ${C\in{\mathbb R}^n}$. One further classifies problems according to additional properties of ${f}$ and ${C}$: If ${C={\mathbb R}^n}$ one speaks of unconstrained optimization, if ${f}$ is smooth one speaks of smooth optimization, if ${f}$ and ${C}$ are convex one speaks of convex optimization and so on.

1. Classification, goals and accuracy

Usually, optimization problems do not have a closed form solution. Consequently, optimization is not primarily concerned with calculating solutions to optimization problems, but with algorithms to solve them. However, having a convergent or terminating algorithm is not fully satisfactory without knowing an upper bound on the runtime. There are several concepts one can work with in this respect and one is the iteration complexity. Here, one gives an upper bound on the number of iterations (which are only allowed to use certain operations such as evaluations of the function ${f}$, its gradient ${\nabla f}$, its Hessian, solving linear systems of dimension ${n}$, projecting onto ${C}$, calculating halfspaces which contain ${C}$, or others) to reach a certain accuracy. But also for the notion of accuracy there are several definitions:

• For general problems one can of course desire to be within a certain distance to the optimal point ${x^*}$, i.e. ${\|x-x^*\|\leq \epsilon}$ for the solution ${x^*}$ and a given point ${x}$.
• One could also demand that one wants to be at a point which has a function value close to the optimal one ${f^*}$, i.e, ${f(x) - f^*\leq \epsilon}$. Note that for this and for the first point one could also desire relative accuracy.
• For convex and unconstrained problems, one knowns that the inclusion ${0\in\partial f(x^*)}$ (with the subgradient ${\partial f(x)}$) characterizes the minimizers and hence, accuracy can be defined by desiring that ${\min\{\|\xi\|\ :\ \xi\in\partial f(x)\}\leq \epsilon}$.

It turns out that the first two definitions of accuracy are much to hard to obtain for general problems and even for smooth and unconstrained problems. The main issue is that for general functions one can not decide if a local minimizer is also a solution (i.e. a global minimizer) by only considering local quantities. Hence, one resorts to different notions of accuracy, e.g.

• For a smooth, unconstrained problems aim at stationary points, i.e. find ${x}$ such that ${\|\nabla f(x)\|\leq \epsilon}$.
• For smoothly constrained smooth problems aim at “approximately KKT-points” i.e. a point that satisfies the Karush-Kuhn-Tucker conditions approximately.

(There are adaptions to the nonsmooth case that are in the same spirit.) Hence, it would be more honest not write ${\min_x f(x)}$ in these cases since this is often not really the problem one is interested in. However, people write “solve ${\min_x f(x)}$” all the time even if they only want to find “approximately stationary points”.

2. The gradient method for smooth, unconstrainted optimization

Consider a smooth function ${f:{\mathbb R}^n\rightarrow {\mathbb R}}$ (we’ll say more precisely how smooth in a minute). We make no assumption on convexity and hence, we are only interested in finding stationary points. From calculus in several dimensions it is known that ${-\nabla f(x)}$ is a direction of descent from the point ${x}$, i.e. there is a value ${h>0}$ such that ${f(x - h\nabla f(x))< f(x)}$. Hence, it seems like moving into the direction of the negative gradient is a good idea. We arrive at what is known as gradient method:

$\displaystyle x_{k+1} = x_k - h_k \nabla f(x_k).$

Now let’s be more specific about the smoothness of ${f}$. Of course we need that ${f}$ is differentiable and we also want the gradient to be continuous (to make the evaluation of ${\nabla f}$ stable). It turns out that some more smoothness makes the gradient method more efficient, namely we require that the gradient of ${f}$ is Lipschitz continuous with a known Lipschitz constant ${L}$. The Lipschitz constant can be used to produce efficient stepsizes ${h_k}$, namely, for ${h_k = 1/L}$ one has the estimate

$\displaystyle f(x_k) - f(x_{k+1})\geq \frac{1}{2L}\|\nabla f(x_k)\|^2.$

This inequality is really great because one can use telescoping to arrive at

$\displaystyle \frac{1}{2L}\sum_{k=0}^N \|\nabla f(x_k)\|^2 \leq f(x_0) - f(x_{N+1}) \leq f(x_0) - f^*$

with the optimal value ${f}$ (note that we do not need to know ${f^*}$ for the following). We immediately arrive at

$\displaystyle \min_{0\leq k\leq N} \|\nabla f(x_k)\| \leq \frac{1}{\sqrt{N+1}}\sqrt{2L(f(x_0)-f^*))}.$

That’s already a result on the iteration complexity! Among the first ${N}$ iterates there is one which has a gradient norm of order ${N^{-1/2}}$.

However, from here on it’s getting complicated: We can not say anything about the function values ${f(x_k)}$ and about convergence of the iterates ${x_k}$. And even for convex functions ${f}$ (which allow for more estimates from above and below) one needs some more effort to prove convergence of the functional values to the global minimal one.

But how about convergence of the iterates for the gradient method if convexity is not given? It turns out that this is a hard problem. As illustration, consider the continuous case, i.e. a trajectory of the dynamical system

$\displaystyle \dot x = -\nabla f(x)$

(which is a continuous limit of the gradient method as the stepsize goes to zero). A physical intuition about this dynamical system in ${{\mathbb R}^2}$ is as follows: The function ${f}$ describes a landscape and ${x}$ are the coordinates of an object. Now, if the landscape is slippery the object slides down the landscape and if we omit friction and inertia, the object will always slide in the direction of the negative gradient. Consider now a favorable situation: ${f}$ is smooth, bounded from below and the level sets ${\{f\leq t\}}$ are compact. What can one say about the trajectories of the ${\dot x = -\nabla f(x)}$? Well, it seems clear that one will arrive at a local minimum after some time. But with a little imagination one can see that the trajectory of ${x}$ does not even has to be of finite length! To see this consider a landscape ${f}$ that is a kind of bowl-shaped valley with a path which goes down the hillside in a spiral way such that it winds around the minimum infinitely often. This situation seems somewhat pathological and one usually does not expect situation like this in practice. If you tried to prove convergence of the iterates of gradient or subgradient descent you may have noticed that one sometimes wonders why the proof turns out to be so complicated. The reason lies in the fact that such pathological functions are not excluded. But what functions should be excluded in order to avoid this pathological behavior without restricting to too simple functions?

3. The Kurdyka-Łojasiewicz inequality

Here comes the so-called Kurdyka-Łojasiewicz inequality into play. I do not know its history well, but if you want a pointer, you could start with the paper “On gradients of functions definable in o-minimal structures” by Kurdyka.

The inequality shall be a way to turn a complexity estimate for the gradient of a function into a complexity estimate for the function values. Hence, one would like to control the difference in functional value by the gradient. One way to do so is the following:

Definition 1 Let ${f}$ be a real valued function and assume (without loss of generality) that ${f}$ has a unique minimum at ${0}$ and that ${f(0)=0}$. Then ${f}$ satisfies a Kurdyka-Łojasiewicz inequality if there exists a differentiable function ${\kappa:[0,r]\rightarrow {\mathbb R}}$ on some interval ${[0,r]}$ with ${\kappa'>0}$ and ${\kappa(0)=0}$ such that

$\displaystyle \|\nabla(\kappa\circ f)(x)\|\geq 1$

for all ${x}$ such that ${f(x).

Informally, this definition ensures that that one can “reparameterize the range of the function such that the resulting function has a kink in the minimum and is steep around that minimum”. This definition is due to the above paper by Kurdyka from 1998. In fact it is a slight generalization of the Łowasiewicz inequality (which dates back to a note of Łojasiewicz from 1963) which states that there is some ${C>0}$ and some exponent ${\theta}$ such that in the above situation it holds that

$\displaystyle \|\nabla f(x)\|\geq C|f(x)|^\theta.$

To that that, take ${\kappa(s) = s^{1-\theta}}$ and evaluate the gradient to ${\nabla(\kappa\circ f)(x) = (1-\theta)f(x)^{-\theta}\nabla f(x)}$ to obtain ${1\leq (1-\theta)|f(x)|^{-\theta}\|\nabla f(x)\|}$. This also makes clear that in the case the inequality is fulfilled, the gradient provides control over the function values.

The works of Łojasiewicz and Kurdyka show that a large class of functions ${f}$ fulfill the respective inequalities, e.g. piecewise analytic function and even a larger class (termed o-minimal structures) which I haven’t fully understood yet. Since the Kurdyka-Łojasiewicz inequality allows to turn estimates from ${\|\nabla f(x_k)\|}$ into estimates of ${|f(x_k)|}$ it plays a key role in the analysis of descent methods. It somehow explains, that one really never sees pathological behavior such as infinite minimization paths in practice. Lately there have been several works on further generalization of the Kurdyka-Łojasiewicz inequality to the non-smooth case, see e.g. Characterizations of Lojasiewicz inequalities: subgradient flows, talweg, convexity by Bolte, Daniilidis, Ley and Mazet Convergence of non-smooth descent methods using the Kurdyka-Łojasiewicz inequality by Noll (however, I do not try to give an overview over the latest developments here). Especially, here at the French-German-Polish Conference on Optimization which takes place these days in Krakow, the Kurdyka-Łojasiewicz inequality has popped up several times.

A quick post to keep track of several things:

• Christian Leonard  has lecture notes on convex optimization with an application to optimal transport on his website.
• The paper Variational Properties of Value Functions by  Aravkin, Burke, and Friedlander discuss how the value of minimization problems like $\min \rho(Ax-b)\quad \mbox{s.t}\quad \phi(x)\leq tau$ depend on $\tau$ and $\latex b$. In inverse problems, the value function seems to contain important information on the the regularization process and hence, the results in this paper maybe helpful in designing and analyzing parameter choice rules.
• The paper Accelerated and Inexact Forward-Backward Algorithms by Villa,Salzo, Baldassarre, and Verri looks like an interesting development in the fiel of splitting methods.
• The paper  Consistency of the posterior distribution in generalized linear inverse problems by Natalia Bochkina is another contribution on “probabilitic inverse problems” where one does not only try to infer a regularized solution to an ill posed problems but also how the uncertainty in the data in propagated through the regularization process.

This is the last post in a series of posts (which was not intended to be a series). The series started with this post in which I reported some experience with job interviews I had in spring last year, and continued with this post which told what happened afterwards. The story ended with the paragraph “Now the final steps will be: Preparation of the next list of wishes, negotiations to stay and finally, the most difficult part, forming a well-founded decision.” That was about eleven month ago and was not totally wrong. Here is what happened then:

1. The Bleibeverhandlung

The Bleibeverhandlung (the negotiations to stay) is in principle similar to the negotiations with other universities. But there are important differences: The first one is due to the fact that there has been no official committee of the department involved so far and hence, the department has to form an opinion on how much they want to keep you. As far as I know this usually happens in an informal way and can be very different in different places and also the dean (or, in my case, the “speaker of the department”) may handle this in its own way. Also, you will not be involved in this step in any way (I think). The amount of support you get is crucial for the next steps. The department has to report its opinion to the dean (who is basically in control of the money here) and the dean has to decide about a strategy how to keep you (if your support is strong enough). Again, this is very different in different cases. Also, I do not know too much about this process, but at least there will be some communication between the dean and the president (or their offices). But after this procedure, the next steps of negotiations are basically the same as before: First negotiations with the dean, then with the president. Again, the first negotiation is somehow more important as many things are handled by the dean. In my case there was the question on which kind of position the department could keep me and how it could be arranged to fill this position with me. I have been Juniorprofessor (basically equal to assistant professor) and according to German law, there is not way of promotion to the “next level”. The university had to find an “open position for a professor”. But there was such a position (which I knew before since I had a tenure track position). The next obstruction was that, again according to German law, there usually has to be an official public announcement if such a position is to be filled. Consequently, anyone who is qualified could apply and should have a positive probability to get the job. However, I learned that my state has the possibility to fill a position without public announcement and it was part of my offer that my university “offered to start the initiation of this procedure”. It is somehow difficult to translate but the university could not even offer to initiate this “filling without public announcement” because this is something on which several committees has to agree.

2. The decision

Well, I had the official offer of my university pretty quick after negotiations. Basically, it was on par with the other offers (slightly better in some respects, slightly worse in others – but clearly no clear winner). The only caveat was, that there was no guarantee that I could get a permanent position because this depended on the decision of more committees. However, I had a formal statement that the president and the dean would initiate and support the procedure. Moreover, colleagues told me that my university had done a great job in keeping its promises in similar respects.

So, the decision was not easy. However, I decided not to play “ping-pong” with the other universities (which could be possible – but I can not tell you how that works) and to decide on basis of the facts I had after one round of negotiations. It was a tough and close decision which I do not comment in more detail here. But I decided to stay at TU Braunschweig.

3. Another application