Some Procrustes-type Problems

Procrustes in the Greek mythology fits his guests to his bed. The mathematical problem after his name has a similar flavour. Suppose we have two set of vectors $X$, $Y$, where columns represent the vectors, and we want to find a linear transform $Q$ that takes $X$ to $Y$, $$ QX = Y $$ meaning we assume the column-to-column correspondence. If there is no constraint on what kind of transform we would like to have, then the answer can be $$ Q = YX^{\dagger} $$ where $X^{\dagger} = X^{T}(XX^{T})^{-1}$ assuming $XX^{T}$ invertible. The question is more instereting when we constrain $Q$ to be a rotation with possible reflection (i.e. orthonormal) $$ Q^{T}Q = I $$ which is of practical interests. Now its clear we don't always have $QX=Y$, so it's natural to ask for the solution of $$ \arg\min_{Q^TQ=I} \|QX-Y\|_{F}^2 $$ where $\|\cdot\|_{F}$ denotes the Frobenius norm. In other words, we want the transformed vectors $QX$ to be closest to $Y$ in a least square sense. Below the fold we will describe two ways to solve this problem with closed form solution.

Two approaches to the orthonormal Procrustes Problem


The first way is through standard constrained optimization approach. We consider the Lagrangian $$ L(Q, \Lambda) = \|QX-Y\|_{F}^2 + \sum_{i,j} \Lambda \odot (Q^TQ - I) $$ where $\odot$ denotes the entries-wise multiplication of matrices, sometimes called Hardamard product; the summation is over all indices. To find a critical point of $L$ and hence the minimum of the convex objective, we would need to differentiate $L(Q,\Lambda)$ with respect to entries of $Q$. To make our life easier, we will need to simplify the expression, by using the following identites for $A$, $B$ of size $m\times n$: $$ \begin{aligned} \|A\|_{F}^2 = \text{tr}(A^{T}A) \\ \text{tr}(A^TB) = \text{tr}(BA^T) = \sum_{i,j} A \odot B \\ \frac{\partial}{\partial X_{ij}}\text{tr}(X^TAX) = ((A^T + A)X)_{ij} \\ \frac{\partial}{\partial X_{ij}}\text{tr}(AX) = A_{ji} \end{aligned} $$ First, we write $$ L(Q,\Lambda) = \text{tr}(QXX^TQ^T) - 2\text{tr}(XY^TQ) + \text{tr}(Y^TY) + \text{tr}(\Lambda^T(Q^TQ-I)). $$ Differentiate in $Q$ gives $$ \frac{\partial L}{\partial Q} = 2QXX^T - 2YX^T + Q(\Lambda^T + \Lambda) $$ Setting this to zero, we have $$ (\Lambda^T + \Lambda)/2 + XX^T = Q^TYX^T $$ Now here is the crux. This means $Q^TXY^T$ must be symmetric $$ Q^TYX^T = XY^TQ $$ Multiplying both sides by $Q$, we get $$ YX^T = QXY^TQ $$ Squaring both sides, $$ YX^TXY^T = QXY^TYX^TQ^T $$ Suppose the eigen decopositions are $$ YX^TXY^T = VDV^T, XY^TYX^T = UDU^T $$ since we know they have the same eigenvalues, we must have $$ V = QU $$ or $Q = VU^T$. We can compute the matrices $U$ and $V$ by computing the singular value decomposition of $XY^T$ $$ XY^T = U\Sigma V^T $$ In the second approach we will effectively prove $Q = UV^T$ is the solution. We write $$ \|QX-Y\|_{F}^2 = \|QX\|_{F}^2 - 2\text{tr}(XY^TQ) + \|Y\|_{F}^2 $$ The last term has nothing to do with $Q$, so can be safely ignored. Now we invoke a lemma.
Lemma 1 If $Q$ is an orthonormal matrix, then $\|QX\|_{F}^2 = \|X\|_{F}^2$ for any matrix $X$.
Hence in fact the first term does not matter as well. We now want to maximize $$ \text{tr}(XY^TQ) $$ We proceed by taking the singluar value decomposition of $XY^T= U\Sigma V^T$, to get $$ \begin{aligned} \text{tr}(XY^TQ) &= \text{tr}(U\Sigma V^TQ) \\ &= \text{tr}(\Sigma V^TQU) \end{aligned} $$ Note that $\Sigma$ is diagonal and recall that they are all positive. So we are instereted in maximizing the diagonal entries of $V^TQU$.
Lemma 2 If $u, v$ are unit vectors, then $\langle u, v\rangle$ is at most $1$.
Viewing $V^TQ$ together, the first diagonal entry in the dot product between the fisrt row of $V^TQ$ and the first column of $U$, and it is at most $1$. It is then easy to see that the diagonal entries are maximized when $$ V^TQU = I $$ We again arrive at the solution $Q = VU^T$.

Relation to As-Rigid-As-Possible (ARAP) deformation


In ARAP, an important subproblem to solve is the following $$ \arg\min_{Q^TQ = I} \|\nabla f - Q\|_F^2 \,\,\,(1) $$ where $f$ is a mapping between two triangulated surfaes and we usually assume $\det \nabla f$ to be positive. Since locally on a triangle, $\nabla f$ maps edges to edges linearly. In the language of Procrustes problem, we are actually looking for $$ \arg\min_{Q^TQ = I} \|YX^{-1}- Q\|_F^2 $$ where we assume $X, Y$ are square and invertible. It is curious to note that this problem is not equivalent to the original Procrustes problem $$ \arg\min_{Q^TQ=I} \|QX-Y\|_{F}^2 \,\,\,(2) $$ which is actually equivalent to $$ \arg\min_{Q^TQ=I} \|YX^T-Q\|_{F}^2 $$ It turns out we can relate Problem (1) to Problem (2) by reweighing the quadratic energy column-wisely with respect to the geometry of the domain vectors, namely the cotangent weight. We refer to a previous post for details on that formulation.

 A more general problem 


Now we focus on Problem (1). Suppose we allow $Q$ to be different from a orthonormal a bit. To be precise, we set bounds to the singular values $\sigma_1, \sigma_2$ of $Q$, namely $$ k \leq \sigma_2 < \sigma_1 \leq K $$ We again ask what would be the best $Q$ that maps $X$ to $Y$ in a least square sense. It's tempting to suggest that the answer is $Q = U \Sigma^{*}V^T$, where $\Sigma^{*}$ is obtained by truncating the sigular values of $\nabla f=U \Sigma V^T$ to be inside the range $[k, K]$, i.e. $\sigma_i^* = \min(\max(\sigma_i, k), K)$. This is indeed true. To see it, we write $$ \begin{aligned} \|\nabla f - Q\|_F^2 & = \|U \Sigma V^T - Q\|_F^2 \\ & = \| \Sigma - U^T Q V\|_F^2 \end{aligned} $$ Let $Q = S\Sigma' P^T$ be the sigular value decomposition of $Q$. The above expression is $$ \| \Sigma - U^T S\Sigma' P^T V\|_F^2 =: \| \Sigma - U'^T\Sigma'V'\|_F^2 $$ which is equal to $$ \text{tr}(\Sigma^T \Sigma) - 2\text{tr}(\Sigma^TV'^T\Sigma'U') + \text{tr}(\Sigma'^T\Sigma') \,\,\,\,(3) $$ Note that only the middle term has to do with the rotation matrices. The problem now turns into maximizing $$ \text{tr}(\Sigma^TU'^T\Sigma'V') $$ over orthonormal matrices $U', V'$. Suppose we have ordered the diagonal entries of $\Sigma$ from large to small. Since they are all positive, we would like to maximize the diagonal entries of $U'^T\Sigma'V'$ and order them from large to small.
Proposition 3 Let $U',V'$ be orthonormal matrices, $\Sigma'$ be a matrix of zeros but non-negative on the diagonal. Suppose without loss of generality that the diagonal of $\Sigma'$ is ordered from large to small $\sigma_1' \geq \sigma_2' \geq \cdots$. Then we have $$ \text{diag}(U'^T\Sigma' V') \leq \text{diag}(\Sigma') $$ where we use $\leq$ as entry-wise comparison.
To prove this, observe that the $k$-th diagonal entry of $U'^T\Sigma' V'$ is $$ u_{k}'^T \Sigma' v_{k}' $$ We state some simple lemmas following the above notation, and the proposition will follow immediately.
Lemma 4 If $u, v$ are unit vectors, then $u^T \Sigma' v$ is at most $\sigma_1'$.
Lemma 5 If $u, v$ are unit vectors such that $u^T \Sigma' v = \sigma_1'$, then for any unit vectors $u', v'$ orthogonal to $u, v$ repectively, $u'^T \Sigma' v'$ is at most $\sigma_2'$.
These lemmas are in fact equivalent to the min-max characterization for singular values. Continue with the expression (3), we see when $U', V'$ are optimal, it is equal to $$ \text{tr}(\Sigma^T \Sigma) - 2\text{tr}(\Sigma^T\Sigma') + \text{tr}(\Sigma'^T\Sigma') $$ which is minimized when $\Sigma'$ is thresholded as described.

No comments:

Post a Comment