Hunting Down Shifts and Allocations

This post returns to QUBOs. In an earlier post, I made MixingCut a solver for the factorized version of the MAXCUT SDP relaxation.

\[\begin{align*} \min_{Y}\quad &\text{tr}(QYY^T) \\ s.t. &||Y_{i}||_2 = 1\\ \end{align*}\]

While the inital plan was to directly integrate that into Hercules as a subproblem solver, this is not in the cards for now. However, we can still use it to generate the optimal (or at least approximate) diagonal shift vector that we needed a full blown SDP solver to generate by finding the dual variables. The reason, we want to do that is so we can tighten the relaxations that we will be applying across the board.

If we plug the factorized form of the problem back into the original problem, as seen bellow, we can derive the KKT conditions, and use that as a simple guild to calculate the dual variables. In this problem, we are dropping the constraint $YY’ \succeq 0$, as this is true for any realization of $Y$.

\[\begin{align} \min_{Y} \quad &\text{tr}(QYY')\\ \text{s.t.} &\text{tr}(E_{ii}YY') = 1, \forall i \in I \end{align}\]

The stationarity conditions are somewhat direct to calculate, with just a little bit of matrix calculus, we can see the stationarity condition. Unfortunatly, there is not a very good route (that is apparent to me) of how to use this in the first order methods that we have implimented in the solver so far, but it does give us a route to find and approximation of what $\lambda$ should be given an approximate optimal $Y$.

\[\begin{align} (Q + \text{diag}(\lambda))Y = 0 \end{align}\]

If we have a candidate $Y$, then we would like to find a $\lambda$ that would minimixe the error in the stationarity condition.

\[\begin{align} \min_\lambda ||QY - \text{diag}(\lambda)Y||_2^2\\ \end{align}\]

Now this is an unconstrained least squares problem, which is a solved topic, but this is a very nice optimization problem in that, it is seperable, so we get a rather nice and simple answer on the solutions.

\[\begin{align} \lambda = \langle Q Y, Y\rangle \end{align}\]

Given that, the solutions $Y^*$, are approximate, then we shouldn’t nessisarilly trust that the shift will always exactly generate a shifted hessian that is psd e.g. $Q(\lambda) = Q + \text{diag}(\lambda) \succeq 0$, so a way to get around this is to compute the lowest eiganvalue of $Q(\lambda)$ and then shift $Q(\lambda)$ by $\mu \mathbb{I}$. So the overall diagonal shift applied would be $\mu \mathbb{I} + \text{diag}(\lambda)$.

What this means is that we can use a simple first order method to generate approximate solutions to $Y$, then use that to solve for an approximate $\lambda$, and then strenghten the problem formulation.

Some Performance Tests

We tested time to compute the shift vector and compared it to two other solvers, SDPA and SCS. Here OOM means out of memory, and OOT means out of time. The time is reported in seconds. I think it is somewhat clear that SDPA has a special case solver for MAXCUT type problems, as we can see that it is more or less able to scale up to the problem better then SCS is. SCS is a capable solver, but it doesn’t have a special case for this problem type so it can’t attack this problem as effectively as SPDA is. These are better results then I thought I would get, so it might make sense to try to apply this more extensively in the branch and bound tree. The email_eu problem was solved to lower precision then the previous problems, still feasible solution but with a stationarity error sum residual of 1E-2 instead of 1E-4 like the other tests.

Problem Name	Problem Size	NNZ	MIXINGCUT	SPDA	SCS
bqp50-1	51	316	0.00158	0.022	2.087
be100.1	101	10006	0.0135	0.74	118.1
mk487a	487	2870	0.088	2.145	OOT
G_3	800	38352	0.16	7.927	OOT
G_6	800	38352	0.19	1.989	OOT
G_27	2000	39980	0.35	23.81	OOT
G_48	3000	12000	0.59	33.28	OOT
G_81	20000	80000	51.71	OOM	OOM
enron	36692	367662	52.31	OOM	OOM
email_eu	265214	730051	205.06	OOM	OOM

An aside on accidental Allocations

When doing looking, at this I noticed that this was not as fast as it should have been. In that, in the innermost loop of the first order method we were allocating memory, which we shouldn’t have been! This memory allocation and deallocation, is wasting over 60% of run time in the hottest loop. If I hadn’t profiled it with Intel VTune, then I wouldn’t have seen the memory allocations, so it is important to always profile your code if you are trying to get every cycle out of your machine.

This is the hottest loop in the program. It might not seem like there is any problem here, but if you look closely we are computing a temperary Array at v * &Y.row(k), which allocates an Array at every single iteration of the inner loop.

    // make a scratch array
    let mut g_i = Array1::<f64>::zeros(Y.shape()[1]);

    for i in 0..n {

        ...
        for (k, &v) in Q_i.iter() {
            g_i = g_i + v * &Y.row(k);
        }
        ...
        g_i.fill(0.0f64);
    }

Now, this is a massive problem, as we are much slower then we should be. The solution to this is to make another scratch vector temp, and do inplace assignments to remove the construction of the temparary Array.

    // make a scratch arrays
    let mut g_i = Array1::<f64>::zeros(Y.shape()[1]);
    let mut temp = Array1::<f64>::zeros(Y.shape()[1]);

    for i in 0..n {

        ...
        for (k, &v) in Q_i.iter() {
            temp.assign(&Y.row(k));
            temp *= v;
            g_i -= &temp;
        }
        ...
        g_i.fill(0.0f64);
    }

With this small modification, we are able to speed this program up approximatly 60%, compaired to the previous version of the code.

Math Appendix

Derivation of the stationarity conditions.

\[\begin{align} \nabla_{Y} \text{Tr}(QYY') + \sum_i \lambda_i \nabla_{Y} \text{Tr}(E_{ii}YY') = 0\\ 2QY + 2 \sum_i \lambda_i E_{ii}Y = 0\\ QY + \left(\sum_i \lambda_i E_{ii}\right) Y = 0\\ (Q + \text{diag}(\lambda))Y = 0 \end{align}\]

Derivation of the approximate dual solution.

\[\begin{align} \min_{\lambda} &\quad ||QY - \text{diag}(\lambda)Y||_2^2 \\ \min_{\lambda_{i}} &\quad||(QY)_i - \lambda_i Y_i||_2^2 \\ \min_{\lambda_{i}} &\quad\lambda_i^2 ||Y_i||_2^2 - 2 \langle (QY)_i, Y_i \rangle \lambda + ||(QY)_i||^2_2 \\ \min_{\lambda_{i}} &\quad\lambda_i^2 - 2 \langle(QY)_i, Y_i\rangle\lambda + ||(QY)_i||^2_2\\ \lambda_i &= \langle(QY)_i, Y_i\rangle\\ \lambda &= \langle QY, Y\rangle\\ \end{align}\]

Some Performance Tests

An aside on accidental Allocations

Math Appendix

Related Posts

4-ier transforms - The most Fours 29 Jan 2025

MixingCut - Rust MAXCUT SDP solver 03 Oct 2024

Projected Gradient Descent Method - Line Search + GPU = Fast 15 Aug 2024