Branch And Bound of QUBO, Problem Reformulation

We are still looking at the QUBO problem. This time, we will implement a diagonal shift to strengthen the relaxations we generate. For dense problem instances, this tightens the formulation quite a bit; for sparse instances, the improvement is dramatic.

Looking at different ways to state the same problem

We can state the same optimization problem in a few different ways. We start from the nominal problem:

\[\begin{align} \min_{x\in\mathbb{B}^n} \frac{1}{2}x^TQx + c^Tx \end{align}\]

However, given $x\in\mathbb{B}^n$, we can shift the problem via a vector $u$ in a way that keeps the objective identical for every binary vector (but not for continuous vectors), since $x_i^2 = x_i$ whenever $x_i \in \{0,1\}$. This gives us some play in how the problem is actually stated, as a function of this $u$ vector.

\[\begin{align} \min_{x\in\mathbb{B}^n} \frac{1}{2}x^T(Q - \text{diag}\{u\})x + \left(c+\frac{1}{2}u\right)^Tx \end{align}\]
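As a quick sanity check, here is a small numerical sketch (mine, not from the original post) verifying that the shift leaves the objective unchanged on binary points; the $-\frac{1}{2}u_ix_i^2$ and $+\frac{1}{2}u_ix_i$ terms cancel because $x_i^2 = x_i$:

import numpy as np

# Numerical check: the diagonal shift leaves the QUBO objective unchanged
# on binary vectors, for *any* u. (Toy data; not the post's instance.)
rng = np.random.default_rng(0)
n = 10
A = rng.standard_normal((n, n))
Q = 0.5 * (A + A.T)                            # symmetric Q
c = rng.standard_normal(n)
u = rng.standard_normal(n)                     # arbitrary shift vector

x = rng.integers(0, 2, size=n).astype(float)   # random binary point
f_plain = 0.5 * x @ Q @ x + c @ x
f_shift = 0.5 * x @ (Q - np.diag(u)) @ x + (c + 0.5 * u) @ x
assert np.isclose(f_plain, f_shift)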

However, we need to put some guard-rails on this problem, namely, which choices of $u$ produce a reasonable relaxed problem. As our current relaxed subproblems are QPs, we would prefer the shifted problem to be convex, which means enforcing the condition $Q - \text{diag}\{u\} \succeq 0$. This gives us a somewhat clearer view of what the shift is actually doing: by playing with $u$, we are modifying the eigenvalues of the quadratic term of our optimization problem. In simpler terms, we can adjust the curvature of the QUBO problem by messing with $u$.

One of the first things we can do is make $u$ a constant vector, $u = s\mathbf{1}$, and see what happens to the lower bounds we generate from our QP relaxation. Here, we can make a random problem with 100 variables and a density of 0.02. If we vary $s$ to move the lowest eigenvalue of the shifted $Q$ between $0$ and $10$ and solve the relaxations, we see a huge effect on the value of the lower bound. This idea appears in the literature as QCR (Quadratic Convex Reformulation), where it is generally accepted that, as long as all eigenvalues remain positive, we want to minimize their magnitudes. The conventional wisdom holds here, at least for uniformly shifting the eigenvalues up and down.
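As a rough sketch of how one might compute such a constant shift (constant_u_shift is my name, not a Hercules routine): with $u = s\mathbf{1}$ we have $Q - \text{diag}\{u\} = Q - sI$, so every eigenvalue of $Q$ simply drops by $s$.

import numpy as np

def constant_u_shift(Q, target_min_eig):
    # With u = s * ones, Q - diag{u} = Q - s*I, so each eigenvalue of Q is
    # lowered by s. Picking s = lambda_min(Q) - target places the smallest
    # eigenvalue of the shifted matrix exactly at `target_min_eig`.
    s = np.linalg.eigvalsh(Q).min() - target_min_eig
    return np.full(Q.shape[0], s)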

However, by keeping $u$ a constant vector, we only have a 1D lever for the eigenvalues of the $Q$ term. If we open up the remaining degrees of freedom, we arrive at an optimization problem for generating the 'best' $u$. This is an SDP (semidefinite program) in the same spirit as the one proposed in [1], where we try to minimize the sum of the eigenvalues of the shifted problem.

\[\begin{align} \max_{u\in\mathbb{R}^n} &\sum_{i=1}^n u_i\\ \text{s.t. } &Q - \text{diag}\{u\} \succeq 0 \end{align}\]
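To see why maximizing $\sum_i u_i$ minimizes the sum of the eigenvalues of the shifted matrix, note that the trace of a symmetric matrix equals the sum of its eigenvalues:

\[\begin{align} \sum_{i=1}^n \lambda_i(Q - \text{diag}\{u\}) = \text{tr}(Q - \text{diag}\{u\}) = \text{tr}(Q) - \sum_{i=1}^n u_i \end{align}\]

Since $\text{tr}(Q)$ is fixed, pushing $\sum_i u_i$ up pushes the eigenvalue sum down, while the PSD constraint keeps the relaxation convex.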

While SDPs are not generally 'cheap' operations, the improvement in the lower bound can be quite the boost when solving for global solutions. For this problem, the SDP took 0.2 seconds using the SDPA solver on my desktop. Now we can compare how the eigenvalue-shift method stacks up against the SDP-driven method. The root-node lower bound from the eigenvalue shift is -16.052, while the SDP-based method gives -15.637. Compared with the global solution value of -15.417, the root-node gap shrinks from 0.635 to 0.220, making the SDP-based bound roughly 3x tighter.

The number of branch-and-bound nodes we need to check to find the global solution is even more dramatic. We solve both problems using the PSO (Particle Swarm Optimization) algorithm that comes with Hercules. The eigenvalue-shift problem required traversing 40,000 nodes and solving 20,000 subproblems; the SDP-based approach traversed only 129 nodes and solved 64 subproblems. We have come quite a long way since the original version of this solver! We only needed to solve $\approx 5\cdot 10^{-27}\%$ of the $2^{100}$ problems that direct enumeration would require. Comparing times, we see a large speedup: 0.019 s (BnB) + 0.23 s (SDP) vs. 0.868 s, or about a 4x uplift.

Benchmarking

Now that we know how to speed up the solve times, let's see if this generalizes or if we just got lucky. Here, I will make a test set of 50 problems with 75 variables and a density of 10%. We will compare how long it takes the Hercules branch and bound solver to solve the SDP-based shifted problem versus the eigenvalue-shift formulation. The reported time for the SDP-based approach includes the time to solve the SDP itself, so there is no hidden trickery.
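For reference, a generator in the spirit of this test set might look like the following (random_qubo and its exact distribution are my assumptions, not the post's):

import numpy as np
import scipy.sparse as sp

def random_qubo(n=75, density=0.10, seed=0):
    # Sparse symmetric Q with roughly the given density, plus a dense c.
    rng = np.random.default_rng(seed)
    A = sp.random(n, n, density=density, random_state=rng,
                  data_rvs=rng.standard_normal).toarray()
    Q = 0.5 * (A + A.T)
    c = rng.standard_normal(n)
    return Q, c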

It is not hard to see a massive boost in the solvability of these problems when we use the optimal $u$ shift to solve a more favorable problem. With an average speedup of 42x, this lifts the ceiling on the problems Hercules can solve (with proof of global optimality). If we chain all of the gains starting from the first blog post, we are roughly 700,000 times faster than the first version of the solver. The SDP reformulation is not quite in Hercules yet, as I am still trying to figure out the best way to integrate an SDP solver; currently I use Clarabel to solve the subproblems. We can solve this SDP, but it has a very special structure, so I am wondering if we can implement a specialized algorithm to solve it rapidly without calling on another dependency. For now, I am using the cvxpy package in Python to formulate the SDP and the SDPA solver to actually solve it.

I am very happy to announce that we now have an open-source sponsor of this work. Thank you, Integrated Reasoning!

SDP Code

import cvxpy as cp

def sdp_u_shift(Q):

    # Problem dimension.
    n = Q.shape[0]

    # Diagonal matrix variable L = diag{u}.
    L = cp.Variable((n, n), diag=True)

    # Keep the shifted quadratic term PSD: Q - diag{u} >= 0.
    constraints = [Q - L >> 0]

    # Maximize sum_i u_i, i.e., minimize the eigenvalue sum of Q - diag{u}.
    prob = cp.Problem(cp.Maximize(cp.trace(L)), constraints)
    prob.solve(solver=cp.SDPA)

    # L.value is sparse because of diag=True; return a dense array.
    return L.value.toarray()
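
For completeness, here is a hypothetical usage sketch (the names Q_shift and c_shift are mine, not from Hercules) showing how the SDP output plugs back into the shifted formulation:

import numpy as np

rng = np.random.default_rng(1)
n = 20
A = rng.standard_normal((n, n))
Q = 0.5 * (A + A.T)        # symmetric test matrix (toy data)
c = rng.standard_normal(n)

L = sdp_u_shift(Q)         # dense diag{u} from the SDP above
u = np.diag(L)             # the shift vector itself
Q_shift = Q - L            # PSD up to solver tolerance
c_shift = c + 0.5 * u      # shifted linear term
# (Q_shift, c_shift) define the convexified QP relaxation data.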

[1] Alain Billionnet and Sourour Elloumi, "Using a Mixed Integer Quadratic Programming Solver for the Unconstrained Quadratic 0-1 Problem," Mathematical Programming, 2006.