Numpy Block is slow

I have been writing a solver that makes heavy use of Numpy. Due to error checking and other overhead in NumPy, we can avoid these costs. In a situation where one needs to compose hundreds of millions of matrices can cause performance issues.

I am comparing 3 different classes of matrix blocking.

  • Blocking many single-row matrices into one matrix
  • Large matrix composition

As you can see that the specialized version of NumPy matrix blocking is approximately 2 to 3 times faster!

  Custom Numpy
Row Matrices .7 mSec 1.6 mSec
Block Matrices 67.8 uSec 198 uSec

This can be run using the following script. Try it out on your computer!

import numpy
row = numpy.array([[i] for i in range(10)])

row_blocking = [row for i in range(1000)]

%timeit dustin_block(row_blocking)
%timeit numpy.block(row_blocking)


brick = numpy.eye(5)

row = [brick for i in range(10)]
row_blocking = [row for i in range(10)]

%timeit dustin_block(row_blocking)
%timeit numpy.block(row_blocking)

def dustin_block(mat_list:List[List[numpy.ndarray]]):
    if type(mat_list[0]) is not list:
        mat_list = [mat_list]

    x_size = 0
    y_size = 0

    for i in mat_list[0]:
        x_size += i.shape[1]

    for j in range(len(mat_list)):
        y_size += mat_list[j][0].shape[0]

    output_data = numpy.zeros((y_size, x_size))

    x_cursor = 0
    y_cursor = 0

    for mat_row in mat_list:
        y_offset = 0

        for matrix_ in mat_row:
            shape_ = matrix_.shape
            output_data[y_cursor: y_cursor + shape_[0], x_cursor: x_cursor + shape_[1]] = matrix_
            x_cursor += shape_[1]
            y_offset = shape_[0]

        y_cursor += y_offset
        x_cursor = 0

    return output_data