I'm trying to parallelize a simple Python program using numba. It consists of two functions. The first one is a simple power method:
import numba
import numpy as np

@numba.jit(nopython=True)
def power_method(A, v):
    u = v.copy()
    for i in range(3 * 10**3):
        u = A @ u
        u /= np.linalg.norm(u)
    return u
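For reference, here is a minimal plain-NumPy sketch of the same iteration, which can be used to sanity-check the jitted version. The name `power_method_reference`, the `iters` parameter, and the diagonal test matrix are assumptions made for illustration. Convergence to a single direction assumes a real eigenvalue of strictly largest magnitude, which a matrix drawn with `np.random.randn` does not always have.

```python
import numpy as np

def power_method_reference(A, v, iters=3000):
    """Plain-NumPy power iteration: repeatedly apply A and renormalize."""
    u = v.astype(np.float64)  # astype returns a new array, so v is not modified
    for _ in range(iters):
        u = A @ u
        u /= np.linalg.norm(u)
    return u

# Hypothetical test matrix with known eigenvalues 3, 1 and 0.5
A = np.diag([3.0, 1.0, 0.5])
u = power_method_reference(A, np.array([1.0, 1.0, 1.0]))

# u should point along the eigenvector of the largest eigenvalue (the first axis)
residual = np.linalg.norm(A @ u - (u @ A @ u) * u)
```

A small `residual` indicates that `u` is (numerically) an eigenvector of `A`.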
The second function iterates over a grid of start vectors v and runs power_method for each of them:
@numba.jit(nopython=True, parallel=True)
def iterate_grid(A, scale, sz):
    assert A.shape[0] == A.shape[1] == 3
    n = A.shape[0]
    results = np.empty((sz**3, n))
    tmp = np.linspace(-scale, scale, sz)
    for i1 in range(sz):
        v = np.empty(n, dtype=np.float64)
        v1 = tmp[i1]
        for i2, v2 in enumerate(tmp):
            for i3, v3 in enumerate(tmp):
                v[0] = v1
                v[1] = v2
                v[2] = v3
                u = power_method(A, v)
                idx = i1 * sz**2 + i2 * sz + i3
                results[idx] = u.copy()
    return results
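To make the row layout of results explicit, the sketch below builds the same grid of start vectors with plain NumPy and checks one row against the flat index `i1 * sz**2 + i2 * sz + i3`. The helper name `grid_vectors` and the small `sz = 4` are assumptions chosen only for illustration.

```python
import numpy as np

def grid_vectors(scale, sz):
    """Return the (sz**3, 3) array of start vectors, ordered like idx in iterate_grid."""
    tmp = np.linspace(-scale, scale, sz)
    # indexing="ij" makes the first axis vary slowest, matching i1*sz**2 + i2*sz + i3
    g1, g2, g3 = np.meshgrid(tmp, tmp, tmp, indexing="ij")
    return np.stack([g1.ravel(), g2.ravel(), g3.ravel()], axis=1)

scale, sz = 5.0, 4
V = grid_vectors(scale, sz)
tmp = np.linspace(-scale, scale, sz)

# Pick an arbitrary grid point and compute its flat index as in the loop body
i1, i2, i3 = 2, 1, 3
idx = i1 * sz**2 + i2 * sz + i3
expected = np.array([tmp[i1], tmp[i2], tmp[i3]])
```

Each value of `i1` owns a contiguous block of `sz**2` rows, so different iterations of the outer loop never write to the same row of results.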
Then I run it with
n = 3
A = np.random.randn(n, n)
iterate_grid(A, 5.0, 20)
All iterations are independent. Moreover, the data involved is small enough to fit entirely in cache, so I would expect that parallelizing the outer loop with prange gives an approximately linear speedup.
However, the wall time for the sequential code is 6.07 s, while for the parallel code
@numba.jit(nopython=True, parallel=True)
def iterate_grid(A, scale, sz):
    assert A.shape[0] == A.shape[1] == 3
    n = A.shape[0]
    results = np.empty((sz**3, n))
    tmp = np.linspace(-scale, scale, sz)
    for i1 in numba.prange(sz):
        v = np.empty(n, dtype=np.float64)
        v1 = tmp[i1]
        for i2, v2 in enumerate(tmp):
            for i3, v3 in enumerate(tmp):
                v[0] = v1
                v[1] = v2
                v[2] = v3
                u = power_method(A, v)
                idx = i1 * sz**2 + i2 * sz + i3
                results[idx] = u.copy()
    return results
the wall time is 7.79 s. That is, parallelization slows down the code in this case.
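The timings above do not say whether they include JIT compilation, which numba performs on the first call of a function. The following is a minimal sketch of a timing helper that runs untimed warm-up calls first. The name `time_call` and its `warmup` and `repeats` parameters are hypothetical, and the built-in `sum` stands in for the real workload so that the sketch runs without numba.

```python
import time

def time_call(fn, *args, warmup=1, repeats=3):
    """Return the best wall time in seconds over `repeats` calls, after `warmup` untimed calls."""
    for _ in range(warmup):
        fn(*args)  # for a numba-jitted fn, the first call triggers compilation
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

# Stand-in workload; replace with the function under test
elapsed = time_call(sum, range(100_000))
```

With the functions from this question, the call would look like `time_call(iterate_grid, A, 5.0, 20)`, once for the range version and once for the prange version.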
Moreover, as I can see from iterate_grid.parallel_diagnostics(level=4), numba fuses the statement tmp = np.linspace(-scale, scale, sz) with the loop for i1 in range(sz), which is incorrect, because I need all values of tmp in the inner loops.
Parallelizing the power method is not the task I'm actually trying to solve; it's just a small example on which numba does not behave as I expect. Can you explain this behavior of numba and advise me how to parallelize the iterations over the grid to get a linear speedup?
Thank you in advance for your help!