假设我有一个num_indices * n
指数矩阵(在range(m)
中)和一个num_indices * n
值矩阵,即.,
m, n = 100, 50
num_indices = 100000
indices = np.random.randint(0, m, (num_indices, n))
values = np.random.uniform(0, 1, (num_indices, n))
我想得到一个形状为m * n
的结果矩阵,这样每个列都被索引,并在索引和值矩阵中的相应列上求和.以下是一些实现:
- The basic for loop
tic = time.time()
res1 = np.zeros((m, n))
for i in range(num_indices):
for j in range(n):
res1[indices[i, j], j] += values[i, j]
print(f'element-wise for loop used {time.time() - tic:.5f}s')
- np.add.at, in multidimensional
tic = time.time()
res2 = np.zeros((m, n))
np.add.at(res2, (indices, np.arange(n)[None, :]), values)
print(f'np.add.at used {time.time() - tic:.5f}s')
- np.add.at, with for loop
tic = time.time()
reslst = []
for colid in range(n):
rescol = np.zeros((m, 1))
np.add.at(rescol, (indices[:, colid], np.zeros(num_indices, dtype=int)), values[:, colid])
reslst.append(rescol)
res3 = np.hstack(reslst)
print(f'np.add.at with loop used {time.time() - tic:.5f}s')
- scipy.sparse, in multidimensional
from scipy import sparse
tic = time.time()
res4 = sparse.csr_matrix((values.ravel(), (indices.ravel(), np.tile(np.arange(n), num_indices))), shape=(m, n)).toarray()
print(f'sparse.csr used {time.time() - tic:.5f}s')
- scipy.sparse, with for loop
tic = time.time()
reslst = []
for colid in range(n):
rescol = sparse.csr_matrix((values[:, colid], (indices[:, colid], np.zeros(num_indices, dtype=int))), shape=(m, 1)).toarray()
reslst.append(rescol)
res5 = np.hstack(reslst)
print(f'sparse.csr with loop used {time.time() - tic:.5f}s')
结果都是一样的:
assert np.isclose(res2, res1).all()
assert np.isclose(res3, res1).all()
assert np.isclose(res4, res1).all()
assert np.isclose(res5, res1).all()
然而,速度很奇怪,也不令人满意:
for loop used 1.69889s
np.add.at used 0.17287s
np.add.at with loop used 0.47767s
sparse.csr used 0.16847s
sparse.csr with loop used 0.09845s
我的基本问题是:
- 为什么
np.add.at
这么慢——甚至比scipy.sparse
慢?我觉得numpy会受益匪浅,尤其是在多维的情况下. - 对于
scipy.sparse
,为什么多维比循环更慢?不使用并发吗?(为什么对于numpy来说,多维更快?) - 总的来说,有没有更快的方法来实现我想要的?