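I am comparing the performance of Numba JIT and a Python C extension. For a loop-based function that sums all the elements of a 2D array, the C extension appears to be about 3-4 times faster than the Numba version.
Update:
Based on the helpful comments, I realized my mistake: the Numba JIT function should be compiled (called) once before timing. I have added the test results after this fix, along with some extra cases. But the question remains of when and how to decide which approach to use.
Here are the results (time_s, value):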
# 200 tests mean (including JIT compile inside the loop)
Pure Python: (0.09232537984848023, 29693825)
Numba: (0.003188209533691406, 29693825)
C Extension: (0.000905141830444336, 29693825.0)
# JIT once called before the test loop (to avoid compile time)
Normal: (0.0948486328125, 29685065)
Numba: (0.00031280517578125, 29685065)
C Extension: (0.0025129318237304688, 29685065.0)
# JIT no warm-up also no test loop (only calling once)
Normal: (0.10458517074584961, 29715115)
Numba: (0.314251184463501, 29715115)
C Extension: (0.0025091171264648438, 29715115.0)
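The difference between the first two result groups comes down to whether the first (compiling) call is inside the timed loop. A minimal sketch of a timing helper that keeps warm-up calls out of the measurement is below. The names `bench` and `sum_rows` are hypothetical and not part of the original benchmark, and a pure-Python workload stands in for the Numba function so the snippet runs without extra dependencies.

```python
import time
from statistics import mean


def bench(fn, *args, warmup=1, repeats=100):
    """Return (mean seconds per call, last return value).

    The warm-up calls are not timed. For a function decorated with
    @numba.njit, the first call is the one that triggers compilation,
    so it lands in the warm-up phase rather than in the measurement.
    """
    for _ in range(warmup):
        fn(*args)
    timings = []
    value = None
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        value = fn(*args)
        timings.append(time.perf_counter() - start)
    return mean(timings), value


def sum_rows(rows):
    # Stand-in workload: nested-loop sum over a list of lists.
    total = 0
    for row in rows:
        for x in row:
            total += x
    return total


data = [[1, 2, 3], [4, 5, 6]]
mean_s, value = bench(sum_rows, data, warmup=1, repeats=10)
print(mean_s, value)
```

Passing `warmup=0` reproduces the "compile inside the loop" situation, where the first timed call carries the one-time cost.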
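- Is my implementation correct?
- Is there a reason why the C extension is faster?
- If I want the best performance, should I always use a C extension? (for non-vectorized functions)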
main.py
import numpy as np
import pandas as pd
import numba
import time
import loop_test  # ext


def test(fn, *args):
    # Call fn repeatedly and return (mean time in seconds, last return value).
    res = []
    val = None
    for _ in range(100):
        start = time.time()
        val = fn(*args)
        res.append(time.time() - start)
    return np.mean(res), val


sh = (30_000, 20)
col_names = [f"col_{i}" for i in range(sh[1])]
df = pd.DataFrame(np.random.randint(0, 100, size=sh), columns=col_names)
arr = df.to_numpy()


def sum_columns(arr):
    _sum = 0
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            _sum += arr[i, j]
    return _sum


@numba.njit
def sum_columns_numba(arr):
    _sum = 0
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            _sum += arr[i, j]
    return _sum


print("Pure Python:", test(sum_columns, arr))
print("Numba:", test(sum_columns_numba, arr))
print("C Extension:", test(loop_test.loop_fn, arr))
ext.c
#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include <numpy/arrayobject.h>

static PyObject *loop_fn(PyObject *module, PyObject *args)
{
    PyObject *arr;
    if (!PyArg_ParseTuple(args, "O!", &PyArray_Type, &arr))
        return NULL;

    /* Request a C-contiguous, aligned array of doubles.
       This allocates and converts when the input dtype is not double. */
    PyArrayObject *arr_new = (PyArrayObject *)PyArray_FROM_OTF(arr, NPY_DOUBLE, NPY_ARRAY_IN_ARRAY);
    if (arr_new == NULL)
        return NULL;
    if (PyArray_NDIM(arr_new) != 2)
    {
        Py_DECREF(arr_new);
        PyErr_SetString(PyExc_ValueError, "expected a 2D array");
        return NULL;
    }

    npy_intp *dims = PyArray_DIMS(arr_new);
    npy_intp rows = dims[0];
    npy_intp cols = dims[1];
    double sum = 0;
    double *data = (double *)PyArray_DATA(arr_new);

    npy_intp i, j;
    for (i = 0; i < rows; i++)
        for (j = 0; j < cols; j++)
            sum += data[i * cols + j];

    Py_DECREF(arr_new);
    return Py_BuildValue("d", sum);
}

static PyMethodDef Methods[] = {
    {
        .ml_name = "loop_fn",
        .ml_meth = loop_fn,
        .ml_flags = METH_VARARGS,
        .ml_doc = "Returns the sum using for loop, but in C.",
    },
    {NULL, NULL, 0, NULL},
};

static struct PyModuleDef Module = {
    PyModuleDef_HEAD_INIT,
    "loop_test",
    "A benchmark module test",
    -1,
    Methods};

PyMODINIT_FUNC PyInit_loop_test(void)
{
    import_array();
    return PyModule_Create(&Module);
}
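One detail in ext.c that may matter when reading the timings: `np.random.randint` produces an integer array, while `PyArray_FROM_OTF(arr, NPY_DOUBLE, NPY_ARRAY_IN_ARRAY)` asks for doubles, so each call has to allocate and convert before the loop runs. The sketch below is a rough Python-level illustration of that, assuming `np.ascontiguousarray` as a stand-in for the C call (it is not the same function). The variable names are hypothetical.

```python
import numpy as np

# Integer input, built the same way as in main.py.
int_arr = np.random.randint(0, 100, size=(30_000, 20))

# Asking for float64 from an integer array cannot reuse the buffer:
# a new array is allocated and every element is converted.
as_double = np.ascontiguousarray(int_arr, dtype=np.float64)
print(int_arr.dtype, np.shares_memory(int_arr, as_double))

# If the input is already C-contiguous float64, no copy is needed.
float_arr = int_arr.astype(np.float64)
same = np.ascontiguousarray(float_arr, dtype=np.float64)
print(float_arr.dtype, np.shares_memory(float_arr, same))
```

Timing the C extension on `float_arr` as well as on the integer array would separate the cost of the conversion from the cost of the loop itself.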
setup.py
# distutils was removed in Python 3.12; setuptools provides the same names.
from setuptools import setup, Extension
import numpy as np

module = Extension(
    "loop_test",
    sources=["ext.c"],
    include_dirs=[
        np.get_include(),  # location of the NumPy C headers
    ],
)

setup(
    name="loop_test",
    version="1.0",
    description="This is a test package",
    ext_modules=[module],
)
python3 setup.py install