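Regardless of which value you choose for `shuffle`, you will still get a random selection. However, if you choose `shuffle=False`, the order of the output is not independent of the order of the input.
This is easiest to see when the number of items selected equals the total number of items: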
import numpy as np
rng = np.random.default_rng()
x = np.arange(10)
rng.choice(x, 10, replace=False, shuffle=False)
# array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
rng.choice(x, 10, replace=False, shuffle=True)
# array([8, 1, 3, 9, 6, 5, 0, 7, 4, 2])
If you reduce the number of items selected and use `shuffle=False`, you can confirm that the missing item is distributed as expected.
import numpy as np
import matplotlib.pyplot as plt
rng = np.random.default_rng()
x = np.arange(10)
set_x = set(x)
missing = []
for i in range(10000):
    # By default, all `p` are equal, so which item is
    # missing should be uniformly distributed
    y = rng.choice(x, 9, replace=False, shuffle=False)
    set_y = set(y)
    missing.append(set_x.difference(set_y).pop())
plt.hist(missing)
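If you would rather check the counts numerically than read them off a histogram, a minimal sketch along the following lines may help. The helper name `missing_counts`, the fixed seed, and the trick of recovering the missing item from the sum are illustrative assumptions, not part of the original snippet.

```python
import numpy as np

def missing_counts(n_trials=10000, seed=0):
    """Count how often each item of arange(10) is left out of a 9-item draw.

    Hypothetical helper for illustration; the seed is fixed only so that
    repeated runs give the same counts.
    """
    rng = np.random.default_rng(seed)
    x = np.arange(10)
    total = x.sum()
    counts = np.zeros(10, dtype=int)
    for _ in range(n_trials):
        y = rng.choice(x, 9, replace=False, shuffle=False)
        # Exactly one item is missing, so it equals the difference of sums.
        counts[total - y.sum()] += 1
    return counts

counts = missing_counts()
print(counts)  # each entry should be close to n_trials / 10
```

With equal probabilities, each item should be missing in roughly one tenth of the trials.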
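But you will see that items that appear earlier in `x` tend to appear earlier in the output, and vice versa. That is, the input and output orders are correlated.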
from scipy import stats
x = np.arange(10)
correlations = []
for i in range(10000):
    y = rng.choice(x, 9, replace=False, shuffle=False)
    correlations.append(stats.spearmanr(np.arange(9), y).statistic)
plt.hist(correlations)
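To put a number on the contrast, the sketch below compares the average rank correlation for `shuffle=False` and `shuffle=True`. It is only a rough illustration: `rank_correlation` and `mean_correlation` are hypothetical helpers, and the rank correlation is computed with NumPy alone (Pearson correlation of the ranks, which matches Spearman's coefficient when there are no ties) rather than with `scipy.stats.spearmanr`.

```python
import numpy as np

def rank_correlation(a, b):
    """Spearman-style correlation: Pearson correlation of the ranks.

    Assumes no ties, which holds here because items are drawn
    without replacement from distinct values.
    """
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return np.corrcoef(rank_a, rank_b)[0, 1]

def mean_correlation(shuffle, n_trials=2000, seed=0):
    """Average correlation between output position and output value."""
    rng = np.random.default_rng(seed)
    x = np.arange(10)
    positions = np.arange(9)
    total = 0.0
    for _ in range(n_trials):
        y = rng.choice(x, 9, replace=False, shuffle=shuffle)
        total += rank_correlation(positions, y)
    return total / n_trials

mean_unshuffled = mean_correlation(shuffle=False)
mean_shuffled = mean_correlation(shuffle=True)
print(f"shuffle=False: {mean_unshuffled:.3f}")
print(f"shuffle=True:  {mean_shuffled:.3f}")
```

The `shuffle=True` average should sit near zero, while the `shuffle=False` average should be clearly positive.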
If this is acceptable for your application, feel free to set `shuffle=False` for a speedup.
%timeit rng.choice(10000, 5000, replace=False, shuffle=True)
# 187 µs ± 26.9 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit rng.choice(10000, 5000, replace=False, shuffle=False)
# 146 µs ± 18.4 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
The more items you select, the more pronounced the speedup.
%timeit rng.choice(10000, 1, replace=False, shuffle=True)
# 17.6 µs ± 3.64 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit rng.choice(10000, 1, replace=False, shuffle=False)
# 16.5 µs ± 2.47 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
vs
%timeit rng.choice(10000, 9999, replace=False, shuffle=True)
# 214 µs ± 32.7 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit rng.choice(10000, 9999, replace=False, shuffle=False)
# 124 µs ± 27.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
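The `%timeit` lines above only work in IPython or Jupyter. For a plain Python script, a rough equivalent using the standard-library `timeit` module might look like the sketch below. The helper `time_choice` and the `number`/`repeat` values are arbitrary choices for illustration, and the timings will vary with your machine and NumPy version.

```python
import timeit

import numpy as np

rng = np.random.default_rng()

def time_choice(size, shuffle, number=200, repeat=3):
    """Best-of-`repeat` time per call, in microseconds (hypothetical helper)."""
    timer = timeit.Timer(
        lambda: rng.choice(10000, size, replace=False, shuffle=shuffle)
    )
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e6

timings = {}
for size in (1, 5000, 9999):
    for shuffle in (True, False):
        timings[(size, shuffle)] = time_choice(size, shuffle)
        print(f"size={size:5d} shuffle={shuffle!s:5}: "
              f"{timings[(size, shuffle)]:.1f} µs per call")
```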