I have the following test DataFrame:
| tag | list | Count |
| -------- | ----------------------------------------------------|-------|
| icecream | [['A',0.9],['B',0.6],['C',0.5],['D',0.3],['E',0.1]] | 5 |
| potato | [['U',0.8],['V',0.7],['W',0.4],['X',0.3]] | 4 |
| cheese | [['I',0.2],['J',0.4]] | 2 |
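I want to randomly sample the list column, picking any 3 of the first 4 sublists in each list. (So ['E', 0.1] is never even considered for tag = icecream.)
The rule should pick 3 sublists at random from each list. If fewer than 3 are available, it should select whatever exists and return them in random order.
The result should be random on each run, so a seed needs to be set to reproduce the same output. For example: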
| tag | list |
| -------- | -------------------------------|
| icecream | [['B',0.6],['C',0.5],['A',0.9]]|
| potato | [['W',0.4],['X',0.3],['U',0.8]]|
| cheese | [['J',0.4],['I',0.2]] |
This is what I tried:
import pandas as pd

# Each row: a tag and a list of [label, score] sublists
data = [['icecream', [['A', 0.9], ['B', 0.6], ['C', 0.5], ['D', 0.3], ['E', 0.1]]],
        ['potato', [['U', 0.8], ['V', 0.7], ['W', 0.4], ['X', 0.3]]],
        ['cheese', [['I', 0.2], ['J', 0.4]]]]
df = pd.DataFrame(data, columns=['tag', 'list'])
# Number of sublists per row
df['Count'] = df['list'].str.len()
df
--
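import random

item_top_3 = []
find = 4  # consider only the first `find` sublists
num = 3   # number of sublists to pick
for i in range(df.shape[0]):
    item_id = df["tag"].iloc[i]
    whole_list = df["list"].iloc[i]
    # This is the line that raises the error below
    item_top_3.append([item_id, random.sample(whole_list[0:find], num)])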
--
I get this error:
ValueError: Sample larger than population or is negative.
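As far as I can tell, the error comes from the cheese row, which has only 2 sublists while `num` is 3. A minimal reproduction outside the DataFrame, assuming that diagnosis is right (`short_list` and `raised` are names made up for this sketch):

```python
import random

# The 'cheese' row: only 2 sublists are available
short_list = [['I', 0.2], ['J', 0.4]]

try:
    # Slicing past the end is fine, but asking for 3 items from a pool of 2 is not
    random.sample(short_list[0:4], 3)
    raised = False
except ValueError:
    raised = True

print(raised)
```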
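Can anyone help me randomize this? The original DataFrame has more than 50,000 rows, and I want the randomization to work for any rule: for example, tomorrow someone might want to pick 5 items at random from the first 20 elements of each list, and it should still work.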
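One possible direction, offered as a sketch rather than a definitive answer: cap the sample size at the number of sublists actually available, and draw from a seeded `random.Random` instance so reruns give the same picks. The helper name `sample_top`, its parameters, and the seed value 42 are assumptions made up for this illustration.

```python
import random


def sample_top(items, find, num, rng):
    """Pick up to `num` items at random from the first `find` items."""
    pool = items[:find]
    # Cap the sample size at the pool size so short lists do not raise
    # ValueError; sampling the whole pool returns it in random order.
    return rng.sample(pool, min(num, len(pool)))


data = [
    ['icecream', [['A', 0.9], ['B', 0.6], ['C', 0.5], ['D', 0.3], ['E', 0.1]]],
    ['potato', [['U', 0.8], ['V', 0.7], ['W', 0.4], ['X', 0.3]]],
    ['cheese', [['I', 0.2], ['J', 0.4]]],
]

rng = random.Random(42)  # hypothetical seed; any fixed value gives repeatable picks
item_top_3 = [[tag, sample_top(items, find=4, num=3, rng=rng)] for tag, items in data]

for tag, picked in item_top_3:
    print(tag, picked)
```

On the DataFrame itself, the same helper could presumably be applied with `df['sampled'] = [sample_top(x, 4, 3, rng) for x in df['list']]`, and a different rule such as 5 out of the first 20 would only change the `find` and `num` arguments.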