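In R's data.table there is a way to repeat a vector so that it fills the length of each group. For example, I created a variable named 'period' that repeats the vector q.
R code: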
require(data.table)
q = c(1:3)
test = data.table(IDx = c(rep('42', 5) , rep('76', 3), rep('43', 3), rep('5', 2)),
IDy = c(rep('A', 5) , rep('A', 3), rep('B', 3) , rep('C',2)))
test[, period := rep(q, length.out = .N), by =c('IDx','IDy')]
IDx IDy period
1: 42 A 1
2: 42 A 2
3: 42 A 3
4: 42 A 1
5: 42 A 2
6: 76 A 1
7: 76 A 2
8: 76 A 3
9: 43 B 1
10: 43 B 2
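I am trying to replicate this in Python, but I am a bit stuck. cumcount would only work if it took into account that the sequence q should start over once its last index is reached.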
import pandas as pd

q = [1, 2, 3]
valuesX = ['42'] * 5 + ['76'] * 3 + ['43'] * 3 + ['5'] * 2
valuesY = ['A'] * 5 + ['A'] * 3 + ['B'] * 3 + ['C'] * 2
test = pd.DataFrame({'IDx': valuesX,
                     'IDy': valuesY})
# running count within each (IDx, IDy) group; it keeps counting past len(q)
print(test.groupby(['IDx', 'IDy']).cumcount() + 1)
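For what it's worth, wrapping that running count with a modulo seems to get close to the R result. This is only a minimal sketch, assuming the rows should cycle through q in their existing order within each (IDx, IDy) group; the names `pos` and `period` are my own choices for illustration.

```python
import numpy as np
import pandas as pd

q = [1, 2, 3]
valuesX = ['42'] * 5 + ['76'] * 3 + ['43'] * 3 + ['5'] * 2
valuesY = ['A'] * 5 + ['A'] * 3 + ['B'] * 3 + ['C'] * 2
test = pd.DataFrame({'IDx': valuesX, 'IDy': valuesY})

# 0-based position of each row inside its (IDx, IDy) group
pos = test.groupby(['IDx', 'IDy']).cumcount()

# wrap the position around len(q), then use it to look up the value in q
test['period'] = np.asarray(q)[(pos % len(q)).to_numpy()]
print(test)
```

Because the lookup goes through q itself, this should also work when q is not simply 1..n (for example q = [10, 20, 30]).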
My attempt so far, which gets close:
def repeat(seq, ind):
    length = len(ind.index)
    print(length)
    multiple, remainder = divmod(length, len(seq))
    test['t'] = seq * multiple + seq[:remainder]

print(test.groupby(['IDx']).apply(lambda x: repeat(q, x)))
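The attempt above assigns to the whole `test` frame from inside the function, so the list built for one group does not match the frame's length. A possible variant keeps the same divmod idea but returns the values for one group only and lets `transform` place them. This is a sketch under assumptions: the helper name `repeat_to_length` is hypothetical, and I group by both IDx and IDy to mirror the R call.

```python
import pandas as pd

q = [1, 2, 3]
valuesX = ['42'] * 5 + ['76'] * 3 + ['43'] * 3 + ['5'] * 2
valuesY = ['A'] * 5 + ['A'] * 3 + ['B'] * 3 + ['C'] * 2
test = pd.DataFrame({'IDx': valuesX, 'IDy': valuesY})


def repeat_to_length(seq, group):
    """Cycle seq until it has one value per row of group (hypothetical helper)."""
    multiple, remainder = divmod(len(group), len(seq))
    values = seq * multiple + seq[:remainder]
    # return a Series aligned to the group's rows instead of writing to `test`
    return pd.Series(values, index=group.index)


# transform calls the helper once per group and stitches the pieces back together
test['period'] = (
    test.groupby(['IDx', 'IDy'])['IDx']
        .transform(lambda s: repeat_to_length(q, s))
)
print(test)
```

Returning a Series indexed like the group means the result lines up with the original rows even if the groups are not contiguous in the frame.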