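I am trying to map a column of continuous float values to discrete (bucketed) values, based on the range each continuous value falls into.
For example: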
import pandas as pd

df_lookup = pd.DataFrame(data=[[0.0, 0.3, 10.1],
                               [0.3, 0.65, 30.3],
                               [0.65, 1.0, 50.5]],
                         columns=['start', 'end', 'mapped_value'])
# create intervals: left-closed, except the last one, which also includes 1.0
df_lookup['interval'] = df_lookup.apply(lambda x:
                                        pd.Interval(x['start'],
                                                    x['end'],
                                                    closed='both' if x['end']==1.0 else 'left'), axis=1)
df_lookup
Output:
| | start | end | mapped_value | interval |
|---|---|---|---|---|
| 0 | 0.00 | 0.30 | 10.1 | [0.0, 0.3) |
| 1 | 0.30 | 0.65 | 30.3 | [0.3, 0.65) |
| 2 | 0.65 | 1.00 | 50.5 | [0.65, 1.0] |
df_data = pd.DataFrame(data=[['A', 0.3],
                             ['B', 0.65],
                             ['C', 0.6],
                             ['D', 0.75],
                             ['E', 0.4]],
                       columns=['ID', 'original_value'])
df_data
| | ID | original_value |
|---|---|---|
| 0 | A | 0.30 |
| 1 | B | 0.65 |
| 2 | C | 0.60 |
| 3 | D | 0.75 |
| 4 | E | 0.40 |
At this point, I use pandas.DataFrame.apply to get the looked-up value:
df_data['mapped_value'] = df_data.apply(
    lambda x: df_lookup.loc[x['original_value'] in df_lookup['interval']]['mapped_value'],
    axis=1)
But this gives me KeyError: 'False: boolean label can not be used without a boolean index'
Further investigation shows that the problem is in the in check:
it returns a single boolean value, not a list of booleans. For example, for data ID='A',
where the original value is 0.3, I was hoping that
x['original_value'] in df_lookup['interval']
would return [False, True, False],
but in fact it returns False.
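The behaviour described above can be reproduced in isolation. The sketch below assumes the cause is that `in` applied to a pandas Series tests membership among the index labels (0, 1, 2 here) rather than among the values. The names `intervals`, `label_check` and `per_interval` are illustrative and not part of the original code.

```python
import pandas as pd

# Same three buckets as df_lookup['interval'], kept as plain Interval objects
intervals = pd.Series([
    pd.Interval(0.0, 0.3, closed='left'),
    pd.Interval(0.3, 0.65, closed='left'),
    pd.Interval(0.65, 1.0, closed='both'),
], dtype=object)

value = 0.3

# `in` on a Series checks the index labels, so this is one boolean
label_check = value in intervals

# Checking each Interval individually gives one boolean per row
per_interval = [value in iv for iv in intervals]

print(label_check)
print(per_interval)
```

A single boolean passed to .loc is then treated as a label rather than as a mask, which is consistent with the KeyError shown above.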
I would love some insight into how to implement this kind of "lookup" mapping. Thanks.
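One possible way to do such a lookup without a row-wise apply is sketched below. It is not necessarily the best approach, and it assumes the buckets are sorted by start, contiguous, and left-closed except for the last one, which also includes its end point. It uses numpy.searchsorted on the start column. The names `values`, `starts`, `pos`, `in_range` and `mapped` are illustrative.

```python
import numpy as np
import pandas as pd

df_lookup = pd.DataFrame(
    data=[[0.0, 0.3, 10.1], [0.3, 0.65, 30.3], [0.65, 1.0, 50.5]],
    columns=['start', 'end', 'mapped_value'])
df_data = pd.DataFrame(
    data=[['A', 0.3], ['B', 0.65], ['C', 0.6], ['D', 0.75], ['E', 0.4]],
    columns=['ID', 'original_value'])

values = df_data['original_value'].to_numpy()
starts = df_lookup['start'].to_numpy()  # assumed sorted ascending

# Position of the last bucket whose start is <= value
pos = np.searchsorted(starts, values, side='right') - 1

# Values outside [first start, last end] belong to no bucket
in_range = (values >= starts[0]) & (values <= df_lookup['end'].iloc[-1])

# Clip so out-of-range positions can still be indexed, then mask them out
mapped = df_lookup['mapped_value'].to_numpy()[np.clip(pos, 0, len(starts) - 1)]
df_data['mapped_value'] = np.where(in_range, mapped, np.nan)

print(df_data)
```

With the sample data, 0.3 lands in the second bucket and 0.65 in the third, matching the left-closed intervals defined in df_lookup.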