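I have a time-series DataFrame that I use to build a heatmap (showing the most recently observed difference between each pair of related values) and two line charts: one showing the series themselves, the other showing the difference between series. I can produce the first two charts, but I can't figure out how to dynamically compute the difference between the two series selected with alt.selection_point(). Here is a code snippet.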
import pandas as pd
import numpy as np
import altair as alt
ex_ts = pd.DataFrame(
    np.random.random((10, 5)),
    columns=['a', 'b', 'c', 'd', 'e'],
    index=(pd.date_range(start=pd.to_datetime('today')-pd.Timedelta(9, unit='D'), end=pd.to_datetime('today')).strftime('%Y-%m-%d'))
)
ex_ts_long = ex_ts.stack().reset_index().set_axis(['date', 'category', 'diff'], axis=1).assign(
    x = lambda a: a['category'],
    y = lambda a: a['category']
)
print(ex_ts_long.head())
#          date category      diff  x  y
# 0  2024-03-03        a  0.910670  a  a
# 1  2024-03-03        b  0.608069  b  b
# 2  2024-03-03        c  0.797001  c  c
# 3  2024-03-03        d  0.139386  d  d
# 4  2024-03-03        e  0.147499  e  e
def get_last_diff(i):
    return ex_ts.sub(ex_ts.iloc[:,i], axis=0).iloc[-1,:]
ex_z = (
    pd.concat([get_last_diff(i) for i in range(ex_ts.shape[1])], axis=1)
    .set_axis(ex_ts.columns, axis=1)
    .stack()
    .reset_index()
    .set_axis(['x', 'y', 'diff'], axis=1)
    .round(2)
)
print(ex_z.head())
#    x  y  diff
# 0  a  a  0.00
# 1  a  b -0.29
# 2  a  c -0.16
# 3  a  d -0.27
# 4  a  e -0.38
select_x = alt.selection_point(fields=['x'], name='select_x')
select_y = alt.selection_point(fields=['y'], name='select_y')
base = alt.Chart(ex_z).encode(x='x', y='y', color='diff').add_params(select_x).add_params(select_y).properties(width=500, height=500)
hmap = base.mark_rect()
text = base.mark_text(fontWeight='bold').encode(text='diff', color=alt.value('red'))
hmap_chart = (hmap + text)
line_1 = alt.Chart(ex_ts_long).mark_line().encode(x='date', y='diff', color='category').transform_filter(select_x | select_y)
tmp = alt.vconcat(hmap_chart, line_1)
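The code above creates a heatmap that you can click to filter the chart below it. The problem is that I want to compute the difference between the two series shown in the first line chart and plot that as well.
My most promising attempt was to create a new chart by layering two filtered charts. I aggregated within each filtered chart so that I could reference the new fields when building the variable I'm after, but this doesn't seem to work. Here is more sample code.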
# Per-date sum of the series picked on the heatmap's y axis
rhs_line1 = alt.Chart(ex_ts_long).mark_line().transform_filter(select_y).transform_aggregate(
    agg_y='sum(diff)', groupby=['date']
).encode(x='date:T', y='agg_y:Q')
# Per-date sum of the series picked on the heatmap's x axis
rhs_line2 = alt.Chart(ex_ts_long).mark_line().transform_filter(select_x).transform_aggregate(
    agg_x='sum(diff)', groupby=['date']
).encode(x='date:T', y='agg_x:Q')
# Layer the two filtered charts and try to plot the difference of their aggregates
rhs_line = (rhs_line1 + rhs_line2).transform_calculate(spread='datum.agg_y - datum.agg_x').encode(x='date:T', y='spread:Q')
final = alt.vconcat(hmap_chart, alt.hconcat(line_1, rhs_line))
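To make the target quantity concrete, here is a minimal pandas-only sketch of the data the second line chart would need: one row per date and per (x, y) pair, holding the difference between the two series on that date. It is an illustration under assumptions, not a confirmed fix: the names `long`, `pairs`, `ex_pairs`, `value` and `spread` are made up for this example, the seeded random data only mimics the shape of `ex_ts`, and the sign convention (x minus y) is chosen to match `ex_z`.

```python
import numpy as np
import pandas as pd

# Stand-in for ex_ts from the question: 10 daily rows, 5 series (seeded for repeatability).
rng = np.random.default_rng(0)
dates = pd.date_range(end=pd.Timestamp.today().normalize(), periods=10).strftime('%Y-%m-%d')
ex_ts = pd.DataFrame(rng.random((10, 5)), columns=list('abcde'), index=dates)

# Long form: one row per (date, series). 'value' is a hypothetical column name.
long = (
    ex_ts.rename_axis(index='date', columns='category')
    .stack()
    .rename('value')
    .reset_index()
)

# Self-merge on date to get every (x, y) pair of series for each date,
# then take x minus y, the same sign convention as the heatmap data ex_z.
pairs = long.merge(long, on='date', suffixes=('_x', '_y'))
pairs = pairs.rename(columns={'category_x': 'x', 'category_y': 'y'})
pairs['spread'] = pairs['value_x'] - pairs['value_y']
ex_pairs = pairs[['date', 'x', 'y', 'spread']]

print(ex_pairs.head())
```

Because `ex_pairs` carries the same `x` and `y` fields as the heatmap data, one option worth trying (untested here) is a single `alt.selection_point(fields=['x', 'y'])` on the heatmap with a line chart on `ex_pairs` filtered by that selection, so that clicking a cell leaves exactly one spread series. The trade-off is that all pairwise differences are precomputed rather than calculated inside the chart.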