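I am working with a Formula 1 dataset. My Pandas DataFrame has a column "constructorID", where the constructor names are all lowercase with underscores (e.g. "red_bull"), and a column "constructor", where the names are properly formatted and capitalized (e.g. "Red Bull"). I am trying to link the color encoding and legend to the "constructor" column rather than "constructorID" for a cleaner display, since this is for a final project. However, while "constructorID" renders the chart as expected, "constructor" returns a Javascript error.
Unique values of the constructorID column: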
array(['williams', 'red_bull', 'toro_rosso', 'mclaren', 'alpine',
'mercedes', 'sauber', 'alphatauri', 'alfa', 'haas', 'renault',
'racing_point', 'ferrari', 'force_india', 'aston_martin'],
dtype=object)
Unique values of the constructor column:
array(['Williams', 'Red Bull', 'Toro Rosso', 'McLaren', 'Alpine F1 Team',
'Mercedes', 'Sauber', 'AlphaTauri', 'Alfa Romeo', 'Haas F1 Team',
'Renault', 'Racing Point', 'Ferrari', 'Force India',
'Aston Martin'], dtype=object)
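A quick sanity check that may help rule out the values themselves: the sketch below takes the names exactly as printed above and looks for stray whitespace or non-ASCII characters. The list is copied by hand from the output, and the variable names are only illustrative.

```python
# Constructor names copied from the unique-values output above
constructor_names = [
    'Williams', 'Red Bull', 'Toro Rosso', 'McLaren', 'Alpine F1 Team',
    'Mercedes', 'Sauber', 'AlphaTauri', 'Alfa Romeo', 'Haas F1 Team',
    'Renault', 'Racing Point', 'Ferrari', 'Force India', 'Aston Martin',
]

# Flag any name with leading/trailing whitespace or non-ASCII characters
suspicious = [
    name for name in constructor_names
    if name != name.strip() or not name.isascii()
]
print(suspicious)
```

If the list comes back empty, the cell values are plain strings and the difference between the two columns is more likely to lie elsewhere, for example in the column name.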
I tried to debug this by creating the simplest possible chart:
alt.Chart(df_year_pts).mark_circle().encode(
    color='constructorID:N',
    x='driver_yr_pts:Q',
    y='constructor_yr_pts:Q'
)
The code above works perfectly.
alt.Chart(df_year_pts).mark_circle().encode(
    color='constructor:N',
    x='driver_yr_pts:Q',
    y='constructor_yr_pts:Q'
)
The code above returns the following error:
Javascript Error: Cannot read properties of undefined (reading 'params')
This usually means there's a typo in your chart specification. See the javascript console for the full traceback.
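Any suggestions or ideas about what could be going wrong when the "constructor" column is used for the encoding? I really cannot see what differs between that column and the "constructorID" column.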
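One thing that might be worth testing, offered as an assumption rather than a confirmed diagnosis: `constructor` is also the name of a built-in property on every JavaScript object, so the column name itself (not its values) could be colliding with something in the renderer. The sketch below renames the column and keeps "Constructor" as the legend title. The new name `constructor_name`, the frame `df_sample`, and the helper `build_chart` are hypothetical; the rows are the ones shown in the edit to this question.

```python
import pandas as pd

# Sample rows copied from the edit in this question
df_sample = pd.DataFrame({
    'constructor': ['Williams', 'Red Bull', 'Toro Rosso', 'Red Bull', 'McLaren'],
    'constructorID': ['williams', 'red_bull', 'toro_rosso', 'red_bull', 'mclaren'],
    'constructor_yr_pts': [0.0, 417.0, 85.0, 319.0, 30.0],
    'driver_yr_pts': [0.0, 76.0, 16.0, 105.0, 17.0],
})

# 'constructor_name' is a hypothetical replacement; any name that is not
# a built-in JavaScript property should serve for this test
df_renamed = df_sample.rename(columns={'constructor': 'constructor_name'})


def build_chart(df):
    """Build the same minimal chart, colored by the renamed column."""
    # Imported here so the rename step can be tried without Altair installed
    import altair as alt

    return alt.Chart(df).mark_circle().encode(
        # The title keeps the legend labeled "Constructor" despite the rename
        color=alt.Color('constructor_name:N', title='Constructor'),
        x='driver_yr_pts:Q',
        y='constructor_yr_pts:Q',
    )
```

Calling `build_chart(df_renamed)` in a notebook should show whether the error goes away once the column is renamed; if it does, the name was the problem and the data can stay as it is.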
Edit:
{'constructor': {0: 'Williams',
1: 'Red Bull',
2: 'Toro Rosso',
3: 'Red Bull',
4: 'McLaren'},
'constructorID': {0: 'williams',
1: 'red_bull',
2: 'toro_rosso',
3: 'red_bull',
4: 'mclaren'},
'constructor_yr_pts': {0: 0.0, 1: 417.0, 2: 85.0, 3: 319.0, 4: 30.0},
'driver': {0: 'Jack Aitken',
1: 'Alexander Albon',
2: 'Alexander Albon',
3: 'Alexander Albon',
4: 'Fernando Alonso'},
'driverID': {0: 'aitken', 1: 'albon', 2: 'albon', 3: 'albon', 4: 'alonso'},
'driver_yr_pts': {0: 0.0, 1: 76.0, 2: 16.0, 3: 105.0, 4: 17.0},
'year': {0: 2020, 1: 2019, 2: 2019, 3: 2020, 4: 2017}}
Below is the basic code I use to read the data in from the Ergast API:
import altair as alt
import pandas as pd
from pyergast import pyergast
import requests
# 2017 season: fetch each round's results and tag them with year and round
rounds_17 = pyergast.get_schedule(2017)
df_2017 = pd.DataFrame()
for i in range(1, len(rounds_17) + 1):  # rounds are numbered from 1
    temp = pyergast.get_race_result(2017, i)
    temp['year'] = 2017
    temp['round'] = i
    df_2017 = pd.concat([df_2017, temp], ignore_index=True)
# 2018 season
rounds_18 = pyergast.get_schedule(2018)
df_2018 = pd.DataFrame()
for i in range(1, len(rounds_18) + 1):
    temp = pyergast.get_race_result(2018, i)
    temp['year'] = 2018
    temp['round'] = i
    df_2018 = pd.concat([df_2018, temp], ignore_index=True)
# 2019 season
rounds_19 = pyergast.get_schedule(2019)
df_2019 = pd.DataFrame()
for i in range(1, len(rounds_19) + 1):
    temp = pyergast.get_race_result(2019, i)
    temp['year'] = 2019
    temp['round'] = i
    df_2019 = pd.concat([df_2019, temp], ignore_index=True)
# 2020 season
rounds_20 = pyergast.get_schedule(2020)
df_2020 = pd.DataFrame()
for i in range(1, len(rounds_20) + 1):
    temp = pyergast.get_race_result(2020, i)
    temp['year'] = 2020
    temp['round'] = i
    df_2020 = pd.concat([df_2020, temp], ignore_index=True)
# 2021 season
rounds_21 = pyergast.get_schedule(2021)
df_2021 = pd.DataFrame()
for i in range(1, len(rounds_21) + 1):
    temp = pyergast.get_race_result(2021, i)
    temp['year'] = 2021
    temp['round'] = i
    df_2021 = pd.concat([df_2021, temp], ignore_index=True)
df_total = pd.concat([df_2017,df_2018,df_2019,df_2020, df_2021], ignore_index=True)
df_total['points'] = pd.to_numeric(df_total['points'])
df_year_pts = df_total.groupby(['driverID','driver','year','constructorID','constructor'])['points'].sum().to_frame('year_pts').reset_index()
s_constructor_yr_pts = df_year_pts.groupby(['constructorID','year'])['year_pts'].sum()
df_year_pts = df_year_pts.merge(s_constructor_yr_pts, how='left',on=['constructorID','year']).rename(columns={'year_pts_x':'driver_yr_pts','year_pts_y':'constructor_yr_pts'})
So df_year_pts is the DataFrame I am passing to alt.Chart.
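For readers following the aggregation step, here is a minimal sketch of one alternative way to attach a per-constructor yearly total to each driver row, using `groupby(...).transform('sum')` instead of a merge. This is a swapped-in technique, not the approach used in the code above, and the rows in `toy` are hypothetical values made up purely for illustration.

```python
import pandas as pd

# Hypothetical toy rows (not real results), one row per driver and year
toy = pd.DataFrame({
    'driverID': ['a', 'b', 'c'],
    'constructorID': ['team_x', 'team_x', 'team_y'],
    'year': [2020, 2020, 2020],
    'driver_yr_pts': [10.0, 5.0, 7.0],
})

# transform('sum') returns a Series aligned with the original rows, so each
# driver row receives the total for its (constructorID, year) group
toy['constructor_yr_pts'] = (
    toy.groupby(['constructorID', 'year'])['driver_yr_pts'].transform('sum')
)
print(toy)
```

Because the result is aligned row by row, no `_x`/`_y` suffixes appear and no rename is needed afterwards.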