from statistics import mean
import pandas as pd
# Build the sample frame in one step instead of filling empty columns one by one.
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 4, 5, 6],
    "B": ["Feb", "Feb", "Feb", "May", "May", "May", "May"],
    "C": [10, 20, 30, 40, 30, 50, 60],
})
df1 = df.groupby(["A","B"]).agg(mean_err=("C", mean)).reset_index()
df1["threshold"] = df1["A"] * df1["mean_err"]
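How can I do this the way PySpark does with .withColumn(), instead of using the last line of code?
This code does not work. I want to create a new column dynamically from the output of the preceding operation, as we do with PySpark's withColumn method.
Does anyone know how to do this?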
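
One possible approach, sketched under the assumption that the goal is to chain the derived column directly onto the groupby result: pandas `DataFrame.assign` accepts a callable, which receives the intermediate frame, so the new column can be computed from columns created earlier in the same chain. The sketch uses the string `"mean"` instead of `statistics.mean` for the aggregation; that substitution is a choice made here, not part of the original code.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 4, 5, 6],
    "B": ["Feb", "Feb", "Feb", "May", "May", "May", "May"],
    "C": [10, 20, 30, 40, 30, 50, 60],
})

df1 = (
    df.groupby(["A", "B"])
      .agg(mean_err=("C", "mean"))   # named aggregation creates mean_err
      .reset_index()                 # turn the group keys back into columns
      # The lambda receives the frame produced by the previous step,
      # so mean_err is already available here.
      .assign(threshold=lambda d: d["A"] * d["mean_err"])
)

print(df1)
```

`assign` returns a new DataFrame rather than modifying the existing one, which is roughly how `withColumn` behaves in PySpark.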