I have the following code, which works as expected.

import numpy as np 
import polars as pl 

data = {
    "date": ["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04", "2021-01-05", "2021-01-06", "2021-01-07", "2021-01-08", "2021-01-09", "2021-01-10", "2021-01-11", "2021-01-12", "2021-01-13", "2021-01-14", "2021-01-15", "2021-01-16", "2021-01-17", "2021-01-18", "2021-01-19", "2021-01-20"],
    "close": np.random.randint(100, 110, 10).tolist() + np.random.randint(200, 210, 10).tolist(),
    "company": ["A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B"] 
}
df = pl.DataFrame(data).with_columns(date = pl.col("date").cast(pl.Date))

# Calculate Returns
R = pl.col("close").pct_change()

# Calculate Gains and Losses
G = pl.when(R > 0).then(R).otherwise(0).alias("gain")
L = pl.when(R < 0).then(R).otherwise(0).alias("loss")

# Calculate Moving Averages for Gains and Losses
window = 3
MA_G = G.rolling_mean(window).alias("MA_gain")
MA_L = L.rolling_mean(window).alias("MA_loss")

# Calculate Relative Strength Index based on Moving Averages
RSI = (100 - (100 / (1 + MA_G / MA_L))).alias("RSI")

df = df.with_columns(R, G, L, MA_G, MA_L, RSI)

df.head()

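I like that polars lets me compose a calculation from separate steps, because it keeps the code readable and maintainable (as opposed to method chaining). Note that my real calculation is ultimately more complex than this.

However, I now want to calculate the columns above grouped by "company". I tried adding .over("company") in the relevant places, but that does not work.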

# Calculate Returns
R = pl.col("close").pct_change().over("company")

# Calculate Gains and Losses
G = pl.when(R > 0).then(R).otherwise(0).alias("gain")
L = pl.when(R < 0).then(R).otherwise(0).alias("loss")

# Calculate Moving Averages for Gains and Losses
window = 3
MA_G = G.rolling_mean(window).alias("MA_gain").over("company")
MA_L = L.rolling_mean(window).alias("MA_loss").over("company")

# Calculate Relative Strength Index based on Moving Averages
RSI = (100 - (100 / (1 + MA_G / MA_L))).over("company").alias("RSI")

df = df.with_columns(R, G, L, MA_G, MA_L, RSI)

df.head()

Questions

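1.) What is the best way to fix this "window expression not allowed in aggregation" error while keeping the approach of the code above?

2.) Related question: why are window expressions not allowed in aggregations? From a technical perspective, what is the problem? Could someone explain it to me in plain language?

Thanks!

Recommended answer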

over() applies to the entire expression it is called on, so if you remove the over from the MA_G/MA_L columns, the code actually works:

# Calculate Returns
R = pl.col("close").pct_change().over("company")

# Calculate Gains and Losses
G = pl.when(R > 0).then(R).otherwise(0).alias("gain")
L = pl.when(R < 0).then(R).otherwise(0).alias("loss")

# Calculate Moving Averages for Gains and Losses
window = 3
MA_G = G.rolling_mean(window).alias("MA_gain")
MA_L = L.rolling_mean(window).alias("MA_loss")

# Calculate Relative Strength Index based on Moving Averages
RSI = (100 - (100 / (1 + MA_G / MA_L))).over("company").alias("RSI")

df = df.with_columns(R, G, L, MA_G, MA_L, RSI)

df.head()

┌────────────┬───────────┬─────────┬──────────┬───────────┬──────────┬───────────┬────────────┐
│ date       ┆ close     ┆ company ┆ gain     ┆ loss      ┆ MA_gain  ┆ MA_loss   ┆ RSI        │
│ ---        ┆ ---       ┆ ---     ┆ ---      ┆ ---       ┆ ---      ┆ ---       ┆ ---        │
│ date       ┆ f64       ┆ str     ┆ f64      ┆ f64       ┆ f64      ┆ f64       ┆ f64        │
╞════════════╪═══════════╪═════════╪══════════╪═══════════╪══════════╪═══════════╪════════════╡
│ 2021-01-01 ┆ null      ┆ A       ┆ 0.0      ┆ 0.0       ┆ null     ┆ null      ┆ null       │
│ 2021-01-02 ┆ -0.055046 ┆ A       ┆ 0.0      ┆ -0.055046 ┆ null     ┆ null      ┆ null       │
│ 2021-01-03 ┆ 0.019417  ┆ A       ┆ 0.019417 ┆ 0.0       ┆ 0.006472 ┆ -0.018349 ┆ -54.5      │
│ 2021-01-04 ┆ -0.038095 ┆ A       ┆ 0.0      ┆ -0.038095 ┆ 0.006472 ┆ -0.031047 ┆ -26.338197 │
│ 2021-01-05 ┆ 0.059406  ┆ A       ┆ 0.059406 ┆ 0.0       ┆ 0.026274 ┆ -0.012698 ┆ 193.535335 │
└────────────┴───────────┴─────────┴──────────┴───────────┴──────────┴───────────┴────────────┘
