I have also defined a column named info:
+-------------------+----------+
|          Timestamp|      info|
+-------------------+----------+
|2016-01-01 17:54:30|         0|
|2016-02-01 12:16:18|         0|
|2016-03-01 12:17:57|         0|
|2016-04-01 10:05:21|         0|
|2016-05-11 18:58:25|         1|
|2016-06-11 11:18:29|         1|
|2016-07-01 12:05:21|         0|
|2016-08-11 11:58:25|         0|
|2016-09-11 15:18:29|         1|
+-------------------+----------+
I want to count consecutive occurrences of 1 and insert 0 otherwise. The final column should be:
+-------------------+----------+----------+
|          Timestamp|      info|       res|
+-------------------+----------+----------+
|2016-01-01 17:54:30|         0|         0|
|2016-02-01 12:16:18|         0|         0|
|2016-03-01 12:17:57|         0|         0|
|2016-04-01 10:05:21|         0|         0|
|2016-05-11 18:58:25|         1|         1|
|2016-06-11 11:18:29|         1|         2|
|2016-07-01 12:05:21|         0|         0|
|2016-08-11 11:58:25|         0|         0|
|2016-09-11 15:18:29|         1|         1|
+-------------------+----------+----------+
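To pin down the expected behaviour, here is a minimal plain-Python sketch of the rule (not PySpark). The function name `consecutive_ones` is hypothetical and used only for illustration. It assumes the rows are already sorted by Timestamp: each 1 extends the current run, and each 0 resets it.

```python
def consecutive_ones(info):
    """For each value, return the length of the current run of 1s (0 for a 0)."""
    res = []
    run = 0  # length of the run of 1s ending at the current row
    for value in info:
        # A 1 extends the current run; anything else resets it to 0.
        run = run + 1 if value == 1 else 0
        res.append(run)
    return res


# The info column from the table above, in Timestamp order.
info = [0, 0, 0, 0, 1, 1, 0, 0, 1]
print(consecutive_ones(info))  # [0, 0, 0, 0, 1, 2, 0, 0, 1]
```

This only describes the desired semantics of the res column. It is not meant as a distributed solution.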
I tried using the following function, but it did not work.
df_input = df_input.withColumn(
    "res",
    F.when(
        df_input.info == F.lag(df_input.info).over(w1),
        F.sum(F.lit(1)).over(w1)
    ).otherwise(0)
)