I have a DataFrame like this:

Column A | Column B |
---|---|
Hello | [{id: 1000, abbreviatedId: 1, name: "John", planet: "Earth", solarsystem: "Milky Way", universe: "this one", continent: {id: 33, country: "China", Capital: "Beijing"}, otherId: 400, language: "Cantonese", species: 23409, creature: "Human"}] |
Bye | [{id: 2000, abbreviatedId: 2, name: "James", planet: "Earth", solarsystem: "Milky Way", universe: "this one", continent: {id: 33, country: "Russia", Capital: "Moscow"}, otherId: 500, language: "Russian", species: 12308, creature: "Human"}] |

Before writing the DataFrame to an external location, how can I iterate over its rows and remove every row that contains `country: "China"`?
I tried:

    from pyspark.sql.functions import array_contains, col

    if df.select(array_contains(col("columnb.continent.country"), "China")) != True:
        df.write.format("delta").mode("overwrite").save("file://path/")

and

    for row in df.rdd.collect():
        if df.select(array_contains(col("columnb.continent.country"), "China")) != True:
            df.drop(row)
    df.write.format("delta").mode("overwrite").save("file://path/")