Setup
df.show(truncate=False)
+-------------------------------------------------------------------------------------------------------------------------------------+
|col |
+-------------------------------------------------------------------------------------------------------------------------------------+
|{o*5ftr -> {fname -> fname2, lname -> lname2, city -> xyz}, h#9l00 -> {fname -> fname1, lname -> lname1, salary -> 100, city -> xyz}}|
+-------------------------------------------------------------------------------------------------------------------------------------+
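Code
There is no need for a UDF here, since Python UDFs are slow. The built-in higher-order functions (available in pyspark.sql.functions since Spark 3.1) can produce the same result. Apply transform_values to every key-value pair of the outer map, and inside it apply map_filter to each inner map to drop the predefined keys.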
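from pyspark.sql import functions as F

keys = ['fname', 'lname']  # inner-map keys to drop
# For each outer value (an inner map), keep only the entries whose key is not in keys.
func = F.transform_values('col', lambda _, x: F.map_filter(x, lambda k, _: ~k.isin(keys)))
result = df.withColumn('col', func)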
Result
result.show(truncate=False)
+-----------------------------------------------------------------+
|col |
+-----------------------------------------------------------------+
|{o*5ftr -> {city -> xyz}, h#9l00 -> {salary -> 100, city -> xyz}}|
+-----------------------------------------------------------------+
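For intuition, the nested filtering can be mimicked on a plain Python dict. This is only a sketch of the semantics, not Spark code: the variable names row and cleaned are made up for illustration, and the sample values are assumed to be strings because the values of a Spark map must share a single type.

```python
# Plain-Python analogue of transform_values + map_filter on a single row.
# No Spark is involved; `row` mirrors the sample data (values assumed to be strings).
keys = ['fname', 'lname']

row = {
    'o*5ftr': {'fname': 'fname2', 'lname': 'lname2', 'city': 'xyz'},
    'h#9l00': {'fname': 'fname1', 'lname': 'lname1', 'salary': '100', 'city': 'xyz'},
}

# The outer comprehension plays the role of transform_values:
# it keeps every outer key and rewrites the value.
# The inner comprehension plays the role of map_filter:
# it drops the entries whose key is listed in `keys`.
cleaned = {
    outer_key: {k: v for k, v in inner.items() if k not in keys}
    for outer_key, inner in row.items()
}

print(cleaned)
```

The Spark version does the same work inside the JVM for every row, which is why it avoids the serialization overhead of a Python UDF.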