I want to convert this JSON into a PySpark DataFrame. I have added my current code below.
json = {
    "key1": 0.75,
    "values": [
        {
            "id": 2313,
            "val1": 350,
            "val2": 6000
        },
        {
            "id": 2477,
            "val1": 340,
            "val2": 6500
        }
    ]
}
My code is below. It runs and gives the output shown after it; I hope someone can help improve it.
import json
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("CreateDataFrame").getOrCreate()
json_string = json.dumps({
    "key1": 0.75,
    "values": [
        {
            "id": 2313,
            "val1": 350,
            "val2": 6000
        },
        {
            "id": 2477,
            "val1": 340,
            "val2": 6500
        }
    ]
})
df = spark.read.json(spark.sparkContext.parallelize([json_string]))
df = df.select("key1", "values.id", "values.val1", "values.val2")
df.show()
Output:
+----+-------------+-------------+-------------+
|key1|           id|         val1|         val2|
+----+-------------+-------------+-------------+
|0.75| [2313, 2477]|   [350, 340]| [6000, 6500]|
+----+-------------+-------------+-------------+
Any help getting the expected output would be appreciated.
Expected output:
+----+----+----+----+
|key1|  id|val1|val2|
+----+----+----+----+
|0.75|2313| 350|6000|
|0.75|2477| 340|6500|
+----+----+----+----+
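
One possible direction, as a rough sketch rather than a definitive answer: selecting "values.id" on an array of structs returns an array per column, so each element of "values" has to become its own row first. The helper names below (COLUMNS, flatten_rows, rows_to_dataframe, explode_values) are made up for illustration. The sketch assumes the DataFrame passed to explode_values is the one returned by spark.read.json in the code above, before the select, and that the SQL function explode and the "v.*" struct expansion behave as documented in your Spark version.

```python
# Column order matching the expected output in the question.
COLUMNS = ["key1", "id", "val1", "val2"]


def flatten_rows(data):
    """Plain-Python flattening: one tuple per entry of data["values"],
    with the top-level key1 repeated on every row."""
    return [
        (data["key1"], item["id"], item["val1"], item["val2"])
        for item in data["values"]
    ]


def rows_to_dataframe(spark, data):
    """Build a DataFrame from the pre-flattened tuples.

    `spark` is an existing SparkSession. Reasonable for small inputs,
    since the flattening happens on the driver."""
    return spark.createDataFrame(flatten_rows(data), COLUMNS)


def explode_values(df):
    """Spark-side flattening of a DataFrame with columns key1 and values,
    where values is an array of structs (id, val1, val2)."""
    # explode() emits one row per array element; "v.*" then expands the
    # struct fields into top-level columns next to key1.
    return df.selectExpr("key1", "explode(values) AS v").select("key1", "v.*")
```

With the DataFrame read in the question (taken before the select on line `df = df.select(...)`), `explode_values(df).show()` should give one row per element of "values" instead of one row of arrays. `rows_to_dataframe(spark, some_dict)` is an alternative when the JSON is already a small Python dict and does not need to be parsed by Spark.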