Given this sample table in Snowflake:

CREATE OR REPLACE TABLE vnt
(src variant)
AS SELECT parse_json(column1) as src
FROM values
('{"a": 1,"b": 2,"c": 3}'),
('{"a": 1,"b": 2,"c": 3,"d": 4}');

select * from vnt;

I want to output a table with two rows, for example:

a b c d
1 2 3 NULL
1 2 3 4

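In other words, I want to flatten the JSON data into columns, not rows. I tried flattening it in Snowpark, but the pivot step does not work. Since the keys can change dynamically, how can I handle this?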

import snowflake.snowpark as snowpark

def main(session: snowpark.Session): 
    df = session.sql("select * from vnt")
    df = df.join_table_function("flatten", df["SRC"]) \
            .drop(["SEQ", "SRC", "PATH", "INDEX", "THIS"])
    df = df.pivot("VALUE",['a','b','c','d']).min("KEY")

    # Return value will appear in the Results tab.
    return df

Recommended answer

This can be done by generating the list of columns to pivot on at run time:

import snowflake.snowpark as snowpark
from snowflake.snowpark.functions import any_value

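def main(session: snowpark.Session):
    df = session.table("vnt")
    # Flatten SRC into one row per key/value pair. SRC is kept, so the pivot groups by it.
    df = df.join_table_function("flatten", df["SRC"]).drop(["SEQ", "PATH", "INDEX", "THIS"])
    # Collect the distinct keys at run time to use as the pivot column list.
    cols = [c[0] for c in df.group_by("key").agg(any_value("key")).collect()]
    df = df.pivot("key", cols).min("value")
    return df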
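
To make the two steps easier to follow, here is a minimal plain-Python sketch of the same idea, run locally on the two sample JSON strings instead of in Snowflake. The helper name `flatten_to_columns` is hypothetical and is not part of Snowpark. It only illustrates "discover the keys first, then project every record onto that key list".

```python
import json


def flatten_to_columns(json_rows):
    """Pivot a list of JSON object strings into (columns, rows)."""
    records = [json.loads(r) for r in json_rows]
    # Step 1: discover every key that occurs in any record (the dynamic column list).
    columns = sorted({key for rec in records for key in rec})
    # Step 2: project each record onto that column list; missing keys become None.
    rows = [[rec.get(c) for c in columns] for rec in records]
    return columns, rows


columns, rows = flatten_to_columns([
    '{"a": 1,"b": 2,"c": 3}',
    '{"a": 1,"b": 2,"c": 3,"d": 4}',
])
print(columns)  # ['a', 'b', 'c', 'd']
for row in rows:
    print(row)  # [1, 2, 3, None] then [1, 2, 3, 4]
```

The Snowpark version above follows the same shape: the `collect()` call plays the role of step 1, and `pivot(...).min(...)` plays the role of step 2, with NULL where a key is missing.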



EDIT:

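To remove the single quotes (') around the pivoted column names, you need to alias the columns explicitly, as the documentation describes: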

PIVOT

If you prefer the column names without quotes, or if you prefer the output to have different column names than the input, you can include the column names in the AS clause, as shown below:

SELECT EMPID AS EMP_ID, "'JAN'" AS JANUARY, "'FEB'" AS FEBRUARY,
    "'MAR'" AS MARCH, "'APR'" AS APRIL
FROM monthly_sales
PIVOT(sum(amount) FOR MONTH IN ('JAN', 'FEB', 'MAR', 'APR')) AS p
ORDER BY EMPID;

Using Snowpark for Python:

import snowflake.snowpark as snowpark
from snowflake.snowpark.functions import any_value, col

def main(session: snowpark.Session):
    df = session.table("vnt")
    df = df.join_table_function("flatten", df["SRC"]).drop(["SEQ", "PATH", "INDEX", "THIS"])
    # Collect the distinct keys at run time to use as the pivot column list.
    cols = [c[0] for c in df.group_by("key").agg(any_value("key")).collect()]
    # The pivot names each column with the quotes included ('a', 'b', ...); alias it to the bare key.
    cols_alias = [col("'" + c + "'").alias(c) for c in cols]
    df = df.pivot("key", cols).min("value").select(cols_alias)
    return df
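
The renaming step can also be factored into a small helper that runs without a Snowflake session. This is only a sketch: `pivot_aliases` is a hypothetical name, and it assumes the keys are plain strings without quote characters. It also sorts and de-duplicates the keys, since a `collect()` without an ORDER BY does not guarantee row order.

```python
def pivot_aliases(keys):
    """Map each pivot value to (generated column name, clean alias)."""
    pairs = []
    # Sort and de-duplicate so the output column order is stable between runs.
    for key in sorted(set(keys)):
        generated = "'" + key + "'"  # name the pivot gives the column, quotes included
        pairs.append((generated, key))
    return pairs


pairs = pivot_aliases(["b", "a", "d", "c", "a"])
print(pairs)  # [("'a'", 'a'), ("'b'", 'b'), ("'c'", 'c'), ("'d'", 'd')]
```

In the Snowpark code above, the pairs could then be used as `[col(generated).alias(alias) for generated, alias in pivot_aliases(cols)]`.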

