I have JSON data like the following stored in S3, and I am using Athena to write a SELECT statement against it.

  {
   "sample_data":{
      "people":[
         {
            "firstName":"Emily",
            "address":{
               "streetAddress":"101",
               "city":"abc",
               "state":"",
               "phoneNumbers":[
                  {
                     "type":"home",
                     "number":"3"
                  },
                  {
                     "type":"city",
                     "number":"4"
                  }
               ]
            }
         },
          {
            "firstName":"Smily",
            "address":{
               "streetAddress":"102",
               "city":"def",
               "state":"",
               "phoneNumbers":[
                  {
                     "type":"home",
                     "number":"1"
                  },
                  {
                     "type":"city",
                     "number":"1"
                  }
               ]
            }
         }
      ]
   }
}

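How can I write a SELECT statement that returns streetAddress and city for people whose home number is greater than 2 and whose city number equals 4?

I have tried my best, but nothing has worked.

Expected output: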

streetAddress  city
101            abc   

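I tried the following with UNNEST, but it extracts the phone numbers into multiple rows, so I cannot check both conditions for the same person: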

SELECT idx,
       JSON_EXTRACT_SCALAR(x.n, '$.address.streetaddress') AS streetaddress,
       JSON_EXTRACT_SCALAR(x.n, '$.address.city') AS city,
       JSON_EXTRACT_SCALAR(x.m, '$.type') AS type,
       JSON_EXTRACT_SCALAR(x.m, '$.number') AS value
FROM sample_data1
CROSS JOIN UNNEST(CAST(JSON_EXTRACT(sample_data, '$.people') AS ARRAY<JSON>)) AS x(n)
CROSS JOIN UNNEST(CAST(JSON_EXTRACT(x.n, '$.address.phonenumbers') AS ARRAY<JSON>)) WITH ORDINALITY AS x(m, idx);

Recommended answer

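unnest flattens the data into multiple rows, so to keep one row per person you can process the array with array functions instead of unnesting it. The Presto version that Athena currently uses does not support any_match, so you need to combine cardinality with filter (filtering through the JSON path is not supported either):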

-- sample data
WITH dataset (json_str) AS (
    VALUES (
            json '{
            "firstName":"Emily",
            "address":{
               "streetAddress":"101",
               "city":"abc",
               "state":"",
               "phoneNumbers":[
                  {
                     "type":"home",
                     "number":"11"
                  },
                  {
                     "type":"city",
                     "number":"4"
                  }
               ]
            }
         }'
        ),
        (
            json '{
            "firstName":"Smily",
            "address":{
               "streetAddress":"102",
               "city":"def",
               "state":"",
               "phoneNumbers":[
                  {
                     "type":"home",
                     "number":"1"
                  },
                  {
                     "type":"city",
                     "number":"1"
                  }
               ]
            }
         }'
        )
) -- query
select street_address,
    city
from (
        select JSON_EXTRACT_SCALAR(json_str, '$.address.streetAddress') as street_address,
            JSON_EXTRACT_SCALAR(json_str, '$.address.city') as city,
            cast(
                JSON_EXTRACT(json_str, '$.address.phoneNumbers') as array(json)
            ) phones
        from dataset
    )
where cardinality(
        filter(
            phones,
            js->json_extract_scalar(js, '$.type') = 'home'
                and try_cast(json_extract_scalar(js, '$.number') as integer) > 2
        )
    ) > 0 -- check for home
    and
    cardinality(
        filter(
            phones,
            js->json_extract_scalar(js, '$.type') = 'city'
                and json_extract_scalar(js, '$.number') = '4'
        )
    ) > 0 -- check for city

Output:

street_address city
101 abc
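
The sample above places each person in its own row, while the data in the question keeps everyone inside a single people array. A minimal sketch of how the two ideas might be combined: unnest only the people array (one row per person) and leave phoneNumbers as an array so that cardinality + filter can test it. It assumes, as the question's own query suggests, a table sample_data1 with a column sample_data holding the JSON; the inline WITH clause only stands in for that table, and the aliases t, person, and js are illustrative. The JSON path keys are written in the same case as in the sample document.

```sql
-- Stand-in for the real table (assumption: sample_data holds the JSON shown in the question)
WITH sample_data1 (sample_data) AS (
    VALUES (
        json '{"people":[
            {"firstName":"Emily","address":{"streetAddress":"101","city":"abc","state":"",
                "phoneNumbers":[{"type":"home","number":"3"},{"type":"city","number":"4"}]}},
            {"firstName":"Smily","address":{"streetAddress":"102","city":"def","state":"",
                "phoneNumbers":[{"type":"home","number":"1"},{"type":"city","number":"1"}]}}
        ]}'
    )
)
SELECT json_extract_scalar(person, '$.address.streetAddress') AS street_address,
    json_extract_scalar(person, '$.address.city') AS city
FROM sample_data1
CROSS JOIN UNNEST(
        -- one row per person; the phone numbers stay nested
        CAST(json_extract(sample_data, '$.people') AS array(json))
    ) AS t(person)
WHERE cardinality(
        filter(
            CAST(json_extract(person, '$.address.phoneNumbers') AS array(json)),
            js -> json_extract_scalar(js, '$.type') = 'home'
                AND try_cast(json_extract_scalar(js, '$.number') AS integer) > 2
        )
    ) > 0 -- at least one home number greater than 2
    AND cardinality(
        filter(
            CAST(json_extract(person, '$.address.phoneNumbers') AS array(json)),
            js -> json_extract_scalar(js, '$.type') = 'city'
                AND json_extract_scalar(js, '$.number') = '4'
        )
    ) > 0 -- at least one city number equal to 4
```

With the question's sample data only Emily satisfies both conditions (home 3 > 2, city = 4), so this should produce the expected row 101 / abc. If sample_data is defined as a struct rather than a string or JSON column, the json_extract calls would need to be replaced with field access.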
