I have the following two SQL queries. I expected them to produce the same result for the offset column, but they don't.
Query 1:
SELECT visitor_id, array_agg(timestamp) AS time, array_agg(offset) AS offset_list
FROM (
    SELECT * FROM (
        SELECT visitor_id, timestamp,
               cast(json_extract(uri_args, '$.offset') AS int) AS offset
        FROM table_t
        WHERE year = 2023 AND month = 1 AND day = 27
          AND request_uri = '/home_page'
    )
    ORDER BY visitor_id, timestamp
)
GROUP BY visitor_id
ORDER BY cardinality(offset_list) DESC
Query 2:
SELECT visitor_id, array_agg(offset) AS offset_list
FROM (
    SELECT * FROM (
        SELECT visitor_id, timestamp,
               cast(json_extract(uri_args, '$.offset') AS int) AS offset
        FROM table_t
        WHERE year = 2023 AND month = 1 AND day = 27
          AND request_uri = '/home_page'
    )
    ORDER BY visitor_id, timestamp
)
GROUP BY visitor_id
ORDER BY cardinality(offset_list) DESC
Here uri_args is just a JSON document that holds, under the key 'offset', the offset value for a particular API response. It comes from the server's response log table.
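To make the extraction step concrete, here is a minimal sketch of what the inner expression does to a single uri_args value. The JSON literal is a made-up example, not an actual row from the log table, and the sketch assumes a Presto/Trino-style engine, where json_extract accepts a JSON string and a JSON path.

```sql
-- Hypothetical uri_args value; real log entries may contain other keys as well.
SELECT cast(json_extract('{"offset": 20}', '$.offset') AS int) AS offset;
```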
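Although the two queries are nearly identical, and I believe they should return the same values in the offset_list column, I found the following discrepancy.
To make this concrete, consider one specific visitor ID. For visitor_id = '12345', query 1 returns the following in the offset_list column: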
[0, 0, 0, 10, 0, 10, 20, 32, 42, 0, 0, 20, 53, 77, 57, 0, 10, 20, 31, 10, 41, 0, 10, 41, 54, 77, 0, 10, 31, 41, 54, 10, 31, 54, 57, 77, 10, 20, 32, 0, 10, 21, 33, 44, 72, 52, 0, 10, 20, 31, 41]
For query 2, the output is:
[20, 32, 42, 0, 0, 20, 53, 77, 57, 0, 10, 20, 31, 10, 41, 0, 10, 41, 54, 77, 0, 10, 31, 41, 54, 10, 31, 54, 77, 57, 10, 20, 32, 0, 10, 21, 33, 44, 72, 52, 0, 10, 20, 31, 41, 0, 0, 0, 10, 0, 10]
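I can see that the two are cyclic permutations of each other, but I can't see why this happens. Please help me understand what differs in how each query works internally. The first result matches the intent of my task, which is to capture a visitor's journey on the home page.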
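For reference, here is a variant I could try that puts the ordering inside the aggregate itself, since SQL generally does not guarantee that a subquery's ORDER BY survives an outer GROUP BY. This is only a sketch: it assumes the engine supports an ORDER BY clause inside array_agg (Presto/Trino do), and I have not confirmed that it explains the difference above.

```sql
-- Sketch: order the elements inside the aggregate instead of in a subquery.
-- Assumes array_agg(... ORDER BY ...) is available in the engine being used.
SELECT visitor_id,
       array_agg(offset ORDER BY timestamp) AS offset_list
FROM (
    SELECT visitor_id, timestamp,
           cast(json_extract(uri_args, '$.offset') AS int) AS offset
    FROM table_t
    WHERE year = 2023 AND month = 1 AND day = 27
      AND request_uri = '/home_page'
)
GROUP BY visitor_id
ORDER BY cardinality(offset_list) DESC
```

With the ordering stated in the aggregate, the array order no longer depends on how the engine happens to feed rows into the GROUP BY.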