这是我的数据.

username building action timestamp
user-1 building-1 IN 2024-04-10 01:00:00.000
user-1 building-1 OUT 2024-04-10 02:00:00.000
user-1 building-1 IN 2024-04-10 02:30:00.000
user-1 building-1 OUT 2024-04-10 04:00:00.000
user-1 building-1 IN 2024-04-11 10:00:00.000
user-1 building-1 OUT 2024-04-11 11:00:00.000
user-2 building-2 IN 2024-04-10 08:00:00.000
user-2 building-2 OUT 2024-04-10 09:00:00.000
user-2 building-3 OUT 2024-04-11 02:30:00.000
user-2 building-4 IN 2024-04-11 04:00:00.000
user-2 building-1 IN 2024-04-12 10:00:00.000
user-2 building-1 OUT 2024-04-12 11:00:00.000

我需要计算刷卡和刷卡时间.有时我们收到部分记录,我们还需要包括这些记录.

我的预期输出是:

enter image description here

随后使用leadlag等窗口功能.使用这些函数后,我无法继续获取此输出.

附表数据:

select 'user-1' as username,'building-1' as building,'IN' as action,'2024-04-10 01:00:00'::timestamp as timestamp
union all 
select 'user-1' as username,'building-1' as building,'OUT' as action,'2024-04-10 02:00:00'::timestamp as timestamp
union all 
select 'user-1' as username,'building-1' as building,'IN' as action,'2024-04-10 02:30:00'::timestamp as timestamp
union all
select 'user-1' as username,'building-1' as building,'OUT' as action,'2024-04-10 04:00:00'::timestamp as timestamp
union all 
select 'user-1' as username,'building-1' as building,'IN' as action,'2024-04-11 10:00:00'::timestamp as timestamp
union all 
select 'user-1' as username,'building-1' as building,'OUT' as action,'2024-04-11 11:00:00'::timestamp as timestamp
union all 

select 'user-2' as username,'building-2' as building,'IN' as action,'2024-04-10 08:00:00'::timestamp as timestamp
union all 
select 'user-2' as username,'building-2' as building,'OUT' as action,'2024-04-10 09:00:00'::timestamp as timestamp
union all 
select 'user-2' as username,'building-3' as building,'OUT' as action,'2024-04-11 02:30:00'::timestamp as timestamp
union all
select 'user-2' as username,'building-4' as building,'IN' as action,'2024-04-11 04:00:00'::timestamp as timestamp
union all 
select 'user-2' as username,'building-1' as building,'IN' as action,'2024-04-12 10:00:00'::timestamp as timestamp
union all 
select 'user-2' as username,'building-1' as building,'OUT' as action,'2024-04-12 11:00:00'::timestamp as timestamp

)

select * from _data

推荐答案

您可以使用lead/lag以及一些Case/WHEN/End来实现这一点.

Schema (PostgreSQL v15)

CREATE TABLE swipes (
    "username"  VARCHAR(6),
    "building"  VARCHAR(10),
    "action"    VARCHAR(3),
    "timestamp" TIMESTAMP
);

INSERT INTO swipes
    ("username", "building", "action", "timestamp")
VALUES ('user-1', 'building-1', 'IN', '2024-04-10 01:00:00.000')
     , ('user-1', 'building-1', 'OUT', '2024-04-10 02:00:00.000')
     , ('user-1', 'building-1', 'IN', '2024-04-10 02:30:00.000')
     , ('user-1', 'building-1', 'OUT', '2024-04-10 04:00:00.000')
     , ('user-1', 'building-1', 'IN', '2024-04-11 10:00:00.000')
     , ('user-1', 'building-1', 'OUT', '2024-04-11 11:00:00.000')
     , ('user-2', 'building-2', 'IN', '2024-04-10 08:00:00.000')
     , ('user-2', 'building-2', 'OUT', '2024-04-10 09:00:00.000')
     , ('user-2', 'building-3', 'OUT', '2024-04-11 02:30:00.000')
     , ('user-2', 'building-4', 'IN', '2024-04-11 04:00:00.000')
     , ('user-2', 'building-1', 'IN', '2024-04-12 10:00:00.000')
     , ('user-2', 'building-1', 'OUT', '2024-04-12 11:00:00.000')
     -- added those rows to showcase the results when the same action happens twice in a row
     , ('user-3', 'building-1', 'IN', '2024-04-12 11:00:00.000')
     , ('user-3', 'building-1', 'IN', '2024-04-12 12:00:00.000')
     , ('user-3', 'building-1', 'OUT', '2024-04-12 13:00:00.000')
;

Query #1

-- Since the information may be partial we can't simply look at all "in"s or "out"s,
-- so we look at all the rows & end up with everything twice, hence the DISTINCT.
SELECT DISTINCT
       username
     , building
     , CASE
           WHEN action = 'IN' THEN timestamp
           -- To find the entry time corresponding to this "out" row, we'll look at the previous row.
           -- If it's an "in", then we keep the timestamp.
           -- If it's an "out", then there were 2 "outs" in a row, so the "in" value is set to null (see user-3's rows for an example of this).
           WHEN action = 'OUT' THEN
               CASE WHEN LAG(action) OVER user_actions_per_building_by_timestamp = 'IN'
                   THEN LAG(timestamp) OVER user_actions_per_building_by_timestamp
               END
       END AS "in"
     , CASE
           -- same concept as above, with IN/OUT & LAG/LEAD flipped.
           WHEN action = 'IN' THEN
               CASE WHEN LEAD(action) OVER user_actions_per_building_by_timestamp = 'OUT'
                   THEN LEAD(timestamp) OVER user_actions_per_building_by_timestamp
               END
           WHEN action = 'OUT' THEN timestamp
    END AS "out"
FROM swipes
WINDOW user_actions_per_building_by_timestamp AS (PARTITION BY username, building ORDER BY timestamp)
ORDER BY username, building, "in", "out";
username building in out
user-1 building-1 2024-04-10T01:00:00.000Z 2024-04-10T02:00:00.000Z
user-1 building-1 2024-04-10T02:30:00.000Z 2024-04-10T04:00:00.000Z
user-1 building-1 2024-04-11T10:00:00.000Z 2024-04-11T11:00:00.000Z
user-2 building-1 2024-04-12T10:00:00.000Z 2024-04-12T11:00:00.000Z
user-2 building-2 2024-04-10T08:00:00.000Z 2024-04-10T09:00:00.000Z
user-2 building-3 2024-04-11T02:30:00.000Z
user-2 building-4 2024-04-11T04:00:00.000Z
user-3 building-1 2024-04-12T11:00:00.000Z
user-3 building-1 2024-04-12T12:00:00.000Z 2024-04-12T13:00:00.000Z

View on DB Fiddle


另一种方法可以是使用循环CTE,如果条件变得更加复杂,它可能会更容易理解(而且它的性能也会有所不同,尽管您需要对照一些实际数据判断这一点).

Sql相关问答推荐

如何计算帐户在SQL中随着时间的推移购买的SKU数量?

使用自动增量ID插入失败(无法将值空插入列ID)

Lag()函数的差异:R与SQL(将R代码转换为SQL)

如果多行科目有一行在指定的日期范围内,如何 Select 该科目在该日期之前的所有行?

具有多个条件的SQL否定

无法将发票与产品价格相关联

合并分层表SQL中的第一个非空、变化的空位置

表函数的作用域和功能

在子窗口SQL Presto中使用特定条件执行值计数

两个月之间的WHERE CASE WHEN-ORA-00905:缺少关键字

向表中添加新列取决于表的日期列(unpivot)

一次 Select 语句中按组累计的SQL累计数

如何使用Informix创建一个临时表,将数据从根表导入并使用筛选条件

更新表 A,然后将新值插入表 B(包含更新内容的历史日志(log))

String_Split 多列

MIN MAX 值与条件绑定

如何通过子 Select 在一次更新(并行数组)中多次更新相同的行

Athena:从字符串birth_dt列计算年龄

将单行中的多个行值转换为列

REGEXP 用于字符串格式化以对用空格分隔的字符和数字进行分组