我有下表(我的输入只有NIF、FILE和FILESIZE列).

我想通过以下规则获得列分区和子分区:

  1. 分区应该从1开始,并且每调整SUBPARTITION达到8就递增1.
  2. 单个NIF不能位于两个不同的分区中.(这实际上是最具限制性的条件).
  3. 每当文件大小中的运行总数达到10时,SUBPARTITION必须递增1.当SUBPARTITION递增1时,必须重置该运行总计.

生效日期:

  NIF  FILE FILESIZE    PARTITION   SUBPARTITION
  ----------------------------------------------
   A     C1        1            1              1
   A     C2        1            1              1
   A     C3        2            1              1
   A     C4        1            1              1
   B     C5        5            1              2
   B     C6        1            1              2
   C     C7        2            1              2
   C     C8        1            1              2
   D     C9        4            1              3
   D    C10        5            1              3
   D    C11        1            1              3
   D    C12        2            1              4
   D    C13        3            1              4
   D    C14        4            1              4
   D    C15        5            1              5
   E    C16        3            1              6
   E    C17        2            1              6
   E    C18        3            1              6
   E    C19        4            1              7
   F    C20        6            2              1
   F    C20        2            2              1

它不需要是单个查询,执行PSLQL游标也可以.

注意:这是创建输入表的代码.

DROP TABLE my_table;
-- Create the table
CREATE TABLE my_table (
  NIF VARCHAR2(10),
  FILE_ VARCHAR2(10),
  FILESIZE NUMBER
);

-- Insert the input data
INSERT INTO my_table (NIF, FILE_, FILESIZE)
    SELECT 'A', 'C1', 1 FROM DUAL UNION ALL
    SELECT 'A', 'C2', 1 FROM DUAL UNION ALL
    SELECT 'A', 'C3', 2 FROM DUAL UNION ALL
    SELECT 'A', 'C4', 1 FROM DUAL UNION ALL
    SELECT 'B', 'C5', 5 FROM DUAL UNION ALL
    SELECT 'B', 'C6', 1 FROM DUAL UNION ALL
    SELECT 'C', 'C7', 2 FROM DUAL UNION ALL
    SELECT 'C', 'C8', 1 FROM DUAL UNION ALL
    SELECT 'D', 'C9', 4 FROM DUAL UNION ALL
    SELECT 'D', 'C10', 5 FROM DUAL UNION ALL
    SELECT 'D', 'C11', 1 FROM DUAL UNION ALL
    SELECT 'D', 'C12', 2 FROM DUAL UNION ALL
    SELECT 'D', 'C13', 3 FROM DUAL UNION ALL
    SELECT 'D', 'C14', 4 FROM DUAL UNION ALL
    SELECT 'D', 'C15', 5 FROM DUAL UNION ALL
    SELECT 'E', 'C16', 3 FROM DUAL UNION ALL
    SELECT 'E', 'C17', 2 FROM DUAL UNION ALL
    SELECT 'E', 'C18', 3 FROM DUAL UNION ALL
    SELECT 'E', 'C19', 4 FROM DUAL UNION ALL
    SELECT 'F', 'C20', 6 FROM DUAL UNION ALL
    SELECT 'F', 'C21', 2 FROM DUAL;
    
COMMIT;

推荐答案

在Oracle 12中,您可以结合使用分析函数和MATCH_RECOGNIZE来执行逐行处理:

SELECT cto,
       nif,
       file_,
       partition,
       mn - first_mn + 1 AS subpartition,
       total_size
FROM   (
  SELECT t.*,
         ROW_NUMBER() OVER (
           PARTITION BY cto
           ORDER BY SUBSTR(file_, 1, 1), TO_NUMBER(SUBSTR(file_, 2))
         ) AS rn,
         SUM(filesize) OVER (PARTITION BY nif) AS nif_size
  FROM   my_table t
)
MATCH_RECOGNIZE(
  PARTITION BY cto
  ORDER BY rn
  MEASURES
    MATCH_NUMBER() AS mn,
    SUM(filesize) AS total_size
  ALL ROWS PER MATCH
  PATTERN ( (same_nif* nif)* same_nif* end_nif )
  DEFINE
    same_nif AS nif = NEXT(nif),
    nif      AS  (NEXT(nif) IS NULL OR nif != NEXT(nif))
             AND SUM(filesize) + NEXT(nif_size) <= 10,
    end_nif  AS  SUM(filesize) <= 10
)
MATCH_RECOGNIZE(
  PARTITION BY cto
  ORDER BY rn
  MEASURES
    MATCH_NUMBER() AS partition,
    FIRST(mn) AS first_mn
  ALL ROWS PER MATCH
  PATTERN ( (same_nif* nif)+ )
  DEFINE
    same_nif AS nif = NEXT(nif),
    nif      AS (NEXT(nif) IS NULL OR nif != NEXT(nif))
             AND mn - FIRST(mn) < 7
);

在第一次匹配中,对子分区,模式:

  • same_nif* nif将从当前行匹配到当前NIF分区的末尾,并通过判断当前大小加上下一个NIF分区的总大小是否在限制内来确保下一个NIF分区也将完全包含在匹配中;
  • 将其包装在()*中允许零次或多次匹配(尽可能多);
  • 最后的same_nif* end_nif允许匹配部分匹配具有相同nif值的一组行(例如文件C9、C10和C11,其具有与C12、C13和C14相同的nif),但由于它们的文件大小总和超过限制,它们将处于不同的组中.如果没有不同的结束条件,您可能会将C1分组到C5,或者无法拆分Dnif的值.

在第二个匹配中,对于分区,nif个值需要保持在同一匹配中(而对于子分区,nif个值可以跨越多个匹配),除此之外,唯一需要判断的是一个组中没有超过7个子分区.

对于样本数据(其具有添加的CTO列和附加的nifG,其太大而不能与先前的nifF组合在同一分区中):

CREATE TABLE my_table (
  CTO NUMBER,
  NIF VARCHAR2(10),
  FILE_ VARCHAR2(10),
  FILESIZE NUMBER
);

INSERT INTO my_table (CTO, NIF, FILE_, FILESIZE)
    SELECT 1, 'A', 'C1', 1 FROM DUAL UNION ALL
    SELECT 1, 'A', 'C2', 1 FROM DUAL UNION ALL
    SELECT 1, 'A', 'C3', 2 FROM DUAL UNION ALL
    SELECT 1, 'A', 'C4', 1 FROM DUAL UNION ALL
    SELECT 1, 'B', 'C5', 5 FROM DUAL UNION ALL
    SELECT 1, 'B', 'C6', 1 FROM DUAL UNION ALL
    SELECT 1, 'C', 'C7', 2 FROM DUAL UNION ALL
    SELECT 1, 'C', 'C8', 1 FROM DUAL UNION ALL
    SELECT 1, 'D', 'C9', 4 FROM DUAL UNION ALL
    SELECT 1, 'D', 'C10', 5 FROM DUAL UNION ALL
    SELECT 1, 'D', 'C11', 1 FROM DUAL UNION ALL
    SELECT 1, 'D', 'C12', 2 FROM DUAL UNION ALL
    SELECT 1, 'D', 'C13', 3 FROM DUAL UNION ALL
    SELECT 1, 'D', 'C14', 4 FROM DUAL UNION ALL
    SELECT 1, 'D', 'C15', 5 FROM DUAL UNION ALL
    SELECT 1, 'E', 'C16', 3 FROM DUAL UNION ALL
    SELECT 1, 'E', 'C17', 2 FROM DUAL UNION ALL
    SELECT 1, 'E', 'C18', 3 FROM DUAL UNION ALL
    SELECT 1, 'E', 'C19', 4 FROM DUAL UNION ALL
    SELECT 1, 'F', 'C20', 6 FROM DUAL UNION ALL
    SELECT 1, 'F', 'C21', 2 FROM DUAL UNION ALL
    SELECT 1, 'G', 'C' || (LEVEL + 21), 10 FROM DUAL CONNECT BY LEVEL <= 7;

输出:

CTO NIF FILE_ PARTITION SUBPARTITION TOTAL_SIZE
1 A C1 1 1 1
1 A C2 1 1 2
1 A C3 1 1 4
1 A C4 1 1 5
1 B C5 1 2 5
1 B C6 1 2 6
1 C C7 1 2 8
1 C C8 1 2 9
1 D C9 1 3 4
1 D C10 1 3 9
1 D C11 1 3 10
1 D C12 1 4 2
1 D C13 1 4 5
1 D C14 1 4 9
1 D C15 1 5 5
1 E C16 1 6 3
1 E C17 1 6 5
1 E C18 1 6 8
1 E C19 1 7 4
1 F C20 2 1 6
1 F C21 2 1 8
1 G C22 3 1 10
1 G C23 3 2 10
1 G C24 3 3 10
1 G C25 3 4 10
1 G C26 3 5 10
1 G C27 3 6 10
1 G C28 3 7 10

fiddle

Sql相关问答推荐

Snowflake SQL比较克隆x原始计数

PostgreSQL:获取每家店铺收入最高的员工

Trino/Presto sq:仅当空值位于组中第一个非空值之后时,才用值替换空值

Stack Exchange站点上的最短帖子(按正文长度计算,用户名为原始发帖(SEDE))

SQL:如何将相应位置的两个数组中的元素组合在一起

UPDATE查询中的乐观锁

我可以在SQL的IN子句中使用比子查询包含的值更少的值吗?

为什么我的SQL标量函数有时会抛出";子查询返回多个值.这是不允许的.

SQL将三个表中的三列组合为一列

将二维数组的第一个和第二个元素取消嵌套到两个一维数组中

YEAR 函数仍然不可SARGable 吗?

AdventureWorks 查询

SQL 查询是否返回列表中仅包含某些值而不包含其他值的行?

如何使用 Google BigQuery 中的条件根据特定列值连接列的 N 行?

Postgres如何在一个日历周中前进和回填值

在Snowflake中如何使用SQL对版本字符串进行排序?

BigQuery Pivot 遗漏行

过滤具有一对多关系的两个表之间的数据

PostgresQL-根据另一列找到 3 个最低值

在 Microsoft SQL Server 中,如何只为特定值保留不同的行?