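How can I use two aggregate functions (for example two string_agg calls, or just sum) and make sure the result does not contain duplicates caused by the other aggregate (introduced by the second join)? I am using PostgreSQL.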

Example

I have three tables:

create table boxes
(
    id   bigserial primary key,
    name varchar(255)
);

create table animals
(
    id     bigserial primary key,
    name   varchar(255),
    age    numeric,
    box_id bigint constraint animals_boxes_id references boxes
);

create table vegetables
(
    id     bigserial primary key,
    name   varchar(255),
    weight numeric,
    box_id bigint constraint vegatables_box_id references boxes
);

Some input data:

insert into boxes (name) values ('First box');
insert into animals (box_id, name, age) values (1, 'Cat', 2);
insert into animals (box_id, name, age) values (1, 'Cat', 3);
insert into animals (box_id, name, age) values (1, 'Dog', 5);
insert into vegetables (box_id, name, weight) values (1, 'Tomato', 20);
insert into vegetables (box_id, name, weight) values (1, 'Cucumber', 30);
insert into vegetables (box_id, name, weight) values (1, 'Potato', 50);

I want to get the names of the animals in each box:

select b.name                                 as box_name,
       string_agg(a.name, ', ' order by a.id) as animal_names
from boxes as b
         left join animals a on b.id = a.box_id
group by b.name;

This works:

box_name  | animal_names
First box | Cat, Cat, Dog

But I also want the names of the vegetables, and that doesn't work:

select b.name                                 as box_name,
       string_agg(a.name, ', ' order by a.id) as animal_names,
       string_agg(v.name, ', ' order by v.id) as vegatable_names
from boxes as b
         left join animals a on b.id = a.box_id
         left join vegetables v on b.id = v.box_id
group by b.name;

It produces duplicates in both the animal names and the vegetable names:

box_name  | animal_names                                | vegatable_names
First box | Cat, Cat, Cat, Cat, Cat, Cat, Dog, Dog, Dog | Tomato, Tomato, Tomato, Cucumber, Cucumber, Cucumber, Potato, Potato, Potato

The result should be:

box_name  | animal_names  | vegatable_names
First box | Cat, Cat, Dog | Tomato, Cucumber, Potato

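I can't simply add distinct to remove the duplicates, because:

  • Names in a table can legitimately repeat (two animals named Cat). With distinct I would get Cat, Dog instead of Cat, Cat, Dog.
  • I use order by inside string_agg, and combining it with distinct raises ERROR: in an aggregate with DISTINCT, ORDER BY expressions must appear in argument list. Even if I remove the order by (string_agg(distinct a.name, ', ')), I still can't use it because of the first point.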
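The first point can be reproduced with a small sketch. It is only an illustration under stated assumptions: it runs on SQLite through Python's sqlite3 module rather than on PostgreSQL, it uses SQLite's group_concat as a stand-in for string_agg, and the tables are a trimmed copy of the ones in the question.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE boxes      (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE animals    (id INTEGER PRIMARY KEY, name TEXT, box_id INTEGER);
    CREATE TABLE vegetables (id INTEGER PRIMARY KEY, name TEXT, box_id INTEGER);
    INSERT INTO boxes (name) VALUES ('First box');
    INSERT INTO animals (box_id, name) VALUES (1, 'Cat'), (1, 'Cat'), (1, 'Dog');
    INSERT INTO vegetables (box_id, name)
    VALUES (1, 'Tomato'), (1, 'Cucumber'), (1, 'Potato');
""")

# DISTINCT removes the join-induced duplicates, but also the second,
# legitimate 'Cat'. group_concat is SQLite's counterpart of string_agg.
distinct_names = conn.execute("""
    SELECT group_concat(DISTINCT a.name)
    FROM boxes b
    LEFT JOIN animals a    ON b.id = a.box_id
    LEFT JOIN vegetables v ON b.id = v.box_id
    GROUP BY b.name
""").fetchone()[0]

# Sort after splitting, because SQLite does not promise an output order here.
distinct_list = sorted(distinct_names.split(","))
print(distinct_list)
```

The list holds only two names although the box contains three animals, which is why distinct is not an option here.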

More information

This happens with every aggregate function: string_agg, array_agg, json_object_agg, even sum.

Sum of the animals' ages:

select sum(a.age)
from boxes as b
         left join animals a on b.id = a.box_id
         -- left join vegetables v on b.id = v.box_id
group by b.name;

Without the second join the sum is correct (10); with the second join uncommented, the duplicated rows make it wrong (30).
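The numbers 10 and 30 can be checked with a minimal sketch. As an assumption for illustration, it runs the same joins on an in-memory SQLite database through Python's sqlite3 module instead of PostgreSQL; the row multiplication behaves the same way for this query.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; same tables and data as above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE boxes      (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE animals    (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, box_id INTEGER);
    CREATE TABLE vegetables (id INTEGER PRIMARY KEY, name TEXT, weight INTEGER, box_id INTEGER);
    INSERT INTO boxes (name) VALUES ('First box');
    INSERT INTO animals (box_id, name, age) VALUES (1, 'Cat', 2), (1, 'Cat', 3), (1, 'Dog', 5);
    INSERT INTO vegetables (box_id, name, weight)
    VALUES (1, 'Tomato', 20), (1, 'Cucumber', 30), (1, 'Potato', 50);
""")

# Each of the 3 animals is paired with each of the 3 vegetables: 9 rows.
joined_rows = conn.execute("""
    SELECT count(*)
    FROM boxes b
    LEFT JOIN animals a    ON b.id = a.box_id
    LEFT JOIN vegetables v ON b.id = v.box_id
""").fetchone()[0]

# One join: every animal appears once, so the sum is 2 + 3 + 5.
sum_one_join = conn.execute("""
    SELECT sum(a.age)
    FROM boxes b
    LEFT JOIN animals a ON b.id = a.box_id
    GROUP BY b.name
""").fetchone()[0]

# Two joins: every animal appears three times, so the sum triples.
sum_two_joins = conn.execute("""
    SELECT sum(a.age)
    FROM boxes b
    LEFT JOIN animals a    ON b.id = a.box_id
    LEFT JOIN vegetables v ON b.id = v.box_id
    GROUP BY b.name
""").fetchone()[0]

print(joined_rows, sum_one_join, sum_two_joins)
```

The row count shows where the error comes from: the aggregate sees each age once per vegetable in the same box.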

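Recommended answer

The basic problem: both joins multiply rows. Every animal row of a box is combined with every vegetable row of the same box (3 × 3 = 9 rows here), and the aggregates then run over that multiplied set. Aggregate each child table separately instead.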

For a small selection, aggregation per row is typically faster.
With LATERAL subqueries (more versatile):

SELECT b.name AS box_name, a.*, v.*
FROM   boxes b
LEFT  JOIN LATERAL (
   SELECT string_agg(a.name, ', ' ORDER BY a.id) AS animal_names
   FROM   animals a
   WHERE  a.box_id = b.id
   ) a ON true
LEFT  JOIN LATERAL (
   SELECT string_agg(v.name, ', ' ORDER BY v.id) AS vegetable_names
   FROM   vegetables v
   WHERE  v.box_id = b.id
   ) v ON true;

Or with correlated subqueries (simpler, typically faster):

SELECT b.name AS box_name
    , (SELECT string_agg(a.name, ', ' ORDER BY a.id)
       FROM   animals a
       WHERE  a.box_id = b.id)  AS animal_names
    , (SELECT string_agg(v.name, ', ' ORDER BY v.id)
       FROM   vegetables v
       WHERE  v.box_id = b.id) AS vegetable_names
FROM   boxes b;
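
A minimal sketch of the correlated-subquery shape, under stated assumptions: it runs on SQLite via Python's sqlite3 rather than PostgreSQL, and it uses sum and count instead of string_agg so that the result does not depend on aggregate ordering. The column aliases total_age, total_weight and animal_count are hypothetical names chosen for the illustration.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; same data as in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE boxes      (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE animals    (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, box_id INTEGER);
    CREATE TABLE vegetables (id INTEGER PRIMARY KEY, name TEXT, weight INTEGER, box_id INTEGER);
    INSERT INTO boxes (name) VALUES ('First box');
    INSERT INTO animals (box_id, name, age) VALUES (1, 'Cat', 2), (1, 'Cat', 3), (1, 'Dog', 5);
    INSERT INTO vegetables (box_id, name, weight)
    VALUES (1, 'Tomato', 20), (1, 'Cucumber', 30), (1, 'Potato', 50);
""")

# Each subquery aggregates one child table on its own, so the two
# child tables never meet in a join and cannot multiply each other.
row = conn.execute("""
    SELECT b.name
         , (SELECT sum(a.age)    FROM animals a    WHERE a.box_id = b.id) AS total_age
         , (SELECT sum(v.weight) FROM vegetables v WHERE v.box_id = b.id) AS total_weight
         , (SELECT count(*)      FROM animals a    WHERE a.box_id = b.id) AS animal_count
    FROM boxes b
""").fetchone()

print(row)
```

The same shape carries over to string_agg in PostgreSQL, as in the query above; only the aggregate inside each subquery changes.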

When aggregating the whole table, this is faster:

SELECT b.name AS box_name, a.animal_names, v.vegetable_names
FROM   boxes b
LEFT   JOIN (
   SELECT box_id, string_agg(a.name, ', ') AS animal_names   
   FROM  (
      SELECT box_id, id, name
      FROM   animals a
      ORDER  BY box_id, id
      ) a
   GROUP  BY 1
   ) a ON a.box_id = b.id
LEFT   JOIN (
   SELECT box_id, string_agg(v.name, ', ') AS vegetable_names 
   FROM  (
      SELECT box_id, id, name
      FROM   vegetables v
      ORDER  BY box_id, id
      ) v
   GROUP  BY 1   
   ) v ON v.box_id = b.id;

Note how I sort in the subquery, which is typically faster than an ORDER BY inside the aggregate. This is an optional optimization.
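The aggregate-first, join-second pattern can be sketched as follows. The assumptions: SQLite via Python's sqlite3 stands in for PostgreSQL, sum replaces string_agg so the output is order-independent, and a hypothetical second, empty box is added to show what the LEFT JOIN returns when a box has no child rows.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL. 'Second box' is a
# hypothetical extra row that is not part of the original test data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE boxes      (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE animals    (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, box_id INTEGER);
    CREATE TABLE vegetables (id INTEGER PRIMARY KEY, name TEXT, weight INTEGER, box_id INTEGER);
    INSERT INTO boxes (name) VALUES ('First box'), ('Second box');
    INSERT INTO animals (box_id, name, age) VALUES (1, 'Cat', 2), (1, 'Cat', 3), (1, 'Dog', 5);
    INSERT INTO vegetables (box_id, name, weight)
    VALUES (1, 'Tomato', 20), (1, 'Cucumber', 30), (1, 'Potato', 50);
""")

# Each derived table collapses to at most one row per box_id before
# the join, so joining both to boxes cannot multiply rows.
rows = conn.execute("""
    SELECT b.name, a.total_age, v.total_weight
    FROM boxes b
    LEFT JOIN (
        SELECT box_id, sum(age) AS total_age
        FROM animals
        GROUP BY box_id
    ) a ON a.box_id = b.id
    LEFT JOIN (
        SELECT box_id, sum(weight) AS total_weight
        FROM vegetables
        GROUP BY box_id
    ) v ON v.box_id = b.id
    ORDER BY b.id
""").fetchall()

print(rows)
```

A box without animals or vegetables still appears, with NULL (None in Python) in the aggregated columns.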

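Aside, about the varchar(255) in your test setup: the length limit of 255 has no particular benefit in PostgreSQL, so text or varchar without a limit usually serves as well.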
