I like @erwin brandstetter's solution, but I want to show a solution that uses the USING keyword:
DELETE FROM table_with_dups T1
USING table_with_dups T2
WHERE T1.ctid < T2.ctid -- delete the "older" ones
AND T1.name = T2.name -- list columns that define duplicates
AND T1.address = T2.address
AND T1.zipcode = T2.zipcode;
If you want to see the records before deleting them, simply replace DELETE with SELECT * and USING with a comma (,), i.e.
SELECT * FROM table_with_dups T1
, table_with_dups T2
WHERE T1.ctid < T2.ctid -- select the "older" ones
AND T1.name = T2.name -- list columns that define duplicates
AND T1.address = T2.address
AND T1.zipcode = T2.zipcode;
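Update: I tested some of the different solutions here for speed. If you don't expect many duplicates, this solution performs much better than the ones with a NOT IN (...) clause, because those generate a lot of rows in the subquery.
If you rewrite the query to use IN (...), it performs similarly to the solution presented here, but the SQL code becomes much less concise.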
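For reference, here is a rough sketch of what such an IN (...) rewrite might look like. It is an assumption about the general shape, not a copy of any other answer in this thread; it reuses the table and column names from the example above and keeps the row with the highest ctid in each group, like the USING version.

```sql
-- Sketch only: one possible IN (...) formulation (assumed, not from the thread).
-- Rank the rows of each duplicate group by ctid, highest first, and delete
-- every row except the top-ranked one.
DELETE FROM table_with_dups
WHERE ctid IN (
    SELECT ctid
    FROM (
        SELECT ctid,
               row_number() OVER (
                   PARTITION BY name, address, zipcode  -- columns that define duplicates
                   ORDER BY ctid DESC                   -- rank 1 = highest ctid
               ) AS rn
        FROM table_with_dups
    ) ranked
    WHERE rn > 1  -- only the surplus rows end up in the IN list
);
```

With few duplicates the IN list stays short, whereas a NOT IN (...) list has to contain every row that is kept. One difference to be aware of: PARTITION BY puts NULLs into the same group, while the = comparisons in the USING version never match NULLs.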
Update 2: If you have NULL values in one of the key columns (which you really shouldn't), you can use COALESCE() in the condition for that column, e.g.
AND COALESCE(T1.col_with_nulls, '[NULL]') = COALESCE(T2.col_with_nulls, '[NULL]')
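As an alternative sketch, PostgreSQL's IS NOT DISTINCT FROM comparison treats two NULLs as equal, so no placeholder value is needed and there is no risk of the placeholder colliding with real data. The column name col_with_nulls is the same illustrative name as above; whether this performs as well as the COALESCE() form on your data is something you would have to test.

```sql
-- Sketch only: NULL-safe duplicate check without a sentinel value.
DELETE FROM table_with_dups T1
USING table_with_dups T2
WHERE T1.ctid < T2.ctid                 -- delete the "older" ones
AND T1.name = T2.name
AND T1.address = T2.address
AND T1.zipcode = T2.zipcode
AND T1.col_with_nulls IS NOT DISTINCT FROM T2.col_with_nulls;  -- NULL matches NULL
```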