您的问题是,在找到与NOT EXISTS
匹配的10000
行之前,它需要判断的客户行数每批都在增加.
匹配行的比率将稳步下降,直到最后一批您扫描全部1.98亿行以找到最后10,000行.
你在做200个批次.平均每个批次读取Customers
行中的1亿行(最早的批次要少得多,后面的批次更多)-仅从该表读取的总行就达到200亿行,在CustomerInvoices
行中读取的行数也差不多.
如果执行计划是连续扫描,那么很可能每个批次都会判断所有已经处理过的批次,并在最终到达感兴趣的批次之前发现不符合条件.
您可以创建具有顺序整型列的临时表...
DECLARE @LastRow INT
CREATE TABLE #DeleteCandidates(Id int PRIMARY KEY, CustomerId INT);
INSERT #DeleteCandidates
SELECT ROW_NUMBER()
OVER (
ORDER BY (SELECT 0)) AS Id,
Customers.CustomerId
FROM Customers
WHERE NOT EXISTS (SELECT *
FROM CustomerInvoices
WHERE CustomerInvoices.CustomerId = Customers.CustomerId)
SET @LastRow = @@ROWCOUNT
然后编写一些代码来处理包含Id
个范围的<batch_size>
个块的临时表.
例如,如下所示
DECLARE @BatchSize INT = 10000
DECLARE @MinId INT = 1
WHILE @MinId <= @LastRow
BEGIN
DELETE FROM Customers
WHERE Customers.CustomerId IN (SELECT dc.CustomerId
FROM #DeleteCandidates dc
WHERE dc.Id >= @MinId
AND dc.Id < @MinId + @BatchSize)
AND NOT EXISTS (SELECT *
FROM CustomerInvoices/*WITH (HOLDLOCK )*/
WHERE CustomerInvoices.CustomerId = Customers.CustomerId)
SET @MinId = @MinId + @BatchSize
END
如果有插入,您仍然需要在实际的DELETE
上加上NOT EXISTS
,因为该标识意味着删除候选者不再符合条件.
您还可以考虑使用HOLDLOCK
提示来处理DELETE
查询本身正在运行时真正并发插入的可能性.