我们有一个长期存在的问题,在我们的数据库中存在某个死锁.相互冲突的查询几乎总是这两个:
UPDATE db.tbl
SET pid = 68111
WHERE pid IS NULL
AND processed IS NULL
AND `time` <= now()
LIMIT 10
和
UPDATE `db`.`tbl`
SET `processed` = -1,
`updated_on` = NOW()
WHERE (`tbl`.`id` = 108588129)
The pid
, id
和 processed
values vary, but the queries are the same.
What I don't underst和 is what exactly could deadlock there, since the second query updates a specific record by its ID. There's just one lock to be had, but you need at least 2 to create a deadlock. There are also no transactions that would lock other rows - each of these queries is st和alone.
Why is this happening 和 how could I avoid it?
匿名/缩短show engine status innodb;
的输出:
------------------------
LATEST DETECTED DEADLOCK
------------------------
2023-06-08 08:16:27 0x7f70f3ef1700
*** (1) TRANSACTION:
TRANSACTION 2857804352, ACTIVE 0 sec fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 2480 lock struct(s), heap size 286928, 9581 row lock(s)
MySQL thread id 25966272, OS thread h和le 140144661681920, query id 847014117 x.x.x.x db_user updating
UPDATE db.tbl SET pid = ''68111'' WHERE pid IS NULL AND processed IS NULL AND `time` <= now() LIMIT 10
*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 2714 page no 86552 n bits 1552 index processed of table `db`.`tbl` trx id 2857804352 lock_mode X
Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: SQL NULL;
1: len 4; hex 8678cf9b; asc x ;;
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: SQL NULL;
1: len 4; hex 8678cfad; asc x ;;
<Snip a lot of Record Locks just like the one above>
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2714 page no 113955 n bits 160 index PRIMARY of table `db`.`tbl` trx id 2857804352 lock_mode X locks rec but not gap waiting
Record lock, heap no 92 PHYSICAL RECORD: n_fields 23; compact format; info bits 0
0: len 4; hex 8678ec61; asc x a;;
1: len 6; hex 0000aa56a25d; asc V ];;
2: len 7; hex 0100002bc01686; asc + ;;
3: len 4; hex 803ff583; asc ? ;;
4: len 4; hex 73746f70; asc stop;;
5: len 4; hex 80000002; asc ;;
6: len 1; hex 83; asc ;;
7: len 4; hex 80005737; asc W7;;
8: len 6; hex 4b4a38363431; asc KJ8641;;
9: len 8; hex 80000008a515e59b; asc ;;
10: len 4; hex 800001be; asc ;;
11: len 5; hex 99b050b41a; asc P ;;
12: len 4; hex 84ad8892; asc ;;
13: len 3; hex 736d73; asc sms;;
14: SQL NULL;
15: SQL NULL;
16: len 1; hex 7f; asc ;;
17: SQL NULL;
18: SQL NULL;
19: SQL NULL;
20: len 4; hex 53746f70; asc Stop;;
21: len 5; hex 99b050b41b; asc P ;;
22: len 5; hex 99b050b41b; asc P ;;
*** (2) TRANSACTION:
TRANSACTION 2857804381, ACTIVE 0 sec updating or deleting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 25966282, OS thread h和le 140122483259136, query id 847014368 x.x.x.x other_db_user updating
UPDATE `db`.`tbl` SET `processed` = ''-1'', `updated_on` = NOW() WHERE (`tbl`.`id` = 108588129)
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 2714 page no 113955 n bits 160 index PRIMARY of table `db`.`tbl` trx id 2857804381 lock_mode X locks rec but not gap
Record lock, heap no 92 PHYSICAL RECORD: n_fields 23; compact format; info bits 0
0: len 4; hex 8678ec61; asc x a;;
1: len 6; hex 0000aa56a25d; asc V ];;
2: len 7; hex 0100002bc01686; asc + ;;
3: len 4; hex 803ff583; asc ? ;;
4: len 4; hex 73746f70; asc stop;;
5: len 4; hex 80000002; asc ;;
6: len 1; hex 83; asc ;;
7: len 4; hex 80005737; asc W7;;
8: len 6; hex 4b4a38363431; asc KJ8641;;
9: len 8; hex 80000008a515e59b; asc ;;
10: len 4; hex 800001be; asc ;;
11: len 5; hex 99b050b41a; asc P ;;
12: len 4; hex 84ad8892; asc ;;
13: len 3; hex 736d73; asc sms;;
14: SQL NULL;
15: SQL NULL;
16: len 1; hex 7f; asc ;;
17: SQL NULL;
18: SQL NULL;
19: SQL NULL;
20: len 4; hex 53746f70; asc Stop;;
21: len 5; hex 99b050b41b; asc P ;;
22: len 5; hex 99b050b41b; asc P ;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2714 page no 86552 n bits 1552 index processed of table `db`.`tbl` trx id 2857804381 lock_mode X locks rec but not gap waiting
Record lock, heap no 1462 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: SQL NULL;
1: len 4; hex 8678ec61; asc x a;;
*** WE ROLL BACK TRANSACTION (2)
此外,还有一个简化表定义:
CREATE TABLE `tbl` (
`id` int NOT NULL AUTO_INCREMENT,
`pid` int DEFAULT NULL,
`processed` tinyint DEFAULT NULL,
`time` datetime DEFAULT NULL,
`created_on` datetime DEFAULT NULL,
`updated_on` datetime DEFAULT NULL,
`other` varchar(10) NOT NULL,
PRIMARY KEY (`id`),
KEY `ix_other` (`other`),
KEY `ix_processed` (`processed`),
KEY `ix_time` (`time`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
我还应该注意到,有很多行(当前超过5000行,但这个数字在一天中不断变化)有processed=null, pid=null
行,但将来有time
行.
Added:根据 comments ,有更多的背景信息:
There are currently a little over 5 million rows in this table. It's a sort of a queue 和 we get roughly 1.4M rows per month. The earliest rows are in February. When time
is in the future, it's almost always in the same day.
pid
和 processed
are related to the queue. When a row (job) is inserted, both pid
和 processed
are null
. time
is when the job should be performed (some jobs need to be performed only AFTER a specified moment in time; others need to be performed ASAP).
When one of the processors is ready to execute some jobs, it starts by running the first query 和 "reserving" a batch of 10 jobs for itself by settings its PID (Linux process ID) to those rows. It then selects those rows 和 works through them one-by-one, setting the processed
flag when it's done (the second query).
However sometimes those two queries clash 和 we get a deadlock.