posted on Wednesday, July 25, 2007 5:43 AM
by
mz1313
Storage engine and locking - nondeterministic behavior
This post (or maybe series of 2 posts) is inspired by short discussion in one of the Israely SQL Server user group meetings; Ami Levin - SQL Server MVP - mentioned that when you query first row of the page, SQL Server goes not only to the page itself but also to the previous page because the same value can be in the end of the previous page also. Ami said that he himself had found it together with Itizk Ben-Gan after hot discussion that almost ended in a fight :-).
Background for the beginning. The problem occurs when SQL Server uses index seek. It starts from non-leaf levels of an index. Non-leaf level row contains PageID and indexed column values of the first row on the leaf-level page (I'm talking about the lowest non-leaf level row). Here is example of the data on such page (data refers to the demo below; c1 and c2 are indexed columns):
| PageID | Col1 | Col2 |
| 171 | 476 | 476 |
| 173 | 952 | 952 |
| 175 | 1428 | 1428 |
It means that Col1 value in the first row of page 171 is 476, in the first row of page 173 is 952 etc. But nobody can garantee that 952 can't be the value of Col1 also in the last row of page 171. Or in some number of last rows - nobody garanteed that index on Col1 is unique. Although all this explanation looked pretty logical, I still found it hard to believe that SQL Server behaviour can be such undeterministic - same query on the same data sometimes locks 1 row and sometimes more. So I decided to check. I created table with 2 integer columns c1 and c2, both are primary key though in fact c1 is also unique.
IF EXISTS( SELECT 1
FROM sys.databases WHERE [name] =
'LocksDemo' )
DROP DATABASE LocksDemo
GO
CREATE DATABASE LocksDemo
GO
USE [LocksDemo]
GO
CREATE TABLE Demo( c1
INT NOT NULL, c2
INT NOT NULL )
GO
ALTER TABLE Demo
ADD CONSTRAINT PK_Demo
PRIMARY KEY CLUSTERED (c1, c2)
GO
set nocount on
begin
declare @i
int
set @i = 0
while @i < 2000
begin
insert into Demo
values (@i, @i)
set @i = @i + 1
end
end
GO
My DBID is 7, objectID of Demo table is 2137058649. First of all let's see what pages does Demo table occupy.
DBCC TRACEON(3604); GO;
DBCC IND(7, 2137058649, -1);
The table contains one IAM page, 1 non-leaf level index page and 5 leaf-level clustered index pages (data pages actually).
BTW, question for storage engine geeks: data page in my example contains 476 records. Record lenght is 15 byte (for the explanation of what else besides data each row contains see this great book by Kalen Delaney). FLOOR(8060 / 15) = 537. So I counted 537 should-be-rows but you can check DBCC PAGE yourself - page contains only 476 rows. So where is the space for another 61 rows? Hint: fillfactor is not the answer.
In my case first row of page 171 (second leaf-level page) starts from row (476; 476). Le't start from second row in order to see IO and locks:
set statistics io on
begin tran
select c1, c2
from Demo
with(rowlock, xlock)
where c1 = 477
exec sp_lock
commit tran
Since shared lock on row is released immidiately after row had been read, I lock it exclusively using
xlock hint in order to be able to see the lock using
sp_lock. The result is as expected: 2 reads (one non-leaf level page and one leaf-level), 1 exclusive key lock (plus intent exclusive on page and on table). Now let's query first row of the page:
set statistics io on
begin tran
select c1, c2
from Demo
with(rowlock, xlock)
where c1 = 476
exec sp_lock
commit tran
Now we have 3 reads, still 1 key lock but 2 intent exclusive locks on pages. So it really goes to previous page. OK, let's try to lock previous page and see what happens.
| Session 1 (SPID 53) | Session 2 (SPID 55) |
begin tran
select c1, c2 from Demo with(paglock, xlock)
where c1 = 470 |
|
| select c1, c2 from Demo
where c1 = 476 |
It is really waiting for the first transaction to release the lock on page that doesn't contain any rows that answer to the query criteria! Here is what
sp_lock shows:
| SPID | Type | Resource | Mode | Status |
| 53 | PAG | 1:169 | X | GRANT |
| 55 | PAG | 1:169 | IS | WAIT |
Just wanted to remind that the relevant row is on page 171, not on 169. Now let's try to reverse scan order of session 2 and see what does it lock:
| Session 1 (SPID 53) | Session 2 (SPID 55) |
begin tran
select c1, c2 from Demo with(paglock, xlock)
where c1 = 470 |
|
| select c1, c2 from Demo
where c1 = 476
order by c1 desc, c2 desc |
Results of
sp_lock:
| SPID | Type | Resource | Mode | Status |
| 53 | PAG | 1:169 | X | GRANT |
| 55 | PAG | 1:169 | IX | WAIT |
| 55 | PAG | 1:171 | IX | GRANT |
Session 2 has read the relevant row on page 171 and now waits for session 1 to release lock on page 169 in order to check whether it also contains relevant rows.
BTW, all this doesn't happen when you specify both c1 and c2 in search criteria:
select c1, c2
from Demo
where c1 = 476
and c2 = 476
Why? Because SQL Server knows that c1 and c2 together are unique (they're primary key, remember?). So if it is first row on page 171, there can't be another row with the same value on any other page. On the other hand, adding unique index on c1 doesn't help initial query - it still goes to previous page - but somehow it reverses scan order of session 2 from backward to forward. Don't have a clue, why it is implemented this way.
To end with - two funny consequences of all this mess. Do you think that select of 1 row that uses index seek in read committed isolation level can't cause deadlock? You're wrong!
| Session 1 (SPID 53) | Session 2 (SPID 55) |
begin tran
select c1, c2 from Demo with(paglock, xlock)
where c1 = 470 |
|
| select c1, c2 from Demo
where c1 = 476
order by c1 desc, c2 desc |
select c1, c2 from Demo with(rowlock, xlock)
where c1 = 476 and c2 = 476 |
|
Session 2 selected only 1 row and it was enough for deadlock. Session 2 was also the one to survive while session 1 had been chosen as deadlock victim.
Last trick: I add column c3 to the table and create unique index on it:
CREATE TABLE Demo( c1
INT NOT NULL, c2
INT NOT NULL, c3
INT NOT NULL )
GO
ALTER TABLE Demo
ADD CONSTRAINT PK_Demo
PRIMARY KEY CLUSTERED (c1, c2)
GO
CREATE UNIQUE INDEX IX_Demo_c3
ON Demo(c3)
GO
set nocount on
begin
declare @i
int
set @i = 0
while @i < 2000
begin
insert into Demo
values (@i, @i, @i)
set @i = @i + 1
end
end
GO
Now last record in the first page contains values (384, 384, 384), first row of the second page contains (385, 385, 385). And don't forget, c3 is unique.
| Session 1 (SPID 53) | Session 2 (SPID 55) |
begin tran
select c1, c2 from Demo with(paglock, xlock)
where c1 = 470 |
|
| select c1, c2 from Demo
where c1 = 476
order by c1 desc, c2 desc |
delete from Demo
where c1 = 385 and c2 = 385
update Demo
set c1 = 385, c3 = 385
where c1 = 384 and c2 = 384
commit tran |
|
Resultset of session 2:
| c1 | c2 | c3 |
| 385 | 385 | 385 |
| 385 | 384 | 385 |
Are you really, really sure c3 is unique?
I hope to write at least one more post on this theme - next time I will play around with different isolation levels.
*** Answer for the question about missing rows on page: remember, in the end of the page there is an offset array - for each row it contains an offset from the page start. Since we have pretty many rows on page, offset array also requires significant space.