posted on Wednesday, August 31, 2005 2:19 PM
by
znichter
Using GUIDs - Inserts
Currently we are designing a database schema. When I heard they were using GUID's (Global Unique Identifier) as PK's (primary keys) my jaw about dropped. Turns out the reasoning behind the use of the keys was completely legitimate and actually is the reason GUID's were implemented by MS in the first place. They needed a key that would be unique across databases and database servers. The GUID certainly fulfills this need but it comes a cost.
The GUID is a wide column (16 bytes to be specific) and contains a unique combination of 33 hex value characters. This column, because it is the primary key, is going to be stored in, of course, the clustered index (unless specified to be a non-clustered index), and will be the page pointer for each leaf page in a non-clustered index. Also, if a GUID is used instead of an integer identity column then the bits need to be matched up for each row which is relatively fast. If a high volume of inserts are done on these tables then GUID's being large will contribute to page splits, as will the fact that NEWID() generates a random value, which could place a new record on any of the data pages will cause performance problems. Another reason is that typically primary keys are searched most frequently and trying to recall a GUID from memory to type into a query is nearly impossible (my feeble mind wouldn't be able to do it).
If you needed a key that would be unique across all servers I would recommend a composite key as the PK instead of the GUID. Use a single integer identity column to uniquely identify the rows and another column to identify the server, placing a default server id value and placing a check constraint on the server id column to validate that the value is only what you would expect for it to be for that server.
I have mocked a test to validate my theory on inserts only (results below), I'll get to the selects a little later, but I am interested in what the rest of you have found.
Test 1 - Completed in 79 seconds on test machine
Created table with the following syntax.
create table test1
(tid uniqueidentifier default newid() not null,
col1 char(1) not null
primary key (tid)) on [primary]
I used the following script to insert into the table.
declare @d1 datetime, @d2 datetime, @i int, @j int
select @d1=getdate(), @i=1, @j=100000
while @i <= @j
begin
insert test1 (col1)
values ('a')
set @i = @i + 1
end
select @d2=getdate()
select datediff(s,@d1,@d2)
Test 2 - Completed in 33 seconds on test machine
Created the second test table with the following syntax.
create table test 2
(tid int identity(1,1) not null,
sid int default 2 not null
check (sid=2),
col1 char(1) not null
primary key (tid,sid)) on [primary]
I used the following script to insert into the test2 table.
declare @d1 datetime, @d2 datetime, @i int, @j int
select @d1=getdate(), @i=1, @j=100000
while @i <= @j
begin
insert test2 (col1)
values ('a')
set @i = @i + 1
end
select @d2=getdate()
select datediff(s,@d1,@d2)
Let me know if you have feedback or different experiences on this. I am specifically interested in real world performance differences in the two approaches.
Performance is the key,
Zach
P.S. Please read the comments to this entry.