Contains vs. Contains?
There was a pretty interesting question yesterday on the SQL Server 2005 XML Newsgroup:
OK here is my problem: I have a DB with a XML column which holds the xml representation of a document (Think Word XML) in each row. Now I need to find a specific document based on some small bit of data it contains (something like the authors name or the title but I have no context for where in the document this data may live (think keyword search)). Is it possible to find and return this document using SQL and XQuery and if so how? Or better yet where can I find an example.
Sure, this is possible, in fact it can be pretty simple as the following shows.
use scratch
go
drop table someTable
go
create table someTable(id tinyint identity(1,1),doc xml)
go
alter table sometable with nocheck
add constraint pk_someTable primary key clustered (id)
go
insert into someTable(doc) values ('Sam is Cat.')
insert into someTable(doc) values ('Spot is Dog.')
insert into someTable(doc) values ('Honey, where is the Cabela''s catalog?')
go
insert into someTable(doc) values ('He was best know, however, for
his dog eared copy of Blizard Voices.')
go
declare @keyword varchar(max)
set @keyword='Cat'
select id,doc
from someTable
where doc.value('contains(/,sql:variable("@keyword"))','bit')=1
go
The meat of this query is the contains function in the where clause. That's how XQuery supports a "find anywhere in the node" text match.
But that's not all of the story, of course. It makes sense that this query should find the record for "Sam is Cat," but make a small change and something maybe you didn't expect will happen. The change? Change "set @keyword = 'Cat'" to Change "set @keyword = 'cat'". Now it doesn't find Sam, but it does find my question about location of our favorite catalog. This happens because XQuery's contains simply says "does this pattern of characters occur in any part of the specified path." In a case senstive way. Thus, it only finds "catalog" but not "Cat." Poor Sam.
However, I don't know that I'd choose to actually use XQuery like this when Microsoft already has a "better" (in the sense of more efficient, faster and geeky) technology to apply to the problem. Yes, I'm talking about Full-Text-Search and yes, you can do that on XML Data, as the following demonstrates:
drop fulltext index on someTable
drop fulltext catalog tempCatalog
go
backup log scratch with truncate_only
go
create fulltext catalog tempCatalog
go
-- give the catalog time to work
waitfor delay '00:00:10'
create fulltext index on someTable(doc) key index pk_someTable
on tempCatalog
with change_tracking auto
-- give the index time to work
waitfor delay '00:00:10'
go
declare @keyword varchar(50)
set @keyword='dog'
select id,doc
from someTable
where contains(doc,@keyword)
go
Sometimes you just have to experiment to know what contains the kind of contains that contains the kind of seach that best contains the kind of data your searching for... :)
For more on FTS over XML, see: http://msdn.microsoft.com/library/en-us/dnsql90/html/sql2005ftsearch.asp