2007.05.29 10:39 PM
SQL Server 2005 Full-text Search and XML Attributes
There are only a couple of documents on Microsoft's site that describe the full-text search indexing of attribute values in XML columns. The first, XML Best Practices for Microsoft SQL Server 2005, says:
You can create a full-text index on XML columns; this indexes the content of the XML values while ignoring the XML markup. Attribute values are not full-text indexed (since they are considered part of the markup) and element tags are used as token boundaries.
The other, XML Indexes in SQL Server 2005, says:
To search XML documents or do content-sensitive queries on text, you can combine SQL Server full-text search with XQuery. Full-text search will index the text nodes of XML documents, though not the elements or attributes.
This one is a little vague and might be referring to just attribute names, but we know that attributes don't have "text nodes":
Characters inside comments, processing instructions and attribute values do not produce text nodes.
So together these documents seem to suggest that the values of XML attributes are not indexed by the full-text search engine.
But they are. And it's pretty easy to prove.
Connect to a SQL Server 2005 instance that has full-text search installed (run a SELECT FULLTEXTSERVICEPROPERTY('IsFullTextInstalled') query to verify that it is installed).
Use or create a SQL Server 2005 database. If created using DDL, full-text search will be enabled by default; if created with the visual tools in SQL Server Management Studio, you have to check the "Use full-text indexing" option on the General properties page. If full-text search isn't enabled, either because you're using an existing database or you used the visual tools and forgot to check the option, you can enable it with the visual tools or run "exec sp_fulltext_database enable", even though the latter is considered deprecated (I'm not sure what the alternative will be when it is "removed in a future version of Microsoft SQL Server").
Now add a full-text catalog to the database (make sure the target folder exists and that the SQL Server service account has CRUD permissions on it):
CREATE FULLTEXT CATALOG FTCatalog IN PATH 'fully-qualified folder path' AS DEFAULT
Create a table having an XML column and a column suitable for use as a unique key:
CREATE TABLE SearchContent ( ContentID bigint IDENTITY(1,1) NOT NULL, ContentXML xml NOT NULL )
Create a unique index or primary key constraint:
CREATE UNIQUE CLUSTERED INDEX IX_SearchContent_ContentID ON SearchContent ( ContentID ASC )
Finally, create a full-text index on the table's XML column:
CREATE FULLTEXT INDEX ON SearchContent (ContentXML) KEY INDEX IX_SearchContent_ContentID ON FTCatalog WITH CHANGE_TRACKING AUTO
Now insert some data into the table:
INSERT INTO SearchContent(ContentXML) VALUES('<root><e1>element value</e1></root>') INSERT INTO SearchContent(ContentXML) VALUES('<root><e1 attrib="attribute value"/></root>')
The following queries clearly show that attribute values are indexed by the full-text search engine:
SELECT * FROM SearchContent WHERE CONTAINS(ContentXML, 'element') ContentID ContentXML 1 <root><e1>element value</e1></root> SELECT * FROM SearchContent WHERE CONTAINS(ContentXML, 'attribute') ContentID ContentXML 2 <root><e1 attrib="attribute value" /></root> SELECT * FROM SearchContent WHERE CONTAINS(ContentXML, 'value') ContentID ContentXML 1 <root><e1>element value</e1></root> 2 <root><e1 attrib="attribute value" /></root>
So there you go.
Just found this. Sounds like exactly what I was looking for! Will let you know; thanks!
Pennidren | 2007.07.12 12:48 PM
Oops, that comment was meant for the normalization post (FullTextSearch) :)
Pennidren | 2007.07.12 12:50 PM
Eric, I sent u an email at firstname.lastname@example.org so I could manage the content better. This form is unfamiliar.
Jerry E. Allgood | 2007.08.31 02:00 PM
Jerry - Okay, I'll look for it.
ewbi.develops | 2007.08.31 02:22 PM
There's more to it. Im working on a project and just struck the prob. My xml field is mostly attribute based. It has an FT defined on it. The funny thing is that the ft does find an attribute value, but only !on the most recent entry in the table! and not on others. I with they either enabled it or disabled it altogether.
Nik | 2007.12.18 07:01 AM
I can't reproduce what I think you're describing. If you have the time, I would really appreciate you describing your schema and a couple of rows that will reproduce this behavior. Btw, did you try the example given in the post above? Did it produce the same results you described, where only the last inserted row was returned?
ewbi.develops | 2007.12.18 08:04 AM
I can't get xml attributes to get indexed. Do you think that DB compatibility level may play a role there? My DB is still on 7.
rajko | 2008.03.04 10:00 AM
Do you mean it's a SQL Server 2005 instance, but the particular database has a compatibility level of SQL Server 7.0? Also, when you say you can't get xml attributes indexed, do you mean Full Text Search indexing? And what have you done to prove to yourself that they haven't been indexed?
Let me know and I'll try to help.
ewbi.develops | 2008.03.04 11:18 AM
Is this a bug? Since they were attened to avoid attribute value or it is a documentation bug. Thanks anyway. If I were depending on only their doc I would never think it works this way.
Routh | 2008.04.10 01:17 PM
Is there any way to turn off the attribute indexing? I'd actually like SQL Server not to take the attributes into account when I do a search, but as you write in the post, the attributes are indexed.
Thomas L | 2009.02.25 02:03 AM
Thomas, sorry for the delay - not being notified of comments by TypePad. I don't think there's anyway to explicitly avoid indexing attributes; however, I'm now experiencing inconsistent attribute value indexing depending on whether the attribute includes whitespace. Will try to figure out and post more later.
ewbi.develops | 2009.03.27 03:57 PM
Yes this is an interesting problem, we use the XML column to store XHTML data so obviously don't want to index the Attributes as this will means that FTS gets hits on things like style="font-weight: bold;" etc etc
Anyone know if there is a way to turn off attribute indexing....?
David James | 2009.05.06 02:46 AM
You can't disable attribute indexing but there is a trick you can do to prevent attributes from being searched.
SELECT * FROM T WHERE CONTAINS(xCol,'custom') AND xCol.exist('//*/text()[contains(., "custom")]') =1
jarrin | 2010.01.23 02:08 PM
Nice tip, Jarrin - thanks for sharing it!
ewbi.develops | 2010.01.23 08:07 PM
I’m working on a project where we need to do a full text search on XML documents in a SQL 2005 Server. The element values are found but not the attributes. I tried your example and it works but if I add a sub node, the CONTAINS wont find the attribute value.
set ContentXML = ''
where ContentID = 2
Have you been able to index the attribute values on deeper XML nodes?
Mathias B | 2010.03.30 05:40 AM
Unfortunately, the TypePad comment system ate the XML in your SQL statement above. However, I get your question. A quick test using the same setup described in the post above produced the same results you're reporting, which is that attribute values on elements more than one level deep (counting from the root element) aren't indexed. The element text node values are indexed, but not their attribute values.
That's not a good thing. I'm going to poke around some more and see if this is still the case in SQL Server 2008 and whether Microsoft is reporting this.
Thanks for bringing this to everyone's attention.
ewbi.develops | 2010.03.30 12:16 PM
ewbi.develops | 2010.03.30 12:19 PM
Thanks for a quick reply! And yeah the XML part disappeared :)
I’ve started a new thread specific for SQL Server 2005 with a similar example like yours above just to clarify the problem.
Mathias Bartoll | 2010.03.30 12:42 PM
I Think the search engine is a great invention, because, in my case, i love to surf the web, and a search engine is essencial!!!
'course the google is the best one, yahoo is a good one too, but a MILES away from the google supremacy!!!
Can't wait to see the new Google Caffeine, i'm so excited!!!!!!
generic viagra | 2010.05.14 03:11 PM
TrackBack URL: http://www.typepad.com/services/trackback/6a00d8341c7bd453ef00d8357fa7e269e2
Listed below are links to weblogs that reference SQL Server 2005 Full-text Search and XML Attributes: