2007.05.26 01:50 AM

Normalizing SQL Server Full-text Search Conditions

You've added a SQL Server 2005 full-text search catalog to your database, configured the appropriate table columns with full-text indexes and added your data. Now you want to expose the data to your users to, well, search, but you don't want to (or can't, maybe, if the search is internet-facing) instruct your users in the proper syntax for SQL Server 2005 full-text search CONTAINS and CONTAINSTABLE search conditions:

< contains_search_condition > ::= 
    { < simple_term > 
    | < prefix_term > 
    | < generation_term > 
    | < proximity_term > 
    | < weighted_term > 
    } 
    | { ( < contains_search_condition > ) 
    [ { < AND > | < AND NOT > | < OR > } ] 
    < contains_search_condition > [ ...n ] 
    } 
< simple_term > ::= 
          word | " phrase "
< prefix_term > ::= 
     { "word * " | "phrase *" }
< generation_term > ::= 
     FORMSOF ( { INFLECTIONAL | THESAURUS } , < simple_term > [ ,...n ] ) 
< proximity_term > ::= 
     { < simple_term > | < prefix_term > } 
     { { NEAR | ~ }
     { < simple_term > | < prefix_term > } 
     } [ ...n ] 
< weighted_term > ::= 
     ISABOUT 
        ( { { 
  < simple_term > 
  | < prefix_term > 
  | < generation_term > 
  | < proximity_term > 
  } 
   [ WEIGHT ( weight_value ) ] 
   } [ ,...n ] 
        ) 
< AND > ::= 
     { AND | & }
< AND NOT > ::= 
     { AND NOT | & !}
< OR > ::= 
     { OR | | }

What you want is for users to type in their search criteria just as they would in Google. Some words, maybe some quoted phrases, maybe a few operators, and have it just work. So, what to do?

Well, you could try and parse and rearrange the mixed bag of crap your users will submit into a valid normal form that CONTAINS and CONTAINSTABLE will accept. Or, if you're a .NET 2.0 developer, you can just use my FullTextSearch class, which will do it for you.

The constructor for FullTextSearch takes a search condition string and one or more bit-masked enum options. After construction, the originally submitted search condition and construction options are accessed via the Condition and Options properties and the properly formatted search condition is accessed via the NormalForm property. In addition, the search condition's terms are exposed as an array of strings via the SearchTerms property (see the comments in the code to learn why).

Here's an example of how to use FullTextSearch:

// Assumes using System.Data and System.Data.SqlClient, plus a connString defined elsewhere.
string condition = "some crazy search criteria submitted by a user...";

FullTextSearch fts = new FullTextSearch(condition);

using (SqlConnection conn = new SqlConnection(connString)) {
  conn.Open();
  using (SqlCommand command = conn.CreateCommand()) {

    string sql = @"
      SELECT 
        ItemID,
        ItemDate,
        ContentID,
        ContentXML,
        RANK() OVER (order by A.[RANK] DESC) Ranking
      FROM 
        SearchContent
      INNER JOIN
        CONTAINSTABLE(SearchContent, ContentXML, @Criteria) as A 
        ON A.[KEY] = SearchContent.ContentID
      ORDER BY 
        Ranking, ItemDate";

    command.CommandType = CommandType.Text;
    command.CommandText = sql;
    command.Parameters.AddWithValue("@Criteria", fts.NormalForm);

    using (SqlDataReader reader = command.ExecuteReader()) {
      while (reader.Read()) {
        // prepare presentation, maybe highlight search terms...
      }
    }

  }
}

Note that if you opt not to use a parameterized SqlCommand and instead add the FullTextSearch.NormalForm search condition to a SQL string yourself, be sure to double-up the single quotes first, perhaps using String.Replace like this:

string s = fts.NormalForm.Replace("'", "''");

The beauty of FullTextSearch is that, by default, it will never throw an exception, no matter how bad the raw search condition is that you give it (well, unless I've done something wrong). It will turn every search condition it is given into a syntactically valid CONTAINS and CONTAINSTABLE search condition. Of course, that doesn't mean that the resulting condition will reflect what the user meant or that it will result in any matches (after all, I'm not a magician). What it does mean, though, is that the SqlCommand will not throw an exception when executed because you submitted an invalid CONTAINS or CONTAINSTABLE search condition.

Alternatively, if you prefer to alert users to significant issues in their queries instead of letting FullTextSearch silently patch them up, then you can include one or more of the following FullTextSearchOptions in your construction options:

  • ThrowOnUnbalancedParens
    This condition occurs when a user fails to close an opened parenthetical subexpression, or closes one they never opened. If not thrown, FullTextSearch will ignore extra closes and automatically close those left open.
  • ThrowOnUnbalancedQuotes
    This condition occurs when a user fails to close an opened quoted phrase. If not thrown, FullTextSearch will close the quoted phrase at the end of the query and will treat any lone double quotes found inside it as intentional, doubling them up.
  • ThrowOnInvalidNearUse
    This occurs when a user applies a NEAR operator to a subexpression or to a term following a subexpression. NEAR requires two terms. If not thrown, FullTextSearch will switch bad NEARs to ANDs.

All of these conditions, if silently handled by FullTextSearch, have the potential to dramatically change a query's meaning. Of course, users screw up their searches all the time in Google and never realize it, so maybe it's not important to you. That's why they're optional. Here are some examples of search conditions that will cause FullTextSearch to throw exceptions when given the options above, and the resulting normalized search conditions if their exceptions are not thrown:

in:  )(cat hat and (sat or bat
out: ("cat" and "hat" and ("sat" or "bat"))

in:  (hat -"cat sat) or "bat rat"
out: ("hat" and not "cat sat) or ""bat rat")

in:  cat near (sat or hat) near bat
out: "cat" and ("sat" or "hat") and "bat"

Btw, I really wanted to front the code here using an ASP.NET handler and provide a textbox for sampling its behavior, like I did with my dynamic text graphics generator, but I'm unfortunately still using an ASP.NET 1.1 hosting plan and the code requires .NET 2.0, so you'll just have to use your imagination (or grab a copy and try it yourself).

Below are additional details regarding the behavior of FullTextSearch and an explanation of the other FullTextSearchOptions.

Operators
Most users, raised on Google, will enter search terms without specifying operators, perhaps with an occasional exclusion operator in the form of a minus sign. FullTextSearch will add an AND operator when no operator is specified and will interpret - as AND NOT. Support is also provided for &, +, , (comma), and ; (semicolon) for AND, | for OR, and ~ for NEAR. The NOT and ! operators negate AND, making it AND NOT; they must appear alone or in combination with AND (or its alternate symbols), because OR NOT and NEAR NOT are not valid. If NOT or ! is combined with OR or NEAR, it is treated as a stand-alone term. All full-text operators are binary, requiring two operands, so unary (leading) and dangling operators are ignored. If multiple operators are specified between terms, the last one wins. I considered adding a ThrowOnBadOperators option for all of these conditions, but didn't get around to it (if it's important to you, please let me know). Whitespace between terms and around operators is unimportant. Here are some examples:

in:  cat sat not bat
out: "cat" and "sat" and not "bat"

in:  cat, fat, bat
out: "cat" and "fat" and "bat"

in:  cat;fat;bat
out: "cat" and "fat" and "bat"

in:  cat & fat + bat and hat
out: "cat" and "fat" and "bat" and "hat"

in:  cat - fat ! bat &! hat and! mat +! sat
out: "cat" and not "fat" and not "bat" and not "hat" and not "mat" and not "sat"

in:  cat - fat not bat and not hat
out: "cat" and not "fat" and not "bat" and not "hat"

in:  cat | fat or bat
out: "cat" or "fat" or "bat"

in:  cat ~ fat near bat
out: "cat" near "fat" near "bat"

in:  ~cat -sat + ~ -  bat near
out: "cat" and not "sat" and not "bat"

in:  cat |! fat ~! bat or not hat
out: "cat" or "!" and "fat" near "!" and "bat" or "not" and "hat"
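
To make the alias and "last one wins" rules a little more concrete, here is a rough sketch of how such a lookup might be written. This is not the actual FullTextSearch code; OperatorAliases, TryGetOperator, and Resolve are hypothetical names invented for illustration, and the sketch assumes the operator tokens between two terms have already been split apart. It deliberately leaves out the special case where NOT or ! following OR or NEAR is treated as a stand-alone term.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of the published FullTextSearch class.
public static class OperatorAliases
{
    // Maps each accepted symbol or keyword to the operator it stands for.
    private static readonly Dictionary<string, string> Aliases =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "and", "and" }, { "&", "and" }, { "+", "and" }, { ",", "and" }, { ";", "and" },
            { "or", "or" }, { "|", "or" },
            { "near", "near" }, { "~", "near" },
            { "-", "and not" }, { "not", "and not" }, { "!", "and not" }
        };

    // Returns true and the canonical operator when the token is a known alias.
    public static bool TryGetOperator(string token, out string canonical)
    {
        return Aliases.TryGetValue(token, out canonical);
    }

    // Chooses the operator to place between two terms: the last recognized
    // token wins, and AND is assumed when no operator was supplied.
    // Assumption: the OR NOT / NEAR NOT special case is NOT handled here.
    public static string Resolve(IEnumerable<string> tokensBetweenTerms)
    {
        string chosen = "and";
        foreach (string token in tokensBetweenTerms)
        {
            string canonical;
            if (TryGetOperator(token, out canonical))
            {
                chosen = canonical;
            }
        }
        return chosen;
    }
}
```

With this sketch, an empty token list resolves to "and", and the tokens +, ~, - (in that order) resolve to "and not" because the minus sign comes last.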

Simple Terms
These are the individual words and multi-word phrases being sought. They can be specified with or without double quotes. Multi-word phrases must be double quoted. Double quotes inside of quoted phrases must be doubled up. FullTextSearch wraps every term in double quotes, trims its leading and trailing whitespace, and collapses inner runs of whitespace to single spaces. Single quotes are treated like any other character. Here are some examples:

in:  cat
out: "cat"

in:  "  cat      hat "
out: "cat hat"

in:  cat "sat on a hat"
out: "cat" and "sat on a hat"

in:  cat "sat on a hat" near "rat's ""fat"" bat"
out: "cat" and "sat on a hat" near "rat's ""fat"" bat"
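
The quoting rules above could be sketched along these lines. This is an illustrative guess at the behavior, not the code that ships in FullTextSearch; TermQuoting and QuoteTerm are made-up names, and the input is assumed to be the term's raw text with its delimiting quotes already removed and any doubled quotes already reduced to single ones.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of the published FullTextSearch class.
public static class TermQuoting
{
    // Trims the term, collapses runs of whitespace to single spaces,
    // doubles any embedded double quotes, and wraps the result in double quotes.
    // Single quotes pass through untouched, like any other character.
    public static string QuoteTerm(string rawTerm)
    {
        if (rawTerm == null) throw new ArgumentNullException("rawTerm");
        string collapsed = Regex.Replace(rawTerm.Trim(), @"\s+", " ");
        return "\"" + collapsed.Replace("\"", "\"\"") + "\"";
    }
}
```

Passing the raw text of the second example above (leading spaces, cat, a run of spaces, hat, trailing space) yields "cat hat", quotes included.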

Prefix Terms
Prefix term is a fancy way of saying terms that end with a wildcard, as denoted by an asterisk. The only tricky thing about prefix terms is that they are intended for "prefix" matching, not suffix or intra-word matching. So, an asterisk appearing anywhere other than at the end of a term, whether an individual word or multi-word phrase, is pointless. In fact, it's worse than pointless, because SQL Server full-text search doesn't index asterisks, so a term with an asterisk anywhere other than at its end will never match. Of course, users may not realize this, and may submit things like "c*t" or "c*t in the hat", neither of which will ever result in a match. However, if given the FullTextSearchOptions TrimPrefixTerms or TrimPrefixPhrases (which are included in the default constructor), FullTextSearch will massage terms containing intra-word asterisks into valid prefix terms. For individual word terms, this means discarding everything after the first asterisk. For multi-word phrase terms, this means cutting each asterisk-containing word back to the characters before its asterisk and appending a single asterisk to the end of the phrase. This trailing asterisk tells CONTAINS and CONTAINSTABLE that every word in the phrase is a prefix term (SQL Server full-text search doesn't support mixing prefix and non-prefix terms in a multi-word phrase - either every word in the phrase is a prefix term or none are). Here are some examples:

in:  cat*
out: "cat*"

in:  c*t
out: "c*"

in:  "c*t in the hat"
out: "c in the hat*"

in:  "c*t in the hat *"
out: "c in the hat*"
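
Here is one way the trimming described above might look in code. It is only a sketch written to reproduce the four examples, not the routine inside FullTextSearch; PrefixTrimming and TrimPrefixTerm are hypothetical names, the input is assumed to be the unquoted term text, and degenerate input such as a lone asterisk is not handled.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of the published FullTextSearch class.
public static class PrefixTrimming
{
    // Cuts every asterisk-containing word back to the characters before its
    // first asterisk, drops words that become empty, and appends a single
    // trailing asterisk if any asterisk was seen.
    public static string TrimPrefixTerm(string term)
    {
        if (term == null) throw new ArgumentNullException("term");
        string[] words = term.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        bool sawAsterisk = false;
        List<string> kept = new List<string>();
        foreach (string word in words)
        {
            string stem = word;
            int star = word.IndexOf('*');
            if (star >= 0)
            {
                sawAsterisk = true;
                stem = word.Substring(0, star);
            }
            if (stem.Length > 0)
            {
                kept.Add(stem);
            }
        }
        string joined = string.Join(" ", kept.ToArray());
        return sawAsterisk ? joined + "*" : joined;
    }
}
```

The same method covers both single words and phrases: a single word with an inner asterisk simply ends up as its leading characters plus the trailing asterisk.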

Generation Terms
Between the operator, term, and prefix syntaxes described above, and the subexpression formulations described below, there's probably not much else that anonymous users can be expected to know to perform a search. However, SQL Server full-text search offers another feature that would be nice to utilize without users having to do anything special: generation terms. Generation terms result in the matching of alternative word forms. This is done by passing the term desired into the FORMSOF() function and specifying either a THESAURUS or INFLECTIONAL lookup. I have no interest in the THESAURUS lookup, but the INFLECTIONAL lookup, which performs word stemming-based matching, seemed like a reasonable thing for users to expect. For instance, when searching for "cat" users will expect to match "cats", and when searching for "foot" they'll expect to match "feet", and when phrase searching for "cat mat" they'll expect to match "cat's mats". However, it's too much to ask users to learn the rules regarding FORMSOF(), and even if learned it's too much to ask them to type it for every term. So, if given the FullTextSearchOptions StemTerms and StemPhrases (which are included in the default constructor), FullTextSearch will wrap non-prefix individual word (with length > 1) terms and non-prefix multi-word phrase terms, that are not operands to NEAR operators, in calls to FORMSOF(INFLECTIONAL). Here are some examples:

in:  cat
out: formsof(inflectional, "cat")

in:  "cat in the hat"
out: formsof(inflectional, "cat in the hat")

in:  cat ha*
out: formsof(inflectional, "cat") and "ha*"

in:  mat and "ca* in the hat"
out: formsof(inflectional, "mat") and "ca in the hat*"

in:  cat near hat and mat
out: "cat" near "hat" and formsof(inflectional, "mat")

in:  (t* and cat) near hat
out: ("t*" and formsof(inflectional, "cat")) and formsof(inflectional, "hat")

That last one was a trick to see if you're paying attention. The NEAR was converted to AND, for reasons described below, so it was valid for FullTextSearch to apply FORMSOF(INFLECTIONAL) to the "hat" term.
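
The decision about when to wrap a term can be sketched roughly as follows. Again, this is an illustration of the rules as described, not the actual FullTextSearch logic; Stemming and ApplyStemming are invented names, the term is assumed to be unquoted text that has already been through prefix trimming, and the caller is assumed to know whether the term is an operand of a NEAR that survived normalization.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the published FullTextSearch class.
public static class Stemming
{
    // Quotes the term and, when allowed, wraps it in formsof(inflectional, ...).
    // Terms are left unwrapped when they are prefix terms, operands of NEAR,
    // or single words of length 1 or less.
    public static string ApplyStemming(string term, bool isNearOperand)
    {
        if (term == null) throw new ArgumentNullException("term");
        string quoted = "\"" + term.Replace("\"", "\"\"") + "\"";
        bool isPrefix = term.EndsWith("*", StringComparison.Ordinal);
        bool isPhrase = term.IndexOf(' ') >= 0;
        bool tooShort = !isPhrase && term.Length <= 1;
        if (isPrefix || isNearOperand || tooShort)
        {
            return quoted;
        }
        return "formsof(inflectional, " + quoted + ")";
    }
}
```

Keeping the NEAR check as a flag supplied by the caller reflects the point made above: whether a term may be stemmed depends on the operators around it, which is only known once the whole condition has been parsed.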

Subexpressions
CONTAINS and CONTAINSTABLE search conditions are really Boolean expressions. Each term (simple, prefix, and generation) in a full-text search is really a Boolean value resulting from an index lookup. The AND [NOT] and OR operators perform logical evaluations of their Boolean operands (the values appearing on their left and right sides) and return True or False. SQL Server supports parenthetically grouping these evaluations into Boolean subexpressions. These subexpressions are evaluated independently and their Boolean results are then evaluated just like any other True or False (term) appearing in the search condition. In fact, because ANDs are evaluated before ORs, subexpressions are required to successfully disambiguate certain AND and OR combinations. For instance, the condition "cat" AND "hat" OR "mat" will return all rows containing both "cat" and "hat", and rows that do not have "cat" and "hat", but do have "mat". To get all rows containing "cat" and either "hat" or "mat", you need a subexpression. It gets even more confusing when you toss in NOT, because it is evaluated before AND. Anyhow, here are some examples:

in:  cat (hat | mat)
out: "cat" and ("hat" or "mat")

in:  (cat hat) | mat
out: ("cat" and "hat") or "mat"

in:  (cat hat) | (mat -(pat and "rat bat"))
out: ("cat" and "hat") or ("mat" and not ("pat" and "rat bat"))

As mentioned above, NEAR is not a Boolean operator, it's actually a function that takes two term operands and returns a Boolean value. It's unfortunate that Microsoft gave it the same Boolean syntax as AND and OR. They really should have treated NEAR like FORMSOF() or ISABOUT() to make it clear how it is supposed to be used. Anyhow, what this means is that you can't NEAR a Boolean subexpression or Boolean-returning function, like FORMSOF(). When FullTextSearch finds a NEAR with a subexpression operand, it changes the NEAR to AND, unless given the ThrowOnInvalidNearUse option. Here are some examples:

in:  cat ~ (hat | mat)
out: "cat" and ("hat" or "mat")

in:  (hat | mat) ~ cat
out: ("hat" or "mat") and "cat"

in:  cat ~ (hat | mat) ~ sat
out: "cat" and ("hat" or "mat") and "sat"

Finally, as mentioned above, FullTextSearch is forgiving of unbalanced parentheses, ignoring unnecessary closures and closing unclosed subexpressions. Unless, of course, you give it the ThrowOnUnbalancedParens option.
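
The forgiving treatment of parentheses can be illustrated with a small sketch. This is not how FullTextSearch itself is written (it works on parsed tokens, not raw characters); ParenBalancing and Balance are hypothetical names, and the sketch assumes the input contains no quoted phrases, since parentheses inside quotes would need to be left alone.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of the published FullTextSearch class.
public static class ParenBalancing
{
    // Drops closing parentheses that have no matching open, and appends
    // closing parentheses for any subexpressions still open at the end.
    // Assumption: the input contains no quoted phrases.
    public static string Balance(string condition)
    {
        if (condition == null) throw new ArgumentNullException("condition");
        StringBuilder result = new StringBuilder();
        int depth = 0;
        foreach (char c in condition)
        {
            if (c == '(')
            {
                depth++;
                result.Append(c);
            }
            else if (c == ')')
            {
                if (depth > 0)
                {
                    depth--;
                    result.Append(c);
                }
                // Otherwise this close was never opened, so it is skipped.
            }
            else
            {
                result.Append(c);
            }
        }
        result.Append(')', depth);
        return result.ToString();
    }
}
```

Given the input )(cat hat and (sat or bat, the sketch skips the leading close and appends two closes, giving (cat hat and (sat or bat)).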


So there you go. In case you missed the earlier link, you can get a copy of FullTextSearch here. It includes an MIT-like license, which means with attribution you can use it any way you like. Just remember, my code occasionally isn't perfect, so you should be wary that if used improperly (i.e., without sufficient testing) it might piss off your users, give your DBA heartburn, or run you out of your home. Like the sidebar says: Proceed at your own risk. That said, if you find something wrong, please come back and let me know.


Comments

One drag concerning support for parenthetical precedence and non-quoted terms is that terms like 401(k) *must* be quoted by the user to avoid being normalized to "401" and ("k"), which will never match anything. I could probably special case closing parens after finding left parens in unquoted terms, but I'm not sure the cure would be better than the problem. Hmm.

ewbi.develops | 2007.05.31 08:42 AM

I am not a .net developer but would love to find out how to use your code. It is exactly what I am looking for.

Can you tell me how to create a class file with .net so that I can create a sql sp to call it?

JJ

JJ | 2007.07.09 08:31 AM

Hi JJ,

I'm afraid that question is a little broad for the tiny time and space I've got available for blogging. You might try starting here:

http://msdn2.microsoft.com/en-us/library/78f4aasd(VS.80).aspx
http://msdn2.microsoft.com/en-us/library/ms131089.aspx

I've actually gotten a couple of requests to build this as a SQL Server 2005 assembly, so I might be posting on this at a later date. Good luck.

ewbi.develops | 2007.07.09 02:54 PM

This is fantastic; great job!

I'm curious if you have any thoughts on how to tackle allowing metadata search criteria IN ADDITION to the full-text search.
Like the way gmail allows searching by date as well as email contents (ewbi AND before:2007/07/18 AND after:2007/07/15).

Pennidren | 2007.07.18 08:56 AM

Hi Pennidren,

Funny you should ask. For a current implementation I'm doing I considered integrating the metadata search criteria with the full-text search criteria using a syntax similar to what you've described. Parsing it was not a problem, but rearranging the resulting query (and getting the ANDs/ORs/subexpressions/etc. right) after removing the metadata aspects, which are applied to the actual query separately, and then stringing it all back together in an actual SQL query that preserved the Boolean/subexpression order/combinations proved impossible, for me anyway. I considered, then, allowing the metadata tags, but removing their Boolean operations (i.e., treating them all like adjunct ANDs). This was easier, but not expressive enough. In the end, I simply added properties for the metadata pieces to my search criteria class, which wraps a FullTextSearch object. The front-end then populates these properties explicitly with values collected from separate controls on an "advanced" search page. If you've got some ideas about how this might be done in-line, I'd love to hear them.

Thanks for taking the time to write!

ewbi.develops | 2007.07.19 08:25 AM

Gmail's interpretation of a given search string is either buggy or limited for performance reasons. If you want an example, I can drum one up. I played with it for a while recently and noticed some serious problems, imo. Perhaps they ran into trouble similar to what you did.

I had considered a form of advanced search, but it isn't expressive enough either -- you can't AND/OR these separated elements with search text.

I discussed this problem space with my wife and she suggested breaking the search down into each distinct implicit query and then using unions and joins to pull it all together. Perhaps this is what you mean when you say "stringing it all back together" into actual SQL.

Even if her idea worked, I fear that it wouldn't perform well enough. I'll probably try that out and if it doesn't work, go with advanced search and deal with the limitation imposed.

I'll let you know. Thanks!

Pennidren | 2007.07.31 06:25 AM

This is very useful.

I was doing a "clean contains" user-defined function on the T-SQL side, after some initial cleaning on the .NET side. I got to the point where I could support inflection forms of words and phrases. I just happened upon your post (and included code), and I think you have the right approach here :-)

Thanks so much :-)

Dale Newman | 2007.10.09 06:54 AM

Hi Dale,

You're welcome, hope it helps.

Btw, if I ever get time, I've got a version that includes removal of noise words during normalization to prevent users from getting no search results just because they toss in an occasional "an, of, the, etc", and it includes routines for highlighting search terms in text for presentation of results. Hope I get it posted up here soon!

ewbi.develops | 2007.10.09 12:02 PM

Hi
Thanks for the information on normalizing the search queries, I found it really useful in my current project.
I am interested in how you would go about to automatically remove the noise words from the query. Do you retrieve a list of noise words from SQL Server, or do you specify certain words as noise words?

Thanks

mjm | 2007.10.11 04:47 AM

Hi mjm,

Glad you found it useful. I couldn't find any way to get to the noise words from SQL Server, short of authoring some highly privileged SP there to cruise the file system and open the file directly. Rather than bother with that, I just added an overload to the FullTextSearch constructor to take an IEnumerable, allowing the caller to supply the noise words, which one assumes will match those being used by the targeted SQL Server database (where the caller gets them is their problem; however, in my case, we added a copy of the noise words file to our parent component's configuration). Then, the FullTextSearch ConditionParser PutToken method got some new conditional logic around the expression addition that determines whether the token is a noise word. Pulling noise words at that point allows the expression addition to silently patch up the operators (in case the user had, for instance, NEARed the noise word to another term). Then the code operator-walks the expression stack to determine the impact of the (now missing) noise word on the query and adds it to the highlight terms only if it is a net positive AND.

Hopefully I'll get this posted soon. Good luck!

ewbi.develops | 2007.10.11 10:47 AM

This looks excellent. Any chance you have a VB version? Thanks

Richard | 2007.10.23 02:01 AM

Hi Richard,

Thanks. Re the VB, I'm afraid not. I don't get much chance to work with VB these days and barely have time to prepare and post C# code here. Sometimes, though, as with some of my other posts, folks are kind enough to pick up, translate, and return versions in different languages. This one hasn't gotten much traffic, but perhaps someone will pick it up and convert it.

ewbi.develops | 2007.10.23 08:51 AM

Hi,
I've been using this for a couple of months now and I'm very happy with it. Thank you.

I thought I would give you feedback on one change I recently made.
My users sometimes have to search for part of a file path.
I soon realised that stemming and paths don't mix so well. I switched off StemPhrases and tried double quotes around the file paths. I thought that would fix it. It didn't.
I eventually found that your code strips the quotes from the string early on and then identifies phrases as terms with a space in them. With a bit of effort I've managed to change that so that phrases are now identified by double quotes.
This has a nice side benefit: if I quote a single word, it won't get stemmed.
I hope that is helpful info.

Thanks again.

Art | 2007.11.06 05:39 PM

That's great feedback, thanks Art. This implementation certainly exhibits characteristics unique to the needs of my client at the time. I don't recall the details now, but determining whether a token is a phrase (and subsequently applying to it phrase-like search behavior) based on whether it came in already quoted, as opposed to using quotes merely as parsing delimiters and then using the existence of space(s) in the token to determine whether it is a phrase, is a fine alternative approach, though it does result in a subtle difference in behavior, in that it becomes possible for users to opt certain tokens out of stemmed searches. I think we considered the former, but the client opted for the latter. It'd be nice if the code had implemented both and allowed the caller to opt in or out with one of the flags. Perhaps if I get time(!) I'll toss that in and re-post it, along with the noise word skipping and result highlighting logic I've already got done.

ewbi.develops | 2007.11.06 07:10 PM

That sounds like a great tool for a website, which is why I need to use it as a stored procedure under SQL Server 2005. Has anyone succeeded with this task? I was able to compile and create the SQL assembly, but I can't understand how to declare the sp linked to a specific class method.

Any help would be appreciated.

fabio.gava | 2008.01.18 05:55 AM

I repeated Fabio's instructions in a new post:

http://ewbi.blogs.com/develops/2008/01/fulltextsearch.html

Thanks Fabio!

ewbi.develops | 2008.01.29 11:38 AM

You are the man my friend!!!
Thank you so much.

Eric | 2008.02.09 07:41 PM

Thanks so much, I found it very useful. You saved me a lot of time...

jozo rybarik | 2008.03.14 01:39 AM

I've been using this and it works great. How would this work with SQL 2000 FTS? Would the returned string be accepted by SQL 2000? Thanks.

tf | 2008.03.18 12:07 PM

tf, glad to hear you've found this useful. You ask a good question, but I'm afraid I haven't looked at the SQL Server 2000 FTS syntax closely enough to know the differences. If you try it and run into any issues, we'd appreciate you coming back and letting us know. Good luck!

ewbi.develops | 2008.03.18 12:11 PM

Thanks! This is a huge improvement over just feeding in what the user types. The one big issue you already know about: noise words. Users often type them and are more than perplexed when they get back no results.

However, for users raised on Google, there are a couple of operators that they interpret differently in Google-land: + and ~. The fix seems reasonably doable...

First, don't treat + or ~ as operators... instead, treat them as prefix characters (kinda like + is a postfix character). Then, when you form your phrase for the term, if the first character is +, don't put the stemming FORMSOF wrapper around it. Or, if the first character is ~, use THESAURUS instead of INFLECTIONAL in that wrapper.

Issues in that approach? How hard would it be to take + out of the list of AND's and ~ out of the list of NEAR's?

BMK | 2008.03.27 03:51 PM

Uhh... I meant "(kinda like * is a postfix character)"... sorry, that was a confusing typo.

BMK | 2008.03.27 03:53 PM

Hi BMK, thanks for the comments. It shouldn't be too hard to take those characters out of the AND and NEAR logic, but adding the behavior you describe might be tricky. There are some unfortunate complexity issues caused by the approach I took to parsing search strings which I think might impact this. If I ever get back to this (work continues to take all my time), I'll definitely give this a look as well. Good luck!

ewbi.develops | 2008.03.28 03:31 PM

Back to thank you again for this useful tool. I just want to let you know that I had to remove the "-" (minus) char as an "and not" operator, since it's too "intrusive": it should be replaced only if it's preceded by a space, otherwise a search like close-up is translated as close AND NOT up, which leads to no results. If you test in Google you'll see the difference between close-up and close -up.
Just my $0.02.

fabio.gava | 2008.04.16 06:08 AM

Hi Fabio, I actually fixed that "-" behavior some time ago in my production versions, but like everything else I've done to it, I just haven't had time to post the changes here. Thanks for mentioning it.

ewbi.develops | 2008.04.16 01:49 PM

Thanks for this. I have the equivalent in VBScript, but this was easier than converting it:-) I have just implemented it as an [SQLFunction] and it looks good on initial tests.

I see you have advanced the code since your original post. Is there any way to get your latest and greatest? BTW, www.codeplex.com comes to mind as a good place for this project.

Neil Burnett | 2008.05.29 06:54 AM

Hi Neil,

Thanks for the comment. I'm afraid work and a general lack of time prevent me from posting the latest code. I don't suppose I'll be creating a project on codeplex for the same reason. However, the code is available for anyone to use in any way they like, as long as the attribution copyright remains, so maybe someone else will pick it up and run with it.

ewbi.develops | 2008.05.29 09:39 AM

Thanks for this - you've saved me many hours of head banging.

BKahuna | 2009.04.16 10:11 AM

Hi,

Many thanks for this. Would it be possible to get the fix for the "-" behavior mentioned above? Or even a pointer on how to fix it?

Thanks

James

James Spibey | 2009.06.17 02:26 AM

Hi James,

Unfortunately the versions of this that fix the hyphen are intertwined with lots of other behavior that I can't (or don't have time to) publish.

Off the top of my head, though, it strikes me that a quick fix would be to find and replace, with a Regex, all instances of "-" not preceded and followed by a whitespace character with a unique string (e.g., "zzzz", "myhyphenfix", etc.) in the ConditionStream constructor. Then, allow the parsing to proceed as normal, but in the PutToken method, just before the currentExpression.AddTerm call, swap out the unique string with a hyphen.

Not pretty, but effective, I think. Good luck!
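
For anyone wanting to try the workaround, the two halves of it might look something like the sketch below. It is only an illustration: HyphenGuard, Protect, Restore, and the placeholder value are all made up, the sketch reads "not preceded and followed by whitespace" as "has a non-whitespace character on both sides", and where exactly the calls go (the ConditionStream constructor and the PutToken method) is left to the reader.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of the published FullTextSearch class.
public static class HyphenGuard
{
    // Hypothetical placeholder; pick any string that cannot occur in real queries.
    public const string Placeholder = "myhyphenfix";

    // Hides hyphens that have a non-whitespace character on both sides
    // (as in close-up) so the parser does not see them as AND NOT.
    // A hyphen preceded by whitespace (as in close -up) is left alone.
    public static string Protect(string condition)
    {
        if (condition == null) throw new ArgumentNullException("condition");
        return Regex.Replace(condition, @"(?<=\S)-(?=\S)", Placeholder);
    }

    // Puts the hyphens back into a parsed term just before it is added
    // to the expression.
    public static string Restore(string term)
    {
        if (term == null) throw new ArgumentNullException("term");
        return term.Replace(Placeholder, "-");
    }
}
```

Protect would run on the raw condition before parsing and Restore on each term afterwards, so close-up survives as a single term while close -up still excludes up.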

ewbi.develops | 2009.06.17 09:03 AM

Thank you for the workaround of "hyphen" issue, it works.

Igor | 2011.01.04 04:09 PM

