2007.12.12

Google Does Numeric HTML Entities

Ran into an interesting Google feature today while researching some obscure entities found in some scraps of HTML that I had to scrub, store, and index. Maybe everyone knows this, but it was new to me.

If a Google search contains a numeric HTML entity, in the form &#xxx;, Google will convert it to its proper value. So, for instance, if you submit a search for "Air &#225; Danser", it will return "Air á Danser" and perform the expected search. It will not do the same thing for the equivalent named entity reference, "Air &aacute; Danser".
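
For illustration, here is a minimal Python sketch of the behavior described above: decimal numeric references are converted to their characters, while named references pass through untouched. It only mimics what can be observed from the outside and is not Google's actual implementation; the function name decode_numeric_entities is made up for this example.

```python
import re

# Matches decimal numeric character references such as &#225;
NUMERIC_ENTITY = re.compile(r"&#(\d+);")


def _replace(match):
    code_point = int(match.group(1))
    # Leave anything beyond the Unicode range as it was written
    if code_point > 0x10FFFF:
        return match.group(0)
    return chr(code_point)


def decode_numeric_entities(text):
    """Convert decimal numeric references; leave named ones alone."""
    return NUMERIC_ENTITY.sub(_replace, text)


print(decode_numeric_entities("Air &#225; Danser"))    # numeric: converted
print(decode_numeric_entities("Air &aacute; Danser"))  # named: unchanged
```

Python's standard html.unescape would decode both forms, so the regular expression is used here on purpose to reproduce the numeric-only behavior.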

So when faced with an unfamiliar numeric entity, like &#8734; or &#8501;, finding out what it looks like is as easy as a Google search:

http://www.google.com/search?q=%26%238734%3B  =  ∞
http://www.google.com/search?q=%26%238776%3B  =  ≈
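
The query strings in these URLs are just the percent-encoded text of the entity itself. A small Python sketch, using only the standard library, shows how such a URL could be built and which character the entity stands for; the helper name entity_search_url is hypothetical.

```python
from urllib.parse import quote


def entity_search_url(code_point):
    """Build a Google search URL whose query is the numeric entity itself."""
    entity = "&#{};".format(code_point)
    # safe="" forces '&', '#' and ';' to be percent-encoded
    return "http://www.google.com/search?q=" + quote(entity, safe="")


for code_point in (8734, 8776):
    # chr() gives the character the entity refers to
    print(entity_search_url(code_point), " = ", chr(code_point))
```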

Lest one think this is a mere byproduct of a web page taking a submitted query value and returning it as the value of a text input element, consider that neither of the other two major search engines provides this feature.


Google is clearly evaluating the numeric entity, converting it to its proper character, and subsequently using the character in its search. Nice touch.



