2005.07.10 01:33 PM

Weird Anti-spam Email Filtering Bug

Just to show that I haven't become all FlexWiki all the time, I figured I'd write about a weird anti-spam email filtering bug that has been plaguing one of my web apps for a couple of years. The app includes some background processes that send email notifications to users when their various asynchronous jobs finish. The emails are in HTML format and include links (i.e., anchors) to the server-side files written by the users' jobs.

With few exceptions, this has worked well for many years. Once in a while, however, a user would report that the notification email they received contained an invalid link. After looking at the HTML source for a few of these emails, I found that the bad anchors were always bad in the same way: they were all missing the period just before the "htm" extension at the end of the anchor's href attribute. What was really weird, though, was that the file name in the href was repeated as the anchor's text, and that text was never missing the period.

Let me illustrate what I'm talking about. Here's an example of a good anchor:

<a href="http://server.client.com/appname/job.results/result.file.name.htm">result.file.name.htm</a>

Here's an example of a bad one (note the missing period before the "htm" extension in the href):

<a href="http://server.client.com/appname/job.results/result.file.namehtm">result.file.name.htm</a>

A quick check of the code responsible for writing the HTML showed that the variable used to write the file name in the anchor text was the same variable appended to the fully qualified path to build the anchor's href. Because the anchor-text copy of the file name was never missing the pre-extension period, the period had to be getting lost somewhere after my code generated the message. Time to look at the email pipeline.
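If you want to catch this kind of mismatch automatically, a check along these lines could work. It's a minimal Python sketch, not the app's actual code (which uses CDONTS), and it assumes, as in the examples above, that each anchor's text is exactly the file name at the end of its href. The names `AnchorCollector` and `mismatched_anchors` are hypothetical.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urlparse


class AnchorCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> element in an HTML document."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:  # only record text inside an open <a>
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


def mismatched_anchors(html):
    """Return (href, text) pairs whose href file name differs from the anchor text.

    Assumption: each anchor's text is exactly the file name at the end of its href.
    """
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    mismatches = []
    for href, text in collector.anchors:
        file_name = posixpath.basename(urlparse(href).path)
        if file_name != text:
            mismatches.append((href, text))
    return mismatches


good = '<a href="http://server.client.com/appname/job.results/result.file.name.htm">result.file.name.htm</a>'
bad = '<a href="http://server.client.com/appname/job.results/result.file.namehtm">result.file.name.htm</a>'
print(mismatched_anchors(good))  # []
print(mismatched_anchors(bad))
```

Comparing against the anchor text works here only because the text is a second copy of the same file name; anchors with descriptive text would need a different reference value.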

The code responsible for preparing and sending the emails uses an instance of the NewMail object from the CDONTS library, and the mail is shipped off the box by the standard Windows SMTP service to an SMTP gateway server somewhere on my client's network. A quick change to the local SMTP configuration let me re-send some known-bad examples and examine them while they sat in the server's SMTP queue awaiting delivery. As expected, the hrefs were all fine, which meant the periods were being dropped somewhere downstream.

Having satisfied myself that the code and the server's configuration were working as expected, and after sending a note to my client's network support staff asking for some information, I set off to find out why only some emails were bad while most were fine. I'll spare you the numerous iterations I ran through to rule out the possibilities and jump right to the punch line. What I discovered was that if the period before the extension of a file name in an anchor's href fell at exactly position 142 in the href's complete path, something downstream would drop it. If some other period landed at that position, such as one within the file name or in the name of a folder in the path, it was not dropped. Only a period appearing just before the file's extension, and only at position 142. Weird, huh?
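Here's a rough Python sketch of that test. It assumes "position" means a 1-based character index into the full href string, starting at the "h" in "http", and that hrefs carry no query string or fragment; the counting convention is an assumption, and only the number 142 comes from what I observed. The function names are hypothetical.

```python
SUSPECT_POSITION = 142  # observed value; 1-based counting is an assumption


def extension_period_position(href):
    """Return the 1-based position of the period before the file extension.

    Assumes the href has no query string or fragment, so the extension sits at
    the very end. Returns None if the last path segment has no period.
    """
    last_slash = href.rfind("/")
    dot = href.rfind(".")
    if dot <= last_slash:
        return None
    return dot + 1  # convert 0-based index to 1-based position


def in_danger_zone(href, position=SUSPECT_POSITION):
    """True if the pre-extension period lands exactly on the suspect position."""
    return extension_period_position(href) == position


prefix = "http://server.client.com/appname/job.results/"
print(extension_period_position(prefix + "result.file.name.htm"))  # 62
print(in_danger_zone(prefix + "x" * 96 + ".htm"))  # True
```

Note that the check looks only at the last period, which matches the observation that periods elsewhere in the path survived.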

Of course, this explained why only some emails had bad anchors. Unlike the example anchor shown above, real-life href lengths vary, because the "job.results" and "result.file.name" portions of the href vary by user name and certain job parameters. It also explained why we never saw this problem in emails originating from our production web server, only in those from our test staging server, even when the same user ran the same jobs with the same job parameters on both servers. The reason is that the "appname" portion of the href carries an extra "-test" suffix on the test staging server, and those 5 extra characters were just enough to push some href combinations into the 142-character sweet spot.
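To make the arithmetic concrete, this sketch computes the period's position for the same file name under a production-style and a test-style prefix. The host, folder names, and the exact spot where "-test" is appended are illustrative assumptions based on the example anchor, and `period_position` is a hypothetical helper.

```python
# Illustrative prefixes based on the example anchor; the real host and folder
# names differ, and exactly where "-test" is appended is an assumption.
PROD_PREFIX = "http://server.client.com/appname/job.results/"
TEST_PREFIX = "http://server.client.com/appname-test/job.results/"


def period_position(prefix, file_name):
    """1-based position, within prefix + file_name, of the file name's last period."""
    return len(prefix) + file_name.rfind(".") + 1


file_name = "r" * 91 + ".htm"  # same job output written on both servers
print(len(TEST_PREFIX) - len(PROD_PREFIX))  # 5
print(period_position(PROD_PREFIX, file_name))  # 137
print(period_position(TEST_PREFIX, file_name))  # 142
```

The same file name lands safely at 137 on production but hits 142 on the test server, which is the pattern we saw.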

Later I discovered that my client uses Syntegra, a professional services division of British Telecom, to provide anti-spam filtering of inbound and outbound email. I'm not certain, but I think they're using BT's Message Management Platform, which describes its anti-spam service as utilizing Brightmail, BT Content Filter, NAI, and Trend Micro. I don't know which of those products is causing this problem, but it's easy to imagine one of them having a bug in a function that tears apart the hrefs of anchors in HTML emails to examine their extensions; maybe some sort of unchecked overflow on a variable with a 142-character maximum length, or something similar.

Anyhow, the solution was simple. I just added a BASE element to the email HTML so the anchor hrefs could be reduced to just the file name. Now they never get anywhere near 142 characters long.
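For illustration, here's how such an email body might be assembled in Python. The original app builds its HTML differently; `notification_html` and its parameters are hypothetical. The trailing slash on the BASE href matters: without it, a relative href replaces the last path segment of the base URL instead of being appended to it.

```python
from html import escape
from urllib.parse import quote


def notification_html(results_url, file_names):
    """Build an HTML email body whose anchors use short relative hrefs.

    Hypothetical helper: the <base> element carries the long folder URL,
    so each href is just the (percent-encoded) file name.
    """
    # Ensure a trailing slash; otherwise relative hrefs would replace the
    # last path segment of the base URL instead of being appended to it.
    base = results_url if results_url.endswith("/") else results_url + "/"
    items = "\n".join(
        f'<li><a href="{quote(name)}">{escape(name)}</a></li>' for name in file_names
    )
    return (
        "<html><head>"
        f'<base href="{escape(base)}">'
        "</head><body><ul>\n"
        f"{items}\n"
        "</ul></body></html>"
    )


body = notification_html(
    "http://server.client.com/appname/job.results", ["result.file.name.htm"]
)
print(body)
```

How well BASE is honored in HTML email may differ between mail clients, so it's worth checking in the clients your users actually read their mail with.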


Comments

We seem to have a similar problem. In our case the period is being dropped in the first part of the link, and not on every link.

For example,

http://www.abc.net/... becomes

http://www/abcnet/...

Please let me know if you've found out anything else on this problem.

Jim Cross | 2005.11.21 01:15 PM

