So I’m working on a project where I need to validate that the user has entered a correct email address.

I don’t really care if someone enters a “fake” email address. To complete registration, the user has to enter a confirmation code that I send by email, so anyone who gives a fake address never completes registration, and the abandoned signup is eventually purged anyway. No skin off my back.

But I do want to catch the cases where the user makes an honest mistake, like entering just their name into the email address field. (Don’t laugh, it happens more often than you’d think.)

One of the most common mistakes users make when filling out these forms is omitting the TLD (the “top-level domain,” such as .com or .edu) from the email address. I assume it’s mostly new computer users who do this, and I’ve seen it happen often enough that I’m sure these folks just don’t know any better. So an inexperienced user whose address is “billg@microsoft.com” might enter only “billg@microsoft”, thinking he did everything correctly. Then he sits waiting for a confirmation email that never arrives.

I don’t really like fancy email address validation techniques. They seem so unnecessary. Programmers get caught up in writing the perfect regular expression that “catches” every case, but the result is a mess to maintain and is easily circumvented by anyone who types in “fake@fake.com”. If you need to verify that the user entered a real email address, just use a simple confirmation email. It’s so much better.

Besides, most of these fancy schemes aren’t even correct, and they can be cumbersome for users. For instance, many websites don’t allow the plus character (“+”) in an email address, even though “+” is perfectly valid there. The plus is also tremendously useful to Gmail users, who rely on it for spam prevention and email filtering: mail sent to billg+newsletters@gmail.com still arrives at billg@gmail.com, tagged so it can be filtered. Right now, as you read this, there’s probably some programmer out there who thinks he did a stellar job writing an email validation script that rejects every address containing a plus. All he did was create problems.
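To make the plus-sign problem concrete, here is a small comparison. The “strict” pattern below is a hypothetical example of the kind of home-grown regular expression that turns plus addresses away; PHP’s filter_var() with FILTER_VALIDATE_EMAIL, by contrast, accepts “+” in the part before the @. The address is made up for illustration.

```php
<?php
// A hypothetical, overly strict pattern of the kind many sites use:
// it allows only letters, digits, dots, underscores and hyphens before the @.
$strictPattern = '/^[a-z0-9._-]+@[a-z0-9.-]+\.[a-z]+$/i';

$address = 'billg+newsletters@gmail.com';

// preg_match() returns 1 on a match and 0 when there is no match.
$strictOk = preg_match($strictPattern, $address) === 1;

// filter_var() returns the address on success and false on failure.
$filterOk = filter_var($address, FILTER_VALIDATE_EMAIL) !== false;

echo "strict pattern: " . ($strictOk ? 'accepted' : 'rejected') . "\n";
echo "filter_var:     " . ($filterOk ? 'accepted' : 'rejected') . "\n";
```

The strict pattern rejects a perfectly deliverable address simply because its author forgot that “+” exists.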

For these reasons, my preferred method has always been to just check if the entered email address contains an “at” sign (“@”).  If it does, I try to send the email.  If not, I tell the user, “Whoops, you’ve made a mistake.”  This catches most errors.   But not all.  It doesn’t catch the billg@microsoft problem.
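A minimal sketch of that “@ sign only” check, assuming the address arrives as a plain string; the function name has_at_sign is hypothetical. Note that it happily accepts billg@microsoft, which is exactly the gap just described.

```php
<?php
// Hypothetical helper: the bare-minimum check described above.
// strpos() returns the position of the first "@" or false if there is none,
// so the strict !== comparison matters ("@" at position 0 returns 0, not false).
function has_at_sign(string $email): bool
{
    return strpos($email, '@') !== false;
}

foreach (['billg@microsoft.com', 'Bill Gates', 'billg@microsoft'] as $email) {
    echo $email . ': ' . (has_at_sign($email) ? "try to send" : "Whoops, you've made a mistake") . "\n";
}
```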

I searched around a bit and saw that a lot of people are promoting PHP’s native filter_var() function as a way of validating email addresses. It goes something like this:

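// filter_var() returns the filtered address on success and false on failure
$email = $_POST['email'] ?? '';
if (filter_var($email, FILTER_VALIDATE_EMAIL)) {
    // Escape user input before echoing it back into the page
    print htmlspecialchars($email) . " looks good";
} else {
    print htmlspecialchars($email) . " is a no go";
}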

That’s great, but after running some tests, I saw that it doesn’t solve the problem of “billg@microsoft”, either.

PHP’s filter_var reports that user@domain (with no TLD) is a valid email address, even though for my purposes, it isn’t.  I think that filter_var is likely technically correct – user@machine is a fine email address, if you happen to be operating within the confines of whatever network “machine” is a part of.  But on the Internet, it’s not gonna get you very far.  On the Internet, user@machine is completely useless.
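If you want to repeat that kind of test yourself, a quick loop over a few sample inputs is enough. This is a sketch: the sample addresses are made up, and because filter_var()’s email rules have been revised across PHP releases, the verdict on the dotless billg@microsoft case may depend on the PHP version you run it on, so check your own.

```php
<?php
// Sample inputs; the last one is the "forgot the TLD" case discussed above.
$samples = ['billg@microsoft.com', 'billg+news@microsoft.com', 'Bill Gates', 'billg@microsoft'];

$results = [];
foreach ($samples as $email) {
    // filter_var() returns the filtered string on success and false on failure.
    $results[$email] = filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
    printf("%-26s => %s\n", $email, $results[$email] ? 'accepted' : 'rejected');
}
```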

To make filter_var more suitable for Internet purposes, you need to do something like this:

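$email = $_POST['email'] ?? '';
if (filter_var($email, FILTER_VALIDATE_EMAIL)) {
    // Require something after the @, then a period, then at least one more character
    if (preg_match("/\@.+?\../", $email)) {
        print htmlspecialchars($email) . " looks good";
    } else {
        print htmlspecialchars($email) . " is a no go - you forgot the TLD!";
    }
} else {
    print htmlspecialchars($email) . " is a no go";
}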

The added preg_match line makes sure that there is at least one character after the @ sign (.+?), that a period follows it (\.), and that at least one more character follows that period (.).
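If you need the same check in more than one form, the two tests can be wrapped in a single function. This is a sketch; the name looks_like_internet_email is hypothetical, and it simply combines the filter_var() call and the preg_match() pattern used in the snippet above.

```php
<?php
// Hypothetical helper combining the two checks from the snippet above.
function looks_like_internet_email(string $email): bool
{
    // First pass: PHP's built-in syntax check (returns false on failure).
    if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        return false;
    }
    // Second pass: at least one character after the @, then a period,
    // then at least one more character, i.e. something that looks like a TLD.
    return preg_match('/\@.+?\../', $email) === 1;
}

$tests = ['billg@microsoft.com', 'billg+spam@gmail.com', 'billg@microsoft', 'Bill Gates'];
foreach ($tests as $email) {
    echo $email . ': ' . (looks_like_internet_email($email) ? 'looks good' : 'is a no go') . "\n";
}
```

Whichever way your PHP version treats billg@microsoft in the first pass, the second pass rejects it, while plus addresses still get through.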

This catches those users who mistakenly left out the TLD, without getting into silly, overly restrictive input validation techniques that often exclude perfectly valid email addresses and are easily circumvented anyway.  And for those users who enter fake addresses – so be it.  Just send a confirmation email and make them click a link to confirm their email address.  You can’t stop people from putting fake info into your form.  All you can do is ask them to verify what they’ve entered.
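For the confirmation step itself, a minimal sketch of generating and checking a code might look like the following, whether the code is typed in by the user or embedded in the link they click. It assumes PHP 7 or later (for random_bytes()), the function names are hypothetical, and storing the code with the pending account and actually sending the email are left out.

```php
<?php
// Hypothetical helper: create a hard-to-guess code to email to the user.
// random_bytes() uses a cryptographically secure source; bin2hex() makes it printable.
function make_confirmation_code(int $bytes = 16): string
{
    return bin2hex(random_bytes($bytes));
}

// Hypothetical helper: compare the code the user entered against the stored one.
// hash_equals() performs a timing-safe string comparison.
function confirmation_code_matches(string $stored, string $entered): bool
{
    return hash_equals($stored, trim($entered));
}

$code = make_confirmation_code();
// Store $code with the pending account and email it to the address entered.
echo confirmation_code_matches($code, $code) ? "confirmed\n" : "try again\n";
```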