Validating an Email Address with PHP’s filter_var isn’t perfect
So I’m working on a project where I need to validate that the user has entered a correct email address.
I don’t really care if someone sticks in a “fake” email address – to complete registration, the user needs to enter a confirmation code that I send via email, so if they put in a fake address, they never complete registration, and the aborted account signup is eventually purged anyway. No sweat off my back.
But I do want to catch those few cases where the user makes a legitimate mistake, like mistakenly entering just their name into the email address field. (Don’t laugh, it happens more often than you’d think.)
One of the biggest mistakes users make when filling out these forms is that they omit the TLD (that’s “top-level domain,” like .com or .edu) from the email address. I assume that it’s mostly new computer users who do this, and I’ve seen it happen often enough that I’m sure these folks just don’t know any better. So unsophisticated user “billg@microsoft.com” might mistakenly enter only “billg@microsoft”, thinking he did everything correctly. Then he sits waiting for a confirmation email that never arrives.
I don’t really like fancy email address validation techniques. They seem so unnecessary. Programmers like to get all caught up in writing the perfect regular expression that “catches” all cases, and not only is it a complete mess to maintain, but it’s easily circumvented by the end user who types in “fake@fake.com”. Just use a simple confirmation-email method if you need to verify that the user entered a real email address. It’s so much better.
Besides, most of these fancy schemes aren’t even correct, and can be cumbersome for users. For instance, many websites out there don’t allow the plus character (“+”) in an email address, even though “+” is a perfectly valid character in an email address. The plus is also a tremendously useful character for advanced Gmail users who use it for spam prevention and email filtering. Right now, as you read this, there’s probably some programmer out there who thinks he did a stellar job of writing an email validation script that rejects all pluses. All he did was create problems.
For these reasons, my preferred method has always been to just check if the entered email address contains an “at” sign (“@”). If it does, I try to send the email. If not, I tell the user, “Whoops, you’ve made a mistake.” This catches most errors. But not all. It doesn’t catch the billg@microsoft problem.
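Here’s a minimal sketch of that “just look for an @” check. The function name hasAtSign() is made up for illustration; it isn’t from any library.

```php
<?php
// Hypothetical helper: the name hasAtSign() is illustrative only.
function hasAtSign(string $email): bool
{
    // strpos() returns false when "@" is absent and an integer offset otherwise,
    // so compare strictly against false (an offset of 0 is still a hit).
    return strpos($email, '@') !== false;
}

var_dump(hasAtSign('billg'));            // no "@", so the mistake is caught
var_dump(hasAtSign('billg@microsoft'));  // has "@", so the missing TLD slips through
```

The second call is exactly the billg@microsoft case: the check passes even though the address is missing its TLD.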
I searched around a bit and saw that a lot of people are pimping PHP’s native filter_var() function as a way of validating email addresses. It goes something like this:
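// Read the submitted value, defaulting to an empty string if the field is missing.
$email = $_POST['email'] ?? '';
// Escape the value before echoing it back so user input can't inject markup.
$safe = htmlspecialchars($email, ENT_QUOTES, 'UTF-8');
if (filter_var($email, FILTER_VALIDATE_EMAIL)) {
    print $safe . " looks good";
} else {
    print $safe . " is a no go";
}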
That’s great, but after running some tests, I saw that it doesn’t solve the problem of “billg@microsoft”, either.
PHP’s filter_var reports that user@domain (with no TLD) is a valid email address, even though for my purposes, it isn’t. I think that filter_var is likely technically correct – user@machine is a fine email address, if you happen to be operating within the confines of whatever network “machine” is a part of. But on the Internet, it’s not gonna get you very far. On the Internet, user@machine is completely useless.
To make filter_var more suitable for Internet purposes you need to do something like this:
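$email = $_POST['email'] ?? '';
// Escape the value before echoing it back so user input can't inject markup.
$safe = htmlspecialchars($email, ENT_QUOTES, 'UTF-8');
if (filter_var($email, FILTER_VALIDATE_EMAIL)) {
    // Require "@", at least one character, a literal dot, then one more character.
    if (preg_match("/\@.+?\../", $email)) {
        print $safe . " looks good";
    } else {
        print $safe . " is a no go - you forgot the TLD!";
    }
} else {
    print $safe . " is a no go";
}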
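The added preg_match line makes sure that there is at least one character after the @ sign (.+?), that a literal period follows (\.), and that at least one more character follows that period (.). It doesn’t check that the TLD actually exists, only that something shaped like one is there.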
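If you’d rather not repeat the nested ifs everywhere, the two checks can be wrapped into one function. This is only a sketch: the name looksLikeInternetEmail() and the sample addresses are my own assumptions, and depending on your PHP version filter_var may already reject the dotless address by itself, in which case the second step is just a harmless safety net.

```php
<?php
// Hypothetical helper: the name looksLikeInternetEmail() is illustrative only.
function looksLikeInternetEmail(string $email): bool
{
    // Step 1: let PHP check the overall syntax of the address.
    if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        return false;
    }
    // Step 2: require "@", at least one character, a literal dot,
    // and at least one more character after the dot.
    return preg_match('/@.+?\../', $email) === 1;
}

// Sample addresses chosen for illustration.
$candidates = [
    'billg@microsoft.com',
    'billg@microsoft',
    'billg',
    'billg+news@microsoft.com',
];

foreach ($candidates as $candidate) {
    $verdict = looksLikeInternetEmail($candidate) ? 'looks good' : 'is a no go';
    echo $candidate, ' ', $verdict, PHP_EOL;
}
```

Note that the address with the plus sign passes, which is the whole point of keeping the extra check this loose.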
This catches those users who mistakenly left out the TLD, without getting into silly, overly restrictive input validation techniques that often exclude perfectly valid email addresses and are easily circumvented anyway. And for those users who enter fake addresses – so be it. Just send a confirmation email and make them click a link to confirm their email address. You can’t stop people from putting fake info into your form. All you can do is ask them to verify what they’ve entered.
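For the confirmation step itself, here is a rough sketch of generating and checking a confirmation code. The function names and the 16-byte length are my own assumptions, and how you store the code and send the email is up to your application.

```php
<?php
// Hypothetical helpers: names and the 16-byte length are illustrative choices.
function makeConfirmationCode(): string
{
    // random_bytes() (PHP 7+) draws from a cryptographically secure source;
    // bin2hex() turns the 16 bytes into 32 hex characters that are safe in a URL.
    return bin2hex(random_bytes(16));
}

function confirmationCodeMatches(string $stored, string $submitted): bool
{
    // hash_equals() compares the two strings in constant time.
    return hash_equals($stored, $submitted);
}

$code = makeConfirmationCode();
// In a real signup flow, the code would be saved with the pending account
// and emailed to the address the user entered.
var_dump(confirmationCodeMatches($code, $code));    // the right code confirms
var_dump(confirmationCodeMatches($code, 'wrong'));  // anything else does not
```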


October 6th, 2011 at 9:28 pm
The syntax user@domain is actually valid on the internet – for owners of top level domains. Though I will agree that filter_var() on an email probably isn’t what you want. I would look into using something like Zend_Validate_EmailAddress to validate your email addresses. It can even check MX records to ensure that the domain for the email address is correct.
October 6th, 2011 at 9:37 pm
Hey AndrewX192,
Are you telling me that it’s theoretically possible for me to acquire the email address “eddie@com”? That would be completely awesome.
Makes perfect sense, but it never dawned on me.
October 13th, 2011 at 1:54 am
Yes, a few of the ccTLDs are actually set up that way right now. I think .ua, Ukraine, is one of them, so “eddie@ua” is an entirely feasible address. All it requires is an MX record on the DNS entry for the TLD. (Similarly, some TLDs have A records, making e.g. http://to/ a working address. Well, they seem to be down at the moment, but they’re unreachable, not unlocatable.)
The *really* interesting question is what it would mean to have an MX record for the root itself. The root’s name is the empty string, so would an email address there be simply “eddie@”? The mind boggles….