scanning excel list for invalid e-mail addresses

G

Guest

Hi. I'm not sure I'm in the right forum, but here goes.

Given a list of several hundred e-mail addresses in a column of an Excel
worksheet, is there a good way to identify invalid addresses? (for example,
if the '@' is missing) ?

thanks in advance
Becky
 
G

Guest

You probably could do better with VBA, but try this.
Copy your addresses to a new worksheet
Replace > Replace what:= *@*.???
All the BAD will remain
 
G

Guest

Or rather do this:
Assuming your address are in column A, copy this down a column
=AND(LEFT(RIGHT(A1,4),1)=".",NOT(ISERR(FIND("@",A1))))
Invalid address are singled out as FALSE
 
I

iliace

This will not account for longer domain designations such as .info, or
international ones like .ca, .ru, etc.

You want a function that searches for at least one period after @, as
well as the @ symbol. I think you can do both with one stone. This
function returns TRUE if the address is invalid.

=ISERROR(1/(LEN(MID(A1,FIND("@",A1,1)+1,LEN(A1)-FIND("@",A1,1)))-
LEN(SUBSTITUTE(MID(A1,FIND("@",A1,1)+1,LEN(A1)-
FIND("@",A1,1)),".",""))))

1 is divided by the difference of the two LEN() functions. The
difference of the two LEN() functions will give you the number of
periods after the @ symbol. If @ symbol is not present, FIND() will
return an error that will filter through to ISERROR() function. If @
is present but no periods exist after it, the division will return an
error also filtering through to ISERROR().
 
P

Peo Sjoblom

Your formula accepts the following as valid email addresses


(e-mail address removed)

[email protected]

(e-mail address removed)

[email protected]



which only shows that it is basically impossible unless you use a list
of all country abbreviations in use for emails that the formula check
against plus the different domains and that it also checks the number of
characters in the last string after the period.




--
Regards,

Peo Sjoblom
 
I

iliace

Sorry, just realized there's a much easier way to do this.

=ISERROR(SEARCH("*@*.*",A1))

I completely forgot that SEARCH() supports wildcard characters.
 
I

iliace

If you want to account for that possibility - which I would say is
soundness and not validity - you can use this (press Ctrl+Shift+Enter
instead of Enter):

=OR(ISERROR(SEARCH("*@*.*",A1)),ISERROR(MATCH(TRUE,MID(A1,FIND(CHAR(150),SUBSTITUTE(A1,".",CHAR(150),LEN(A1)-
LEN(SUBSTITUTE(A1,".",""))))+1,LEN(A1)-
FIND(CHAR(150),SUBSTITUTE(A1,".",CHAR(150),LEN(A1)-
LEN(SUBSTITUTE(A1,".","")))))={"biz","com","edu","gov","info","int","mil","name","net","org","aero","asia","cat","coop","jobs","mobi","museum","pro","tel","travel","arpa","root","berlin","bzh","cym","gal","geo","kid","kids","lat","mail","nyc","post","sco","web","xxx","nato","example","invalid","localhost","test","bitnet","csnet","ip","local","onion","uucp","ac","ad","ae","af","ag","ai","al","am","an","ao","aq","ar","as","at","au","aw","ax","az","ba","bb","bd","be","bf","bg","bh","bi","bj","bm","bn","bo","br","bs","bt","bw","by","bz","ca","cc","cd","cf","cg","ch","ci","ck","cl","cm","cn","co","cr","cu","cv","cx","cy","cz","de","dj","dk","dm","do","dz","ec","ee","eg","er","es","et","eu","fi","fj","fk","fm","fo","fr","ga","gd","ge","gf","gg","gh","gi","gl","gm","gn","gp","gq","gr","gs","gt","gu","gw","gy","hk","hm","hn","hr","ht","hu","id","ie","il","im","in","io","iq","ir","is","it","je","jm","jo","jp","ke","kg","kh","ki","km","kn","kr","kw","ky","kz","la","lb","lc","li","lk","lr","ls","lt","lu","lv","ly","ma","mc","md","mg","mh","mk","ml","mm","mn","mo","mp","mq","mr","ms","mt","mu","mv","mw","mx","my","mz","na","nc","ne","nf","ng","ni","nl","no","np","nr","nu","nz","om","pa","pe","pf","pg","ph","pk","pl","pn","pr","ps","pt","pw","py","qa","re","ro","ru","rw","sa","sb","sc","sd","se","sg","sh","si","sk","sl","sm","sn","sr","st","sv","sy","sz","tc","td","tf","tg","th","tj","tk","tl","tm","tn","to","tr","tt","tv","tw","tz","ua","ug","uk","us","uy","uz","va","vc","ve","vg","vi","vn","vu","wf","ws","ye","yu","za","zm","zw","eh","kp","me","rs","um","bv","gb","pm","sj","so","yt","su","tp","bu","cs","dd","zr"},
0)))
 
I

iliace

Sorry Peo, I didn't mean you, I meant OP.

The formula is too long for Excel 2003 and earlier versions - for some
reason compatibility checker didn't pick up on that. You can take the
list of global domain designations, save them as a CSV file in notepad
(use linefeed instead of comma due to 256 columns limitation), open
the file in Excel, and copy the resulting worksheet to your main
workbook. Then, use the range reference in formula instead of the
actual list.

In reality, I don't think all this is necessary - unless you have some
global United Nations list or something - but it can be done via
formula, if desired. It calculates fairly quickly even for long
lists: a full recalculation of ~22,000 email addresses took 70 seconds
in Excel 2007 on XP, dual-core 2.8ghz, 512MB RAM with a bunch of other
stuff running. I have not tested for speed as a conditional format
rule, which would probably be the best application for this. On a
side note, make sure you remove the line carry-over dashes in the
above, if you do plan to use it.

Another item you might check for is ensuring that there is at least
one character before the @ symbol, and at least one before the last
period. This is just adding items to the OR() function. Honestly,
the most reasonable solution is probably this:

=OR(ISERROR(SEARCH("*@*.*",A1)),LEFT(TRIM(A1),
1)="@",MID(TRIM(A1),FIND(CHAR(150),SUBSTITUTE(TRIM(A1),".",CHAR(150),LEN(TRIM(A1))-
LEN(SUBSTITUTE(TRIM(A1),".",""))))-1,1)="@",RIGHT(TRIM(A1),1)=".")

This will not allow any of the following values:

@.
@b.c
[email protected]
a@b.
anything missing either @ or . or both

But it will allow anything in this format:

(e-mail address removed)
(e-mail address removed)
etc...

regardless of the number of characters any of the letters represent
stand for. Used as a conditional format, this formula can highlight
invalid e-mail addresses.
 
R

Ron Rosenfeld

Sorry Peo, I didn't mean you, I meant OP.

The formula is too long for Excel 2003 and earlier versions - for some
reason compatibility checker didn't pick up on that. You can take the
list of global domain designations, save them as a CSV file in notepad
(use linefeed instead of comma due to 256 columns limitation), open
the file in Excel, and copy the resulting worksheet to your main
workbook. Then, use the range reference in formula instead of the
actual list.

In reality, I don't think all this is necessary - unless you have some
global United Nations list or something - but it can be done via
formula, if desired. It calculates fairly quickly even for long
lists: a full recalculation of ~22,000 email addresses took 70 seconds
in Excel 2007 on XP, dual-core 2.8ghz, 512MB RAM with a bunch of other
stuff running. I have not tested for speed as a conditional format
rule, which would probably be the best application for this. On a
side note, make sure you remove the line carry-over dashes in the
above, if you do plan to use it.

Another item you might check for is ensuring that there is at least
one character before the @ symbol, and at least one before the last
period. This is just adding items to the OR() function. Honestly,
the most reasonable solution is probably this:

=OR(ISERROR(SEARCH("*@*.*",A1)),LEFT(TRIM(A1),
1)="@",MID(TRIM(A1),FIND(CHAR(150),SUBSTITUTE(TRIM(A1),".",CHAR(150),LEN(TRIM(A1))-
LEN(SUBSTITUTE(TRIM(A1),".",""))))-1,1)="@",RIGHT(TRIM(A1),1)=".")

This will not allow any of the following values:

@.
@b.c
[email protected]
a@b.
anything missing either @ or . or both

But it will allow anything in this format:

(e-mail address removed)
(e-mail address removed)
etc...

regardless of the number of characters any of the letters represent
stand for. Used as a conditional format, this formula can highlight
invalid e-mail addresses.



"

Here is a routine using a UDF that should detect valid email addresses along
with matching all country code top level domains, and specific common top level
domains. You can easily add more top level domains as they become available.

The formula will return FALSE if the email test fails, or TRUE if it succeeds.

It uses regular expressions, and the regex can either be hard coded in the UDF,
or included in the formula on the worksheet. I chose to put it into the
worksheet function.

To use the UDF, enter the code below into a worksheet module; then use the
function:

=REChk(A1,"[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_
`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:[A-Z]{2}|com|org|
net|gov|biz|info|name|aero|biz|info|jobs|museum)\b")

To enter the UDF, <alt-F11> opens the VBEditor. Ensure your project is
highlighted in the project explorer window, then Insert/Module and paste the
code below into the window that opens:

==================================================
Option Explicit
Function REChk(str As String, P As String) As Boolean
Dim re As Object
Set re = CreateObject("VBScript.Regexp")
With re
.Global = True
.IgnoreCase = True
.MultiLine = True
.Pattern = P
End With
REChk = re.Test(str)
End Function
====================================
--ron
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top