Finding email addresses in cells

Guest · Mar 16, 2006

In one of the columns I have text with email addresses I need seperated.

Amongst the text (i.e. "please note that (e-mail address removed) is no longer
employeed" - or - "(e-mail address removed) email is undeliverable")

Is there a formula that I can create to single out just the email address?
There's no set # of characters before or after the "@" - but I'm assuming I
could go by a SPACE to seperate the name@domain from the rest of the text.

Any suggestions?

The trim function only works in one direction (i.e. after the @) and I need
it to go both ways.

Thanks,
Joey

Guest · Mar 16, 2006

You could do Data > TextToColumns > using SPACE as a delimiter and it should
break the entire email address out into it's own separate column as it has a
space both before and after with none in between.....

Vaya con Dios,
Chuck, CABGx3

Guest · Mar 16, 2006

But that doesn't put the email address in one column, especially if there's
25 words in the text of the message. Text-to-columns using SPACE could just
spread out the message along 25 more columns, with NO idea where the email
address is.

I even tried text-to-columns with "@" as the seperator, but then I don't get
the name or the domain name.

Joey

daddylonglegs · Mar 16, 2006

With your data in A1 try this formula

=TRIM(RIGHT(SUBSTITUTE(LEFT(A1,FIND(" ",A1,FIND("@",A1))-1)," ",REPT(
",LEN(A1))),LEN(A1))

Ron Rosenfeld · Mar 16, 2006

With your data in A1 try this formula

=TRIM(RIGHT(SUBSTITUTE(LEFT(A1,FIND(" ",A1,FIND("@",A1))-1)," ",REPT("
",LEN(A1))),LEN(A1)))

Doesn't seem to work if the email address is at the very end of the string.
--ron

Biff · Mar 16, 2006

Crashes on entries like:

text text text text (e-mail address removed)
(e-mail address removed)

Biff

"daddylonglegs" <[email protected]>
wrote in message
news:[email protected]...

Biff · Mar 16, 2006

This seems to work:

=TRIM(RIGHT(SUBSTITUTE(LEFT(A1&" ",FIND(" ",A1&" ",FIND("@",A1&" "))-1),"
",REPT(" ",LEN(A1))),LEN(A1)))

Biff

Beege · Mar 16, 2006

Joey, This works, too, but you need morefunc.dll (google search)

=TRIM(TEXTREVERSE(LEFT(TEXTREVERSE(LEFT(A1,(SEARCH("
",A1,(SEARCH("@",A1,1)))-1))),(SEARCH(" ",TEXTREVERSE(LEFT(A1,(SEARCH("
",A1,(SEARCH("@",A1,1)))-1))),1)))))

Beege

Ron Rosenfeld · Mar 16, 2006

This seems to work:

=TRIM(RIGHT(SUBSTITUTE(LEFT(A1&" ",FIND(" ",A1&" ",FIND("@",A1&" "))-1),"
",REPT(" ",LEN(A1))),LEN(A1)))

Biff

It will fail if the string happens to start with a dot <.>.

It also gives an erroneous result with:

"It's too bad that John Jones <[email protected]> is no longer with us"

and various other legal email configurations.

--ron

Biff · Mar 16, 2006

Oh well!

Where's your REGEX solution? I know I've seen it at least once.

Biff

daddylonglegs · Mar 17, 2006

Ron said:
Doesn't seem to work if the email address is at the very end of th
string.
--ron

Hi ron, quite right. I believe that can be fixed with a smal
amendment

=TRIM(RIGHT(SUBSTITUTE(LEFT(A1,FIND(" ",A1&" ",FIND("@",A1))-1),
",REPT(" ",LEN(A1))),LEN(A1)))

Biff, thanks also. I didn't consider multiple email addresses, I'l
leave that to somebody else ;

Ron Rosenfeld · Mar 17, 2006

Oh well!

Where's your REGEX solution? I know I've seen it at least once.

Biff

Given all of the possible variations in legal email addresses, even a REGEX
solution would be tough.

Here's a partial one:

=REGEX.MID(A1,"\b(\w|[._%-])+@(\w|[.-])+\.[A-Za-z]{2,4}\b")

but it's not completely tested and it will not work if there are <space>'s in
the address. Also it assumes that any terminal 2 to 4 letter code will be a
valid top-level domain. In addition to that not always being the case, there
are some top level domains that are more than four characters. One could
substitute a pipe separated list of valid top level domains for the last
portion of that regex.

It should cover a number of common variations, however.

For the OP, to use this formula you must first download and install Longre's
free morefunc.xll add-in from http://xcell05.free.fr

--ron

Harlan Grove · Mar 17, 2006

Ron Rosenfeld wrote...
....

Given all of the possible variations in legal email addresses, even a REGEX
solution would be tough.

....

Impossible. Like it or not, internet e-mail descends from Unix e-mail,
which means any valid home directory name (precisely, the first level
subdirecories of /home) on a Unix-like system would be a valid
recipient ID. Now expand that to encompass filesystems based on UNICODE
text.

As a practical, non-UNICODE matter, @, / and ASCII NUL can't be
characters in e-mail address names, and *, ?, \ and the other control
characters (ASCII decimal codes 1 through 31 plus 127) would be very
unlikely. Anything else is fair game.

Here's a partial one:

=REGEX.MID(A1,"\b(\w|[._%-])+@(\w|[.-])+\.[A-Za-z]{2,4}\b")

....

If you mean all 'word' characters plus periods, underscores, percent
signs and dashes, better to use

"\b[-A-Za-z0-9_.%]+\@([-A-Za-z0-9_]+\.)*[A-Za-z]+\b"

which allows for valid degenerate e-mail addresses like foo@localhost .
Don't use the alternation operator, |, unless absolutely necessary.
It's a real performance drag in regexps. MUCH MORE EFFICIENT to use
single (if longer) character classes.

Ron Rosenfeld · Mar 17, 2006

Ron Rosenfeld wrote...
...
...

Impossible. Like it or not, internet e-mail descends from Unix e-mail,
which means any valid home directory name (precisely, the first level
subdirecories of /home) on a Unix-like system would be a valid
recipient ID. Now expand that to encompass filesystems based on UNICODE
text.

Constructing regex's to match all legal email addresses seems to be a challenge
taken up by some. I've seen one regex that is said to be 6000+ characters in
length.

As a practical, non-UNICODE matter, @, / and ASCII NUL can't be
characters in e-mail address names, and *, ?, \ and the other control
characters (ASCII decimal codes 1 through 31 plus 127) would be very
unlikely. Anything else is fair game.

Here's a partial one:

=REGEX.MID(A1,"\b(\w|[._%-])+@(\w|[.-])+\.[A-Za-z]{2,4}\b")

Click to expand...

...

If you mean all 'word' characters plus periods, underscores, percent
signs and dashes, better to use

"\b[-A-Za-z0-9_.%]+\@([-A-Za-z0-9_]+\.)*[A-Za-z]+\b"

which allows for valid degenerate e-mail addresses like foo@localhost .
Don't use the alternation operator, |, unless absolutely necessary.
It's a real performance drag in regexps. MUCH MORE EFFICIENT to use
single (if longer) character classes.

Thanks for the pointer about the pipe.

In your regex, what is the significance of the "\" preceding the "@"?

--ron

Harlan Grove · Mar 17, 2006

Ron Rosenfeld wrote...
....

In your regex, what is the significance of the "\" preceding the "@"?

Probably unnecessary. I saw a Perl regexp from a web search for e-mail
address syntax that showed \@, so I figured it might be needed in some
regexp flavors. It may not be necessary, but it won't cause any harm.

Ron Rosenfeld · Mar 17, 2006

Ron Rosenfeld wrote...
...

Probably unnecessary. I saw a Perl regexp from a web search for e-mail
address syntax that showed \@, so I figured it might be needed in some
regexp flavors. It may not be necessary, but it won't cause any harm.

OK, that's basically what I figured -- it wouldn't do any harm. I just hadn't
run across it with "@" in the past.

--ron

Biff · Mar 17, 2006

Some of these regex syntax's are the most cryptic things I've ever seen!

Biff

Harlan Grove · Mar 17, 2006

Biff wrote...

Some of these regex syntax's are the most cryptic things I've ever seen!

....

You should try Einstein's tensor notation in his work on General
Relativity.

Or a few APL one-liners.

Seriously, they may be a pain to learn, but they're far & away the most
powerful means of dealing with text. You'll never fully appreciate them
until you compare the mountains of procedural code you'd need to match
certain common text patterns. For example, try finding US social
security numbers in general strings: ###-##-#### with anything other
than decimal numerals on either side. Regexp:
"(.*\D)?(\d{3}-\d{2}-\d{4})(\D.*)?", where $2 matches the ssn.

Ron Rosenfeld · Mar 17, 2006

Some of these regex syntax's are the most cryptic things I've ever seen!

Biff

At least as implemented in Excel and VBA, once you get used to thinking in that
language, they become relatively simple to follow.

There are some flavors that are orders of magnitude more cryptic.
--ron

Harlan Grove · Mar 18, 2006

Ron Rosenfeld wrote...
....

There are some flavors that are orders of magnitude more cryptic.

Which?

At least for Unix-originated and Unix-like software, there are basic
REs (ed, plain grep), extended REs (awk, egrep), vim/sed REs, and Perl
REs. In addition to DFA vs NFA engines, the differences between them
boil down to whether certain metacharacters begin with backslashes and
whether they provide noncapturing word boundaries, parenthesized
subexpression backreferencing, nongreedy matching qualifiers and
interval qualifiers. Only the Perl flavor provides general noncapturing
assertions. If you know basic REs, these other flavors aren't all that
mysterious. VBScript REs are almost Perl REs, complete with general
noncapturing assertions.

If you mean Word's text patterns, then I agree, but they're difficult
to figure out because they're so nonstandard compared to the Unix-ish
flavors.

Finding email addresses in cells

Guest

Guest

Guest

daddylonglegs

Ron Rosenfeld

Biff

Biff

Beege

Ron Rosenfeld

Biff

daddylonglegs

Ron Rosenfeld

Harlan Grove

Ron Rosenfeld

Harlan Grove

Ron Rosenfeld

Biff

Harlan Grove

Ron Rosenfeld

Harlan Grove