Regular Expressions in VBA?


R

Rob

All,

I have a very useful report I get weekly (text document) that is in a very
un-useful format... It gives me information on all my personnel (hundreds of
folks) in a "justified" format. I want to import the information I need
from each individual into excel (CSV file) so I can manipulate the data or
search, etc... In the end, I'll probably use Access - but I still have the
same problem.

The easiest way I know to grab the information I need is using Regular
Expressions to parse through each line of the report.

I "use-ta-could" in PERL, or I could teach myself using C#.

I really would like to write the code using VBA because I'm starting to feel
comfortable with the language and I've created about a dozen routines that I
use in Word/Excel/PowerPoint. I'd like to continue becoming more proficient.

Any help putting me in the right direction would be greatly appreciated.

Rob F
 
Ad

Advertisements

H

Harlan Grove

Rob said:
The easiest way I know to grab the information I need is using Regular
Expressions to parse through each line of the report. ....
I really would like to write the code using VBA because I'm starting to
feel comfortable with the language and I've created about a dozen routines
that I use in Word/Excel/PowerPoint. I'd like to continue becoming more
proficient.
....

VBA itself doesn't have regular expressions, but since you mention C#, it's
a fair bet you're using Excel under Windows. If so, you almost certainly
have Windows Script Host installed, in which case you could use
VBScript/VJScript regular expressions.

The documentation is available online from MSDN. They used to have it in a
downloadable form, but who knows whether that's still available.

As for how to use it, see

http://groups-beta.google.com/group/microsoft.public.excel.worksheetfunctions/msg/9fc5c9457939edb5
 
G

Guest

Hi Rob,
If the file is a text file and it is regularly separated, you should be able
to simply open it in Excel. YOu will be taken to various screens and
alternatives to decsribe the data layout. I am not sure what you mean by
"justified", but Excel and open many formats. Comma, tab, space, etc. You may
have to play around with the opening of it to find the right method that fits
your data, but I expect it will open and be useful.
 
R

Rob

David,

When I say "justified," I mean it's a database report that is justified to
the screen (fields are arranged to fit the screen -- like a justified form
in ACCESS):

Name: Joe Blow SSN: 987-65-4321 Home Phone: (850) 555-1234
Start-Date: 12/25/2004 Occupation: Dead beat DOB: 01/01/1986
Education: Graduated from 6th Grade
address, etc...

With Regular Expressions, I can grab a line, parse the data I want ("Joe
Blow", "(850) 555-1234") and insert into a new csv file/database/etc. I
used to do it all the time in PERL, but I haven't used the language in 2 - 3
years. I can learn to do it in C#, but then I have the same problem -- I
will have to ensure every machine can run PERL or C#, and I will have to
train every user (sometimes it's easier to teach a monkey to fly).

Can I do anything similar to regular Expressions with VBA?
 
G

Guest

Hi Rob,
You get an electronic form of this file? When I look at what you have below,
this all comes on one line? A record per line? What "type" of file, what is
the file extension? Have you attempted to open this file with Excel? I am not
talking about how is appears on a computer screen, but form the file is is
in, prior to it's viewing.

Really, try abd open it with Excel and see what happens. If it is what
appears below, I think you will be able to open it. Again you may have to
play with it and attempt this more than once to get what you want. The
process of opening it will "parse" it for you. This does not require and VB
code. YOu will taken through several "Screens" called the Text Import
Wizard", which allows you to describe the format of the text file you are
trying to open in Excel. You will not need to parse it at this time, but
simply put it in a form that is readable and useful.

When you have it in this form, you can delete whole columns, which will
represent fields of data. Try this and see if it will open for you. In Excel
- File/Open, make sure that you tell Excel to look for "All Types" in the
Open dialog or it will not recogize anything but Excel files.
 
M

Michael Bednarek

microsoft.public.excel.programming:

[snip]
The easiest way I know to grab the information I need is using Regular
Expressions to parse through each line of the report.
It all starts with
Set RegEx = CreateObject("vbscript.regexp")

See <http://www.vbaexpress.com/kb/getarticle.php?kb_id=68> and many more
for examples. Searching the Internet (web and groups) returns quite a
number of useful results.
 
T

Tushar Mehta

If the contents of the file are a fixed number of columns, each of a
fixed character width, you should follow through on David's suggestion.

If you still need Regular Expression support, in addition to the other
suggestions, you might also want to check http://www.tmehta.com/regexp/

--
Regards,

Tushar Mehta
www.tushar-mehta.com
Excel, PowerPoint, and VBA add-ins, tutorials
Custom MS Office productivity solutions
 
R

Rob

David,

I gave "Importing data a try, but it is not powerfull enough to do what I
needed. One record consists of anywhere from 29 to 35 fields (The last
couple of fields are normally different on every record depending on what
information is flagged by the sender.

Like I said, I can do this very easily in PERL, but I was hoping to do so in
VBA so It's easy for the folks I built it for to use.
 
Ad

Advertisements

H

Harlan Grove

Tushar Mehta said:
If the contents of the file are a fixed number of columns, each of a
fixed character width, you should follow through on David's suggestion.

If you still need Regular Expression support, in addition to the other
suggestions, you might also want to check http://www.tmehta.com/regexp/
You have a subtle error in one of your examples in

http://www.tmehta.com/regexp/examples.htm

specifically, "The pattern that recognizes all characters other than
letters, numbers, and the underscore character is [^\w+]." You should use
\W+ (upper case W rather than lower case w, representing the Unix mindset
that case matters). Your regexp is actually an error because it doesn't
strip out + characters, and your claim about what the + char does in your
regexp is wrong because closure operators don't apply within character
classes.

Just try

=RegExpSubstitute("a+1+_","[^\w+]","")

and you'll see the result is "a+1+_". Inside character classes, *, + and ?
are treated as literal characters, so your regexp is the complement of the
SINGLE CHARACTER class made up of 'word' (token) characters *AND* +. So your
regexp also does a lot more work than it should. While you could use [^\w]+,
that's foolish compared to \W+.

You have another subtle error in "The sub-expression ((\d|.)+) isolates the
number, which may contain decimal values, and the next sub-expression (CR)
isolates the CR." Your subexpression matches any string of one or more
periods. The canonical way to express unsigned decimal numbers possibly with
decimal fractional parts is

(\d*\.)?\d+

While your subexpression may work on your test data and may work most of the
time, it doesn't take much to make it work *ALL* of the time.

You need to invest some money buying Jeffrey Friedl's "Mastering Regular
Expressions" and some time reading and understanding it. You may be an Excel
MVP, but you're a regexp newbie. If you think I'm harsh, try learning how to
use regexps for real by posting replies in comp.unix.shell, comp.lang.awk
and comp.lang.awk, where the regexp experts hang out.
 
H

Harlan Grove

use regexps for real by posting replies in comp.unix.shell, comp.lang.awk
and comp.lang.awk, where the regexp experts hang out.
Make that comp.unix.shell, comp.lang.awk and comp.lang.perl.misc.
 
T

Tushar Mehta

Thank you for your comments. Rest assured that both the technical
comments and those of a more personal nature will get all the respect
they deserve.

--
Regards,

Tushar Mehta
www.tushar-mehta.com
Excel, PowerPoint, and VBA add-ins, tutorials
Custom MS Office productivity solutions

Tushar Mehta said:
If the contents of the file are a fixed number of columns, each of a
fixed character width, you should follow through on David's suggestion.

If you still need Regular Expression support, in addition to the other
suggestions, you might also want to check http://www.tmehta.com/regexp/
You have a subtle error in one of your examples in

http://www.tmehta.com/regexp/examples.htm

specifically, "The pattern that recognizes all characters other than
letters, numbers, and the underscore character is [^\w+]." You should use
\W+ (upper case W rather than lower case w, representing the Unix mindset
that case matters). Your regexp is actually an error because it doesn't
strip out + characters, and your claim about what the + char does in your
regexp is wrong because closure operators don't apply within character
classes.

Just try

=RegExpSubstitute("a+1+_","[^\w+]","")

and you'll see the result is "a+1+_". Inside character classes, *, + and ?
are treated as literal characters, so your regexp is the complement of the
SINGLE CHARACTER class made up of 'word' (token) characters *AND* +. So your
regexp also does a lot more work than it should. While you could use [^\w]+,
that's foolish compared to \W+.

You have another subtle error in "The sub-expression ((\d|.)+) isolates the
number, which may contain decimal values, and the next sub-expression (CR)
isolates the CR." Your subexpression matches any string of one or more
periods. The canonical way to express unsigned decimal numbers possibly with
decimal fractional parts is

(\d*\.)?\d+

While your subexpression may work on your test data and may work most of the
time, it doesn't take much to make it work *ALL* of the time.

You need to invest some money buying Jeffrey Friedl's "Mastering Regular
Expressions" and some time reading and understanding it. You may be an Excel
MVP, but you're a regexp newbie. If you think I'm harsh, try learning how to
use regexps for real by posting replies in comp.unix.shell, comp.lang.awk
and comp.lang.awk, where the regexp experts hang out.
 
H

Harlan Grove

Tushar Mehta said:
Thank you for your comments. Rest assured that both the technical
comments and those of a more personal nature will get all the respect
they deserve.
....

The personal comments you're free to ignore.

The technical comments are objective & verifiable: the regexps you show are
inconsistent with the explanations you provide. Since you haven't corrected
the page in question yet, I can only conclude you don't care about the
errors on your site and that your tender feelings are more important to you
than providing *ACCURATE* information to others.
 
T

Tushar Mehta

Oh, for crying out loud, grow up.

I'm swamped with work and not inclined to jump at your every post. The
errors have been around for several months and can afford to stay there
for a few more days.

Contrary to your arrogant condescension, I could spot some of the
errors just by looking at what you wrote -- and wonder how I came up
with those patterns in the first place {g}. The others, yes, I will
verify.

You might be surprised but the last time I checked there was no rule
that says MS XL MVPs are not allowed to be good with regular
expressions.

You know, Harlan, in many aspects you are technically good. I just
don't understand how you have developed this idea that rudeness is a
measure of technical competence. But, I guess, to each his own.

--
Regards,

Tushar Mehta
www.tushar-mehta.com
Excel, PowerPoint, and VBA add-ins, tutorials
Custom MS Office productivity solutions
 
Ad

Advertisements

H

Harlan Grove

Tushar Mehta said:
You might be surprised but the last time I checked there was no rule
that says MS XL MVPs are not allowed to be good with regular
expressions.
No rule at all. Just reacting to the apparent premise that expertise in one
field implies expertise in other, tenuously related fields. Why did you
bother with the regexp example page when there are hundreds of thousands of
posts on regexps in the archives of comp.unix.shell, comp.lang.perl.misc and
comp.lang.awk (as well as alt.editors.batch [sed], comp.editors [vi and
EMACS], and the newsgroups for Python, Ruby, PHP and for all I know Rebol,
Tcl, and lua).
You know, Harlan, in many aspects you are technically good. I just
don't understand how you have developed this idea that rudeness is a
measure of technical competence. But, I guess, to each his own.
Rudeness is one of my flaws. It's unrelated to any technical competence.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top