Need a regular expression to parse a string

moondaddy

I'm writing an app in VB.NET 1.1 and I need to parse strings that look
similar to the one below. All 5 rows make up one string. I have a
form where a user can copy/paste data like what you see below from Excel,
Word, Notepad, etc. into a textbox on my form. I need to break each line
into 2 numbers, which I'll use as parameters for another function. In all
cases each line will be separated by a vbNewLine, and in most cases the 2
numbers in the line (2483 and 21, for example) will be separated by a tab,
but I can't guarantee that. There should, however, be some sort of whitespace
in between the 2 numbers. I could split the string below into 5
smaller strings, but I lack a reliable way to consistently parse the 2
numbers from each string.

2483 21
2484 23
24853 3
2486 14
2487 5

The end result I want is to parse this string so that I can feed
each value into parameters like this:

param1=2483 param2=21
param1=2484 param2=23
param1=24853 param2=3
param1=2486 param2=14
param1=2487 param2=5


Any good recommendations?

Thanks
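Since the title asks for a regular expression, here is a minimal sketch of that route in VB.NET. It is not from the thread: the module and function names (RegexPairParser, ParsePairs) are hypothetical, it assumes every row holds two non-negative whole numbers separated by one or more spaces or tabs, and it avoids generics so it should also compile under VB.NET 2003 / .NET 1.1.

```vbnet
Imports System
Imports System.Collections
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module RegexPairParser
    ' One row = optional leading blanks, a number, at least one space or tab,
    ' a number, optional trailing blanks. [ \t] is used instead of \s so the
    ' separator can never swallow the line break between rows. \r? absorbs the
    ' CR of a CR+LF line ending, because in Multiline mode $ matches before LF.
    Private ReadOnly LinePattern As New Regex( _
        "^[ \t]*(\d+)[ \t]+(\d+)[ \t]*\r?$", RegexOptions.Multiline)

    ' Returns an ArrayList of two-element Integer arrays:
    ' element 0 is param1, element 1 is param2. Rows that do not match are skipped.
    Function ParsePairs(ByVal text As String) As ArrayList
        Dim pairs As New ArrayList()
        For Each m As Match In LinePattern.Matches(text)
            Dim pair As Integer() = New Integer() { _
                Integer.Parse(m.Groups(1).Value), Integer.Parse(m.Groups(2).Value)}
            pairs.Add(pair)
        Next
        Return pairs
    End Function

    Sub Main()
        Dim pasted As String = "2483" & vbTab & "21" & vbNewLine & _
                               "2484 23" & vbNewLine & "24853   3"
        For Each pair As Integer() In ParsePairs(pasted)
            Console.WriteLine("param1=" & pair(0) & " param2=" & pair(1))
        Next
    End Sub
End Module
```

Because the whole pasted text is matched in one call, there is no separate step for splitting it into rows first.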
 
Cor Ligthert

Moondaddy,

Most regulars in this newsgroup know that I do not like Regex.
Keep in mind that, at least for simple problems, it is about 20 times slower
than a normal loop and the Replace function.

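So for your problem I would use something like this (not tested):
\\\
' myrows is assumed to be a String() holding one pasted line per element.
' A For Each loop variable is only a copy, so assigning to it would not
' change the array; index into the array instead.
For i As Integer = 0 To myrows.Length - 1
    ' Treat a tab like a space, then turn the separator into " param2=".
    ' (Assumes exactly one separator character between the two numbers.)
    Dim st As String = myrows(i).Trim().Replace(ControlChars.Tab, " ")
    myrows(i) = "param1=" & st.Replace(" ", " param2=")
Next
///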

I hope this helps?

Cor
 
Guest

Why not just use the Split function? You could first split the text into
lines using vbNewLine as the delimiter, then process each line in a For Next
loop. For each line, do a split based on the space character. If you only get
one string back, then try it again with the tab character. Can't think of
anything else that would cause horizontal whitespace, so those two ought to
do it.
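The Split approach described above can be sketched as follows (the names SplitPairParser, ParseLine and ParseText are hypothetical). Instead of trying the space first and the tab second, it splits on both characters in one call and throws away the empty pieces that String.Split produces between adjacent separators, so several spaces, or a space next to a tab, also work. It assumes the rows are separated by vbNewLine (CR+LF), as a Windows TextBox delivers them, and sticks to .NET 1.1 features.

```vbnet
Imports System
Imports System.Collections
Imports Microsoft.VisualBasic

Module SplitPairParser
    ' Splits one row on spaces and tabs and returns {param1, param2},
    ' or Nothing when the row does not contain exactly two pieces.
    Function ParseLine(ByVal line As String) As Integer()
        Dim separators As Char() = New Char() {" "c, ControlChars.Tab}
        Dim pieces As New ArrayList()
        ' String.Split yields an empty string between two adjacent separators,
        ' so empty pieces are dropped here (.NET 1.1 has no RemoveEmptyEntries).
        For Each piece As String In line.Split(separators)
            If piece.Length > 0 Then pieces.Add(piece)
        Next
        If pieces.Count <> 2 Then Return Nothing
        Return New Integer() {Integer.Parse(CStr(pieces(0))), Integer.Parse(CStr(pieces(1)))}
    End Function

    ' Splits the pasted text into rows first, then parses each row.
    Function ParseText(ByVal text As String) As ArrayList
        Dim pairs As New ArrayList()
        ' Strings.Split splits on the whole vbNewLine (CR+LF) string, not on each char.
        For Each line As String In Split(text, vbNewLine)
            Dim pair As Integer() = ParseLine(line)
            If Not pair Is Nothing Then pairs.Add(pair)
        Next
        Return pairs
    End Function

    Sub Main()
        Dim pasted As String = "2483 21" & vbNewLine & "2484" & vbTab & "23"
        For Each pair As Integer() In ParseText(pasted)
            Console.WriteLine("param1=" & pair(0) & " param2=" & pair(1))
        Next
    End Sub
End Module
```

Rows that do not contain exactly two values (including a blank row left by a trailing newline) come back as Nothing from ParseLine and are skipped by ParseText; whether silently skipping them is right depends on the application.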
 
Stephany Young

I knew you had a 'thing' about Regular Expressions but I didn't know why.

Do you have any evidence for your fears or is it anecdotal and/or a matter
of perception?

In my experience, using Regular Expressions, (even for simple problems - and
believe me, my Regex stuff is simple), does not appear to add any
'significant' overhead to a given program.

When I talk about 'significant', I'm not talking about a millisecond here
and a millisecond there; rather, I am talking about a process taking several
seconds longer using one mechanism as opposed to another.

Interested to hear your views?


 
Cor Ligthert

Stephany,

Thank you for a nice message. I have one thing against regular expressions,
and it is not the performance but the readability.

It takes me (note: "me") more time to write and reread a Regex than to use
a well-structured programming language and the dotNet classes. (I have the
same objection to SQL scripting, by the way.)

The supposed benefit is that a lot of changes are made in one pass and
therefore fast. However, I did a lot of testing in response to messages in
this newsgroup (you can Google for it), and in those tests the Regex in dotNet
was always almost the slowest solution. (Splitting strings is also slow.)

As you may have seen from my posts, I find performance in a program less
important than code that is readable afterwards.

However, the real reason I do not like it is probably that I can do a great
deal (without spending much time) with the String methods, and therefore I
never take the time to look at Regex. (Although, when I have a situation with
variable, changing keywords, I will certainly use Regex to build a filter for
it.)

About performance: in all my tests it was even more than 20 times slower, but
as you stated, it amounts (as far as I remember now) to a few seconds for
files of more than 1 MB on a modern computer. I do not find that important,
because in my experience a user who is aware of such a situation simply
takes a natural pause.
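Rather than relying on remembered figures, the regex-versus-string-methods question can be measured on one's own machine. The following is a rough timing sketch, not a rigorous benchmark: the names ParseTiming, RowPattern, SumWithRegex, SumWithStrings and MakeRows are hypothetical, the 200,000 synthetic rows are an arbitrary choice, and Environment.TickCount only has a resolution of roughly 10 to 16 ms, so small inputs will mostly show noise. The parsed numbers are summed so that both loops do the same work and their results can be checked against each other. No timings are quoted here because they depend on the machine and framework version.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module ParseTiming
    ' One shared Regex instance, built once and reused for every row.
    Private ReadOnly RowPattern As New Regex("^[ \t]*(\d+)[ \t]+(\d+)[ \t]*$")

    ' Parses each row with the regex and returns the sum of all numbers found.
    Function SumWithRegex(ByVal rows As String()) As Long
        Dim total As Long = 0
        For Each r As String In rows
            Dim m As Match = RowPattern.Match(r)
            If m.Success Then total += Integer.Parse(m.Groups(1).Value) + Integer.Parse(m.Groups(2).Value)
        Next
        Return total
    End Function

    ' Parses each row with Trim/IndexOfAny/Substring and returns the same sum.
    Function SumWithStrings(ByVal rows As String()) As Long
        Dim total As Long = 0
        Dim seps As Char() = New Char() {" "c, ControlChars.Tab}
        For Each r As String In rows
            Dim t As String = r.Trim()
            Dim gap As Integer = t.IndexOfAny(seps)
            If gap >= 0 Then total += Integer.Parse(t.Substring(0, gap)) + Integer.Parse(t.Substring(gap + 1).Trim())
        Next
        Return total
    End Function

    ' Builds count synthetic rows such as "2000<TAB>0", "2001<TAB>1", ...
    Function MakeRows(ByVal count As Integer) As String()
        Dim rows(count - 1) As String
        For i As Integer = 0 To count - 1
            rows(i) = CStr(2000 + i) & ControlChars.Tab & CStr(i Mod 100)
        Next
        Return rows
    End Function

    Sub Main()
        Dim rows As String() = MakeRows(200000)
        Dim start As Integer = Environment.TickCount
        Dim a As Long = SumWithRegex(rows)
        Dim regexMs As Integer = Environment.TickCount - start
        start = Environment.TickCount
        Dim b As Long = SumWithStrings(rows)
        Dim stringMs As Integer = Environment.TickCount - start
        Console.WriteLine("Regex:   " & regexMs & " ms (sum " & a & ")")
        Console.WriteLine("Strings: " & stringMs & " ms (sum " & b & ")")
    End Sub
End Module
```

Passing RegexOptions.Compiled to the Regex constructor is another variant that may be worth timing on large inputs; whether it pays off depends on how many rows are processed.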

But I now use the performance argument more than the readability argument,
because some people seem to find even the smallest performance gain more
important than readability. (Look at the discussions about unboxing, which
involve amounts of time that are impossibly small for us humans to notice.)

I hope this explains why I have something against Regex.

Cor
 
Cor Ligthert

Leigh,

I just wrote a message to Stephany in which I said that Split was slow. I
wrote that before I had read your message, so please don't draw the wrong
conclusion.

In this case I think Split can be a solution as well; it was my first
thought. However, it will not take fewer lines of code than the code I
provided. In my opinion, performance is not important for simple things like
this, so which code to use is a matter of taste.

Cor
 
moondaddy

Thanks for all the replies. I ended up using the Split function on
vbNewLine and looping through the resulting lines. As you can see in my
original post, the string has 4 or 5 vbNewLines. The string is parsed into an
array of 5 strings, so I wrote some code to split the 2 parameters out of
each string by searching for a space or vbTab, then trimming the data on each
side of the space or vbTab.
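The per-row step described above (search for a space or vbTab, then trim each side) could look roughly like this. It is a reconstruction for illustration, not moondaddy's actual code; IndexPairParser and ParseLine are hypothetical names, and String.IndexOfAny is used to find whichever of space or tab comes first.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module IndexPairParser
    ' Finds the first space or tab in the trimmed row and returns the trimmed
    ' text on either side of it as {param1, param2}; Nothing if there is no gap.
    Function ParseLine(ByVal line As String) As Integer()
        Dim trimmed As String = line.Trim()
        Dim gap As Integer = trimmed.IndexOfAny(New Char() {" "c, ControlChars.Tab})
        If gap < 0 Then Return Nothing
        Dim firstPart As String = trimmed.Substring(0, gap)
        ' Trim() also removes any further spaces or tabs after the first one.
        Dim secondPart As String = trimmed.Substring(gap + 1).Trim()
        Return New Integer() {Integer.Parse(firstPart), Integer.Parse(secondPart)}
    End Function

    Sub Main()
        Dim pair As Integer() = ParseLine("24853" & vbTab & "3")
        Console.WriteLine("param1=" & pair(0) & " param2=" & pair(1)) ' prints param1=24853 param2=3
    End Sub
End Module
```

Integer.Parse throws a FormatException if either side is not a whole number (for example, a row with three values), so pasted data that may contain stray text would need a Try/Catch or a check before parsing.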

As for performance, it's not an issue here, since there might only be 20 line
items to parse. But I often do data cleansing, and one time I wrote
several regular expressions to help parse parameters out of 15,000+ lines of
text, and that took minutes to run, whereas something similar I wrote in VBA
years ago took about 10 seconds to run through just as much data. So the
VBA was more code to write and ran faster, but in the recent example I
don't know if I could have written VB.NET code to do the same kind of
sophisticated parsing the regex did.

Thanks for the comments.

 
Peter Huang
