Searching for timestamp in string

Brian Mitchell

Is there an easy way to pull a date/time stamp from a string? The DateTime
stamp is located in different parts of each string and the DateTime stamp
could be in different formats (mm/dd/yy or dd/mm/yyyy, or hh:mm:ss
dd/mm...etc.)

Any ideas would be appreciated,
Thanks!!
 
Jay B. Harlow [MVP - Outlook]

Brian,
You could use a RegEx to search a string for "DateTime" like formats in a
string.

Something like:

Imports System.Text.RegularExpressions

' Matches either a date (n/n/n) or a time followed by a date (hh:mm n/n/n).
Const pattern As String = _
    "(?<date>(\d{1,2}/\d{1,2}/\d{1,4})|(\d{2}:\d{2} \d{1,2}/\d{1,2}/\d{1,4}))"
Static dateExpression As New Regex(pattern, RegexOptions.Compiled)
Dim input As String = "Today is ""12/31/2004"" or 12:49 12/31/2004"
For Each match As Match In dateExpression.Matches(input)
    ' The second argument is the category, so this writes "found: <text>".
    Debug.WriteLine(match.Groups("date").Value, "found")
Next

You could expand the pattern to include more formats; I only show a date, and
a time followed by a date. Just be careful of 4 & 2 digit years. Note that the
RegEx pattern doesn't know or care if the date is mm/dd/yy or dd/mm/yy or
even yy/mm/dd, just that it is 3 numbers separated by slashes...

Once you have the Match I would recommend the following DateTime.ParseExact
overload to parse the date found into a DateTime value.

http://msdn.microsoft.com/library/d.../frlrfSystemDateTimeClassParseExactTopic3.asp

That overload allows you to specify a number of custom formats to check
against when converting the string to a DateTime.
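
A minimal sketch of that ParseExact call, assuming the text found by the
RegEx is month-first. The format list and the module name are hypothetical;
the thread does not say which formats to accept, so adjust them to your data.

```vbnet
Imports System
Imports System.Globalization

Module ParseExactSketch
    ' Hypothetical format list (assumes month/day/year order).
    Private ReadOnly formats As String() = {"M/d/yyyy", "M/d/yy", "HH:mm M/d/yyyy", "HH:mm M/d/yy"}

    Sub Main()
        ' Stand-in for the text returned by match.Groups("date").Value.
        Dim found As String = "12:49 12/31/2004"

        ' Each format is tried in turn; a FormatException is thrown if none fits.
        Dim value As DateTime = DateTime.ParseExact(found, formats, _
            CultureInfo.InvariantCulture, DateTimeStyles.None)

        Console.WriteLine(value.ToString("yyyy-MM-dd HH:mm", CultureInfo.InvariantCulture))
    End Sub
End Module
```

Because a string such as 01/02/2004 fits both month-first and day-first
formats, the order of the list decides how it is read, so it is probably
safest to list only the formats you really expect.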

A tutorial & reference on using regular expressions:
http://www.regular-expressions.info/

The MSDN's documentation on regular expressions:
http://msdn.microsoft.com/library/d...l/cpconRegularExpressionsLanguageElements.asp

Hope this helps
Jay
 
Nick Malik [Microsoft]

Well, you could create multiple regular expressions to parse out the
date/time string: one expression for each format.
Then, when you get a source string, loop through your regular
expressions until one of them picks up a date.
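
A rough sketch of that loop, assuming two illustrative patterns (time plus
date, and date alone). The patterns, the FindStamp name and the sample input
are hypothetical, not taken from the thread.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module MultiPatternSketch
    ' Hypothetical patterns, one per format, most specific first.
    Private ReadOnly expressions As Regex() = { _
        New Regex("\b\d{2}:\d{2} \d{1,2}/\d{1,2}/(\d{4}|\d{2})\b"), _
        New Regex("\b\d{1,2}/\d{1,2}/(\d{4}|\d{2})\b")}

    ' Returns the first date-like text found, or Nothing if no pattern matches.
    Function FindStamp(ByVal input As String) As String
        For Each expression As Regex In expressions
            Dim m As Match = expression.Match(input)
            If m.Success Then Return m.Value
        Next
        Return Nothing
    End Function

    Sub Main()
        Console.WriteLine(FindStamp("Logged at 12:49 12/31/2004 by Brian"))
    End Sub
End Module
```

The expressions are created once, outside the loop, and the order of the
array decides which format wins when more than one could match.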

--
--- Nick Malik [Microsoft]
MCSD, CFPS, Certified Scrummaster
http://blogs.msdn.com/nickmalik

Disclaimer: Opinions expressed in this forum are my own, and not
representative of my employer.
I do not answer questions on behalf of my employer. I'm just a
programmer helping programmers.
 
Brian Mitchell

Thank you very much for the info, this helps me a great deal. That is a
great site for the tutorial, again thanks!!
 
Nick Malik [Microsoft]

Hi Jay,

I break down regular expressions for the same reason I break down a
complicated task into multiple calls to different methods: to make it easier
to understand and debug.

It's personal preference, really. A regular expression for matching one
date format is not going to be all that trivial. The OP wants to match
multiple date formats. Unless you are an expert at Regex, and most folks
aren't, it will be fairly easy to make a mistake in one of them.

If all of your regular expressions are combined into one complicated
expression, separated by 'or' operators, and you make a mistake, it's that
much harder to find and fix the mistake.

I'll take my chances with multiple individual expressions.

Jay B. Harlow [MVP - Outlook]

Nick,
I agree, I break down regular expressions, while I am developing them.

However: Once I am comfortable that they work, I then combine them, to
"simplify" the supporting code.

> It's personal preference, really.

Is it? My concern is that with the manual looping you are adding unnecessary
complexity to the code, hence my question. Plus you might be adding possible
performance problems (evaluating multiple RegEx as opposed to a single
complex one). Either method may be causing increased GC pressure. How does
that saying go, "robbing Peter to pay Paul"? Don't get me wrong, sometimes it
is "better" to write more complex supporting code to simplify the RegEx
versus more complex RegEx to simplify the code...

Also my concern (with both methods) is precedence, which is a problem my
expression has with 2 & 4 digit years (it actually allows a 3 digit year).
Manually looping over individual expressions may cause a different
expression to be matched than a properly constructed group with alternation
(I am not implying my expression is properly constructed!).

Also in this instance I would consider something like:

Const pattern1 As String = "a"
Const pattern2 As String = "b"
Const pattern3 As String = "c"

Const pattern As String = pattern1 & "|" & pattern2 & "|" & pattern3

Which easily allows you to define & maintain the patterns separately, then
gain the "simplicity" of combining the RegEx call... I would then structure
my Unit Tests such that I could easily identify if pattern1, pattern2 or
pattern3 was failing or working...
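
One possible way to see which sub-pattern produced a match, sketched under
the assumption that each sub-pattern is wrapped in its own named group. The
group names and the patterns themselves are hypothetical stand-ins for
pattern1, pattern2 and so on.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CombinedPatternSketch
    ' Hypothetical sub-patterns; each sits in its own named group.
    Const timeAndDate As String = "(?<timeAndDate>\b\d{2}:\d{2} \d{1,2}/\d{1,2}/(\d{4}|\d{2})\b)"
    Const dateOnly As String = "(?<dateOnly>\b\d{1,2}/\d{1,2}/(\d{4}|\d{2})\b)"

    ' Alternation: the more specific pattern is listed first.
    Const pattern As String = timeAndDate & "|" & dateOnly

    Sub Main()
        Dim expression As New Regex(pattern)
        For Each m As Match In expression.Matches("Today is ""12/31/2004"" or 12:49 12/31/2004")
            ' Only the group belonging to the alternative that matched reports Success.
            Dim which As String
            If m.Groups("timeAndDate").Success Then which = "timeAndDate" Else which = "dateOnly"
            Console.WriteLine(which & ": " & m.Value)
        Next
    End Sub
End Module
```

A unit test can then assert on the group name as well as on the matched
text, which shows whether the intended alternative did the matching.
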
> If all of your regular expressions are combined into one complicated
> expression, separated by 'or' operators, and you make a mistake, it's that
> much harder to find and fix the mistake.

Note: | is the alternation operator, not the Or operator... Or implies
combining (when applied to numbers & Booleans), whereas | does not combine; it
provides alternatives!

Just a thought
Jay
 
Nick Malik [Microsoft]

Hi Jay,

You are clearly one of the folks that I would describe as "more expert than
I in RegEx."
> I agree, I break down regular expressions, while I am developing them.
>
> However: Once I am comfortable that they work, I then combine them, to
> "simplify" the supporting code.

Code simplicity is an interesting term. Not sure I agree that combining two
or three (or ten) expressions creates simplicity. The code is certainly
shorter. However, I have no desire to make things simple for the compiler
or the runtime. I want to make things simple for myself and the developer
who will follow me, and have to maintain my code.
> Is it? My concern is that with the manual looping you are adding unnecessary
> complexity to the code, hence my question.

In my opinion, a loop is a fairly common construct, and therefore the
complexity of adding a loop is small compared to the complexity of making
the RegEx more difficult for a non-expert to read.
> Plus you might be adding possible
> performance problems (evaluating multiple RegEx as opposed to a single
> complex one).

If RegEx was being used in an inner loop, in a situation where we were
processor bound, I would agree. I haven't run across that situation. I
suppose my answer would become more cautious if I had. That said, RegEx is
pretty efficient.
> Either method may be causing increased GC pressure.

Sorry to be thick, but I don't understand why. If I were doing a series of
RegEx matches in a loop, I would create the expressions outside the loop and
simply use them in the loop. A match is as good as a mile. Technically,
that should create the same number of matches.

Also, once again, most of the apps that I've done parsing in aren't tuned
for Garbage Collection. It is nearly always easier to find opportunities to
reduce GC pressure simply by applying StringBuilder where it is useful (the
"80-20" rule).
> Also my concern (with both methods) is precedence, which is a problem my
> expression has with 2 & 4 digit years (it actually allows a 3 digit year).
> Manually looping over individual expressions may cause a different
> expression to be matched than a properly constructed group with alternation
> (I am not implying my expression is properly constructed!).

I completely agree. This is one place where I feel that a loop is better.
You can add in extra logic by structuring the code so that you match your
string against a couple of different patterns, and then YOU can apply a
complex rule to decide which to use... with the RegEx language, you don't
have the right to control precedence in as detailed a way as you can with
logical constructs and business rules.
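
A small sketch of that idea, assuming a deliberately simple rule: run every
pattern and keep the longest text that matched. The rule, the patterns and
the BestStamp name are hypothetical; a real business rule could be anything.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ChooseMatchSketch
    ' Hypothetical patterns; here the order does not matter, the rule decides.
    Private ReadOnly expressions As Regex() = { _
        New Regex("\b\d{1,2}/\d{1,2}/(\d{4}|\d{2})\b"), _
        New Regex("\b\d{2}:\d{2} \d{1,2}/\d{1,2}/(\d{4}|\d{2})\b")}

    ' Hypothetical rule: among the patterns that match, prefer the longest text.
    Function BestStamp(ByVal input As String) As String
        Dim best As String = Nothing
        For Each expression As Regex In expressions
            Dim m As Match = expression.Match(input)
            If m.Success AndAlso (best Is Nothing OrElse m.Value.Length > best.Length) Then
                best = m.Value
            End If
        Next
        Return best
    End Function

    Sub Main()
        Console.WriteLine(BestStamp("Logged at 12:49 12/31/2004 by Brian"))
    End Sub
End Module
```
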
> Also in this instance I would consider something like:
>
> Const pattern1 As String = "a"
> Const pattern2 As String = "b"
> Const pattern3 As String = "c"
>
> Const pattern As String = pattern1 & "|" & pattern2 & "|" & pattern3
>
> Which easily allows you to define & maintain the patterns separately, then
> gain the "simplicity" of combining the RegEx call...

An excellent idea. One thing to consider, though. Each of the patterns
above would need to be tested individually, and the combined pattern would
need to be tested as well. If you do one, and not the other, it is possible
for a small syntax error in two patterns to balance each other out, allowing
the final construct to be legal, valid, and wrong.

This adds to the testing burden a bit. Not much, perhaps, but still a bit.
The unit tests that you describe should still cover it, as long as they look
for boundary conditions effectively.
> Note: | is the alternation operator, not the Or operator... Or implies
> combining (when applied to numbers & Booleans), whereas | does not combine; it
> provides alternatives!

I stand corrected.
