Extracting time from strings

K

Ken McLennan

G'day there One & All once again,

I've got a small problem in what was supposed to be a quick
worksheet to manipulate a shift roster. I've managed to get most of my
ideas to operate in the real world and it was kind of coming together
until tonight.

My problem is that I'm dealing with about 50 different rosters
that are composed by 50 different people with no consistent format.
Hence I might find a 9am to 5pm shift entered as "9-5", "9a - 5p", "9A-
5P", or any other variant you may like to think of. There may also be
initials or symbols at beginning or end which signify various
reminders/notes/whatever. So there might well be something like "CT 9-
5", or "#9a- 5p!". The only consistency is the hyphen, which makes the
use of the Split function fairly simple. Many (read almost all) of the
entries are not validated and even the spaces may be omitted.

After splitting at the hyphen I've then tried stepping through the
resulting left & right hand strings, and removed all except the numeric
characters. Once again, that's fairly simple. Where my brain is starting
to hurt is in trying to determine the shift times in 24 hr format. What
I want to do is to nominate a time, say 14:30, and then step through all
the sheets and determine who is actually working at this time. That is,
if the nominated time is between the shift start & finish times then
list that entry.

The sheet entries are of this form:

Tuesday Wednesday --> any number of further columns.
01/01/05 02/01/05

6a-2p Trn 8a - 4p
8 -4 2p-10p !
2 - 10 2 - 10
!6p-2 Off

etc

Some rosters are 14 day, others are 28 although all will start on
the same day of the week.

I simply step through the sheets, and search for the nominated
date. I then work my way down the column until I reach the bottom row.
If the nominated time falls within the shift then I use that row to
extract the employees name, position and section from the first columns.

At least, that's the theory. I've come unstuck at how to parse the
strings to extract the times. It's not possible for me to enforce
uniformity on the manner of entry so I'm stuck with what are almost
random strings.

Google searches haven't produced the results I'd have liked. I've
found a squillion ways to extract time and or date in various formats,
but not from strings such as this!!

Can someone please offer a way me to extract what I need?
Conversion between data types isn't a problem. The issue is determining
the data to begin with. Oh, I can't install any dlls that will allow
regex searching either. Nor do I fully understand how to use such a
beast, but I saw it in the results from my searching.

Thanks for listening,
Ken McLennan
Qld Australia
 
G

Guest

P.S. You'd obviously have to concatenate the A and B values to make a start
time in the format HH:MM and do the same for C and D (or wherever your hours
and minutes happen to be)

Regards

Pete
 
G

Guest

Ken,

Wouldn't you do better to concentrate on forcing your users to enter times
in the correct format, rather than trying to process everything that they
might enter?

For example, couldn't you use dropdown lists within data validation that
only allowed in (say) column A a number between 00 and 23 (hours) and in
(say) column B only numbers between 0 and 59 (minutes). These columns could
thus contain your start times and you could use a similar concept in columns
C and D for end times.

Then, depending on your search time, you could to see if it was greater than
the start and end times for each employee.

I may be barking up the wrong tree here, but then again, it might help.

Regards

Pete
 
G

Guest

Ken,

From the length of your post, I can see that you have spent a lot of time
and have probably imagined many kinds of technical answers to your problem.
I suggest you look at it from another angle, and break down the problem in
stages.

The consistent issue you will encounter is ambiguity in the input - for
example "2-10" can be either 2am thru 10pm, or 2pm thru 10pm, thus you will
always need a manual decision at some point, if that is all the input you
have.

I suggest the following staged approach:
1. Create new worksheets, then with a macro procedure, read the input and
automatically transform the input data to structured format. For example,
sheet Input contains your input data, sheet WorkArea1 contains the same Input
sheet but the columns are structured, and reformat as best as you can, eg,
"2-10" become "02AM-10AM".

Most of the code you have spelt out below will be put to good use here.

2. Because of ambiguities as mentioned, you will also have another sheet
WorkArea2 that will be similar to WorkArea1 but will contain an alternative
structured interpretation of the input data, eg, "2-10" becomes "02AM-10PM"

3. You will have as many WorkArea sheets as you think there are possible
interpretations of the input data. If at some point in the future, there is
a new interpretation, then you can simply add another WorkArea sheet.

4. Then, for each row, you can easily loop thru each WorkArea sheet and if
one interpretation succeeds then you can display that, along with the
original data input.

5. Your report will simply show a list of potential employees, that you can
visually check for final validation.

This staged approach has the main advantage of being correct as you write
your code, resolves the ambiguities at the last moment, and can be improved
over time.

Regards,
Antonio Elinon
Sydney, Australia
 
R

Ron Rosenfeld

G'day there One & All once again,

I've got a small problem in what was supposed to be a quick
worksheet to manipulate a shift roster. I've managed to get most of my
ideas to operate in the real world and it was kind of coming together
until tonight.

My problem is that I'm dealing with about 50 different rosters
that are composed by 50 different people with no consistent format.
Hence I might find a 9am to 5pm shift entered as "9-5", "9a - 5p", "9A-
5P", or any other variant you may like to think of. There may also be
initials or symbols at beginning or end which signify various
reminders/notes/whatever. So there might well be something like "CT 9-
5", or "#9a- 5p!". The only consistency is the hyphen, which makes the
use of the Split function fairly simple. Many (read almost all) of the
entries are not validated and even the spaces may be omitted.

After splitting at the hyphen I've then tried stepping through the
resulting left & right hand strings, and removed all except the numeric
characters. Once again, that's fairly simple. Where my brain is starting
to hurt is in trying to determine the shift times in 24 hr format. What
I want to do is to nominate a time, say 14:30, and then step through all
the sheets and determine who is actually working at this time. That is,
if the nominated time is between the shift start & finish times then
list that entry.

The sheet entries are of this form:

Tuesday Wednesday --> any number of further columns.
01/01/05 02/01/05

6a-2p Trn 8a - 4p
8 -4 2p-10p !
2 - 10 2 - 10
!6p-2 Off

etc

Some rosters are 14 day, others are 28 although all will start on
the same day of the week.

I simply step through the sheets, and search for the nominated
date. I then work my way down the column until I reach the bottom row.
If the nominated time falls within the shift then I use that row to
extract the employees name, position and section from the first columns.

At least, that's the theory. I've come unstuck at how to parse the
strings to extract the times. It's not possible for me to enforce
uniformity on the manner of entry so I'm stuck with what are almost
random strings.

Google searches haven't produced the results I'd have liked. I've
found a squillion ways to extract time and or date in various formats,
but not from strings such as this!!

Can someone please offer a way me to extract what I need?
Conversion between data types isn't a problem. The issue is determining
the data to begin with. Oh, I can't install any dlls that will allow
regex searching either. Nor do I fully understand how to use such a
beast, but I saw it in the results from my searching.

Thanks for listening,
Ken McLennan
Qld Australia

Extracting the numbers is simple using regular expressions.

Your strings appear to be consistent in that there are only two sets of numbers
-- one representing a start time and one representing an end time.

The problem you will run into, given what you have posted so far, is
determining AM vs PM.

You have one entry '10-2'.

Since your goal is to enter a time and see who is working at that time, how do
you know if this represents 10AM-2PM vs 10PM-2AM ?

A similar argument could be made with other, inconsistent entries.

In any event, to extract the numbers simply, download and install Longre's free
morefunc.xll add-in from: http://xcell05.free.fr/


First Time: =REGEX.MID(A3,"\d+")
Second Time: =REGEX.MID(A3,"\d+",2)

or the same with the letter suffix (if present):

First: =REGEX.MID(A3,"\d+[APap]?")
Second: =REGEX.MID($A3,"\d+[APap]?",2)

The above expressions seem to handle the variations you posted. If you have
other variations, post them and we can handle them.

These formulas will also work in VBA:

For example, given your data:

================================
Option Explicit
Sub ExtractTimes()
Dim T(7) As String
Dim i As Long
Dim s1 As String, s2 As String

T(0) = " 6a-2p"
T(1) = " Trn 8a - 4p"
T(2) = "8 -4"
T(3) = "2p-10p !"
T(4) = "2 - 10"
T(5) = "2 - 10"
T(6) = "!6p-2"
T(7) = "Off"

For i = 0 To UBound(T)
s1 = Run([regex.mid], T(i), "\d+[APap]?")
s2 = Run([regex.mid], T(i), "\d+[APap]?", 2)

Debug.Print s1 & "....." & s2
Next i
=========================

6a.....2p
8a.....4p
8.....4
2p.....10p
2.....10
2.....10
6p.....2
......
--------------------------


You still have the problem of resolving the inherent ambiguity in some of the
entries.
--ron
 
K

Ken McLennan

G'day there Pete, Antonio & Ron,



Please excuse the delay. This is the fourth time that I've tried
to respond to you on this group and each time I've had my messages come
back with a "SpamCop" message. I've checked with my ISP and they don't
use SpamCop so somewhere along the chain I'm getting blocked.

I'll try once more to say "Thankyou" to you all for your
suggestions & advice. I've considered what you all said, and I'm
currently working on the problem still.

Unfortunately I'm not in a position to enforce validation or any
kind of uniformity. Nor am I able to distribute an add-on as per the RE
suggestion (although I've downloaded it for my own use. It looks pretty
good to me).

I thought I had the matter resolved, but discovered there was one
situation which gave me a false positive. As fate would have it, it
occurs in about 1 in 3 entries so I missed detecting over 30% of what I
wanted <g>. I'm working on that bit now =)

Thanks once again for your advice and assistance. I really do appreciate
it.

See ya
Ken McLennan
Qld Australia
 
R

Ron Rosenfeld

G'day there Pete, Antonio & Ron,
Unfortunately I'm not in a position to enforce validation or any
kind of uniformity. Nor am I able to distribute an add-on as per the RE
suggestion (although I've downloaded it for my own use. It looks pretty
good to me).

Is the distribution of the add-on forbidden by corporate policy, or is it a
matter of the difficulty?

If the latter, and if you installed morefunc with all the options, one of those
options adds a Morefunc menu item to the Tools menu. This then gives you the
option to include Morefunc with the workbook, so there would be no effort at
all in distributing it.

I thought I had the matter resolved, but discovered there was one
situation which gave me a false positive. As fate would have it, it
occurs in about 1 in 3 entries so I missed detecting over 30% of what I
wanted <g>. I'm working on that bit now =)

Thanks once again for your advice and assistance. I really do appreciate
it.

See ya
Ken McLennan
Qld Australia

--ron
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top