extract date from text

  • Thread starter Thread starter Woodi2
  • Start date Start date
W

Woodi2

I have the following text string in cell A1. Flight Schedule - Friday 28th
November 2008.

How can I extract the date from the text string and paste it in cell A2.
Thanks
 
I'd use a VBA function:

Function GetDate(DS As String) As Date
Dim N As Long
Dim S As String
N = InStr(1, DS, " - ", vbBinaryCompare)
S = Mid(DS, N + 3)
N = InStr(1, S, " ", vbBinaryCompare)
S = Mid(S, N + 1)
S = Replace( _
Replace( _
Replace( _
Replace(S, "nd", vbNullString), _
"rd", vbNullString), _
"th", vbNullString), _
"st", vbNullString)

GetDate = DateValue(S)
End Function

Then call this from cell A2 as
=GetDate(A1)

Cordially,
Chip Pearson
Microsoft MVP
Excel Product Group
Pearson Software Consulting, LLC
www.cpearson.com
(email on web site)
 
=DATEVALUE(RIGHT(SUBSTITUTE(A1,"th",""),16))

I'm assuming that the date in A1 can vary, so hard coding the "th" and
the 16 won't work. A1 can have different ordinal suffices and month
names different in length than "November".

Cordially,
Chip Pearson
Microsoft MVP
Excel Product Group
Pearson Software Consulting, LLC
www.cpearson.com
(email on web site)
 
A little bit more compact function...

Function GetDate(DS As String) As Date
Dim Parts() As String
Parts = Split(Trim(Mid(DS, InStrRev(DS, "-") + 1)), " ")
GetDate = CDate(Val(Parts(1)) & " " & Parts(2) & " " & Val(Parts(3)))
End Function
 
This formula will do what you want (providing there is always a space after
the dash and there is no period after the year)...

=--SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(TRIM(MID(A1,
FIND(" ",A1,FIND("-",A1)+2),LEN(A1))),"rd",""),"nd",""),"st","")
 
Hi,

You can also use the following array formula (Ctrl+Shift+Enter). I have
assumed that A14 holds Flight Schedule - Friday 28th November 2008

=MID(A14,MATCH(TRUE,ISNUMBER(1*MID(A14,ROW($1:$100),1)),0),50)

--
Regards,

Ashish Mathur
Microsoft Excel MVP
www.ashishmathur.com
 
Rick Rothstein said:
This formula will do what you want (providing there is always a space after
the dash and there is no period after the year)...

=--SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(TRIM(MID(A1,
FIND(" ",A1,FIND("-",A1)+2),LEN(A1))),"rd",""),"nd",""),"st","")
....

Your formula fails to delete the ordinal suffix "th", which,
inconveniently, appears in the OP's example data, so your formula
fails for the OP's sample data.

Probably best to use regular expressions to do this. For example,
using Laurent Longre's MOREFUNC.XLL add-in, you could use the formula

=--REGEX.SUBSTITUTE(A1,"^[^-]*-\s*\S+\s+(\d+)[^ 0-9]*\s+(\S+)\s+(\d+).*
$","[1]-[2]-[3]")
 
I'd use a VBA function:

Function GetDate(DS As String) As Date
Dim N As Long
Dim S As String
N = InStr(1, DS, " - ", vbBinaryCompare)
S = Mid(DS, N + 3)
N = InStr(1, S, " ", vbBinaryCompare)
S = Mid(S, N + 1)
S = Replace( _
Replace( _
Replace( _
Replace(S, "nd", vbNullString), _
"rd", vbNullString), _
"th", vbNullString), _
"st", vbNullString)

GetDate = DateValue(S)
End Function

Then call this from cell A2 as
=GetDate(A1)

Cordially,
Chip Pearson
Microsoft MVP
Excel Product Group
Pearson Software Consulting, LLC
www.cpearson.com
(email on web site)

Chip,

You also need to replace the "." at the end of the string, if you are going to
use this approach.
--ron
 
I have the following text string in cell A1. Flight Schedule - Friday 28th
November 2008.

How can I extract the date from the text string and paste it in cell A2.
Thanks

All of the functional examples assume that there is a hyphen separating the
date from the preceding text.

Will that always be the case?

And will there only be the one hyphen?
--ron
 
LOL... I was concentrating so on making sure I caught the 'st', 'nd' and
'rd' that I totally overlooked the mostly occurring 'th' suffix. Thanks for
catching/noting that.

For those still desiring to implement a formula of the form I originally
posted, here is the corrected formula...

=--SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(TRIM(MID(
A1,FIND(" ",A1,FIND("-",A1)+2),LEN(A1))),"rd",""),"nd",""),"st",""),"th","")

--
Rick (MVP - Excel)


Harlan Grove said:
Rick Rothstein said:
This formula will do what you want (providing there is always a space
after
the dash and there is no period after the year)...

=--SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(TRIM(MID(A1,
FIND(" ",A1,FIND("-",A1)+2),LEN(A1))),"rd",""),"nd",""),"st","")
...

Your formula fails to delete the ordinal suffix "th", which,
inconveniently, appears in the OP's example data, so your formula
fails for the OP's sample data.

Probably best to use regular expressions to do this. For example,
using Laurent Longre's MOREFUNC.XLL add-in, you could use the formula

=--REGEX.SUBSTITUTE(A1,"^[^-]*-\s*\S+\s+(\d+)[^ 0-9]*\s+(\S+)\s+(\d+).*
$","[1]-[2]-[3]")
 
Ashish Mathur said:
You can also use the following array formula (Ctrl+Shift+Enter).  I have
assumed that A14 holds Flight Schedule - Friday 28th November 2008

=MID(A14,MATCH(TRUE,ISNUMBER(1*MID(A14,ROW($1:$100),1)),0),50)
....

If you're going to assume that all the OP's records end with the date
and the named day of week always precedes the date, you could extract
the date substring with a shorter formula.

=MID(x,FIND("day ",x)+4,32)

But your formula has one big potential flaw - the MATCH call locates
the leftmost numeral. In the OP's only sample record that happens to
be the beginning of the date substring, but the OP's actual data may
contain nondate numerals to the left of the date substring, e.g.,

JFK to LAX Flight 999 - Tuesday 2nd Dec 2008
 
Ron Rosenfeld said:
All of the functional examples assume that there is a hyphen separating the
date from the preceding text.  

Will that always be the case?  

And will there only be the one hyphen?

So assume a more general approach is needed. Easiest using regular
expressions.

=REGEX.SUBSTITUTE(A1,
".*?\b(\d{1,2})(st|nd|rd|th)?\s+((Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|
Oct|Nov|Dec)\S*)\s+(\d{2,4})\b.*",
"[1] [3] [5]")
 
Ron Rosenfeld said:
All of the functional examples assume that there is a hyphen separating the
date from the preceding text.  

Will that always be the case?  

And will there only be the one hyphen?

So assume a more general approach is needed. Easiest using regular
expressions.

=REGEX.SUBSTITUTE(A1,
".*?\b(\d{1,2})(st|nd|rd|th)?\s+((Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|
Oct|Nov|Dec)\S*)\s+(\d{2,4})\b.*",
"[1] [3] [5]")

I agree -- I came up with a similar regex:

..*?(\b\d{1,2})\D*?\s(\b(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*)\s+(\d{4}).*

--ron
 
Ron Rosenfeld said:
....
I agree -- I came up with a similar regex:

.*?(\b\d{1,2})\D*?\s(\b(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*)
\s+(\d{4}).*

Minor quibble about \D*? vs (st|nd|rd|th)?. However, \d{4} for the
year term doesn't allow for 2-digit years. Works for OP's single
sample record, but it's not very general.
 
Ron Rosenfeld said:
...
I agree -- I came up with a similar regex:

.*?(\b\d{1,2})\D*?\s(\b(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*)
\s+(\d{4}).*

Minor quibble about \D*? vs (st|nd|rd|th)?. However, \d{4} for the
year term doesn't allow for 2-digit years. Works for OP's single
sample record, but it's not very general.

I was wondering if you'd note that.

I wondered about your ...|Dec)\S* in favor of ...|Dec)[a-z]* as a valid month
name should only have letters following.

And also, given how years are usually expressed, I'd prefer: (\d{2}|\d{4})
--ron
 
Ron Rosenfeld said:
I wondered about your ...|Dec)\S* in favor of ...|Dec)[a-z]* as a valid month
name should only have letters following.

Consider 1st Dec. 2008. Sometimes abbreviations include terminating
periods. If I were going to be really fastidious,

(Jan(uary|\.)?|Feb(ruary|\.)?| . . . |Dec(ember|\.)?)
And also, given how years are usually expressed, I'd prefer:   (\d{2}|\d{4})

Believe it'd be more efficient to express that as (\d{2}){1,2}.
 
Ron Rosenfeld said:
I wondered about your ...|Dec)\S* in favor of ...|Dec)[a-z]* as a valid month
name should only have letters following.

Consider 1st Dec. 2008. Sometimes abbreviations include terminating
periods. If I were going to be really fastidious,

(Jan(uary|\.)?|Feb(ruary|\.)?| . . . |Dec(ember|\.)?)

Good point.

I suppose one could also use

\s([0]\d|\d|[12]\d|3[01])\D

:-)

Believe it'd be more efficient to express that as (\d{2}){1,2}.

I would have agreed. But, in doing some testing with a regex tool that has a
debugging option (RegexBuddy by JGSoft), I discovered two things.

1. Mine should be (\d{4}|\d{2})
2. For some reason, yours seems to take more steps to complete the match.

Given 2008, mine (modified as above) matches in 2 steps, whereas yours seems to
require 4 steps.
Given 08, both require 3 steps
--ron
 
Ron Rosenfeld said:
I suppose one could also use

\s([0]\d|\d|[12]\d|3[01])\D

Not specific enough - would allow 0 and 00 as valid day numbers.
Better to use
0?[1-9] in place of 0\d|\d .
1.  Mine should be (\d{4}|\d{2})
....

What about \d\d(\d\d)?
 
Ron Rosenfeld said:
I suppose one could also use

\s([0]\d|\d|[12]\d|3[01])\D

Not specific enough - would allow 0 and 00 as valid day numbers.
Better to use
0?[1-9] in place of 0\d|\d .
Agree

1.  Mine should be (\d{4}|\d{2})
...

What about \d\d(\d\d)?

According to the RegexBuddy debugger, that takes 4 steps to match 08 and 5
steps to match 2008
--ron
 
Back
Top