Extracting from string

P

Phil Trumpy

I'm using Excel 2007 - Data is in column A

I have a data set that I am trying to import. The first column really
should be 5 separate columns, but has been created as one string. I need to
separate into 5 columns. the data looks like this:

LastName, FirstName MI. Position Team

LastName and FirstName can be ulimited characters
MI can be 1 or 2 characters followed by a period
Position is 1 or 2 characters
Team is 3 characters

I attempted to use Text to Columns, but the problem is that the middile
initial does not always exist. If I can get the middle initial to a separate
column, I can use text to columns for the rest. I would prefer to just run
one macro to take care of the whole thing since I will have a different file
each year that has the same data structure. Sorry if I didn't provide enough
info. Thanks in advance.
 
G

Gary''s Student

This macro assumes that the fields are separated by a single space. The
macro does the equivalent to Text-to-Columns. It splits the data up into
either 5 or 4 parts.

If there are only four parts, then the middle initial is missing and the
data is placed into positions 1,2,4,5.

If there are five parts, then the middle iniial is present and the data is
placed into positions 1,2,3,4,5:


Sub SplitUm()
Sheets("Sheet1").Activate
Set w2 = Sheets("Sheet2")
n = Cells(Rows.Count, 1).End(xlUp).Row
For i = 1 To n
s = Split(Cells(i, 1).Value)
u = UBound(s) + 1
w2.Cells(i, 1).Value = Replace(s(0), ",", "")
w2.Cells(i, 2).Value = s(1)
w2.Cells(i, 5) = s(u - 1)
w2.Cells(i, 4) = s(u - 2)
If u = 5 Then
w2.Cells(i, 3) = s(2)
End If
Next
End Sub
 
P

Phil Trumpy

Thank you very much. This works almost perfectly. However, there seem to be
a few records that did not get extracted correctly. Also, I did not account
for a LastName that could have a space in it (De Aza in this case). Here is
an example of one of the records that didn't seem to work:

Before
Col A
Alou, Moises RF NYM

After
Col A Col B Col C Col D
Alou Moises RF NYM

Any ideas?
 
G

Gary''s Student

I was able to duplicate your problem with the "Alou" record.

There is an extra space after NYM.

You can use the TRIM() function in the worksheet to remove these little pests.
 
P

Phil Trumpy

Thanks again. I should have thought of that. I'll add that at the beginning
of the macro and give it another shot.
 
P

Phil Trumpy

Gary''s Student - the trim function took care of it wonderfully. Now I just
have to figure out how to handle last names with a space in them. I think I
should be able to figure that out. Thank you again.
 
E

eliano

Hi Phil.
Awaiting Mr. Gary, you can try:

Sub SeparaAnagrafe()
Dim Riga, Uriga, I
Dim Nome, Pos, XA1, XA2
Riga = 1
Uriga = Cells(Riga, 1).End(xlDown).Row
For I = Riga To Uriga
Nome = Cells(I, 1)
Pos = InStrRev(Nome, " ")
XA1 = Left(Nome, Pos - 1)
XA2 = Mid(Nome, Pos + 1)
Cells(I, 2) = XA1
Cells(I, 3) = XA2
Cells(I, 4) = XA2 & " " & XA1
Next
End Sub

You can test the macro using: Di Maggio Joe in A1 and De Curtis
Antonio in A2
and vary the sub in accordance with what you want.
Saluti
Eliano
 
R

Ron Rosenfeld

I'm using Excel 2007 - Data is in column A

I have a data set that I am trying to import. The first column really
should be 5 separate columns, but has been created as one string. I need to
separate into 5 columns. the data looks like this:

LastName, FirstName MI. Position Team

LastName and FirstName can be ulimited characters
MI can be 1 or 2 characters followed by a period
Position is 1 or 2 characters
Team is 3 characters

I attempted to use Text to Columns, but the problem is that the middile
initial does not always exist. If I can get the middle initial to a separate
column, I can use text to columns for the rest. I would prefer to just run
one macro to take care of the whole thing since I will have a different file
each year that has the same data structure. Sorry if I didn't provide enough
info. Thanks in advance.

This routine seems to meet your specifications.

It WILL return a last name containing <spaces> as it uses the <comma> for the
delimiter.

The MI is optional, but it does require that the MI follow FirstName and be one
or two letters followed by a <dot>.

===================================
Option Explicit
Sub ParseString()
Dim c As Range, rg As Range
Dim Str As String
Dim re As Object, mc As Object, m As Object
Dim i As Long
Set re = CreateObject("vbscript.regexp")
re.ignorecase = True
re.Pattern = "(^\s*[^,]+),\s+(\S+)\s*([A-Z]{1,2}\.)?\s+(\S+)\s+(\S+)"

'set up range to parse
Set rg = Selection
'check that it is only a single column
If rg.Columns.Count <> 1 Then
MsgBox ("Can only select a single column")
Exit Sub
End If
'one could expand selection to current column

For Each c In rg
Str = c.Value
Range(c(1, 2), c(1, 6)).ClearContents
If re.test(Str) Then
Set mc = re.Execute(Str)
For Each m In mc
If m.SubMatches.Count > 0 Then
For i = 1 To m.SubMatches.Count
c.Offset(0, i).Value = m.SubMatches(i - 1)
Next i
End If
Next m
End If
Next c

End Sub
==================================
--ron
 
P

Phil Trumpy

Thank you to everyone that helped with this. I now have it working just as I
needed it to. I appreciate all the suggestions. Now, I will have to read up
on some of these functions and learn more about what is happening. That's my
favorite part of these discussions - all the knowledge to be gained. Thanks
again!

Ron Rosenfeld said:
I'm using Excel 2007 - Data is in column A

I have a data set that I am trying to import. The first column really
should be 5 separate columns, but has been created as one string. I need to
separate into 5 columns. the data looks like this:

LastName, FirstName MI. Position Team

LastName and FirstName can be ulimited characters
MI can be 1 or 2 characters followed by a period
Position is 1 or 2 characters
Team is 3 characters

I attempted to use Text to Columns, but the problem is that the middile
initial does not always exist. If I can get the middle initial to a separate
column, I can use text to columns for the rest. I would prefer to just run
one macro to take care of the whole thing since I will have a different file
each year that has the same data structure. Sorry if I didn't provide enough
info. Thanks in advance.

This routine seems to meet your specifications.

It WILL return a last name containing <spaces> as it uses the <comma> for the
delimiter.

The MI is optional, but it does require that the MI follow FirstName and be one
or two letters followed by a <dot>.

===================================
Option Explicit
Sub ParseString()
Dim c As Range, rg As Range
Dim Str As String
Dim re As Object, mc As Object, m As Object
Dim i As Long
Set re = CreateObject("vbscript.regexp")
re.ignorecase = True
re.Pattern = "(^\s*[^,]+),\s+(\S+)\s*([A-Z]{1,2}\.)?\s+(\S+)\s+(\S+)"

'set up range to parse
Set rg = Selection
'check that it is only a single column
If rg.Columns.Count <> 1 Then
MsgBox ("Can only select a single column")
Exit Sub
End If
'one could expand selection to current column

For Each c In rg
Str = c.Value
Range(c(1, 2), c(1, 6)).ClearContents
If re.test(Str) Then
Set mc = re.Execute(Str)
For Each m In mc
If m.SubMatches.Count > 0 Then
For i = 1 To m.SubMatches.Count
c.Offset(0, i).Value = m.SubMatches(i - 1)
Next i
End If
Next m
End If
Next c

End Sub
==================================
--ron
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top