Transpose rows to columns w/varying numbers of lines per record

G

Guest

I have a text-file address listing with a couple of thousand entries of
varying number of lines--some have 3 lines:

Name
Address
City, State Zip

and others have 4:

Name
Etc (like "c/o" or contact name--no particular pattern to this)
Address
City, State Zip

There are no blank lines between records. I don't have any control over the
format of the text file, as it's an extract from another program and I can't
change the way the data is extracted.

I can't use Excel's transpose feature to get a useful spreadsheet because
the records aren't consistent in the number of lines per record. Is there a
way to convert this data to a spreadsheet with 4 columns per record (with a
blank column for the 3-line records)?
 
G

Guest

You could use a VBA routine, but only if there is some discernible way to
tell when one address ends and the next begins. So...is there a ZIP at the
very end of EVERY SINGLE ADDRESS? Is it always a ZIP or is it sometimes a
ZIP+4? Any non-US addresses using a nonnumeric postal code?

Tell people here how to objectively determine when to end one address and
start another, and you'll probably get some valuable help
 
G

Guest

If you need help with where to put VBA code, look at this site

http://www.contextures.com/xlvba01.html
or
http://www.cpearson.com/excel/codemods.htm


Try this code:

Sub ParseAddresses()
Dim rngSrc As Range
Dim rngTgt As Range
Dim intLineNum As Integer

' change the next line to reflect the first cell of your data
Set rngSrc = Range("A1")

Set rngTgt = Range("C1")

intLineNum = 0
Do While True
If Len(rngSrc.Text) = 0 Then Exit Sub

If Not IsNumeric(Right(rngSrc.Text, 4)) Then
' part of the body of the address
rngTgt.Offset(0, intLineNum).Value = rngSrc.Text
intLineNum = intLineNum + 1
Set rngSrc = rngSrc.Offset(1, 0)
Else
rngTgt.Offset(0, 3).Value = rngSrc.Text
intLineNum = 0
Set rngSrc = rngSrc.Offset(1, 0)
Set rngTgt = rngTgt.Offset(1, 0)
' it's PRESUMABLY a zip code & the final address line
End If
Loop

End Sub
 
G

Guest

It successfully placed A1 in C1, A2 in D1, and A3 in E1 (the first record is
a 3-line record) but then placed A4 in F1, A5 in G1 and A6 in H1 (the second
record is also a 3-line record). It ran into the 256-column limit,
apparently, and returned runtime error 1004 for application-defined or
object-defined error. It's definitely on the right track, though. The
records referred to in the 256 columns do include both 3- and 4-line records,
just no new row for each record.

Would it be easier to write a piece of code that would insert a blank line
after each 3-line record, and then use transpose to get the 4 column list?
 
G

Guest

The code checked the last for characters of each line to see if they were
numeric. If so it assumed it was a Zip code and that line marked the end of
the address.

What you describe sounds as if it never found a single ZIP, making me wonder
if the lines are padded with spaces to the right. Select one of the lines
that ends witha ZIP code, press the F2 key, and see if the cursor appears off
to the right of the last character. If so, it's padded with spaces,a nd a
slight modification to the code should fix it.

Change this line

If Not IsNumeric(Right(rngSrc.Text, 4)) Then

to

If Not IsNumeric(Right(Trim(rngSrc.Text), 4)) Then

and then re-run it to see what happens
 
G

Guest

Thanks, Duke; much closer. There was much padding, and trim stripped that
out.

There are some lines that have numeric characters at the end, but which
aren't the city/state/zip. As a result, they are identified as belonging in
the last column.

Addresses with PO boxes have numeric characters at the end. All PO box
addresses in the file are formatted with "P.O. BOX " (with periods, all caps,
space after the "X") as the first characters for that line and then a box
number, the length of which can vary (ex: P.O. BOX 3, P.O. BOX 34, P.O. BOX
586, etc.).

Addresses on highways have numeric characters at the end. All highway
addresses in the file are formatted with " HWY " following an address number,
the length of which can vary, and preceding a highway number, the length of
which can vary (ex: 123 HWY 173, 15435 HWY 10, 56 HWY 4995, etc.).

As far as I can see, these are the only variations that need to be dealt
with--I didn't see any other kinds of addresses with numbers at the end.

Finally, the last line of each address always has a text string of varying
length (the city name), then a comma, a space, a two-character alpha state
code, another space, and then the zip or zip+4 (ex: PORTLAND, OR 97202 or
REDONDO BEACH, CA 90277).

Do you see any way that these text strings can be identified to put the
C/S/Z line as the last field?

Much thanks for your help--I really appreciate it.
 
G

Guest

Well, I've changed it so it WILL NOT treat lines containing P.O. Box or HWY
as the end of the address. Should get you closer

Sub ParseAddresses()
Dim rngSrc As Range
Dim rngTgt As Range
Dim intLineNum As Integer
Dim strAddress As String

' change the next line to reflect the first cell of your data
Set rngSrc = Range("A1")

Set rngTgt = Range("C1")

intLineNum = 0
Do While True
If Len(rngSrc.Text) = 0 Then Exit Sub
strAddress = Trim(rngSrc.Text)
If IsNumeric(Right(strAddress, 4)) Then
If (InStr(1, strAddress, "P.O. Box", vbTextCompare) > 1 _
Or InStr(1, strAddress, "HWY", vbTextCompare) > 1) Then
' part of the body of the address
rngTgt.Offset(0, intLineNum).Value = rngSrc.Text
intLineNum = intLineNum + 1
Set rngSrc = rngSrc.Offset(1, 0)
Else
' it's PRESUMABLY a zip code & the final address line
rngTgt.Offset(0, 3).Value = strAddress
intLineNum = 0
Set rngSrc = rngSrc.Offset(1, 0)
Set rngTgt = rngTgt.Offset(1, 0)
End If
Else
' part of the body of the address
rngTgt.Offset(0, intLineNum).Value = rngSrc.Text
intLineNum = intLineNum + 1
Set rngSrc = rngSrc.Offset(1, 0)
End If
Loop

End Sub
 
G

Guest

Thanks, Duke; I noticed that some entries seem to handle the PO box and
highway addresses correctly, and some still put them in the last column. As
far as I can tell, if an address has a 2-digit PO box, it places that
information in the 2nd column (or 3rd, if there's an "attn" entry in the
2ndposition) and correctly places the city/state/zip in the 4th column. If
it has a 3- or more digit box number, it puts that information in the 4th
column with the city/state/zip in the 4th column on the next line.

For the highway addresses, they all seem to go correctly into the 2nd (or
3rd) position, and none of them show up in the 4th column. It looks like
they were all 2-digit highway numbers (there was one that was 12345 HWY 99 #2
that also worked properly).

I changed

If IsNumeric(Right(strAddress, 4)) Then

to

If IsNumeric(Right(strAddress, 5)) Then

and this resulted in the 3-digit box addresses working properly, but there
are some 6-digit box numbers that still get put in the last column. I'm
going to futz with it a bit, as I think I might be on the right track.

Duke, thanks so much for all of your help. My one term of beginning VB sure
didn't get me where I need to be for this task!
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top