Parsing a Text file into worksheet with VBA

M

mwebb415

Hi,

I'm trying to take a structured text file (saved from a PDF file) and
read it into an Excel worksheet with a macro. The problem is that the
structure isn't straightforward. Every section in the file contains
~50 rows, and the delimiters aren't consistent. For example:

Section1
Header line
Customer: Acme Rockets Address: 22 Middle Street
State: AZ
Product: Super Rocket Qty: 12
.. . .
Section2
Header line
Customer: Acme Fireworks Address: 66 B Street
State: AB
Product: Coyote Killer Qty: 24
.. . .

The "Header line" is always the same, and not needed in the Excel file.
I want the worksheet to have one row of data for each section.

Customer Address State Product Qty
Acme Rockets 22 Middle Street AZ Super Rocket 12
Acme Fireworks 66 B Street AB Coyote Killer 24

I did look at the often-linked page:
http://www.cpearson.com/excel/imptext.htm
But since my delimiters are not consistent, I was torn on how to
accomplish this. Also, since they aren't necessarily all on newlines,
I'm having trouble coming up with the best way to break them out.

I was thinking of using an array of delimiters and then cycling through
that as I read each line of the file, but using the approach from the
link above, that gets problematic when the field is on a different
line. Anyone have any suggestions?

The text file from the PDF appears to be the only option - HTML and XML
both end up representing the details on the page as images. Same for
RTF or DOC files.

Thanks

Matt
 
N

NickHK

Hard to say much as your 2 example have the same layout.
But if you text file really is a mess, maybe you need to improve its quality
from the start, with however you get the data out of pdf.

But assuming the samples are what correct, here's one way you could try:
- Read the whole text file into a string with 'Open TextFile.txt for input"
etc.
- Remove all vbCrLF (or the new line markers) with Replace
- Split on "Header line"
- Split each elememt above on ":".
- Take the even index elements and Trim()

NickHK
 
M

mwebb415

Looks like I can't split on anything other than a single character - is
that true?

I did find one character that I can split on, finally - though it's
still not 100% correct.

Now my issue is that some of the fields are repeated, and my parse is
not picking up the duplicates.

I'm using instr to find the start of the delimiters and then mid to get
the data between the end of the first delimiter and the start of the
second delimiter. Anyway to find the 2nd, 3rd, 4th, etc. occurrence in
a string?

Thanks
 
N

NickHK

Did you try it with something like :
Const TestStr As String =
"ahefgDELIMahefgDELIMahefgDELIMahefgDELIMahefgDELIMahefgDELIMahefg"

You didn't post any code, so hard to tell what you are doing, but the
"Header line" and ":" are suitable delimiters.

NickHK
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top