Data Parsing Problem

G

Guest

I need to parse text data copied onto an Excel worksheet from a big table in
a PDF file. Each line of text contains a stock's ticker symbol, CUSIP
number, company name, number of shares, and other data, in that order. The
problem is that the company name can be anything from one word to five words
separated by a space, and that messes up the parsing into columns. Using the
Data/Text-to-Columns won't work because company name gets split up into
multiple columns so that the number of shares falls in the wrong column. To
illustrate, here's a typical situation, with three lines of text:

XOM 30231G102 EXXON MOBIL CORP 622,900.00 63.5400 39,579,066.00 4.17%
PFE 717081103 PFIZER INC 1,475,900.00 24.9700 36,853,223.00 3.88%
AEP 025537101 AMERICAN ELEC PWR INC 377,300.00 39.7000 14,978,810.00 1.58%

Using VBA code, how can I parse these rows into columns correctly? Thanks
for your help.
 
T

Tom Ogilvy

This worked with your sample date:

Option Explicit
Sub fixColumns()
Dim rng As Range
Dim rng1 As Range
Dim rng2 As Range
Dim cell As Range, sStr As String
Range("A1").CurrentRegion.Columns(1).TextToColumns _
Destination:=Range("A1"), DataType:=xlDelimited, _
TextQualifier:=xlDoubleQuote, _
ConsecutiveDelimiter:=True, Tab:=True, _
Semicolon:=False, Comma:=False, Space:=True, _
Other:=False, FieldInfo:=Array( _
Array(1, 1), Array(2, 1), Array(3, 1), _
Array(4, 1), Array(5, 1), Array(6, 1), _
Array(7, 1), Array(8, 1), Array(9, 1), _
Array(10, 1)), TrailingMinusNumbers:=True
Rows(1).Insert
Range("A1:H1").Value = Array("H1", "H2", _
"H3", "H4", "H5", "H6", "H7", "H8")
ActiveSheet.AutoFilterMode = False
Set rng = Intersect(ActiveSheet.Range("A1") _
.CurrentRegion.EntireRow, Columns(8))
rng.AutoFilter Field:=1, Criteria1:="<>"
Set rng2 = rng.Offset(1, -5).Resize(rng.Rows.Count - 1, 1)
On Error Resume Next
Set rng1 = rng2.SpecialCells(xlVisible)
On Error GoTo 0
Do While Not rng1 Is Nothing
For Each cell In rng1
sStr = cell.Value & " " & cell.Offset(0, 1).Value
cell.Value = sStr
cell.Offset(0, 1).Delete Shift:=xlShiftToLeft
Next
On Error Resume Next
Set rng1 = Nothing
ActiveSheet.AutoFilterMode = False
rng.AutoFilter Field:=1, Criteria1:="<>"
Set rng1 = rng2.SpecialCells(xlVisible)
On Error GoTo 0
Debug.Print rng.Address, rng2.Address
Loop
Rows(1).Delete
MsgBox "No data in column 8 - done"

End Sub
 
R

Ron Rosenfeld

I need to parse text data copied onto an Excel worksheet from a big table in
a PDF file. Each line of text contains a stock's ticker symbol, CUSIP
number, company name, number of shares, and other data, in that order. The
problem is that the company name can be anything from one word to five words
separated by a space, and that messes up the parsing into columns. Using the
Data/Text-to-Columns won't work because company name gets split up into
multiple columns so that the number of shares falls in the wrong column. To
illustrate, here's a typical situation, with three lines of text:

XOM 30231G102 EXXON MOBIL CORP 622,900.00 63.5400 39,579,066.00 4.17%
PFE 717081103 PFIZER INC 1,475,900.00 24.9700 36,853,223.00 3.88%
AEP 025537101 AMERICAN ELEC PWR INC 377,300.00 39.7000 14,978,810.00 1.58%

Using VBA code, how can I parse these rows into columns correctly? Thanks
for your help.

In VBA code, you can do the following. I assumed you would define the range to
be parsed by selecting the cells in the column, but you could use other
methods. I also assumed there were always four pieces of data following the
stock name. (Shares, Price, Market Cap, Yield)

===============================================
Option Explicit
Sub StockData()
Dim c As Range
Dim ParsedData As Variant
Dim i As Long, j As Long

For Each c In Selection
ParsedData = Split(c.Text, " ")
i = 3
Do Until i = UBound(ParsedData) - 3
ParsedData(2) = ParsedData(2) & " " & ParsedData(i)
i = i + 1
Loop

For j = 3 To 6
ParsedData(j) = ParsedData(i + j - 3)
Next j

ReDim Preserve ParsedData(6)

For i = 0 To 6
c.Offset(0, i).Value = ParsedData(i)
Next i
Next c
End Sub
=====================================

This task can also be done with regular expressions in worksheet formulas.

For this method, first download and install Longre's free morefunc.xll add-in
from http://xcell05.free.fr/

Then you can use the following formulas:

Ticker: =REGEX.MID(A2,"\w+")
Cusip: =REGEX.MID(A2,"\w+",2)

Name:
=REGEX.MID(REGEX.SUBSTITUTE(A2,"^\w+\s+\w+\s+"),"(\w+\s+)+(?=.*\s.*\s.*\s.*%)")

Shares: =REGEX.MID($A2,"(\d+(,|\.|))+\s",-3)
Price: =REGEX.MID($A2,"(\d+(,|\.|))+\s",-2)
MarketCap: =REGEX.MID($A2,"(\d+(,|\.|))+\s",-1)
Yield: =REGEX.MID($A2,"(\d+(,|\.|))+%",-1)


--ron
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Top