How to parse a csv string with text qualifiers

J

Janelle

The Split function is very nice, but it does have limited
functionality. The strings I'm trying to split often have
text qualifiers in them. I've tried searching previous
posts in both Excel and Access and have found several
alternatives, but they all break if the qualifier is also
part of the text. Let me give an example.

I create a .csv file in notepad that contains the
following:
1 inch,"2""","3 inches","4, 5, and 6""",7 inches,"8, 9,
and 10 inches",11 inches

If I then open that csv file in Excel, everything is very
nicely split up into cells A1 through G1:
1 inch
2"
3 inches
4, 5, and 6"
7 inches
8, 9, and 10 inches
11 inches

Now let's say I write a code to programmatically read the
csv file into vba using:
Open strInfFile For Input Access Read As intFileNum
Line Input #intFileNum, strInputString

debug.print strInputString
gives the following result:
1 inch,"2""","3 inches","4, 5, and 6""",7 inches,"8, 9,
and 10 inches",11 inches

What I'd like to do is take that string and break it up
into a one-dimensional array with the same contents as the
cells in Excel.

But I keep getting things like:
1 inch
"2"""
"3 inches"
"4
5
and 6"""
7 inches
"8
9
and 10 inches"
11 inches

or

1 inch
"2"""
"3 inches"
"4, 5, and 6"""
7 inches
"8, 9, and 10 inches"
11 inches

or

1 inch

3 inches

7 inches
8, 9, and 10 inches
11 inches

Help!!! Isn't there a decent csv string parser anywhere?
I'm using Excel 2000.
I've thought about writing something myself, but I'm
afraid I can't figure out how to handle the text
qualifiers either.
 
M

Myrna Larson

The easiest way is to let Excel open the CSV file, then pull the data from the
workbook it creates into your VBA variable.

Dim v As Variant

Application.ScreenUpdating = False

'code here to open the workbook

v = ActiveWorkbook.Worksheets(1).UsedRange.Value

ActiveWorkbook.Close SaveChanges:=False

Application.ScreenUpdating = True
 
J

Jamie Collins

Janelle said:
I've tried searching previous
posts in both Excel and Access and have found several
alternatives, but they all break if the qualifier is also
part of the text.

FWIW, I queried your text file data (saved as C:\Tempo\db.txt) using Jet sql:

SELECT * FROM [Text;HDR=NO;Database=C:\Tempo;].db#txt

and it was formatted correctly.

Jamie.

--
 
J

Janelle

Thanks. I'm afraid the solution Tom posted didn't do any
better than the other solutions I've found. I got a row
of cells with the following contents again:
1 inch
"2"""
"3 inches"
"4
5
and 6"""
7 inches
"8
9
and 10 inches"
11 inches
 
T

Tom Ogilvy

I ran it against your data and didn't get anything like that. It handled
everything but the double - double quote which it retained - it didn't
otherwise affect the parsing. I added a single line to handle the double -
double quote and got identical results to opening the file in Excel
directly.

Public Sub DoTheImport()
Dim FName As Variant
Dim Sep As String

FName = Application.GetOpenFilename _
(filefilter:= _
"Text Files(*.txt),*.txt,All Files (*.*),*.*")
If FName = False Then
MsgBox "You didn't select a file"
Exit Sub
End If

Sep = InputBox("Enter a single delimiter character.", _
"Import Text File")
ImportTextFile CStr(FName), Sep

End Sub




Public Sub ImportTextFile(FName As String, Sep As String)

Dim RowNdx As Integer
Dim ColNdx As Integer
Dim TempVal As Variant
Dim WholeLine As String
Dim Pos As Integer
Dim NextPos As Integer
Dim SaveColNdx As Integer

Application.ScreenUpdating = False
'On Error GoTo EndMacro:

SaveColNdx = ActiveCell.Column
RowNdx = ActiveCell.Row

Open FName For Input Access Read As #1

While Not EOF(1)
Line Input #1, WholeLine
If Right(WholeLine, 1) <> Sep Then
WholeLine = WholeLine & Sep
End If
ColNdx = SaveColNdx
Pos = 1
If Left(WholeLine, 1) = Chr(34) Then
NextPos = InStr(Pos, _
WholeLine, Chr(34) & Sep) + 1
Else
NextPos = InStr(Pos, WholeLine, Sep)
End If
While NextPos >= 1
TempVal = Mid(WholeLine, Pos, NextPos - Pos)
If Left(TempVal, 1) = Chr(34) Then _
TempVal = Right(TempVal, Len(TempVal) - 1)
If Right(TempVal, 1) = Chr(34) Then _
TempVal = Left(TempVal, Len(TempVal) - 1)
Cells(RowNdx, ColNdx).Value = Application _
.Substitute(TempVal, Chr(34) & Chr(34), Chr(34))
Pos = NextPos + 1
ColNdx = ColNdx + 1
If Mid(WholeLine, Pos, 1) = Chr(34) Then
NextPos = InStr(Pos, WholeLine, _
Chr(34) & Sep) + 1
Else
NextPos = InStr(Pos, WholeLine, Sep)
End If
Wend
RowNdx = RowNdx + 1
Wend

EndMacro:
On Error GoTo 0
Application.ScreenUpdating = True
Close #1
End Sub

produced

1 inch
2"
3 inches
4, 5, and 6"
7 inches
8, 9, and 10 inches
11 inches

input file:

1 inch,"2""","3 inches","4, 5, and 6""",7 inches,"8, 9, and 10 inches",11
inches
 
J

Janelle

I'm not sure what I did wrong before, but worked this time!
Thanks Tom, you're a genius!

Unfortunately, though, I've run into another problem. It
didn't like this string very well:
1 inch,"2""","3 inches","4"", 5"", and 6""",7 inches,"8,
9, and 10 inches",11 inches
 
T

Tom Ogilvy

That was a quick fix - it isn't a generalized algorithm to handle you
embedded double-double quotes.

Have you tried Jamie's suggestion of pulling it into a record set?
 
J

Janelle

Thank you all for the help and advice. I think I've
finally got something that ought to work.

Jaime and Tom were right, running a SELECT query against a
text file does seem to be the best way to evaluate a csv
string. But then I ran up against yet another problem.
Most of the strings I need to evaluate are in files with
odd file extensions. And since I want this to run on
anyone's computer with any file, changing the Windows
Registry to allow for more filetypes didn't seem like a
workable option. Also, what I was REALLY wanting was a
Split-like function to parse a string, although I'm sure
the ability to parse a whole file will come in handy
before long, too. So after a LOT of research and trial
and error (mostly error), I've finally come up with the
following function. If anyone sees a problem, knows of a
better method, or sees a way to improve this, please post
it, but this seems to work so far.

Public Function SplitCSVString(strInput As String) As
Variant
'CSV String Parser
'It's sort of like Split(), but the delimiter is always a
comma
'and it can handle quotation mark text qualifiers.
'Always returns an array of at least one element.
'If the input string is empty (""), the element will also
be an empty string.
'If the input string contains a LineFeed or CarriageReturn
character,
'the line break and anything following it will be dropped.

Dim strPathName As String
Dim strFileName As String
Dim intFileNum As Integer
Dim strSplitString() As String
Dim cntFolder As New ADODB.Connection
Dim rstFile As New ADODB.Recordset
Dim intFields As Integer

strPathName = "C:\"
strFileName = "SplitCSVString" & _
Format(Date, "yyyymmdd") & ".csv"
intFileNum = FreeFile()
ReDim strSplitString(1 To 1)

'No need to go to a lot of trouble if there's nothing to
split.
If Not InStr(strInput, ",") > 0 Then
strSplitString(1) = strInput
SplitCSVString = strSplitString
Exit Function
End If
'Sorry folks, one line only
If InStr(strInput, vbLf) > 0 Then
strInput = Left(strInput, InStr(strInput, vbLf)-1)
End If
If InStr(strInput, vbCr) > 0 Then
strInput = Left(strInput, InStr(strInput, vbLf)-1)
End If
If strInput = "" Then
strSplitString(1) = ""
SplitCSVString = strSplitString
Exit Function
End If
'Write the string to a file
Open strPathName & strFileName For Output As
#intFileNum
Print #intFileNum, strInput
Close #intFileNum
'Read the file with the parser
cntFolder.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
"Data Source=" & strPathName & ";" & _
"Extended Properties=""text;HDR=No;FMT=Delimited"""
rstFile.Open "SELECT * FROM " & strFileName, _
cntFolder, adOpenStatic, adLockReadOnly
rstFile.MoveFirst
For intFields = 1 To rstFile.Fields.Count
ReDim Preserve strSplitString(1 To intFields)
If IsNull(rstFile.Fields(intFields - 1).Value) Then
strSplitString(intFields) = ""
Else
strSplitString(intFields) = _
rstFile.Fields(intFields - 1).Value
End If
Next intFields
rstFile.Close
cntFolder.Close
'Kill the file
Kill strPathName & strFileName
'Return the array
SplitCSVString = strSplitString

End Function
 
J

Janelle

Ok, this is getting very discouraging. I thought I had
everything solved. I really did. The function has worked
GREAT...until today. I think I've run across some sort of
bug.

If the input string is this:
"ENTRY,327,327f,f,F,""327F"",327F,327K,327S"
we discover that the uppercase F, K, and S are dropped
when they're combined with a number and not enclosed in
quotes.

I've tried interrupting the function before it kills the
csv file and it with Excel, and it works fine. But not
with the ADODB connection.

I am absolutely at my wit's end. SURELY there has got to
be SOME WAY to accurately parse a simple csv string, but
for the life of me, I can't seem to find it!
HELP!!!!!!!!
 
B

Bernie Deitrick

Janelle,

With your input string in cell A1, the sub below shows the correct values.
Requires XL2000 or higher.

HTH,
Bernie
MS Excel MVP

Sub TryNow()
Dim mySplit As Variant
Dim i As Integer

mySplit = Split(Replace(Range("A1").Value, """", ""), ",")

For i = LBound(mySplit) To UBound(mySplit)
MsgBox mySplit(i)
Next i

End Sub
 
J

Janelle

Sorry, it looks like I didn't give enough information and
you must not have seen the earlier posts to this thread.
I'm looking for a fully functional CSV parser.

If I put the following into a csv text file (it's supposed
to be all on one line, but this window keeps wrapping it):
274F,K274,1S2,1 inch,"2""","3 inches","4"", 5"", and
6""",7 inches,"8, 9, and 10 inches",11 inches
and open the csv file in Excel, it works beautifully.

If I read it into a text string & try feeding that into my
function, it loses the F, K, and S.

If I try feeding it through your sub, it loses all the
quotation marks and splits these 2 into 3 parts each:
4", 5", and 6"
8, 9, and 10 inches
 
B

Bernie Deitrick

Janelle,

I'm sorry - I thought your post was a new thread rather than the
continuation of another.

Try the macro below, which worked on a long column of your test values.
Select your values to be parsed first.

As written, it will only work on a single column, and may over-write things
to the right of the selection.

HTH,
Bernie
MS Excel MVP

Sub JanellesCSVParser()

Dim FName As String
Dim FNum As Integer
Dim RowNdx As Long
Dim StartRow As Long
Dim EndRow As Long
Dim myRange As Range

FName = "Janelle Conversion.csv"

Application.ScreenUpdating = False
On Error GoTo EndMacro:
FNum = FreeFile

Set myRange = Selection
With myRange
StartRow = .Cells(1).Row
EndRow = .Cells(.Cells.Count).Row
End With

Open FName For Output Access Write As #FNum

For RowNdx = StartRow To EndRow
Print #FNum, Cells(RowNdx, myRange.Column).Text
Next RowNdx

EndMacro:
On Error GoTo 0
Application.ScreenUpdating = True
Close #FNum

Workbooks.Open Filename:=FName

Range("A1").CurrentRegion.Copy myRange
ActiveWorkbook.Close False
Kill FName

End Sub
 
J

Janelle

Well, yes, writing the text to a file, then opening it in
Excel and pulling the data from there had occurred to me,
but I was hoping for something that worked a bit faster.
I have a LOT of text to go through.

What I actually want is something that works much like the
Split() function, except that it needs to properly handle
text qualifiers, and I don't mind if you can't specify the
delimiter. I want to be able to feed in a string and have
it return an array. Preferably without opening and
closing any new workbooks.
 
B

Bernie Deitrick

Janelle,

How long does it actually take? Have you tested it? Writing to and reading
from a file can actually be faster than custom functions/procedures that do
a lot of internal processing, expecially since the solution uses native
Excel code (fast) rather than VBA code (very very slow).

And since you have a working solution, you are infinitely better off having
that solution than not having a (possibly but not certainly) faster
(currently non-existing) solution.

HTH,
Bernie
MS Excel MVP
 
J

Janelle

A co-worker of mine discovered that the F, K, and S cause
the column to be read as currency, but that a schema.ini
solves the problem. So I added a section into the
function that counts the fields in the string and creates
a schema.ini making each one a text field.
It all seems a bit long to me, but it also seems to work.
And since there are no Excel workbooks involved, I'm
hoping I can also port it over to Access without too much
trouble, although I haven't actually tried that yet.
I'm a great believer in sharing, so for anyone else who
might be interested, here's the code:

Public Function SplitCSVString(strInput As String) As
Variant

'CSV String Parser
'It's sort of like Split(), but the delimiter is always a
comma,
'and it can handle quotation mark text qualifiers.
'Always returns an array of at least one element.
'If the input string is empty (""), the element will also
be an empty string.
'If the input string contains a LineFeed or CarriageReturn
character,
'the line break and anything following it will be dropped.

Dim strPathName As String
Dim strFileName As String
Dim intFileNum As Integer
Dim strSplitString() As String
Dim cntFolder As New ADODB.Connection
Dim rstFile As New ADODB.Recordset
Dim intFields As Integer

strPathName = "C:\"
' strFileName = "SplitCSVString" & Format
(Date, "yyyymmdd") & ".csv"
strFileName = "SplitCSVString.csv"
intFileNum = FreeFile()
ReDim strSplitString(1 To 1)

'No need to go to a lot of trouble if there's nothing to
split.
If Not InStr(strInput, ",") > 0 Then
strSplitString(1) = strInput
SplitCSVString = strSplitString
Exit Function
End If
'Sorry folks, one line only
If InStr(strInput, vbLf) > 0 Then
strInput = Left(strInput, InStr(strInput, vbLf) -
1)
End If
If InStr(strInput, vbCr) > 0 Then
strInput = Left(strInput, InStr(strInput, vbLf) -
1)
End If
'No need to go to a lot of trouble if there's nothing to
split.
If Not InStr(strInput, ",") > 0 Then
strSplitString(1) = strInput
SplitCSVString = strSplitString
Exit Function
End If
'Write the string to a file
Open strPathName & strFileName For Output As
#intFileNum
Print #intFileNum, strInput
Close #intFileNum
'Read the file with the parser to get column count
cntFolder.Open "Provider=Microsoft.Jet.OLEDB.4.0;"
& "Data Source=" & strPathName & ";" & "Extended
Properties=""text;HDR=No;FMT=Delimited"""
rstFile.Open "SELECT * FROM " & strFileName,
cntFolder, adOpenStatic, adLockReadOnly
'Write schema.ini
Open strPathName & "schema.ini" For Output As
#intFileNum
Print #intFileNum, "[" & strFileName & "]"
Print #intFileNum, "ColNameHeader=False"
Print #intFileNum, "Format=CSVDelimited"
For intFields = 1 To rstFile.Fields.Count
Print #intFileNum, "Col" & intFields & "=C" &
intFields & " Char"
Next intFields
Close #intFileNum
rstFile.Close
'Read the file with the parser
rstFile.Open "SELECT * FROM " & strFileName,
cntFolder, adOpenStatic, adLockReadOnly
rstFile.MoveFirst
For intFields = 1 To rstFile.Fields.Count
ReDim Preserve strSplitString(1 To intFields)
If IsNull(rstFile.Fields(intFields - 1).Value) Then
strSplitString(intFields) = ""
Else
strSplitString(intFields) = rstFile.Fields
(intFields - 1).Value
End If
Next intFields
rstFile.Close
cntFolder.Close
'Kill the file - skipping these 2 lines in an effort to
save processing time
' Kill strPathName & strFileName
' Kill strPathName & "schema.ini"
'Return the array
SplitCSVString = strSplitString

End Function
 
J

Janelle

Ok, the effort to save processing time backfired
somewhat. You DO need to kill the schema.ini at the end,
which I would've realized if I had been thinking straight.
-----Original Message-----
A co-worker of mine discovered that the F, K, and S cause
the column to be read as currency, but that a schema.ini
solves the problem. So I added a section into the
function that counts the fields in the string and creates
a schema.ini making each one a text field.
It all seems a bit long to me, but it also seems to work.
And since there are no Excel workbooks involved, I'm
hoping I can also port it over to Access without too much
trouble, although I haven't actually tried that yet.
I'm a great believer in sharing, so for anyone else who
might be interested, here's the code:

Public Function SplitCSVString(strInput As String) As
Variant

'CSV String Parser
'It's sort of like Split(), but the delimiter is always a
comma,
'and it can handle quotation mark text qualifiers.
'Always returns an array of at least one element.
'If the input string is empty (""), the element will also
be an empty string.
'If the input string contains a LineFeed or CarriageReturn
character,
'the line break and anything following it will be dropped.

Dim strPathName As String
Dim strFileName As String
Dim intFileNum As Integer
Dim strSplitString() As String
Dim cntFolder As New ADODB.Connection
Dim rstFile As New ADODB.Recordset
Dim intFields As Integer

strPathName = "C:\"
' strFileName = "SplitCSVString" & Format
(Date, "yyyymmdd") & ".csv"
strFileName = "SplitCSVString.csv"
intFileNum = FreeFile()
ReDim strSplitString(1 To 1)

'No need to go to a lot of trouble if there's nothing to
split.
If Not InStr(strInput, ",") > 0 Then
strSplitString(1) = strInput
SplitCSVString = strSplitString
Exit Function
End If
'Sorry folks, one line only
If InStr(strInput, vbLf) > 0 Then
strInput = Left(strInput, InStr(strInput, vbLf) -
1)
End If
If InStr(strInput, vbCr) > 0 Then
strInput = Left(strInput, InStr(strInput, vbLf) -
1)
End If
'No need to go to a lot of trouble if there's nothing to
split.
If Not InStr(strInput, ",") > 0 Then
strSplitString(1) = strInput
SplitCSVString = strSplitString
Exit Function
End If
'Write the string to a file
Open strPathName & strFileName For Output As
#intFileNum
Print #intFileNum, strInput
Close #intFileNum
'Read the file with the parser to get column count
cntFolder.Open "Provider=Microsoft.Jet.OLEDB.4.0;"
& "Data Source=" & strPathName & ";" & "Extended
Properties=""text;HDR=No;FMT=Delimited"""
rstFile.Open "SELECT * FROM " & strFileName,
cntFolder, adOpenStatic, adLockReadOnly
'Write schema.ini
Open strPathName & "schema.ini" For Output As
#intFileNum
Print #intFileNum, "[" & strFileName & "]"
Print #intFileNum, "ColNameHeader=False"
Print #intFileNum, "Format=CSVDelimited"
For intFields = 1 To rstFile.Fields.Count
Print #intFileNum, "Col" & intFields & "=C" &
intFields & " Char"
Next intFields
Close #intFileNum
rstFile.Close
'Read the file with the parser
rstFile.Open "SELECT * FROM " & strFileName,
cntFolder, adOpenStatic, adLockReadOnly
rstFile.MoveFirst
For intFields = 1 To rstFile.Fields.Count
ReDim Preserve strSplitString(1 To intFields)
If IsNull(rstFile.Fields(intFields - 1).Value) Then
strSplitString(intFields) = ""
Else
strSplitString(intFields) = rstFile.Fields
(intFields - 1).Value
End If
Next intFields
rstFile.Close
cntFolder.Close
'Kill the file - skipping these 2 lines in an effort to
save processing time
' Kill strPathName & strFileName
' Kill strPathName & "schema.ini"
'Return the array
SplitCSVString = strSplitString

End Function

-----Original Message-----
Ok, this is getting very discouraging. I thought I had
everything solved. I really did. The function has worked
GREAT...until today. I think I've run across some sort of
bug.

If the input string is this:
"ENTRY,327,327f,f,F,""327F"",327F,327K,327S"
we discover that the uppercase F, K, and S are dropped
when they're combined with a number and not enclosed in
quotes.

I've tried interrupting the function before it kills the
csv file and it with Excel, and it works fine. But not
with the ADODB connection.

I am absolutely at my wit's end. SURELY there has got to
be SOME WAY to accurately parse a simple csv string, but
for the life of me, I can't seem to find it!
HELP!!!!!!!!
.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top