PC Review


Reply
Thread Tools Rate Thread

Determine delimiter in csv or txt file

 
 
Kieran H
Guest
Posts: n/a
 
      23rd Feb 2007
To all,

I would like to be able to programatically determine the delimiter
within a txt or csv file so that the user is not faced with the Excel
import text wizard..

The files come from several sources and the delimiter is likely to be
tab, comma or semicolon and less likely a pipe or tilda.

I'm thinking I need to count the number of each potential delimiter in
a given number of rows probably excluding those that occur within
quoted text.

If you have any thoughts or know of any examples they'd be a great
help

Cheers

Kieran

 
Reply With Quote
 
 
 
 
Chip Pearson
Guest
Posts: n/a
 
      23rd Feb 2007
The following code will open the file and test for probable delimiters. If
Split returns an array of more than one element when splitting using a
specific character as the delimiter, that character is a probable delimiter
of the text on a single line of the input file. Change FName to the name of
your file, and change PossibleDelimiters to include all the characters that
might be a delimiter of the data.

Sub AAA()
Dim FNum As Integer
Dim FName As String
Dim Ndx As Long
Dim InputLine As String
Dim Arr As Variant
Dim C As String
Dim PossibleDelimiters As String

PossibleDelimiters = ",;|~" & vbTab

FName = "C:\Test.txt" '<<< CHANGE
FNum = FreeFile
Open FName For Input Access Read As #FNum
Line Input #FNum, InputLine
Close #FNum
For Ndx = 1 To Len(PossibleDelimiters)
C = Mid(PossibleDelimiters, Ndx, 1)
Arr = Split(InputLine, C)
If IsArray(Arr) = True Then
If UBound(Arr) - LBound(Arr) + 1 > 1 Then
Debug.Print "Likely Delimiter: ", C, Asc(C)
End If
End If
Next Ndx
End Sub


--
Cordially,
Chip Pearson
Microsoft MVP - Excel
Pearson Software Consulting, LLC
www.cpearson.com
(email address is on the web site)

"Kieran H" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> To all,
>
> I would like to be able to programatically determine the delimiter
> within a txt or csv file so that the user is not faced with the Excel
> import text wizard..
>
> The files come from several sources and the delimiter is likely to be
> tab, comma or semicolon and less likely a pipe or tilda.
>
> I'm thinking I need to count the number of each potential delimiter in
> a given number of rows probably excluding those that occur within
> quoted text.
>
> If you have any thoughts or know of any examples they'd be a great
> help
>
> Cheers
>
> Kieran
>



 
Reply With Quote
 
=?Utf-8?B?VG9tIE9naWx2eQ==?=
Guest
Posts: n/a
 
      23rd Feb 2007
this looks for the most frequently occuring non letter, non number character

Sub ReadStraightTextFile()
Dim sStr As String
Dim LineofText As String
Dim schr As String
Dim s(0 To 127) As Long
Dim sepChar As String, sepMax As Long
Dim sName As Variant, i As Long
Dim bStop As Boolean, s1 As String
sName = Application.GetOpenFilename(FileFilter:="CSV Files (*.csv),*.csv")
If sName = False Then Exit Sub
Open sName For Input As #1
sStr = ""
i = 1
Do While Not EOF(1)
If i > 2 Then Exit Do
Line Input #1, LineofText
sStr = LineofText
i = i + 1
Loop
Close #1
bStop = False
For i = 1 To Len(sStr)
schr = Mid(sStr, i, 1)
If schr = """" Then bStop = Not bStop
If Not bStop Then
s(Asc(schr)) = s(Asc(schr)) + 1
s1 = s1 & schr
End If
Next
sepMax = 0
sepChar = ","
For i = 9 To 127
' exclude double quote and space
If i <> 34 and i <> 32 Then
schr = Chr(i)
If Not IsNumeric(schr) Then
If UCase(schr) = LCase(schr) Then
If s(i) > sepMax Then
sepChar = schr
sepMax = s(i)
End If
End If
End If
End If
Next

MsgBox asc(sepChar) & ": " & sepChar
End Sub

It would be easily defeated by odd patterns

!!!!!!!!!!!!!!!!,!!!!!!!!!!!,///////////

and some not so odd patterns
1/1/2007,1/2/2007,1/3/2007

but if you have some knowledge of the possibilities you might be able to
craft it to a reasonable approach.

--
Regards,
Tom Ogilvy


"Kieran H" wrote:

> To all,
>
> I would like to be able to programatically determine the delimiter
> within a txt or csv file so that the user is not faced with the Excel
> import text wizard..
>
> The files come from several sources and the delimiter is likely to be
> tab, comma or semicolon and less likely a pipe or tilda.
>
> I'm thinking I need to count the number of each potential delimiter in
> a given number of rows probably excluding those that occur within
> quoted text.
>
> If you have any thoughts or know of any examples they'd be a great
> help
>
> Cheers
>
> Kieran
>
>

 
Reply With Quote
 
RB Smissaert
Guest
Posts: n/a
 
      23rd Feb 2007
Haven't timed it, but this might be a faster way of doing this:


Sub test()
MsgBox Asc(GuessDelimiter("C:\DelimTest.txt"))
End Sub


Function GuessDelimiter(strFile As String) As String

Dim i As Byte
Dim n As Long
Dim x As Long
Dim z As Byte
Dim strText As String
Dim arrDelimiters(1 To 5) As String

strText = OpenTextFileToString(strFile)

arrDelimiters(1) = ","
arrDelimiters(2) = ";"
arrDelimiters(3) = "|"
arrDelimiters(4) = "~"
arrDelimiters(5) = vbTab

For i = 1 To 5
If i = 1 Then
n = CountChar(arrDelimiters(i), strText)
Else
n = CountChar(arrDelimiters(i), strText, x)
End If
If n > x Then
x = n
z = i
End If
Next i

If z > 0 Then
GuessDelimiter = arrDelimiters(z)
End If

End Function


Function OpenTextFileToString(ByVal strFile As String) As String

Dim hFile As Long

'obtain file handle, open file and load into a string buffer
hFile = FreeFile

Open strFile For Input As #hFile

OpenTextFileToString = Input$(LOF(hFile), hFile)

Close #hFile

End Function


Function CountChar(strChar As String, _
strString As String, _
Optional lStopAtCount As Long = -1) As Long

Dim lPos As Long
Dim n As Long

lPos = InStr(1, strString, strChar, vbBinaryCompare)

If lPos = 0 Then
CountChar = 0
Exit Function
End If

If lStopAtCount = -1 Then
Do Until lPos = 0
lPos = InStr(lPos + 1, strString, strChar, vbBinaryCompare)
n = n + 1
Loop
Else
Do Until lPos = 0 Or n > lStopAtCount
lPos = InStr(lPos + 1, strString, strChar, vbBinaryCompare)
n = n + 1
Loop
End If

CountChar = n

End Function


You might make it a bit faster by altering the order of the delimiters in
arrDelimiters.


RBS


"Kieran H" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> To all,
>
> I would like to be able to programatically determine the delimiter
> within a txt or csv file so that the user is not faced with the Excel
> import text wizard..
>
> The files come from several sources and the delimiter is likely to be
> tab, comma or semicolon and less likely a pipe or tilda.
>
> I'm thinking I need to count the number of each potential delimiter in
> a given number of rows probably excluding those that occur within
> quoted text.
>
> If you have any thoughts or know of any examples they'd be a great
> help
>
> Cheers
>
> Kieran
>


 
Reply With Quote
 
RB Smissaert
Guest
Posts: n/a
 
      23rd Feb 2007
Yes, I can see it makes sense to only look at the first line of the file for
several reasons.

So, this would be better compared to my old function:

Function GuessDelimiter(strFile As String) As String

Dim i As Byte
Dim n As Long
Dim x As Long
Dim z As Byte
Dim hFile As Long
Dim strFirstLine As String
Dim arrDelimiters(1 To 5) As String

hFile = FreeFile

Open strFile For Input Access Read As #hFile
Line Input #hFile, strFirstLine
Close #hFile

arrDelimiters(1) = "|"
arrDelimiters(2) = "~"
arrDelimiters(3) = ";"
arrDelimiters(4) = vbTab
arrDelimiters(5) = ","

For i = 1 To 5
If i = 1 Then
n = CountChar(arrDelimiters(i), strFirstLine)
Else
n = CountChar(arrDelimiters(i), strFirstLine, x)
End If
If n > x Then
x = n
z = i
End If
Next i

If z > 0 Then
GuessDelimiter = arrDelimiters(z)
End If

End Function


RBS


"Chip Pearson" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> The following code will open the file and test for probable delimiters. If
> Split returns an array of more than one element when splitting using a
> specific character as the delimiter, that character is a probable
> delimiter of the text on a single line of the input file. Change FName to
> the name of your file, and change PossibleDelimiters to include all the
> characters that might be a delimiter of the data.
>
> Sub AAA()
> Dim FNum As Integer
> Dim FName As String
> Dim Ndx As Long
> Dim InputLine As String
> Dim Arr As Variant
> Dim C As String
> Dim PossibleDelimiters As String
>
> PossibleDelimiters = ",;|~" & vbTab
>
> FName = "C:\Test.txt" '<<< CHANGE
> FNum = FreeFile
> Open FName For Input Access Read As #FNum
> Line Input #FNum, InputLine
> Close #FNum
> For Ndx = 1 To Len(PossibleDelimiters)
> C = Mid(PossibleDelimiters, Ndx, 1)
> Arr = Split(InputLine, C)
> If IsArray(Arr) = True Then
> If UBound(Arr) - LBound(Arr) + 1 > 1 Then
> Debug.Print "Likely Delimiter: ", C, Asc(C)
> End If
> End If
> Next Ndx
> End Sub
>
>
> --
> Cordially,
> Chip Pearson
> Microsoft MVP - Excel
> Pearson Software Consulting, LLC
> www.cpearson.com
> (email address is on the web site)
>
> "Kieran H" <(E-Mail Removed)> wrote in message
> news:(E-Mail Removed)...
>> To all,
>>
>> I would like to be able to programatically determine the delimiter
>> within a txt or csv file so that the user is not faced with the Excel
>> import text wizard..
>>
>> The files come from several sources and the delimiter is likely to be
>> tab, comma or semicolon and less likely a pipe or tilda.
>>
>> I'm thinking I need to count the number of each potential delimiter in
>> a given number of rows probably excluding those that occur within
>> quoted text.
>>
>> If you have any thoughts or know of any examples they'd be a great
>> help
>>
>> Cheers
>>
>> Kieran
>>

>
>


 
Reply With Quote
 
Kieran H
Guest
Posts: n/a
 
      26th Feb 2007
On Feb 23, 10:18 pm, "RB Smissaert" <bartsmissa...@blueyonder.co.uk>
wrote:
> Haven't timed it, but this might be a faster way of doing this:
>
> Sub test()
> MsgBox Asc(GuessDelimiter("C:\DelimTest.txt"))
> End Sub
>
> Function GuessDelimiter(strFile As String) As String
>
> Dim i As Byte
> Dim n As Long
> Dim x As Long
> Dim z As Byte
> Dim strText As String
> Dim arrDelimiters(1 To 5) As String
>
> strText = OpenTextFileToString(strFile)
>
> arrDelimiters(1) = ","
> arrDelimiters(2) = ";"
> arrDelimiters(3) = "|"
> arrDelimiters(4) = "~"
> arrDelimiters(5) = vbTab
>
> For i = 1 To 5
> If i = 1 Then
> n = CountChar(arrDelimiters(i), strText)
> Else
> n = CountChar(arrDelimiters(i), strText, x)
> End If
> If n > x Then
> x = n
> z = i
> End If
> Next i
>
> If z > 0 Then
> GuessDelimiter = arrDelimiters(z)
> End If
>
> End Function
>
> Function OpenTextFileToString(ByVal strFile As String) As String
>
> Dim hFile As Long
>
> 'obtain file handle, open file and load into a string buffer
> hFile = FreeFile
>
> Open strFile For Input As #hFile
>
> OpenTextFileToString = Input$(LOF(hFile), hFile)
>
> Close #hFile
>
> End Function
>
> Function CountChar(strChar As String, _
> strString As String, _
> Optional lStopAtCount As Long = -1) As Long
>
> Dim lPos As Long
> Dim n As Long
>
> lPos = InStr(1, strString, strChar, vbBinaryCompare)
>
> If lPos = 0 Then
> CountChar = 0
> Exit Function
> End If
>
> If lStopAtCount = -1 Then
> Do Until lPos = 0
> lPos = InStr(lPos + 1, strString, strChar, vbBinaryCompare)
> n = n + 1
> Loop
> Else
> Do Until lPos = 0 Or n > lStopAtCount
> lPos = InStr(lPos + 1, strString, strChar, vbBinaryCompare)
> n = n + 1
> Loop
> End If
>
> CountChar = n
>
> End Function
>
> You might make it a bit faster by altering the order of the delimiters in
> arrDelimiters.
>
> RBS
>
> "KieranH" <kieran.ha...@cliffordchance.com> wrote in message
>
> news:(E-Mail Removed)...
>
>
>
> > To all,

>
> > I would like to be able to programatically determine the delimiter
> > within a txt or csv file so that the user is not faced with the Excel
> > import text wizard..

>
> > The files come from several sources and the delimiter is likely to be
> > tab, comma or semicolon and less likely a pipe or tilda.

>
> > I'm thinking I need to count the number of each potential delimiter in
> > a given number of rows probably excluding those that occur within
> > quoted text.

>
> > If you have any thoughts or know of any examples they'd be a great
> > help

>
> > Cheers

>
> >Kieran- Hide quoted text -

>
> - Show quoted text -


This is excelent - Real world problems - Real world solutions

My thanks and respect

Cheers

Kieran




















 
Reply With Quote
 
RB Smissaert
Guest
Posts: n/a
 
      26th Feb 2007
No trouble.
As you don't really have to check the whole file look at my other posting
and that of Chip Pearson.

RBS

"Kieran H" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> On Feb 23, 10:18 pm, "RB Smissaert" <bartsmissa...@blueyonder.co.uk>
> wrote:
>> Haven't timed it, but this might be a faster way of doing this:
>>
>> Sub test()
>> MsgBox Asc(GuessDelimiter("C:\DelimTest.txt"))
>> End Sub
>>
>> Function GuessDelimiter(strFile As String) As String
>>
>> Dim i As Byte
>> Dim n As Long
>> Dim x As Long
>> Dim z As Byte
>> Dim strText As String
>> Dim arrDelimiters(1 To 5) As String
>>
>> strText = OpenTextFileToString(strFile)
>>
>> arrDelimiters(1) = ","
>> arrDelimiters(2) = ";"
>> arrDelimiters(3) = "|"
>> arrDelimiters(4) = "~"
>> arrDelimiters(5) = vbTab
>>
>> For i = 1 To 5
>> If i = 1 Then
>> n = CountChar(arrDelimiters(i), strText)
>> Else
>> n = CountChar(arrDelimiters(i), strText, x)
>> End If
>> If n > x Then
>> x = n
>> z = i
>> End If
>> Next i
>>
>> If z > 0 Then
>> GuessDelimiter = arrDelimiters(z)
>> End If
>>
>> End Function
>>
>> Function OpenTextFileToString(ByVal strFile As String) As String
>>
>> Dim hFile As Long
>>
>> 'obtain file handle, open file and load into a string buffer
>> hFile = FreeFile
>>
>> Open strFile For Input As #hFile
>>
>> OpenTextFileToString = Input$(LOF(hFile), hFile)
>>
>> Close #hFile
>>
>> End Function
>>
>> Function CountChar(strChar As String, _
>> strString As String, _
>> Optional lStopAtCount As Long = -1) As Long
>>
>> Dim lPos As Long
>> Dim n As Long
>>
>> lPos = InStr(1, strString, strChar, vbBinaryCompare)
>>
>> If lPos = 0 Then
>> CountChar = 0
>> Exit Function
>> End If
>>
>> If lStopAtCount = -1 Then
>> Do Until lPos = 0
>> lPos = InStr(lPos + 1, strString, strChar, vbBinaryCompare)
>> n = n + 1
>> Loop
>> Else
>> Do Until lPos = 0 Or n > lStopAtCount
>> lPos = InStr(lPos + 1, strString, strChar, vbBinaryCompare)
>> n = n + 1
>> Loop
>> End If
>>
>> CountChar = n
>>
>> End Function
>>
>> You might make it a bit faster by altering the order of the delimiters in
>> arrDelimiters.
>>
>> RBS
>>
>> "KieranH" <kieran.ha...@cliffordchance.com> wrote in message
>>
>> news:(E-Mail Removed)...
>>
>>
>>
>> > To all,

>>
>> > I would like to be able to programatically determine the delimiter
>> > within a txt or csv file so that the user is not faced with the Excel
>> > import text wizard..

>>
>> > The files come from several sources and the delimiter is likely to be
>> > tab, comma or semicolon and less likely a pipe or tilda.

>>
>> > I'm thinking I need to count the number of each potential delimiter in
>> > a given number of rows probably excluding those that occur within
>> > quoted text.

>>
>> > If you have any thoughts or know of any examples they'd be a great
>> > help

>>
>> > Cheers

>>
>> >Kieran- Hide quoted text -

>>
>> - Show quoted text -

>
> This is excelent - Real world problems - Real world solutions
>
> My thanks and respect
>
> Cheers
>
> Kieran
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>


 
Reply With Quote
 
 
 
Reply

Thread Tools
Rate This Thread
Rate This Thread:

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Delimiter for csv and text file using ADO Jean-Yves Microsoft Excel Programming 1 22nd Dec 2006 08:37 AM
Tab Delimiter File moonwalker Microsoft Excel Misc 7 24th Feb 2006 03:56 PM
Please define field delimiter and record delimiter in sorting A-Z. =?Utf-8?B?YW5nZWxmYWNl?= Microsoft Word Document Management 2 4th Oct 2004 10:43 PM
CSV file delimiter Vinay Microsoft Excel Worksheet Functions 1 29th Jul 2004 12:32 PM
Use other delimiter for a file in Excel mcastillo Microsoft Excel Discussion 3 3rd Mar 2004 04:19 AM


Features
 

Advertising
 

Newsgroups
 


All times are GMT +1. The time now is 07:46 PM.