Extract matching data from large data file (csv)

Discussion in 'Microsoft Excel Programming' started by Utahstew, Apr 13, 2008.

  1. Utahstew

    Utahstew Guest

    I need to extract data (the first three letters after the 2nd comma, and the
    first 35 characters after the 7th comma) from a csv file (over 100,000 rows),
    only after the 8th column matches a values in column A of my spreadsheet. The
    two extracted data elements need to be stored in my worksheet in columns B
    and C

    Any help on this would be appreciated.
     
    Utahstew, Apr 13, 2008
    #1
    1. Advertisements

  2. Utahstew

    Joel Guest

    The code below should work. It may need a slight change. I a little
    confused. The eigth column is the data after the seventh comma. Do you mean
    after the eigth comma?

    the code below will only bring into the worksheet the needed data. It will
    ignore data that doesn't meet your criteria.

    Sub Gettext()

    Const ForReading = 1, ForWriting = 2, ForAppending = 3
    Const TristateUseDefault = -2, TristateTrue = -1, TristateFalse = 0

    Dim Data(8)

    'default folder
    Folder = "C:\temp"
    ChDir (Folder)

    Set fsread = CreateObject("Scripting.FileSystemObject")
    FName = Application.GetOpenFilename("CSV (*.csv),*.csv")

    Set fread = fsread.GetFile(FName)
    Set tsread = fread.OpenAsTextStream(ForReading, TristateUseDefault)

    RowCount = 1
    Do While tsread.atendofstream = False

    InputLine = tsread.ReadLine

    For i = 0 To 7
    If InStr(InputLine, ",") > 0 Then
    Data(i) = Left(InputLine, InStr(InputLine, ",") - 1)
    InputLine = Mid(InputLine, InStr(InputLine, ",") + 1)
    Else
    If Len(InputLine) > 0 Then
    Data(i) = InputLine
    InputLine = ""
    Else
    Exit For
    End If
    End If
    Next i
    'check if 8th item is in column A
    Set c = Columns("A:A").Find(what:=Data(7), LookIn:=xlValues, _
    lookat:=xlWhole)
    If Not c Is Nothing Then
    c.Offset(0, 1) = Left(Data(2), 3)
    c.Offset(0, 2) = Left(Data(7), 35)
    End If
    Loop
    tsread.Close
    End Sub



    "Utahstew" wrote:

    > I need to extract data (the first three letters after the 2nd comma, and the
    > first 35 characters after the 7th comma) from a csv file (over 100,000 rows),
    > only after the 8th column matches a values in column A of my spreadsheet. The
    > two extracted data elements need to be stored in my worksheet in columns B
    > and C
    >
    > Any help on this would be appreciated.
    >
    >
     
    Joel, Apr 13, 2008
    #2
    1. Advertisements

  3. Utahstew

    Utahstew Guest

    Thank you for your quick response.

    The code works great until it finds an "&" in Data(7)

    Any suggestion on how to over come this ...?


    "Joel" wrote:

    > The code below should work. It may need a slight change. I a little
    > confused. The eigth column is the data after the seventh comma. Do you mean
    > after the eigth comma?
    >
    > the code below will only bring into the worksheet the needed data. It will
    > ignore data that doesn't meet your criteria.
    >
    > Sub Gettext()
    >
    > Const ForReading = 1, ForWriting = 2, ForAppending = 3
    > Const TristateUseDefault = -2, TristateTrue = -1, TristateFalse = 0
    >
    > Dim Data(8)
    >
    > 'default folder
    > Folder = "C:\temp"
    > ChDir (Folder)
    >
    > Set fsread = CreateObject("Scripting.FileSystemObject")
    > FName = Application.GetOpenFilename("CSV (*.csv),*.csv")
    >
    > Set fread = fsread.GetFile(FName)
    > Set tsread = fread.OpenAsTextStream(ForReading, TristateUseDefault)
    >
    > RowCount = 1
    > Do While tsread.atendofstream = False
    >
    > InputLine = tsread.ReadLine
    >
    > For i = 0 To 7
    > If InStr(InputLine, ",") > 0 Then
    > Data(i) = Left(InputLine, InStr(InputLine, ",") - 1)
    > InputLine = Mid(InputLine, InStr(InputLine, ",") + 1)
    > Else
    > If Len(InputLine) > 0 Then
    > Data(i) = InputLine
    > InputLine = ""
    > Else
    > Exit For
    > End If
    > End If
    > Next i
    > 'check if 8th item is in column A
    > Set c = Columns("A:A").Find(what:=Data(7), LookIn:=xlValues, _
    > lookat:=xlWhole)
    > If Not c Is Nothing Then
    > c.Offset(0, 1) = Left(Data(2), 3)
    > c.Offset(0, 2) = Left(Data(7), 35)
    > End If
    > Loop
    > tsread.Close
    > End Sub
    >
    >
    >
    > "Utahstew" wrote:
    >
    > > I need to extract data (the first three letters after the 2nd comma, and the
    > > first 35 characters after the 7th comma) from a csv file (over 100,000 rows),
    > > only after the 8th column matches a values in column A of my spreadsheet. The
    > > two extracted data elements need to be stored in my worksheet in columns B
    > > and C
    > >
    > > Any help on this would be appreciated.
    > >
    > >
     
    Utahstew, Apr 13, 2008
    #3
  4. Utahstew

    Utahstew Guest

    I was wrong other "&" exist in the lines previously
     
    Utahstew, Apr 13, 2008
    #4
  5. Utahstew

    Utahstew Guest

    Joel,
    Here is the modified code I have in the spreadsheet. It throws a Run-time
    error '9': Subscript out of range when reading data from line 3073 of the
    Total.csv file.

    Equity 93671105 NYSE - H&R Block Inc 20080102 18.61 US0936711052 H&R Block
    Inc HRB 93671105


    Sub getDatafromTextFile()
    Dim Ticker As String

    tickers = Application.CountA(ActiveSheet.Range("A:A")) 'determines number of
    Tickers

    Const ForReading = 1
    Const ForWriting = 2
    Const ForAppending = 3
    Const TristateUSeDefault = -2
    Const TristateTrue = -1
    Const TristateFalse = 0

    Dim Data(9)

    'Default folder

    Set fsread = CreateObject("Scripting.fileSystemObject")
    Fname = Application.GetOpenFilename(FileFilter:="Text Files (*.csv), *.csv",
    Title:="Please select a file")
    If Fname = False Then
    ' They pressed Cancel
    MsgBox "Stopping because you did not select a file"
    End If

    Set fread = fsread.getfile(Fname)
    Set tsread = fread.openastextstream(ForReading, TristateUSeDefault)

    RowCount = 1


    Do While tsread.atendofstream = False

    inputline = tsread.readline

    For i = 0 To 10
    If InStr(inputline, ",") > 0 Then
    Data(i) = Left(inputline, InStr(inputline, ",") - 1)
    inputline = Mid(inputline, InStr(inputline, ",") + 1)
    Else
    If Len(inputline) > 0 Then
    Data(i) = inputline
    inputline = ""
    Else
    Exit For
    End If
    End If

    Next i

    Set c = Columns("A:A").Find(what:=Trim(Data(8)), LookIn:=xlValues,
    lookat:=xlWhole)
    If Not c Is Nothing Then
    c.Offset(0, 1) = Left(Data(2), 3)
    c.Offset(0, 2) = Left(Data(7), 35)
    End If
    Loop
    tsread.Close
    End Sub
     
    Utahstew, Apr 13, 2008
    #5
  6. Utahstew

    Utahstew Guest

    The lies that have problems have extra comma at the end.

    The following line work in the code
    Equity,93679108,NYSE - Blockbuster Inc Class
    A,20080102,3.64,,US0936791088,Blockbuster Inc Class A,BBI,93679108

    The following line does not work in the code
    Equity,9.26E+104,NYSE - Enh Eq Yd Pr Fd,20080102,15.63,,US09256E1010,Enh Eq
    Yd Pr Fd,ECV,9.26E+104,,

    Notice the final two comma

    How do I limit theinput line to only the first ten columns and disregard the
    rest?
     
    Utahstew, Apr 13, 2008
    #6
  7. Utahstew

    Joel Guest

    I'm not sure why you modified the size of the Data array and the size of the
    FOR loop. You only need 8 items. Why go through the loop additional times
    and get data you don't need. The original code would of just ignored the
    other data in the line.

    The error is occuring becuase the for loop is now 0 to 10 which is 11 items.
    Dat is defoined as Data(9) which is nine items. The error is occuring
    becuase you are putting more items into the array than it can hold.

    "Utahstew" wrote:

    >
    > The lies that have problems have extra comma at the end.
    >
    > The following line work in the code
    > Equity,93679108,NYSE - Blockbuster Inc Class
    > A,20080102,3.64,,US0936791088,Blockbuster Inc Class A,BBI,93679108
    >
    > The following line does not work in the code
    > Equity,9.26E+104,NYSE - Enh Eq Yd Pr Fd,20080102,15.63,,US09256E1010,Enh Eq
    > Yd Pr Fd,ECV,9.26E+104,,
    >
    > Notice the final two comma
    >
    > How do I limit theinput line to only the first ten columns and disregard the
    > rest?
     
    Joel, Apr 13, 2008
    #7
  8. Utahstew

    Utahstew Guest

    Thank you Joel, your fast response is appreciated

    I had forgotten arrays start at 0.
     
    Utahstew, Apr 13, 2008
    #8
  9. Utahstew

    Joel Guest

    I think you should add an exit sub in the code below. You are going to get
    an error if you don't make the change.

    from
    Title:="Please select a file")
    If Fname = False Then
    ' They pressed Cancel
    MsgBox "Stopping because you did not select a file"
    End If
    to
    Title:="Please select a file")
    If Fname = False Then
    ' They pressed Cancel
    MsgBox "Stopping because you did not select a file"
    exit sub
    End If


    "Utahstew" wrote:

    > Thank you Joel, your fast response is appreciated
    >
    > I had forgotten arrays start at 0.
    >
    >
     
    Joel, Apr 14, 2008
    #9
    1. Advertisements

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. vect98

    Extract only non-matching data

    vect98, Aug 17, 2005, in forum: Microsoft Excel Programming
    Replies:
    4
    Views:
    401
    vect98
    Aug 18, 2005
  2. schellam
    Replies:
    0
    Views:
    309
    schellam
    Nov 29, 2005
  3. Guest
    Replies:
    1
    Views:
    219
    Guest
    Apr 8, 2007
  4. Replies:
    3
    Views:
    234
  5. L.Mathe

    Extract data from csv file

    L.Mathe, Feb 17, 2010, in forum: Microsoft Excel Programming
    Replies:
    4
    Views:
    360
    L.Mathe
    Feb 23, 2010
Loading...

Share This Page