Load large files

platinumbay

I am trying to load large flat files (> 250 MB). What is the best practice
for this? I have already read through everything I could find on Google, to
no avail.

I have a SQL DTS process that will load the file in about 7 minutes. My
code takes about an hour.

Here is a snippet:

Dim objSR As StreamReader = System.IO.File.OpenText(objOrigFile)
Dim curEncoding As System.Text.Encoding = objSR.CurrentEncoding

Dim loopX As Integer = 0
Do While objSR.Peek > -1
    loopX += 1

    Dim strRec As String = objSR.ReadLine
    arrA = Split(strRec, coldel)

    Try
        If trailerSigHas Then
            If getRecordData(trailerSigPos - 1).ToString = trailerSig Then
                If loopX < 2 Then
                    Throw New Exception("Empty extract file")
                Else
                    Exit Do
                End If
            End If
        End If

        Dim tmpLngth As Integer = arrA.Length
        Dim tmpColCount As Integer = iNode.Item("columns").ChildNodes.Count ' objDS.Tables(0).Columns.Count
        If tmpLngth > tmpColCount Then
            Throw New Exception("Too many columns in extract row.")
        ElseIf tmpLngth < tmpColCount Then
            Throw New Exception("Not enough columns in extract row.")
        End If

        Dim newRow As DataRow = objDS.Tables(0).NewRow

        For x As Integer = 0 To objDS.Tables(0).Columns.Count - 1
            Dim tmpSqlCol As String = objDS.Tables(0).Columns(x).Caption
            Dim tmpNode As XmlNode = iNode.SelectSingleNode("//columns/column[@sqlcol='" & tmpSqlCol & "']")
            '"//columns/column[sqlcol='" & tmpSqlCol & "']")
            Dim attrID As String = tmpNode.Attributes("id").InnerText
            Dim xmlCol As Integer = Convert.ToInt32(attrID)
            Dim arrVal As Object = getRecordData(xmlCol - 1)
            newRow(x) = GetObjectValue(x, arrVal) ' Does a Convert to the target column type
        Next

        objDS.Tables(0).Rows.Add(newRow)
    Catch ex As Exception
        ' A bad row: capture as much of it as possible into the error table.
        Dim newRow As DataRow = objDS.Tables(1).NewRow
        Dim finalX As Integer = 0
        For x As Integer = 0 To objDS.Tables(1).Columns.Count - 1
            Try
                Dim strColCaption As String = objDS.Tables(1).Columns(x).Caption
                Dim tmpINode As XmlNode = iNode.SelectSingleNode("//columns/column[@sqlcol='" & strColCaption & "']")
                Dim attr2ID As String = tmpINode.Attributes("id").InnerText
                Dim sqlColInt As Integer = Convert.ToInt32(attr2ID)
                If x = 0 Then
                    newRow(x) = Left(getRecordData(sqlColInt - 1).ToString, 8000)
                Else
                    newRow(x) = Left(getRecordData(sqlColInt - 1).ToString, 250)
                End If
            Catch ' non-existent column: ignore and move on
            Finally
                finalX = x
            End Try
        Next
        newRow(finalX - 4) = Left(ex.Message, 150)
        newRow(finalX - 3) = DateTime.Now
        newRow(finalX - 2) = tblName
        newRow(finalX - 1) = strFileName
        newRow(finalX) = loopX
        objDS.Tables(1).Rows.Add(newRow)
    End Try
Loop
objSR.Close()
objSR = Nothing
End If ' (the matching If is outside this snippet)

objConn.Open()
Dim rowCount As Integer = objDS.Tables(0).Rows.Count
Dim rowCount2 As Integer = objDS.Tables(1).Rows.Count
Dim updCount As Integer

If objConn.State <> ConnectionState.Closed Then objConn.Close()
updCount = objDAmain.Update(objDS, tblName)

If objConn.State <> ConnectionState.Closed Then objConn.Close()
updCount = objDAerr.Update(objDS, tblName & "_err")

objConn.Close()

objDS = Nothing
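
[Editor's note: most of the hour in the snippet above is spent accumulating every row in an in-memory DataSet and then pushing it through DataAdapter.Update one INSERT at a time. If you are on .NET 2.0 or later, a streamed SqlBulkCopy load keeps memory flat and is far closer to DTS speed. A minimal sketch, reusing objOrigFile, coldel, tblName and the main table's schema from the code above; connString and the batch size are assumptions:]

```
' Sketch only: stream the flat file into SQL Server in batches via SqlBulkCopy
' instead of building the whole DataSet first. Assumes .NET 2.0+.
Imports System.IO
Imports System.Data
Imports System.Data.SqlClient

Dim batch As DataTable = objDS.Tables(0).Clone() ' same columns/types, no rows
Using bulk As New SqlBulkCopy(connString)
    bulk.DestinationTableName = tblName
    bulk.BatchSize = 5000
    Using sr As StreamReader = File.OpenText(objOrigFile)
        Do While sr.Peek() > -1
            Dim fields() As String = Split(sr.ReadLine(), coldel)
            Dim row As DataRow = batch.NewRow()
            For i As Integer = 0 To batch.Columns.Count - 1
                row(i) = fields(i) ' the DataTable coerces to the column type
            Next
            batch.Rows.Add(row)
            If batch.Rows.Count >= 5000 Then
                bulk.WriteToServer(batch) ' flush a batch, then reuse the table
                batch.Clear()
            End If
        Loop
    End Using
    If batch.Rows.Count > 0 Then bulk.WriteToServer(batch) ' final partial batch
End Using
```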

Thank you very much,

Steve
 
platinumbay

I am loading the DataTable with the schema, so I can validate that the input
file has the correct datatypes, row by row, column by column.
 
Cor Ligthert

Platinumbay,

I am loading the datatable with the schema, so I can validate the input file
has the correct datatype, row by row, column by column.

Setting the values in the SqlCommand parameters does that checking as well.

However, if it is the schema that has to catch the wrong datatype, in my
opinion you are too late by then.

Cor
 
platinumbay

Unfortunately, the invalid data could be anything: a field overflow, a
string in a boolean, too many columns, too few columns, an invalid date, etc.
Our backend runs legacy code to generate these extracts and is prone to
failure. Our front end needs to be able to handle anything, hence this code.
I just need to figure out the quickest way to read the flat file, validate
the data, and load it. Validation failure needs to happen on a row-by-row
basis so that as much good data as possible is loaded.
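
[Editor's note: the row-by-row routing described here can be sketched as below. TryBuildRow is a hypothetical helper that attempts the typed conversion against the main table's schema; the error-table column names are illustrative:]

```
Dim lineNo As Integer = 0
Do While objSR.Peek() > -1
    lineNo += 1
    Dim raw As String = objSR.ReadLine()
    Try
        ' Attempt the typed conversion; any overflow, bad date,
        ' or wrong column count throws here.
        Dim good As DataRow = TryBuildRow(objDS.Tables(0), raw)
        objDS.Tables(0).Rows.Add(good)
    Catch ex As Exception
        ' Route the bad row plus diagnostics to the error table and keep going,
        ' so one bad line never aborts the load.
        Dim bad As DataRow = objDS.Tables(1).NewRow()
        bad("raw_data") = Left(raw, 8000)
        bad("err_msg") = Left(ex.Message, 150)
        bad("err_line") = lineNo
        objDS.Tables(1).Rows.Add(bad)
    End Try
Loop
```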
 
Cor Ligthert

Platinumbay,

In my opinion, the straightforward way is still the one to take.

I would create fields from your legacy file as I read it line by line, using
Substring, validate those fields, and take the right decisions based on them.

When the values of a line are correct, I would add them to the parameter
values, as I said, and use a Command.ExecuteNonQuery to insert them.
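
[Editor's note: the suggestion above — one prepared INSERT whose parameter values are refreshed per line — might look like this in outline. The table and column names are made up; the point is that the SqlParameter enforces the target type and length when the command executes:]

```
' Sketch only: one reusable parameterized command, executed per validated line.
Imports System.Data
Imports System.Data.SqlClient

Dim cmd As New SqlCommand( _
    "INSERT INTO myTable (col1, col2) VALUES (@p1, @p2)", objConn)
cmd.Parameters.Add("@p1", SqlDbType.Int)
cmd.Parameters.Add("@p2", SqlDbType.VarChar, 250)
cmd.Prepare() ' compile the plan once, reuse it for every row

Do While objSR.Peek() > -1
    Dim fields() As String = Split(objSR.ReadLine(), coldel)
    cmd.Parameters("@p1").Value = fields(0)
    cmd.Parameters("@p2").Value = fields(1)
    Try
        cmd.ExecuteNonQuery() ' bad data surfaces as a per-row exception
    Catch ex As SqlException
        ' Route to an error table or log here - do not swallow it silently.
    End Try
Loop
```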

I think that what you are doing, as far as I can understand it, you can do
with the command parameters as well. However, I would never put a Catch with
no action in my code, as I saw in yours.

http://msdn.microsoft.com/library/d...sqlclientsqlparametercollectionclasstopic.asp

Just my thought,

Cor
 
