Does someone have an idea how I can optimize this a bit more?


M. Posseth

The code below splits a MySQL dump file into files that MSSQL can
understand (only the DDL has to be modified manually).

It does work, but it is very, very slow. I tried it on an AMD 64 3400+
machine with 1.5 GB of physical memory installed, and after 12 hours I had
SQL files worth 150 MB, while the source file is 7 GB.

Is .NET really so slow on I/O, or am I missing something here?

I compiled the executable with optimizations, of course, and use Framework 1.1.


I tried to optimize the code by raising the lineCount variable to 100,000 or
even a million, but it looks like the higher this variable gets, the slower
the app becomes. I also tried to do the Replace calls on the sline variable,
but that had no noticeable effect either.


Anyone have some tips that might help me? I have the idea that the read is
terribly slow. Another strange thing is that when I used the larger
lineCount values, I saw the memory allocation switching between roughly
120 and 300 MB; what I would like to see is the app taking the whole ~1 GB
of physical memory that is free on the machine. (It does take 99% CPU.
Also strange that the system is still a bit responsive, but I see this as
a pro :) )





Dim sread As New StreamReader("D:\rawmysqldump.sql") ' this is a file of 7 GB
Dim sline As String = ""
Dim sbodyDDL As String
Dim sBody As String = ""

Dim lineCount As Long
Dim dumpcount As Long
Label1.Text = "0"

Dim swrite As StreamWriter
Do Until sread.Peek = -1
    sline = Trim(sread.ReadLine)
    If sline.Length > 6 AndAlso sline.Substring(0, 6) = "INSERT" Then
        sBody = sBody.Concat(sBody, sline, vbCrLf)
        lineCount += 1
        If lineCount = 10000 Then ' generate a new sql file every 10000 records
            sBody = sBody.Replace("`", "")
            sBody = sBody.Replace("""", "'")
            dumpcount += 1
            swrite = New StreamWriter("D:\dump\" & dumpcount.ToString & ".sql")
            swrite.Write(sBody)
            swrite.Close()
            sBody = ""
            lineCount = 0
            Label1.Text = dumpcount.ToString
            Label1.Refresh()
        End If
    Else
        sbodyDDL = sbodyDDL.Concat(sbodyDDL, sline, vbCrLf)
        TextBox1.Text = sbodyDDL
        TextBox1.Refresh()
    End If
Loop

If sBody.Length > 0 Then
    dumpcount += 1
    swrite = New StreamWriter("D:\dump\kamm" & dumpcount.ToString & ".sql")
    sBody = sBody.Replace("`", "")
    sBody = sBody.Replace("""", "'")
    swrite.Write(sBody)
    swrite.Close()
    sBody = ""
    dumpcount = dumpcount + lineCount
    Label1.Text = dumpcount.ToString
    Label1.Refresh()
    Application.DoEvents()
End If
swrite = New StreamWriter("D:\kamdump\DDL.sql")
swrite.Write(sbodyDDL)
swrite.Close()
MsgBox(" ready ! ")
 

Ken Tucker [MVP]

Hi,

Try using a StringBuilder instead of a String. Every time you add to a
string, it has to create a new string and destroy the old one.
http://msdn.microsoft.com/library/d...ml/frlrfsystemtextstringbuilderclasstopic.asp
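For example, here is a minimal sketch of the difference (the loop bound
and variable names are only illustrative):

' Slow: every & allocates a brand-new string and copies the old contents
Dim s As String = ""
Dim i As Integer
For i = 1 To 10000
    s = s & "line" & vbCrLf
Next

' Fast: the StringBuilder appends into a growable internal buffer
Dim sb As New System.Text.StringBuilder
For i = 1 To 10000
    sb.Append("line")
    sb.Append(vbCrLf)
Next
Dim result As String = sb.ToString()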

Ken
 

Cor Ligthert

Ken,

I was investigating what you meant there, because I thought the code was
just reading and writing lines. So let me extend your answer here.
Try using a StringBuilder instead of a String. Every time you add to a
string, it has to create a new string and destroy the old one.

Instead of the Concat

sBody = sBody.Concat(sBody, sline, vbCrLf)

declare the builder outside the loop:

Dim sBody As New StringBuilder

and then append inside the loop; you can even do it in two separate calls
(the difference is only nanoseconds):

sBody.Append(sline)
sBody.Append(vbCrLf)
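
Put together, the whole loop would then look something like this. This is
a minimal sketch: the file paths and the 10,000-record batch size come from
the original post, the per-line UI updates are left out, and the final
flush of the last partial batch plus the DDL file write follow the same
pattern as the original code. Note that on Framework 1.1 a StringBuilder
is reset with Length = 0, since Clear() does not exist there.

Dim sread As New StreamReader("D:\rawmysqldump.sql")
Dim sBody As New System.Text.StringBuilder
Dim sbodyDDL As New System.Text.StringBuilder
Dim swrite As StreamWriter
Dim lineCount As Long
Dim dumpcount As Long
Dim sline As String = sread.ReadLine()

Do While Not sline Is Nothing ' ReadLine returns Nothing at end of file
    sline = sline.Trim()
    If sline.Length > 6 AndAlso sline.Substring(0, 6) = "INSERT" Then
        sBody.Append(sline)
        sBody.Append(vbCrLf)
        lineCount += 1
        If lineCount = 10000 Then ' new sql file every 10000 records
            dumpcount += 1
            swrite = New StreamWriter("D:\dump\" & dumpcount.ToString() & ".sql")
            swrite.Write(sBody.ToString().Replace("`", "").Replace("""", "'"))
            swrite.Close()
            sBody.Length = 0 ' reset the builder; Framework 1.1 has no Clear()
            lineCount = 0
        End If
    Else
        sbodyDDL.Append(sline)
        sbodyDDL.Append(vbCrLf)
    End If
    sline = sread.ReadLine()
Loop
sread.Close()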

Cor
 

Andy O'Neill

Cor Ligthert said:
<snip StringBuilder suggestion>

Did I miss something?
Am I being too simple...?

I thought there was an ODBC driver for MySQL.
Wouldn't opening it up with a data adapter via ODBC be better?
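
A minimal sketch of that idea, assuming the data still sits in a live
MySQL server with the MySQL ODBC driver installed; the driver name,
connection details, and table name below are illustrative assumptions,
not anything from the original post:

Dim conn As New System.Data.Odbc.OdbcConnection( _
    "Driver={MySQL ODBC 3.51 Driver};Server=localhost;" & _
    "Database=somedb;Uid=someuser;Pwd=somepass;") ' hypothetical details
Dim da As New System.Data.Odbc.OdbcDataAdapter("SELECT * FROM sometable", conn)
Dim dt As New DataTable ' requires a reference to System.Data
da.Fill(dt) ' reads the rows straight from MySQL, no 7 GB dump file involved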
 

Chris Dunaway

In addition to the other responses, updating and refreshing a label
will also slow down the process. It might help to delegate the code
that reads the file to its own thread.
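
A minimal sketch of that suggestion on Framework 1.1 (there is no
BackgroundWorker yet), assuming the parsing loop has been moved into a
method named ParseDump and that dumpcount is a form-level field; the
Invoke call marshals the occasional progress update back to the UI thread:

Private Sub StartParsing()
    Dim worker As New System.Threading.Thread(AddressOf ParseDump)
    worker.IsBackground = True ' do not keep the process alive after the form closes
    worker.Start()
End Sub

Private Sub ParseDump()
    ' the read/split loop goes here, touching no controls directly;
    ' after each batch, push a progress update to the UI thread:
    Me.Invoke(New MethodInvoker(AddressOf UpdateProgress))
End Sub

Private Sub UpdateProgress()
    Label1.Text = dumpcount.ToString()
End Sub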
 

Cor Ligthert

Andy,

What if the database is in Africa and the files are in China?

Nothing has been told about the database, or about the reason why it has
to be done this way.

The question was "how I can optimize this a bit more", not "how can I do
this better".

You may well be right, but in my opinion the question is not pointing in
that direction.

Just my thought,

Cor
 

M.Posseth

Andy said:
Did I miss something?
Am I being too simple...?

I thought there was an ODBC driver for MySQL.
Wouldn't opening it up with a data adapter via ODBC be better?

Euhh, well... what if you only have an SQL dump file of 7 GB and a
high-performance MSSQL server to get it into?
 

M.Posseth

Ken said:
Hi,

Try using a StringBuilder instead of a String. Every time you add to a
string, it has to create a new string and destroy the old one.
http://msdn.microsoft.com/library/d...ml/frlrfsystemtextstringbuilderclasstopic.asp

Ken
It is now superbly fast; on my AMD machine it took 15 minutes!!! This is
great!! The StringBuilder is a cool thing.

With the normal strings it took 12 hours to only parse 350 MB worth of
files (on the same machine).

Thank you for pointing me in the right direction.
 

Andy O'Neill

M.Posseth said:
Euhh, well... what if you only have an SQL dump file of 7 GB and a
high-performance MSSQL server to get it into?

Fair enough.
That'd be a good reason.
 