Inserting a single character into the middle of a 3 GB file

  • Thread starter Thread starter R Avery
  • Start date Start date
R

R Avery

I wrote a program that analyzes a 1.5 GB text (CSV) file and outputs a
3.0 GB text file. However, it turns out that on the 2.3 something
millionth row, it is missing a comma. Therefore, when trying to import
the flat file into SQL Server, it is complaining about data type error.

What i would like to do is to write a VBA program that will just insert
a comma into the appropriate place in the file (I can figure out the
offset that i would like to place it at). If this is not possible
because of the way data is stored on disk, it is also possible for me to
remove a character from elsewhere in the line to make up the
difference. Therefore, what i want to do amounts to changing

20040315,34.349,20020401-45.69,13.34897273,100 into --->
20040315,34.349,20020401,-45.69,13.3489727,100

However, I do not know how to do this. Any help would be appreciated.
 
Sub AddComma()
'Assumes the data is in contiguous rows and stops when it hits a blank row
'adds a comma just before the "-" sign and puts the results in column "B"

Do Until ActiveCell.Value = ""
ActiveCell.Offset(0, 1).Range("A1").Select
ActiveCell.FormulaR1C1 = _

"=LEFT(RC[-1],FIND(""-"",RC[-1],1)-1)&"",""&RIGHT(RC[-1],LEN(RC[-1])-(FIND(""-"",RC[-1],1)-1))"
ActiveCell.Offset(1, -1).Range("A1").Select
Loop
End Sub

Not certain this is what you want, but I hope it helps
 
R Avery said:
I wrote a program that analyzes a 1.5 GB text (CSV) file and outputs a
3.0 GB text file. However, it turns out that on the 2.3 something
millionth row, it is missing a comma. Therefore, when trying to import
the flat file into SQL Server, it is complaining about data type error.

What i would like to do is to write a VBA program that will just insert
a comma into the appropriate place in the file (I can figure out the
offset that i would like to place it at). If this is not possible
because of the way data is stored on disk, it is also possible for me to
remove a character from elsewhere in the line to make up the
difference. Therefore, what i want to do amounts to changing

20040315,34.349,20020401-45.69,13.34897273,100 into --->
20040315,34.349,20020401,-45.69,13.3489727,100

However, I do not know how to do this. Any help would be appreciated.

Please explain *exactly* how you can figure out where the comma is missing.
Is it always before a hyphen? How is the CSV file generated? It is better to
fix what's causing the problem in the first case, don't you think? Right
now, it seems as if you are trying to fix a symptom rather than the actual
problem.

/ Fredrik
 
Open your file for reading, open a new file for writing. Read
consecutive lines from the original file and write them to the new
file. When you get to the problem line, fix it. Write the rest of
lines from the original file.

Should not be a problem assuming you have the spare 3GB....

Tim.
 
Back
Top