Best way to delete the first and the last records of a very big file

  • Thread starter: booksnore

booksnore

I am reading some very large files greater than 10 GB. Some of the files
(not all) contain a header and footer record identified by "***" in the
first three characters of the record. I need to delete the header or
footer record before reading the file into a database. What's the best
way to do this in C#?
Any help appreciated.

Joe
 
If you're able to loop through the records in the very large files, skip
processing whenever the current record starts with "***".

using System.Text.RegularExpressions;
....

foreach (string s in records) { // example code
    Match m = Regex.Match(s, "^***");

    if (m.Success) { // this is a header or footer
        continue;
    } else {
        processRecord();
    }
}
 
: booksnore wrote:

: > I am reading some very large files greater than 10 GB. Some of the files
: > (not all) contain a header and footer record identified by "***" in the
: > first three characters of the record. I need to delete the header or
: > footer record before reading the file into a database. What's the best
: > way to do this in C#?
: > Any help appreciated.
:
: if you're able to loop through the records in the very large
: files, skip processing when you see that the record you're on
: starts with "***".
:
: using System.Text.RegularExpressions;
: ...
:
: foreach (string s in records) { // example code
: Match m = Regex.Match(s, "^***");

Of course, that should be @"^\*\*\*"; otherwise, you'll get an exception
at runtime.

: if (m.Success) { // this is a header or footer
: continue;
: } else {
: processRecord();
: }

If you like guards instead, you could write

if (Regex.IsMatch(input, @"^\*\*\*"))
    continue;

ObJonSkeet: You could also write the test as

if (input.IndexOf("***") == 0)

Greg
 
That depends on how you are "reading the file into a database". If you are
parsing the file yourself, simply skip the records you don't want (the ones
identified above as having "***" in the first three characters). In other
words, do you really need to DELETE the records, or just ignore them?
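For what it's worth, here is a minimal sketch of the "just ignore them" approach. It assumes each record is a single line of text, which the thread does not actually confirm, and the names RecordFilter and DataRecords are made up for illustration. Lines are read one at a time, so a 10 GB file never has to fit in memory.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class RecordFilter
{
    // Lazily yields every line that does NOT start with "***".
    // Assumption: one record per line (not confirmed in the thread).
    public static IEnumerable<string> DataRecords(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Ordinal comparison: a plain character-by-character check.
            if (line.StartsWith("***", StringComparison.Ordinal))
                continue; // header or footer marker, ignore it

            yield return line;
        }
    }
}
```

You would wrap a StreamReader over the file in a using block, pass it to DataRecords, and hand each returned line to whatever loads the database. If the markers can only ever appear on the first and last lines, this still works; it just performs one cheap check per line.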
 
Greg Bacon said:
: if (m.Success) { // this is a header or footer
: continue;
: } else {
: processRecord();
: }

If you like guards instead, you could write

if (Regex.IsMatch(input, @"^\*\*\*"))
    continue;

ObJonSkeet: You could also write the test as

if (input.IndexOf("***") == 0)

Or, even more readably:

if (input.StartsWith("***"))

I challenge anyone to claim with a straight face that the regex is a
more readable option than the above...
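To make the comparison concrete, here is a small sketch putting the three tests from this thread side by side. The class and method names are hypothetical, and passing StringComparison.Ordinal is my addition rather than something the posters wrote: without it, IndexOf and StartsWith on strings use culture-sensitive comparison, which a fixed ASCII marker does not need.

```csharp
using System;
using System.Text.RegularExpressions;

static class MarkerTests
{
    // Regex version, with the asterisks escaped as Greg pointed out.
    public static bool ViaRegex(string s)
    {
        return Regex.IsMatch(s, @"^\*\*\*");
    }

    // IndexOf version: true only if the first occurrence is at position 0.
    public static bool ViaIndexOf(string s)
    {
        return s.IndexOf("***", StringComparison.Ordinal) == 0;
    }

    // StartsWith version: looks only at the beginning of the string.
    public static bool ViaStartsWith(string s)
    {
        return s.StartsWith("***", StringComparison.Ordinal);
    }
}
```

All three should agree on any input. One practical difference is that IndexOf has to scan the whole record when the marker is absent, whereas StartsWith only looks at the first three characters, so on very large files StartsWith is likely the cheaper test as well as the clearer one.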
 
Thank you for the replies. Yes, I will be ignoring the records rather
than deleting them. I'm testing your suggestions now to see which
performs best on very large input.

Thank you again

Joe
 