T
Todd_M
I was wondering what anyone might suggest as "best practice" patterns for
streaming out fixed formatted text files with C#? Let's say we get our data
in a dataset table and we need to iterate over thousands..potentially tens
of thousands of rows to create a properly formatted text file -- like an ACH
file, for example. In there you typically have a header, a body block of
detail rows and a trailer. (The trailers usually contain sums of the
details and counts, but for purposes of my question, we'll ignore that in
the comments below)
Concerning only the detail rows, what's the FASTEST, most effecient way to
get them out of the dataset, validate each dataset field and push into/onto
a fixed length line. For example, some checks might be that numbers must be
validated as numbers (They may be stored in the database as strings, but
must be treated as "numbers" and be left padded with zeros. If a field is
length 5 and my number is 23, my output must be 00023 for that block. Let's
say that I could possibly get text in which is invalid and the entire
Dataset row then skipped or at least handled by some other sub function.
Let's say if the number was too large, say it would evaluate to 6 digits,
that the rules would then truncate the left and leave the 5 right most.
Let's say with text it would always Left justfy and be padded with spaces to
the right if less than the specified length or truncated if too long.
Let's say currency comes in as 15.25 from the database, but must display in
a 10 character fixed lenght, left padded and no decimal, ie, 0000001525.
Date time formats must be MMDDYY.
I did just complete an example of this. It runs pretty fast (On my test, it
can output 45K rows in under a few seconds) -- but I ran into some questions
regarding string handling and getting the data formatted and just wondered
about some best practices/patterns for handling some of the situations above
and whether I might have done things better. (For example, for the dates, I
couldn't find a quick and easy one step process to format MMDDYY. I didn't
feel I could depend on the locale formatted short date, so I broke each part
out with a something like this:
int datePart= Convert.ToInt32(tempDate.Month);
(that gave me an integer, say 2 for Feb, which I then appened to a
streamwriter line like this:
textRowLine.AppendFormat("{0
2}",datePart);
When it came to the year part, I pulled YEAR out as a string (gives you
2003, but I needed 03), so now a string I used
..Substring(sYear.Length -2,2); to pull it into an integer and then did the
..AppendFormat(... line like above.
In my case I used a streamwriter to create the filelines. I iterated the
dataset from the first record to the last. At the beginning of each record
loop, I created strings to hold the output of each dataset field (This would
be one of my effeciency questions). I also created a stringbuilder to
represent each "completed' fixed format line. For example, I pulled "check
number" from the dataset and inside a try{} block cast it to an int32. If I
don't fall into the catch, I know it was valid (not text). I used the
..substring method to pull/parse text.
Once I had appended in my stringbuilder line, I then wrote it to the
filestream object with the .WriteLine(textRowLine.ToString();
When it came to currency, I pulled the amount out as a decimal from the
dataset (it's datatype) and multipled by 100, then dropped the precision
with a cast to int and then padded into the stringbuilder with this type of
syntax:
int curAmt= Convert.ToInt32(Convert.ToDecimal(amount)*100);
textRowLine.AppendFormat("{0
10}",curAmt);
What are some alternatives/improvements to capturing errors, manipulating
and padding strings with spaces, validating numeric and padding it with
zeros, and stripping out. Basically it needs to be solid as far as error
handling (can't crash or create corrupt file if it receives, say text in a
numeric place), but have good coding practices., lowest overhead, fastest
code.
For example, above, I did not use a strongly typed dataset, so I couldn't
rely on that to do any checking for types for me. I basically pulled each
field out as a string and validated from there. I would have thought an
easier, more straightforward way for dates to MMDDYY existed, but I couldn't
find one in the short timeframe I had to get this done. It's in retrospect,
I want to review what I did. Let me know. I'd appreciate any feedback
someone has.
Todd
streaming out fixed formatted text files with C#? Let's say we get our data
in a dataset table and we need to iterate over thousands..potentially tens
of thousands of rows to create a properly formatted text file -- like an ACH
file, for example. In there you typically have a header, a body block of
detail rows and a trailer. (The trailers usually contain sums of the
details and counts, but for purposes of my question, we'll ignore that in
the comments below)
Concerning only the detail rows, what's the FASTEST, most effecient way to
get them out of the dataset, validate each dataset field and push into/onto
a fixed length line. For example, some checks might be that numbers must be
validated as numbers (They may be stored in the database as strings, but
must be treated as "numbers" and be left padded with zeros. If a field is
length 5 and my number is 23, my output must be 00023 for that block. Let's
say that I could possibly get text in which is invalid and the entire
Dataset row then skipped or at least handled by some other sub function.
Let's say if the number was too large, say it would evaluate to 6 digits,
that the rules would then truncate the left and leave the 5 right most.
Let's say with text it would always Left justfy and be padded with spaces to
the right if less than the specified length or truncated if too long.
Let's say currency comes in as 15.25 from the database, but must display in
a 10 character fixed lenght, left padded and no decimal, ie, 0000001525.
Date time formats must be MMDDYY.
I did just complete an example of this. It runs pretty fast (On my test, it
can output 45K rows in under a few seconds) -- but I ran into some questions
regarding string handling and getting the data formatted and just wondered
about some best practices/patterns for handling some of the situations above
and whether I might have done things better. (For example, for the dates, I
couldn't find a quick and easy one step process to format MMDDYY. I didn't
feel I could depend on the locale formatted short date, so I broke each part
out with a something like this:
int datePart= Convert.ToInt32(tempDate.Month);
(that gave me an integer, say 2 for Feb, which I then appened to a
streamwriter line like this:
textRowLine.AppendFormat("{0

When it came to the year part, I pulled YEAR out as a string (gives you
2003, but I needed 03), so now a string I used
..Substring(sYear.Length -2,2); to pull it into an integer and then did the
..AppendFormat(... line like above.
In my case I used a streamwriter to create the filelines. I iterated the
dataset from the first record to the last. At the beginning of each record
loop, I created strings to hold the output of each dataset field (This would
be one of my effeciency questions). I also created a stringbuilder to
represent each "completed' fixed format line. For example, I pulled "check
number" from the dataset and inside a try{} block cast it to an int32. If I
don't fall into the catch, I know it was valid (not text). I used the
..substring method to pull/parse text.
Once I had appended in my stringbuilder line, I then wrote it to the
filestream object with the .WriteLine(textRowLine.ToString();
When it came to currency, I pulled the amount out as a decimal from the
dataset (it's datatype) and multipled by 100, then dropped the precision
with a cast to int and then padded into the stringbuilder with this type of
syntax:
int curAmt= Convert.ToInt32(Convert.ToDecimal(amount)*100);
textRowLine.AppendFormat("{0

What are some alternatives/improvements to capturing errors, manipulating
and padding strings with spaces, validating numeric and padding it with
zeros, and stripping out. Basically it needs to be solid as far as error
handling (can't crash or create corrupt file if it receives, say text in a
numeric place), but have good coding practices., lowest overhead, fastest
code.
For example, above, I did not use a strongly typed dataset, so I couldn't
rely on that to do any checking for types for me. I basically pulled each
field out as a string and validated from there. I would have thought an
easier, more straightforward way for dates to MMDDYY existed, but I couldn't
find one in the short timeframe I had to get this done. It's in retrospect,
I want to review what I did. Let me know. I'd appreciate any feedback
someone has.
Todd