Data Records from Flat File (COBOL Style)

J. G.

I'm looking at rewriting some stand-alone Pro*COBOL applications that
read flat files and spit out some reports.

Is there any way to mimic COBOL's ability to read lines from a flat
file into a data structure? I'm hoping C# has a way to define a data
structure that I can read the lines into so that they can be easily
manipulated.

Any advice would be greatly appreciated.
 
I don't know anything about COBOL, but I think I see what you need. I'm
pretty sure you can create a struct and then use System.IO to read the
data in binary mode into the struct. What I'm not sure about is how to
declare fixed-size strings within the struct to match your file.
 
JG,

Probably the only thing needed, besides looping over the lines, is
String.Substring(0,10) // which gives the first 10 characters
String.Substring(10) // which gives the rest of the record

That is assuming I understand what you mean.

Cor
 
That's how things are done now. We grab the first x characters and
throw them into a String variable. We do this until all fields in that
line have been extracted.

What Edgardo mentioned, creating a struct, is more along the lines of
what I'd like to do. I haven't found anything that demonstrates how to
define the fixed-length fields.
 
J.G.

I don't see where my previous answer conflicts with Edgardo's.

Although I would use a collection class or, even simpler in this case, a
DataTable.
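
A minimal sketch of the DataTable idea, assuming each line is a 10-character key followed by the rest of the record (the split used earlier in this thread). The class name FlatFileTable and the column names Key and Rest are made up for illustration.

```csharp
using System;
using System.Data;
using System.IO;

public static class FlatFileTable
{
    // Reads fixed-width lines and stores each one as a row with two string columns.
    public static DataTable Load(TextReader reader)
    {
        var table = new DataTable("Records");
        table.Columns.Add("Key", typeof(string));   // characters 0-9
        table.Columns.Add("Rest", typeof(string));  // everything after the key

        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Assumption: lines shorter than the key are not valid records, so skip them.
            if (line.Length < 10) continue;
            table.Rows.Add(line.Substring(0, 10), line.Substring(10));
        }
        return table;
    }
}
```

For a real file, something like FlatFileTable.Load(File.OpenText(path)) should work, since File.OpenText returns a StreamReader, which is a TextReader.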

Cor
 
For fixed size strings use byte[] and then use the appropriate encoding to
convert to a string.
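
A rough sketch of that suggestion, assuming the file is plain ASCII and that fields are padded on the right with spaces. The helper name FixedBytes.ReadField is hypothetical; Encoding.ASCII.GetString(byte[], int, int) is the standard .NET call.

```csharp
using System;
using System.Text;

public static class FixedBytes
{
    // Decodes `length` bytes starting at `offset` as ASCII,
    // then drops the trailing pad spaces (an assumption about the file format).
    public static string ReadField(byte[] record, int offset, int length)
    {
        return Encoding.ASCII.GetString(record, offset, length).TrimEnd(' ');
    }
}
```

With a 20-byte record holding a 10-digit number followed by a 10-character name, ReadField(record, 0, 10) would give the number as text and ReadField(record, 10, 10) the name without its padding.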
 
I understand what the OP wants: rather than creating strings one-by-one
from the bytes that make up the input record, he wants to lay the input
record over some sort of structure (I wish people would stop saying
"struct"... that means something quite different in C#) and just pick
the fields out of the structure, as you can in COBOL or C.

There are several reasons why you can't do this easily in "safe" C#:
the language doesn't "like" to let you look at a class from two
different angles: as a collection of fields and a stream of bits in
memory; there is the problem of mediating between the "standard" data
representation in flat files (ASCII) and the usual data representation
for characters in .NET (Unicode); finally, there is the problem that C#
strings are not just groups of characters... they are special
structures all their own with special rules for how they're managed in
the runtime. So, no, using "normal" C# you can't play the same kinds of
tricks as in older languages.

I suppose that you could do this using "unsafe" mode and pointers, or
some such thing, but that's really not the way that C# is designed to
do things. Besides, I really doubt that you're going to find it a big
deal, considering that people are nowadays parsing multi-gigabyte XML
files, sifting through element tags and attribute markers, and
transforming data into specific data types based on schemas. Compared
to something like that, zipping through a byte array pulling out fields
is going to be turbo-charged.

Yes, it's slower than the COBOL way. Then again you get (in exchange)
much better language technology for doing almost anything else but
reading flat file data. For me, the vastly superior language more than
compensates for the slight penalty during I/O.
 
Bruce,

I was aware of that as well.

Something like

01 myrec.
   03 myfielda pic 9(10).
   03 myfieldb pic x(70).
   03 myfieldba redefines myfieldb.
      05 myfieldbc pic x(30).
      05 myfieldbd pic x(40).

Right?

I almost forgot the dots. :-)

However, for myfieldb that is in fact no different from something like
record.Substring(10,70).
It declares a string from a certain starting point with a certain length.

I do not agree that it should be slower, because it is completely the same:
internally, a PIC is nothing more than an address (a) with a length (y).

:-)

Just my thought,

Cor
 
There is a speed penalty.

The COBOL PIC clause absolutely does not have an internal length. It's
just a sequence of bytes, that the _compiler_ knows happens to be 10
long, or whatever. It does _not_ contain any attached "true length".
That is, if you declare

ABC PIC X(10)

in COBOL, there is _no way_ to indicate that you stored only 5
characters in there.

A call to Substring(5,10), or something like that, copies the ten
characters to another place in memory and adds a length (in this case,
10) to form a string object. Picking the fields out of a larger string
one-by-one involves copying each field in memory and constructing a
string for it, which costs time and memory. My point was that it
doesn't cost much in the grand scheme of things.

In more detail, here is what happens in COBOL and C# in situations like
this one.

COBOL:

ABC PIC X(10).

Any reference to "ABC" simply points to the start of the ten
characters. The compiler may or may not generate instructions to make
sure that you don't run off the end of the 10 characters, depending on
compiler switches you specify. You can manipulate the characters in
place, without copying them anywhere.

C#

string customerNumber = lineString.Substring(5, 10);

The Substring method copies ten characters from the lineString and uses
them to construct a string elsewhere in memory, with a length of 10 and
the characters copied from the lineString. A reference to that string
is then stored in customerNumber. Yes, a more intensive operation than
COBOL (which didn't need to move anything anywhere). If you're
processing a million rows you will notice a difference, but as with
many such applications most of your time will be spent doing I/O, so
even if the in-memory processing is 10x slower, it still won't make all
that much real-time difference.
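
For illustration, here is one way the Substring approach could be wrapped in a class, using the 80-character layout from the COBOL example earlier in the thread. This is a sketch under assumptions, not the only way to do it: the names MyRec, FieldA, FieldB, FieldBC and FieldBD are invented, and the REDEFINES is modelled simply by slicing the same 70 characters twice.

```csharp
using System;
using System.Globalization;

public sealed class MyRec
{
    public const int RecordLength = 80;

    public long FieldA { get; private set; }     // pic 9(10): characters 0-9
    public string FieldB { get; private set; }   // pic x(70): characters 10-79
    public string FieldBC { get; private set; }  // pic x(30): first 30 characters of FieldB
    public string FieldBD { get; private set; }  // pic x(40): last 40 characters of FieldB

    public static MyRec Parse(string line)
    {
        if (line == null) throw new ArgumentNullException(nameof(line));

        // Assumption: short lines are padded with spaces so every Substring call stays in range.
        string padded = line.PadRight(RecordLength);

        return new MyRec
        {
            // long rather than int, because ten digits can exceed int.MaxValue.
            // long.Parse throws FormatException if the field is blank or not numeric.
            FieldA = long.Parse(padded.Substring(0, 10), CultureInfo.InvariantCulture),
            FieldB = padded.Substring(10, 70),
            FieldBC = padded.Substring(10, 30),
            FieldBD = padded.Substring(40, 40),
        };
    }
}
```

Each Substring call here copies its characters into a new string, which is exactly the cost described above. The text fields keep their trailing spaces, as a PIC X field would; call TrimEnd() on them if that is not wanted.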
 
Bruce,

Although in my opinion not everything you wrote about COBOL is true, this
is not a COBOL newsgroup, so let us not discuss that.

I do agree with what you wrote about the immutable string.

Therefore what Edgardo (and later I) wrote about putting the data in a
collection class or DataTable will be better when it is used more than once.

The copying can be avoided by reading every line as a byte array, to get the
exact behaviour. However, in my opinion only a fool would do that.

I assume we agree about this.

Cor
 
Yes, we agree on that.

For me, the interesting thing about this discussion is that many people
come to new languages like C# from other languages, and ask, "How do I
do this task in C# _in the same way I did it in my old language_." That
last part is the flaw: you often don't. Just because the OP could use
fields from a record _in situ_ in COBOL doesn't mean that this is an
appropriate solution in C#.

You and Edgardo pointed out one of the correct solutions, which is what
the OP is already doing. I just wanted to back that up with an
explanation that yes, the OP is right: "the C# way" uses more CPU and
more memory, but no, it's not really significant for most programs.

I think we're arguing two different aspects of the same point of view.
:)
 
