Which design pattern is good for this?

Guest

I need to write a program to validate a text file in CSV format. So I will have a

class DataType

and a lot of derived classes for various types, e.g. IntType, StringType,
FloatType, MoneyType, etc.

Each column of a given type may or may not accept null/empty values. Columns
may also have different maximum lengths for StringType, IntType, etc.

Each column may also have range checking: some IntType columns can only be
between 1 and 25, and some StringType columns can only take certain values.

Which design pattern is best for this? A dictionary with the decorator design
pattern? That sounds too heavy...
 
namekuseijin

Guest said:
I need to write a program to validate a text file in CSV format. So I will have a

class DataType

and a lot of derived classes for various types, e.g. IntType, StringType,
FloatType, MoneyType, etc.

Each column of a given type may or may not accept null/empty values. Columns
may also have different maximum lengths for StringType, IntType, etc.

Each column may also have range checking: some IntType columns can only be
between 1 and 25, and some StringType columns can only take certain values.

Which design pattern is best for this? A dictionary with the decorator design
pattern? That sounds too heavy...

how about the WTF?! design pattern?

seriously, a better pattern is DRY: why implement the aforementioned
classes when you could simply do, say, int.Parse( text ) for a given
chunk of text inside a try block?
 
Jon Skeet [C# MVP]

namekuseijin said:
how about the WTF?! design pattern?

seriously, a better pattern is DRY: why implement the aforementioned
classes when you could simply do, say, int.Parse( text ) for a given
chunk of text inside a try block?

Because it provides encapsulation of parsing and validation. Instead
of having a giant switch statement (or something similar) the OP can
define the columns, and then just keep calling Parse etc. Sounds
reasonable to me.

Now, as for your suggestion: if you're going to try to parse something
and catch exceptions, the TryParse methods are better than calling
Parse inside a try block.
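
A minimal sketch of the difference, for illustration only. The class and method names are made up for this example; `int.Parse` and `int.TryParse` are the standard .NET methods. Both helpers answer the same question, but the first pays for a thrown and caught exception on every bad value, while the second just returns false.

```csharp
using System;

// Hypothetical helper class, not from the thread.
public static class ParseComparison
{
    // Parse inside try/catch: every invalid value throws, and we catch it.
    public static bool IsIntWithParse(string text)
    {
        try
        {
            int.Parse(text);
            return true;
        }
        catch (ArgumentNullException) { return false; } // text was null
        catch (FormatException) { return false; }       // not a number
        catch (OverflowException) { return false; }     // outside the int range
    }

    // TryParse: an invalid value is reported through the return value,
    // so no exception is involved at all.
    public static bool IsIntWithTryParse(string text)
    {
        int ignored;
        return int.TryParse(text, out ignored);
    }
}
```

Both methods should agree on every input; the TryParse version is shorter and does not use exceptions for ordinary control flow.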

Jon
 
Jon Skeet [C# MVP]

Guest said:
I need to write a program to validate a text file in CSV format. So I will have a

class DataType

and a lot of derived classes for various types, e.g. IntType, StringType,
FloatType, MoneyType, etc.

Each column of a given type may or may not accept null/empty values. Columns
may also have different maximum lengths for StringType, IntType, etc.

So for a given file, you'll have a list of column definitions, each
containing a parser and a validator, correct? I'd imagine it *may* be
worth combining the parsing and validation - I wouldn't have thought
there'd be many cases where the validator can be used with lots of
different parsers, for instance.

Guest said:
Each column may also have range checking: some IntType columns can only be
between 1 and 25, and some StringType columns can only take certain values.

Which design pattern is best for this? A dictionary with the decorator design
pattern? That sounds too heavy...

I can't see how a decorator would fit in here. I'd just define an
appropriate interface, and then (once) create a list of column
definitions for your CSV file, each of which implements the interface.
Then either do the splitting at the "top level" and parse each part,
or allow each parser to "take" however much data they need from the
line, from a given position, returning how much source data they've
used up, and the resulting data. The column definitions themselves
should be immutable, unchanged by the process of parsing an entry -
that way they stay reusable.
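
One way the interface-plus-definitions idea could look, as a rough sketch rather than a finished design. Every name here (`IColumnDefinition`, `IntColumn`, `StringColumn`, `CsvValidator`) is hypothetical. The sketch only validates and does not return parsed values, it splits at the top level, and the naive `Split` does not handle quoted fields.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical interface: one column definition checks one field.
public interface IColumnDefinition
{
    string Name { get; }
    // Returns null when the field is valid, otherwise an error message.
    string Validate(string field);
}

// Immutable: all settings are fixed in the constructor.
public sealed class IntColumn : IColumnDefinition
{
    private readonly string name;
    private readonly int min, max;
    private readonly bool allowEmpty;

    public IntColumn(string name, int min, int max, bool allowEmpty)
    {
        this.name = name; this.min = min; this.max = max; this.allowEmpty = allowEmpty;
    }

    public string Name { get { return name; } }

    public string Validate(string field)
    {
        if (string.IsNullOrEmpty(field))
            return allowEmpty ? null : name + ": value required";
        int value;
        if (!int.TryParse(field, out value))
            return name + ": not an integer";
        if (value < min || value > max)
            return name + ": out of range";
        return null;
    }
}

public sealed class StringColumn : IColumnDefinition
{
    private readonly string name;
    private readonly int maxLength;
    private readonly bool allowEmpty;

    public StringColumn(string name, int maxLength, bool allowEmpty)
    {
        this.name = name; this.maxLength = maxLength; this.allowEmpty = allowEmpty;
    }

    public string Name { get { return name; } }

    public string Validate(string field)
    {
        if (string.IsNullOrEmpty(field))
            return allowEmpty ? null : name + ": value required";
        return field.Length > maxLength ? name + ": too long" : null;
    }
}

public static class CsvValidator
{
    // Splits the line once, then hands each field to its column definition.
    // Extra trailing fields (e.g. after a final separator) are ignored.
    public static List<string> ValidateLine(
        string line, IList<IColumnDefinition> columns, char separator)
    {
        var errors = new List<string>();
        string[] fields = line.Split(separator);
        if (fields.Length < columns.Count)
        {
            errors.Add("too few fields");
            return errors;
        }
        for (int i = 0; i < columns.Count; i++)
        {
            string error = columns[i].Validate(fields[i]);
            if (error != null) errors.Add(error);
        }
        return errors;
    }
}
```

The list of definitions would be built once, for example `new StringColumn("name", 20, false)` followed by `new IntColumn("age", 1, 120, false)`, and then reused for every line of every file with that layout.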

Now, what do you need to *do* with the data when you've got it? That
will dictate the design of how the results are stored.

Jon
 
namekuseijin

Jon Skeet [C# MVP] said:
Because it provides encapsulation of parsing and validation. Instead
of having a giant switch statement (or something similar) the OP can
define the columns, and then just keep calling Parse etc. Sounds
reasonable to me.

a single method/function definition with the "switch statement"
provides just enough encapsulation for the job at hand. Why waste time
implementing several redundant classes?

Jon Skeet [C# MVP] said:
Now, as for your suggestion: if you're going to try to parse something
and catch exceptions, the TryParse methods are better than calling
Parse inside a try block.

wow, somehow sounds like they do exactly that underneath...

seriously, a better pattern seems to be KISS...
 
Jon Skeet [C# MVP]

namekuseijin said:
a single method/function definition with the "switch statement"
provides just enough encapsulation to the job at hand. Why waste time
implementing several redundant classes?

They're not redundant, IMO - they're encapsulating behaviour, and in a
flexible way. Individual objects are then responsible for defining how
a column behaves, in all aspects of parsing and validating.

Without separate objects for each column, where would you put rules for
lengths, optional/mandatory values, potentially minimum/maximum values
for numbers etc? Is that all going to be part of the giant switch
statement too?

I have no problem with having many small classes, each doing a
particular thing well. I far prefer that to having giant methods.

namekuseijin said:
wow, somehow sounds like they do exactly that underneath...

No, they don't. They avoid the exception being thrown in the first
place.

namekuseijin said:
seriously, a better pattern seems to be KISS...

You think try/catch/ignore-the-exception is simpler than using a method
which tells you whether or not the value was parsed correctly? I
disagree.
 
namekuseijin

Jon Skeet [C# MVP] said:
They're not redundant, IMO - they're encapsulating behaviour, and in a
flexible way. Individual objects are then responsible for defining how
a column behaves, in all aspects of parsing and validating.

my whole point was to point out that "IntType, StringType,
FloatType, MoneyType" are all builtin types already, even with a
handy Parse method!

that's why it's redundant.

Jon Skeet [C# MVP] said:
Without separate objects for each column, where would you put rules for
lengths, optional/mandatory values, potentially minimum/maximum values
for numbers etc? Is that all going to be part of the giant switch
statement too?

the giant switch will likely be way shorter than implementing the
useless, redundant classes for this one-shot problem.
 
Jon Skeet [C# MVP]

namekuseijin said:
my whole point was to point out that "IntType, StringType,
FloatType, MoneyType" are all builtin types already, even with a
handy Parse method!
that's why it's redundant.

None of them contain settings for the column name, for allowing nulls,
or for other validation.

That's part of what would be contained within the column definition,
and some of that varies by type.

namekuseijin said:
the giant switch will likely be way shorter than implementing the
useless, redundant classes for this one-shot problem.

There's more to elegant design than counting lines of code.
 
namekuseijin

Jon Skeet [C# MVP] said:
None of them contain settings for allowing

the OP's original request was:
"I need to write a program validate a text file in CSV format"

that is, given:
name;age;salary;
john;33;3400;
jane;28;2500;

ensure that first column is string of given length, second is int in
range and third is money. None of these constraints are provided by
the CSV per se, but by the programmer, including the order.

an algorithm to process this file:

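// needs: using System; using System.IO; using System.Linq;
try {
    // path: the CSV file name; Skip(1) drops the first line (headers)
    foreach (string line in File.ReadLines(path).Skip(1))
    {
        getName( line );
        getAge( line );
        getSalary( line );
    }
}
catch (Exception) { Console.WriteLine("CSV file not ok"); }

where, say, getAge could be:

void getAge( string line )
{
    // CSVcolumn( n, line ) returns field n of the line; Parse may throw right away
    int age = int.Parse( CSVcolumn( 1, line ) );
    if (age < min || age > max) throw new Exception("age out of range");
}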

this is a lot more useful and simple than implementing whole classes
for such a trivial and one-shot task... KISS
 
Jon Skeet [C# MVP]

namekuseijin said:
the OP's original request was:
"I need to write a program validate a text file in CSV format"

that is, given:
name;age;salary;
john;33;3400;
jane;28;2500;

ensure that first column is string of given length, second is int in
range and third is money. None of these constraints are provided by
the CSV per se, but by the programmer, including the order.

Absolutely - the programmer can put in the constraints with the types.
All they need to do is create the column definitions once (which could
easily be done in something like a Spring configuration file) and then
call a method which can parse any file given the column definitions.

There's no need to hard code everything.

namekuseijin said:
getName( line )
getAge( line )
getSalary( line )

So is each of these methods going to split the line? I'd rather split
the line once, and act on each element separately.

Apart from anything else, that's also a lot easier to test.
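
A small sketch of what "split once, act on each element" might look like for the three-column sample in this thread. The class name, the length limit and the age range are assumptions made up for the example. Each check takes a single field, so it can be exercised with plain strings, with no file and no line splitting involved.

```csharp
using System;
using System.Globalization;

// Hypothetical validator for the name;age;salary sample.
public static class PersonLine
{
    public static bool IsValidName(string field)
    {
        // 20 is an assumed maximum length.
        return !string.IsNullOrEmpty(field) && field.Length <= 20;
    }

    public static bool IsValidAge(string field)
    {
        int age;
        // 0 to 150 is an assumed range.
        return int.TryParse(field, out age) && age >= 0 && age <= 150;
    }

    public static bool IsValidSalary(string field)
    {
        decimal salary;
        return decimal.TryParse(field, NumberStyles.Number,
                                CultureInfo.InvariantCulture, out salary)
            && salary >= 0;
    }

    // The line is split exactly once; each element goes to its own check.
    public static bool IsValidLine(string line)
    {
        string[] fields = line.Split(';');
        return fields.Length >= 3
            && IsValidName(fields[0])
            && IsValidAge(fields[1])
            && IsValidSalary(fields[2]);
    }
}
```

A unit test for the age rule then only needs calls such as `PersonLine.IsValidAge("28")` and `PersonLine.IsValidAge("abc")`.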
namekuseijin said:
this is a lot more useful and simple than implementing whole classes
for such a trivial and one-shot task... KISS

Yes, 'cos we all know that CSV files never change format... I believe
my solution would be just as simple, but much more flexible.
 
jehugaleahsa

Guest said:
I need to write a program to validate a text file in CSV format. So I will have a

class DataType

and a lot of derived classes for various types, e.g. IntType, StringType,
FloatType, MoneyType, etc.

Each column of a given type may or may not accept null/empty values. Columns
may also have different maximum lengths for StringType, IntType, etc.

Each column may also have range checking: some IntType columns can only be
between 1 and 25, and some StringType columns can only take certain values.

Which design pattern is best for this? A dictionary with the decorator design
pattern? That sounds too heavy...

A friend of mine had me implement a CSV/SSV/XML parser in terms of the
IDataReader interface. It made the project he was working on a breeze.

It also gave him the ability to add column-specific constraints with a
lot more ease. Putting the code in the IDataReader made the project so
small and easy that 3 distinct parsers were done in a day's time.

The DataReader class has an abstract base class that allows you to
specify how the data columns are parsed (this is the only "tricky"
part). The base class provides intuitive conversions from the text
file data to the requested type.

Personally, I treated the derived reader as a business object-like
creature and created properties for things like Name, Date, Company,
Favorite Ice Cream, which would retrieve the correct column, perform
the correct data conversions and do constraint tests. Many people use
IDataReader for their business objects - this is no different.

public DateTime Date
{
    get
    {
        DateTime date = this.GetDate(1); // get date from text file, 1st column
        // do checks on date
        return date;
    }
}
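
To make the shape of this approach concrete, here is a rough sketch. It is not Travis's class and it does not implement the full `IDataReader` interface; `DelimitedReader` and `PersonReader` are hypothetical names, the column indexes are zero-based, and the limits are assumptions. It only shows a base class with typed getters and a derived reader whose properties do the constraint tests.

```csharp
using System;
using System.Globalization;
using System.IO;

// Hypothetical base class: reads one delimited record at a time and
// offers typed getters by zero-based column index.
public abstract class DelimitedReader : IDisposable
{
    private readonly TextReader source;
    private readonly char separator;
    private string[] fields; // current record; null until Read() succeeds

    protected DelimitedReader(TextReader source, char separator)
    {
        this.source = source;
        this.separator = separator;
    }

    // Advances to the next record; returns false at end of input.
    public bool Read()
    {
        string line = source.ReadLine();
        if (line == null) return false;
        fields = line.Split(separator);
        return true;
    }

    protected string GetString(int index)
    {
        return fields[index];
    }

    protected int GetInt32(int index)
    {
        // Throws FormatException if the field is not an integer.
        return int.Parse(fields[index], CultureInfo.InvariantCulture);
    }

    public void Dispose()
    {
        source.Dispose();
    }
}

// Derived reader: one property per column, each with its own checks.
public sealed class PersonReader : DelimitedReader
{
    public PersonReader(TextReader source) : base(source, ';') { }

    public string Name
    {
        get
        {
            string name = GetString(0);
            if (name.Length == 0 || name.Length > 20) // assumed limits
                throw new FormatException("Name must be 1 to 20 characters");
            return name;
        }
    }

    public int Age
    {
        get
        {
            int age = GetInt32(1);
            if (age < 0 || age > 150) // assumed range
                throw new FormatException("Age out of range");
            return age;
        }
    }
}
```

Calling code would loop with `while (reader.Read())` and touch each property, treating a `FormatException` as a validation failure for that record.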

I would more than love to send you my class if you are interested.
However it is at work and you will need to wait till Monday or
Tuesday. Just 'Reply to Author'.

Thanks,
Travis
 
