
Dataset question

 
 
Bas Hamer
 
      24th Oct 2005
I guess I don't know how to word it better than that.

Our company has machines that generate log files in our own proprietary
language. A while back I wrote a class that took one of these files and
loaded all the data into a dataset of some predefined tables and some dynamic
tables. This worked well for a while and gave me the ability to do searches,
although each search ended up being a lot of custom code.

Now I'm getting to the point where I need to revisit this code and I'm
trying to create a hierarchy of classes.

So the base class is Log, and that is inherited by a class called
MachineXLog. Now, since I know what information would appear in MachineXLog,
I could create a strongly typed dataset that defines all the fields of this
new table.

So the problem comes in at this point. I have a large table of dynamic
fields. I want to run a select statement on this table, grab a subset of
these fields, and put that data in a strongly typed table for easy coding
access.

One of the big sticking points is performance. I need to move ~12 Gigs of
compressed data through this process (changed into XML it turns into 600 Gigs
of XML).

.NET 2.0 answers are fine as well.

 
Nicholas Paldino [.NET/C# MVP]
 
      24th Oct 2005
Bas,

It would seem that the dataset is not a good idea for you. The first
option I would recommend is creating your own objects instead of using typed
data sets, and parsing the files yourself into these objects, in some
format that is more easily searchable. While I am a big advocate for typed
data sets, they are clearly inadequate for some situations (like this one).
Querying data sets in general is a pain. Also, loading this amount of data
into memory is going to kill your performance.
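
A minimal sketch of what "your own objects" could look like. The real log
format is proprietary and not shown in this thread, so the line layout used
here (timestamp|machine|key=value;key=value) is purely an assumption, and
LogRecord, Parse and WhereField are illustrative names rather than anything
from a library:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical record type. The "timestamp|machine|key=value;key=value"
// layout is an assumption made for this sketch only.
public sealed class LogRecord
{
    public DateTime Timestamp;
    public string Machine;
    // Fixed fields get real members; dynamic fields stay in a dictionary.
    public readonly Dictionary<string, string> Fields =
        new Dictionary<string, string>();

    public static LogRecord Parse(string line)
    {
        string[] parts = line.Split('|');
        if (parts.Length != 3)
            throw new FormatException("Expected 3 '|'-separated parts: " + line);

        LogRecord record = new LogRecord();
        record.Timestamp = DateTime.Parse(parts[0], CultureInfo.InvariantCulture);
        record.Machine = parts[1];
        foreach (string pair in parts[2].Split(';'))
        {
            int eq = pair.IndexOf('=');
            if (eq > 0)
                record.Fields[pair.Substring(0, eq)] = pair.Substring(eq + 1);
        }
        return record;
    }
}

public static class LogSearch
{
    // Returns the records whose named dynamic field equals the given value.
    public static List<LogRecord> WhereField(
        IEnumerable<LogRecord> records, string name, string value)
    {
        List<LogRecord> hits = new List<LogRecord>();
        foreach (LogRecord r in records)
        {
            string found;
            if (r.Fields.TryGetValue(name, out found) && found == value)
                hits.Add(r);
        }
        return hits;
    }
}
```

A MachineXLog-style subclass could then expose the fields it knows about as
typed properties over the same dictionary, which gives the easy coding access
without a DataSet in between.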

The second option (and I don't know how feasible it is) is to store the
log information in a database. Database servers are meant to handle this
amount of data and to run queries against it. It would save you a ton of
development time if your store was a DB (with regular backups, for
obvious reasons).

As for the bloat you see when storing the data as XML, there is no way
around it. The very nature of XML is what you are seeing here, since the
persistence format for the infoset is text (and text is not an efficient
representation of data).

Hope this helps.


 
 
 
 
Bas Hamer
 
      24th Oct 2005
Oops, should have clarified this.

The individual sets are ~18 MB of XML, or 600 KB in the compressed
proprietary format. I only handle ~4 of these at a time. There are, however,
many files, and some searches require that I look at all of them. So that is
where the 300 Gigs comes from.

Parsing them all once and just storing them in the database would be a
nice solution, and would remove the time constraints since it would be a
one-time process, but that idea was shut down from above.

 
Nicholas Paldino [.NET/C# MVP]
 
      24th Oct 2005
Bas,

Is there any way that you could alter that thinking? What was the
reason for not doing it? After all, the performance you are going to get
out of using a DB is going to trump whatever you do in this area.

If you have to look at all of the files, the only way I can think of
searching through all of them is to open one up, filter the things you want,
copy the filtered records into an object/database, load the next file,
filter, copy to the result set, and so on, and so on.
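
That loop might look roughly like the sketch below. It assumes each file has
already been decoded to one text record per line, which may not match how the
proprietary format actually works; FilterFiles and the keep predicate are
hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FileFilter
{
    // Reads each file one line at a time, so only the current line and the
    // accumulated matches are held in memory, never a whole file.
    public static List<string> FilterFiles(
        IEnumerable<string> paths, Predicate<string> keep)
    {
        List<string> results = new List<string>();
        foreach (string path in paths)
        {
            using (StreamReader reader = new StreamReader(path))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    if (keep(line))
                        results.Add(line);
                }
            }
        }
        return results;
    }
}
```

If the filtered result set is itself large, the same loop can append matches
to a file or insert them into a database table instead of a List.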

If possible, you should write your logs directly into the database (in
addition to writing them to your log files, if you wish). That would be the
optimal solution.


 
Bas Hamer
 
      24th Oct 2005
In effect that is what I do now. I spawn a number of searcher threads to
fetch, open, and search files, and just write the results to a file. I got it
down to ~6 hours on a 3.2 GHz P4 HT running 4 threads.
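
A rough sketch of that kind of arrangement, not the actual code: worker
threads pull file paths from a shared queue and append matching lines to one
output file. It assumes one text record per line, and ParallelSearch, Run and
the keep predicate are names made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;

public static class ParallelSearch
{
    // Each worker takes the next pending path, scans that file line by line,
    // and writes matches to the shared output under a lock.
    public static void Run(IEnumerable<string> paths, Predicate<string> keep,
                           string outputPath, int threadCount)
    {
        Queue<string> pending = new Queue<string>(paths);
        object gate = new object();

        using (StreamWriter output = new StreamWriter(outputPath))
        {
            ThreadStart worker = delegate
            {
                while (true)
                {
                    string path;
                    lock (gate)
                    {
                        if (pending.Count == 0) return;
                        path = pending.Dequeue();
                    }
                    using (StreamReader reader = new StreamReader(path))
                    {
                        string line;
                        while ((line = reader.ReadLine()) != null)
                        {
                            if (!keep(line)) continue;
                            lock (gate) { output.WriteLine(line); }
                        }
                    }
                }
            };

            List<Thread> threads = new List<Thread>();
            for (int i = 0; i < threadCount; i++)
            {
                Thread t = new Thread(worker);
                t.Start();
                threads.Add(t);
            }
            // Wait for every worker before the writer is disposed.
            foreach (Thread t in threads) t.Join();
        }
    }
}
```

The order of lines in the output file depends on thread scheduling, and an
unhandled exception in a worker (an unreadable file, say) would take the
process down, so real code would want a try/catch inside the loop.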

I guess the database approach is the most feasible, as our files are starting
to grow both in number and in size.
