compressing data stored in a database

Tarscher

Hi all,

My application uses a database to store big blobs (0.5 - 1 GB). I have
a List of a class Sample that I store in a blob. Since the blob can
become very big I want to use compression.

The class Sample contains a lot of properties. When I store the List
of samples only a few of the properties are used. Does it make sense
to make a class sampleSimple that only contains the properties I use
for database storage and thus save storage space?

Is it possible to first compress the blob and then store it in the
database? Does .NET provide such functionality?

Many thanks in advance.

Stijn
 
Peter Duniho

Tarscher said:
[...]
The class Sample contains a lot of properties. When I store the List
of samples only a few of the properties are used. Does it make sense
to make a class sampleSimple that only contains the properties I use
for database storage and thus save storage space?

I would say so. Though, my first thought would be to try to understand
better why you are using a class that has many more properties than you
consider essential for saving.

Also, I admit I don't really get the "blob" stuff -- outside my database
use experience -- but if it's anything like the regular serialization
stuff, you should be able to suppress serialization of specific
properties without creating a whole new class (there's an attribute that
does this per-property).
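
A rough sketch of that idea, assuming the list is serialized with XmlSerializer (the thread does not say which serializer is actually used). The Sample class and its property names below are hypothetical stand-ins. With XmlSerializer the per-property attribute is [XmlIgnore]; binary serialization uses [NonSerialized] instead, which applies to fields rather than properties.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in for the poster's Sample class; all names are assumptions.
public class Sample
{
    public string Label { get; set; }
    public int Count { get; set; }
    public double Value { get; set; }

    // XmlSerializer skips this property, so it never reaches the blob.
    [XmlIgnore]
    public double DisplayHint { get; set; }
}

public static class SampleXml
{
    // Serializes the list into a byte array suitable for a blob column.
    public static byte[] Serialize(List<Sample> samples)
    {
        var serializer = new XmlSerializer(typeof(List<Sample>));
        using (var buffer = new MemoryStream())
        {
            serializer.Serialize(buffer, samples);
            return buffer.ToArray();
        }
    }

    // Rebuilds the list; ignored properties come back with default values.
    public static List<Sample> Deserialize(byte[] blob)
    {
        var serializer = new XmlSerializer(typeof(List<Sample>));
        using (var buffer = new MemoryStream(blob))
        {
            return (List<Sample>)serializer.Deserialize(buffer);
        }
    }
}
```

Calling SampleXml.Serialize(list) yields bytes with no DisplayHint element, so no second class is needed just to drop one property.
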

Tarscher said:
Is it possible to first compress the blob and then store it in the
database? Does .NET provide such functionality?

Depends on how you are creating the blob. If the mechanism involves
getting a MemoryStream, byte array or similar binary representation of
the data, then you can probably use the GZipStream class (in
System.IO.Compression) to compress the data before you store it in the
database.
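
As a minimal sketch of what that could look like, assuming the serialized data is already available as a byte array (the class and method names here are made up for illustration):

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical helper; only GZipStream and MemoryStream come from .NET itself.
public static class BlobCompression
{
    // Compresses a byte array before it is written to the database.
    public static byte[] Compress(byte[] raw)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            } // disposing the GZipStream flushes the remaining compressed data

            // ToArray still works after the MemoryStream has been closed.
            return output.ToArray();
        }
    }

    // Restores the original bytes after the blob is read back.
    public static byte[] Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

For blobs approaching 1 GB you would probably wrap the GZipStream around whatever stream feeds the database rather than holding full byte arrays in memory, but the calls are the same.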

Pete
 
Peter K

Peter Duniho said:

Depends on how you are creating the blob. If the mechanism involves
getting a MemoryStream, byte array or similar binary representation of
the data, then you can probably use the GZipStream class to compress the
data before you store it in the database.

Note also that some databases can be configured to automatically compress &
decompress data.
 
Cor Ligthert [MVP]

Tarscher,

If your blob is used for images, then AFAIK you can save yourself the
effort. Most image formats are already so optimized that compressing
them can even add a few extra bytes.

Cor
 
Rad [Visual C# MVP]

Well, first you might want to rethink your class design. If there are
so many properties that are not necessary for persistence, is there any
particular reason why they're there?

Also, as Cor said, some types of data (MP3, image, video and so on) are
already compressed, so you are unlikely to save much with additional
compression. What sort of data are you storing in the blob?
 
Tarscher

Many thanks for all the replies.

There aren't actually that many properties (4 in total), of which I
use 3 when storing to the database. Since I store a large number of
objects, I think every byte I can save in the serialization process
will eventually add up.

My object contains a string, an integer and a double. I create a list
of about 1 million of these objects. This is the data I want to
compress.
 
Peter Duniho

I count three properties. So I'm assuming those are the three that you
use, and there's a fourth you don't use. Assuming that fourth property
isn't also a string, I doubt you're going to realize any significant
gains by excluding that one last property.

It's true that in absolute terms, you might consider 100MB a lot of
savings. But in a 1GB database, that's only 10% and if you can
satisfactorily handle a 900MB database, you can easily handle a 1GB
database as well.

The compression may provide more value, but you should keep in mind
Cor's point about whether the data can easily be compressed. If it's
mostly string data, it's likely to compress very well. If it's
repetitive binary data, it probably will compress well. If it's binary
data that's already been compressed, it won't compress well at all.

If the data is only pre-compressed some of the time, you may still find
value in compressing it yourself but you will definitely want to put in
a check to see whether the data has actually gotten bigger after
compression to avoid making things worse. Only save the compressed data
if it's smaller than the original (and of course you'll probably want to
store some kind of flag to indicate whether the stored data was
compressed or not...the alternative is to always try to decompress the
data and handle exceptions when it's not actually compressed).
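
A sketch of the "only keep it if it's smaller" check, assuming the flag is stored as a one-byte prefix on the blob itself. That layout, the flag values and all names here are hypothetical; a separate column in the table would work just as well.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: stores a 1-byte flag in front of the payload.
public static class FlaggedBlob
{
    private const byte RawFlag = 0;   // payload stored as-is
    private const byte GZipFlag = 1;  // payload is gzip-compressed

    // Compresses the data, but keeps the original if compression did not help.
    public static byte[] Pack(byte[] raw)
    {
        byte[] compressed = GZip(raw);
        bool useCompressed = compressed.Length < raw.Length;
        byte[] payload = useCompressed ? compressed : raw;

        var packed = new byte[payload.Length + 1];
        packed[0] = useCompressed ? GZipFlag : RawFlag;
        Buffer.BlockCopy(payload, 0, packed, 1, payload.Length);
        return packed;
    }

    // Reads the flag and only decompresses when the payload was compressed.
    public static byte[] Unpack(byte[] packed)
    {
        if (packed == null || packed.Length == 0)
            throw new ArgumentException("Blob is empty.", "packed");

        if (packed[0] == RawFlag)
        {
            var raw = new byte[packed.Length - 1];
            Buffer.BlockCopy(packed, 1, raw, 0, raw.Length);
            return raw;
        }
        if (packed[0] != GZipFlag)
            throw new InvalidDataException("Unknown compression flag.");

        using (var input = new MemoryStream(packed, 1, packed.Length - 1))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }

    private static byte[] GZip(byte[] raw)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            return output.ToArray();
        }
    }
}
```

With the explicit flag, reading never has to guess: Unpack does not attempt decompression on data that was stored uncompressed, so no exception handling is needed for that case.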

Pete
 
