Byte[] and File

shapper

Hello,

I am saving some data (txt, jpg, ...) in an XML file in byte[] format,
along with other information.
To avoid having an XML file that is too big, can I create individual
files for that data?

I mean, when I insert some binary data into the XML file, that data,
whatever it is, would be saved in a separate file.
Then when I retrieve that node from the XML file, I would get the byte[]
data from the individual file associated with that node.

I could have a key value (Guid) on each node of the XML file to reference
that individual file, whose filename could also be that Guid value. And
what would this file's extension be?

Does this make sense?
Basically I would keep my XML file small by having the byte[]
data ...

Thanks,
Miguel
 
Harlan Messinger

It seems it should be as simple as saving the binary data to files, and
then storing the names of the files in the XML instead of the data.
Then, when you read the XML data, you take all the file names and read
the data from the files they represent. Is that not the answer you're
looking for?

I'm thinking that maybe you're looking for the XML-saving mechanism to
put the data into files automatically, without you having to come up
with names for them. I don't think there's any built-in way to do that.
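
The approach described above (save the bytes to a file, keep only a reference in the XML) could look roughly like the sketch below. It is only an illustration: the `BlobStore` class, the `item`/`title` element names, the `key` attribute and the `.bin` extension are all assumptions made for this example, not anything prescribed by XML or .NET. Since the original extension is not needed to read the bytes back, any neutral extension works; the original file name could be stored in the XML if it matters.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

static class BlobStore
{
    // Writes the bytes to "<filesDir>/<guid>.bin" and returns an XML node
    // that references the file by its Guid key (hypothetical layout).
    public static XElement SaveItem(string filesDir, string title, byte[] data)
    {
        Directory.CreateDirectory(filesDir);          // no-op if it already exists
        string key = Guid.NewGuid().ToString("N");    // 32 hex digits, no dashes
        File.WriteAllBytes(Path.Combine(filesDir, key + ".bin"), data);

        return new XElement("item",
            new XAttribute("key", key),
            new XElement("title", title));
    }

    // Reads the bytes back using the key stored on the node.
    public static byte[] LoadData(string filesDir, XElement item)
    {
        string key = (string)item.Attribute("key");
        return File.ReadAllBytes(Path.Combine(filesDir, key + ".bin"));
    }
}
```

Listing many nodes then only touches the small XML file, and the binary file is read only when a single node's data is actually requested.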
 
Peter Duniho

shapper said:
Hello,

I am saving some data (txt, jpg, ...) in an XML file in byte[] format,
along with other information.
To avoid having an XML file that is too big, can I create individual
files for that data? [...]

Sure, you can do that. There's nothing built into XML or the .NET
classes supporting XML that would do that automatically. But it's
simple enough for you to implement. You just need a reasonable place to
put the files, a way to generate file names, and of course the code to
create the relationship between the XML and the file.

If nothing else, you'll benefit from the efficiency of not having to
encode binary data as text, and the XML file itself can be much
smaller. On the other hand, it severely complicates the problem of
managing the data, because the XML file is no longer self-contained.

Note that one alternative is to put both the XML and the binary data
into a container file type, so that if someone wants to copy the data,
there's just one file to copy.

One strategy for doing that is to use the .zip format as the container.
This is in fact what Microsoft Office 2007 does. If you want
people to be able to access the XML and binary data independently of
your own program, using a standard format like .zip, .tar, .rar, etc.
would be important. Otherwise, you can just make up your own if you like.
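
As a rough illustration of the .zip idea, here is a minimal sketch that assumes a newer framework than the one under discussion: `ZipFile` and `ZipArchive` in `System.IO.Compression` only exist in .NET 4.5 and later (on .NET Framework they also need a reference to System.IO.Compression.FileSystem). The `ZipContainer` class and the entry names used with it are made up for the example.

```csharp
using System.IO;
using System.IO.Compression;

static class ZipContainer
{
    // Adds an entry to the container, replacing any entry with the same name.
    // ZipArchiveMode.Update creates the .zip file if it does not exist yet.
    public static void Put(string zipPath, string entryName, byte[] data)
    {
        using (ZipArchive zip = ZipFile.Open(zipPath, ZipArchiveMode.Update))
        {
            ZipArchiveEntry old = zip.GetEntry(entryName);
            if (old != null) old.Delete();

            ZipArchiveEntry entry = zip.CreateEntry(entryName);
            using (Stream s = entry.Open())
                s.Write(data, 0, data.Length);
        }
    }

    // Reads a single entry; returns null if the entry is not present.
    public static byte[] Get(string zipPath, string entryName)
    {
        using (ZipArchive zip = ZipFile.OpenRead(zipPath))
        {
            ZipArchiveEntry entry = zip.GetEntry(entryName);
            if (entry == null) return null;

            using (Stream s = entry.Open())
            using (var buffer = new MemoryStream())
            {
                s.CopyTo(buffer);
                return buffer.ToArray();
            }
        }
    }
}
```

The XML itself would simply be one more entry (say, "data.xml") next to the binary entries. Reading one entry does not require decompressing the others, but every `Put` rewrites the archive, which is the maintenance overhead to weigh.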

Pete
 
shapper

But if the XML file is being continually updated and files are
continually added, won't it be too heavy on the application to
continuously zip it?
In this case I don't have the need to share with others ...

When you say create my own wrapper file, what do you mean?
How can this be done in C#?
And again, should it be done when the files are continually changed?

Thanks,
Miguel
 
Peter Duniho

shapper said:
But if the XML file is being continually updated and files are
continually added, won't it be too heavy on the application to
continuously zip it?

If the XML file is being continually updated and files continue to be
added, won't that be "too heavy" on the application to continuously have
to rewrite the XML and associated binary files?

The fact is, there is overhead in maintaining data. A composite file,
such as a .zip or even an uncompressed format like .tar, will involve
overhead in rewriting the file when changes are made. The best way to
avoid that is to avoid making changes. In any case, if you do have
compression involved, that won't be the main cost. The main cost will
be the i/o for the file.

Of course, if the binary files are themselves not changing, then you can
perhaps gain some efficiency by leaving them separate and only rewriting
the XML. But a) it's not clear from your question that's in fact the
case, and b) there is still value in not spreading your data all around,
even if it is the case.

shapper said:
In this case I don't have the need to share with others ...

Any activity that involves moving or copying this data is relevant,
whether you are the only person involved or not.

shapper said:
When you say create my own wrapper file, what do you mean?

My suggestion that you _can_ create your own container file format is
simply that: come up with your own binary format for the file, and
implement that. There's nothing magic about any of the other file
formats; they are simply a stream of bytes arranged in a particular way.

So, decide what arrangement of your bytes is most useful and simplest to
implement for you, and arrange the bytes that way.
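
To make "arrange the bytes" concrete, here is one possible arrangement, sketched with `BinaryWriter` and `BinaryReader`. The layout (an entry count, then for each entry a length-prefixed name, a byte count and the raw bytes) and the `SimpleContainer` class are purely hypothetical; any layout that can be read back unambiguously would do.

```csharp
using System.Collections.Generic;
using System.IO;

static class SimpleContainer
{
    // Hypothetical layout: int32 entry count, then per entry:
    // length-prefixed string name, int32 byte count, raw bytes.
    public static void Write(string path, IDictionary<string, byte[]> entries)
    {
        using (var w = new BinaryWriter(File.Create(path)))
        {
            w.Write(entries.Count);
            foreach (KeyValuePair<string, byte[]> e in entries)
            {
                w.Write(e.Key);            // BinaryWriter prefixes the string with its length
                w.Write(e.Value.Length);
                w.Write(e.Value);
            }
        }
    }

    // Reads every entry back into memory, in the same order it was written.
    public static Dictionary<string, byte[]> Read(string path)
    {
        var result = new Dictionary<string, byte[]>();
        using (var r = new BinaryReader(File.OpenRead(path)))
        {
            int count = r.ReadInt32();
            for (int i = 0; i < count; i++)
            {
                string name = r.ReadString();
                int length = r.ReadInt32();
                result[name] = r.ReadBytes(length);
            }
        }
        return result;
    }
}
```

The XML can be stored as just another entry, for example its UTF-8 bytes under the name "data.xml". This sketch rewrites and rereads the whole file each time; a real format would probably add an index so that a single entry can be read without loading the rest.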

One argument in favor of some of the more standard formats is that there
may already be libraries that exist for handling those formats. If so,
you may find it better to use them than to create your own, just because
there's less work involved.

shapper said:
How can this be done in C#?

The same as for any file format. Write the bytes to the file in the
right order.

shapper said:
And again, should it be done when the files are continually changed?

It just depends. But generally, I'd say the answer is "yes". If the
data need to be kept together, the simplest way to enforce that is to
make sure they are all in the same file in the first place. Simplicity,
maintainability, and ease-of-use trump other considerations most of the
time.

If the overhead of maintaining a combined file format becomes too costly
for your application, then of course you may find that answer
insufficient. It depends a lot on the size of the files involved, how
often the binary data itself changes, etc.

Note that as a compromise approach, you may find it beneficial to use a
subdirectory in the file system itself as your "folder". This is harder
to enforce, but it's much simpler to implement, and would be more
efficient if you are able to avoid rewriting the binary file parts of
your data when changes not involving them are made.

Pete
 
shapper

Let me try to explain it better:

The XML file is not continually updated.
And the individual files are not continually created/updated/deleted.

The only continuous actions are:
1. Get N nodes of the XML file (just the text, nothing else).
2. Get a single node of the XML file and, in this case, get the file
associated with it.

So I have the following folder structure:

+ Data Folder
  + Files Folder
  - XMLFile1.xml
  - XMLFile2.xml
  - ...

So the XML files hold text data and a reference to a file in Files
Folder.
When an XML file node is updated, deleted or created, an action is taken
on a file in Files Folder.
This happens only occasionally ... 10 times a day.

And as I wrote, when I get ONE node then yes, I will get the file. This
is the action taken most often.

About the zip: would reading a file from the zip be fast or
not?
What I have now is enough for me ... but that zip part seems
interesting, even for other situations.

Thank You,
Miguel
 
Tim Roberts

shapper said:
But if the XML file is being continually updated and files are
continually added, won't it be too heavy on the application to
continuously zip it?

If you are continuously updating and adding things, then XML is a poor
choice for a storage medium. You should use a database. There are a
number of lightweight database libraries you could use, like SQLite.

shapper said:
In this case I don't have the need to share with others ...

Then why use XML at all? XML is an exchange format. It's not a database
format.
 
