What file format to use?

Andy B.

I have a program that manages "notebooks". Imagine these notebooks to be
like the 3-ring binders you would use for school or work to keep your
paperwork in. The UI has a combobox to list existing notebooks, a listview
to list each section and page of each notebook, and a pane for the page's
content and attachments like images or mp3 files. It is similar to OneNote
2007 or 2010. OneNote Mobile on my WM6.5 smartphone would be OK, but it
is severely limited, so I need to create my own notebook manager. The
program has the following requirements for notebook storage:

1. Each notebook should be a file on the filesystem.
2. Each notebook should be able to store binary data like Word files, zip
files, mp3 files, images and so on as page/section attachments.
3. I can't use SQL or SQL Mobile.
4. Notebook file creation should be doable from C# code, i.e.
File>New>Notebook.
5. The notebook files can't be read by any other program.

What file format should I use for this type of project? How would I get
started with it? Hopefully this program will have a desktop version too.
 
Family Tree Mike

Andy B. said:
[...] The program has the following requirements for notebook storage:

1. Each notebook should be a file on the filesystem
2. Each notebook should be able to store binary data like Word files, zip
files, mp3 files, images and so on as page/section attachments.
3. I can't use sql or sql mobile.
4. The notebook file creation should be able to be done with C# code. I.E.
File>New>Notebook
5. The notebook files can't be read by any other program.

What file format should I use for this type of project? How would I get
started with it? Hopefully this program will also be a desktop model too.

Items 1 and 2 seem to me to indicate the contents of the notebook should
be file links and not the actual data making up the zip file, as an
example. Is that your intent? This allows the file to still be opened
by your user, or by your notebook program, so long as care is taken for
broken links. This is more like an IE "Favorites" system than a
notebook though.

For item 5, if you want to prevent snooping, use encrypted data of some
form. Encrypted strings within binary serialization of the container
object would prevent all but the most dedicated snoopers from trying.
Just store these serialized objects to the file system if you want your
users to click the files to open your app.
 
Peter Duniho

Andy said:
[...] The
program has the following requirements for notebook storage:

1. Each notebook should be a file on the filesystem

Fine. Do that. :)

2. Each notebook should be able to store binary data like Word files, zip
files, mp3 files, images and so on as page/section attachments.

I read this to mean you want a single file for each notebook, where the
file contains all of the data for any attachments. That is also fine.
Personally, I'd use XML because it's easy. But that would mean you'll
have to encode binary data as text (e.g. as Base64).

For a mobile app, you'll probably want to gzip the output, which will
compress the data and offer a modicum of obfuscation (not security…just
obfuscation; the file won't be immediately recognizable to a human, as
it will be apparently randomized binary data).

3. I can't use sql or sql mobile.

Fine. Don't do that. :)

4. The notebook file creation should be able to be done with C# code. I.E.
File>New>Notebook

I don't understand how those two sentences go together. You can read
and write any kind of data using C#. You can also create UI menus using
C#. The two don't have anything to do with each other though.

5. The notebook files can't be read by any other program.

You cannot accomplish this if the data is stored anywhere other than on
a computer you control the physical access for. As soon as you deploy a
program that has the ability to read and write your file format, you
have handed over to everyone else all the information they need in order
to figure out your file format.

Even encryption cannot prevent access, because your program necessarily
will have the encryption key required.

The one exception to this is if the user provides the encryption key for
you (e.g. as an actual key, or a passphrase used to generate a key,
etc.). Note, however, that doing it this way means there's a
possibility the user will forget their key and lose all access to their
data. You will have to weigh that cost with whatever benefit you hope
to gain by securing the data.

What file format should I use for this type of project? How would I get
started with it? Hopefully this program will also be a desktop model too.

You can use any file format you like. I prefer XML for its simplicity.
But you can also use .NET's built-in serialization (SOAP, binary, or plain
XML). Or you can invent your own format…perhaps something like TIFF or
RIFF to support the attachments (those formats are basically binary, but
include structure to demarcate specific chunks of data; you can have a
master chunk of data, that then refers to attachment "child chunks" of
data).

Pete
 
Andy B.

Family Tree Mike said:
Items 1 and 2 seem to me to indicate the contents of the notebook should
be file links and not the actual data making up the zip file, as an
example. Is that your intent? This allows the file to still be opened by
your user, or by your notebook program, so long as care is taken for
broken links. This is more like an IE "Favorites" system than a notebook
though.

The attachments have to be embedded into the notebook file.

For item 5, if you want to prevent snooping, use encrypted data of some
form. Encrypted strings within binary serialization of the container
object would prevent all but the most dedicated snoopers from trying. Just
store these serialized objects to the file system if you want your users
to click the files to open your app.

I was thinking of that. Now, what file format should be used for the
notebook containers? XML, binary, a database of some kind (I'd rather stay
away from databases)?
 
Andy O'Neill

Andy B. said:
I have a program that manages "notebooks". [...]
1. Each notebook should be a file on the filesystem
2. Each notebook should be able to store binary data like Word files, zip
files, mp3 files, images and so on as page/section attachments.

Maybe a customised compression algorithm.
Plus chained lists of data, like those often used in computer games.
Then you have to secure your algorithm.
 
Andy B.

Peter Duniho said:
Andy said:
[...] The program has the following requirements for notebook storage:

2. Each notebook should be able to store binary data like Word files, zip
files, mp3 files, images and so on as page/section attachments.

I read this to mean you want a single file for each notebook, where the
file contains all of the data for any attachments. That is also fine.
Personally, I'd use XML because it's easy. But that would mean you'll
have to encode binary data as text (e.g. as Base64).

Do you have any MSDN articles on how to put binary data into XML files? You
are starting to make my life interesting now, because I might be able to
avoid databases in a web app I am working on...grin!

For a mobile app, you'll probably want to gzip the output, which will
compress the data and offer a modicum of obfuscation (not security…just
obfuscation; the file won't be immediately recognizable to a human, as it
will be apparently randomized binary data).

OK. Same as above. Any MSDN articles on how to do this gzip compression?

Fine. Don't do that. :)

I am finding SQL and databases to be quite large and very time consuming to
create and manage. And hey, if XML is portable...

You cannot accomplish this if the data is stored anywhere other than on a
computer you control the physical access for. As soon as you deploy a
program that has the ability to read and write your file format, you have
handed over to everyone else all the information they need in order to
figure out your file format.

Even encryption cannot prevent access, because your program necessarily
will have the encryption key required.

The one exception to this is if the user provides the encryption key for
you (e.g. as an actual key, or a passphrase used to generate a key, etc.).
Note, however, that doing it this way means there's a possibility the user
will forget their key and lose all access to their data. You will have to
weigh that cost with whatever benefit you hope to gain by securing the
data.

I have some apps that are like that. Lose your password and all your data is
gone for good (not ones I made though). This sounds fine to me since it will
be a choice for each new notebook. Any MSDN articles on how to store keys
created from user passwords? If possible, I would probably encode them into
the XML file unless that isn't a good idea.

You can use any file format you like. I prefer XML for its simplicity.
But, you can use .NET's built-in serialization, SOAP, binary, plain XML.
Or you can invent your own format…perhaps something like TIFF or RIFF to
support the attachments (those formats are basically binary, but include
structure to demarcate specific chunks of data; you can have a master
chunk of data, that then refers to attachment "child chunks" of data).

How do you invent a file type?
Is there a comparison table on these formats and their pros/cons?
 
Peter Duniho

Andy said:
Do you have any msdn articles on how to put binary into xml files? You are
starting to make my life interesting now, because I might be able to avoid
databases now with a web app I am working on...grin!

System.Convert class. FromBase64String() and ToBase64String() methods.
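A minimal sketch of how those two methods could carry an attachment inside an XML element. The class name, the "attachment" element and the "name" attribute are made up for illustration, and System.Xml.Linq is only one of several XML APIs that would work here:

```csharp
using System;
using System.Xml.Linq;

public static class AttachmentXml
{
    // Wraps raw attachment bytes in a (hypothetical) <attachment> element,
    // with the payload stored as Base64 text.
    public static XElement ToElement(string fileName, byte[] data)
    {
        return new XElement("attachment",
            new XAttribute("name", fileName),
            Convert.ToBase64String(data));
    }

    // Recovers the original bytes from an element written by ToElement.
    public static byte[] FromElement(XElement element)
    {
        return Convert.FromBase64String(element.Value);
    }
}
```

Keep in mind that Base64 grows the data by roughly a third, which is one reason compressing the finished file can be worthwhile.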
Ok. Same as above. Any msdn articles on how to do this gzip compression?
System.IO.Compression.GZipStream
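A rough sketch of compressing and decompressing a buffer with GZipStream. The NotebookGzip class and its method names are hypothetical; in a real program you would more likely wrap the file stream directly rather than go through byte arrays:

```csharp
using System.IO;
using System.IO.Compression;

public static class NotebookGzip
{
    // Compresses a byte array into gzip format.
    public static byte[] Compress(byte[] raw)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            } // disposing the GZipStream flushes the remaining compressed data
            return output.ToArray(); // ToArray still works on a closed MemoryStream
        }
    }

    // Decompresses data produced by Compress.
    public static byte[] Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            var buffer = new byte[4096];
            int read;
            while ((read = gzip.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, read);
            }
            return output.ToArray();
        }
    }
}
```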
[...]
The one exception to this is if the user provides the encryption key for
you (e.g. as an actual key, or a passphrase used to generate a key, etc.).
Note, however, that doing it this way means there's a possibility the user
will forget their key and lose all access to their data. You will have to
weigh that cost with whatever benefit you hope to gain by securing the
data.

I have some apps that are like that. Lose your password and all your data is
gone for good (not ones I made though). This sounds fine to me since it will
be a choice for each new notebook. Any msdn articles on how to store keys
created from user passwords? If possible, I would probably encode them into
the xml file unless that isn't a good idea.

I know less about the encryption stuff. But there is a
System.Security.Cryptography namespace with lots of promising classes.

You'd have to _really_ need a high degree of security to go to that kind
of trouble though. Note that depending on who you're trying to defend
against, you may or may not need the crypto stuff. There are two
avenues of attack: hacking your program to find the key, and deciphering
the crypto itself.

If you only need to protect against the former, then some simple
encryption using a user-provided key would be fine. If you only need to
protect against the latter, then the strong crypto classes in .NET using
a built-in key would be fine. It's only if you have a strong need to
protect against both kinds of attacks that you'll need to figure out how
to allow users to provide keys for the strong crypto classes in .NET.
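For the user-provided key case, here is a sketch of one common pattern, not a vetted design. It assumes a desktop .NET runtime recent enough to have the Rfc2898DeriveBytes constructor that takes a HashAlgorithmName; the class name, the blob layout (salt, then IV, then ciphertext) and the iteration count are all my own choices. The point is that the key itself is never stored in the file: only a random salt is, and the key is re-derived from the passphrase each time.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class NotebookCrypto
{
    const int SaltSize = 16;       // bytes of random salt stored in the file
    const int IvSize = 16;         // AES block size in bytes
    const int Iterations = 100000; // PBKDF2 work factor (an assumed value)

    // Output layout (hypothetical): salt | IV | AES-CBC ciphertext.
    public static byte[] Encrypt(byte[] plain, string passphrase)
    {
        var salt = new byte[SaltSize];
        using (var rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(salt);
        }

        using (var kdf = new Rfc2898DeriveBytes(passphrase, salt, Iterations, HashAlgorithmName.SHA256))
        using (var aes = Aes.Create())
        {
            aes.Key = kdf.GetBytes(32); // 256-bit key derived from the passphrase
            aes.GenerateIV();
            using (var output = new MemoryStream())
            {
                output.Write(salt, 0, salt.Length);
                output.Write(aes.IV, 0, aes.IV.Length);
                using (var encryptor = aes.CreateEncryptor())
                using (var crypto = new CryptoStream(output, encryptor, CryptoStreamMode.Write))
                {
                    crypto.Write(plain, 0, plain.Length);
                } // disposing the CryptoStream writes the final padded block
                return output.ToArray();
            }
        }
    }

    public static byte[] Decrypt(byte[] blob, string passphrase)
    {
        var salt = new byte[SaltSize];
        var iv = new byte[IvSize];
        Buffer.BlockCopy(blob, 0, salt, 0, SaltSize);
        Buffer.BlockCopy(blob, SaltSize, iv, 0, IvSize);

        using (var kdf = new Rfc2898DeriveBytes(passphrase, salt, Iterations, HashAlgorithmName.SHA256))
        using (var aes = Aes.Create())
        {
            aes.Key = kdf.GetBytes(32);
            aes.IV = iv;
            using (var decryptor = aes.CreateDecryptor())
            {
                int offset = SaltSize + IvSize;
                return decryptor.TransformFinalBlock(blob, offset, blob.Length - offset);
            }
        }
    }
}
```

This sketch does not authenticate the data, so a wrong passphrase will usually (but not always) surface as a CryptographicException, and tampering would go undetected. If the passphrase is lost, the data is gone, as discussed above.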
How do you invent a file type?

A file format is simply a sequence of bytes representing your data.
Inventing one is as simple as deciding what bytes you need to store, and
deciding in what order you want to store them. Optionally providing for
features such as allowing for variable-length data, missing data, and
random-access within the file (these are features that the "tagged" file
formats like TIFF and RIFF are particularly good at).
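To make the "tagged" idea concrete, here is a small sketch of a chunked layout loosely in the spirit of RIFF (it is not the RIFF format itself). Each chunk is a 4-character ASCII tag, a 32-bit length, then the payload. The ChunkFile class and the tags used with it are invented for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class ChunkFile
{
    // Writes one chunk: 4-character ASCII tag, 32-bit payload length, payload bytes.
    public static void WriteChunk(BinaryWriter writer, string tag, byte[] payload)
    {
        if (tag.Length != 4) throw new ArgumentException("tag must be 4 characters", "tag");
        writer.Write(Encoding.ASCII.GetBytes(tag));
        writer.Write(payload.Length); // BinaryWriter writes integers little-endian
        writer.Write(payload);
    }

    // Reads every chunk from the current position to the end of a seekable stream.
    public static List<KeyValuePair<string, byte[]>> ReadChunks(Stream stream)
    {
        var chunks = new List<KeyValuePair<string, byte[]>>();
        var reader = new BinaryReader(stream);
        while (stream.Position < stream.Length)
        {
            string tag = Encoding.ASCII.GetString(reader.ReadBytes(4));
            int length = reader.ReadInt32();
            chunks.Add(new KeyValuePair<string, byte[]>(tag, reader.ReadBytes(length)));
        }
        return chunks;
    }
}
```

Because every chunk carries its own length, a reader can seek past chunks it does not understand, which is what makes this style friendly to variable-length data and later additions to the format.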
Is there a comparison table on these formats and their pros/cons?

I doubt it. I haven't seen one. If it does exist, Google probably
knows about it.

Pete
 
Peter Webb

Andy B. said:
[...] The program has the following requirements for notebook storage:

1. Each notebook should be a file on the filesystem
2. Each notebook should be able to store binary data like Word files, zip
files, mp3 files, images and so on as page/section attachments.
3. I can't use sql or sql mobile.
4. The notebook file creation should be able to be done with C# code. I.E.
File>New>Notebook
5. The notebook files can't be read by any other program.

What file format should I use for this type of project? How would I get
started with it? Hopefully this program will also be a desktop model too.

Your requirement (1) makes this harder than it could be.

If each notebook was a file with a subdirectory for holding attachments, it
would be much easier. MS use this model themselves on occasion, for example
when converting a Word document to a web page.

You could use the .Net isolated storage class, and if you used this then the
data structures would only be accessible to your app and nothing except your
program would even know whether you were using files or folders.

I agree that the actual notebook page could be stored as xml, with the
embedded data held as file names within the folder associated with that
page.
 
Andy B.

Peter Duniho said:
System.Convert class. FromBase64String() and ToBase64String() methods.

Got it!

System.IO.Compression.GZipStream

Got it!

[...]
The one exception to this is if the user provides the encryption key for
you (e.g. as an actual key, or a passphrase used to generate a key,
etc.). Note, however, that doing it this way means there's a possibility
the user will forget their key and lose all access to their data. You
will have to weigh that cost with whatever benefit you hope to gain by
securing the data.

I have some apps that are like that. Lose your password and all your data
is gone for good (not ones I made though). This sounds fine to me since
it will be a choice for each new notebook. Any msdn articles on how to
store keys created from user passwords? If possible, I would probably
encode them into the xml file unless that isn't a good idea.

I know less about the encryption stuff. But there is a
System.Security.Cryptography namespace with lots of promising classes.

Also, there is the Enterprise Library 4.1 that has crypto stuff in it as
well.

You'd have to _really_ need a high degree of security to go to that kind
of trouble though. Note that depending on who you're trying to defend
against, you may or may not need the crypto stuff. There are two avenues
of attack: hacking your program to find the key, and deciphering the
crypto itself.

It would probably be keeping the crypto from being figured out. I should
probably figure out how to protect the key as well because I don't know if
my app will ever make it to someone other than myself. Better safe than
sorry...
 
Andy O'Neill

Peter Webb said:
You could use the .Net isolated storage class, and if you used this then
the data structures would only be accessible to your app and nothing
except your program would even know whether you were using files or
folders.

Isolated storage is just an oddly named folder.
Could work if the security isn't critical.
 
