Custom File Format & Serialization

Phil Price

Hi there,

I'm developing a shape recognition application for the Tablet PC, both for
fun and as a university project. I'm currently working on the learning
stage, which uses neural networks, and I have to store a load of learning
data (each shape is a 25 by 25 matrix). Each shape group has a number of
user-drawn shapes, and the application then creates variations of these
shapes (by moving nodes and drawing lines into the matrix between nodes,
after normalization). So, as you can imagine, there is a lot of data
floating around in the program. I've used XML serialization to save the
data to disk, and at the moment it weighs in at 7.3 MB for 3 shape groups,
with 4 user-drawn shapes each and 48 generated variations per user-drawn
shape.

I'm wondering if there is a better way to save this information, as I
plan to have many shape groups and as you can imagine the file size of
the learning data will go through the roof. I also want to package some
other data with the learning information (for example a graphic preview
of what the shape group actually is, and maybe stroke information),
without the file size being too huge. So any ideas here?

Personally, I have thought about compression, but this means expanding
the data into memory, which is never a good thing. Also, the only
compression library of any note that I can find on Google is SharpZipLib,
which is GPL, and I'm not a fan of the GPL.

Thanks in advance
 
Victor Urnyshev [MSFT]

Hello Phil,

I think the first thing you can do is start using the BinaryFormatter class
from the System.Runtime.Serialization.Formatters.Binary namespace. It should
produce more compact files. If the file size is still too large, you can do
the following:
- Write your own formatter that incorporates compression/decompression
algorithms.
- Implement the ISerializable interface in your classes to override the
standard way of serializing/deserializing objects. You can probably come up
with a more efficient encoding for your classes.
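
To make the "more efficient encoding" idea concrete, here is a minimal sketch rather than a definitive implementation. It assumes each cell of the 25 by 25 matrix is simply on or off, which the original post does not confirm, and it is written in Python only to keep it short. The names pack_matrix, unpack_matrix, write_matrices and read_matrices, and the file layout itself, are made up for illustration; the same byte layout could be produced from .NET with a BinaryWriter.

```python
import struct

SIZE = 25                          # matrix is SIZE x SIZE, as in the post
CELLS = SIZE * SIZE                # 625 cells per matrix
PACKED_BYTES = (CELLS + 7) // 8    # 79 bytes if every cell is a single bit


def pack_matrix(matrix):
    """Pack a SIZE x SIZE matrix of 0/1 cells into PACKED_BYTES bytes.

    ASSUMPTION: cells are binary; greyscale or float cells would need
    a wider encoding than one bit per cell.
    """
    bits = 0
    for row in matrix:
        for cell in row:
            bits = (bits << 1) | (1 if cell else 0)
    # Left-align so the first cell becomes the top bit of the first byte.
    bits <<= PACKED_BYTES * 8 - CELLS
    return bits.to_bytes(PACKED_BYTES, "big")


def unpack_matrix(data):
    """Inverse of pack_matrix: rebuild the list-of-rows matrix."""
    bits = int.from_bytes(data, "big") >> (PACKED_BYTES * 8 - CELLS)
    flat = [(bits >> (CELLS - 1 - i)) & 1 for i in range(CELLS)]
    return [flat[r * SIZE:(r + 1) * SIZE] for r in range(SIZE)]


def write_matrices(stream, matrices):
    """Hypothetical layout: 4-byte little-endian count, then fixed-size records."""
    stream.write(struct.pack("<I", len(matrices)))
    for matrix in matrices:
        stream.write(pack_matrix(matrix))


def read_matrices(stream):
    """Read the records back one at a time; only one record is held per read."""
    (count,) = struct.unpack("<I", stream.read(4))
    return [unpack_matrix(stream.read(PACKED_BYTES)) for _ in range(count)]
```

Under the one-bit-per-cell assumption, the 576 variations mentioned in the question (3 x 4 x 48) would take 576 x 79 = 45,504 bytes plus the 4-byte count. Because every record has the same size, a reader can also seek straight to a given matrix without loading the whole file.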

I hope this helps.

--
Victor Urnyshev [MSFT]
This post is "AS IS" with no warranties, and confers no rights.
--------------------
|Date: Tue, 25 May 2004 12:44:33 +0100
|From: Phil Price <[email protected]>
|Newsgroups: microsoft.public.dotnet.general
|Subject: Custom File Format & Serialization
|
|--
|Phil Price
|CS BSc, University Of Hull
|Microsoft Student Partner 2004
|w: www.philprice.net
|
 
