About XML serialization scalability and persistence

  • Thread starter: "Andrés G. Aragoneses [ knocte ]"
  • Start date

"Andrés G. Aragoneses [ knocte ]"

I am developing a Windows Service that is resident on the machine. The
program needs to keep a certain object list in memory (an object typed as
List<Foo>) synchronized with disk, serializing and deserializing XML.

The simplest technique, which I am using at the moment, is the following:

- At program start, if the XML file exists, deserialize it and create an
object with its contents.
- Every time the list changes (an element is added or removed), I serialize
the entire object again and write a new file, overwriting the old one (as
sketched below).
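
In code, the approach I have at the moment looks roughly like this (Foo and
the file path are just placeholders for the real type and path):

using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public class Foo
{
    public string Name { get; set; }
}

public static class ListStore
{
    private static readonly XmlSerializer Serializer =
        new XmlSerializer(typeof(List<Foo>));

    // At program start: deserialize the file if it exists.
    public static List<Foo> Load(string path)
    {
        if (!File.Exists(path))
            return new List<Foo>();

        using (var stream = File.OpenRead(path))
            return (List<Foo>)Serializer.Deserialize(stream);
    }

    // On every change: serialize the whole list, overwriting the old file.
    public static void Save(string path, List<Foo> list)
    {
        using (var stream = File.Create(path))
            Serializer.Serialize(stream, list);
    }
}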

The problem is that I am a little bit concerned about taking this to
production, for the following reasons:

- Scalability: What would happen if the list grows to, for example, 1,000 or
10,000 elements? (I suppose that serialization would take much longer and
the program would slow down noticeably whenever the list is modified.)

- Data loss: What would happen if the computer loses power at the moment the
serializer is writing data to disk? (I suppose that I would lose the last
list written and end up with a corrupt file, am I right?)

Do you have any ideas to improve this behaviour?

Thanks in advance.


Andrés [ knocte ]

--
 

Robert May

Andres,

For performance reasons, the biggest question you need to ask is how
frequently this list will change, and therefore how often you'll be writing
it to disk. If it's only occasionally, then this will work; otherwise, I
might look for a different solution.

To ensure that this file is always written out, you might look at Message
Queuing (MSMQ). It can ensure that a message is processed, and can therefore
ensure that the data is written to disk.

I'd do this in the following manner:

- When a synchronization request is made, write a message to the queue with
the new value, then update the list with the new value.
- On a separate thread, have a queue monitor that looks for new messages in
the queue.
- When it sees a new message, it peeks the queue (or reads the queue in a
transaction) and starts writing the file to a temporary location.
- Upon completion of the write, the new temp file is copied over the top of
the old file and the temp file is deleted.
- The process then consumes the message from the queue (or commits the
transaction) and looks for the next message to process.
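
As a rough sketch (the queue path, the way you obtain the current list, and
the Foo type from the sketch above are placeholders, and it assumes a
transactional private queue), the monitor could look something like this:

using System;
using System.Collections.Generic;
using System.IO;
using System.Messaging;
using System.Xml.Serialization;

public class QueueMonitor
{
    private static readonly XmlSerializer Serializer =
        new XmlSerializer(typeof(List<Foo>));

    // Runs on its own thread; 'currentList' returns the list to persist.
    public void Run(MessageQueue queue, Func<List<Foo>> currentList, string path)
    {
        while (true)
        {
            using (var tx = new MessageQueueTransaction())
            {
                tx.Begin();

                // Blocks until a change notification arrives; the message
                // is not removed for good until the transaction commits.
                queue.Receive(tx);

                // Write the whole list to a temporary file first.
                string tempPath = path + ".tmp";
                using (var stream = File.Create(tempPath))
                    Serializer.Serialize(stream, currentList());

                // Swap the new file in place of the old one.
                if (File.Exists(path))
                    File.Replace(tempPath, path, null);
                else
                    File.Move(tempPath, path);

                // Only now is the message consumed from the queue.
                tx.Commit();
            }
        }
    }
}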

If the power fails while the write is in progress, the worst case is that
you end up with a partial temp file, which wastes disk space. Because the
message hasn't been consumed from the queue, the thread that monitors the
queue will see it after the failure and immediately write out the file
again.

Additionally, if you have multiple threads changing the list, this ensures
that only one change is processed at a time (you could also consume all of
the outstanding messages in a single file update). You could also batch
updates to the file: once at startup, and then once every five minutes or
so. This would limit the impact on the overall performance of the
application.
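
A batched version could look roughly like this (the five-minute interval is
arbitrary, and Save is just the helper from the first sketch above):

using System;
using System.Collections.Generic;
using System.Threading;

public class BatchedWriter
{
    private readonly object _sync = new object();
    private bool _dirty;
    private Timer _timer;

    public void Start(string path, List<Foo> list)
    {
        ListStore.Save(path, list);                    // once at startup
        _timer = new Timer(_ => Flush(path, list), null,
                           TimeSpan.FromMinutes(5),    // first flush
                           TimeSpan.FromMinutes(5));   // then every five minutes
    }

    // Call this whenever the list is modified.
    public void MarkChanged()
    {
        lock (_sync) _dirty = true;
    }

    private void Flush(string path, List<Foo> list)
    {
        lock (_sync)
        {
            if (!_dirty) return;
            // In real code the list itself would also need locking against
            // the threads that modify it while it is being serialized.
            ListStore.Save(path, list);
            _dirty = false;
        }
    }
}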

Make sense?

Robert


"Andrés G. Aragoneses [ knocte ]" said:
I am developing a Windows Service that is resident on the machine. The
program needs to synchronize certain object list in memory (an object typed
as List<Foo>) with disc, serializing and deserializing XML.

The first simplest technique I have used ATM is the following:

- At program start, if the XML file exists, deserialize it and create an
object with its content.
- Every time the list changes (an element is added or remove), I serialize
the entire object again and I write a new file (overwriting the old one).

The problem is that I am a little bit concerned about this going to
production, because of the following matters:

- Scalability: What would happen if the list begins to grow to, for
example, 1000 or 10000 elements? (I suppose that the data serialization
would take much longer and the program would go much slower when the list
is modified).

- Data loss: what would happen if the computer looses energy in the moment
when the serializer is writing data to disc? (I suppose that I would loose
the last list written and I would get a new corrupt file, am I right?).

Do you have any ideas to improve this behaviour?

Thanks in advance.


Andrés [ knocte ]

--
 
?

"Andrés G. Aragoneses [ knocte ]"



Thanks for your comment.
I had already thought about the temp-file solution, but wasn't sure it was
the best approach. Perhaps I was hoping for a more elegant one that involved
some transactional way of accessing the disk (not deleting the original file
until the new one has been completely written). I am not going to use MSMQ
because it's not portable, but the idea is interesting.

Well, the application won't update the list very frequently, but I am still
concerned about rewriting the whole object on every modification (even if I
create a thread to write the data at a longer interval). Isn't there a way
to write and remove the data incrementally?

Regards,

Andrés [ knocte ]

--
 
