Document Management


Jeff Mason

I hope I can get some advice on a design/technology question here.

The app is a medical claims case management system. It is written in VB.NET and it
works great. It is a Winforms app primarily; there is a web component, but that is
not relevant to this discussion.

A new requirement has recently surfaced which would require the management of a
large repository of document files. This repository contains several hundred
thousand files of various types and sizes. The files arrive by a variety of
means, such as scanned documents, fax images, file uploads, etc. Two file types
are TIF image files and PDFs, both of which can reach 200MB or more in size.
Most files are considerably smaller, though.

Desktop users have the need to view, print, edit (e.g. Word docs) or otherwise
access one or more files from this repository. They will also add files to it.
For security and audit reasons, access to these files must be tightly
controlled. All operations are logged. For certain files, editing of their
contents is allowed. A user must "check out" the file for editing and then
check it back in when they are done. The modified file will be added to the
repository as a new version. While a file is checked out, no other user may
check out or otherwise access the file, though they may view prior versions.

Thus, there is a need to be able to efficiently transfer files both to and from a
user's desktop. These transfers would be mediated, presumably, by some kind of file
server/service which would authenticate the user, validate the operation being
performed, create any log data, and transfer the file to/from the appropriate
directory on the server. These requirements seem to suggest that business objects
running on the client will cooperate with objects on a server somewhere to record the
information as appropriate as well as effect the transfer of the files themselves.

We discarded the idea of making the files available via direct access to a network
share, since that would violate the security requirements - we can't have users
messing around, outside of the app, in the repository directory tree. Though I think
that would be by far the simplest (and fastest?) approach, we cannot allow direct
access to the files; all access must be monitored and controlled. Indeed the users
aren't really aware that there are files at all - they deal with cases and the case's
supporting documents. They don't know or care what the filenames are.

We have toyed with the idea of using a Web Service for this. The idea is that web
service methods could be called with appropriate arguments for authentication as well
as the operation being performed. For the file involved, a byte array by reference
could be used as a argument to the service call. The byte array would "be" the file,
and it would then be written as a temporary file on the user's local machine, or in
the case of an upload by a user to the server, written to the appropriate server
directory.

We have developed some proof of concept code and it seems quite straightforward.

But the problem with this approach is, I think, the large files. While there
aren't many of them, there are enough of them to force us to deal with them.
Using Web Services means the byte array is serialized into an XML stream as
Base64, increasing the size by a third or more. That is a significant
overhead. Also, that would mean that the web site running the service would
require that 200MB byte array to be resident in memory while being serialized
and transferred, and if we had more than a few users doing that I suspect the
web server would be overwhelmed. Indeed, in some of our tests we have had
"Insufficient Resource" errors on the server when using a BinaryReader to load
a large file into a byte array in preparation for returning that array to a
caller.

Does anyone have any thoughts on how to do this? Perhaps some sort of custom
remoting to transfer the file? If the remoting were hosted in IIS (like the
dataportal), then wouldn't the same resource problems exist with the large files? I
saw an article somewhere (in the MS KB?) that showed how to write a service which
would host the remote object, but isn't there still a problem with transferring 200MB
in one big chunk? How would breaking a file into smaller chunks work using
single-call remoting and how would that file be reassembled on the user's system?
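To make the question concrete, the chunked scheme would presumably work like
this: the client asks the server for the slice starting at a given offset,
appends it to a local temp file, and repeats until the server returns nothing.
Here is a rough sketch in Python (the real thing would be a VB.NET service
method and a FileStream on the client; `get_chunk` just stands in for the
remoted call, and the 10MB size is an arbitrary choice):

```python
CHUNK_SIZE = 10 * 1024 * 1024  # 10MB slices (an arbitrary choice)

def get_chunk(path, offset, size=CHUNK_SIZE):
    """Server side: return one slice of the file starting at `offset`.
    An empty result signals end-of-file."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)

def download(remote_path, local_path, chunk_size=CHUNK_SIZE):
    """Client side: reassemble the file by appending slices in order."""
    offset = 0
    with open(local_path, "wb") as out:
        while True:
            chunk = get_chunk(remote_path, offset, chunk_size)
            if not chunk:
                break
            out.write(chunk)
            offset += len(chunk)
```

If that's the right shape, then neither side ever holds more than one chunk in
memory, and reassembly on the user's system is just appending slices in order.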

Or maybe somebody has an idea for some other approach entirely?

Thanks for any help or insight anyone can offer.

- jeff


Guest

Jeff Mason said:
But the problem with this approach is, I think, the large files. While
there aren't many of them, there are enough of them to force us to deal
with them. Using Web Services means the byte array is serialized into an
XML stream as Base64, increasing the size by a third or more. That is a
significant overhead. Also, that would mean that the web site running the
service would require that 200MB byte array to be resident in memory
while being serialized and transferred, and if we had more than a few
users doing that I suspect the web server would be overwhelmed. Indeed,
in some of our tests we have had "Insufficient Resource" errors on the
server when using a BinaryReader to load a large file into a byte array
in preparation for returning that array to a caller.

A simple solution to your problem would be to chunk the file - send the
file in 10-20MB increments. This will also allow you to restart failed
transfers.
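For what it's worth, the restartable version falls out naturally if the client
asks how many bytes the server already has and resumes from there. A rough
sketch of the idea (Python for brevity; `append_chunk` stands in for the web
service call, and the offset check is a hypothetical guard against
out-of-order chunks - both sides are local files here just to show the logic):

```python
import os

def append_chunk(dest_path, offset, data):
    """Server side: accept one chunk, refusing it unless `offset` matches
    the number of bytes already written (guards against gaps)."""
    size = os.path.getsize(dest_path) if os.path.exists(dest_path) else 0
    if offset != size:
        raise ValueError("expected offset %d, got %d" % (size, offset))
    with open(dest_path, "ab") as f:
        f.write(data)

def upload(src_path, dest_path, chunk_size=10 * 1024 * 1024):
    """Client side: send the file in increments; on restart, resume
    from however many bytes the server already has."""
    offset = os.path.getsize(dest_path) if os.path.exists(dest_path) else 0
    with open(src_path, "rb") as f:
        f.seek(offset)
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            append_chunk(dest_path, offset, data)
            offset += len(data)
```

Re-running `upload` after a failure picks up at the server's recorded size
instead of starting over from byte zero.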

Otherwise, with Microsoft's WSE I think you can do direct binary transfer
using DIME:

http://msdn.microsoft.com/msdnmag/issues/02/12/DIME/default.aspx

Uploading large attachments with DIME:
http://www.aspnetworld.com/articles/2004110301.aspx
 

David Browne

Jeff Mason said:
I hope I can get some advice on a design/technology question here.

The app is a medical claims case management system. It is written in
VB.NET and it works great. It is a Winforms app primarily; there is a web
component, but that is not relevant to this discussion.
. . .
Does anyone have any thoughts on how to do this? Perhaps some sort of
custom remoting to transfer the file? If the remoting were hosted in IIS
(like the dataportal), then wouldn't the same resource problems exist
with the large files? I saw an article somewhere (in the MS KB?) that
showed how to write a service which would host the remote object, but
isn't there still a problem with transferring 200MB in one big chunk?
How would breaking a file into smaller chunks work using single-call
remoting and how would that file be reassembled on the user's system?

Or maybe somebody has an idea for some other approach entirely?

Thanks for any help or insight anyone can offer.

WSS is part of the Windows Server 2003 OS, and it does everything you
described.

Windows SharePoint Services
http://www.microsoft.com/technet/windowsserver/sharepoint/V2/default.mspx

V3 is late in beta now.
http://www.microsoft.com/office/preview/technologies/sharepointtechnology/highlights.mspx

David
 

Nick Malik [Microsoft]

You have posed a technical problem. There are two solutions:
1) chunk the data. I discussed this idea in detail (gory detail) on my blog
quite some time ago.
http://blogs.msdn.com/nickmalik/archive/2004/11/01/250883.aspx
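In outline, chunking means the service hands out fixed-size slices on demand,
so the server never holds more than one slice in memory no matter how large
the file is - the post above has the gory details and the web service
plumbing. The server-side core, sketched minimally in Python:

```python
def read_chunk(path, index, chunk_size=2 * 1024 * 1024):
    """Return slice number `index` of the file; an empty result means
    the caller has everything. Only `chunk_size` bytes are ever in
    memory at once, however large the file is."""
    with open(path, "rb") as f:
        f.seek(index * chunk_size)
        return f.read(chunk_size)
```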


2) don't solve the problem with web services that you write, but rather by
using packaged software. There are literally dozens of applications that
will handle document management for you, especially given the complex and
varied data management requirements you describe.
A very good backgrounder on this topic can be found at:
http://en.wikipedia.org/wiki/Content_management_system
An (incomplete and loosely formatted) list of products:
http://en.wikipedia.org/wiki/Comparison_of_content_management_systems

I suggest that you look into a couple of different products:
Windows SharePoint Services:
http://support.microsoft.com/default.aspx?scid=kb;EN-US;830320
Documentum
http://software.emc.com/products/content_management/content_management.htm

There are some open source products in this space as well. I haven't used
any of them and cannot comment on their capabilities, but some, such as
Alfresco, are well liked.

Good luck
--
--- Nick Malik [Microsoft]
MCSD, CFPS, Certified Scrummaster
http://blogs.msdn.com/nickmalik

Disclaimer: Opinions expressed in this forum are my own, and not
representative of my employer.
I do not answer questions on behalf of my employer. I'm just a
programmer helping programmers.
--
 
