Random Access stream access in C#

Guest

Hi All,

I am a bit stuck with a project. Specifically, when making a database-like
engine in 'the old days', I would have wrapped a record class with a stream
class, so I could have a file of records on disc and always
jump straight to the record number I wanted. Simple random file access.

Say they were fixed-length records; then I knew the n'th record (counting
from zero) was (n * record length) bytes into the file. I could therefore jump
straight to that byte and read out the record, since I knew how long each
fixed-length record was. Then, by using a B-tree algorithm, I could have
myself a mini-database engine.
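
A minimal C# sketch of that fixed-length lookup, assuming a hypothetical 32-byte record and counting records from zero: seek to n * RecordLength, then read exactly one record's worth of bytes. FixedLengthFile and its members are illustrative names, not anything from the thread.

```csharp
using System;
using System.IO;

// Hypothetical helper: fetch record n (counting from zero) from a stream of
// fixed-length records. RecordLength is an assumed size for illustration.
public static class FixedLengthFile
{
    public const int RecordLength = 32;

    public static byte[] ReadRecord(Stream stream, long n)
    {
        // The nth record starts n * RecordLength bytes into the file.
        stream.Seek(n * RecordLength, SeekOrigin.Begin);

        var buffer = new byte[RecordLength];
        int filled = 0;
        // Stream.Read may return fewer bytes than requested, so loop until full.
        while (filled < RecordLength)
        {
            int read = stream.Read(buffer, filled, RecordLength - filled);
            if (read == 0)
                throw new EndOfStreamException($"Record {n} is past the end of the file.");
            filled += read;
        }
        return buffer;
    }
}
```

The read loop matters because Stream.Read is allowed to return fewer bytes than requested, even before the end of the file.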

The question is how do I implement the record management class in C#, using
serialization, so that I can jump straight to a specific record? Since
everything is wrapped up nicely in classes and objects, we don't use pointers
much nowadays, and we don't know how 'big' classes/structs are in bytes, how
do I jump straight to the record I want in a file of records on disc (or in
memory, etc.)?

It may be that I have not thought this through enough, but I don't see an
easy way to do this.

If anybody could point me in the right direction I would be very grateful.

If I have described the problem badly give me a shout and I will do my best
to clarify,

Thanks,
Gary.
 
Kevin Spencer

What are your requirements? I'm curious as to why you want to write a
database engine. There are all sorts of ready-made solutions for this these
days. What is the purpose of this?

--
HTH,

Kevin Spencer
Microsoft MVP
Professional Numbskull

Show me your certification without works,
and I'll show my certification
*by* my works.
 
Guest

Hi Vadym,

Many thanks for this. I like the index table idea. Sort of:

1) I want the 268th record, so I ...
2) look up the 268th record in the index, which tells me where I can find ...
3) the 268th record in the 'proper' file.
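
The three steps above could look roughly like this in C#, assuming a hypothetical index stream that stores one 8-byte (long) offset per record, so index entry n sits at n * 8 (records counted from zero here). IndexedLookup, FindOffset and SeekToRecord are made-up names for illustration.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical index layout: entry n is a long holding the byte offset of
// record n in the data file.
public static class IndexedLookup
{
    private const int EntrySize = sizeof(long);

    // Steps 1-2: look up where record n starts in the data file.
    public static long FindOffset(Stream index, long n)
    {
        index.Position = n * EntrySize;
        using var reader = new BinaryReader(index, Encoding.UTF8, leaveOpen: true);
        return reader.ReadInt64();
    }

    // Step 3: position the data stream at that record, ready to read it.
    public static void SeekToRecord(Stream index, Stream data, long n)
    {
        data.Position = FindOffset(index, n);
    }
}
```

Because every index entry has the same size, the index itself is a fixed-length file, so the variable-length data file can still be reached in two seeks.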

It is surprising that VB.Net has the FileGetObject function et al., which I
think would do exactly what I want, but C# does not. (I am not a VB programmer,
so I can't say for sure.)

However, I appreciate the help.
regards,
Gary.
 
Guest

Hi Kevin,
I am an incurable 'must find out how things work' sort of person, and so I
just started doing this as a sort of hobby project when I have time at home.
My first interest was to write a B-Tree implementation, and that sort of led
me onto this.

By day I work as a programmer and you are right - I never would use this in
any production sense that I can think of. We use SQL Server virtually
exclusively, and I don't think I will be trying to rewrite that anytime soon

So the answer really is just as a hobby, voyage of discovery type thing -
and I just love writing code!

cheers,
Gary
 
Kevin Spencer

Good answer!

You can use a FileStream to do your "jumping around" in the file if you
like, and you still have access to pointers (in C#, via unsafe code) when you want/need them.

 
Cor Ligthert [MVP]

Gary,

Before you give up, wait until Jon gives an answer. He has answered
questions about this in this newsgroup before. I could look it up as well, but I am
not sure I have the right keywords in mind, so wait for Jon; he is
active these days, so I am almost sure he will answer this.

It is relatively simple to do, as I understood from Jon's messages. I
think it used this class and the corresponding writer class:

http://msdn.microsoft.com/library/d.../html/frlrfsystemiobinaryreaderclasstopic.asp

Cor
 
Cor Ligthert [MVP]

Kevin,

In my opinion there can be plenty of reasons to do this. One is
speed in extreme situations; another can be lack of memory (though of course
not on PCs or servers).

(For the latter it would have to be a very small database; I have used this in the past.
However, in normal use the overhead in free memory needed (and in the
updating/compacting routines needed) is very large. That was one of the reasons
relational databases came about.)

A small extension to Gary's message gives a fine description of a relative
database: each record was stored at an offset (its starting pointer) held in an
index, together with its length. (And of course the free areas were one of the indexes.)
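
A sketch of that bookkeeping, assuming an in-memory index: each record is located by an (offset, length) pair, and freed areas go on a free list that is searched first-fit before the file is allowed to grow. Extent and SpaceManager are hypothetical names, and writing these lists to disc is left out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical index entry: where a record starts and how many bytes it uses.
public readonly record struct Extent(long Offset, int Length);

public sealed class SpaceManager
{
    private readonly List<Extent> free = new();   // the "free areas" index
    private long endOfFile;

    // Reserve space for a record, reusing a freed area if one is big enough (first fit).
    public Extent Allocate(int length)
    {
        for (int i = 0; i < free.Count; i++)
        {
            Extent hole = free[i];
            if (hole.Length >= length)
            {
                // Take the front of the hole; keep any remainder on the free list.
                if (hole.Length == length)
                    free.RemoveAt(i);
                else
                    free[i] = new Extent(hole.Offset + length, hole.Length - length);
                return new Extent(hole.Offset, length);
            }
        }
        // No suitable hole: append at the end of the file.
        var extent = new Extent(endOfFile, length);
        endOfFile += length;
        return extent;
    }

    // Deleting a record just hands its area back to the free list.
    public void Free(Extent extent) => free.Add(extent);
}
```

A real manager would probably also merge adjacent free areas; without that, the free list fragments over time.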

Cor
 
Kevin Spencer

Hi Cor,
> In my opinion there can be plenty of reasons to do this. One is
> speed in extreme situations; another can be lack of memory (though of course
> not on PCs or servers).

As I said, I was curious as to his reason. I didn't assume that he didn't
have a good reason. And he answered with an answer which I thought was a
very good reason.

 
Cor Ligthert [MVP]

Kevin,
> As I said, I was curious as to his reason. I didn't assume that he didn't
> have a good reason. And he answered with an answer which I thought was a
> very good reason.

I always enjoy writing about this much-loved database type of mine. They
were so extremely fast.

Forgive me.

Cor
 
Kevin Spencer

Hi Cor,

I have written a database in the past as well, using C. That was about 11 or
12 years ago. It was certainly an educational experience. I learned a lot
about B-Tree indexing, relational database concepts, and so on. It helped me
quite a bit when I started using database products and tools.

 
Guest

Hi Cor,

(Sorry for the delay in replying - hectic day at work!)

Thanks for the tip - I will indeed monitor the thread to see if anybody else
wants to contribute. I don't know Jon, but I am always appreciative of help.

Also thanks for the link. I will have another look at the BinaryReader
class that your link points to (I have not used it much).

If you can find any more about this I would appreciate it - but don't spend
too much time. As I said to Kevin, it is just a hobby/education/fun thing for
me; although I do appreciate any help of course.

regards,
Gary
 
Guest

Kevin,

Ta for that - I will have a good look at the FileStream class. We don't use
the IO classes much at work - it's mostly data access/ADO type stuff - so I am
not too familiar with them. That's the whole point of my little voyages of
discovery, I guess.

Anyhow, thanks again,
regards,
Gary
(Sorry for the delay in replying - hectic day at work!)
 
Jon Skeet [C# MVP]

> The question is how do I implement the record management class in C#, using
> serialization, so that I can jump straight to a specific record? Since
> everything is wrapped up nicely in classes and objects, we don't use pointers
> much nowadays, and we don't know how 'big' classes/structs are in bytes, how
> do I jump straight to the record I want in a file of records on disc (or in
> memory, etc.)?

You would need to work out the format for the records yourself,
computing the size required for each of the fields etc. You could then
use FileStream (and possibly BinaryReader for convenience), setting the
Position property of the stream to seek from one record to another.
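
A sketch of that approach, assuming a hypothetical Product record with an int, a double, and a name stored as 16 zero-padded ASCII bytes, so every record is exactly 28 bytes. BinaryWriter and BinaryReader do the field conversion, and setting Position does the jump; records are counted from zero. Product, ProductFile and the field sizes are illustrative choices, not a format from the thread.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical fixed-length record: every field has a known size on disc.
public sealed class Product
{
    public const int NameBytes = 16;                                        // zero-padded ASCII
    public const int RecordSize = sizeof(int) + sizeof(double) + NameBytes; // 28 bytes

    public int Id;
    public double Price;
    public string Name = "";

    public void Write(BinaryWriter writer)
    {
        writer.Write(Id);
        writer.Write(Price);
        var name = new byte[NameBytes];
        // Truncates names longer than NameBytes; non-ASCII characters become '?'.
        Encoding.ASCII.GetBytes(Name, 0, Math.Min(Name.Length, NameBytes), name, 0);
        writer.Write(name);
    }

    public static Product Read(BinaryReader reader)
    {
        var p = new Product { Id = reader.ReadInt32(), Price = reader.ReadDouble() };
        p.Name = Encoding.ASCII.GetString(reader.ReadBytes(NameBytes)).TrimEnd('\0');
        return p;
    }
}

public static class ProductFile
{
    // Jump straight to record n by setting Position, then read it.
    public static Product ReadAt(Stream stream, long n)
    {
        stream.Position = n * Product.RecordSize;
        using var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true);
        return Product.Read(reader);
    }

    public static void WriteAt(Stream stream, long n, Product p)
    {
        stream.Position = n * Product.RecordSize;
        using var writer = new BinaryWriter(stream, Encoding.ASCII, leaveOpen: true);
        p.Write(writer);
    }
}
```

Names longer than 16 characters are silently truncated in this sketch; whether to truncate, reject, or use an overflow area is a design decision for the real format.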

Now, that's not a lot of detail, but hopefully it's enough to get you
started - let me know if you need more help.
 
Guest

Hi Jon,

Thanks for taking the time to reply - I do appreciate it.

The amount of detail in your answer is fine - that's exactly how I would have
done it in an older language like C/Delphi, etc. Sort of ...

"I want the 67th record, so I multiply the record size by (67 - 1), add on
any header size my file/stream probably contains, 'seek' to that location
and read a record. No worries"
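
The same arithmetic as a small C# helper, assuming records numbered from 1 and a fixed-size header at the start of the file (RecordOffsets.OffsetOf and its parameters are hypothetical). For example, with a 64-byte header and 28-byte records, record 67 starts at 64 + 66 * 28 = 1912.

```csharp
using System;

// Hypothetical helper: byte offset of a 1-based record number in a file
// that starts with a fixed-size header.
public static class RecordOffsets
{
    public static long OffsetOf(long recordNumber, int recordSize, int headerSize)
    {
        if (recordNumber < 1)
            throw new ArgumentOutOfRangeException(nameof(recordNumber), "Records are numbered from 1.");

        // Skip the header, then (recordNumber - 1) whole records.
        return headerSize + (recordNumber - 1) * recordSize;
    }
}
```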

I suppose in the world of the CTS and type safety everywhere, I was looking
for a way that did not involve playing around with the size of types, or
pointers, or unsafe code (if I am going to use 'sizeof', for instance); a
more object-oriented way, if you will.

Does the FileGetObject function et al. in VB.Net fulfil this role? (I am not
too familiar with VB.Net, but a [very] quick look at the help file seemed to
say that it did.)

Anyhow, if you have time, (and only if - this is only a hobby/fun project
for me), then I would be glad to hear your thoughts.

Thanks in any case for the help,
regards,
Gary.
 
Jon Skeet [C# MVP]

Gary Bond said:
> Thanks for taking the time to reply - I do appreciate it.
>
> The amount of detail in your answer is fine - that's exactly how I would have
> done it in an older language like C/Delphi, etc. Sort of ...
>
> "I want the 67th record, so I multiply the record size by (67 - 1), add on
> any header size my file/stream probably contains, 'seek' to that location
> and read a record. No worries"
Yup.

> I suppose in the world of the CTS and type safety everywhere, I was looking
> for a way that did not involve playing around with the size of types, or
> pointers, or unsafe code (if I am going to use 'sizeof', for instance); a
> more object-oriented way, if you will.

Well, you don't need unsafe code, and you don't need pointers in the C#
sense, although you'll definitely end up using references (as most
types are reference types).

> Does the FileGetObject function et al. in VB.Net fulfil this role? (I am not
> too familiar with VB.Net, but a [very] quick look at the help file seemed to
> say that it did.)
>
> Anyhow, if you have time, (and only if - this is only a hobby/fun project
> for me), then I would be glad to hear your thoughts.

Serialization (which I suspect is what FileGetObject does - I don't
know, I'm afraid) isn't really what you want for this kind of
application, I believe. It's got a lot of extra information such as
type info which you'd already know, and wouldn't be fixed length
either. For instance, if you have two records with strings of different
lengths, the serialized versions would be of different sizes.
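
To illustrate the size point with something simpler than the serializer: BinaryWriter.Write(string) also uses a variable-length encoding (a length prefix followed by the UTF-8 bytes), so two strings of different lengths produce different sizes. EncodedSize is a hypothetical helper.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper: how many bytes BinaryWriter.Write(string) produces.
public static class EncodedSize
{
    public static long SizeOf(string value)
    {
        using var ms = new MemoryStream();
        // leaveOpen keeps ms usable after the writer is disposed (and flushed).
        using (var writer = new BinaryWriter(ms, Encoding.UTF8, leaveOpen: true))
        {
            writer.Write(value);   // 7-bit encoded byte count, then the UTF-8 bytes
        }
        return ms.Length;
    }
}
```

With plain ASCII text, "red" comes out as 4 bytes and "dark green" as 11. A fixed-width field (padding or truncating every string to the same number of bytes) removes that variation, at the cost of wasted space and a maximum length.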

The good thing about writing your own reading/writing of objects is:

1) It's very flexible.
2) You can guarantee to understand the file on inspection. No magic
involved.
3) It can be more efficient than a more generalised mechanism (in terms
of both space and time).
 
Kevin Spencer

If you want to use pointers (and it seems like you may), you will want to
use C#.

 
Jon Skeet [C# MVP]

Kevin Spencer said:
> If you want to use pointers (and it seems like you may), you will want to
> use C#.

I'm not sure why pointers would particularly be useful here. For most
of the time, when people talk about how they would have used pointers
in C/C++, references work perfectly well.

It's *possible* that pointers will be useful, but I doubt it.
 
Kevin Spencer

> It's *possible* that pointers will be useful, but I doubt it.

I agree, Jon.

He started out by saying that he was doing this for self-education, which is
an admirable purpose, of course, and mentioned that he had previously used
pointers.

I agree that pointers have very limited use in .Net. In fact, if I were
designing a database application "from scratch" I would not use the old
methods, for a number of reasons; I would probably use some form of
XML as the basis for something like this. XML can be compiled or pure text,
and with the speed of computing available today, the loss in pure
speed would be more than made up for by adherence to a standard that is
so extensible and cross-platform-compatible.

At any rate, it seemed that you had, for the most part, taken over the
topic, and I was merely mentioning that, if he is accustomed
to using pointers, he might be better off working with C#, which I prefer
for several reasons, possibly the least of which (but in my case absolutely
necessary at times) is pointer availability. I didn't want to go into gory
detail about it.

I still don't (I'm a bit snowed these days), so I'm happy to leave him in
your capable hands!

 
Vadym Stetsyak

Hello, Jon!

JSC> I'm not sure why pointers would particularly be useful here. For most
JSC> of the time, when people talk about how they would have used pointers
JSC> in C/C++, references work perfectly well.

JSC> It's *possible* that pointers will be useful, but I doubt it.

Maybe he meant that pointers would be used to access data by address (offset) in memory.

IMO, if you are working with raw data, pointers can become useful.
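
In modern .NET (Core 2.1 and later, well after this thread) that "access raw data by offset" idea can be done without pointers or an unsafe block, by reinterpreting a byte array as a span of structs. A sketch, assuming a hypothetical RawRecord struct that contains only value-type fields (MemoryMarshal.Cast rejects types with references). The byte layout follows the runtime's in-memory layout and the machine's endianness, so data written this way is not portable between platforms.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical record layout with value-type fields only.
[StructLayout(LayoutKind.Sequential)]
public struct RawRecord
{
    public int Id;
    public double Price;
}

public static class RawAccess
{
    // Size of one record in memory, including any padding the runtime adds.
    public static int RecordSize => Unsafe.SizeOf<RawRecord>();

    // View the raw bytes as an array of records, then index it like a pointer.
    public static void WriteAt(byte[] buffer, int index, RawRecord record) =>
        MemoryMarshal.Cast<byte, RawRecord>(buffer.AsSpan())[index] = record;

    public static RawRecord ReadAt(byte[] buffer, int index) =>
        MemoryMarshal.Cast<byte, RawRecord>(buffer.AsSpan())[index];
}
```
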
--
Regards, Vadym Stetsyak
www: http://vadmyst.blogspot.com
 
Guest

Hi All,

First - thanks again for the help in my efforts to learn a bit.
I think I am getting somewhere, thanks to all your comments. For instance,
taking a 'duff test class' like this

using System;

[Serializable]
public class DuffClass
{
    private string colour;
    public string Colour
    {
        get { return colour; }
        set { colour = value; }
    }

    public DuffClass(string startColour)
    {
        colour = startColour;
    }
}

whose length could vary from one instantiation to the next, I can save it to
a file like this:

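using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

// Note: BinaryFormatter is obsolete in .NET 5+ and removed in .NET 9.
DuffClass dc = new DuffClass("This one is blue");
using (FileStream fs = new FileStream("Gaz.tst", FileMode.CreateNew))
using (MemoryStream ms = new MemoryStream())
{
    IFormatter formatter = new BinaryFormatter();
    formatter.Serialize(ms, dc);
    // ms.Length gives the size of the serialized object here
    int strmLength = (int)ms.Length;
    // remember where this object starts in the file (the 'xyz' used below)
    long start = fs.Position;
    // now save to the file
    fs.Write(ms.ToArray(), 0, strmLength);
}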


Here I can find the number of bytes that represent the serialized object
by putting it into a MemoryStream first, and I don't have to make
assumptions about the size of the underlying primitives, the size of strings, etc.
This was the bit that I could not 'get'. (Is there an easier way of doing
this?)
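
One possibly simpler way to get both numbers is to note the stream's Position before and after writing, instead of buffering in a MemoryStream first. A sketch, assuming a BinaryWriter-based record (a single length-prefixed string) in place of the formatter; Appender.AppendRecord is a hypothetical helper. The same before/after trick should also work when a formatter serializes straight into the FileStream.

```csharp
using System;
using System.IO;
using System.Text;

public static class Appender
{
    // Hypothetical helper: append one record and report where it landed and its size.
    public static (long Start, long Length) AppendRecord(Stream file, string colour)
    {
        file.Seek(0, SeekOrigin.End);      // append at the end of the file
        long start = file.Position;        // the 'xyz' to keep in the index
        using (var writer = new BinaryWriter(file, Encoding.UTF8, leaveOpen: true))
        {
            writer.Write(colour);          // length-prefixed UTF-8 string
        }
        return (start, file.Position - start);
    }
}
```
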

So I can save lots of objects of different sizes into one storage file, and
get them back out again via the usual Deserialize, since I know where they
start in the file (assuming I keep track of their start positions via an
index of some description).

// open the file stream, seek to position 'xyz', then deserialize
using (FileStream fs = new FileStream("Gaz.tst", FileMode.Open))
{
    fs.Seek(xyz, SeekOrigin.Begin);
    IFormatter formatter = new BinaryFormatter();
    DuffClass dc2 = (DuffClass)formatter.Deserialize(fs);
}

where 'xyz' is the start position of the object in the storage file.

I know this raises loads more questions than it answers, like

1) Variable-length record managers are not easy to write (what about
deleting records? Do I store the variable-length records in lots of small,
fixed-length, possibly non-contiguous chunks, or in one big chunk? What about a
'free list' of available space in the file where deletions have occurred?
etc.)
2) I haven't even started on the B-Tree class yet (which is where I started
all this)

but at least your comments sort of broke my mental deadlock. So, what I am
going to do is:

1) See if I can really understand what is happening when you serialize
something, and maybe implement my own scheme to get 'stuff' into/out of a
stream (ref. Jon's comment on serialization above).
2) Actually write some code to implement a variable-length record manager class
along these lines.

Could I come back with some code on this topic in the future (it might be a
while) for your comments, or is there somewhere more appropriate if not?


In any case, thanks again all for giving me some of your time,
regards,
Gary.
 
