Questions about building a data engine

A

AMDRIT

Occasionally I try my hand at a simple data storage engine. Today I ran
across an article on the web
http://msdn2.microsoft.com/en-us/library/aa289151(vs.71).aspx, and it got me
thinking again. There doesn't seem to be a definative source, at least on
in layman terms, on this subject.



Here is what I believe to know so far. (As if I was to model after SQL
Server).



Data is stored in pages in blocks of 8Kb in a heap manner.



A single page of data contains a header section, rows of data, and a footer
section

In the header, you will find the page number, its previous page number and
the next page number.

This is a linked list for the owning "table" or object.

The header also contains the pointer the owning object, the pointer to
where the first row of data starts and

how much free space there is for row data.



The footer contains an array of row lengths of each row of data.



A single row of data is varying in length but has specific pieces.

Row Number

Row Type

Row Redirection Pointer

Field Count

Null Rows

Field Lengths

Field Data



As pages of data are needed, they are created in extents (8 pages at a time)
* the growth rate (chunk size) of the data file.



There are two types of indexes, clustered and non-clustered.



A clustered index is the data, meaning that the data is not stored in a heap
but rather stored as the index is specified. Because of the data storage
requirement, there can be only one clustered index per table.



Meanwhile with non-clustered indexes can be numerous, and they themselves
take up data row space. A single index may consist of up to 16 columns in
ascending or descending order and may optionally be case sensitive. Certain
columns are precluded from being part of an index such as Bit, Text, NText,
Binary and so on.



The index itself is supposedly a hybrid AVL tree. Meaning that it is self
balancing, but it can use a redirection pointer to another row of data in
the tree rather than continually compacting the data rows and pages. This
redirection is know as fragmentation.



Essentially the way the index works is by creating a node of a (key and a
pointer value) and adding it to an AVL tree. The key is comprised of some
or all of each column's value. The value is the pointer to the row of data
in the table of data.



Armed with that much information, here is a great place to get started
playing. Then I realize, I don't know where to start. I get stuck at the
same place each time.

Obviously I cannot house all the potential data in memory so some of this
data has to live on the hard drive. That said, how does one balance a tree
that is not in memory?



Since there are potentially multiple tasks going on in this data file at
once, especially if it is for multiple users, how does one perform
simultaneous reads and writes against a single file while managing the all
important locking?



Since there is potentially need for a lot of data to be available to the
users, it would be prudent to load and retain as much of the being accessed
into memory. But how does one limit the amount of resources the data engine
actually consumes?



At some point I would like to use SQL queries to actually work against the
data engine. I know that for this I should consider BNF parser. Does
anyone have ideas on how to get the SQL92 syntax in full BNF form? I have
found a parsing solution "Gold Parser", anyone know if there is a SQL92 BNF
script for Gold, or have another solution to look at?



Anyway, this is a long post and really a curiosity that I am toying with. I
appreciate an assistance any of you may offer. My goal is to stick with
VB.Net as this whole thing started in VB 5 a long, long time ago. In a
pinch, should I need to multicast delegates, I am willing to employ C#.
 
R

Robert Dufour

I can't help you but I admire your curiosity. My approach has been to use
what there is (like sql server) and not ry to reinvent the wheel, but I
guess if everyone did that we'd still be living in caves :).
Best of luck.
Bob
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top