Please advise on "normalizing" my data sets

S

Stephen Walch

We have an ADO.NET in our project which unfortunately is not quite
normalized. In particular, there are columns who's values are string
arrays. An example from the schema:


<xs:schema id="MyDataSet" ...>
<xs:element name="MyDataSet" msdata:IsDataSet="true">
<xs:complexType>
<xs:choice maxOccurs="unbounded">
<xs:element name="Item">
<xs:complexType>
<xs:sequence>
<xs:element name="itemid" type="xs:string" />
<xs:element name="categories" msdata:DataType="System.String[]"
type="xs:string" />
....

For a number of reasons (including the fact that XML Serialization does not
seem to work very well with this type of situation) I would like look at
normalizing this data set. That is, create a separate table(s) and create
the appropriate data relations.

I am no database normalization expert, but I am sure I can (eventually)
construct a proper schema and write a routine to convert the data for this
particular case. But there are a lot of cases like this and it would be
great to have a general facility for doing such things. Has anyone
encountered a class/tool/sample/paper that helps for taking an ADO.NET data
set and normalizes it?

Thanks,

-Steve
 
K

Kevin Yu [MSFT]

Hi Stephen,

Based on my understanding, your typed dataset doesn't work properly during
XML Serialization. I think there might be some problem with the data type
of the column. The data type of a column cannot be a string array. Please
try to check the following link for supported .NET Framework types in a
data column.

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/
frlrfsystemdatadatacolumnclassdatatypetopic.asp

If you have to put several strings in a single column, would you try to
link them into a string with separators?

If anything is unclear, please feel free to reply to the post.

Kevin Yu
=======
"This posting is provided "AS IS" with no warranties, and confers no
rights."
 
S

Stephen Walch

Yes, I know it does not work! That is why I am am looking for tools,
examples, or information on converting to a normalized DataSet.

-Steve
 
K

Kevin Yu [MSFT]

Hi Stephen,

What I suggest is to use two related DataTables to achieve this.

For example, we're creating a DataSet containing information about authers
and books. One auther can have serveral books. Originally, we put all his
book names in a string array. However, string array is not a valid type for
DataColumn.DataType property.

Original table structure:

tbl_AuthorBook:
AuthorID
AuthorName
Phone
BookNames
......

A record will look like this:
1 John Smith 555-1234 BookName1; BookName2;
BookName3

Now, we can design the tables like this. The first DataTable contains the
columns about AutherID, AutherName and other information.

tbl_Author:
AuthorID
AuthorName
Phone
......

The second table contains book information and the AuthorD

tbl_Book:
BookID
BookName
AuthorID

Here the AuthorID is called a foreign key. The value must be one in the
AuthorID column in the tbl_Author. We can use a DataRelation object to link
these to tables. The DataRelation can either be created in the design view
of a typed DataSet or in code. Please refer to the following link for more
information about the DataRelation class:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/
frlrfsystemdatadatarelationclasstopic.asp

The tbl_Author is called Parent Table and tbl_Book is called Child Table.
Now the records in the table looks like this:

tbl_Author:
1 John Smith 555-1234
2 Keanu Reeves 987-6543

tbl_Book:
1 Programming ADO.NET 1
2 Essential ADO.NET 1
3 Matrix One 2
4 Matrix Reloaded 2

Thus creates a hierarchical DataSet.

<xs:schema id="Dataset1" targetNamespace="http://tempuri.org/Dataset1.xsd"
elementFormDefault="qualified"
attributeFormDefault="qualified" xmlns="http://tempuri.org/Dataset1.xsd"
xmlns:mstns="http://tempuri.org/Dataset1.xsd"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
<xs:element name="Dataset1" msdata:IsDataSet="true">
<xs:complexType>
<xs:choice maxOccurs="unbounded">
<xs:element name="tbl_Author">
<xs:complexType>
<xs:sequence>
<xs:element name="AuthorID" type="xs:int" />
<xs:element name="AuthorName" type="xs:string" minOccurs="0" />
<xs:element name="Phone" type="xs:string" minOccurs="0" />
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="tbl_Book">
<xs:complexType>
<xs:sequence>
<xs:element name="BookID" type="xs:int" minOccurs="0" />
<xs:element name="BookName" type="xs:string" minOccurs="0" />
<xs:element name="AuthorID" type="xs:int" minOccurs="0" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:choice>
</xs:complexType>
<xs:unique name="Dataset1Key1" msdata:primaryKey="true">
<xs:selector xpath=".//mstns:tbl_Author" />
<xs:field xpath="mstns:AuthorID" />
</xs:unique>
<xs:key name="tbl_BookKey">
<xs:selector xpath=".//mstns:tbl_Book" />
<xs:field xpath="mstns:BookID" />
</xs:key>
<xs:keyref name="tbl_Authortbl_Book" refer="Dataset1Key1">
<xs:selector xpath=".//mstns:tbl_Book" />
<xs:field xpath="mstns:AuthorID" />
</xs:keyref>
</xs:element>
</xs:schema>

Now if we're trying to get the books of an author, we can use the
DataRow.GetChildRows method to do this. Here's an example. (Suppose we have
the parent row reference "parentrow")

DataRow[] dr = parentrow.GetChildRows(ds.Relations["tbl_Authortbl_Book"]);

For more information about DataRow.GetChildRows method, please refer to the
following link:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/
frlrfsystemdatadatarowclassgetchildrowstopic.asp

Does this answer your question? If anything is unclear, please feel free to
reply to the post.

Kevin Yu
=======
"This posting is provided "AS IS" with no warranties, and confers no
rights."
 
S

Stephen Walch

Yes this pretty much tells me what I need to do. My only remaining question
is whether there are any tools out there that will automate the process of
reading one data set and creating another one with normalized tables. At
this point, my guess is "probabaly not".

Thanks for your help.

Kevin Yu said:
Hi Stephen,

What I suggest is to use two related DataTables to achieve this.

For example, we're creating a DataSet containing information about authers
and books. One auther can have serveral books. Originally, we put all his
book names in a string array. However, string array is not a valid type for
DataColumn.DataType property.

Original table structure:

tbl_AuthorBook:
AuthorID
AuthorName
Phone
BookNames
......

A record will look like this:
1 John Smith 555-1234 BookName1; BookName2;
BookName3

Now, we can design the tables like this. The first DataTable contains the
columns about AutherID, AutherName and other information.

tbl_Author:
AuthorID
AuthorName
Phone
......

The second table contains book information and the AuthorD

tbl_Book:
BookID
BookName
AuthorID

Here the AuthorID is called a foreign key. The value must be one in the
AuthorID column in the tbl_Author. We can use a DataRelation object to link
these to tables. The DataRelation can either be created in the design view
of a typed DataSet or in code. Please refer to the following link for more
information about the DataRelation class:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/
frlrfsystemdatadatarelationclasstopic.asp

The tbl_Author is called Parent Table and tbl_Book is called Child Table.
Now the records in the table looks like this:

tbl_Author:
1 John Smith 555-1234
2 Keanu Reeves 987-6543

tbl_Book:
1 Programming ADO.NET 1
2 Essential ADO.NET 1
3 Matrix One 2
4 Matrix Reloaded 2

Thus creates a hierarchical DataSet.

<xs:schema id="Dataset1" targetNamespace="http://tempuri.org/Dataset1.xsd"
elementFormDefault="qualified"
attributeFormDefault="qualified" xmlns="http://tempuri.org/Dataset1.xsd"
xmlns:mstns="http://tempuri.org/Dataset1.xsd"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
<xs:element name="Dataset1" msdata:IsDataSet="true">
<xs:complexType>
<xs:choice maxOccurs="unbounded">
<xs:element name="tbl_Author">
<xs:complexType>
<xs:sequence>
<xs:element name="AuthorID" type="xs:int" />
<xs:element name="AuthorName" type="xs:string" minOccurs="0" />
<xs:element name="Phone" type="xs:string" minOccurs="0" />
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="tbl_Book">
<xs:complexType>
<xs:sequence>
<xs:element name="BookID" type="xs:int" minOccurs="0" />
<xs:element name="BookName" type="xs:string" minOccurs="0" />
<xs:element name="AuthorID" type="xs:int" minOccurs="0" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:choice>
</xs:complexType>
<xs:unique name="Dataset1Key1" msdata:primaryKey="true">
<xs:selector xpath=".//mstns:tbl_Author" />
<xs:field xpath="mstns:AuthorID" />
</xs:unique>
<xs:key name="tbl_BookKey">
<xs:selector xpath=".//mstns:tbl_Book" />
<xs:field xpath="mstns:BookID" />
</xs:key>
<xs:keyref name="tbl_Authortbl_Book" refer="Dataset1Key1">
<xs:selector xpath=".//mstns:tbl_Book" />
<xs:field xpath="mstns:AuthorID" />
</xs:keyref>
</xs:element>
</xs:schema>

Now if we're trying to get the books of an author, we can use the
DataRow.GetChildRows method to do this. Here's an example. (Suppose we have
the parent row reference "parentrow")

DataRow[] dr = parentrow.GetChildRows(ds.Relations["tbl_Authortbl_Book"]);

For more information about DataRow.GetChildRows method, please refer to the
following link:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/
frlrfsystemdatadatarowclassgetchildrowstopic.asp

Does this answer your question? If anything is unclear, please feel free to
reply to the post.

Kevin Yu
=======
"This posting is provided "AS IS" with no warranties, and confers no
rights."
 
K

Kevin Yu [MSFT]

Hi Stephen,

As far as I know, there isn't such a tool available. You will need to do
this manually. However, let's wait to see if there's any newsgroup members
who knows such tools.

If anything is unclear, please feel free to reply to the post.

Kevin Yu
=======
"This posting is provided "AS IS" with no warranties, and confers no
rights."
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top