Parsing Data Problem ...

shapper · Nov 3, 2010

Hello,

I am trying to parse some data that comes in the following format:

One DataEntry contains N Dimensions. In this case N = 2;

Dimension class contains two properties: Name and Value:

IList<Object> dims = new List<Object>();

foreach (DataEntry entry in feed.Entries) {

  Int32 i = entry.Dimensions.Count; // Equals 2

  foreach (Dimension dim in entry.Dimensions)
    dims.Add(new { Name = dim.Name, Value = dim.Value });

}

When I loop through the dimensions, e.g. "foreach (Dimension dim in
entry.Dimensions)", I get the following:

1st item: ROW 1 with Dim1.Name, Dim1.Value

2nd item: ROW 1 with Dim2.Name, Dim2.Value

3rd item: ROW 2 with Dim1.Name, Dim1.Value

4th item: ROW 2 with Dim2.Name, Dim2.Value

This is why I am having trouble parsing ...

This is because the data should be:

ROW 1: Dim1.Name + Dim1.Value AND Dim2.Name and Dim2.Value

ROW 2: Dim1.Name + Dim1.Value AND Dim2.Name and Dim2.Value

My objective would be to have N columns (2 in this case)

Column 1: Dimension 1

Column 2: Dimension 2

....

And, for example, in Dimension 1 column each row would be an anonymous
object:

{ Name = Dimension.Name, Value = Dimension.Value }

Just like a matrix ...

And I need to use this as follows:

I am doing this in my class constructor.

So I would like to save this data in a property or field of the
same class so that the methods of the class can access it.

How can I do this?

Thank You,

Miguel

shapper · Nov 5, 2010

> Hello,
>
> I am trying to parse some data that comes in the following format:
>
> One DataEntry contains N Dimensions. In this case N = 2;
>
> Dimension class contains two properties: Name and Value:
>
> IList<Object> dims = new List<Object>();

List<T> where T is System.Object is pointless. The whole point of using
a generic type like List<T> is so that you can specify a real type. If
the type is System.Object, then anything can go into the list, and you
have no type safety at all.
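
To illustrate the type-safety point, here is a minimal sketch. The Dimension class below is a hypothetical stand-in with string Name and Value properties (the real class may differ), and the sample values are made up.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the poster's Dimension class.
public class Dimension
{
    public string Name { get; set; }
    public string Value { get; set; }
}

public static class TypedListDemo
{
    // With List<Dimension>, the compiler knows every element has Name and Value.
    public static List<Dimension> Build()
    {
        var dims = new List<Dimension>();
        dims.Add(new Dimension { Name = "ga:date", Value = "20101103" });
        // dims.Add("oops");   // would not compile: a string is not a Dimension
        return dims;
    }

    // With List<object>, every element needs a cast before its members are usable.
    public static string FirstNameFromObjects(List<object> items)
    {
        var first = items[0] as Dimension;   // null if the element is some other type
        return first == null ? null : first.Name;
    }
}
```

With the typed list, a mistake such as adding a string is caught at compile time; with List<object> it only shows up at run time, when the cast fails.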

> foreach (DataEntry entry in feed.Entries) {
>
>   Int32 i = entry.Dimensions.Count; // Equals 2
>
>   foreach (Dimension dim in entry.Dimensions)
>     dims.Add(new { Name = dim.Name, Value = dim.Value });
>
> }
>
> [...]
> This is because the data should be:
>
> ROW 1: Dim1.Name + Dim1.Value AND Dim2.Name and Dim2.Value
>
> ROW 2: Dim1.Name + Dim1.Value AND Dim2.Name and Dim2.Value

I assume you mean, for example, "Dim1.Name + Dim1.Value AND Dim2.Name +
Dim2.Value", whatever that means.

It's hard to say for sure, but it looks to me as though you really
should be using a projection and then a GroupBy().

For example:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace TestGroupByRowNumber
{
    class Program
    {
        class Dimension
        {
            public string Name { get; set; }
            public int Value { get; set; }

            public Dimension(string name, int value)
            {
                Name = name;
                Value = value;
            }
        }

        class DataEntry
        {
            public List<Dimension> Dimensions { get; private set; }

            public DataEntry(IEnumerable<Dimension> dimensions)
            {
                Dimensions = new List<Dimension>(dimensions);
            }
        }

        static void Main(string[] args)
        {
            List<DataEntry> entries = new List<DataEntry>
            {
                new DataEntry(new Dimension[] { new Dimension("A", 1), new Dimension("B", 2) }),
                new DataEntry(new Dimension[] { new Dimension("C", 3), new Dimension("D", 4) }),
                new DataEntry(new Dimension[] { new Dimension("E", 5), new Dimension("F", 6) }),
                new DataEntry(new Dimension[] { new Dimension("G", 7), new Dimension("H", 8) })
            };

            var result = entries
                .SelectMany((entry, i) => entry.Dimensions.Select(
                    dim => new { Row = i, Name = dim.Name, Value = dim.Value }))
                .GroupBy(a => a.Row, (row, g) =>
                {
                    StringBuilder sb = new StringBuilder();

                    sb.AppendFormat("Row: {0}", row);
                    foreach (var a in g)
                    {
                        sb.AppendFormat(";{0}+{1}", a.Name, a.Value);
                    }

                    return sb.ToString();
                });

            foreach (string str in result)
            {
                Console.WriteLine(str);
            }

            Console.ReadLine();
        }
    }
}

The above just emits an IEnumerable<string>, but of course you can
construct the GroupBy() result selector to return whatever kind of
object that aggregates the data the way you want.
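
As a rough illustration of a result selector that returns an object instead of a string, here is a sketch. The Row class is a hypothetical type introduced only for this example, and Dimension and DataEntry are simplified stand-ins for the classes in the sample program.

```csharp
using System.Collections.Generic;
using System.Linq;

// Simplified stand-ins mirroring the sample program's classes.
public class Dimension
{
    public string Name { get; set; }
    public int Value { get; set; }
}

public class DataEntry
{
    public List<Dimension> Dimensions { get; set; }
}

// Hypothetical row type: one instance per DataEntry.
public class Row
{
    public int Index { get; set; }
    public List<Dimension> Columns { get; set; }
}

public static class GroupByRows
{
    public static List<Row> ToRows(IEnumerable<DataEntry> entries)
    {
        return entries
            // Flatten, tagging each dimension with the index of its entry.
            .SelectMany((entry, i) =>
                entry.Dimensions.Select(dim => new { RowIndex = i, Dim = dim }))
            // Regroup by that index; the result selector builds a Row, not a string.
            .GroupBy(a => a.RowIndex, (rowIndex, g) => new Row
            {
                Index = rowIndex,
                Columns = g.Select(a => a.Dim).ToList()
            })
            .ToList();
    }
}
```

The anonymous type is used only inside the query; what leaves the method is the named Row type, so callers can work with it normally.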

All that said, it's not clear why the LINQ would be preferable to just
enumerating the data and building up whatever data you want. I find the
LINQ less readable than something more straight-forward, such as:

StringBuilder sb = new StringBuilder();
List<string> output = new List<string>(entries.Count);

for (int index = 0; index < entries.Count; index++)
{
    DataEntry entry = entries[index];

    sb.AppendFormat("Row: {0}", index);

    foreach (Dimension dim in entry.Dimensions)
    {
        sb.AppendFormat("; {0}+{1}", dim.Name, dim.Value);
    }

    output.Add(sb.ToString());
    sb.Remove(0, sb.Length);
}

> My objective would be to have N columns (2 in this case)
>
> Column 1: Dimension 1
>
> Column 2: Dimension 2

So, where above I construct a string, you should just create an instance
of whatever your "row" object is, with each column populated by the data
in the appropriate Dimension object.

> ....
>
> And, for example, in Dimension 1 column each row would be an anonymous
> object:
>
> { Name = Dimension.Name, Value = Dimension.Value }
>
> Just like a matrix ...

Using anonymous types in your row object is going to severely limit the
utility of the output. You can do it using the techniques described in
this post and in the other thread, but if you're dealing with a data
table, having anonymous data inside it is going to be awkward unless you
can do everything in a single method.

Doing the initialization in a constructor and then expecting it to be
useful later on is going to be a real headache. C# doesn't have type
inference for generic types yet.
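
One way around that limitation is to replace the anonymous object with a small named class, so the field can be declared. The sketch below assumes that approach; Cell and ReportTable are hypothetical names, and Dimension and DataEntry are simplified stand-ins with string values.

```csharp
using System.Collections.Generic;

// Simplified stand-ins for the feed's classes.
public class Dimension
{
    public string Name { get; set; }
    public string Value { get; set; }
}

public class DataEntry
{
    public List<Dimension> Dimensions { get; set; }
}

// Hypothetical named type replacing the anonymous { Name, Value } object.
public class Cell
{
    public string Name { get; set; }
    public string Value { get; set; }
}

public class ReportTable
{
    // A field needs a declared type; an anonymous type has no name to write here.
    private readonly List<Cell[]> _rows = new List<Cell[]>();

    public ReportTable(IEnumerable<DataEntry> entries)
    {
        foreach (DataEntry entry in entries)
        {
            // One array per row, one Cell per column.
            Cell[] row = new Cell[entry.Dimensions.Count];
            for (int col = 0; col < row.Length; col++)
            {
                Dimension dim = entry.Dimensions[col];
                row[col] = new Cell { Name = dim.Name, Value = dim.Value };
            }
            _rows.Add(row);
        }
    }

    public int RowCount { get { return _rows.Count; } }

    // Any method of the class can read what the constructor built.
    public Cell GetCell(int row, int column)
    {
        return _rows[row][column];
    }
}
```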

Pete

Hello Pete,

I tried anonymous types, but only after trying other options.

The first option I tried, which seemed more logical, was an Array and
then a Dictionary ... With arrays I always run into the problem of not
being able to use an .Add method.

I was looking at this and I think I can give a better example.
Consider the following:

foreach (Row row in data.Rows) {

  foreach (DimensionColumn dimension in data.Dimensions) {
  }

  foreach (MetricColumn metric in data.Metrics) {
  }

}

Each DimensionColumn has 2 properties: Name and Type
Each MetricColumn has 2 properties: Name and Value

All rows have the same number of Dimension Columns and Metric Columns.

I would like to create some kind of data variable that would contain
all these values.

Dictionary, Array, etc?

Consider I have two Rows, One Dimension Column and Two Metrics
columns.

ROW 1:

DIMENSION           METRIC 1             METRIC 2
DIM.Name,DIM.Type   MET1.Name,MET1.Val   MET2.Name,MET2.Val

ROW 2:

DIMENSION           METRIC 1             METRIC 2
DIM.Name,DIM.Type   MET1.Name,MET1.Val   MET2.Name,MET2.Val

Basically that is it ...

Of course I could have 4 dimensions and 10 metrics.

Thank You,
Miguel

shapper · Nov 5, 2010

I am trying to use Dictionary. Does it make sense?

My idea would be something like:

Dictionary<String, IDictionary<String, String[]>>
So the first String is "Metric" or "Dimension". This would separate
both types.

Consider just the code for Metric.

IDictionary<String, IDictionary<String, String[]>> data =
  new Dictionary<String, IDictionary<String, String[]>>();
data.Add("Metrics", new Dictionary<String, String[]>());

foreach (DataEntry entry in feed.Entries) {
  foreach (Metric metric in entry.Metrics) {
    data["Metrics"].Add( // ...
  }
}

Inside the Metric loop I would like to check whether the metric's name
(metric.Name) already exists as a key in data["Metrics"].

If it does not exist, then add a new entry to data["Metrics"] where:

the key is metric.Name and the String[] has one value: metric.Value;

If data["Metrics"] already contains the key metric.Name, then:

Just add metric.Value to the existing array.

How can I do this?

Thank You,
Miguel
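
The check-then-append steps described above can be sketched as follows. Note the swap: arrays have a fixed size, so this sketch uses List<string> in place of String[]. ColumnStore and Append are hypothetical names chosen for the example.

```csharp
using System.Collections.Generic;

public static class ColumnStore
{
    // Appends value to the list stored under key, creating the list on first use.
    public static void Append(IDictionary<string, List<string>> columns,
                              string key, string value)
    {
        List<string> column;
        if (!columns.TryGetValue(key, out column))
        {
            // First time this key is seen: start a new, empty column.
            column = new List<string>();
            columns.Add(key, column);
        }
        column.Add(value);
    }
}
```

Inside the metric loop this would be called as ColumnStore.Append(metrics, metric.Name, metric.Value), where metrics is the dictionary holding the metric columns. If a String[] is needed later, calling ToArray() on the finished list produces one.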

shapper · Nov 5, 2010

> [...]
>
> Consider I have two Rows, One Dimension Column and Two Metrics
> columns.
>
> ROW 1:
> DIMENSION           METRIC 1             METRIC 2
> DIM.Name,DIM.Type   MET1.Name,MET1.Val   MET2.Name,MET2.Val
>
> ROW 2:
> DIMENSION           METRIC 1             METRIC 2
> DIM.Name,DIM.Type   MET1.Name,MET1.Val   MET2.Name,MET2.Val
>
> [...]
>
> I am trying to use Dictionary. Does it make sense?
>
> My idea would be something like:
>
> Dictionary<String, IDictionary<String, String[]>>
> So the first String is "Metric" or "Dimension". This would separate
> both types.
>
> [...]
>
> How can I do this?

I guess I don't really understand the point of the dictionaries. In
your previous post, you wrote:

> Consider I have two Rows, One Dimension Column and Two Metrics
> columns.
>
> ROW 1:
>
> DIMENSION           METRIC 1             METRIC 2
> DIM.Name,DIM.Type   MET1.Name,MET1.Val   MET2.Name,MET2.Val
>
> ROW 2:
>
> DIMENSION           METRIC 1             METRIC 2
> DIM.Name,DIM.Type   MET1.Name,MET1.Val   MET2.Name,MET2.Val

Why are all the data not just stored in each row object (whatever that
might be)? For example:

class RowObject
{
    public Dimension Dimension { get; set; }
    public Metric[] Metrics { get; set; }
}

Or something like that.
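
A minimal sketch of filling such row objects, assuming exactly one dimension per entry as in the two-row example. Dimension, Metric and DataEntry are hypothetical stand-ins for the feed's classes; RowBuilder is a name made up for this example.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the feed's classes.
public class Dimension { public string Name { get; set; } public string Value { get; set; } }
public class Metric { public string Name { get; set; } public string Value { get; set; } }

public class DataEntry
{
    public List<Dimension> Dimensions { get; set; }
    public List<Metric> Metrics { get; set; }
}

public class RowObject
{
    public Dimension Dimension { get; set; }
    public Metric[] Metrics { get; set; }
}

public static class RowBuilder
{
    public static List<RowObject> Build(IEnumerable<DataEntry> entries)
    {
        var rows = new List<RowObject>();
        foreach (DataEntry entry in entries)
        {
            rows.Add(new RowObject
            {
                // Assumption: one dimension per entry, as in the example rows.
                Dimension = entry.Dimensions[0],
                Metrics = entry.Metrics.ToArray()
            });
        }
        return rows;
    }

    // Reads one metric "column" across all rows, in row order.
    public static List<string> Column(IEnumerable<RowObject> rows, string metricName)
    {
        return rows
            .SelectMany(r => r.Metrics)
            .Where(m => m.Name == metricName)
            .Select(m => m.Value)
            .ToList();
    }
}
```

Keeping the data per row preserves the link between a dimension and its metrics, while a helper like Column still gives a by-column view when one is needed.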

If you do see a need to use dictionaries, why have a dictionary of
dictionaries? Is there ever actually going to be more than just the
"dimension" and "metric" dictionaries? If not, why not just have two
variables, one for each kind of dictionary?

Pete

Let me explain. This data is coming from the Google Data API, in this
case Google Analytics. Google delivers this data in an XML format with
much more information than this.

The C# library that Google supports transforms each XML node into an
Entry (Row), and for each row I have N metrics and M dimensions.

This format is quite strange, since what matters for displaying a chart
is all the row values of one metric.

I have a Service that, in its constructor, authenticates to Google and
runs a query that gets the data. Then I want to transform the data as
described (by column) and hold it in a private field of the service
class. Then each method will use part of the data. For example:
GetVisits will return all the values from the Dimension "Date" and from
the Metric "Visit".

So I would access:

String[] dates = data["Dimensions"]["Dates"];
String[] visits = data["Metrics"]["Visits"];

Does this make sense?
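
A rough sketch of the service described above, using two private fields (one per kind of column) rather than a dictionary of dictionaries; which of the two shapes is better is a design choice. Authentication and the query are left out, so the entries are simply passed to the constructor. AnalyticsService and its method names are hypothetical, and Dimension, Metric and DataEntry are stand-ins, not the Google library's real types.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins; the real library types are not shown in this thread.
public class Dimension { public string Name { get; set; } public string Value { get; set; } }
public class Metric { public string Name { get; set; } public string Value { get; set; } }

public class DataEntry
{
    public List<Dimension> Dimensions { get; set; }
    public List<Metric> Metrics { get; set; }
}

public class AnalyticsService
{
    // One column (array of row values) per dimension or metric name.
    private readonly Dictionary<string, string[]> _dimensions;
    private readonly Dictionary<string, string[]> _metrics;

    public AnalyticsService(IEnumerable<DataEntry> entries)
    {
        List<DataEntry> rows = entries.ToList();

        // Group every dimension value by its name; row order is kept within each group.
        _dimensions = rows
            .SelectMany(e => e.Dimensions)
            .GroupBy(d => d.Name)
            .ToDictionary(g => g.Key, g => g.Select(d => d.Value).ToArray());

        _metrics = rows
            .SelectMany(e => e.Metrics)
            .GroupBy(m => m.Name)
            .ToDictionary(g => g.Key, g => g.Select(m => m.Value).ToArray());
    }

    public string[] GetDimension(string name) { return _dimensions[name]; }

    public string[] GetMetric(string name) { return _metrics[name]; }
}
```

A method such as GetVisits would then just return GetMetric with whatever key the feed uses for visits. Both accessors throw KeyNotFoundException for an unknown name; TryGetValue could be used instead if that is not wanted.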

shapper · Nov 5, 2010

Hello,

Did my previous explanation help?

I created the following code that is working:

IDictionary<String, IDictionary<String, IList<String>>> data =
  new Dictionary<String, IDictionary<String, IList<String>>>();
data.Add("Dimensions", new Dictionary<String, IList<String>>());
data.Add("Metrics", new Dictionary<String, IList<String>>());

foreach (DataEntry entry in feed.Entries) {

  foreach (Dimension dimension in entry.Dimensions) {
    if (!data["Dimensions"].ContainsKey(dimension.Name))
      data["Dimensions"].Add(dimension.Name, new List<String>());
    data["Dimensions"][dimension.Name].Add(dimension.Value);
  }

  foreach (Metric metric in entry.Metrics) {
    if (!data["Metrics"].ContainsKey(metric.Name))
      data["Metrics"].Add(metric.Name, new List<String>());
    data["Metrics"][metric.Name].Add(metric.Value);
  }

}

Then I could, for example, access a metric as follows:
data["Metrics"]["Visits"]

What do you think about my code?

If you have any suggestions, just let me know.

Thank You,
Miguel

shapper · Nov 6, 2010

> So, does that mean that even though your examples all have just one
> dimension in a row, a given row could have multiple dimensions?

Yes, it could have 10 dimensions and 20 metrics.

I wrote at the start that it could have N dimensions ... But I admit
that my post was a little confusing, as I have been trying to figure
this out along the way myself.

> If so, does that also mean that the metrics within that row are each
> associated with one specific dimension in the row?

Yes, one row, i.e. one XML node, contains the value for each dimension
and for each metric.

> > [...]
> > I created the following code that is working:
> >
> > IDictionary<String, IDictionary<String, IList<String>>> data =
> >   new Dictionary<String, IDictionary<String, IList<String>>>();
> > data.Add("Dimensions", new Dictionary<String, IList<String>>());
> > data.Add("Metrics", new Dictionary<String, IList<String>>());
> >
> > foreach (DataEntry entry in feed.Entries) {
> >
> >   foreach (Dimension dimension in entry.Dimensions) {
> >     if (!data["Dimensions"].ContainsKey(dimension.Name))
> >       data["Dimensions"].Add(dimension.Name, new List<String>());
> >     data["Dimensions"][dimension.Name].Add(dimension.Value);
> >   }
> >
> >   foreach (Metric metric in entry.Metrics) {
> >     if (!data["Metrics"].ContainsKey(metric.Name))
> >       data["Metrics"].Add(metric.Name, new List<String>());
> >     data["Metrics"][metric.Name].Add(metric.Value);
> >   }
> >
> > }
> >
> > Then I could, for example, access a metric as follows:
> > data["Metrics"]["Visits"]
> >
> > What do you think about my code?
> >
> > Any suggestion just let me know.

> It seems fine to me, but I still (as in my previous reply) don't
> understand why you want to store the dimension and metric dictionaries
> in another dictionary. Why not just have two variables, referencing the
> dictionaries for dimension and metric?

Yes, that seems logical. But Dimensions and Metrics are not all there
is. I think (I didn't read a lot about this part) that there are
aggregate stats and others.

So instead of adding more fields I just have one Dictionary.
To me it seems better ... But again this is just a personal opinion.

> The other thing I still don't understand is that your description seems
> to imply that there would be some correlation between dimension and
> metric, but the above code doesn't address that at all. Are metrics not
> associated with dimensions in any way?

Yes, when you send this query:
Dimensions = "ga:date",
Metrics = "ga:visits,ga:bounces",

You will get Visits and Bounces by Date.
More metrics and dimensions can be added, but in the end the
number of rows will be the same for all.

So for each XML node, i.e. Data Entry, I will have one date (Dimension
ga:date) and two metrics (ga:visits and ga:bounces).
And this will repeat for each Data Entry.

What is confusing, at least for me, is that I would expect to get 3
vectors: one for ga:date, one for ga:visits and one for ga:bounces.

Wouldn't you?

But now I have figured this out, and it makes some sense since this is
coming from an XML file.
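
Since every column has one value per row, the separate vectors can be paired up again by position. The sketch below shows this with Enumerable.Zip (available from .NET 4). It assumes the metric values arrive as numeric strings; ColumnPairs and VisitsByDate are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ColumnPairs
{
    // Pairs the i-th date with the i-th visit count; both columns come from the same rows.
    public static List<KeyValuePair<string, int>> VisitsByDate(
        IEnumerable<string> dates, IEnumerable<string> visits)
    {
        return dates
            // Assumption: each visit value is a plain integer string.
            .Zip(visits, (date, count) =>
                new KeyValuePair<string, int>(date, int.Parse(count)))
            .ToList();
    }
}
```

Zip stops at the end of the shorter sequence, so if the columns ever differ in length the extra values are silently dropped; checking the lengths first would catch that.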
