Need help with algorithm

MJ

Hello, I need some suggestions on how to do this.

I am reading through a large IIS log. For each line, I get three data points
(sessionID, Time, Stage). I need to store these values in some collection,
but I'm not sure which one. Here is how they are related.

For each sessionID, there is one or more Time points. For each Time point
related to the sessionID, there is one Stage.

So we could have something like this:

Sid, Time, Stage
123, 12:00, search
543, 12:00, addToCart
123, 12:01, addToWishlist
657, 12:03, fillForm
123, 12:06, logIn
543, 12:06, logOut

In the end, I need this:

Sid: 123
search->addToWishlist->logIn

Sid: 543
addToCart->logOut

Sid: 657
fillForm

What I want to do is use the Time to sort the stages into the order in which
the user performed the steps. I'm at the point where I am looping through each
line of the log and using regular expressions to get the 3 data points. Now
how should I store them? Something tells me a Hashtable is the way to go,
but I don't know how to implement it using 3 data points. I think Sid+Time
should be the unique key, with the Stage being the value for the hash.

So far I've tried creating two Hashtables (X, Y). X has Time as the key and
Stage as the value. Y has SessionID as the key and X as the value, but this
isn't working because Time is not unique, since other users are doing stuff
at the same time.

Any suggestions?
 
Jon Shemitz

MJ said:
For each sessionID, there is one or more Time points. For each Time point
related to the sessionID, there is one Stage.
Sid: 123
search->addToWishlist->logIn

Sid: 543
addToCart->logOut

Sid: 657
fillForm

This suggests that what you want is a Hashtable (or Dictionary), keyed
by session ID, and holding ArrayLists (or List<>s) of Time/Stage
pairs.

Dictionary<SessionID, List<TimeStagePair>>
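
A minimal C# sketch of that structure, for illustration only. It assumes each log line has already been split into a string array of sid, time, and stage, and that the time parses with TimeSpan.Parse. The TimeStagePair struct and the SessionPaths class are hypothetical names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type standing in for the "TimeStagePair" named above.
public struct TimeStagePair
{
    public TimeSpan Time;
    public string Stage;

    public TimeStagePair(TimeSpan time, string stage)
    {
        Time = time;
        Stage = stage;
    }
}

// Hypothetical helper class; rows are assumed to be { sid, time, stage }.
public static class SessionPaths
{
    // Groups the rows by session ID, one list of Time/Stage pairs per session.
    public static Dictionary<string, List<TimeStagePair>> Group(
        IEnumerable<string[]> rows)
    {
        var bySession = new Dictionary<string, List<TimeStagePair>>();
        foreach (string[] row in rows)
        {
            List<TimeStagePair> steps;
            if (!bySession.TryGetValue(row[0], out steps))
            {
                steps = new List<TimeStagePair>();
                bySession[row[0]] = steps;
            }
            steps.Add(new TimeStagePair(TimeSpan.Parse(row[1]), row[2]));
        }
        return bySession;
    }

    // Orders one session's steps by time and joins the stages with "->".
    // OrderBy is a stable sort, so steps with equal times keep log order.
    public static string Describe(List<TimeStagePair> steps)
    {
        return string.Join("->",
            steps.OrderBy(s => s.Time).Select(s => s.Stage));
    }
}
```

With the sample rows from the question, Describe on the list stored under "123" should give search->addToWishlist->logIn. Because the session ID is the only dictionary key, two users acting at the same time never collide.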
 
Otis Mukinfus

Based on what I am seeing in your example data, none of the three data
elements appears to be unique on its own.

If you are only going to use the list to step through a loop, using the Time
element would make sense. However, using the SID and Time for the key might not
work unless Time has finer resolution than one minute. Users can do a lot of
things with a computer within one minute, so you would probably end up with
duplicate keys if the resolution were only one minute. That combination MIGHT
be safe as a key if the resolution of the log is in ticks or microseconds.
Good luck with your project,

Otis Mukinfus
http://www.arltex.com
http://www.tomchilders.com
 
MS

Thanks, Otis. The time actually includes the seconds as well; I just neglected
to put them in my example. So the format is hh:mm:ss. The only way a user can
do the same action in the same second is if they accidentally double-click on
a link. I would like to ignore these.
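
One way to ignore those double-clicks, sketched in C# under the assumption that a double-click shows up as an exact repeat of the same sid, time, and stage. The DoubleClickFilter class is a hypothetical name, and the value-tuple syntax needs C# 7 or later.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; rows are assumed to be { sid, time, stage }.
public static class DoubleClickFilter
{
    // Returns the rows in their original order, keeping only the first
    // occurrence of each (sid, time, stage) combination.
    public static List<string[]> Distinct(IEnumerable<string[]> rows)
    {
        // Value tuples compare field by field, so they work as set members.
        var seen = new HashSet<(string, string, string)>();
        var kept = new List<string[]>();
        foreach (string[] row in rows)
        {
            // HashSet.Add returns false when the combination is already present.
            if (seen.Add((row[0], row[1], row[2])))
            {
                kept.Add(row);
            }
        }
        return kept;
    }
}
```

Two different stages by the same session in the same second are both kept here; only identical triples are dropped.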
 
MS

Thanks Jon. I'll look into this.


Jon Shemitz said:
This suggests that what you want is a Hashtable (or Dictionary), keyed
by session ID, and holding ArrayLists (or List<>s) of Time/Stage
pairs.

Dictionary<SessionID, List<TimeStagePair>>

--

.NET 2.0 for Delphi Programmers
www.midnightbeach.com/.net
What you need to know.
 
MS

By the way, I was able to do this in Perl with someone's help. I'm just
trying to convert it to a .NET console app.

For those here who know Perl, this might shed some light on what I'm trying
to duplicate:

#Add the unique combinations into a hash
($userHash{$iisSID}{$iisTime}{$iisStageID})++;

From what I've been told, the "++" at the end is supposed to get rid of
duplicate combinations of sid, time, and stage. In other words, get rid of
the double-clicks.
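
A rough C# counterpart to that Perl line, as a sketch rather than a definitive port. In Perl, a repeated sid/time/stage combination lands in the same nested hash slot, so it is the keys that collapse the duplicates, while the "++" only counts how often each combination was seen. The UserHash class below is a hypothetical name; it uses a SortedDictionary for the time level (an assumption beyond the Perl original) so that the stages come back in time order.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class mirroring $userHash{$sid}{$time}{$stage}++.
public class UserHash
{
    // sid -> time (kept sorted) -> stage -> number of times seen
    private readonly Dictionary<string, SortedDictionary<TimeSpan, Dictionary<string, int>>> data =
        new Dictionary<string, SortedDictionary<TimeSpan, Dictionary<string, int>>>();

    public void Add(string sid, TimeSpan time, string stage)
    {
        SortedDictionary<TimeSpan, Dictionary<string, int>> byTime;
        if (!data.TryGetValue(sid, out byTime))
        {
            byTime = new SortedDictionary<TimeSpan, Dictionary<string, int>>();
            data[sid] = byTime;
        }
        Dictionary<string, int> byStage;
        if (!byTime.TryGetValue(time, out byStage))
        {
            byStage = new Dictionary<string, int>();
            byTime[time] = byStage;
        }
        int count;
        byStage.TryGetValue(stage, out count); // count stays 0 for a new stage
        byStage[stage] = count + 1;
    }

    // How many times this exact combination was added (0 if never).
    public int Count(string sid, TimeSpan time, string stage)
    {
        SortedDictionary<TimeSpan, Dictionary<string, int>> byTime;
        Dictionary<string, int> byStage;
        int count = 0;
        if (data.TryGetValue(sid, out byTime) && byTime.TryGetValue(time, out byStage))
        {
            byStage.TryGetValue(stage, out count);
        }
        return count;
    }

    // Stages for one session in time order, joined with "->".
    // Throws KeyNotFoundException if the sid was never added.
    public string Path(string sid)
    {
        var stages = new List<string>();
        foreach (Dictionary<string, int> byStage in data[sid].Values)
        {
            // Order of different stages within one timestamp is not guaranteed.
            stages.AddRange(byStage.Keys);
        }
        return string.Join("->", stages);
    }
}
```

Adding the same combination twice leaves a single entry with a count of 2, so a double-click appears only once in the path.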
 
