-
November 29th, 2008, 11:04 AM
#1
Finding most repeated sequence
Dear all,
I have computed the user sessions of a website. That is, I can represent each user's behaviour as a sequence of requested pages.
What I would like now is to be able to recognise the most repeated sequences of pages. Can you give me hints on the best way to implement this? Is there any open source library that could help me code this? (I'm programming in Java.)
I guess using a tree structure will help. However, if I end up with a weighted tree where each edge carries a frequency, i.e. the number of user sequences that have passed through it, how can I discover the most repeated path?
Thank you in advance!
-
November 30th, 2008, 06:50 AM
#2
Re: Finding most repeated sequence
Why do you want this information?
It won't be a tree, simply because people don't always browse linearly - it is, after all, a web of hypertext not a linear document. So expect branches and cycles (go to a forum index, open some posts to read in tabs, read the posts, reply to some, go back to forum index, repeat).
So it could be quite a complex path, more than can be represented by just incrementing the weight on each link/edge between pages/nodes.
On the other hand, a simple Markov model might be all you need - what is the most likely transition from a given page? That might tell you whether users can find the links between your pages, rather than relying on Google, but it isn't quite the same as a detailed record of how a user browses the site.
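A rough sketch of that idea in Java, assuming each session is already a list of integer page IDs. The class and method names (TransitionCounter, addSession, mostLikelyNext, probability) are made up for illustration and do not come from any library. It only counts first-order transitions, i.e. which page was requested right after which:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: first-order transition counts between page IDs.
class TransitionCounter {
    // counts.get(from).get(to) = how often "to" was requested right after "from"
    private final Map<Integer, Map<Integer, Integer>> counts = new HashMap<>();

    // Record every consecutive pair of pages in one session.
    void addSession(List<Integer> session) {
        for (int i = 0; i + 1 < session.size(); i++) {
            counts.computeIfAbsent(session.get(i), k -> new HashMap<>())
                  .merge(session.get(i + 1), 1, Integer::sum);
        }
    }

    // Most frequent next page after "from", or null if "from" was never
    // followed by another page. Ties are broken arbitrarily.
    Integer mostLikelyNext(int from) {
        Map<Integer, Integer> next = counts.get(from);
        if (next == null) {
            return null;
        }
        Integer best = null;
        int bestCount = 0;
        for (Map.Entry<Integer, Integer> e : next.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    // Estimated probability of going to "to" directly after "from".
    double probability(int from, int to) {
        Map<Integer, Integer> next = counts.get(from);
        if (next == null) {
            return 0.0;
        }
        int total = 0;
        for (int c : next.values()) {
            total += c;
        }
        return next.getOrDefault(to, 0) / (double) total;
    }
}
```

Feed it one addSession call per user session, then ask mostLikelyNext for any page you care about. Since it only looks one step back, it says nothing about longer paths.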
-
November 30th, 2008, 09:18 AM
#3
Re: Finding most repeated sequence
If all you are interested in is the series of links a user clicked while browsing the site, not the actual activity the user has performed, you can code each page (or link, if that's what you're interested in) with a distinct ID, like
1,2,3, ... and so on.
Now, you can code each users path (even a circular path) in the website by a string that may look like:
"1,3,25,4,4,5,1,3,22,25"
Each distinct path is guaranteed to map to a distinct string, so you can insert each user's path into a hash, where the keys are the paths and the values are the number of times the specific path was used. If you are interested in the sub-paths too, you can insert all of them into the hash as well. The frequency of a path is then simply the number of times it was walked - its value in the hash.
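Something along these lines in Java, as a sketch only. It assumes sessions are lists of integer page IDs, and the names (SubPathCounter, countSubPaths, mostRepeated) plus the minLen/maxLen limits are my own additions to keep the number of sub-paths manageable. It counts every contiguous sub-path, so a sub-path walked twice in one session is counted twice:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: counts contiguous sub-paths using comma-joined keys.
class SubPathCounter {

    // Counts every contiguous sub-path of minLen..maxLen pages in all sessions.
    static Map<String, Integer> countSubPaths(List<List<Integer>> sessions,
                                              int minLen, int maxLen) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<Integer> session : sessions) {
            for (int start = 0; start < session.size(); start++) {
                StringBuilder key = new StringBuilder();
                // Extend the sub-path one page at a time, up to maxLen pages.
                for (int end = start;
                     end < session.size() && end - start < maxLen; end++) {
                    if (end > start) {
                        key.append(',');
                    }
                    key.append(session.get(end));
                    if (end - start + 1 >= minLen) {
                        counts.merge(key.toString(), 1, Integer::sum);
                    }
                }
            }
        }
        return counts;
    }

    // Key with the highest count, or null for an empty map.
    // Ties are broken arbitrarily.
    static String mostRepeated(Map<String, Integer> counts) {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```

With the example path above and sub-paths of 2 to 3 pages, the key "1,3" ends up with a count of 2 and every other key with 1. Without an upper limit on the length, the number of sub-paths grows quadratically with session length, so the hash can get large.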
Regards,
Zachm
Last edited by Zachm; November 30th, 2008 at 09:21 AM.