[UAI] Re: DM: Doubts about user-session estimation

From: Ronny Kohavi (ronnyk@bluemartini.com)
Date: Thu Oct 04 2001 - 09:17:04 PDT

    Richard Dybowski wrote:

    > I have recently come across two on-line articles on Web-usage analysis that
    > throw a lot of doubt on the validity of attempting to identify user
    > sessions from the type of data that is currently recorded in Web server
    > logs. User-session identification is made difficult by a number of causes,
    > including caching, load balancing (which assigns multiple IP addresses
    > during the same user session), and the use of spiders. One of these
    > critical articles is by Stephen Turner (Cambridge University) [1], the
    > other is from Susan Haigh and Janette Megarity (National Library of Canada)
    > [2].
    >
    > Does anyone know if such a validation has been done?
    >

    One way to avoid the issue is to log at the application server layer,
    which sits above the webserver layer. The logged sessions are then
    guaranteed to be consistent with the sessions the user actually has.
    For example, see

        http://robotics.Stanford.EDU/~ronnyk/goodBadUglyKDDItrack.pdf
        http://robotics.Stanford.EDU/~ronnyk/integratingEcom.pdf
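
    As a rough illustration of why application-layer logging sidesteps
    the problem: if each logged event already carries the session ID the
    application assigned, sessionizing reduces to a group-by. The field
    names ("session_id", "page") and the sample events below are
    hypothetical and are not taken from the papers above.

```python
from collections import defaultdict


def group_by_session(events):
    """Group application-level events by the session ID the app assigned.

    Each event is assumed to be a dict with hypothetical keys
    "session_id" and "page".
    """
    sessions = defaultdict(list)
    for event in events:
        sessions[event["session_id"]].append(event["page"])
    return dict(sessions)


# Hypothetical events, interleaved the way concurrent users would be.
events = [
    {"session_id": "s1", "page": "/home"},
    {"session_id": "s2", "page": "/home"},
    {"session_id": "s1", "page": "/cart"},
]

pages_by_session = group_by_session(events)
```

    No guessing about IP addresses or time gaps is involved, because the
    session boundary comes from the application itself.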

    Sessionizing from weblogs cannot be done perfectly. If you can't
    log at the application server layer, you might try client-side logs or
    use heuristics. There are several articles on the topic at the WEBKDD
    workshops:

       http://robotics.Stanford.EDU/~ronnyk/WEBKDD2001/index.html
       http://robotics.Stanford.EDU/~ronnyk/WEBKDD2000/index.html

    A good article at the SIAM workshop is

        Berent, Mobasher, Spiliopoulou, and Wiltshire, "Measuring the
        Accuracy of Sessionizers for Web Usage Analysis," in the
        Proceedings of the Web Mining Workshop at the First SIAM
        International Conference on Data Mining, 2001.
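
    For readers who have only weblogs, here is a minimal sketch of one
    common heuristic, an inactivity timeout. It is not the method from
    any of the papers above. It assumes each request has already been
    reduced to a (visitor_key, timestamp, url) tuple, where visitor_key
    is some stand-in for the user such as IP address plus user agent,
    and it assumes a 30-minute cutoff. Caching, proxies, and load
    balancing all break the visitor_key assumption, so the result is an
    estimate only.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def sessionize(requests, timeout=timedelta(minutes=30)):
    """Split requests into sessions using an inactivity timeout.

    `requests` is an iterable of (visitor_key, timestamp, url) tuples.
    A new session starts whenever the gap since the same visitor's
    previous request exceeds `timeout` (an assumed cutoff).
    Returns a list of (visitor_key, [(timestamp, url), ...]) pairs.
    """
    by_visitor = defaultdict(list)
    for key, ts, url in requests:
        by_visitor[key].append((ts, url))

    sessions = []
    for key, hits in by_visitor.items():
        hits.sort()  # order each visitor's requests by time
        current = [hits[0]]
        for prev, hit in zip(hits, hits[1:]):
            if hit[0] - prev[0] > timeout:
                # Gap too long: close the current session, start a new one.
                sessions.append((key, current))
                current = []
            current.append(hit)
        sessions.append((key, current))
    return sessions


# Hypothetical log entries for two visitors.
log = [
    ("10.0.0.1|Mozilla", datetime(2001, 10, 4, 10, 0), "/home"),
    ("10.0.0.2|Lynx", datetime(2001, 10, 4, 10, 5), "/home"),
    ("10.0.0.1|Mozilla", datetime(2001, 10, 4, 10, 10), "/cart"),
    ("10.0.0.1|Mozilla", datetime(2001, 10, 4, 11, 0), "/home"),
]

result = sessionize(log)
```

    Here the first visitor's 50-minute gap splits their requests into
    two sessions, while the second visitor has one.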

      -- Ronny
