[UAI] Doubts about user-session estimation

From: Richard Dybowski (rdybowski@btinternet.com)
Date: Tue Oct 02 2001 - 14:17:27 PDT


    I have recently come across two on-line articles on Web-usage analysis that
    cast considerable doubt on the validity of attempting to identify user
    sessions from the type of data that is currently recorded in Web server
    logs. User-session identification is made difficult by a number of factors,
    including caching, load balancing (which can assign multiple IP addresses
    during the same user session), and the use of spiders. One of these
    critical articles is by Stephen Turner (Cambridge University) [1]; the
    other is by Susan Haigh and Janette Megarity (National Library of Canada)
    [2].
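
    To make the load-balancing problem concrete, here is a minimal sketch of
    a session counter that groups requests by IP address and starts a new
    session after a period of inactivity. The function name, the 30-minute
    timeout and the sample addresses are my own assumptions for illustration;
    they are not taken from either article or from any particular log
    analyser.

```python
from collections import defaultdict

TIMEOUT = 30 * 60  # seconds; an assumed inactivity cutoff, not from the cited articles


def count_sessions(hits, timeout=TIMEOUT):
    """Count sessions in (timestamp, ip) hits using an IP-plus-timeout heuristic."""
    by_ip = defaultdict(list)
    for timestamp, ip in hits:
        by_ip[ip].append(timestamp)

    sessions = 0
    for times in by_ip.values():
        times.sort()
        sessions += 1  # the first hit from this IP opens a session
        for prev, cur in zip(times, times[1:]):
            if cur - prev > timeout:
                sessions += 1  # long gap: treat as a new session
    return sessions


# One real visit of four requests, 60 seconds apart, from a fixed address.
direct_visit = [(0, "10.0.0.1"), (60, "10.0.0.1"), (120, "10.0.0.1"), (180, "10.0.0.1")]
# The same visit when a hypothetical proxy pool changes the address between requests.
proxied_visit = [(0, "10.0.0.1"), (60, "10.0.0.2"), (120, "10.0.0.3"), (180, "10.0.0.1")]

direct_count = count_sessions(direct_visit)
proxied_count = count_sessions(proxied_visit)
print(direct_count, proxied_count)  # prints: 1 3
```

    The single visit is counted once when the address is stable and three
    times when it is not, although nothing about the user's behaviour has
    changed.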

    Haigh & Megarity have described user-session estimations as "at best, gross
    estimates". It seems to me that what is needed is a systematic validation
    of the efficacy of the various Web-analysis algorithms currently available.
    This could be done by simulating log-file data from known transactions and
    measuring how well each algorithm recovers those transactions from
    the data. This should be repeated using a wide range of hypothetical
    scenarios, such as very frequent load balancing (as occurs in reality with
    AOL users).
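
    The kind of validation I have in mind could be sketched roughly as
    follows. Everything here is hypothetical: the simulator, its parameters
    (rotate_prob, pool_size), the proxy naming scheme and the simple
    IP-plus-timeout estimator are stand-ins chosen for illustration, not a
    model of any real analyser or of AOL's actual proxy behaviour.

```python
import random


def simulate_log(n_users, hits_per_session, rotate_prob, pool_size, rng):
    """Return (true_session_count, log), where log is a list of (timestamp, ip).

    Each simulated user makes exactly one session. With probability
    rotate_prob a request is relabelled with a random address from a
    shared proxy pool, a crude stand-in for load balancing.
    """
    log = []
    for user in range(n_users):
        home_ip = f"user-{user}"
        start = user * 10  # start times are staggered so sessions overlap
        for k in range(hits_per_session):
            ip = home_ip
            if rng.random() < rotate_prob:
                ip = f"proxy-{rng.randrange(pool_size)}"
            log.append((start + 60 * k, ip))
    return n_users, log


def estimate_sessions(log, timeout=1800):
    """IP-plus-timeout heuristic: a new session starts after a long gap."""
    last_seen = {}
    sessions = 0
    for timestamp, ip in sorted(log):
        if ip not in last_seen or timestamp - last_seen[ip] > timeout:
            sessions += 1
        last_seen[ip] = timestamp
    return sessions


def relative_error(rotate_prob, seed=0):
    """Signed error of the estimate relative to the known session count."""
    rng = random.Random(seed)
    truth, log = simulate_log(50, 8, rotate_prob, 5, rng)
    return (estimate_sessions(log) - truth) / truth


# Sweep one scenario parameter; a real study would vary many more.
results = {p: relative_error(p) for p in (0.0, 0.25, 0.5, 1.0)}
for p, err in results.items():
    print(f"rotate_prob={p:.2f}  relative error={err:+.2f}")
```

    Because the true number of sessions is known by construction, the error
    of the estimator can be measured directly for each scenario. In this toy
    setup the estimate is exact when no requests are rerouted and falls well
    below the truth when every request goes through the shared pool, since
    many users then collapse onto a few addresses.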

    Does anyone know if such a validation has been done?

    Richard

    References
    ---------------

    [1] S. Turner. "Analog 5.03: How the Web Works".
    http://www.analog.cx/docs/webworks.html [7 July 2001]

    [2] S. Haigh, J. Megarity. "Measuring Web Site Usage: Log File Analysis".
    http://www.nlc-bnc.ca/9/1/p1-256-e.html [4 August 1998]

    -------------------------------
    Richard Dybowski, 143 Village Way, Pinner, Middlesex HA5 5AA, UK
    Tel (mobile): 079 76 25 00 92


