Re: [UAI] web text classification applications in real life

From: Kevin S. Van Horn (ksvhsoft@xmission.com)
Date: Sun Mar 26 2000 - 08:22:40 PST

Next message: Denver Dash: "Calculating joint over arbitrary sets of variables"

Previous message: Russ Greiner: "[UAI] Variance (ErrorBars) in Belief Nets"
Maybe in reply to: Haipeng Guo: "[UAI] web text classification applications in real life"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

    Does anyone know where to find the information about web text
   classification applications in Yahoo! or Excite? How do they
   automatically do the text categorizing? How did they do it in the very
   beginning? How's their history and also current evolving?

I can tell you something about Excite -- I was employee #10 there and the
technical lead for NewsTracker (in the beginning, I was more or less the entire
NewsTracker project). I left in April of 1997, but while I was there,
both search and the NewsTracker news story classification were based on a
variant of the vector-space model. In the vector-space model you represent a
document by a normalized vector of word counts, with less-common words
weighted more heavily. A query has the same representation, and you score
documents by the angle between the query vector and the document vector: the
smaller the angle, the better the match. A Boolean query is basically a
filtering predicate (the Boolean part) combined with a ranking step: the words
in the query are used to construct the vector that ranks those documents that
make it through the filter.
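
To make the description above concrete, here is a minimal sketch of the
generic vector-space model in Python. It is not Excite's actual code or its
particular variant. The function names (tokenize, build_idf, to_vector,
cosine, rank), the whitespace tokenizer, the smoothed logarithmic rarity
weight, and the must_contain argument standing in for the Boolean filter are
all assumptions made for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase and split on whitespace; a real system does more.
    return text.lower().split()

def build_idf(docs):
    # Weight each word by rarity: fewer documents containing it, higher weight.
    # Assumption: a smoothed log inverse document frequency, one common choice.
    n = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    return {w: math.log((1 + n) / (1 + df)) + 1.0 for w, df in doc_freq.items()}

def to_vector(text, idf):
    # Weighted word counts, scaled to unit length (the "normalized vector").
    counts = Counter(tokenize(text))
    vec = {w: c * idf[w] for w, c in counts.items() if w in idf}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {w: v / norm for w, v in vec.items()} if norm else {}

def cosine(a, b):
    # For unit vectors the dot product is the cosine of the angle between
    # them, so a larger value means a smaller angle and a better match.
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(w, 0.0) for w, v in a.items())

def rank(query, docs, idf, must_contain=()):
    # Boolean part: keep only documents containing every required word.
    # Ranking part: order the survivors by cosine score against the query.
    q = to_vector(query, idf)
    results = []
    for i, doc in enumerate(docs):
        if not set(must_contain) <= set(tokenize(doc)):
            continue
        results.append((cosine(q, to_vector(doc, idf)), i))
    return sorted(results, reverse=True)
```

Calling rank("stock market", docs, build_idf(docs)) returns (score, index)
pairs with the best match first. Passing must_contain=("market",) drops every
document that lacks that word before any scoring happens, which mirrors the
filter-then-rank split described above.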

There's a book by Salton (don't recall the title just now) that's a good
reference on this stuff.

This archive was generated by hypermail 2b29 : Sun Mar 26 2000 - 08:32:05 PST