Document Type

Conference Paper

Department/Unit

Department of Computer Science

Title

Mining Web site's clusters from link topology and site hierarchy

Language

English

Abstract

Foraging for information in large and complex Web sites using keyword search alone usually results in an unpleasant experience because of the overwhelming number of search results. To support more effective information search, descriptive abstractions of Web sites (e.g., sitemaps) are generally needed. However, their creation and maintenance normally require recurrent manual effort because Web content changes quickly. We extend the HITS algorithm and integrate hyperlink topology and Web site hierarchy to identify a hierarchy of Web page clusters as the abstraction of a Web site. As the algorithm is based on HITS, each identified cluster follows the bipartite graph structure, with an authority and hub pair as the cluster summary. The effectiveness of the algorithm has been evaluated using three different Web sites (containing ~6000-14000 Web pages) with promising results. A detailed interpretation of the experimental results and a qualitative comparison with related work are also included.

Keywords

Topology, Clustering algorithms, Iterative algorithms, Web pages, Bipartite graph, Algorithm design and analysis, Search engines, Sun, Computer science, Keyword search

Publication Date

10-2003

Source Publication Title

Proceedings of the IEEE/WIC International Conference on Web Intelligence (WI’03)

Start Page

271

End Page

277

Conference Location

Halifax, Canada

Publisher

IEEE

Peer Reviewed

Yes

Funder

This research is supported by UST AoE-IT Grant UST/AOE/01-02/1.

DOI

10.1109/WI.2003.1241204

Link to Publisher's Edition

http://dx.doi.org/10.1109/WI.2003.1241204

ISBN (print)

9780769519326

