Document Type

Journal Article

Department/Unit

Department of Computer Science

Title

Identifying a hierarchy of bipartite subgraphs for web site abstraction

Language

English

Abstract

The Web is transforming from a mere information dissemination platform into a distributed knowledge-based platform that supports complex problem solving. However, much of the knowledge on the existing Web is tagged only with layout-related markup, which makes it hard to discover and use. In this paper, we propose to model the semantically rich, self-contained knowledge units embedded in a web site as a mixture of bipartite subgraphs, and to extract these subgraphs as an abstraction of the web site by analyzing its hyperlink structure and file hierarchy. We derive a recursive algorithm, named ReHITS, that identifies bipartite subgraphs organized in a hierarchy. Each identified subgraph contains a set of associated authorities and hubs that serves as its summarized semantic description. The effectiveness of the algorithm has been evaluated on three real web sites (containing about 10,000 web pages) with promising results. A detailed interpretation of the experimental results and a qualitative comparison with related work are also included.
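The abstract does not spell out ReHITS itself. As background only, the following is a minimal sketch of the classic HITS hub/authority iteration (Kleinberg's algorithm, listed in the keywords), which the name ReHITS suggests is the building block applied recursively. It is not the authors' method: the function name `hits`, the dictionary-of-out-links input, the L2 normalization, and the `iterations`/`tol` parameters are all illustrative assumptions.

```python
import math


def hits(links, iterations=50, tol=1e-9):
    """Classic HITS (not ReHITS). `links` maps each page to the pages it links to.

    Returns (hub, authority) score dictionaries, each L2-normalized.
    """
    # Collect every page that appears as a source or a target.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)

    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}

    for _ in range(iterations):
        # Authority update: sum of hub scores of the pages linking in.
        new_auth = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]

        # Hub update: sum of the new authority scores of the pages linked to.
        new_hub = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for dst in targets:
                new_hub[src] += new_auth[dst]

        # Normalize both vectors to unit L2 length (guard against all-zero input).
        a_norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        new_auth = {n: v / a_norm for n, v in new_auth.items()}
        new_hub = {n: v / h_norm for n, v in new_hub.items()}

        # Stop early once the scores no longer change noticeably.
        delta = sum(abs(new_auth[n] - auth[n]) + abs(new_hub[n] - hub[n]) for n in nodes)
        hub, auth = new_hub, new_auth
        if delta < tol:
            break

    return hub, auth


if __name__ == "__main__":
    # Hypothetical toy site: index pages h1..h3 pointing at content pages a1..a3.
    links = {"h1": ["a1", "a2", "a3"], "h2": ["a1", "a2"], "h3": ["a1"]}
    hub, auth = hits(links)
    print(sorted(auth, key=auth.get, reverse=True)[:3])  # → ['a1', 'a2', 'a3']
```

On this toy graph the pages form a single bipartite block: `h1` emerges as the strongest hub and `a1` as the strongest authority. Per the abstract, ReHITS goes further, identifying several such hub/authority groups and organizing them hierarchically with the help of the site's file hierarchy; that step is not reproduced here.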

Keywords

Web structure mining, web site abstraction, HITS algorithm, knowledge discovery

Publication Date

2007

Source Publication Title

Web Intelligence and Agent Systems

Volume

5

Issue

3

Start Page

343

End Page

355

Publisher

IOS Press

Peer Reviewed

Yes

Link to Publisher's Edition

https://content.iospress.com/articles/web-intelligence-and-agent-systems-an-international-journal/wia00120

ISSN (print)

1570-1263
