Department of Computer Science
Learning hidden Markov model topology based on KL divergence for information extraction
Information extraction systems based on rule-based pattern matching have long been used to locate information embedded in documents. To further improve extraction generalization, the hidden Markov model (HMM) has recently been adopted for modeling temporal variations of the target patterns, with promising results. In this paper, a state-merging method is adopted for learning the HMM topology using a localized Kullback-Leibler (KL) divergence. The proposed system has been applied to a set of domain-specific job advertisements, and preliminary experiments show promising results.
Hidden Markov Model, Information Extraction, Kullback-Leibler, Target Pattern, Information Extraction System
Source Publication Title
Advances in Knowledge Discovery and Data Mining 8th Pacific-Asia Conference, PAKDD 2004, Sydney, Australia, May 26-28, 2004. Proceedings
Dai, Honghua ; Srikant, Ramakrishnan ; Zhang, Chengqi
Lecture Notes in Computer Science, 3056; Lecture Notes in Artificial Intelligence
© Springer-Verlag Berlin Heidelberg 2004
Au, Kwok-Chung, and Kwok-Wai Cheung. "Learning hidden Markov model topology based on KL divergence for information extraction." Advances in Knowledge Discovery and Data Mining 8th Pacific-Asia Conference, PAKDD 2004, Sydney, Australia, May 26-28, 2004. Proceedings (2004): 590-594.