Document Type

Journal Article

Department/Unit

Department of Mathematics; Department of Computer Science; Institute of Computational and Theoretical Studies

Language

English

Abstract

Background: In medical data sets, samples commonly fall into a minority (positive or abnormal) group and a majority (negative or normal) group, and the cost of misclassifying a minority sample as a majority sample is very high. This is the so-called imbalanced classification problem. Traditional classification functions can be seriously affected by the skewed class distribution in the data. To deal with this problem, an a priori cost is often used to adjust the learning process in pursuit of an optimal classification function. However, this a priori cost is often unknown and hard to estimate in medical decision making.
Methods: In this paper, we propose a new learning method, named RankCost, to classify imbalanced medical data without using an a priori cost. Instead of focusing on improving class-prediction accuracy, RankCost maximizes the difference between the minority class and the majority class by using a scoring function, which translates the imbalanced classification problem into a partial ranking problem. The scoring function is learned via a non-parametric boosting algorithm.
Results: We compare RankCost to several representative approaches on four medical data sets varying in size, imbalance ratio, and dimension. The experimental results demonstrate that, unlike currently available methods, which often perform unevenly with different a priori costs, RankCost shows comparable performance in a consistent manner.
Conclusions: Learning an effective classification model from imbalanced data is a challenging task in medical data analysis. Traditional approaches often use an a priori cost to adjust the learning of the classification function. This work presents a novel approach, namely RankCost, for learning from imbalanced medical data sets without using an a priori cost. The experimental results indicate that RankCost performs very well in imbalanced data classification and can be a useful method in real-world applications of medical decision making.
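The partial-ranking idea in the Methods can be illustrated with a minimal sketch: learn a scoring function so that every minority sample scores above every majority sample, with no misclassification cost supplied. This is not the RankCost algorithm itself. The abstract states that RankCost learns its scoring function with a non-parametric boosting algorithm; the sketch below swaps in a plain linear scorer trained by gradient descent on a pairwise logistic loss, purely for brevity. The function names (`pairwise_rank_fit`, `score`, `pairwise_auc`) and the toy data are hypothetical.

```python
import math


def score(w, x):
    """Linear scoring function s(x) = w . x (an assumption of this sketch)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def pairwise_rank_fit(pos, neg, epochs=200, lr=0.1):
    """Fit w so that minority rows (pos) score above majority rows (neg).

    Minimizes the mean pairwise logistic loss log(1 + exp(-(s(p) - s(n))))
    over all (minority, majority) pairs by plain gradient descent.
    No class-dependent cost is used anywhere.
    """
    dim = len(pos[0])
    w = [0.0] * dim
    n_pairs = len(pos) * len(neg)
    for _ in range(epochs):
        grad = [0.0] * dim
        for p in pos:
            for n in neg:
                diff = [pi - ni for pi, ni in zip(p, n)]
                margin = score(w, diff)
                # Clamp to keep math.exp within floating-point range.
                margin = max(-50.0, min(50.0, margin))
                # d/dw log(1 + exp(-margin)) = -diff / (1 + exp(margin))
                coeff = -1.0 / (1.0 + math.exp(margin))
                for j in range(dim):
                    grad[j] += coeff * diff[j]
        w = [wi - lr * gj / n_pairs for wi, gj in zip(w, grad)]
    return w


def pairwise_auc(w, pos, neg):
    """Fraction of (minority, majority) pairs ranked correctly; ties count half."""
    correct = 0.0
    for p in pos:
        for n in neg:
            sp, sn = score(w, p), score(w, n)
            if sp > sn:
                correct += 1.0
            elif sp == sn:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy imbalanced data (hypothetical): 3 minority rows, 10 majority rows.
minority = [[2.0, 1.5], [1.8, 2.2], [2.5, 1.9]]
majority = [
    [0.1, -0.2], [-0.5, 0.3], [0.4, 0.6], [-0.3, -0.7], [0.8, 0.1],
    [0.0, 0.9], [-0.9, 0.2], [0.6, -0.4], [1.0, 0.8], [-0.2, 0.5],
]

w = pairwise_rank_fit(minority, majority)
auc = pairwise_auc(w, minority, majority)
print("weights:", w)
print("pairwise ranking accuracy:", auc)
```

The loss depends only on score differences between minority and majority samples, so the class imbalance enters through the set of pairs rather than through a user-supplied cost. A decision threshold on the score can be chosen afterwards if a hard classification is needed.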

Keywords

Classification, Imbalanced data, Medical decision making, Partial ranking

Publication Date

12-2014

Source Publication Title

BMC Medical Informatics and Decision Making

Volume

14

Start Page

111

Publisher

BioMed Central

Peer Reviewed

Yes

Copyright

© 2014 Wan et al.

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Funder

This work was supported by the Hong Kong Baptist University Strategic Development Fund, Hong Kong Baptist University grant FRG1/12-13/065, Hong Kong Research grant HKBU202711, and Hong Kong Research grant HKBU12202114.

DOI

10.1186/s12911-014-0111-9

Link to Publisher's Edition

http://dx.doi.org/10.1186/s12911-014-0111-9

ISSN (print)

1472-6947

ISSN (electronic)

1472-6947
