Extracting Rules for Diagnosis of Diabetes Using Genetic Programming
Background: Diabetes is a global health challenge that cusses high incidence of major social and economic consequences. As such, early prevention or identification of those people at risk is crucial for reducing the problems caused by it. The aim of study was to extract the rules for diabetes diagnosing using genetic programming.
Methods: This study utilized the PIMA dataset of the University of California, Irvine. This dataset consists of the information of 768 Pima heritage women, including 500 healthy persons and 268 persons with diabetes. Regarding the missing values and outliers in this dataset, the K-nearest neighbor and k-means methods are applied respectively. Moreover, a genetic programming model (GP) was conducted to diagnose diabetes as well as to determine the most important factors affecting it. Accuracy, sensitivity and specificity of the proposed model on the PIMA dataset were obtained as 79.32, 58.96 and 90.74%, respectively.
Results: The experimental results of our model on PIMA revealed that age, PG concentration, BMI, Tri Fold Thick and Serum Ins were effective in diabetes mellitus and increased risk of diabetes. In addition, the good performance of the model coupled with the simplicity and comprehensiveness of the extracted rules is also shown by the experimental results.
Conclusions: GPs can effectively implement the rules for diagnosing diabetes. Both BMI and PG Concentration are also the most important factors to increase the risk of suffering from diabetes.
Keywords: Diabetes, PIMA, Genetic programming, KNNi, K-means, Missing value, Outlier detection, Rule extraction.
The Copyright Form should be downloaded and signed by corresponding author in the fourth step "upload supplementary files" during submission process.
After acceptance, copyright form should be downloaded and signed by all authors one by one ( "summery --> supp. file" part and click on "add a supplementary file" link).