Real-data comparison of data mining methods in prediction of coronary artery disease in Iran
Introduction: Cardiovascular diseases are highly prevalent and are among the major causes of mortality in many societies. Angiography is one of the most accurate methods for diagnosing heart disease, but it is expensive and carries side effects. Data mining aims to enable timely prognosis at the lowest possible cost by making use of patients’ information. The present study asks whether data mining techniques can predict coronary artery disease with higher efficiency and fewer errors, and identify the factors that influence the disease.
Method: The data were collected from 303 people referred to the heart unit of Shahid Rajaie hospital (an Iranian hospital) between 2011 and 2013, and comprised 54 features. The aim was to take advantage of a larger number of characteristics that are helpful for diagnosis. Information Gain, Gini, and SVM weighting methods were applied to select influential features, and the variables with the highest weights were chosen for modeling. In the modeling phase, a combination of classification algorithms and ensemble methods was applied to obtain predictions with fewer errors. The study was conducted using RapidMiner software.
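The filter-style weighting described in the Method can be illustrated with a minimal sketch. It computes Information Gain and Gini-based weights for categorical features and ranks them. The records, feature names, and function names below are hypothetical toy values chosen for illustration; they are not the study data, and this is not the RapidMiner implementation. SVM-based weighting is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_reduction(values, labels, impurity):
    """Impurity of the labels minus the weighted impurity after
    splitting on each distinct value of one feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    weighted = sum(len(g) / n * impurity(g) for g in groups.values())
    return impurity(labels) - weighted


def rank_features(rows, labels, impurity):
    """Return (feature, weight) pairs sorted from highest to lowest weight."""
    scores = {
        name: impurity_reduction([row[name] for row in rows], labels, impurity)
        for name in rows[0]
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy records (NOT the study data): 1 = present, 0 = absent.
rows = [
    {"typical_chest_pain": 1, "diabetes": 1},
    {"typical_chest_pain": 1, "diabetes": 0},
    {"typical_chest_pain": 0, "diabetes": 1},
    {"typical_chest_pain": 0, "diabetes": 0},
]
labels = ["cad", "cad", "normal", "normal"]

info_gain_ranking = rank_features(rows, labels, entropy)
gini_ranking = rank_features(rows, labels, gini)
print(info_gain_ranking)
print(gini_ranking)
```

In this toy example the first feature separates the two classes perfectly and therefore receives the highest weight under both criteria, while the second carries no information about the class. In a workflow like the one described, only the highest-weighted features would be passed on to the classifiers.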
Results: The proposed model achieved its highest efficiency, 95.83%, when features were weighted by the SVM index. This model was also able to correctly identify all patients with coronary artery disease in the study data. According to the accuracies obtained, weighting with SVM was the most effective filtering method, and age together with typical and atypical chest pain were identified as the most influential features for coronary artery disease (Graph 3).
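To make the relationship between overall accuracy and the detection of all diseased patients concrete, the sketch below computes accuracy, sensitivity, and specificity from confusion-matrix counts. The counts are hypothetical and are not the study's results; the function name is likewise an assumption introduced for illustration.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp: diseased patients classified as diseased
    fn: diseased patients classified as healthy
    tn: healthy patients classified as healthy
    fp: healthy patients classified as diseased
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts (NOT the study's confusion matrix).
metrics = classification_metrics(tp=8, fn=0, tn=3, fp=1)
print(metrics)
```

With no false negatives, sensitivity is 1.0 even though accuracy is below 100%. Under that reading, any remaining errors are healthy people classified as diseased.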
Conclusion: This study can contribute to identifying the influential factors that lead to cardiovascular disease in Iran. Comparison of the influential variables showed that chest pain (both typical and atypical) and patient’s age had the highest weights in this study. This suggests that coronary artery disease is more likely to occur at older ages. High blood pressure is also an important factor in the development of this disease, so preventive measures should be taken. Diabetes is another influential factor in the development of coronary artery disease and should be given attention in primary tests.
Keywords: Cardiovascular disease, Coronary Artery Disease, Angiography
This work is licensed under a Creative Commons Attribution 3.0 License.
pISSN: 2322-1097 eISSN: 2423-5857