An Analysis of Gene Expression Variations in Lymphoma, Using a Fuzzy Classification Model

Document Type : Articles

Abstract

Introduction: Cancer is a major cause of mortality in the modern world and one of the most important public health problems. In recent years, research on cancer as a systems biology disease has focused on the molecular differences between cancer cells and healthy cells. Most methods proposed for classifying cancer from gene expression data act as black boxes and lack biological interpretability. The goal of this study is to design an interpretable fuzzy model for classifying lymphoma gene expression data.

Method: The microarray investigated in this research contained 45 lymphoma samples and a total of 4026 genes. First, a hybrid approach was used to reduce the dimensionality of the data and detect genes involved in lymphoma; six of the 4026 genes were selected. Then, an interpretable fuzzy classifier was used to classify the data. Fuzzy inference was performed using the two rules with the highest scores. Weka 3.6.9 was used for feature reduction, and the fuzzy classifier model was implemented in MATLAB R2010a. The results were assessed by two measures: accuracy and precision.

Results: In the pre-processing stage, six of the 4026 genes were identified as cancer-causing genes, and the fuzzy classifier model was then applied to the reduced data. Classification accuracy was 96 percent using the 10 highest-scoring rules and about 98 percent using the 2 highest-scoring rules.

Conclusion: The proposed approach introduces, for the first time, a fully fuzzy method, named minimal rule fuzzy classification (MRFC), for extracting biologically interpretable fuzzy rules from gene expression data. One of its most notable features is the ability to extract a small set of rules that explains which gene expression patterns matter in cancer patients. The approach also addresses the imbalance between the number of samples and the number of genes in microarrays by means of the proposed Filter-Wrapper Feature Selection (FWFS) method.

Keywords: Lymphoma Cancer, Cancer Diagnosis, Microarray, Gene Expression, Fuzzy Classifier
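To make the idea of classifying with a very small fuzzy rule base more concrete, the following is a minimal sketch in Python. It is not the MRFC implementation described above: the abstract does not specify the membership functions, the rule format, or the rule scoring. The sketch assumes triangular membership functions over expression values normalized to [0, 1], a minimum t-norm for rule firing, and single-winner inference. The two rules, the gene indices, the class names `class_A` and `class_B`, and the toy samples are all hypothetical. It also shows how accuracy and precision, the two measures named in the abstract, can be computed.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function: 0 at a and c, 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Assumed linguistic terms for expression values normalized to [0, 1].
FUZZY_SETS = {
    "low": (-1.0, 0.0, 1.0),   # membership falls from 1 at x=0 to 0 at x=1
    "high": (0.0, 1.0, 2.0),   # membership rises from 0 at x=0 to 1 at x=1
}


@dataclass(frozen=True)
class Rule:
    # Each antecedent term is (gene index, linguistic label).
    antecedent: Tuple[Tuple[int, str], ...]
    label: str

    def firing_strength(self, sample: Sequence[float]) -> float:
        """Combine antecedent memberships with the minimum t-norm."""
        return min(
            triangular(sample[gene], *FUZZY_SETS[term])
            for gene, term in self.antecedent
        )


def classify(rules: List[Rule], sample: Sequence[float]) -> str:
    """Single-winner inference: return the class of the strongest rule."""
    return max(rules, key=lambda r: r.firing_strength(sample)).label


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of samples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float:
    """Fraction of samples predicted as `positive` that truly are."""
    hits = [t for t, p in zip(y_true, y_pred) if p == positive]
    return sum(t == positive for t in hits) / len(hits) if hits else 0.0


# Hypothetical two-rule base over two selected genes (indices 0 and 1).
RULES = [
    Rule(((0, "high"), (1, "low")), "class_A"),
    Rule(((0, "low"), (1, "high")), "class_B"),
]

# Hypothetical toy data: normalized expression of the two selected genes.
SAMPLES = [(0.9, 0.1), (0.8, 0.3), (0.2, 0.7), (0.1, 0.9), (0.6, 0.5)]
LABELS = ["class_A", "class_A", "class_B", "class_B", "class_B"]

PREDICTIONS = [classify(RULES, s) for s in SAMPLES]
ACCURACY = accuracy(LABELS, PREDICTIONS)
PRECISION_A = precision(LABELS, PREDICTIONS, "class_A")

if __name__ == "__main__":
    print("predictions:", PREDICTIONS)
    print("accuracy:", ACCURACY)
    print("precision (class_A):", PRECISION_A)
```

Because each rule is a readable statement of the form "if gene 0 is high and gene 1 is low, then class_A", a rule base of this size can be inspected directly, which is the kind of interpretability the abstract emphasizes.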
