Data Quality Assessment and Recommendations to Improve the Quality of Hemodialysis Database

Neda Firouraghi, Shahrokh Ezzatzadegan Jahromi, Ashkan Sami, Mohamad Reza Morvaridi, Roxana Sharifian


Introduction: Clinical data often contain abnormalities, so assessing data quality and reporting data errors are necessary. Data quality analysis identifies error types and their causes, and uses them to develop strategies and recommendations that prevent future errors and improve data entry. This approach can therefore be extremely useful for improving database quality. The aim of this study was to analyze hemodialysis (HD) patients’ data in order to improve the quality of data entry and avoid future errors.

Method: The study was conducted in 2015 on the HD database of Shiraz University of Medical Sciences. The database comprises 2367 patients with at least 12 months of follow-up (22.34±11.52 months) during 2012-2014. Duplicate records were removed; outliers were detected using statistical methods, expert opinion, and the relationships between variables; missing values were then handled in 72 variables using IBM SPSS Statistics 22 to improve the quality of the database. Based on the results, recommendations were made to improve the data entry process.
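The three cleaning steps described in the Method (duplicate removal, statistical outlier detection, and missing-value assessment) can be sketched in Python as follows. This is an illustrative sketch only: the study used IBM SPSS Statistics 22 and does not name its statistical outlier rule, so the interquartile-range (IQR) rule, the function names, and the record layout here are all assumptions.

```python
from statistics import quantiles


def drop_duplicates(records):
    """Keep the first occurrence of each identical record, preserving order."""
    seen = set()
    unique = []
    for rec in records:
        key = tuple(sorted(rec.items()))  # field names are unique, so sorting is safe
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]; None entries are skipped.

    The IQR rule is an assumed stand-in for the unspecified statistical method.
    """
    present = [v for v in values if v is not None]
    q1, _, q3 = quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in present if v < low or v > high]


def missing_percent(records, variable):
    """Percentage of records in which `variable` is absent or None."""
    missing = sum(1 for rec in records if rec.get(variable) is None)
    return 100.0 * missing / len(records)
```

Values flagged this way are only candidates; as the abstract notes, expert opinion and the relationships between variables are needed to decide which flagged values are real outliers.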

Results: The proportion of outliers per variable ranged from 0 to 9.28 percent. Seven variables had more than 20 percent missing values; in the remaining variables, missing values ranged from 0 to 19.73 percent. Most missing values occurred in serum alkaline phosphatase, uric acid, high- and low-density lipoprotein, total iron binding capacity, hepatitis B surface antibody titer, and parathyroid hormone. Displacement (the values of two or more variables recorded under the wrong attribute) was found in weight, serum creatinine, blood urea nitrogen, and systolic and diastolic blood pressure. Such errors may decrease data quality.
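One way to screen for displacement is to flag a value that is implausible for the field it was entered in but plausible for another field. The sketch below illustrates that idea; the field names and the numeric ranges are hypothetical placeholders chosen for the example, not clinical reference ranges and not values taken from the study.

```python
# Hypothetical plausibility ranges, for illustration only.
PLAUSIBLE = {
    "weight_kg": (20.0, 250.0),
    "creatinine_mg_dl": (0.2, 25.0),
}


def in_range(variable, value):
    """True if `value` lies inside the assumed plausible range of `variable`."""
    low, high = PLAUSIBLE[variable]
    return low <= value <= high


def find_displacements(record):
    """Return (field, likely_field) pairs for values that look misplaced.

    A value is suspect when it falls outside its own field's range
    but inside the range of another known field.
    """
    suspects = []
    for field, value in record.items():
        if field not in PLAUSIBLE or value is None or in_range(field, value):
            continue
        for other in PLAUSIBLE:
            if other != field and in_range(other, value):
                suspects.append((field, other))
    return suspects
```

For example, `find_displacements({"weight_kg": 8.5, "creatinine_mg_dl": 72.0})` reports both fields as suspects, which suggests the two values were swapped. A screen of this kind cannot separate variables whose plausible ranges overlap, so it complements rather than replaces expert review.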

Conclusion: According to the results and expert opinion, applying data entry principles can improve data quality drastically. These principles include defining valid ranges of values, using the relationships between hemodialysis features, developing alert systems for empty or duplicated data, and entering HD data or lab results directly into the database. Expert opinion, as a complement to statistical methods, can play an effective role in detecting real outliers. Analyses of HD databases should pay more attention to the relationships between variables, because these relationships affect the quality of the database.
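The data entry principles recommended above (range definitions, relationship checks, and alerts for empty or duplicated data) could be combined in an entry-time validator along the following lines. This is a minimal sketch under stated assumptions: the field names, the blood pressure limits, and the choice of patient and session date as the duplicate key are hypothetical and do not come from the study.

```python
# Hypothetical entry limits, for illustration only.
RANGES = {
    "systolic_bp": (60, 260),
    "diastolic_bp": (30, 160),
}


def validate_entry(entry, existing_keys):
    """Return a list of alert messages for one record before it is saved.

    `existing_keys` is a set of (patient_id, session_date) pairs already stored.
    """
    alerts = []

    # Alert on duplicated data.
    key = (entry.get("patient_id"), entry.get("session_date"))
    if key in existing_keys:
        alerts.append("duplicate: patient and session date already recorded")

    # Alert on empty fields and on values outside the defined range.
    for field, (low, high) in RANGES.items():
        value = entry.get(field)
        if value is None:
            alerts.append(f"empty: {field}")
        elif not low <= value <= high:
            alerts.append(f"out of range: {field}={value}")

    # Relationship between variables: systolic must exceed diastolic.
    sbp, dbp = entry.get("systolic_bp"), entry.get("diastolic_bp")
    if sbp is not None and dbp is not None and sbp <= dbp:
        alerts.append("relationship: systolic_bp must exceed diastolic_bp")

    return alerts
```

The relationship check matters because range checks alone can miss displacement: an entry with systolic 80 and diastolic 130 passes both assumed ranges, yet the pair is inconsistent and is most likely swapped.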

Keywords: Database, Data entry, Hemodialysis, Data Quality, Outliers, Missing values



Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.

pISSN: 2322-1097        eISSN: 2423-5857