Uncovering Diagnostic Patterns: Exploratory Machine Learning Approaches for Medical Condition Classification
Abstract
Machine learning holds considerable promise for early disease detection and improved patient treatment. This paper presents a predictive model of medical conditions built on a Kaggle healthcare dataset of demographic, clinical, and lifestyle variables. The key health indicators are age, glucose level, blood pressure, BMI, oxygen saturation, cholesterol, triglycerides, HbA1c, and behavioral indicators such as smoking, alcohol use, and physical activity. Four classification models were compared to identify the most effective one: Logistic Regression, K-Nearest Neighbors (KNN), Decision Tree, and Random Forest. Model performance was measured with accuracy, F1-score, and ROC-AUC. Logistic Regression achieved a precision of 91.37%, Decision Tree 84.16%, Random Forest 91.63%, and KNN 81.53%. ROC-AUC results indicated high predictive value across all models, with Logistic Regression and KNN exceeding an AUC of 0.95 for most major conditions. Random Forest also achieved AUC values above 0.95 for all major conditions, including the harder-to-identify classes in the multi-class setting such as cancer, diabetes, and asthma. The findings indicate that ensemble-based methods are competitive with traditional classifiers on healthcare data that is both non-linear and high-variance. Overall, the proposed predictive models show strong potential as support for clinical decision-making and preventive healthcare.
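To make the three reported metrics concrete, the following is a minimal pure-Python sketch of how accuracy, macro-averaged F1, and a one-vs-rest ROC-AUC can be computed for a multi-class problem. It is illustrative only and is not the authors' pipeline: the function names, the toy labels, and the toy probability scores are all assumptions, and the abstract does not state which averaging scheme was used for F1 or AUC.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def binary_auc(is_positive, scores):
    """AUC as the probability that a random positive is scored above a
    random negative, counting ties as half a win."""
    pos = [s for flag, s in zip(is_positive, scores) if flag]
    neg = [s for flag, s in zip(is_positive, scores) if not flag]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(y_true, proba, label):
    """AUC for one condition against all others.
    `proba` is a list of dicts mapping each label to a predicted score."""
    return binary_auc([t == label for t in y_true],
                      [row[label] for row in proba])


# Hypothetical toy data: six patients, three conditions (not from the paper).
y_true = ["diabetes", "asthma", "cancer", "diabetes", "asthma", "cancer"]
proba = [
    {"diabetes": 0.70, "asthma": 0.20, "cancer": 0.10},
    {"diabetes": 0.20, "asthma": 0.60, "cancer": 0.20},
    {"diabetes": 0.10, "asthma": 0.10, "cancer": 0.80},
    {"diabetes": 0.25, "asthma": 0.65, "cancer": 0.10},
    {"diabetes": 0.30, "asthma": 0.60, "cancer": 0.10},
    {"diabetes": 0.20, "asthma": 0.10, "cancer": 0.70},
]
# Predicted class is the label with the highest score in each row.
y_pred = [max(row, key=row.get) for row in proba]

if __name__ == "__main__":
    print("accuracy:", accuracy(y_true, y_pred))
    print("macro F1:", macro_f1(y_true, y_pred))
    for condition in sorted(set(y_true)):
        print(condition, "AUC:", one_vs_rest_auc(y_true, proba, condition))
```

The per-condition AUC is the form that corresponds to statements such as "AUC above 0.95 for cancer, diabetes, and asthma": each condition is scored separately against all remaining classes, so a model can rank one condition well and another poorly even when overall accuracy is high.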