New Horizons in Diabetes Prediction: Comparative Machine Learning Models Using Orange Data Mining

Diabetes Prediction and Machine Learning

DOI:

https://doi.org/10.5281/zenodo.17643641

Keywords:

Diabetes Mellitus, Risk Assessment, Machine Learning, Data Mining, Artificial Intelligence

Abstract

Background: Diabetes mellitus remains a growing global health concern. Early prediction based on clinical and metabolic parameters may improve prevention and management strategies. This study compares the performance of different supervised machine learning models for diabetes prediction using the Pima Indians Diabetes Dataset, implemented through the Orange Data Mining platform, a no-code visual analytics environment.
Methods: The Pima Indians Diabetes Dataset was originally developed by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) in the United States. It includes data collected from female patients of Pima Indian heritage, aged 21 years or older, living near Phoenix, Arizona. The dataset was analyzed in Orange, with data preprocessing (missing value imputation and normalization), stratified train/test splitting, and model training with cross-validation. Six supervised learning algorithms (Logistic Regression, Neural Network, Random Forest, Naïve Bayes, k-Nearest Neighbors, and AdaBoost) were compared. Model evaluation was based on ROC-AUC as the primary metric, along with PR-AUC, F1-score, sensitivity, specificity, and calibration metrics (Brier score and reliability plots).
Results: Among the six supervised models tested, Logistic Regression and Neural Network achieved the best overall performance, with AUC values of 0.835 and 0.816, respectively. Both models showed balanced accuracy and good calibration, while AdaBoost was the weakest performer (AUC = 0.655). The Calibration Plot confirmed that Logistic Regression provided the most reliable probability estimates, consistent with its lower Brier score.
Conclusions: Orange Data Mining enabled an easy and reproducible comparison of supervised learning algorithms for diabetes prediction. Logistic Regression and Neural Network models showed the most reliable and well-calibrated performance, indicating that accurate prediction can be achieved even in a no-code visual environment.
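
As an illustration of the evaluation metrics named in the Methods, the following minimal Python sketch computes ROC-AUC, the Brier score, sensitivity, specificity, and F1-score from true labels and predicted probabilities. It is not the Orange workflow used in the study: the function names, the 0.5 decision threshold, and the toy labels and probabilities are assumptions made for illustration only and are not taken from the article's data.

```python
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher probability; ties count as 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome
    (lower values indicate better-calibrated, sharper predictions)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def threshold_metrics(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Sensitivity, specificity, and F1-score at a fixed decision threshold.
    The 0.5 default is an assumption, not a value reported in the article."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "f1": f1}


if __name__ == "__main__":
    # Toy example (hypothetical values, not from the Pima dataset).
    labels = [0, 0, 1, 1, 0, 1, 0, 1]
    probs = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9]
    print("ROC-AUC:", roc_auc(labels, probs))
    print("Brier score:", brier_score(labels, probs))
    print(threshold_metrics(labels, probs))
```

ROC-AUC and the Brier score are computed from the predicted probabilities directly, whereas sensitivity, specificity, and F1-score depend on the chosen threshold, which is why the two groups of metrics can rank models differently.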

References

American Diabetes Association. Diagnosis and Classification of Diabetes Mellitus. Diabetes Care. 2014;37(suppl 1):S81-S90.

International Diabetes Federation. IDF Diabetes Atlas. 10th ed. Brussels, Belgium: International Diabetes Federation; 2021.

World Health Organization. Global Report on Diabetes. Geneva, Switzerland: World Health Organization; 2023.

Beam AL, Kohane IS. Big data and machine learning in health care. JAMA. 2018;319(13):1317-1318.

Smith JW, Everhart JE, Dickson WC, Knowler WC, Johannes RS. Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In: Proc Annu Symp Comput Appl Med Care. 1988:261-265.

Dua D, Graff C. UCI Machine Learning Repository: Pima Indians Diabetes Dataset. University of California, Irvine; 2019.

Demšar J, Curk T, Erjavec A, Gorup Č, Hočevar T, Milutinovič M, Možina M, Polajnar M, Toplak M, Starič A, Stajdohar M, Umek L, Žagar L, Žbontar J, Žitnik M, Zupan B. Orange: Data Mining Toolbox in Python. J Mach Learn Res. 2013;14:2349-2353.

Published

2025-11-19

How to Cite

Gören, Y. (2025). New Horizons in Diabetes Prediction: Comparative Machine Learning Models Using Orange Data Mining: Diabetes Prediction and Machine Learning. Avicenna Anatolian Journal of Medicine, 2(2), 36–41. https://doi.org/10.5281/zenodo.17643641

Section

Original Article
