Digital Technologies Research and Applications

Article

Breast Cancer Dataset from Coimbra: Pre‑Ratings of Its Value to Machine Learning and Diagnosis

Chuiko, G., & Honcharov, D. (2025). Breast Cancer Dataset from Coimbra: Pre‑Ratings of Its Value to Machine Learning and Diagnosis. Digital Technologies Research and Applications, 4(2), 182–193. https://doi.org/10.54963/dtra.v4i2.1348

Authors

  • Gennady Chuiko

    Department of Computer Engineering, Petro Mohyla Black Sea National University, 54003 Mykolaiv, Ukraine
  • Denis Honcharov

    Department of Computer Engineering, Petro Mohyla Black Sea National University, 54003 Mykolaiv, Ukraine

Received: 25 June 2025; Revised: 22 July 2025; Accepted: 30 July 2025; Published: 19 August 2025

This study evaluated a relatively new dataset, collected by the University Hospital Centre of Coimbra in Portugal, that was developed to support the primary diagnosis of breast cancer. Based on this evaluation, the authors sought to develop a clear visual classifier to assist medical professionals in prediction and monitoring. The classifier uses routine blood test results together with physical data, offering a simpler and more cost‑effective alternative to traditional mammographic studies. The Coimbra Breast Cancer Dataset (CBCD) includes nine attributes: Age, Body Mass Index (BMI), Glucose, Insulin, Homeostatic Model Assessment for Insulin Resistance (HOMA‑IR), Leptin, Adiponectin, Resistin, and Monocyte Chemoattractant Protein‑1 (MCP1). The visual classifier was built with machine learning algorithms from the Java‑based WEKA software (version 3.9.6). WEKA's graphical interface enables clinicians, even those without expertise in machine learning, to use these algorithms effectively. The nine attributes of the CBCD were statistically grouped into three subsets according to their relevance to the overall model. This grouping may help reduce the dimensionality of the diagnostic dataset while allowing individual classifiers to exhibit their own attribute preferences. A properly tuned JRip classifier demonstrated acceptable performance on the entire dataset and remained effective when the attribute set was reduced to six or even four attributes. The primary advantage of this classifier lies in its decision rules, which are easy for medical professionals to interpret and apply.
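To illustrate why rule-based output of the kind JRip produces is easy to read, the following minimal Python sketch applies an ordered rule list to one record: the first rule whose condition holds decides the class, and a default class applies otherwise. It is not the authors' WEKA model. The rule thresholds, the class labels "patient" and "control", and the helper names `homa_ir`, `Rule`, and `classify` are hypothetical and chosen only for illustration. The HOMA‑IR helper uses the common approximation glucose (mg/dL) × insulin (µU/mL) / 405, which is an assumption about how that attribute may be derived, not a statement from the article.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, float]


def homa_ir(glucose_mg_dl: float, insulin_uU_ml: float) -> float:
    """Common HOMA-IR approximation (assumed here, not taken from the article)."""
    return glucose_mg_dl * insulin_uU_ml / 405.0


@dataclass
class Rule:
    description: str                      # human-readable form of the rule
    condition: Callable[[Record], bool]   # test applied to one record
    label: str                            # class assigned when the test holds


def classify(record: Record, rules: List[Rule], default_label: str) -> str:
    """Apply rules in order; the first matching rule decides the class."""
    for rule in rules:
        if rule.condition(record):
            return rule.label
    return default_label


# HYPOTHETICAL thresholds, for illustration only; not clinically validated
# and not the rules reported by the authors.
RULES = [
    Rule("Glucose >= 100 and Age >= 45",
         lambda r: r["Glucose"] >= 100 and r["Age"] >= 45, "patient"),
    Rule("Resistin >= 14 and BMI <= 30",
         lambda r: r["Resistin"] >= 14 and r["BMI"] <= 30, "patient"),
]

if __name__ == "__main__":
    sample = {"Age": 52, "BMI": 27.0, "Glucose": 105,
              "Insulin": 6.0, "Resistin": 9.0}
    sample["HOMA"] = homa_ir(sample["Glucose"], sample["Insulin"])
    print(round(sample["HOMA"], 3), classify(sample, RULES, "control"))
```

Because each rule is a short conjunction of threshold tests on named blood markers, a clinician can check a prediction by reading the rules from top to bottom, which is the interpretability advantage the abstract attributes to JRip.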

Keywords:

Breast Cancer; Machine Learning; Biomarkers; Visual Classifying; Diagnostics
