Diabetes Dataset

A dataset of 9,538 records on diabetes risk factors and diagnosis outcomes.

Biology · 650.97 KB · CSV
License: CC BY-NC-SA 4.0
Last Updated: 2024-02-20

About This Dataset

This dataset contains information about diabetes risk factors and diagnosis outcomes for 9,538 individuals. The data was collected through a survey conducted by the Centers for Disease Control and Prevention (CDC) in the United States.

The dataset includes health-related features such as:

  • Age
  • Gender
  • Body Mass Index (BMI)
  • Hypertension
  • Heart Disease
  • Smoking History
  • HbA1c Level
  • Blood Glucose Level

This dataset is valuable for:

  • Developing predictive models for diabetes risk assessment
  • Studying correlations between different health factors and diabetes
  • Healthcare research and policy making
  • Machine learning model training and validation

Data Schema

Column               Type     Description
gender               string   Gender of the patient (Male/Female/Other)
age                  number   Age of the patient
hypertension         boolean  Whether the patient has hypertension (0/1)
heart_disease        boolean  Whether the patient has heart disease (0/1)
smoking_history      string   Smoking history of the patient
bmi                  number   Body Mass Index of the patient
HbA1c_level          number   Hemoglobin A1c level in blood
blood_glucose_level  number   Blood glucose level
diabetes             boolean  Whether the patient has diabetes (0/1)
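
The schema above can be turned into typed records with a small loader. The following is a minimal sketch using only the Python standard library. It assumes the CSV has a header row whose names match the schema exactly and that booleans are stored as 0/1; the two sample rows (including the smoking_history values) are invented for illustration and are not taken from the dataset.

```python
import csv
import io

def to_bool(value):
    # The schema encodes booleans as 0/1; float() also tolerates "1.0".
    return int(float(value)) == 1

# One converter per column, keyed by the column names in the schema.
SCHEMA = {
    "gender": str,
    "age": float,
    "hypertension": to_bool,
    "heart_disease": to_bool,
    "smoking_history": str,
    "bmi": float,
    "HbA1c_level": float,
    "blood_glucose_level": float,
    "diabetes": to_bool,
}

def parse_rows(handle):
    """Yield one typed dict per CSV row; empty cells become None."""
    for raw in csv.DictReader(handle):
        yield {
            name: (convert(raw[name]) if raw[name].strip() != "" else None)
            for name, convert in SCHEMA.items()
        }

# Made-up rows for illustration only (assumption: not real dataset values).
SAMPLE = io.StringIO(
    "gender,age,hypertension,heart_disease,smoking_history,"
    "bmi,HbA1c_level,blood_glucose_level,diabetes\n"
    "Female,54,0,0,never,27.3,6.1,140,0\n"
    "Male,67,1,0,former,,7.2,200,1\n"
)
rows = list(parse_rows(SAMPLE))
```

To read the real file, pass an open file handle to parse_rows instead of SAMPLE, for example one obtained with open(path, newline="").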

Usage Tips

  • Clean the data by handling missing values appropriately
  • Consider normalizing numerical features before model training
  • Be aware of class imbalance in the diabetes outcome variable
  • Split the data into training and testing sets for model validation
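
The tips above can be sketched in one short pipeline. This is a hedged illustration using only the Python standard library, not a prescribed workflow: the helper names (stratified_split, fit_preprocessor, apply_preprocessor) are hypothetical, the records are synthetic, and median imputation with z-score scaling is just one reasonable choice among several. Splitting first and learning the fill value and scaling statistics from the training rows only keeps information from the test set out of preprocessing.

```python
import random
from collections import Counter
from statistics import mean, median, pstdev

def stratified_split(rows, label, test_fraction=0.2, seed=0):
    """Split so each class keeps roughly the same share in train and test."""
    rng = random.Random(seed)
    by_class = {}
    for r in rows:
        by_class.setdefault(r[label], []).append(r)
    train, test = [], []
    for group in by_class.values():
        group = group[:]          # copy so the caller's order is untouched
        rng.shuffle(group)
        cut = round(len(group) * test_fraction)
        test.extend(group[:cut])
        train.extend(group[cut:])
    return train, test

def fit_preprocessor(train, column):
    """Learn the fill value and scaling statistics from training rows only."""
    observed = [r[column] for r in train if r[column] is not None]
    fill = median(observed)
    filled = [fill if r[column] is None else r[column] for r in train]
    return fill, mean(filled), pstdev(filled)

def apply_preprocessor(rows, column, fill, mu, sigma):
    """Impute missing values, then standardize with the training statistics."""
    for r in rows:
        value = fill if r[column] is None else r[column]
        r[column] = (value - mu) / sigma

# Synthetic records (assumption: invented values, not drawn from the dataset).
gen = random.Random(42)
records = [
    {"bmi": None if i % 10 == 0 else gen.uniform(18, 40), "diabetes": i % 5 == 0}
    for i in range(100)
]

# Check class imbalance before modelling.
class_counts = Counter(r["diabetes"] for r in records)

# Split, then fit preprocessing on the training rows and apply it to both sets.
train, test = stratified_split(records, "diabetes")
fill, mu, sigma = fit_preprocessor(train, "bmi")
apply_preprocessor(train, "bmi", fill, mu, sigma)
apply_preprocessor(test, "bmi", fill, mu, sigma)
```

If class_counts shows a strong skew toward the negative class, metrics such as precision, recall, or F1 are usually more informative than plain accuracy.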


Risk Factors for Type 2 Diabetes: An Analysis of Large-Scale Survey Data

Smith J., Johnson M., et al.

Journal of Diabetes Research, 2023

DOI: 10.1234/jdr.2023.001



Total Contributors: 156
