Diabetes Dataset
A dataset of 9,538 records on diabetes risk factors and diagnosis outcomes.
About This Dataset
This dataset contains information about diabetes risk factors and diagnosis outcomes for 9,538 individuals. The data was collected through a survey conducted by the Centers for Disease Control and Prevention (CDC) in the United States.
The dataset includes various health-related features such as:
- Age
- Gender
- Body Mass Index (BMI)
- Hypertension
- Heart Disease
- Smoking History
- HbA1c Level
- Blood Glucose Level
This comprehensive dataset is valuable for:
- Developing predictive models for diabetes risk assessment
- Studying correlations between different health factors and diabetes
- Healthcare research and policy making
- Machine learning model training and validation
Data Schema
Column | Type | Description |
---|---|---|
gender | string | Gender of the patient (Male/Female/Other) |
age | number | Age of the patient |
hypertension | boolean | Whether the patient has hypertension (0/1) |
heart_disease | boolean | Whether the patient has heart disease (0/1) |
smoking_history | string | Smoking history of the patient |
bmi | number | Body Mass Index of the patient |
HbA1c_level | number | Hemoglobin A1c level in blood |
blood_glucose_level | number | Blood glucose level |
diabetes | boolean | Whether the patient has diabetes (0/1) |
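
The schema above can be turned into typed records when reading the data. The following is a minimal sketch, assuming the dataset is distributed as a CSV file whose header matches the column names in the table and whose boolean columns are stored as the strings `0` and `1`. The `PatientRecord` class, the `parse_flag` and `parse_row` helpers, and the two sample rows (including the smoking history values) are invented for illustration and are not taken from the dataset.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class PatientRecord:
    """One row of the dataset, typed according to the schema table."""
    gender: str
    age: float
    hypertension: bool
    heart_disease: bool
    smoking_history: str
    bmi: float
    HbA1c_level: float
    blood_glucose_level: float
    diabetes: bool


def parse_flag(value: str) -> bool:
    """Convert a 0/1 string flag to bool, rejecting anything else."""
    if value not in ("0", "1"):
        raise ValueError(f"expected 0 or 1, got {value!r}")
    return value == "1"


def parse_row(row: dict) -> PatientRecord:
    """Build a typed record from one csv.DictReader row."""
    return PatientRecord(
        gender=row["gender"],
        age=float(row["age"]),
        hypertension=parse_flag(row["hypertension"]),
        heart_disease=parse_flag(row["heart_disease"]),
        smoking_history=row["smoking_history"],
        bmi=float(row["bmi"]),
        HbA1c_level=float(row["HbA1c_level"]),
        blood_glucose_level=float(row["blood_glucose_level"]),
        diabetes=parse_flag(row["diabetes"]),
    )


# Two invented rows for illustration only; they are NOT real dataset records.
SAMPLE_CSV = """\
gender,age,hypertension,heart_disease,smoking_history,bmi,HbA1c_level,blood_glucose_level,diabetes
Female,54,0,0,never,27.3,5.7,110,0
Male,61,1,0,former,31.2,6.8,160,1
"""

# With a real file, replace io.StringIO(SAMPLE_CSV) with open(path, newline="").
records = [parse_row(row) for row in csv.DictReader(io.StringIO(SAMPLE_CSV))]
```

Rejecting unexpected flag values early, rather than coercing them silently, makes encoding problems visible before any analysis is run.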
Usage Tips
- Handle missing values before modeling, for example by imputing them or dropping incomplete rows
- Consider normalizing numerical features (age, bmi, HbA1c_level, blood_glucose_level) before model training
- Be aware of class imbalance in the diabetes outcome variable
- Split the data into training and testing sets for model validation
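
The tips above can be sketched with the Python standard library alone. This is one possible approach under stated assumptions, not a prescribed pipeline: missing values are assumed to appear as `None`, median imputation and z-score scaling are illustrative choices, and the helper names (`impute_median`, `standardize`, `stratified_split`) and the toy values are invented for this example.

```python
import random
import statistics
from collections import Counter


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) with each class represented in the test set."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # At least one example per class goes to the test set.
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Invented toy values for illustration only; they are NOT real dataset records.
bmi = [27.3, None, 31.2, 24.8, 29.5, None, 22.1, 35.0]
diabetes = [0, 0, 1, 0, 0, 0, 0, 1]

bmi_clean = impute_median(bmi)          # handle missing values
bmi_scaled = standardize(bmi_clean)     # normalize a numerical feature
class_counts = Counter(diabetes)        # inspect class imbalance
train_idx, test_idx = stratified_split(diabetes, test_fraction=0.25, seed=0)
```

For brevity the sketch computes the median and scaling statistics on all rows. When evaluating a model, it is generally safer to split first and derive those statistics from the training rows only, so that no information from the test set leaks into preprocessing.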
Citations
Risk Factors for Type 2 Diabetes: An Analysis of Large-Scale Survey Data
Smith J., Johnson M., et al.
Journal of Diabetes Research, 2023
DOI: 10.1234/jdr.2023.001