GitHub - aaditiii2717/Ischemic-Heart-Disease-prediction: Predicts IHD risk from user health data and shows how increasing physical activity can reduce that risk.

Ischemic Heart Disease Risk Prediction

Logistic Regression on Synthetic Population Data

Project Overview

This project simulates a real-world cardiovascular risk modeling pipeline using synthetic population data. We generate a structured dataset representing demographic and clinical features, then train a Logistic Regression model to predict the probability of Ischemic Heart Disease (IHD).

The goal is to demonstrate:

- Synthetic data generation
- Probabilistic risk modeling
- Model evaluation using ROC-AUC
- Model serialization for deployment

Problem Statement

Predict the probability of Ischemic Heart Disease (IHD) based on:

- Age
- Sex
- BMI
- Systolic Blood Pressure
- Diabetes status
- Smoking status
- Weekly physical activity

This mimics traditional epidemiological risk scoring systems.

Project Architecture

Synthetic Population Generator
↓
Feature Engineering
↓
Train/Test Split (75/25)
↓
Logistic Regression Model
↓
ROC-AUC Evaluation
↓
Model Serialization (.pkl)

Dataset Generation

A synthetic dataset of 15,000 individuals is generated using probabilistic modeling.

Feature Distribution Design

| Feature | Distribution |
|---|---|
| Age | Uniform (25–80) |
| Sex | Bernoulli (p=0.52 male) |
| BMI | Normal (μ=25, σ=4) |
| SBP | Normal (μ=128, σ=15) |
| Diabetes | Bernoulli (p=0.10) |
| Smoking | Bernoulli (p=0.20) |
| Weekly Exercise | Exponential (scale=90) |
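The feature distributions listed here can be sampled with NumPy roughly as follows. This is a minimal sketch, not the code in risk_model.py: the function name `generate_features`, the column names, and the seed are assumptions made for illustration, and the unit of weekly exercise is not stated in this README.

```python
import numpy as np
import pandas as pd


def generate_features(n=15_000, seed=42):
    """Sample one row per synthetic individual (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "age": rng.uniform(25, 80, n),               # Uniform (25-80)
        "sex": rng.binomial(1, 0.52, n),             # 1 = male, p = 0.52
        "bmi": rng.normal(25, 4, n),                 # Normal(mu=25, sigma=4)
        "sbp": rng.normal(128, 15, n),               # Normal(mu=128, sigma=15)
        "diabetes": rng.binomial(1, 0.10, n),        # Bernoulli(p=0.10)
        "smoking": rng.binomial(1, 0.20, n),         # Bernoulli(p=0.20)
        "weekly_exercise": rng.exponential(90, n),   # Exponential(scale=90)
    })


population = generate_features()
print(population.describe().loc[["mean", "std"]].round(2))
```

Fixing the seed keeps the generated population reproducible between runs, which makes changes in AUC easier to attribute to the model rather than to the data.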

Risk probability is computed using a linear logit model:

$$\operatorname{logit}(p) = \text{intercept} + \sum_i X_i \beta_i$$

The final outcome is sampled via a Bernoulli distribution.
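To make the logit step concrete, the sketch below turns a linear score into a probability with the inverse logit, p = 1 / (1 + exp(-logit)), and then draws the binary outcome. The intercept and coefficient values are hypothetical, since this README does not list the ones the project uses, and only three of the seven features are included to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 15_000

# A subset of the features, sampled as described in the distribution design.
age = rng.uniform(25, 80, n)
smoking = rng.binomial(1, 0.20, n)
exercise = rng.exponential(90, n)

# Hypothetical coefficients, chosen only for illustration.
intercept = -5.0
beta_age, beta_smoking, beta_exercise = 0.05, 0.7, -0.004

# Linear predictor: intercept + sum(X_i * beta_i)
logit = intercept + beta_age * age + beta_smoking * smoking + beta_exercise * exercise

# Inverse logit maps the linear predictor to a probability in (0, 1).
p = 1.0 / (1.0 + np.exp(-logit))

# Each individual's outcome is one Bernoulli draw with their own probability.
ihd = rng.binomial(1, p)

print(f"mean risk: {p.mean():.3f}, observed prevalence: {ihd.mean():.3f}")
```

With a negative exercise coefficient, as assumed here, more weekly activity lowers the linear score and therefore the sampled risk.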

Model

- Algorithm: Logistic Regression
- Max Iterations: 2000
- Train/Test Split: 75% / 25%
- Evaluation Metric: ROC-AUC

Logistic Regression was chosen because:

- Interpretable coefficients
- Suitable for binary risk modeling
- Common in clinical risk prediction
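The training setup described here (75/25 split, `max_iter=2000`) might look like the following with scikit-learn. It is a sketch under assumptions: the data is a small stand-in generated inline with hypothetical coefficients, and `random_state=1` is an illustrative choice rather than a value taken from the project.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 15_000

# Stand-in feature matrix: age, systolic blood pressure, smoking status.
X = np.column_stack([
    rng.uniform(25, 80, n),
    rng.normal(128, 15, n),
    rng.binomial(1, 0.20, n),
])

# Hypothetical data-generating coefficients, used only to create labels.
z = -7.0 + 0.05 * X[:, 0] + 0.02 * X[:, 1] + 0.7 * X[:, 2]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-z)))

# test_size=0.25 gives the 75% / 25% split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

model = LogisticRegression(max_iter=2000)
model.fit(X_train, y_train)

# Each coefficient is the change in log-odds per unit of its feature.
for name, coef in zip(["age", "sbp", "smoking"], model.coef_[0]):
    print(f"{name}: {coef:+.3f}")
```

Because the coefficients are on the log-odds scale, `np.exp(coef)` gives the odds ratio per unit increase, which is the interpretability benefit named in the list above.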

Evaluation

Model performance is measured using the ROC-AUC score.

Example output:

    AUC: 0.86
    Model saved!

AUC provides a threshold-independent measure of classification quality.
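A small worked example may help show what "threshold-independent" means. AUC equals the fraction of (positive, negative) pairs in which the positive case receives the higher score, so it depends only on the ranking of the scores. The four labels and scores below are toy values, not output from this project.

```python
from itertools import product

from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc_score(y_true, scores)

# Same quantity computed by hand: share of positive/negative pairs
# where the positive case is ranked above the negative one.
pos = [s for s, y in zip(scores, y_true) if y == 1]
neg = [s for s, y in zip(scores, y_true) if y == 0]
pairwise = sum(p > q for p, q in product(pos, neg)) / (len(pos) * len(neg))

# Squaring the scores changes their values but not their order,
# so the AUC is unchanged.
auc_squared = roc_auc_score(y_true, [s ** 2 for s in scores])

print(auc, pairwise, auc_squared)  # 0.75 0.75 0.75
```

For a fitted model, the scores passed to `roc_auc_score` would typically be `model.predict_proba(X_test)[:, 1]` rather than hard class predictions.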

Model Persistence

The trained model is serialized using pickle and saved as:

model/logreg_model.pkl

This allows deployment in a future API or clinical decision support tool.
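The save and reload round trip can be sketched as below. The toy training data and the temporary base directory are assumptions made so the example runs anywhere without touching the repository; in the project itself the file would sit at model/logreg_model.pkl.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data so the example is self-contained.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)
model = LogisticRegression(max_iter=2000).fit(X, y)

# Temporary base directory (assumption); the file name mirrors the README.
model_path = Path(tempfile.mkdtemp()) / "model" / "logreg_model.pkl"
model_path.parent.mkdir(parents=True, exist_ok=True)

with open(model_path, "wb") as f:
    pickle.dump(model, f)

with open(model_path, "rb") as f:
    restored = pickle.load(f)

# The restored model should reproduce the original predictions.
same = np.allclose(model.predict_proba(X), restored.predict_proba(X))
print("round trip OK:", same)
```

Pickle files can execute arbitrary code when loaded, so only unpickle model files from a trusted source, and load them with the same scikit-learn version that saved them where possible.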

Project Structure

    Heart_Disease_Risk/
    │
    ├── data/
    │   └── synthetic_population.csv
    │
    ├── model/
    │   └── logreg_model.pkl
    │
    ├── risk_model.py
    └── README.md

Installation

    pip install numpy pandas scikit-learn

Run the Project

    python risk_model.py

This will:

- Generate the synthetic dataset
- Train the logistic regression model
- Print the ROC-AUC
- Save the trained model

Key Learnings

- Generating structured synthetic medical datasets
- Probabilistic modeling using logistic regression
- Model evaluation with ROC curves
- Binary classification in a healthcare context
- Saving trained models for deployment

Real-World Applications

- Cardiovascular risk screening
- Preventive healthcare analytics
- Insurance risk scoring
- Clinical decision support systems

Future Improvements

- Add feature scaling
- Implement cross-validation
- Add SHAP for interpretability
- Build Streamlit dashboard
- Deploy as REST API
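The first two improvements could be combined in a scikit-learn pipeline, sketched below. This is one possible approach rather than a planned implementation: the two-feature stand-in data, its hypothetical coefficients, and the choice of 5 folds are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
n = 5_000

# Stand-in features (age, systolic blood pressure) and labels.
X = np.column_stack([rng.uniform(25, 80, n), rng.normal(128, 15, n)])
z = -7.0 + 0.05 * X[:, 0] + 0.02 * X[:, 1]  # hypothetical coefficients
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-z)))

# Putting the scaler inside the pipeline means it is refit on each
# training fold, so no statistics leak from the held-out fold.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=2000))

cv_auc = cross_val_score(pipeline, X, y, cv=5, scoring="roc_auc")
print("per-fold AUC:", cv_auc.round(3))
print("mean AUC:", cv_auc.mean().round(3))
```

Reporting the per-fold spread alongside the mean gives a rough sense of how stable the AUC estimate is compared with a single train/test split.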

Cardiovascular Risk Prediction using Logistic Regression

Built a synthetic population dataset (15k samples) and trained a logistic regression model to predict ischemic heart disease risk, reaching a ROC-AUC of about 0.86 in the example run and saving the model for deployment.
