aaditiii2717/Ischemic-Heart-Disease-prediction

Ischemic Heart Disease Risk Prediction

Logistic Regression on Synthetic Population Data

Project Overview

This project simulates a real-world cardiovascular risk modeling pipeline using synthetic population data. We generate a structured dataset representing demographic and clinical features, then train a Logistic Regression model to predict the probability of Ischemic Heart Disease (IHD).

The goal is to demonstrate:

- Synthetic data generation
- Probabilistic risk modeling
- Model evaluation using ROC-AUC
- Model serialization for deployment

Problem Statement

Predict the probability of Ischemic Heart Disease (IHD) based on:

- Age
- Sex
- BMI
- Systolic blood pressure (SBP)
- Diabetes status
- Smoking status
- Weekly physical activity

This mimics traditional epidemiological risk scoring systems.

Project Architecture

    Synthetic Population Generator
                ↓
    Feature Engineering
                ↓
    Train/Test Split (75/25)
                ↓
    Logistic Regression Model
                ↓
    ROC-AUC Evaluation
                ↓
    Model Serialization (.pkl)

Dataset Generation

A synthetic dataset of 15,000 individuals is generated using probabilistic modeling.

Feature Distribution Design

| Feature | Distribution |
| --- | --- |
| Age | Uniform (25–80) |
| Sex | Bernoulli (p = 0.52 male) |
| BMI | Normal (μ = 25, σ = 4) |
| SBP | Normal (μ = 128, σ = 15) |
| Diabetes | Bernoulli (p = 0.10) |
| Smoking | Bernoulli (p = 0.20) |
| Weekly Exercise | Exponential (scale = 90) |
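The distributions in the table can be sampled directly with NumPy. The following is a minimal sketch, not the repository's actual `risk_model.py`: the function name `generate_population`, the column names, the seed, and the choice to draw age as a continuous value are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def generate_population(n=15_000, seed=42):
    """Draw one row per individual from the distributions in the table.

    Column names and the seed are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "age": rng.uniform(25, 80, n),              # Uniform(25, 80), continuous (assumption)
        "sex_male": rng.binomial(1, 0.52, n),       # Bernoulli(p=0.52), 1 = male
        "bmi": rng.normal(25, 4, n),                # Normal(mu=25, sigma=4)
        "sbp": rng.normal(128, 15, n),              # Normal(mu=128, sigma=15)
        "diabetes": rng.binomial(1, 0.10, n),       # Bernoulli(p=0.10)
        "smoker": rng.binomial(1, 0.20, n),         # Bernoulli(p=0.20)
        "exercise_weekly": rng.exponential(90, n),  # Exponential(scale=90)
    })


population = generate_population()
# Sample means and standard deviations should sit close to the table's parameters.
print(population.describe().loc[["mean", "std"]].round(2))
```

Fixing the seed makes the generated dataset reproducible between runs, which helps when comparing model changes.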

Risk probability is computed using a linear logit model:

$$\operatorname{logit}(p) = \text{intercept} + \sum_i X_i \beta_i$$

The final outcome is sampled via a Bernoulli distribution with probability p.
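As a rough illustration of the logit model and the Bernoulli sampling step, the sketch below uses only two of the features. The intercept and coefficient values are hypothetical, chosen here only to produce plausible probabilities; the README does not state the values the project uses.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Two illustrative features, drawn as described in the distribution table.
age = rng.uniform(25, 80, n)
smoker = rng.binomial(1, 0.20, n)

# HYPOTHETICAL intercept and coefficients (not taken from the project).
intercept = -6.0
beta_age = 0.06
beta_smoker = 0.7

# logit(p) = intercept + sum_i X_i * beta_i
logit = intercept + beta_age * age + beta_smoker * smoker

# Invert the logit with the sigmoid to get a probability in (0, 1).
p = 1.0 / (1.0 + np.exp(-logit))

# Sample each individual's binary IHD outcome from Bernoulli(p).
ihd = rng.binomial(1, p)

print(f"mean risk: {p.mean():.3f}, observed prevalence: {ihd.mean():.3f}")
```

Because the outcome is sampled rather than thresholded, two individuals with identical features can receive different labels, which is what keeps the fitted model's AUC below 1.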

Model

- Algorithm: Logistic Regression
- Max Iterations: 2000
- Train/Test Split: 75% / 25%
- Evaluation Metric: ROC-AUC

Logistic Regression was chosen because it:

- Has interpretable coefficients
- Is suitable for binary risk modeling
- Is common in clinical risk prediction
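The settings above map onto scikit-learn roughly as follows. This is a sketch under assumptions: the stand-in data uses two features with hypothetical coefficients, and `random_state=0` is chosen here for reproducibility; whether the project stratifies or seeds its split is not stated.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in data: two features and a HYPOTHETICAL true risk model.
rng = np.random.default_rng(0)
n = 5_000
X = np.column_stack([
    rng.uniform(25, 80, n),    # age
    rng.binomial(1, 0.20, n),  # smoking status
])
true_logit = -6.0 + 0.08 * X[:, 0] + 0.7 * X[:, 1]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))

# 75% / 25% train/test split, as listed above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# max_iter=2000 gives the solver room to converge on unscaled features.
model = LogisticRegression(max_iter=2000)
model.fit(X_train, y_train)

# Each coefficient is the change in log-odds per unit increase of the feature.
print(dict(zip(["age", "smoker"], model.coef_[0].round(3))))
```

The printed coefficients are on the log-odds scale, so exponentiating one gives the odds ratio for a one-unit increase in that feature.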

Evaluation

Model performance is measured using the ROC-AUC score.

Example output:

    AUC: 0.86
    Model saved!

AUC provides a threshold-independent measure of classification quality.
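To make "threshold-independent" concrete, the sketch below uses four made-up labels and predicted probabilities (in the pipeline these would presumably come from something like `model.predict_proba(X_test)[:, 1]`, which is an assumption about the script). AUC depends only on how the scores rank positives against negatives, so it is unchanged by a monotonic transform of the scores, while accuracy shifts with the chosen cutoff.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Made-up labels and predicted probabilities, for illustration only.
y_true = np.array([0, 0, 1, 1])
p_hat = np.array([0.10, 0.40, 0.35, 0.80])

# AUC: fraction of (positive, negative) pairs ranked correctly.
# Here 3 of the 4 pairs are in the right order.
auc = roc_auc_score(y_true, p_hat)  # 0.75

# Converting probabilities to log-odds preserves their order, so AUC is unchanged.
auc_logit = roc_auc_score(y_true, np.log(p_hat / (1 - p_hat)))  # 0.75

# Accuracy, by contrast, depends on where the decision threshold is placed.
acc_at_050 = accuracy_score(y_true, (p_hat >= 0.50).astype(int))  # 0.75
acc_at_038 = accuracy_score(y_true, (p_hat >= 0.38).astype(int))  # 0.5

print(auc, auc_logit, acc_at_050, acc_at_038)
```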

Model Persistence

The trained model is serialized using pickle and saved as `model/logreg_model.pkl`. This allows deployment in a future API or clinical decision support tool.
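A save-and-reload round trip with `pickle` might look like the sketch below. The toy training data is a placeholder, and a temporary directory stands in for the project root so the example leaves no files behind; only the relative path `model/logreg_model.pkl` comes from the README.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder data and model, standing in for the trained risk model.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)
model = LogisticRegression(max_iter=2000).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    # Temporary directory used here in place of the project root.
    model_path = Path(tmp) / "model" / "logreg_model.pkl"
    model_path.parent.mkdir(parents=True, exist_ok=True)

    # Serialize the fitted estimator to disk.
    with open(model_path, "wb") as f:
        pickle.dump(model, f)

    # Load it back, as a future API or dashboard would.
    with open(model_path, "rb") as f:
        restored = pickle.load(f)

# The restored model should reproduce the original predictions.
same_predictions = np.allclose(model.predict_proba(X), restored.predict_proba(X))
print(same_predictions)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code, and a pickled estimator is best loaded with the same scikit-learn version that produced it.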

Project Structure

    Heart_Disease_Risk/
    │
    ├── data/
    │   └── synthetic_population.csv
    │
    ├── model/
    │   └── logreg_model.pkl
    │
    ├── risk_model.py
    └── README.md

Installation

    pip install numpy pandas scikit-learn

Run the Project

    python risk_model.py

This will:

- Generate the synthetic dataset
- Train the logistic regression model
- Print the ROC-AUC
- Save the trained model

Key Learnings

- Generating structured synthetic medical datasets
- Probabilistic modeling using logistic regression
- Model evaluation with ROC curves
- Binary classification in a healthcare context
- Saving trained models for deployment

Real-World Applications

- Cardiovascular risk screening
- Preventive healthcare analytics
- Insurance risk scoring
- Clinical decision support systems

Future Improvements

- Add feature scaling
- Implement cross-validation
- Add SHAP for interpretability
- Build a Streamlit dashboard
- Deploy as a REST API
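The first two items could be combined in a single scikit-learn pipeline. The sketch below is one possible approach, not something the project implements yet: the stand-in data, its hypothetical coefficients, and the choice of 5 folds are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data: age and SBP with a HYPOTHETICAL true risk model.
rng = np.random.default_rng(0)
n = 2_000
X = np.column_stack([
    rng.uniform(25, 80, n),  # age
    rng.normal(128, 15, n),  # systolic blood pressure
])
true_logit = -9.0 + 0.06 * X[:, 0] + 0.03 * X[:, 1]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))

# Putting the scaler inside the pipeline means it is re-fit on each
# training fold, so no statistics leak from the validation fold.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=2000))

# 5-fold cross-validated ROC-AUC (fold count is an assumption).
scores = cross_val_score(pipeline, X, y, cv=5, scoring="roc_auc")
print(f"ROC-AUC per fold: {scores.round(3)}, mean: {scores.mean():.3f}")
```

Reporting the spread across folds gives a sense of how much a single 75/25 split's AUC might vary by chance.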

Cardiovascular Risk Prediction using Logistic Regression

Built a synthetic population dataset (15k samples) and trained a logistic regression model to predict ischemic heart disease risk, reaching an ROC-AUC of about 0.86 in the example run and saving the model for deployment.

About

This project predicts IHD risk based on user health data and shows how increasing physical activity can reduce that risk.
