Predicting Heart Disease using Machine Learning

In this blog I will be providing a brief overview into my approach on how I used a supervised classification machine learning approach to predict whether or not someone in the future will have heart disease.

So let’s get started!

My detailed work is shown in this notebook.

Problem Statement

Given the dataset, can we predict whether or not someone will have heart disease.

I followed a 6 step approach, as shown below:

First, I analyzed the data to understand the various features provided and their relationship with one another. This is called Exploratory Data Analysis (EDA).

Next, I checked if there was any missing data and the different data types of the all the labels. I filled the missing values and converted any non-numeric dtype label to numeric value. The reason being, the RandomForestClassifier model (used in this project) cannot work with non-numeric data.

The model was chosen based on scikit-learn’s model estimator map shown below:

This roadmap helped determine which model to use for solving the problem. Once the modeling was completed, we looked at different hyperparameters. The tuning of the hyperparameters is explained well in this article.

Once hyperparameter tuning was completed, the best parameters were used to create an ideal model. This model was used to predict whether or not someone will have heart disease in the future.

If you would like more details regarding the project or have suggestions on how to improve this project, please reach out!