This project aims to predict the occurrence of diabetes in patients based on various features using the K-Nearest Neighbors (KNN) algorithm from the scikit-learn library.
- Python (version 3.6 or higher)
- pandas
- numpy
- scikit-learn
The dataset used in this project is the diabetes.csv file, which should be placed in the C:\Users\Admin\OneDrive\Documents\VSCode Practice\Practice\ML\ directory. This file contains various features related to diabetes, including glucose, blood pressure, skin thickness, insulin, BMI, and more.
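
Because the script depends on the file being in one specific directory, it can help to check for it before loading. The sketch below is not part of the project code; the helper name `resolve_dataset_path` and its parameters are hypothetical, chosen only for illustration.

```python
from pathlib import Path


def resolve_dataset_path(data_dir, filename="diabetes.csv"):
    """Return the full path to the dataset, failing early if it is missing.

    Hypothetical helper: `data_dir` is whatever directory holds the CSV.
    """
    path = Path(data_dir) / filename
    if not path.is_file():
        # Fail with a clear message instead of a later error from pandas.
        raise FileNotFoundError(f"Expected dataset at {path}")
    return path
```

The returned path can be passed directly to `pd.read_csv()`, which accepts `pathlib.Path` objects.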
- The code starts by importing the necessary libraries: pandas, numpy, and several modules from scikit-learn.
- The diabetes.csv file is read into a pandas DataFrame using pd.read_csv().
- Any zero values in the specified columns (Glucose, BloodPressure, SkinThickness, Insulin, BMI) are replaced with the mean value of the respective column.
- The features (X) and target variable (y) are separated from the DataFrame.
- The dataset is split into training and testing sets using train_test_split() from scikit-learn, with a test size of 20%.
- Feature scaling is performed using StandardScaler() from scikit-learn to ensure all features are on the same scale, which matters because KNN compares samples by distance.
- The KNN classifier is instantiated with n_neighbors=11 (number of neighbors), p=2 (the power parameter of the Minkowski distance, where 2 corresponds to Euclidean distance), and metric="euclidean" (Euclidean distance).
- The classifier is trained on the training data using fit().
- Predictions are made on the test set using predict().
- The confusion matrix, F1-score, and accuracy score are calculated and printed using functions from sklearn.metrics.
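
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script. It makes several assumptions: the target column is named `Outcome`, the column mean is computed over non-zero entries only, `random_state=0` is used for reproducibility, and a randomly generated placeholder frame stands in for diabetes.csv so the block runs without the file. The function names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Columns named in the walkthrough where a zero is replaced by the mean.
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def replace_zeros_with_mean(df, columns):
    """Replace zeros in `columns` with the mean of that column.

    Assumption: the mean is taken over the non-zero values only.
    """
    df = df.copy()
    for col in columns:
        non_zero = df[col].replace(0, np.nan)
        df[col] = non_zero.fillna(non_zero.mean())
    return df


def run_knn(df, target="Outcome", random_state=0):
    """Clean, split, scale, fit KNN, and return the three reported metrics."""
    df = replace_zeros_with_mean(df, ZERO_AS_MISSING)
    X = df.drop(columns=[target])  # features
    y = df[target]                 # target variable (assumed column name)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state
    )

    # Fit the scaler on the training data only, then apply it to both sets.
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)

    clf = KNeighborsClassifier(n_neighbors=11, p=2, metric="euclidean")
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)

    return {
        "confusion_matrix": confusion_matrix(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "accuracy": accuracy_score(y_test, y_pred),
    }


def make_placeholder_frame(n_rows=200, seed=0):
    """Random stand-in for diabetes.csv; the values carry no medical meaning."""
    rng = np.random.default_rng(seed)
    frame = pd.DataFrame(
        {col: rng.integers(0, 200, size=n_rows) for col in ZERO_AS_MISSING}
    )
    # Hypothetical label: 1 when the placeholder glucose value is high.
    frame["Outcome"] = (frame["Glucose"] > 100).astype(int)
    return frame


results = run_knn(make_placeholder_frame())
for name, value in results.items():
    print(name, value, sep="\n")
```

To run this against the real data, replace `make_placeholder_frame()` with the DataFrame returned by `pd.read_csv()` and adjust `target` if the label column has a different name.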
- Ensure that you have the required Python libraries installed.
- Place the diabetes.csv file in the specified directory (C:\Users\Admin\OneDrive\Documents\VSCode Practice\Practice\ML\).
- Run the script.
- The output will display the confusion matrix, F1-score, and accuracy score for the predictions made by the KNN classifier.
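
To help read that output, the sketch below shows how accuracy and F1-score relate to a binary confusion matrix. It assumes the layout scikit-learn uses for two classes, `[[TN, FP], [FN, TP]]`, with rows as true labels and columns as predictions. The function name is hypothetical and the counts in the example call are made up for illustration; they are not results from this project.

```python
def metrics_from_confusion(matrix):
    """Compute (accuracy, f1) from a 2x2 matrix laid out as [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = matrix
    total = tn + fp + fn + tp

    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return accuracy, f1


# Made-up counts, purely illustrative.
example_accuracy, example_f1 = metrics_from_confusion([[50, 10], [5, 35]])
print(f"accuracy={example_accuracy:.3f}, f1={example_f1:.3f}")
```

Accuracy counts all correct predictions, while the F1-score balances precision and recall for the positive class, so the two can diverge when the classes are imbalanced.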
Contributions to this project are welcome. If you find any issues or have suggestions for improvement, please open an issue or submit a pull request.