Decision tree using sklearn
What is a Decision tree algorithm?
The decision tree algorithm belongs to the family of supervised machine learning algorithms. It can be used for both classification and regression problems.
The goal of the algorithm is to build a model that predicts the value of a target variable. It does this with a tree representation: each internal node tests an attribute, and each leaf node corresponds to a class label.
We can use scikit-learn to build a decision tree, which makes it very easy to implement.
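To make the internal-node and leaf-node idea concrete, here is a minimal sketch that fits a tree on a tiny made-up dataset and prints its rules. The six data points and the feature names `age` and `glucose` are assumptions chosen only for illustration; they are not taken from the diabetes dataset used later.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data, made up for illustration: each row is [age, glucose]
X_toy = [[25, 90], [30, 100], [45, 160], [50, 170], [35, 95], [60, 180]]
y_toy = [0, 0, 1, 1, 0, 1]  # class label for each row

toy_tree = DecisionTreeClassifier(random_state=0).fit(X_toy, y_toy)

# export_text renders the fitted tree as text: lines with a threshold test
# are internal nodes, and lines with "class:" are leaf nodes.
rules = export_text(toy_tree, feature_names=["age", "glucose"])
print(rules)
```

Because the two classes in this toy data are cleanly separated, a single test at the root should be enough, so the printed tree has one internal node and two leaves.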
CODE:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics
data = pd.read_csv("diabetes.csv") #Dataset
data.head()
Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | Pedigree | Age | Outcome
---|---|---|---|---|---|---|---|---
6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1
1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0
8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1
1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0
0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1
# Columns used as input features; Outcome is the target
features = ['Pregnancies', 'Insulin', 'BMI', 'Age', 'Glucose', 'BloodPressure', 'Pedigree']
X = data[features]
y = data.Outcome
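# Hold out 30% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
# Train the classifier on the training split
clf = DecisionTreeClassifier()
clf = clf.fit(X_train, y_train)
# Predict on the held-out rows and measure accuracy
y_pred = clf.predict(X_test)
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))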
>>Accuracy: 0.6796536796536796
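A single accuracy number hides which kinds of mistakes the model makes, and a fully grown tree can overfit the training data. The sketch below shows one way to look further: it compares an unrestricted tree with a depth-limited one and prints a confusion matrix for each. It runs on synthetic data from `make_classification` as a stand-in, because `diabetes.csv` is not bundled with scikit-learn; the sample size, the `max_depth=3` setting, and all variable names are assumptions for illustration, and the numbers it prints will not match the result above.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 500 rows, 7 numeric features, 2 classes
X_demo, y_demo = make_classification(
    n_samples=500, n_features=7, n_informative=4, random_state=1
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=1
)

results = {}
for depth in (None, 3):  # None lets the tree grow until its leaves are pure
    model = DecisionTreeClassifier(max_depth=depth, random_state=1)
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    # Rows of the confusion matrix are true classes, columns are predictions
    results[depth] = (accuracy_score(y_te, pred), confusion_matrix(y_te, pred))

for depth, (acc, cm) in results.items():
    print("max_depth =", depth, "accuracy =", acc)
    print(cm)
```

Passing `random_state` to `DecisionTreeClassifier` also makes the fitted tree reproducible; without it, repeated runs can give slightly different accuracy values.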