PyCaret
PyCaret is an open-source, low-code machine learning library in Python that aims to reduce the cycle time from hypothesis to insights. It is well suited for seasoned data scientists who want to increase the productivity of their ML experiments by using PyCaret in their workflows or for citizen data scientists and those new to data science with little or no background in coding. PyCaret allows you to go from preparing your data to deploying your model within seconds using your choice of notebook environment.
Install PyCaret
pip install pycaret
setup()
setup() must be called before any other PyCaret function. It initializes the environment and configures all of the data transformations that will be applied to your models. The only required parameters are data and target.
Classification
clf1 = setup(data = train,
             target = 'Survived',
             numeric_imputation = 'mean',
             categorical_features = ['Sex','Embarked'],
             ignore_features = ['Name','Ticket','Cabin'],
             silent = True)
target: What value are we trying to predict
numeric_imputation: If we're missing numerical values, what do we replace them with
categorical_features: Which features (columns) are categorical
ignore_features: What features would you like to ignore
silent: When True, confirmation of the inferred data types is skipped and preprocessing runs automatically
Regression
reg = setup(data = train,
            target = 'SalePrice',
            numeric_imputation = 'mean',
            categorical_features = ['MSZoning','Exterior1st','Exterior2nd','KitchenQual','Functional','SaleType',
                                    'Street','LotShape','LandContour','LotConfig','LandSlope','Neighborhood',
                                    'Condition1','Condition2','BldgType','HouseStyle','RoofStyle','RoofMatl',
                                    'MasVnrType','ExterQual','ExterCond','Foundation','BsmtQual','BsmtCond',
                                    'BsmtExposure','BsmtFinType1','BsmtFinType2','Heating','HeatingQC','CentralAir',
                                    'Electrical','GarageType','GarageFinish','GarageQual','GarageCond','PavedDrive',
                                    'SaleCondition'],
            ignore_features = ['Alley','PoolQC','MiscFeature','Fence','FireplaceQu','Utilities'],
            normalize = True,
            silent = True)
target: What value are we trying to predict
numeric_imputation: If we're missing numerical values, what do we replace them with
categorical_features: Which features (columns) are categorical
ignore_features: What features would you like to ignore
normalize: Normalizes your numerical values. You can define the normalization method using normalize_method
silent: When True, confirmation of the inferred data types is skipped and preprocessing runs automatically
compare_models()
Classification
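The classification call is a one-liner once setup() has run; a minimal sketch, assuming the PyCaret 2.x behavior where compare_models() trains and cross-validates every available classifier and returns the top-scoring one (best_model is just an illustrative name):
best_model = compare_models()  # prints a scoring grid, sorted by Accuracy by default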
Sample Output
Accuracy - The proportion of predictions that were correct
AUC - The area under the ROC curve, which plots the True Positive Rate against the False Positive Rate
AUC is scale-invariant because it measures the rankings of predictions, not absolute values
It's classification-threshold-invariant because it measures the quality of predictions irrespective of what classification threshold is used
This is not always desirable, e.g. when performance at one specific classification threshold is what matters
Recall - Also known as the True Positive Rate, it is the number of true positives divided by the number of true positives plus false negatives
$TPR = \frac{TP}{TP + FN}$
Precision - The proportion of predicted positive cases that were actually positive
$Precision = \frac{TP}{TP+FP}$
F1 - The harmonic mean of precision and recall
$F1 = 2 * \frac{precision * recall}{precision + recall}$
Kappa - Compares the observed accuracy with the expected accuracy (random chance)
$Kappa = \frac{observed\ acc - expected\ acc}{1 - expected\ acc}$
MCC - Produces a high score only if the prediction obtained good results in all four confusion matrix categories
$MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$
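These are all standard definitions, so you can sanity-check PyCaret's scoring grid by hand; a minimal sketch using scikit-learn (one of PyCaret's own dependencies), with toy label vectors:
from sklearn.metrics import (accuracy_score, roc_auc_score, recall_score,
                             precision_score, f1_score, cohen_kappa_score,
                             matthews_corrcoef)

y_true = [0, 1, 1, 0, 1, 0, 1, 1]                   # ground-truth labels (toy data)
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]                   # hard class predictions
y_prob = [0.2, 0.9, 0.4, 0.1, 0.8, 0.6, 0.7, 0.9]   # positive-class scores

print('Accuracy :', accuracy_score(y_true, y_pred))
print('AUC      :', roc_auc_score(y_true, y_prob))  # ranks scores, not hard labels
print('Recall   :', recall_score(y_true, y_pred))
print('Precision:', precision_score(y_true, y_pred))
print('F1       :', f1_score(y_true, y_pred))
print('Kappa    :', cohen_kappa_score(y_true, y_pred))
print('MCC      :', matthews_corrcoef(y_true, y_pred))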
Regression
compare_models()
Example output of Regression Compare Models
MAE - Mean Absolute Error. It does not disproportionately penalize large errors.
MSE - Mean Squared Error. It penalizes large errors more heavily than small ones.
RMSE - Root Mean Squared Error. Like MSE it penalizes large errors, but it is expressed in the same units as the target.
R2 - The coefficient of determination: the proportion of the variance in the dependent variable that is explained by the independent variables
RMSLE - Root Mean Squared Log Error
MAPE - Mean Absolute Percentage Error
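These likewise match scikit-learn's implementations, so a quick hand check is easy; a sketch with toy numbers (mean_absolute_percentage_error requires scikit-learn >= 0.24):
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             r2_score, mean_squared_log_error,
                             mean_absolute_percentage_error)

y_true = np.array([3.0, 5.0, 2.5, 7.0])   # toy targets
y_pred = np.array([2.5, 5.0, 4.0, 8.0])   # toy predictions

print('MAE  :', mean_absolute_error(y_true, y_pred))
print('MSE  :', mean_squared_error(y_true, y_pred))
print('RMSE :', np.sqrt(mean_squared_error(y_true, y_pred)))
print('R2   :', r2_score(y_true, y_pred))
print('RMSLE:', np.sqrt(mean_squared_log_error(y_true, y_pred)))
print('MAPE :', mean_absolute_percentage_error(y_true, y_pred))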
create_model()
Classification
lgbm = create_model('lightgbm')
Regression
lb = create_model('lightgbm') # CatBoost was not available for some reason
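create_model() trains a single estimator identified by a string ID ('lightgbm', 'rf', 'dt', etc.) and cross-validates it, with 10 folds by default in PyCaret 2.x. Assuming that API, the fold count can be overridden:
dt = create_model('dt', fold = 5)  # a decision tree with 5-fold CV instead of the default 10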
tune_model()
Tunes the hyperparameters of a model and scores the result using cross-validation (stratified k-fold for classification)
Classification
tuned_lightgbm = tune_model(lgbm) # Pass in a model object, not the model's name string (unlike some older PyCaret tutorials/workbooks)
Regression
tuned_lb = tune_model(lb)
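tune_model() also exposes tuning knobs; a sketch, assuming the PyCaret 2.x signature where n_iter sets the number of random-search iterations and optimize picks the metric to maximize:
tuned_lb = tune_model(lb, n_iter = 50, optimize = 'RMSE')  # more iterations, tuned for RMSE instead of the default metric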
plot_model()
Classification
Learning Curve
plot_model(estimator = tuned_lightgbm, plot = 'learning')
AUC Curve
plot_model(estimator = tuned_lightgbm, plot = 'auc')
Confusion Matrix
plot_model(estimator = tuned_lightgbm, plot = 'confusion_matrix')
Feature Importance
plot_model(estimator = tuned_lightgbm, plot = 'feature')
Regression
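The same function covers regression models; a sketch, assuming the 'residuals' and 'error' plot types available in PyCaret 2.x:
plot_model(estimator = tuned_lb, plot = 'residuals')  # residuals vs. predicted values
plot_model(estimator = tuned_lb, plot = 'error')      # prediction error (predicted vs. actual)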
evaluate_model()
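evaluate_model() renders an interactive widget in the notebook with a button for every available plot, so you can click through all of the visualizations for a trained model instead of calling plot_model() once per plot type.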
Classification
evaluate_model(tuned_lightgbm)
Regression
evaluate_model(tuned_lb)
interpret_model()
The interpret_model() method helps you analyze a model by showing which features matter to it. It plots SHAP (SHapley Additive exPlanations) values.
Classification
interpret_model(tuned_lightgbm)
Regression
interpret_model(tuned_lb)
Reason Plot
interpret_model(tuned_lb, plot = 'reason', observation = 10)
predict_model()
Classification
predict_model(tuned_lightgbm, data=test)
Regression
predictions = predict_model(tuned_lb, data = test)
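predict_model() returns a copy of the data with the predictions appended; in PyCaret 2.x the predicted value lands in a Label column (classification additionally gets a Score column with the class probability). Assuming that column naming:
predictions['Label'].head()  # the predicted SalePrice values (PyCaret 2.x naming)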
__________________________________________________________________________________
Google Colab LINK
Kaggle LINK
Official Site here
If you have any doubts, please let me know.