Fausto Pucheta Fortin
Data Enthusiast | Data Practitioner
Telecom Customer Churn Prediction
An End-to-End ML Project
Overview
The project follows a modularized and systematic approach, emphasizing clarity, maintainability, and collaborative development. The iterative nature of the project ensures that it evolves based on insights gained during exploratory data analysis and model training.
Dataset Used
Unfortunately, this dataset is no longer available on Kaggle. You can still download the zip file below:
Dataset Overview:
Columns: 12
Categorical Variables: 'Gender', 'Subscription Type', 'Contract Length'
Identifier: 'CustomerID'
Continuous Variables: 'Age', 'Tenure', 'Usage Frequency', 'Support Calls', 'Payment Delay', 'Total Spend', 'Last Interaction'
Target Variable: Churn
Memory Usage: 50.1+ MB
Outliers: None.
Null Values: 0.00198%, evenly distributed across all columns.
Class Imbalance: None.
Proportion of target variable:
44.5% (No churn)
55.5% (Churn)
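The null-value share and class proportions above can be checked with a few lines of pandas. This is a minimal sketch on a toy frame; the column names ('Churn', 'Total Spend') follow the dataset description, and the numbers printed here come from the synthetic sample, not the real data.

```python
import pandas as pd

# Hypothetical stand-in for the churn dataset (column names assumed).
df = pd.DataFrame({
    "Churn": [1, 1, 0, 1, 0, 1, 0, 1, 1, 0],
    "Total Spend": [120.0, 80.5, None, 300.2, 95.0,
                    60.3, 210.1, 45.9, 75.5, 180.0],
})

# Share of missing values per column, as a percentage of all rows.
null_pct = df.isna().mean() * 100

# Proportion of each target class (normalize=True returns fractions).
churn_share = df["Churn"].value_counts(normalize=True)

print(null_pct["Total Spend"])  # 10.0 on this toy sample
print(churn_share[1])           # 0.6 on this toy sample
```

On the full dataset, the same two lines yield the roughly 0.00198% null rate and the 55.5% / 44.5% churn split reported above.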
Key Highlights
Exploratory data analysis to uncover patterns and features influencing customer churn.
Feature engineering and selection to create predictive indicators.
Model training and selection based on precision score, using RandomizedSearchCV with cross-validation for hyperparameter tuning.
Feature importance analysis for further iterations.
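The training-and-selection step above can be sketched with scikit-learn's RandomizedSearchCV scored on precision. The data here is synthetic, and a RandomForestClassifier stands in for the project's XGBoost model so the sketch runs with scikit-learn alone; the search pattern is the same.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in data; the real project uses the Kaggle churn dataset.
X, y = make_classification(n_samples=500, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Randomized hyperparameter search, scored on precision with 5-fold CV.
param_dist = {
    "n_estimators": randint(50, 300),
    "max_depth": randint(2, 10),
}
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_dist,
    n_iter=5,
    scoring="precision",
    cv=5,
    random_state=42,
)
search.fit(X_train, y_train)
print(search.best_params_, round(search.best_score_, 3))
```

Swapping in `xgboost.XGBClassifier` and a larger `param_dist` recovers the project's actual setup.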
Model Characteristics
The model for this project is an XGBoost classifier (XGBClassifier), selected through iterative experimentation with RandomizedSearchCV, cross-validation, and hyperparameter tuning.
It delivers high precision and strong discrimination: 0.928 area under the Precision-Recall curve and 0.920 ROC AUC.
The top features influencing churn are 'high_support_calls', 'low_spender', and 'high_payment_delay'.
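The feature-importance ranking behind that finding can be sketched as follows. The data is synthetic, a scikit-learn GradientBoostingClassifier stands in for the trained XGBoost model, and the engineered feature names ('high_support_calls', 'low_spender', 'high_payment_delay') are taken from the project's description; the ranking printed here reflects the toy data only.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Assumed feature names, mixing engineered flags with raw columns.
feature_names = ["high_support_calls", "low_spender", "high_payment_delay",
                 "Age", "Tenure", "Usage Frequency"]

# Synthetic stand-in for the prepared training matrix.
X, y = make_classification(n_samples=400, n_features=6, n_informative=3,
                           random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Rank features by the model's impurity-based importances.
importances = (pd.Series(model.feature_importances_, index=feature_names)
               .sort_values(ascending=False))
print(importances.head(3))
```

On the real model, the same `feature_importances_` attribute of the fitted XGBoost classifier yields the ranking reported above.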
Flask App
By running the app.py script, users can interact with the trained model:
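A minimal sketch of what such an app.py can look like is below. The route name, payload fields, and prediction rule are all assumptions for illustration; in the real app, the handler would load the trained XGBoost model and call its predict method instead of the placeholder rule.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    """Hypothetical prediction endpoint (route and fields assumed)."""
    data = request.get_json()
    # Placeholder rule standing in for the trained model:
    # flag churn when the customer has made many support calls.
    churn = int(data.get("support_calls", 0) > 5)
    return jsonify({"churn": churn})

# app.run(debug=True)  # start the development server
```

A request like `curl -X POST -H "Content-Type: application/json" -d '{"support_calls": 7}' http://localhost:5000/predict` would then return a JSON churn prediction.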
Issues & Limitations
While this project serves primarily as an educational endeavor, certain industry-standard practices remain unaddressed.
Notably, the absence of a well-designed Continuous Integration/Continuous Deployment (CI/CD) pipeline built with GitHub Actions limits exposure to practices that are crucial in real-world development workflows.
Additionally, a robust deployment strategy for the predictive model is lacking, preventing the transition from development to production. These omissions limit the project's agility, hindering the implementation of real-time updates and model utilization in operational environments.
Addressing these aspects would further align the project with best practices in the data science and machine learning industry.