For the best viewing experience, please open this page on a desktop or laptop.
Deep dive into the data science methodology, feature engineering, and machine learning models
5,630 customer records with 20 features from e-commerce platform
Statistical analysis, outlier detection, missing value handling
Created tenure buckets, interaction terms, behavioral scores
Decision Tree, Logistic Regression, hyperparameter tuning
While the model achieves 89.3% overall accuracy, the 52% churner recall means we successfully identify about half of all customers who will churn. In business terms, this translates to catching approximately 490 out of 947 churners, allowing for proactive intervention. The high non-churner recall (96%) minimizes false alarms, ensuring retention efforts are efficiently targeted.
Understanding which customer attributes drive churn predictions
Key implementation code from the analysis notebook
# Import libraries
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
# Prepare features and target
X = df_cleaned.drop(['Churn', 'CustomerID'], axis=1)
y = df_cleaned['Churn']
# Split data with stratification
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42, stratify=y
)
# Initialize and train Decision Tree
dt_model = DecisionTreeClassifier(
max_depth=8,
min_samples_split=50,
class_weight='balanced',
random_state=42
)
# Train model
dt_model.fit(X_train, y_train)
# Make predictions
y_pred = dt_model.predict(X_test)
y_pred_proba = dt_model.predict_proba(X_test)[:, 1]
# Evaluate performance
accuracy = dt_model.score(X_test, y_test)
auc_score = roc_auc_score(y_test, y_pred_proba)
print(f"Accuracy: {accuracy:.3f}")
print(f"AUC-ROC: {auc_score:.3f}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
# Feature importance
feature_importance = pd.DataFrame({
'feature': X.columns,
'importance': dt_model.feature_importances_
}).sort_values('importance', ascending=False)
print("\nTop 5 Features:")
print(feature_importance.head())