Bank Marketing Analysis#
by Runtian Li, Rafe Chang, Sid Grover, Anu Banga
Repo Link: UBC-MDS/dsci_522_group_8.git
Summary#
Here we build a balanced support vector classifier (SVC) to predict whether a new client will subscribe to a term deposit. We tested five classification models (a dummy classifier, unbalanced and balanced logistic regression, and unbalanced and balanced SVC) and chose the balanced SVC based on how the models scored on the test data; it has the highest test recall score (about 0.79), which indicates that it makes the fewest false-negative predictions among all five models.
The balanced support vector classifier considers 13 numerical and categorical features of each customer. After hyperparameter optimization, the tuned model scored 0.83 accuracy and 0.77 recall on the test data. Its recall far exceeds that of the unbalanced models, indicating it is well suited to identifying clients likely to subscribe, which was the primary goal.
Introduction#
Term deposit is valuable to banks because it ensures a stable stream of income that banks can utilize. Banks usually invest in higher-return financial products or lend money to other customers with a higher interest rate to make a profit. With term deposits, banks can better predict their cash flow.
While banks’ marketing strategies nowadays usually focus on attracting new customers, banks must target the right potential customers. This research aims to identify the correct audience so that banks can design their marketing strategies accordingly [Dooley, 2023].
Background#
The data is related to the direct marketing campaigns of a Portuguese banking institution. The campaigns were conducted over phone calls, and often more than one contact with the same client was required to assess whether the client would subscribe to the bank term deposit (‘yes’) or not (‘no’). The data set used in this project was created by S. Moro, P. Rita, and P. Cortez [Moro et al., 2012] and was sourced from the UCI Machine Learning Repository [Moro et al., 2012]. We use bank-full.csv, which contains all examples and 17 inputs, ordered by date (an older version of this dataset with fewer inputs). The raw data file can be found here.
Research Question#
We are working on a binary classification model. The classification goal is to predict whether the client will subscribe to a term deposit: “yes” for will subscribe and “no” for won’t subscribe.
Analysis#
Data Preprocessing#
Initially, we confirmed the data was complete (it contains no missing values) and removed unnecessary columns like “contact,” “day,” and “month.” This streamlined the dataset, making it ready for analysis. We used StandardScaler to standardize numerical features such as “age” and “balance” and applied one-hot encoding to categorical attributes, making the data compatible with different machine learning models.
Model Selection and Evaluation#
We used five models for classification, starting with a basic Dummy Classifier. We then added Logistic Regression and a Support Vector Classifier (SVC), each showing strengths in accuracy and recall [Moro et al., 2014]. Notably, our balanced models, Balanced Logistic Regression and the Balanced Support Vector Classifier (svc_bal), performed best, especially in identifying clients likely to subscribe to a term deposit.
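The five-model comparison can be sketched with scikit-learn's `cross_validate`; here a synthetic imbalanced dataset stands in for the preprocessed bank data, and the model names mirror those used in the results table below. This is an illustrative sketch, not the project's exact code.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC

# Synthetic stand-in for the preprocessed bank data (~11% positive class)
X_train, y_train = make_classification(
    n_samples=500, n_features=10, weights=[0.89], random_state=0)

models = {
    "dummy": DummyClassifier(),
    "logreg": LogisticRegression(max_iter=1000),
    "svc": SVC(),
    "logreg_bal": LogisticRegression(max_iter=1000, class_weight="balanced"),
    "svc_bal": SVC(class_weight="balanced"),
}

results = {}
for name, model in models.items():
    cv = cross_validate(model, X_train, y_train, cv=5,
                        scoring=["accuracy", "precision", "recall", "f1"],
                        return_train_score=True)
    # mean/std across the 5 folds, one row per metric
    results[name] = pd.DataFrame(cv).agg(["mean", "std"]).T

summary = pd.concat(results, axis=1)  # one (mean, std) column pair per model
```

Passing `class_weight="balanced"` reweights classes inversely to their frequency, which is what distinguishes the balanced variants from their unbalanced counterparts.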
Model Comparison#
An extensive evaluation, considering accuracy, precision, recall, and F1 scores, highlighted the Balanced Support Vector Classifier (svc_bal) as the standout performer. This model excelled with high recall, crucial for identifying potential term deposit subscribers in our specific context.
Hyperparameter Optimization#
Optimizing model performance, especially for the Support Vector Classifier (SVC) using a reduced dataset, resulted in a final model with 0.83 test accuracy and 0.77 test recall. This optimization strategy keeps tuning computationally efficient while fine-tuning the model for better results.
Recall - The Preferred Metric#
In our bank marketing dataset, we prioritize recall. Recall indicates the model’s ability to identify true positive cases—clients subscribing to a term deposit. In our context, missing a potential positive case is more significant than false positives, leading to potential losses and missed opportunities. Prioritizing recall ensures a finely tuned model capturing all potential clients interested in term deposits, aligning with our main goal.
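The trade-off above can be made concrete with a minimal recall computation; the labels below are toy values, not project data.

```python
from sklearn.metrics import confusion_matrix, recall_score

# Toy labels: "yes" = client subscribed a term deposit
y_true = ["yes", "no", "yes", "yes", "no", "yes"]
y_pred = ["yes", "no", "no", "yes", "no", "yes"]

# Rows are true labels, columns are predictions
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=["no", "yes"]).ravel()

# Recall = TP / (TP + FN): the share of actual "yes" clients we catch.
# Here 3 of 4 actual positives are recovered -> 0.75
recall = tp / (tp + fn)
```

A false negative (a missed “yes”) lowers this number directly, which is why recall is the metric we monitor.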
Modeling and Results#
Exploratory Data Analysis#
Based on the discussion above, we decided to keep the following numerical features: “age”, “balance”, “duration”, “campaign”, “pdays”, and “previous”; see their distributions in Figure 1 [Vajiramedhin and Suebsing, 2014].

Fig. 1 Distribution of all the numerical features after feature selection#
We decided to keep the following categorical features: “job”, “marital”, “education”, “default”, “housing”, “loan”, and “poutcome”; see their distributions in Figure 2.

Fig. 2 Distribution of all the categorical features after feature selection#
In the plot below, we explore the Spearman correlation between the numerical features (Figure 3).

Fig. 3 Correlation Matrix of numerical features#
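A correlation matrix like Fig. 3 can be produced with pandas; the DataFrame below is a synthetic stand-in for the cleaned bank data, with only the column names taken from the report.

```python
import numpy as np
import pandas as pd

numeric_cols = ["age", "balance", "duration", "campaign", "pdays", "previous"]

# Synthetic stand-in for the cleaned bank DataFrame
rng = np.random.default_rng(0)
bank = pd.DataFrame(rng.integers(0, 100, size=(50, 6)), columns=numeric_cols)

# Spearman is rank-based, so it tolerates the skewed, non-normal
# distributions visible in the numerical features
corr = bank[numeric_cols].corr(method="spearman")
```

Plotting this matrix as a heatmap (e.g. with Altair or matplotlib) yields a figure of the same shape as Fig. 3.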
Preprocessing#
Since there are no missing values in our dataset, we do not need to impute or drop NAs.
We drop the “contact”, “day”, and “month” columns, since they do not help identify useful underlying patterns for the model.
We treat “age”, “balance”, “duration”, “campaign”, “pdays”, and “previous” as numerical features and apply a StandardScaler transformation to them.
We treat “job”, “marital”, “education”, “default”, “housing”, “loan”, and “poutcome” as categorical features and one-hot encode them, dropping a column only when the categorical feature is binary.
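These steps can be sketched as a scikit-learn column transformer; the two-row `bank` DataFrame below is a toy stand-in for the real data, with only the column names taken from the report.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_feats = ["age", "balance", "duration", "campaign", "pdays", "previous"]
categorical_feats = ["job", "marital", "education", "default",
                     "housing", "loan", "poutcome"]

preprocessor = make_column_transformer(
    (StandardScaler(), numeric_feats),
    # drop="if_binary" keeps a single indicator column for binary features
    (OneHotEncoder(drop="if_binary"), categorical_feats),
)

# Two toy rows standing in for the real bank data
bank = pd.DataFrame({
    "age": [30, 45], "balance": [100.0, 2500.0], "duration": [120, 300],
    "campaign": [1, 3], "pdays": [-1, 90], "previous": [0, 2],
    "job": ["admin.", "blue-collar"], "marital": ["single", "married"],
    "education": ["secondary", "tertiary"], "default": ["no", "yes"],
    "housing": ["yes", "no"], "loan": ["no", "no"],
    "poutcome": ["unknown", "success"],
})
X = preprocessor.fit_transform(bank)
```

On the full dataset, with its richer category sets, the same transformer yields the 32 features reported below.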
The transformed dataframe, after applying StandardScaler to the numerical features and OneHotEncoder to the categorical features, is shown below. The number of features after preprocessing is 32.
age | balance | duration | campaign | pdays | previous | job_admin. | job_blue-collar | job_entrepreneur | job_housemaid | ... | education_secondary | education_tertiary | education_unknown | default_yes | housing_yes | loan_yes | poutcome_failure | poutcome_other | poutcome_success | poutcome_unknown | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | -0.941018 | -0.261980 | 0.584440 | -0.246523 | -0.413281 | -0.300644 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
1 | -0.175099 | 0.036315 | -0.041580 | -0.574519 | -0.413281 | -0.300644 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
2 | -1.036757 | -0.480295 | -0.333467 | 1.721452 | -0.413281 | -0.300644 | 1.0 | 0.0 | 0.0 | 0.0 | ... | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
3 | 1.452477 | -0.521486 | 2.900331 | -0.246523 | -0.413281 | -0.300644 | 0.0 | 1.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
4 | -0.079360 | 2.253443 | 1.821118 | -0.574519 | 0.531537 | 0.205109 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
9037 | 0.878038 | 0.779477 | 2.408732 | 0.081473 | -0.413281 | -0.300644 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
9038 | -0.462319 | -0.263696 | 0.384728 | -0.574519 | -0.413281 | -0.300644 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
9039 | 0.207860 | 0.776044 | 0.058276 | -0.574519 | -0.413281 | -0.300644 | 0.0 | 1.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
9040 | 1.356737 | 2.465236 | -0.222089 | -0.246523 | -0.413281 | -0.300644 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
9041 | 0.495079 | 0.257719 | -0.659919 | 0.409469 | -0.413281 | -0.300644 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
9042 rows × 32 columns
Model Selection#
scoring | dummy (mean) | dummy (std) | logreg (mean) | logreg (std) | svc (mean) | svc (std) | logreg_bal (mean) | logreg_bal (std) | svc_bal (mean) | svc_bal (std)
---|---|---|---|---|---|---|---|---|---|---
fit_time | 0.176 | 0.012 | 1.495 | 0.722 | 17.859 | 0.319 | 1.394 | 0.238 | 30.267 | 0.39
score_time | 0.136 | 0.019 | 0.176 | 0.044 | 2.164 | 0.111 | 0.156 | 0.009 | 3.583 | 0.072
test_accuracy | 0.887 | 0.0 | 0.906 | 0.007 | 0.904 | 0.005 | 0.837 | 0.01 | 0.821 | 0.008
train_accuracy | 0.887 | 0.0 | 0.907 | 0.001 | 0.916 | 0.001 | 0.839 | 0.001 | 0.843 | 0.002
test_precision | 0.0 | 0.0 | 0.667 | 0.062 | 0.662 | 0.057 | 0.388 | 0.02 | 0.365 | 0.011
train_precision | 0.0 | 0.0 | 0.669 | 0.007 | 0.754 | 0.009 | 0.393 | 0.003 | 0.41 | 0.003
test_recall | 0.0 | 0.0 | 0.34 | 0.033 | 0.316 | 0.029 | 0.777 | 0.05 | 0.794 | 0.036
train_recall | 0.0 | 0.0 | 0.344 | 0.005 | 0.377 | 0.009 | 0.785 | 0.006 | 0.892 | 0.003
test_f1 | 0.0 | 0.0 | 0.45 | 0.039 | 0.426 | 0.032 | 0.518 | 0.025 | 0.5 | 0.013
train_f1 | 0.0 | 0.0 | 0.454 | 0.006 | 0.503 | 0.009 | 0.523 | 0.003 | 0.562 | 0.003
The Dummy Classifier reaches 0.887 accuracy simply by always predicting the most frequent class, but has zero precision, recall, and F1 scores, indicating it never predicts the positive class (here, that the client subscribed to a term deposit).
logreg shows improved accuracy over the dummy model. However, its recall is low, suggesting it misses a significant number of true positive cases. svc performed almost the same as the logistic regression model on all metrics.
logreg_bal and svc_bal have lower accuracy than their unbalanced counterparts but significantly higher recall. This indicates they are better at identifying positive cases, at the cost of making more false-positive errors.
Given the context of our bank marketing data set, we aim to detect the clients who will subscribe to a term deposit given the features. Missing a potential “yes” is more costly than a false positive, as it represents a lost opportunity for the sales team to convert that potential customer. We therefore chose svc_bal, the model with the highest test_recall score.
train_accuracy | test_accuracy | train_precision | test_precision | train_recall | test_recall | train_f1 | test_f1 | fit_time | score_time
---|---|---|---|---|---|---|---|---|---
0.842734 | 0.814261 | 0.409378 | 0.366394 | 0.890196 | 0.786601 | 0.56084 | 0.499926 | 36.729734 | 186.583015
Hyperparameter Optimization#
Optimizing hyperparameters in SVC with a smaller sample size of 10,000 instances is a strategy aimed at enhancing computational efficiency. This approach expedites the exploration of hyperparameter possibilities, aiding in the discovery of potential configurations. While the outcomes validate the concept, it’s crucial to recognize and manage the constraints stemming from the smaller dataset size when interpreting the results.
svc__C | svc__gamma | svc__kernel |
---|---|---|
4.331065 | 0.099076 | rbf |
Fig. 4 Best C, gamma and kernel parameters for svc_balanced model#
We ran a randomized search over the SVC model with C values ranging from 0.1 to 10, gamma values ranging from 0.001 to 0.1, and rbf, sigmoid, and linear kernels to optimize the model’s performance. With 25 random combinations and 5-fold cross-validation, the best hyperparameter combination is approximately 4.33 for C and approximately 0.1 for gamma, with the rbf kernel.
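The search above can be sketched with `RandomizedSearchCV`; synthetic data stands in for the project's 10,000-row training sample, and the pipeline below is an illustrative reconstruction rather than the exact project code.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 10,000-instance subsample
X, y = make_classification(n_samples=1000, weights=[0.89], random_state=0)

pipe = Pipeline([("scaler", StandardScaler()),
                 ("svc", SVC(class_weight="balanced"))])

# Same ranges as in the text: C in [0.1, 10], gamma in [0.001, 0.1]
param_dist = {
    "svc__C": loguniform(0.1, 10),
    "svc__gamma": loguniform(0.001, 0.1),
    "svc__kernel": ["rbf", "sigmoid", "linear"],
}

# 25 random combinations, 5-fold CV, optimizing recall
search = RandomizedSearchCV(pipe, param_dist, n_iter=25, cv=5,
                            scoring="recall", random_state=123, n_jobs=-1)
search.fit(X, y)
best = search.best_params_
```

Sampling C and gamma on a log scale (via `loguniform`) spreads the 25 trials evenly across orders of magnitude, which is the usual choice for these SVC hyperparameters.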
Test results after hyperparameter optimization#
Accuracy | Recall |
---|---|
0.834250 | 0.772312 |
Fig. 5 Accuracy and recall metrics on test data#
After fitting the model on the training data with the hyperparameters found above, we used it to score on the test data. The accuracy of the model is 0.83, while the recall (true positives / actual positives) is 0.77, so the optimized model generalizes reasonably well to unseen data.
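This final evaluation step can be sketched as follows; the data is synthetic and the hyperparameter values are taken from the search table above, so the whole block is illustrative rather than the project's exact script.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the bank data
X, y = make_classification(n_samples=2000, weights=[0.89], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)

# Hyperparameters approximating the best combination from the search
final_model = SVC(C=4.33, gamma=0.1, kernel="rbf", class_weight="balanced")
final_model.fit(X_train, y_train)

accuracy = final_model.score(X_test, y_test)
recall = recall_score(y_test, final_model.predict(X_test))
```

Refitting on the full training split before scoring the held-out test set ensures the reported metrics reflect the final model rather than any single cross-validation fold.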
Discussions#
Key Findings#
In this bank marketing analysis project, we aimed to develop a binary classification model to predict client subscription to term deposits. We tested Logistic Regression and Support Vector Classifier (SVC) models, focusing on recall as the key performance metric. The balanced SVC outperformed Logistic Regression in recall and, after hyperparameter optimization, achieved a recall of 0.77 on the test dataset, which is promising.
Reflection on Expectations#
The results were somewhat expected, given SVC’s known efficacy in classification tasks, particularly when there is a clear margin of separation. The test recall indicates that the model is particularly adept at identifying clients likely to subscribe, which was the primary goal. It is noteworthy how far this recall exceeds that of the unbalanced models, as it suggests the model is highly sensitive to true positive cases.
Impact of Finding#
The high recall score of this model has significant implications for targeted marketing strategies. It suggests that the bank can confidently use the model’s predictions to focus its marketing efforts on clients predicted to subscribe, potentially increasing the efficiency and effectiveness of its campaigns [Moura et al., 2020]. This targeted approach could lead to higher conversion rates with lower marketing expenses. However, it’s important to balance such a high recall with precision to ensure that the bank doesn’t unnecessarily target unlikely prospects.
Future Improvements#
The success of this model leads to several potential areas for further exploration:
Balancing Precision and Recall: Investigating methods to enhance precision without substantially reducing recall.
Feature Analysis: Identifying which features most significantly influence subscription predictions.
Model Interpretability: Improving the model’s interpretability to better understand the basis for its predictions.
Temporal Adaptability: Assessing the model’s adaptability to evolving trends and customer behaviors over time.
Testing Alternative Models: Exploring whether ensemble methods or more advanced machine learning algorithms could yield better or comparable results.
Customer Segmentation: Evaluating the model’s performance across different customer segments to tailor more specific marketing strategies.
References#
- Doo23
R. Dooley. What’s wrong with bank marketing? Forbes, 2023. https://www.forbes.com/sites/rogerdooley/2023/04/24/whats-wrong-with-bank-marketing/?sh=49b8c3241fef.
- MCR14
S. Moro, P. Cortez, and P. Rita. A data-driven approach to predict the success of bank telemarketing. Decis. Support Syst., 62:22–31, 2014.
- MRC12(1,2)
S. Moro, P. Rita, and P. Cortez. Bank marketing. UCI Machine Learning Repository, 2012. https://doi.org/10.24432/C5K306.
- MPN+20
A.F. Moura, C.M. Pinho, D.M. Napolitano, F.S. Martins, and J.C. Fornari Junior. Optimization of operational costs of call centers employing classification techniques. Research, Society and Development, 2020.
- VS14
C. Vajiramedhin and A. Suebsing. Feature selection with data balancing for prediction of bank telemarketing. Applied Mathematical Sciences, 8:5667–5672, 2014.