4.1. Exercises
Probabilities and Logistic Regression
We are trying to predict if a job applicant would be hired based on some features contained in their resume.
Below is the output of .predict_proba(), where column 0 shows the predicted probability of the “hired” class and column 1 shows the predicted probability of the “not hired” class.
array([[0.04971843, 0.95028157],
       [0.94173513, 0.05826487],
       [0.74133975, 0.25866025],
       [0.13024982, 0.86975018],
       [0.17126403, 0.82873597]])
Use this output to answer the following questions.
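If it helps to see how probabilities relate to hard predictions, here is a minimal sketch. It assumes the column order stated above (column 0 is “hired”, column 1 is “not hired”) and that the predicted label is simply the class with the larger probability, which is how predict behaves for a binary logistic regression. The variable names are only for illustration.

```python
import numpy as np

# Probabilities copied from the predict_proba output above.
proba = np.array([
    [0.04971843, 0.95028157],
    [0.94173513, 0.05826487],
    [0.74133975, 0.25866025],
    [0.13024982, 0.86975018],
    [0.17126403, 0.82873597],
])

# Assumed column order, as stated in the exercise text.
classes = np.array(["hired", "not hired"])

# Each row sums to 1 because the two classes cover every outcome.
row_sums = proba.sum(axis=1)

# The predicted label is the class with the larger probability in each row.
predicted = classes[proba.argmax(axis=1)]

# The model's confidence in a prediction is that larger probability.
confidence = proba.max(axis=1)

print(predicted)
print(confidence.round(3))
```

A row such as [0.74, 0.26] still leads to a “hired” prediction, but the model is less sure about it than about a row such as [0.94, 0.06].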
Question 2
['hired', 'hired', 'hired', 'not hired', 'not hired']
True or False: predict_proba
Applying predict_proba
Instructions:
Running a coding exercise for the first time could take a bit of time for everything to load. Be patient, it could take a few minutes.
When you see ____ in a coding exercise, replace it with what you assume to be the correct code. Run it and see if you obtain the desired output. Submit your code to validate if you were correct.
Make sure you remove the hash (#) symbol in the coding portions of this question. We have commented them so that the line won’t execute and you can test your code after each step.
Let’s keep working with the Pokémon dataset, but this time let’s do a bit more: let’s tune the hyperparameter C and see if we can find an example where the model is confident in its prediction.
Tasks:
- Build and fit a pipeline containing the column transformer and a logistic regression model that uses the parameters class_weight="balanced" and max_iter=1000 (max_iter will stop a warning from occurring). Name this pipeline pkm_pipe.
- Perform RandomizedSearchCV using the parameters specified in param_grid. Use n_iter equal to 10 and 5 cross-validation folds, and return the training score. Set random_state=2028 and set your scoring argument to f1. Name this object pmk_search.
- Fit your pmk_search on the training data.
- What is the best C value? Save it in an object named pkm_best_c.
- What is the best f1 score? Save it in an object named pkm_best_score.
- Find the predictions of the test set using predict. Save this in an object named predicted_y.
- Find the target class probabilities of the test set using predict_proba. Save this in an object named proba_y.
- Take the dataframe lr_probs and sort it in descending order of the model’s confidence in predicting legendary Pokémon. Save this in an object named legend_sorted.
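The overall shape of these tasks can be sketched as follows. This is not the exercise solution: the Pokémon data, the column transformer, param_grid and lr_probs come from the exercise environment, so everything below that stands in for them (the synthetic dataframe, the column names attack, speed and type, the loguniform range for C, and the way lr_probs is built) is a hypothetical placeholder. Only the object names from the task list are kept.

```python
import numpy as np
import pandas as pd
from scipy.stats import loguniform
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the Pokémon data (hypothetical columns and values).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "attack": rng.normal(75, 30, n),
    "speed": rng.normal(65, 25, n),
    "type": rng.choice(["fire", "water", "grass"], n),
})
# Make the rare positive class (1 = legendary) depend on the numeric features.
score = df["attack"] + df["speed"] + rng.normal(0, 15, n)
df["legendary"] = (score > score.quantile(0.85)).astype(int)

X = df.drop(columns="legendary")
y = df["legendary"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=2028
)

# Hypothetical column transformer; the exercise supplies its own.
preprocessor = make_column_transformer(
    (StandardScaler(), ["attack", "speed"]),
    (OneHotEncoder(handle_unknown="ignore"), ["type"]),
)

pkm_pipe = make_pipeline(
    preprocessor,
    LogisticRegression(class_weight="balanced", max_iter=1000),
)

# Assumed search space for C; the exercise defines the real param_grid.
param_grid = {"logisticregression__C": loguniform(0.01, 100)}

pmk_search = RandomizedSearchCV(
    pkm_pipe,
    param_grid,
    n_iter=10,
    cv=5,
    scoring="f1",
    return_train_score=True,
    random_state=2028,
)
pmk_search.fit(X_train, y_train)

pkm_best_c = pmk_search.best_params_["logisticregression__C"]
pkm_best_score = pmk_search.best_score_

predicted_y = pmk_search.predict(X_test)
proba_y = pmk_search.predict_proba(X_test)

# Hypothetical construction of lr_probs; column 1 of proba_y is class 1 (legendary).
lr_probs = pd.DataFrame({
    "true_label": y_test.to_numpy(),
    "predicted_y": predicted_y,
    "prob_legendary": proba_y[:, 1],
})
legend_sorted = lr_probs.sort_values(by="prob_legendary", ascending=False)
print(legend_sorted.head())
```

The rows at the top of legend_sorted are the examples the model is most confident are legendary. Sorting in ascending order instead would surface the examples it is most confident are not.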