
Metrics Evaluation Lab

Throughout your early career as a Data Scientist you’ve spent most of your time cleaning data, but now you are starting to build models and have come to realize that the most important part of understanding any machine learning model (or any model, really) is understanding its weaknesses and vulnerabilities.

To practice, you’ve decided to work with a dataset about mushrooms, because after all, if you don’t know how to evaluate a model thoroughly you’ll be in real truffle (ha...ha). You’ll use an approach with which you are already familiar: kNN.

Part 1. Using the mushroom dataset, define a question that can be answered using classification, specifically kNN.

Part 2. Build a kNN model. Make sure to calculate the prevalence to provide a reference point for some of these measures, and to properly clean and prepare the data before building the model. Evaluate the model using the metrics discussed in class (Accuracy, TPR, FPR, F1, Cross Entropy, and ROC/AUC). What do these metrics tell you about the model?
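The workflow for Part 2 can be sketched as below. This is a minimal outline, not the required solution: it uses a synthetic dataset from `make_classification` as a stand-in for the cleaned, one-hot-encoded mushroom features, and the choice of `n_neighbors=5` is just a starting default you should tune.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             log_loss, roc_auc_score)

# Synthetic stand-in for the prepared mushroom data (assumption: in the
# real lab you would encode the categorical columns numerically first).
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# Prevalence: the base rate of the positive class, the reference point
# against which accuracy and the other metrics should be judged.
prevalence = y_test.mean()

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
pred = knn.predict(X_test)
prob = knn.predict_proba(X_test)[:, 1]  # scores for cross entropy / AUC

tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()
metrics = {
    "prevalence": prevalence,
    "accuracy": accuracy_score(y_test, pred),
    "TPR": tp / (tp + fn),  # sensitivity / recall
    "FPR": fp / (fp + tn),
    "F1": f1_score(y_test, pred),
    "cross_entropy": log_loss(y_test, prob),
    "AUC": roc_auc_score(y_test, prob),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Comparing accuracy against the prevalence line tells you whether the model beats the trivial "always predict the majority class" baseline, which is the point of computing prevalence first.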

Part 3. Considering all the metrics you just used, are there a few that seem more important given the question you are asking? Why?

Part 4. Consider where classification errors are occurring. Is there a pattern? If so, discuss the pattern and why you think it occurs.
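A starting point for the error analysis in Part 4 is to print the confusion matrix and pull out the misclassified test rows. Again a sketch on synthetic data (the real lab would use your fitted mushroom model):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
pred = knn.predict(X_te)

cm = confusion_matrix(y_te, pred)
print(cm)  # rows = true class, columns = predicted class

# Pull the misclassified rows to look for a pattern: do the errors
# cluster at particular feature values, or in one direction
# (false positives vs. false negatives)?
errors = X_te[pred != y_te]
print(f"{len(errors)} of {len(y_te)} test rows misclassified")
```

If one off-diagonal cell of the confusion matrix dominates, the model's errors are systematic rather than random, which is exactly the kind of pattern Part 4 asks you to explain.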

Bonus. Use a metric we did not discuss in class (reference the sklearn model metrics documentation). Once you have the output, summarize in a sentence or two what the metric is and what it means in the context of your question.
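One candidate for the bonus is the Matthews correlation coefficient (`sklearn.metrics.matthews_corrcoef`), a single score in [-1, 1] that accounts for all four confusion-matrix cells and is robust to class imbalance. A tiny illustration with hypothetical labels (in the lab, `y_pred` would come from your fitted kNN model):

```python
from sklearn.metrics import matthews_corrcoef

# Hypothetical true labels and predictions for illustration only.
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 0, 1, 1]

mcc = matthews_corrcoef(y_true, y_pred)
print(f"MCC: {mcc:.3f}")  # prints "MCC: 0.500"
```

Here tp=3, tn=3, fp=1, fn=1, so MCC = (3·3 − 1·1)/√(4·4·4·4) = 8/16 = 0.5; a value of 0 means no better than chance, 1 is perfect agreement.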

Keys to Success: