As more and more companies market their use of Machine Learning under the label of Artificial Intelligence, it is important that we as consumers look under the hood and understand much better how machine learning works, especially supervised learning, which is what most businesses are doing and selling as a product.
Briefly, supervised learning is a form of machine learning that lets us find the signal in the noise, so that we can form some idea about an outcome we are very interested in.
Example: I would like to borrow money from you. As the lender, the outcome you are interested in is the PROBABILITY of me returning the money (or the principal at least), and you would like an indication of it before considering other terms. You will want some data from me to make that assessment, for instance my family background, where I stay, and what my occupation is. From there, you assess the likelihood of me returning the money.
Supervised learning is similar but on a larger scale. Imagine a bank that disburses thousands of loans. It can use past data and outcomes to learn the probability of each borrower repaying.
Now, since there are so many of these loans, the model will naturally be like us humans: it will make some right and some wrong predictions.
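To make "learning from past data and outcomes" a little more concrete, here is a deliberately crude sketch in Python. It is not how a bank would actually build a model. It only counts how often past borrowers in each group defaulted and treats that share as a rough probability. The loan records, the `occupation` field and the function name `default_rate_by_group` are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical past loans: (occupation, defaulted). Invented purely for illustration.
past_loans = [
    ("teacher", False),
    ("teacher", False),
    ("teacher", True),
    ("freelancer", True),
    ("freelancer", False),
    ("engineer", False),
    ("engineer", False),
    ("engineer", False),
    ("engineer", True),
]


def default_rate_by_group(loans):
    """Return the observed share of defaults for each occupation."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for occupation, defaulted in loans:
        totals[occupation] += 1
        if defaulted:
            defaults[occupation] += 1
    return {occupation: defaults[occupation] / totals[occupation] for occupation in totals}


rates = default_rate_by_group(past_loans)
for occupation, rate in sorted(rates.items()):
    print(f"{occupation}: estimated default probability {rate:.2f}")
```

A real supervised model would combine many inputs at once rather than a single group, but the idea is the same: past inputs plus known outcomes produce a probability for a new borrower.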
Correct answers are what we call True Positives and True Negatives. Wrong answers are False Positives and False Negatives. See the diagram below. This diagram is called a Confusion Matrix.
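As a small illustration, the four cells of a confusion matrix can be counted by hand in Python. This is only a sketch: the two lists are made-up data, the function name `confusion_counts` is hypothetical, and "positive" is assumed to mean "will default", which is how the loan example in this article uses the terms.

```python
def confusion_counts(actual, predicted):
    """Count the four confusion-matrix cells from two lists of booleans.

    True means "positive", assumed here to mean "defaults on the loan".
    """
    tp = fp = fn = tn = 0
    for is_positive, predicted_positive in zip(actual, predicted):
        if predicted_positive and is_positive:
            tp += 1  # predicted default, did default
        elif predicted_positive and not is_positive:
            fp += 1  # predicted default, actually repaid
        elif not predicted_positive and is_positive:
            fn += 1  # predicted repay, actually defaulted
        else:
            tn += 1  # predicted repay, did repay
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Made-up outcomes and predictions for eight loans.
actual = [True, False, False, True, False, False, True, False]
predicted = [True, False, True, False, False, False, True, False]

counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 4}
```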
Now, in an academic setting, most of the focus will be on True Negatives and True Positives. However, in a business setting, things are tremendously different.
Let’s continue with the loan example, where a "positive" is a loan holder predicted to default. True Negatives do not bring any profit to the bank; they are just an avoidance of loss. False Positives (Predicted Positives, Actual Negatives) mean you may have unnecessarily disturbed a loan holder whom you suspected was going to default but who in fact plans on paying the loan.
How about False Negatives (Predicted Negatives, Actual Positives)? These borrowers are NOT going to pay down the loan, but we ignore them because the model says they will. A loss, whose size depends on the loan quantum, is made here.
To summarize quickly, in a machine learning and business setting, False Negatives and False Positives are going to cost the business, and there is a good chance one of them will cost more than the other, i.e. either loss from FN >> loss from FP or vice versa. You cannot run away from dealing with FN and FP once you move to a Machine Learning setting.
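One way to see why the imbalance between FN and FP costs matters is to put numbers on it. The sketch below assumes a hypothetical model that gives each loan a default score between 0 and 1, and it assumes dollar costs for each kind of mistake. The scores, outcomes, costs and the function name `total_cost` are all invented for illustration. It then compares a few cut-off thresholds for flagging a loan.

```python
# Assumed costs, invented for illustration: a missed default (FN) is taken to be
# far more expensive than needlessly contacting a good borrower (FP).
COST_FN = 10_000
COST_FP = 200


def total_cost(scores, actual, threshold):
    """Total cost of mistakes when loans scoring at or above threshold are flagged."""
    cost = 0
    for score, defaulted in zip(scores, actual):
        flagged = score >= threshold
        if flagged and not defaulted:
            cost += COST_FP  # False Positive: flagged, but the borrower repays
        elif not flagged and defaulted:
            cost += COST_FN  # False Negative: ignored, but the borrower defaults
    return cost


# Made-up default scores from a hypothetical model, with the real outcomes.
scores = [0.05, 0.10, 0.30, 0.35, 0.55, 0.60, 0.80, 0.90]
actual = [False, False, True, False, False, True, True, True]

costs = {t: total_cost(scores, actual, t) for t in (0.25, 0.5, 0.75)}
best_threshold = min(costs, key=costs.get)
print(costs)           # {0.25: 400, 0.5: 10200, 0.75: 20000}
print(best_threshold)  # 0.25
```

With these assumed costs, the cheapest choice is the lowest threshold: it accepts a couple of False Positives in order to avoid any False Negatives. If the costs were reversed, the preferred threshold would shift the other way.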
Thus my question: do you have the necessary process to deal with FN and FP when they happen? Even while getting TP as high as possible, FN and FP are unavoidable once Machine Learning is adopted. A truly data-mature company will have thought about this and set up the necessary process to deal with it, imo.
What are your thoughts on this?
It's always good to refresh these very foundational concepts in probability and prediction under uncertainty. Once you understand the principles behind them, you can twist them for innovation. For example, in predicting product affinity / conversion, you can choose to optimise for false positives, as these can be your universe expansion opportunities: they are false positives for you but may not be universally false positives.