I am often asked the above question, so I thought I would use this newsletter issue to share my thoughts on it and my answer to it.
Asking this question is wrong. Why wrong?
Firstly, machine learning solves only one set of business problems. We should start with the business question instead, then determine whether the data is suitable and whether machine learning is the right tool to move us towards a solution.
Secondly, having data does not mean you have RELEVANT data, and you need relevant data to answer the business question at hand. For example, when you see a doctor, you tell the doctor your symptoms so that he/she can make the diagnosis. The symptoms of the patients waiting outside, even if you observed them accurately, have no value in how the doctor makes your diagnosis.
Asking the question is equivalent to asking, “Got tomato, can I make good tomato soup?” With this example, you can see that we are ignoring the factors that might affect the cooking process from tomato to tomato soup. For instance, how should we choose the most suitable tomato for the soup, and what other ingredients do we need to select and prepare to make the tomato soup more flavorful? Data works the same way: there is a process here that converts data into insights that are useful in moving us towards the solution.
In a data project, it could mean questions like: what relevant data can we use, what features can we engineer, and what were the circumstances surrounding the period when the data was collected? Moving on to computation/algorithm: is machine learning suitable, or would simple rules achieve a more cost-effective result? How are we going to use the insights? Are there any compliance and regulatory issues we have to deal with? Where do we place the machine learning model so that the data is all ready before it is fed in for scoring?
Saying that you have the data and then asking whether we can do machine learning is equivalent to putting the cart before the horse, without realizing what we are trying to do with the horse and cart together. To me, it says a lot about the organization the questioner comes from.
The conclusion is this: there are many considerations in moving from data to solutions, and this knowledge base is built up along with the organization's maturity in using data and machine learning.
So what is my answer? “Got data, MAY do machine learning, but let us focus on the business challenge at hand and determine how data can assist.”
I hope you enjoy the newsletter! I appreciate any gesture made to promote it, and may my sharing impact those you love! :)
What I have seen is that people don't have any idea what business questions they want answered. They start from the data because someone told them it was important to use it. But their goals are usually very vague, along the lines of, "How are things going?"
The most important part of the process in that case is to help them create a list of questions they want answers to. Then you can see which ones might be supported by current data, which ones will need new data, and which ones are just hopeless.
This is so true and something I talk to my customers and team about every day. Start with the business issue or problem you're trying to solve, determine whether you have the right data or telemetry, and lastly check the quality of the data. Then you can determine the delta / gap and move forward deterministically.