"Let's Build Intelligence Together"

Data Munging vs Machine Learning

Putting the Cart Before the Horse

Koo Ping Shung
Jun 24, 2020

I came across a LinkedIn post where someone asked which is more important to learn: Data Munging or Machine Learning?

I like to take a step back and say, “Both are important, and yes, data munging is more important than Machine Learning, even though most boot camps I have come across do not cover the topic much. But…”

In fact, many of the courses you have attended skip a lot of steps in between. They start from a pre-defined business problem and then show you which Machine Learning models to use for it.

In harsh business reality, there is NO pre-defined business problem and no such thing as “clean” data, at least not like what you see in boot camps. The data scientist first needs to define the business problem to solve, which is where the potential business value of the project is determined, and then translate it into a machine learning problem while checking whether the data is suitable.

Once the fit between the business problem and the data is good, and stakeholders (usually business users) are on board, we move on to Exploratory Data Analysis and Data Cleaning. Now that is the real world, or part of it at least.
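To make "checking the suitability of data" a little more concrete, here is a minimal sketch in Python of the kind of quick check I have in mind before any modelling starts. It is only an illustration under my own assumptions: the function name `profile_records`, the field names, and the sample records are all hypothetical, and a real project would check far more than missing values and duplicates.

```python
from collections import Counter


def profile_records(records, required_fields):
    """Summarise missing values and exact duplicates in a list of dict records."""
    total = len(records)
    missing = Counter()
    for row in records:
        for field in required_fields:
            value = row.get(field)
            # Treat None and blank strings as missing.
            if value is None or (isinstance(value, str) and not value.strip()):
                missing[field] += 1

    # Count rows that repeat an earlier row exactly on the required fields.
    seen = set()
    duplicates = 0
    for row in records:
        key = tuple(row.get(field) for field in required_fields)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)

    missing_rate = {
        field: (missing[field] / total if total else 0.0)
        for field in required_fields
    }
    return {"rows": total, "missing_rate": missing_rate, "duplicates": duplicates}


# Hypothetical customer records, invented purely for illustration.
sample = [
    {"customer_id": 1, "age": 34, "churned": "no"},
    {"customer_id": 2, "age": None, "churned": "yes"},
    {"customer_id": 2, "age": None, "churned": "yes"},
    {"customer_id": 3, "age": 51, "churned": ""},
]
report = profile_records(sample, ["customer_id", "age", "churned"])
print(report)
```

If the field you want to predict is missing for a large share of rows, that is a signal to go back to the business problem and the stakeholders, not to reach for a model.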

As the data scientist, your most important task is to determine the business problem to solve. That is where your value is! Determining the business problem sets the tone for the rest of the project. Data Munging and Machine Learning are just part of your toolkit. Do remember that! :)

Here are a few posts that may be useful.

Data Cleaning for Data Scientist

Exploratory Data Analysis (Non-Visual)

Exploratory Data Analysis (Visual)

Have fun on your data science journey! All the best! :)

Podcast: As many of you may know, I have started my podcast channel “Symbolic Connection” with a friend of mine, Thu Ya Kyaw. Since the last newsletter, we have published two more episodes, with my co-host and my good friend from Bangkok, Charin Polpanumas, Lead Data Scientist at Central Group, as guests. :)

Books:

Currently, I am reading the following books:

1) Invisible Women: Data Bias in a World Designed for Men

2) The Robots are Coming: The Future of Jobs in the Age of Automation

3) Seeing What Others Don’t: The Remarkable Ways We Gain Insights

Feedback is most welcome! Please send it through my LinkedIn or Twitter. Consider sharing the newsletter if you found it useful. Just hit that “Share” button! :)
