
Many of us have no problem completing the phrase "Rubbish In, Rubbish…". Following the same mantra, people will then say, "Data is our Most Important Asset". Ask yourself: how many actually put this into practice by establishing processes to manage their data?
Data scientists, as the biggest users of data, need a good understanding of how to manage our biggest asset. Managing data is not just about storage and processing. It also covers Data Governance, Data Quality, Data Lineage, documentation, and more. Data scientists need not be the ones executing all of these, but they should at least assist with and influence decisions on how data is managed.
Take the opportunity, while working on your project portfolio, to put some thought into the data. For instance, think about how to improve its quality and how to ensure it can be trusted. Do find books or articles to read up more on the topic.
As data scientists, besides having a deep understanding of Machine Learning models, we need to learn how to manage our most important resource: Data. I have written a post that shares examples of why data collection needs to be thought through. Below is the article:
Thanks for taking the time to read this far. If the newsletter has been beneficial, do consider sharing it.
Here are a few relevant posts you might be interested in:
Exploratory Data Analysis (I) - No Visual
Exploratory Data Analysis (II) - Visual
Side Note: I have written about how Covid-19 may impact the Data Science and Artificial Intelligence industry. As a subscriber to my newsletter, you get a first look before I share it on my other platforms. Here is the post. Thank you for your support! :)
Books I am currently reading:
Upheaval: Turning Points for Nations in Crisis
Rethink: The Surprising History of New Ideas
Human Compatible: Artificial Intelligence and the Problem of Control