Jessie Rudd, Technical Business Analyst at PBT Group
Even though data warehouses and data lakes are considered large [data] storage repositories, this is where the similarities stop. While the latter offers significant business opportunities, not many organisations understand how to effectively unlock its potential.
A data lake holds raw, unstructured data that comes directly from the source. The data structures and requirements are not defined in any way until the data is needed. By its nature, it is exceptionally agile and provides data scientists with a platform to extract meaningful insights.
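To make the idea of defining structure only when the data is needed more concrete, here is a minimal Python sketch. The sample events, field names and the `read_with_schema` helper are all hypothetical and purely illustrative; they do not represent any particular data lake product.

```python
import json

# Hypothetical raw events, kept exactly as they arrived from the source.
# Nothing about their structure is declared when they are stored.
RAW_EVENTS = [
    '{"msisdn": "27820000001", "type": "call", "seconds": 125}',
    '{"msisdn": "27820000002", "type": "sms", "text": "hello"}',
    '{"msisdn": "27820000001", "type": "call", "seconds": 40}',
]


def read_with_schema(raw_lines, fields, where=None):
    """Apply a structure only at read time: parse each raw line,
    optionally filter it, and keep just the requested fields."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        if where is not None and not where(record):
            continue
        # Fields missing from a record come back as None.
        rows.append({name: record.get(name) for name in fields})
    return rows


# The question being asked decides the structure, not the storage layer.
call_rows = read_with_schema(
    RAW_EVENTS,
    fields=["msisdn", "seconds"],
    where=lambda record: record.get("type") == "call",
)
total_seconds = sum(row["seconds"] for row in call_rows)
```

A different question, such as one about message text, would simply pass a different field list over the same untouched raw data.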
In South Africa, the only companies that have been able to benefit (to a limited extent) from data lakes have been those operating in the telecommunications sector. This is purely based on the amount of data they have at their disposal and their budgets to acquire the resources needed to store it.
However, whether they choose to utilise it or not remains to be seen. After all, South Africa is still quite traditional in how it approaches data management. So, while telecoms operators understand what data lakes are, their willingness to access them effectively is still up for debate.
Complicating matters is that data analysts are organised and prefer to work in a structured environment, whereas data lakes require a more scientific approach based on curiosity. The mindset needed to really do the proverbial wading into a data lake is quite different to how many organisations and analysts view data currently.
It does not help that many organisations are still up in the air about the benefits of big data. There is still an ongoing debate on the merits of structured versus unstructured data and what the practicalities are for the daily running of a business. Unless a company has a dedicated group of people delving into big data (or even a data lake for that matter), there is not enough leeway to really get the benefits associated with it.
Swimming in the lake
A data lake does offer companies a powerful platform to do a lot of things with data, but it does require a leap of faith in how to access it. If you do not know what you are looking for in the data lake, you are never going to find it. Companies have limited budgets, and in a tough economy they want to remain focused. For smaller businesses, lakes are simply not a viable option, even though they are built on the same unstructured data these businesses are leveraging.
An option would be to divide the lake into smaller data ponds where each kind of data is pooled together. This means scientists or analysts can go to the right pond looking for specific data. So, even though the data is still a complete mess, at least it will be of the right kind.
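The pond idea can be sketched in a few lines of Python. This is only an illustration under assumed names: the `source` tag, the sample records and the `split_into_ponds` helper are hypothetical, and a real implementation would depend on the storage platform in use.

```python
from collections import defaultdict


def split_into_ponds(raw_records, kind_of):
    """Group raw records by kind without cleaning or reshaping them."""
    ponds = defaultdict(list)
    for record in raw_records:
        ponds[kind_of(record)].append(record)
    return dict(ponds)


# Hypothetical lake contents: the payloads stay messy and unparsed.
lake = [
    {"source": "network_log", "payload": "cell=A17 dropped=3"},
    {"source": "billing", "payload": "acct 991; R120.50"},
    {"source": "network_log", "payload": "cell=B02 dropped=0"},
    {"source": "social", "payload": "Great signal today!"},
]

ponds = split_into_ponds(lake, kind_of=lambda record: record["source"])

# An analyst interested in network quality goes straight to one pond.
network_records = ponds["network_log"]
```

The payloads are left exactly as they arrived, so the data is no tidier than before; the only gain is that each kind of data now has a known place to look.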
Whether the telecoms operators are willing to experiment with this and play around with data lakes will drive a lot of the growth potential for the immediate future. But the effort must be focused on getting the basics right and creating a platform from there.