Getting the low-down on the data fabric
Julian Thomas, Principal Consultant at PBT Group
Even though the terms data fabric and data mesh share certain similarities and are often used interchangeably, there are significant differences to consider. While the former has certainly become one of the most popular buzzwords in the market today, few local organisations have implemented the data fabric beyond isolated use cases.
My own view is that the data fabric is strongly woven into a technology-centric approach. It relies on software such as data virtualisation to connect all the different data sources in the organisation. Furthermore, it uses automation to minimise human involvement in the processing of that data.
The data fabric is meant to drive data democratisation. It uses virtualisation and automation to work with data wherever it resides. Artificial intelligence (AI) and machine learning (ML) help in this regard, especially when it comes to driving automation and making recommendations to data consumers. For instance, if the head of a department expects a report containing certain elements every Monday at 11:00, business intelligence (BI) or ETL (extract, transform, load) bots can analyse that behaviour pattern and present the data to the user automatically. This is the end state made possible through the data fabric.
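To make that end state concrete, here is a minimal, hypothetical sketch in Python of what such a bot might do: record when reports are requested, detect a recurring pattern, and pre-build the report when the learned time slot comes around. All class and method names are illustrative and do not refer to any real product.

    from collections import Counter
    from datetime import datetime

    class ReportBot:
        def __init__(self):
            self.history = []  # (user, report, weekday, hour) tuples

        def record_request(self, user, report, when):
            self.history.append((user, report, when.weekday(), when.hour))

        def recurring_patterns(self, min_occurrences=3):
            # A pattern is "same user, same report, same weekday and hour",
            # seen at least min_occurrences times.
            counts = Counter(self.history)
            return [key for key, n in counts.items() if n >= min_occurrences]

        def prebuild_due_reports(self, now):
            # Pre-build any report whose learned slot matches the current time.
            for user, report, weekday, hour in self.recurring_patterns():
                if (weekday, hour) == (now.weekday(), now.hour):
                    print(f"Pre-building '{report}' for {user}")

    bot = ReportBot()
    for week in range(3):  # three Mondays in a row at 11:00
        bot.record_request("dept_head", "sales_summary",
                           datetime(2024, 1, 1 + 7 * week, 11))
    bot.prebuild_due_reports(datetime(2024, 1, 29, 11))  # the next Monday

In practice the recommendation logic would be ML-driven rather than a simple frequency count, but the principle is the same: the fabric observes consumption patterns and automates the delivery.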
For its part, the data mesh is what BI professionals tend to work towards without even realising it. In many ways, the mesh is the antithesis of the fabric. While both might use similar technologies, the end goals are different. The mesh is more of an organisational approach. So, instead of eliminating people from the process, the mesh leverages them. It speaks to ownership of data throughout the business. The mesh treats data as a product, with each team in the company owning and operating its data, up to and including performing BI on it. In this way, the mesh takes the micro-services concept of data ownership to the next level.
The problem with micro-services in the past is that they focused on data at a product development level. While the data might be owned by a specific product, other people in the business must also be able to access it without being locked into that same product. The mesh takes the intent behind micro-services, applies it to data, and considers what teams and departments are actually doing with that data. The result is an end-to-end data product that is organisationally owned and delivered. BI and the data warehouse become part of the core product, with data democratised from day one, but as part of each core product rather than as a separate standalone service.
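A minimal sketch of what "data as a product" can look like in code, assuming a shared catalogue through which any team can discover another team's data. The DataProduct structure and its fields are hypothetical, chosen only to illustrate the ownership contract.

    from dataclasses import dataclass

    @dataclass
    class DataProduct:
        name: str
        owning_team: str   # the domain team accountable for this data
        schema: dict       # column name -> type: the published contract
        refresh_sla: str   # how fresh consumers can expect the data to be
        endpoint: str      # where any team in the business can read it

    catalog: list[DataProduct] = []

    def publish(product: DataProduct):
        # Publishing to a shared catalogue is what makes the product
        # discoverable outside the owning team from day one.
        catalog.append(product)

    publish(DataProduct(
        name="orders",
        owning_team="fulfilment",
        schema={"order_id": "string", "placed_at": "timestamp", "total": "decimal"},
        refresh_sla="hourly",
        endpoint="warehouse.fulfilment.orders",
    ))

The key design point is that the owning team, not a central BI function, is accountable for the schema, freshness and serving of its data.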
Same, but different
While the similarities between the data fabric and data mesh are evident, the mesh is arguably more appropriate for corporates, as it entrenches this thinking in their core data approach. What makes the mesh so difficult to manage is that, in most organisations, data is still treated as a separate process, and the mesh is very much a human-driven way of going about analysis. With the mesh, a company must effectively change from the ground up to ensure the best possible implementation.
However, with the right technology in place, the data fabric can get going much faster. Of course, the data fabric has its own challenges, though these are more technical in nature. Data as a product remains front and centre. This is where the problem with virtualisation and automation comes in. What salespeople promise will happen and how virtualisation and automation technologies are practically implemented are usually two very different things.
Data analytics is expensive and resource intensive. Furthermore, product owners will not allow it to run on their live systems. This makes it difficult to even get out of the gate with virtualisation and automation. And then there is the question of how data must be transformed through ETL. Simply put, a company cannot analyse data in its raw format. The result is that ETL ends up being performed on the virtualisation and automation platform itself: data is 'sucked' out of the production environment, put into memory, and processed there. This requires a massive amount of memory to make it work. Now imagine the licensing fees associated with doing it this way.
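A minimal sketch of why this gets expensive, assuming a pandas-style in-memory workflow; the table names and sizes are made up for illustration. A federated query pulls full tables out of production, holds them in memory, and only then transforms them.

    import pandas as pd

    # In a real fabric these would be remote reads from production systems;
    # here small frames are fabricated to keep the sketch self-contained.
    orders = pd.DataFrame({"customer_id": [1, 2, 1], "total": [100.0, 250.0, 75.0]})
    customers = pd.DataFrame({"customer_id": [1, 2],
                              "region": ["Gauteng", "Western Cape"]})

    # The ETL happens on the virtualisation platform itself: both source
    # tables must sit in memory before the join and aggregation can run.
    joined = orders.merge(customers, on="customer_id")
    report = joined.groupby("region", as_index=False)["total"].sum()

    # Rough measure of the footprint: every byte pulled from production is
    # held in RAM for the duration of the transformation.
    footprint = (orders.memory_usage(deep=True).sum()
                 + customers.memory_usage(deep=True).sum())
    print(report)
    print(f"approx. bytes held in memory: {footprint}")

Scale the toy frames above to production-sized tables and the memory, and therefore the licensing, requirements grow with them.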
Virtualisation and automation complexity
The problem with the data fabric is that it is heavily reliant on virtualisation and automation platforms. These have not delivered on their promise of analysing data at rest or of removing the need for ETL. They can also be prohibitively expensive, thanks to high upfront licensing costs and the subsequent licensing costs that come with infrastructure upgrades.
Herein lies the conundrum. Implementing a data fabric is about moving beyond the blue-sky promises and confronting the practical side of working with virtualisation and automation software. The concept of the data fabric is still very much in its infancy. There are few recognised experts and no well-defined methodology yet in place. The data fabric is still evolving, with the industry deciding what it is and what it is not.