By Nickie Viljoen, PBT Group Head of Claims
Migrating data from one system to another, whether a local migration, on-premises to cloud, or between cloud-hosted systems, is often the most underestimated and complex part of a new system implementation.
It is frequently left too late and seen as an afterthought – just part of the implementation phase.
However, an organisation’s ability to attain its strategic goals is largely determined by the quality of the data it collects, stores, manipulates, manages, reports from and analyses daily.
Data is so integral to an organisation’s operations that most cannot continue to exist without access to quality, timely data. We live in an age of data. Where oil and energy companies once dominated the list of the world’s most valuable companies, that list is now led by data companies like Apple, Amazon Web Services and Microsoft – a shift driven by the value of data.
Data is so valuable it should be treated as the top priority during system transformations and when new systems or products are being rolled out to the market.
The consequences of not doing so are dire.
Hidden costs
It is difficult to accurately pinpoint how many system transformation and implementation projects fall over or have cost blowouts due to data migration failure, as the problem is usually hidden behind other project issues.
Data migration failures lead to:
- Direct impacts: failed or late migration pushes back go-live dates, and the resulting delays blow out budget and schedule, with a domino effect that hurts the project and the entire organisation.
- Business impacts: missing, incorrectly interpreted or wrongly converted data creates customer-facing PR issues.
- Regulatory exposure: in regulated industries, like health, insurance and finance, organisations that fail to meet regulatory requirements risk losing accreditation.
- Bad customer experiences: if the underlying data is incorrect, the new system is useless.
- Flawed decision making: unavailable, incomplete or incorrect historical data supplied to decision makers can jeopardise clients’ health and wealth.
- Loss of trust: if a new system does not behave as expected or reflect the correct data, it will be deemed untrustworthy and users will lose confidence.
Priority decision
One of the most important, high-priority decisions on day one of any new system or application implementation is how the business wants to transition, migrate and convert data to the new system.
The decision should address risks and support the organisation’s data-quality goals, functional operations and data management principles. Getting this wrong can cause project failure and reputational damage for the entire organisation – even more so if clinical data is at stake, where errors can lead to the wrong diagnosis or treatment.
Transition, data migration and conversion to a new system can be accomplished in one of four combinations, each with benefits and limitations. The selected option’s cost must be weighed against its risks. Consider the business appetite for risk, cost and time to realise full benefits before a final decision is made on the technology solution. The solution must support the migration and data strategy and ensure business continuity.
The business will not achieve its goals if the new system is incapable of digesting the business data and presenting it in an understandable format, in a timely manner and with minimal or no business outage. Address data quality before and during the data migration and transition process to prevent old problems in legacy source systems from propagating to and infecting the new system.
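As a concrete illustration of catching legacy problems early, a lightweight profiling pass can be run over a copy of the source data before any record is migrated. The sketch below is hypothetical: the table names, columns and SQLite connection are invented placeholders, not a prescribed toolset.

```python
# Minimal pre-migration data-quality profile (illustrative sketch only).
# Table and column names are hypothetical placeholders.
import sqlite3  # stand-in for the legacy source connection

CHECKS = {
    "null_customer_ids":
        "SELECT COUNT(*) FROM policies WHERE customer_id IS NULL",
    "duplicate_policy_numbers":
        "SELECT COUNT(*) FROM (SELECT policy_no FROM policies "
        "GROUP BY policy_no HAVING COUNT(*) > 1)",
    "orphaned_claims":
        "SELECT COUNT(*) FROM claims c "
        "LEFT JOIN policies p ON p.policy_no = c.policy_no "
        "WHERE p.policy_no IS NULL",
}

def profile(conn: sqlite3.Connection) -> dict[str, int]:
    """Run each quality check and return the count of offending rows."""
    return {name: conn.execute(sql).fetchone()[0] for name, sql in CHECKS.items()}

if __name__ == "__main__":
    conn = sqlite3.connect("legacy_copy.db")  # profile a copy, not production
    for check, bad_rows in profile(conn).items():
        print(f"{check}: {bad_rows} suspect rows")
```

Checks like these cost little to write but make data-quality defects visible while there is still time to cleanse them at the source.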
[Illustration: the four transition, migration and conversion options]
Start the data migration strategy early to avoid common pitfalls, as existing legacy systems are typically under-documented and poorly understood.
Consider:
- Are there sufficient experts who deeply understand the data to enable timelines to be met?
- Historical data can add complexity. What is the tipping point at which the cost of the additional complexity outweighs the benefit?
- What is the “best” source of the data? Data is often replicated across multiple databases and there are usually differing opinions on the “best” source and quality of the data.
Balanced outcome
When designing the end-to-end migration solution, engage a specialist early who is familiar with the challenges and pitfalls of working with unreliable data, and can assess the most appropriate approach to achieve a balanced outcome so the business doesn’t over- or under-invest.
Data migration is more than just mapping source to target. Practical experience is a tremendous asset in knowing where to focus on the ‘right thing’ at the ‘right time’, and the solution must consider the full migration lifecycle: strategy creation, solution design and architecture, implementation design, solution development, quality assurance, cleansing and reconciliation.
Establish a framework that can add rigour and structure to the migration and conversion process and provide:
- Consistency across the migration and mapping.
- Mapping that is readily understood and easy to maintain and expand.
- Optimised performance of the migration steps.
- Reliability.
- Exceptions management, enabling processes to be re-started and errors resolved.
The migration framework must be backed by processes that support:
- An extract, transform, cleanse and load process that identifies failed and successful records (a minimal sketch follows this list).
- An end-to-end reconciliation process.
- Support for the integrity of the end-to-end business record.
- Source data capable of linking to target data values via unique identifiers.
- An automated, repeatable volume-testing capability throughout the entire process.
- The ability to supply clean, optimised data in a staging area before loading to target systems.
- A reversible migration process.
- A migration process that enables repair of source data where needed.
- Robust reconciliation processes between the different migration stages and end to end.
- The ability to test with simulated and production data.
- Robust checking, validation and quality improvement.
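To make the exception-management and reconciliation requirements above concrete, here is a minimal sketch of record-level tracking in Python. The helper names, record shapes and in-memory structures are assumptions for illustration; a production pipeline would persist this state and typically use a dedicated ETL tool.

```python
# Illustrative ETL step with per-record exception tracking and a final
# count reconciliation. Record shapes and helpers are hypothetical.
from dataclasses import dataclass, field

@dataclass
class MigrationResult:
    loaded: int = 0
    failed: list = field(default_factory=list)  # (record_key, reason) pairs

def migrate_batch(source_rows, transform, load, result: MigrationResult):
    """Transform and load each row, recording failures instead of aborting,
    so the run can be restarted and individual errors resolved later."""
    for row in source_rows:
        try:
            load(transform(row))
            result.loaded += 1
        except Exception as exc:  # capture the error, don't crash the batch
            result.failed.append((row.get("id"), str(exc)))
    return result

def reconcile(source_count: int, result: MigrationResult) -> bool:
    """End-to-end count check: every source record must be either loaded
    or explicitly accounted for in the exception list."""
    return source_count == result.loaded + len(result.failed)
```

The essential property is that no record disappears silently: each one is either loaded or sits in an exception list that can be reported, repaired and replayed.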
The framework needs to deconstruct the migration and conversion process into several high-level processes, each with well-defined objectives, taking the data through a structured journey with major checkpoints along the way. Performance validation is critical and must be done as early as possible to ensure data migration and conversion fit within the required migration window and business outage.
Data migration and conversion is not one-size-fits-all. Consider these elements:
Customers: should never be adversely impacted by an oversight in a data conversion. It’s crucial to know the business context of your data, how much history to retain, and the most reliable source of the data. The needs analysis and pragmatic decision making are done in collaboration between the business and its internal or external IT experts.
Distributed data models: when data is being migrated in distributed data environments, sometimes over prolonged periods, ‘self-correcting’ data conversions ensure target and source systems are always in sync (see the sketch after these elements).
Phasing: Best practice for data migration is to distribute the effort and load across phases.
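One common way to achieve ‘self-correcting’ behaviour, assumed here as an illustration rather than drawn from a specific PBT implementation, is to make every load idempotent: key each record on a stable unique identifier and upsert, so re-running a conversion converges the target on the source instead of duplicating rows. SQLite UPSERT syntax is used for brevity; the table and column names are invented.

```python
# Idempotent, "self-correcting" load: re-running the job converges the
# target on the source state. Names are hypothetical; assumes a unique
# index on patients.source_id (SQLite 3.24+ UPSERT syntax).
import sqlite3

def sync_patient(conn: sqlite3.Connection, patient: dict) -> None:
    conn.execute(
        """
        INSERT INTO patients (source_id, name, updated_at)
        VALUES (:source_id, :name, :updated_at)
        ON CONFLICT(source_id) DO UPDATE SET
            name = excluded.name,
            updated_at = excluded.updated_at
        WHERE excluded.updated_at > patients.updated_at
        """,
        patient,
    )
```

Because the statement inserts missing records, updates stale ones and leaves current ones untouched, the same batch can be replayed safely during a prolonged, distributed migration.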
Data migration case study – OpenEyes
The business challenge
The challenge was to develop a reusable solution capable of migrating ophthalmology electronic health records from legacy applications to OpenEyes, an open-source electronic patient record (EPR) system for ophthalmology. The electronic health records, stored in a Microsoft SQL Server source database, needed migrating to an open-source relational database.
Clinical data stored on both the source and target systems was extensive, including comprehensive testing results and analysis. Being health related, data accuracy was essential and PBT Group’s client could not risk any inaccurate or incomplete transformation of patient data between the systems.
The underlying source structures posed further complexity: differences in the data structures made direct mapping between the source and target environments impossible. The translation of legacy medical codes to industry-standard codes posed additional difficulties in dealing with the EPR.
Due to the nature of the systems and the underlying risk posed when deploying a new operational EPR system, the migration solution had to be capable of migrating EPR records in phases rather than in a big-bang, one-off approach.
The solution
PBT developed an automated migration utility capable of extracting health records from the source system and translating a complex, fractured data structure into a format that was human readable and understandable. From the staging area, EPR records were translated using an updateable, configurable translation function that could be easily enhanced and expanded, depending on site requirements.
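In spirit, such a configurable translation function might look like the sketch below: a site-editable mapping table translates legacy codes to standard ones, and unknown codes are queued for review rather than silently dropped. The codes and names are invented for illustration and are not the actual OpenEyes mapping.

```python
# Configuration-driven code translation (illustrative; all codes invented).
# In a real deployment the mapping would be loaded from a site-editable
# configuration table so it can be extended without code changes.
LEGACY_TO_STANDARD = {
    "GLAU01": "STD-1234",  # hypothetical legacy code -> standard code
    "CAT02": "STD-5678",
}

unmapped: set[str] = set()  # codes queued for clinical review

def translate(legacy_code: str) -> str | None:
    """Return the standard code, or None while flagging the gap for review."""
    standard = LEGACY_TO_STANDARD.get(legacy_code)
    if standard is None:
        unmapped.add(legacy_code)  # surface the gap instead of guessing
    return standard
```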
The Pentaho data integration solution provided extract, transform and load capabilities to facilitate moving, cleansing, translating and loading EPR records in a uniform, consistent format.
Where EPR detail couldn’t be migrated as structured data, summary reports from the source system were loaded as PDF documents for reference. A bespoke utility was developed to reuse the underlying system logic to produce summary reports similar to those available in the legacy application.
The value proposition
PBT migrated the data in a phased approach. The data was verified by clinical experts to ensure its accuracy and completeness. As part of the initial analysis phase, a full data model and entity relationship diagram was created to understand source and target data structures. The completed solution is portable and can be easily adapted to migrate different components of the source system.
The phased solution supported taking data from alternative sources, enabling migration from different databases and applications to the target.
Loading such large data sets took full advantage of cloud, clustered and massively parallel processing to keep outages minimal, because the EPR is critical to supporting the business operation.
Data cleansing with steps ranging from simple to complex transformations enabled EPR migration with guaranteed data integrity and validity.
Data migration case study – large specialist insurer
The business challenge
The challenge was to move all relevant data from the insurer’s mainframe (IDMS database) into a new application on an Oracle database, while archiving all the data into a standalone Oracle relational database for future use to enable decommissioning of the mainframe.
The solution
PBT developed an automated, metadata-driven utility that extracted the raw IDMS data and applied transformations to convert each source record into a format ready to be loaded into the target database. The conversion from IDMS to RDBMS was designed using IDMS data dictionaries provided by the supplier and by reverse engineering the defined key attributes into a relational data dictionary with primary and foreign key attributes. The relational data dictionary was then used as an input to generate a staging schema.
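As a rough illustration of that dictionary-driven step (the dictionary rows, table and type names below are invented, not the insurer’s schema), a small generator can emit staging DDL directly from the relational data dictionary, so every downstream script derives from one authoritative definition:

```python
# Generate staging-table DDL from a relational data dictionary
# (illustrative; dictionary contents are hypothetical placeholders).
DICTIONARY = {
    "POLICY": [
        # (column name, Oracle type, key role)
        ("POLICY_NO",   "VARCHAR2(12)", "PK"),
        ("HOLDER_NAME", "VARCHAR2(80)", None),
        ("START_DATE",  "DATE",         None),
    ],
}

def staging_ddl(table: str) -> str:
    cols, pks = [], []
    for name, dtype, key in DICTIONARY[table]:
        cols.append(f"    {name} {dtype}")
        if key == "PK":
            pks.append(name)
    if pks:
        cols.append(f"    PRIMARY KEY ({', '.join(pks)})")
    return f"CREATE TABLE STG_{table} (\n" + ",\n".join(cols) + "\n);"

print(staging_ddl("POLICY"))
```

The same dictionary entries can drive extract scripts, load scripts and test cases, which is what makes the approach pay off later in the project.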
Extensive data cleansing was performed before loading into the new application, error handling was implemented during the load, and a reconciliation procedure was run to determine and explain any data misalignment. A secondary pass over the data loaded into the staging schema completed the data migration into the target schema.
As part of the mainframe decommissioning, PBT project-managed the development of a suite of operational reports on new platforms, replicating the functionality of reports decommissioned with the mainframe. That enabled the insurer to access all the mainframe data directly from its own repositories to support FOI requests and future auditing and reporting.
The value proposition
The conversion from IDMS to RDBMS was developed and verified with the insurer’s team to ensure no data was lost and quality was maintained.
PBT leveraged the collateral developed during the data migration phase to archive all the mainframe data into a pre-provisioned Oracle archive database, capturing the point-in-time state of the mainframe data on the day of cutover.
That resulted in a complete field specification of the existing legacy data, which was valuable to the insurer because it could extract legacy data reports from the converted IDMS data without needing the supplier to extract the report data on its behalf.
PBT Group used the data dictionary in every step of the migration to generate extract, load and test scripts and data quality reports. A by-product of using the data dictionary was the generation of complete data model diagrams for both the legacy and target systems.
Using the data dictionary-driven generator, PBT eliminated a great deal of error-prone manual copying and pasting and improved productivity considerably by automating the generation of the scripts used in the data migration. The dictionary was also used to generate test cases and test scripts, saving the insurer many weeks of development and testing time and greatly reducing the consulting fees that would otherwise have been paid to the supplier.